11-1 Biology - Textbook (Ncert)
STATISTICS FOR ECONOMICS
Textbook for Class XI
ISBN 81-7450-497-4

First Edition
February 2006 Phalguna 1927

Reprinted
December 2006 Pausa 1928
December 2007 Pausa 1929
January 2009 Magha 1930
January 2010 Magha 1931
January 2011 Magha 1932
January 2012 Magha 1933
December 2012 Agrahayana 1934
November 2013 Kartika 1935

PD 115T MJ

© National Council of Educational Research and Training, 2006

Rs 45.00

ALL RIGHTS RESERVED
• No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior permission of the publisher.
• This book is sold subject to the condition that it shall not, by way of trade, be lent, re-sold, hired out or otherwise disposed of without the publisher’s consent, in any form of binding or cover other than that in which it is published.
• The correct price of this publication is the price printed on this page. Any revised price indicated by a rubber stamp or by a sticker or by any other means is incorrect and should be unacceptable.

OFFICES OF THE PUBLICATION DIVISION, NCERT
NCERT Campus, Sri Aurobindo Marg, New Delhi 110 016 Phone : 011-26562708
P.O. Navjivan, Ahmedabad 380 014 Phone : 079-27541446
CWC Campus, Opp. Dhankal Bus Stop, Panihati, Kolkata 700 114 Phone : 033-25530454
CWC Complex, Maligaon, Guwahati 781 021 Phone : 0361-2674869
Publication Team
The National Curriculum Framework (NCF), 2005, recommends that children’s life at school must be linked to their life outside the school. This principle marks a departure from the legacy of bookish learning which continues to shape our system and causes a gap between the school, home and community. The syllabi and textbooks developed on the basis of NCF signify an attempt to implement this basic idea. They also attempt to discourage rote learning and the maintenance of sharp boundaries between different subject areas. We hope these measures

reorienting knowledge at different stages with greater consideration for child psychology and the time available for teaching. The textbook attempts to enhance this endeavour by giving higher priority and space to opportunities for contemplation and wondering, discussion in small groups, and activities requiring hands-on experience.

The National Council of Educational Research and Training (NCERT) appreciates the hard work done by the textbook development team responsible for this book. We wish to thank the Chairperson of the advisory group for Social Sciences textbooks at Higher Secondary Level, Professor Hari Vasudevan and the Chief Advisor for this book,

are grateful to them and their principals for making this possible. We are indebted to the institutions and organisations which have generously permitted us to draw upon their resources, material and personnel. We

Director
TEXTBOOK DEVELOPMENT COMMITTEE

CHAIRPERSON, ADVISORY COMMITTEE FOR SOCIAL SCIENCE TEXTBOOKS AT HIGHER SECONDARY LEVEL
Hari Vasudevan, Professor, Department of History, University of Calcutta, Kolkata

CHIEF ADVISOR
Tapas Majumdar, Emeritus Professor, Jawaharlal Nehru University, New Delhi

MEMBERS
Bhawna Rajput, Sr. Lecturer, Aditi Mahavidyalaya, Delhi University, Delhi

MEMBER-COORDINATOR
Neeraja Rashmi, Reader, Economics, DESSH, NCERT, New Delhi
ACKNOWLEDGEMENTS

Acknowledgements are due to Savita Sinha, Professor and Head, Department of Education in Social Sciences and Humanities for her support in developing this textbook.

The Council is also thankful to J. Khuntia, Sr. Lecturer, School of Correspondence Courses, Delhi University; M. V. Srinivasan and Jaya Singh, Lecturers, DESSH, NCERT for helping in finalising the textbook.

Special thanks are due to Vandana R. Singh, Consultant Editor for going through the manuscript and suggesting relevant changes.

The Council also gratefully acknowledges the contributions of Girish Goyal, DTP Operator, Dillip Kumar Agasti, Proof Reader, Dinesh Kumar, Incharge, Computer Station, in shaping this book. The
Foreword
Chapter 1 : Introduction
Chapter 2 : Collection of Data
Chapter 4 : Presentation of Data
Chapter 7 : Correlation
CHAPTER 1
Introduction

Studying this chapter should enable you to:
• know what the subject of economics is about;
• understand how economics is linked with the study of economic activities in consumption, production and distribution;
• understand why knowledge of statistics can help in describing consumption, production and distribution;
• learn about some uses of statistics in the understanding of economic activities.

1. WHY ECONOMICS?
You have, perhaps, already had Economics as a subject for your earlier classes at school. You might have been told this subject is mainly around what Alfred Marshall (one of the founders of modern economics) called “the study of man in the ordinary business of life”. Let us understand what that means.
When you buy goods (you may want to satisfy your own personal needs or those of your family or those of any other person to whom you want to make a gift) you are called a consumer.
When you sell goods to make a profit for yourself (you may be a shopkeeper), you are called a seller.
When you produce goods (you may be a farmer or a manufacturer), you are called a producer.
When you are in a job, working for some other person, and you get paid for it (you may be employed by somebody who pays you wages or a salary), you are called a service-holder.
When you provide some kind of service to others for a payment (you may be a lawyer or a doctor or a banker or a taxi driver or a transporter of goods), you are called a service-provider.
In all these cases you will be called gainfully employed in an economic activity. Economic activities are ones that are undertaken for a monetary gain. This is what economists mean by ordinary business of life.

Activities
• List different activities of the members of your family. Would you call them economic activities? Give reasons.
• Do you consider yourself a consumer? Why?

We cannot get something for nothing
If you have ever heard the story of Aladdin and his Magic Lamp, you would agree that Aladdin was a lucky guy. Whenever and whatever he wanted, he just had to rub his magic lamp and a genie appeared to fulfill his wish. When he wanted a palace to live in, the genie instantly made one for him. When he wanted expensive gifts to bring to the king when asking for his daughter’s hand, he got them at the bat of an eyelid.
In real life we cannot be as lucky as Aladdin. Though, like him, we have unlimited wants, we do not have a magic lamp. Take, for example, the pocket money that you get to spend. If you had more of it then you could have purchased almost all the things you wanted. But since your pocket money is limited, you have to choose only those things that you want the most. This is a basic teaching of Economics.

Activities
• Can you think for yourself of some other examples where a person with a given income has to choose which things and in what quantities he or she can buy at the prices that are being charged (called the current prices)?
• What will happen if the current prices go up?

Scarcity is the root of all economic problems. Had there been no scarcity, there would have been no economic problem. And you would not have studied Economics either. In our daily life, we face various forms of scarcity. The long queues at railway booking counters, crowded buses and trains, shortage of essential commodities, the rush to get a ticket to watch a new film, etc., are all manifestations of scarcity. We face scarcity because the things that satisfy our wants are limited in availability. Can you think of some more instances of scarcity?
The resources which the producers have are limited and also have
alternative uses. Take the case of food that you eat every day. It satisfies your want of nourishment. Farmers employed in agriculture raise crops that produce your food. At any point of time, the resources in agriculture like land, labour, water, fertiliser, etc., are given. All these resources have alternative uses. The same resources can be used in the production of non-food crops such as rubber, cotton, jute etc. Thus alternative uses of resources give rise to the problem of choice between different commodities that can be produced by those resources.

Activities
• Identify your wants. How many of them can you fulfill? How many of them are unfulfilled? Why are you unable to fulfill them?
• What are the different kinds of scarcity that you face in your daily life? Identify their causes.

Consumption, Production and Distribution
If you thought about it, you might have realised that Economics involves the study of man engaged in economic activities of various kinds. For this, you need to know reliable facts about all the diverse economic activities like production, consumption and distribution. Economics is often discussed in three parts: consumption, production and distribution.
We want to know how the consumer decides, given his income and many alternative goods to choose from, what to buy when he knows the prices. This is the study of Consumption.
We also want to know how the producer, similarly, chooses what to produce for the market when he knows the costs and prices. This is the study of Production.
Finally, we want to know how the national income or the total income arising from what has been produced in the country (called the Gross Domestic Product or GDP) is distributed through wages (and salaries), profits and interest (we will leave aside here income from international trade and investment). This is the study of Distribution.
Besides these three conventional divisions of the study of Economics about which we want to know all the facts, modern economics has to include some of the basic problems facing the country for special studies.
For example, you might want to know why or to what extent some households in our society have the capacity to earn much more than others. You may want to know how many people in the country are really
poor, how many are middle-class, how many are relatively rich and so on. You may want to know how many are illiterate and will not get jobs requiring education, how many are highly educated and will have the best job opportunities and so on. In other words, you may want to know more facts in terms of numbers that would answer questions about poverty and disparity in society. If you do not like the continuance of poverty and gross disparity and want to do something about the ills of society, you will need to know the facts about all these things before you can ask for appropriate actions by the government. If you know the facts it may also be possible to plan your own life better. Similarly, you hear of (and some of you may even have experienced) disasters like the tsunami, earthquakes and the bird flu, dangers threatening our country that affect man’s ‘ordinary business of life’ enormously. Economists can look at these things provided they know how to collect and put together the facts about what these disasters cost systematically and correctly. You may perhaps think about it and ask yourselves whether it is right that modern economics now includes learning the basic skills involved in making useful studies for measuring poverty, how incomes are distributed, how earning opportunities are related to your education, how environmental disasters affect our lives and so on.
Obviously, if you think along these lines, you will also appreciate why we needed Statistics (which is the study of numbers relating to selected facts in a systematic form) to be added to all modern courses of economics.
Would you now agree with the following definition of economics that many economists use?
“Economics is the study of how people and society choose to employ scarce resources that could have alternative uses in order to produce various commodities that satisfy their wants and to distribute them for consumption among various persons and groups in society.”

Activity
• Would you say, in the light of the discussion above, that this definition, as it used to be given, seems a little inadequate now? What does it miss out?

2. STATISTICS IN ECONOMICS
In the previous section you were told about certain special studies that concern the basic problems facing a country. These studies required that we know more about economic facts in terms of numbers. Such economic facts are also known as data.
The purpose of collecting data about these economic problems is to understand and explain these problems in terms of the various causes behind them. In other words, we try to analyse them. For example, when we analyse the hardships of poverty, we try to explain it in terms of the various factors such as
may, therefore, also try to find those measures that help solve an economic problem. In Economics, such measures are known as policies.
So, do you realise, then, that no analysis of a problem would be possible without the availability of data on various factors underlying an economic problem? And that, in such a situation, no policies can be formulated to solve it? If yes, then you have, to a large extent, understood the basic relationship between Economics and Statistics.

3. WHAT IS STATISTICS?
At this stage you are probably ready to know more about Statistics. You might very well want to know what the subject “Statistics” is all about. What are its specific uses in Economics? Does it have any other meaning? Let us see how we can answer these questions to get closer to the subject.
In our daily language the word ‘Statistics’ is used in two distinct senses: singular and plural. In the plural sense, ‘statistics’ means ‘numerical facts systematically collected’ as described by Oxford Dictionary. Thus, the simple meaning of statistics in plural sense is data.
Do you know that the term statistics in singular means the ‘science of collecting, classifying and using statistics’ or a ‘statistical fact’?

increased from 39.58 million tonnes in 1974–75 to 58.64 million tonnes in 1984–85”, is a quantitative fact. The numerical figures such as ‘39.58 million tonnes’ and ‘58.64 million tonnes’ are statistics of the production of rice in India for 1974–75 and 1984–85 respectively.
In addition to the quantitative data, Economics also uses qualitative data. The chief characteristic of such information is that it describes attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms. Take, for example, “gender” that distinguishes a person as man/woman or boy/girl. It is often possible (and useful) to state the information about an attribute of a person in terms of degrees (like better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled etc.). Such qualitative information or statistics is often used in Economics and other social sciences, and is collected and stored systematically like quantitative information (on prices, incomes, taxes paid etc.), whether for a single person or a group of persons.
You will study in the subsequent chapters that statistics involves collection and organisation of data. The next step is to present the data in
the broad characteristics of the collected set of information.

Activities
• Think of two examples of qualitative and quantitative data.
• Which of the following would give you qualitative data: beauty, intelligence, income earned, marks in a subject, ability to sing, learning skills?

4. WHAT STATISTICS DOES?
By now, you know that Statistics is an indispensable tool for an economist that helps him to understand an economic problem. Using its various methods, effort is made to find the causes behind it with the help of the qualitative and the quantitative facts of the economic problem. Once the causes of the problem are identified, it is easier to formulate certain policies to tackle it.
But there is more to Statistics. It enables an economist to present economic facts in a precise and definite form that helps in proper comprehension of what is stated. When economic facts are expressed in statistical terms, they become exact. Exact facts are more convincing than vague statements. For instance, saying with precise figures that 310 people died in the recent earthquake in Kashmir is more factual and, thus,

etc., about which you will learn later). These numerical measures help summarise data. For example, it would be impossible for you to remember the incomes of all the people in a data set if the number of people is very large. Yet, one can easily remember a summary figure like the average income that is obtained statistically. In this way, Statistics summarises and presents meaningful overall information about a mass of data.
Quite often, Statistics is used in finding relationships between different economic factors. An economist may be interested in finding out what happens to the demand for a commodity when its price increases or decreases. Or, would the supply of a commodity be affected by the changes in its own price? Or, would the consumption expenditure increase when the average income increases? Or, what happens to the general price level when the government expenditure increases? Such questions can only be answered if any relationship exists between the various economic factors that have been stated above. Whether such relationships exist or not can be easily verified by applying statistical methods to their data. In some cases the economist might assume certain relationships between them and like
might be interested in predicting the changes in one economic factor due to the changes in another factor. For example, she/he might be interested in knowing the impact of today’s investment on the national income in future. Such an exercise cannot be undertaken without the knowledge of Statistics.
Sometimes, formulation of plans and policies requires the knowledge of future trends. For example, an economic planner has to decide in 2005 how much the economy should produce in 2010. In other words, one must know what could be the expected level of consumption in 2010 in order to decide the production plan of the economy for 2010. In this situation, one might make subjective judgement based on the guess about consumption in 2010. Alternatively, one might use statistical tools to predict consumption in 2010. That could be based on the data of

Statistical methods are no substitute for common sense!
There is an interesting story which is told to make fun of statistics. It is said that a family of four persons (husband, wife and two children) once set out to cross a river. The father knew the average depth of the river. So he calculated the average height of his family members. Since the average height of his family members was greater than the average depth of the river, he thought they could cross safely. Consequently some members of the family (children) drowned while crossing the river.
Does the fault lie with the statistical method of calculating averages or with the misuse of the averages?

5. CONCLUSION
Today, we increasingly use Statistics to analyse serious economic problems such as rising prices, growing population, unemployment, poverty etc., to find measures that can solve such problems. Further it also helps evaluate the impact of such policies in solving the economic problems. For example, it can be ascertained easily using statistical techniques whether the policy of family planning is effective in checking the problem of ever-growing population.
In economic policies, Statistics plays a vital role in decision making. For example, in the present time of rising global oil prices, it might be necessary to decide how much oil India should import in 2010. The decision to import would depend on the expected domestic production of oil and the likely demand for oil in
2010. Without the use of Statistics, it cannot be determined what the expected domestic production of oil and the likely demand for oil would be. Thus, the decision to import oil cannot be made unless we know the actual requirement of oil. This vital information that helps make the decision to import oil can only be obtained statistically.
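The reasoning in this section (predict a future quantity from past data, then decide how much to import) can be sketched in a few lines of code. This is only an illustrative sketch. The yearly figures are made-up placeholders, not real data on India's oil sector. The straight-line trend fitted by least squares is an assumption chosen for simplicity, since the chapter does not prescribe any particular forecasting method. The function names `linear_trend_forecast` and `import_requirement` are hypothetical.

```python
def linear_trend_forecast(years, values, target_year):
    """Fit a straight line to (year, value) pairs by least squares
    and return the value that line predicts for target_year."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


def import_requirement(expected_demand, expected_production):
    """Imports needed = expected demand minus expected domestic production.
    A surplus means no imports are required, so the result is never negative."""
    return max(expected_demand - expected_production, 0.0)


# Hypothetical figures in arbitrary units, for illustration only.
years = [2001, 2002, 2003, 2004, 2005]
demand = [100.0, 104.0, 108.0, 112.0, 116.0]   # rises by 4 each year
production = [30.0, 31.0, 32.0, 33.0, 34.0]    # rises by 1 each year

demand_2010 = linear_trend_forecast(years, demand, 2010)
production_2010 = linear_trend_forecast(years, production, 2010)
imports_2010 = import_requirement(demand_2010, production_2010)
print(demand_2010, production_2010, imports_2010)
```

A subjective guess would replace the two forecast calls with numbers chosen by the planner. The statistical route makes the basis of the decision explicit and repeatable, although a straight-line trend is only as good as the assumption that past growth continues.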
Recap
• Our wants are unlimited but the resources used in the production of goods that satisfy our wants are limited and scarce. Scarcity is the root of all economic problems.
• Resources have alternative uses.
• Purchase of goods by consumers to satisfy their various needs is Consumption.
• Manufacture of goods by producers for the market is Production.
• Division of the national income into wages, profits, rents and interests is Distribution.
• Statistics finds economic relationships using data and verifies them.
• Statistical tools are used in prediction of future trends.
• Statistical methods help analyse economic problems and formulate policies to solve them.
EXERCISES
(i) Statistics can only deal with quantitative data.
(ii) Statistics solves economic problems.
(iii) Statistics is of no use to Economics without data.
2. Make a list of activities that constitute the ordinary business of life. Are these economic activities?
3. ‘The Government and policy makers use statistical data to formulate suitable policies of economic development’. Illustrate with two examples.
4. You have unlimited wants and limited resources to satisfy them. Explain by giving two examples.
5. How will you choose the wants to be satisfied?
6. What are your reasons for studying Economics?
7. Statistical methods are no substitute for common sense. Comment.
CHAPTER 2
Collection of Data

Studying this chapter should enable you to:
• understand the meaning and purpose of data collection;
• distinguish between primary and secondary sources;
• know the mode of collection of data;
• distinguish between Census and Sample Surveys;
• be familiar with the techniques of sampling;
• know about some important sources of secondary data.

1. INTRODUCTION
In the previous chapter, you have read about what economics is. You also studied the role and importance of statistics in economics. In this chapter, you will study the sources of data and the mode of data collection. The purpose of collection of data is to collect evidence for reaching a sound and clear solution to a problem.
In economics, you often come across a statement like,
“After many fluctuations the output of food grains rose to 176 million tonnes in 1990–91 and 199 million tonnes in 1996–97, but fell to 194 million tonnes in 1997–98. Production of food grains then rose continuously and touched 212 million tonnes in 2001–02.”
In this statement, you can observe that the food grains production in different years does not remain the same. It varies from year to year and from crop to crop. As these values
vary, they are called variables. The variables are generally represented by the letters X, Y or Z. The values of these variables are the observations. For example, suppose the food grain production in India varies between about 100 million tonnes in 1970–71 and about 220 million tonnes in 2001–02 as shown in the following table. The years are represented by variable X and the production of food grain in India (in million tonnes) is represented by variable Y:

TABLE 2.1
Production of Food Grain in India (Million Tonnes)

X          Y
1970–71    108
1978–79    132
1979–80    108
1990–91    176
1996–97    199
1997–98    194
2001–02    212

Here, these values of the variables X and Y are the ‘data’, from which we can obtain information about the trend of the production of food grains in India. To know the fluctuations in the output of food grains, we need the ‘data’ on the production of food grains in India. ‘Data’ is a tool, which helps in understanding problems by providing information.
You must be wondering where ‘data’ come from and how we collect them. In the following sections we will discuss the types of data, method and instruments of data collection and sources of obtaining data.

2. WHAT ARE THE SOURCES OF DATA?
Statistical data can be obtained from two sources. The enumerator (person who collects the data) may collect the data by conducting an enquiry or an investigation. Such data are called Primary Data, as they are based on first hand information. Suppose, you want to know about the popularity of a film star among school students. For this, you will have to enquire from a large number of school students, by asking them questions to collect the desired information. The data you get are an example of primary data.
If the data have been collected and processed (scrutinised and tabulated) by some other agency, they are called Secondary Data. Generally, the published data are secondary data. They can be obtained either from published sources or from any other source, for example, a web site. Thus, the data are primary to the source that collects and processes them for the first time and secondary for all sources that later use such data. Use of secondary data saves time and cost. For example, after collecting the data on the popularity of the film star among students, you publish a report. If somebody uses the data collected by you for a similar study, it becomes secondary data.

3. HOW DO WE COLLECT THE DATA?
Do you know how a manufacturer decides about a product or how a political party decides about a candidate? They conduct a survey by
usefulness (in case of the product) and popularity, honesty, loyalty (in case of the candidate). The purpose of the survey is to collect data. Survey is a method of gathering information from individuals.

Preparation of Instrument
The most common type of instrument used in surveys is questionnaire/interview schedule. The questionnaire is either self administered by the respondent or administered by the researcher (enumerator) or trained investigator. While preparing the questionnaire/interview schedule, you should keep in mind the following points:
• The questionnaire should not be too long. The number of questions should be kept to the minimum possible. Long questionnaires discourage people from completing them.
• The series of questions should move from general to specific. The questionnaire should start from general questions and proceed to more specific ones. This helps the respondents feel comfortable. For example:
Poor Q
(i) Is increase in electricity charges justified?
(ii) Is the electricity supply in your locality regular?
• The questions should be precise and clear. For example,
Poor Q
What percentage of your income do you spend on clothing in order to look presentable?
Good Q
What percentage of your income do you spend on clothing?
• The questions should not be ambiguous, to enable the respondents to answer quickly, correctly and clearly. For example:
Poor Q
Do you spend a lot of money on books in a month?
Good Q
How much do you spend on books in a month?
(i) Less than Rs 200
(ii) Between Rs 200–300
(iii) Between Rs 300–400
(iv) More than Rs 400
• The question should not use double negatives. The questions starting with “Wouldn’t you” or “Don’t you” should be avoided, as they may lead to biased responses. For example:
Poor Q
Don’t you think smoking should be prohibited?
Good Q
Do you think smoking should be prohibited?
Poor Q
How do you like the flavour of this high-quality tea?
Good Q
How do you like the flavour of this tea?
• The question should not indicate alternatives to the answer. For example:
Poor Q
Would you like to do a job after college or be a housewife?
Good Q
Would you like to do a job, if possible?

The questionnaire may consist of closed-ended (or structured) questions or open-ended (or unstructured) questions.
Closed-ended or structured questions can either be a two-way question or a multiple choice question. When there are only two possible answers, ‘yes’ or ‘no’, it is called a two-way question.
When there is a possibility of more than two options of answers, multiple choice questions are more appropriate. Example,
Q. Why did you sell your land?
(i) To pay off the debts.
(ii) To finance children’s education.
(iii) To invest in another property.
(iv) Any other (please specify).
Closed-ended questions are easy to use, score and code for analysis,

both sides of the issue. There is also a possibility that the individual’s true response is not present among the options given. For this, the choice of ‘Any Other’ is provided, where the respondent can write a response, which was not anticipated by the researcher. Moreover, another limitation of multiple-choice questions is that they tend to restrict the answers by providing alternatives, without which the respondents may have answered differently.
Open-ended questions allow for more individualised responses, but they are difficult to interpret and hard to score, since there are a lot of variations in the responses. Example,
Q. What is your view about globalisation?

Mode of Data Collection
Have you ever come across a television show in which reporters ask questions from children, housewives or general public regarding their examination performance or a brand of soap or a political party? The purpose of asking questions is to do a survey for collection of data. There are three basic ways of collecting data: (i) Personal Interviews, (ii) Mailing (questionnaire) Surveys, and (iii) Telephone Interviews.
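The earlier remark that closed-ended questions are easy to score and code for analysis can be illustrated with a small sketch. The option codes follow the land-sale question given in the text. The list of responses is invented purely for illustration, and the helper name `tally_responses` is hypothetical.

```python
from collections import Counter

# Option codes for the closed-ended question "Why did you sell your land?"
OPTIONS = {
    "i": "To pay off the debts",
    "ii": "To finance children's education",
    "iii": "To invest in another property",
    "iv": "Any other (please specify)",
}


def tally_responses(responses):
    """Count how many respondents chose each option code.

    Unknown codes raise ValueError so that data-entry mistakes are noticed.
    Returns a dict mapping option text to its count, in questionnaire order.
    """
    unknown = [r for r in responses if r not in OPTIONS]
    if unknown:
        raise ValueError(f"unrecognised option codes: {unknown}")
    counts = Counter(responses)
    return {text: counts.get(code, 0) for code, text in OPTIONS.items()}


# Invented answers from ten respondents, for illustration only.
responses = ["i", "ii", "i", "iv", "iii", "i", "ii", "i", "iv", "i"]
table = tally_responses(responses)
for text, count in table.items():
    print(f"{text}: {count} ({100 * count / len(responses):.0f}%)")
```

Because every answer is one of a few fixed codes, counting is mechanical. Open-ended answers have no such codes, which is one reason they are harder to score.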
members. The researcher (or investigator) conducts face to face interviews with the respondents.
Personal interviews are preferred due to various reasons. Personal contact is made between the respondent and the interviewer. The interviewer has the opportunity of explaining the study and answering any query of the respondents. The interviewer can request the respondent to expand on answers that are particularly important. Misinterpretation and misunderstanding can be avoided. Watching the reactions of the respondents can provide supplementary information.
Personal interview has some demerits too. It is expensive, as it requires trained interviewers. It takes longer time to complete the survey. Presence of the researcher may inhibit respondents from saying what they really think.

Mailing Questionnaire
When the data in a survey are collected by mail, the questionnaire is sent to each individual by mail with a request to complete and return it by a given date. The advantages of this method are that, it is

respondents by the interviewer. It also permits the respondents to take sufficient time to give thoughtful answers to the questions. These days online surveys or surveys through short messaging service i.e. SMS have become popular. Do you know how an online survey is conducted?
The disadvantages of mail survey are that, there is less opportunity to provide assistance in clarifying instructions, so there is a possibility of misinterpretation of questions. Mailing is also likely to produce low response rates due to certain factors such as returning the questionnaire without completing it, not returning the questionnaire at all, loss of questionnaire in the mail itself, etc.

Telephone Interviews
In a telephone interview, the investigator asks questions over the telephone. The advantages of telephone interviews are that they are cheaper than personal interviews and can be conducted in a shorter time. They allow the researcher to assist the respondent by clarifying the questions. Telephone interview is better in the cases where the respondents are reluctant to answer certain questions in personal interviews.
mode of data collection will be the most appropriate for collecting information from him?
• You have to interview the parents about the quality of teaching in a school. If the principal of the school is present there, what types of problems can arise?
The disadvantage of this method is access to people, as many people may not own telephones. Telephone interviews also obstruct visual reactions of the respondents, which are helpful in obtaining information on sensitive issues.
Pilot Survey
Once the questionnaire is ready, it is advisable to conduct a try-out with a
the questionnaire, so as to know the shortcomings and drawbacks of the questions. Pilot survey also helps in assessing the suitability of questions, clarity of instructions, performance of enumerators and the cost and time involved in the actual survey.
4. CENSUS AND SAMPLE SURVEYS
Census or Complete Enumeration
A survey, which includes every element of the population, is known as Census or the Method of Complete Enumeration. If certain agencies are interested in studying the total population in India, they have to obtain information from all the households in rural and urban India.
Personal Interviews
Advantages
• Highest response rate
• Allows use of all types of questions
• Better for using open-ended questions
• Allows clarification of ambiguous questions.
Disadvantages
• Most expensive
• Possibility of influencing respondents
• More time taking.
Mailing Questionnaire
Advantages
• Least expensive
• Only method to reach remote areas
• No influence on respondents
• Maintains anonymity of respondents
• Best for sensitive questions.
Disadvantages
• Cannot be used by illiterates
• Long response time
• Does not allow explanation of ambiguous questions
• Reactions cannot be watched.
India, which is carried out every ten years. A house-to-house enquiry is carried out, covering all households in India. Demographic data on birth and death rates, literacy, workforce, life expectancy, size and composition of population, etc. are collected and published by the Registrar General of India. The last Census of India was held in February 2001.
Source: Census of India, 2001.
According to the Census 2001, population of India is 102.70 crore. It was 23.83 crore according to Census 1901. In a period of hundred years, the population of our country increased by 78.87 crore. Census 1981 indicated that the rate of population growth during 1960s and 1970s remained almost same. 1991 Census indicated that the annual growth rate of population during 1980s was 2.14 per cent, which came down to 1.93 per cent during 1990s according to Census 2001.
“At 00.00 hours of first March, 2001 the population of India stood at 1,027,015,247 comprising of 531,277,078 males and 495,738,169 females. Thus, India becomes the second country in the world after China to cross the one billion mark.”
Sample Survey
Population or the Universe in statistics means totality of the items under study. Thus, the Population or the Universe is a group to which the results of the study are intended to apply. A population is always all the individuals/items who possess certain characteristics (or a set of characteristics), according to the purpose of the survey. The first task in selecting a sample is to identify the population. Once the population is identified, the researcher selects a Representative Sample, as it is difficult to study the entire population. A sample refers to a group or section of the population from which information is to be obtained. A good sample (representative sample) is generally smaller than the population and is capable of providing reasonably accurate information about the population at a much lower cost and shorter time.
Suppose you want to study the average income of people in a certain region. According to the Census method, you would be required to find out the income of every individual in the region, add them up and divide by number of individuals to get the average income of people in the region. This method would require huge expenditure, as a large number of enumerators have to be employed. Alternatively, you select a representative sample, of a few individuals, from the region and find out their income. The average income of the selected group of individuals is used as an estimate of average income of the individuals of the entire region.
Example
• Research problem: To study the economic condition of agricultural labourers in Churachandpur district of Manipur.
• Population: All agricultural labourers in Churachandpur district.
• Sample: Ten per cent of the agricultural labourers in Churachandpur district.
Most of the surveys are sample surveys. These are preferred in statistics because of a number of reasons. A sample can provide reasonably reliable and accurate information at a lower cost and shorter time. As samples are smaller than population, more detailed information can be collected by conducting intensive enquiries. As we need a smaller team of enumerators, it is easier to train them and supervise their work more effectively.
Now the question is how do you do the sampling? There are two main types of sampling, random and non-random. The following description will make their distinction clear.
Activities
• In which years will the next Census be held in India and China?
• If you have to study the opinion of students about the new economics textbook of class XI, what will be your population and sample?
• If a researcher wants to estimate the average yield of wheat in Punjab, what will be her/his population and sample?
Random Sampling
As the name suggests, random sampling is one where the individual units from the population (samples) are selected at random. The government wants to determine the
impact of the rise in petrol price on the household budget of a particular locality. For this, a representative (random) sample of 30 households has to be taken and studied. The names of all the 300 households of that area are written on pieces of paper and mixed well, then 30 names to be interviewed are selected one by one.
[Figure: A Population of 20 Kuchha and 20 Pucca Houses; A Representative Sample; A non Representative Sample]
In the random sampling, every individual has an equal chance of being selected and the individuals who are selected are just like the ones who are not selected. In the above example, all the 300 sampling units (also called sampling frame) of the population got an equal chance of being included in the sample of 30 units and hence the sample, so drawn, is a random sample. This is also called lottery method. The same could be done using a Random Number Table also.
How to use the Random Number Tables?
Do you know what are the Random Number Tables? Random number
sampling frame) in the population. They are available either in a published form or can be generated by using appropriate software packages (See Appendix B). You can start using the table from anywhere, i.e., from any page, column, row or point. In the above example, you need to select a sample of 30 households out of 300 total households. Here, the largest serial number is 300, a three digit number and therefore we consult three digit random numbers in sequence. We will skip the random numbers greater than 300 since there is no household number greater than 300. Thus, the 30 selected households are with serial numbers: 149, 219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 051, 244, 155, 300, 051, 152, 156, 205, 070, 015, 157, 040, 243, 479, 116, 122, 081, 160, 162.
Exit Polls
You must have seen that when an election takes place, the television networks provide election coverage. They also try to predict the results. This is done through exit polls, wherein a random sample of voters who exit the polling booths are asked whom they voted for. From the data of the sample of voters, the prediction is made.
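The selection of 30 households out of 300 with three digit random numbers, described above, can be sketched in Python. This is only an illustrative sketch under stated assumptions: the function name `select_sample` is hypothetical, a seeded pseudo-random generator stands in for a printed Random Number Table, and the rule of skipping a serial number that has already been drawn is an added assumption that the text does not state.

```python
import random


def select_sample(population_size, sample_size, seed=0):
    """Pick distinct serial numbers 1..population_size from random digits.

    Assumption: a seeded pseudo-random generator stands in for a printed
    Random Number Table; each draw is read as a fixed-width number.
    """
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    width = len(str(population_size))        # 300 -> three digit numbers
    selected = []
    while len(selected) < sample_size:
        number = rng.randrange(10 ** width)  # 000 .. 999 when width is 3
        if number == 0 or number > population_size:
            continue                         # no household has this serial number
        if number in selected:
            continue                         # assumption: skip repeated numbers
        selected.append(number)
    return selected


households = select_sample(population_size=300, sample_size=30, seed=2006)
print(sorted(households))
```

Changing `seed` plays the role of starting the table from a different page, column or row: the sequence of numbers consulted changes, but every serial number from 1 to 300 still has the same chance of being picked.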
you have to select a sample of production of ten years. Using the Random Number Tables, how will you select your sample?
Non-Random Sampling
There may be a situation that you have to select 10 out of 100 households in a locality. You have to decide which household to select and which to reject. You may select the households conveniently situated or the households known to you or your friend. In this case, you are using your judgement (bias) in selecting 10 households. This way of selecting 10 out of 100 households is not a random selection. In a non-random sampling method, not all the units of the population have an equal chance of being selected, and convenience or judgement of the investigator plays an important role in selection of the sample. They are mainly selected on the basis of judgment, purpose, convenience or quota and are non-random samples.
5. SAMPLING AND NON-SAMPLING ERRORS
Sampling Errors
The purpose of the sample is to take an estimate of the population. Sampling error refers to the differences between the sample estimate and the actual value of a
difference between the actual value of a parameter of the population (which is not known) and its estimate (from the sample) is the sampling error. It is possible to reduce the magnitude of sampling error by taking a larger sample.
Example
Consider a case of incomes of 5 farmers of Manipur. The variable x (income of farmers) has measurements 500, 550, 600, 650, 700. We note that the population average is (500 + 550 + 600 + 650 + 700) ÷ 5 = 3000 ÷ 5 = 600.
Now, suppose we select a sample of two individuals where x has measurements of 500 and 600. The sample average is (500 + 600) ÷ 2 = 1100 ÷ 2 = 550.
Here, the sampling error of the estimate = 600 (true value) - 550 (estimate) = 50.
Non-Sampling Errors
Non-sampling errors are more serious than sampling errors because a sampling error can be minimised by taking a larger sample. It is difficult to minimise non-sampling error, even by taking a large sample. Even a Census can contain non-sampling errors. Some of the non-sampling errors are:
the length of the teacher’s table in the classroom. The measurement by the students may differ. The differences may occur due to differences in measuring tape, carelessness of the students etc. Similarly, suppose we want to collect data on prices of oranges. We know that prices vary from shop to shop and from market to market. Prices also vary according to the quality. Therefore, we can only consider the average prices. Recording mistakes can also take place as the enumerators or the respondents may commit errors in recording or transcribing the data, for example, he/she may record 13 instead of 31.
Non-Response Errors
Non-response occurs if an interviewer is unable to contact a person listed in the sample or a person from the sample refuses to respond. In this case, the sample observation may not be representative.
Sampling Bias
Sampling bias occurs when the sampling plan is such that some members of the target population could not possibly be included in the sample.
6. CENSUS OF INDIA AND NSSO
There are some agencies both at the national and state level, which collect,
tion (CSO), Registrar General of India (RGI), Directorate General of Commercial Intelligence and Statistics (DGCIS), Labour Bureau etc.
The Census of India provides the most complete and continuous demographic record of population. The Census has been conducted regularly every ten years since 1881. The first Census after Independence was held in 1951. The Census collects information on various aspects of population such as the size, density, sex ratio, literacy, migration, rural-urban distribution etc. Census in India is not merely a statistical operation; the data is interpreted and analysed in an interesting manner.
The NSSO was established by the government of India to conduct nation-wide surveys on socio-economic issues. The NSSO does continuous surveys in successive rounds. The data collected by NSSO surveys, on different socio-economic subjects, are released through reports and its quarterly journal Sarvekshana. NSSO provides periodic estimates of literacy, school enrolment, utilisation of educational services, employment, unemployment, manufacturing and service sector enterprises, morbidity, maternity, child care, utilisation of the public distribution system etc. The NSS 59th round survey (January–December
undertakes the fieldwork of Annual survey of industries, conducts crop estimation surveys, collects rural and urban retail prices for compilation of consumer price index numbers.
7. CONCLUSION
Economic facts, expressed in terms of numbers, are called data. The purpose
Survey includes various steps, which need to be planned carefully. There are various agencies which collect, process, tabulate and publish statistical data. These can be used as secondary data. However, the choice of source of data and mode of data collection depends on the objective of the study.
Recap
• Data is a tool which helps in reaching a sound conclusion on any problem by providing information.
• Primary data is based on first hand information.
• Survey can be done by personal interviews, mailing questionnaires and telephone interviews.
• Census covers every individual/unit belonging to the population.
• Sample is a smaller group selected from the population from which the relevant information would be sought.
• In a random sampling, every individual is given an equal chance of being selected for providing information.
• Sampling error is the difference between the actual value of a population parameter and its estimate from the sample.
• Non-sampling errors can arise in data acquisition, by non-response or by bias in selection.
• Census of India and National Sample Survey Organisation are two important agencies at the national level, which collect, process and tabulate data.
EXERCISES
3. (i) There are many sources of data (true/false).
(ii) Telephone survey is the most suitable method of collecting data, when the population is literate and spread over a large area (true/false).
(iii) Data collected by investigator is called the secondary data (true/false).
(iv) There is a certain bias involved in the non-random selection of samples (true/false).
(v) Non-sampling errors can be minimised by taking large samples (true/false).
4. What do you think about the following questions? Do you find any problem with these questions? If yes, how?
(i) How far do you live from the closest market?
(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
(iii) Wouldn’t you be opposed to increase in price of petrol?
(iv) (a) Do you agree with the use of chemical fertilizers?
(b) Do you use fertilizers in your fields?
(c) What is the yield per hectare in your field?
5. You want to research on the popularity of Vegetable Atta Noodles among children. Design a suitable questionnaire for collecting this information.
6. In a village of 200 farms, a study was conducted to find the cropping pattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify the population and the sample here.
7. Give two examples each of sample, population and variable.
8. Which of the following methods give better results and why?
(a) Census (b) Sample
9. Which of the following errors is more serious and why?
(a) Sampling error (b) Non-Sampling error
10. Suppose there are 10 students in your class. You want to select three out of them. How many samples are possible?
11. Discuss how you would use the lottery method to select 3 students out of 10 in your class?
12. Does the lottery method always give you a random sample? Explain.
13. Explain the procedure of selecting a random sample of 3 students out of 10 in your class, by using random number tables.
14. Do samples provide better results than surveys? Give reasons for your answer.
CHAPTER
Organisation of Data
Studying this chapter should enable you to:
• classify the data for further statistical analysis;
• distinguish between quantitative and qualitative classification;
• prepare a frequency distribution table;
• know the technique of forming classes;
• be familiar with the method of tally marking;
• differentiate between univariate and bivariate frequency distributions.
1. INTRODUCTION
In the previous chapter you have learnt about how data is collected. You also came to know the difference between census and sampling. In this chapter, you will know how the data, that you collected, are to be classified. The purpose of classifying raw data is to bring order in them so that they can be subjected to further statistical analysis easily.
Have you ever observed your local junk dealer or kabadiwallah to whom you sell old newspapers, broken household items, empty glass bottles, plastics etc.? He purchases these things from you and sells them to those who recycle them. But with so much junk in his shop it would be very difficult for him to manage his trade, if he had not organised them properly. To ease his situation he suitably groups or “classifies” various junk. He puts old newspapers together and ties them with a rope. Then collects all empty glass bottles in a sack. He heaps the articles of metals in one corner of his shop and sorts them into groups like “iron”, “copper”, “aluminium”, “brass” etc., and so on. In this way he groups his junk into different classes (“newspapers”, “plastics”, “glass”, “metals” etc.) and brings order in them. Once his junk is arranged and classified, it becomes easier for him to find a particular item that a buyer may demand.
Likewise when you arrange your schoolbooks in a certain order, it becomes easier for you to handle them. You may classify them according to subjects where each subject becomes a group or a class. So, when you need a particular book on history, for instance, all you need to do is to search that book in the group “History”. Otherwise, you would have to search through your entire collection to find the particular book you are looking for.
While classification of objects or things saves our valuable time and effort, it is not done in an arbitrary manner. The kabadiwallah groups his junk in such a way that each group consists of similar items. For example, under the group “Glass” he would put empty bottles, broken mirrors and windowpanes etc. Similarly when you classify your history books under the group “History” you would not put a book of a different subject in that group. Otherwise the entire purpose of grouping would be lost. Classification, therefore, is arranging or organising similar things into groups or classes.
Activity
• Visit your local post-office to find out how letters are sorted. Do you know what the pin-code in a letter indicates? Ask your postman.
2. RAW DATA
Like the kabadiwallah’s junk, the unclassified data or raw data are highly disorganised. They are often very large and cumbersome to handle. To draw meaningful conclusions from them is a tedious task because they do not yield to statistical methods easily. Therefore proper organisation and presentation of such data is needed before any systematic statistical analysis is undertaken. Hence after collecting data the next step is to organise and present them in a classified form.
Suppose you want to know the performance of students in mathematics and you have collected data on marks in mathematics of 100
TABLE 3.1
Marks in Mathematics Obtained by 100 Students in an Examination
47 45 10 60 51 56 66 100 49 40
60 59 56 55 62 48 59 55 51 41
42 69 64 66 50 59 57 65 62 50
64 30 37 75 17 56 20 14 55 90
62 51 55 14 25 34 90 49 56 54
70 47 49 82 40 82 60 85 65 66
49 44 64 69 70 48 12 28 55 65
49 40 25 41 71 80 0 56 14 22
66 53 46 70 43 61 59 12 30 35
45 44 57 76 82 39 32 14 90 25
Or you could have collected data on the monthly expenditure on food of 50 households in your neighbourhood to know their average expenditure on food. The data collected, in that case, had you presented them as a table, would have resembled Table 3.2.
TABLE 3.2
5090 1085 1823 2346 1523
1211 1360 1110 2152 1183
1218 1315 1105 2628 2712
4248 1812 1264 1183 1171
1007 1180 1953 1137 2048
2025 1583 1324 2621 3676
1397 1832 1962 2177 2575
1293 1365 1146 3222 1396
Both Tables 3.1 and 3.2 are raw or unclassified data. In both the tables you find that numbers are not arranged in any order. Now if you are asked what are the highest marks in mathematics from Table 3.1 then you have to first arrange the marks of 100 students either in ascending or in descending order. That is a tedious task. It becomes more tedious, if instead of 100 you have the marks of 1,000 students to handle. Similarly in Table 3.2, you would note that it is difficult for you to ascertain the average monthly expenditure of 50 households. And this difficulty will go up manifold if the number was larger, say, 5,000 households. Like our kabadiwallah, who would be distressed to find a particular item when his junk becomes large and disarranged, you would face a similar situation when you try to get any information from raw data that are large. In one word, therefore, it is a tedious task to pull information from large unclassified data.
The raw data are summarised, and made comprehensible by classification. When facts of similar characteristics are placed in the same class, it enables one to locate them easily, make comparison, and draw inferences without any difficulty. You
have studied in Chapter 2 that the Government of India conducts Census of population every ten years. The raw data of census are so large and fragmented that it appears an almost impossible task to draw any meaningful conclusion from them. But when the data of Census are classified according to gender, education, marital status, occupation, etc., the structure and nature of population of India is, then, easily understood.
The raw data consist of observations on variables. Each unit of raw data is an observation. In Table 3.1 an observation shows a particular value of the variable “marks of a student in mathematics”. The raw data contain 100 observations on “marks of a student” since there are 100 students. In Table 3.2 it shows a particular value of the variable “monthly expenditure of a household on food”. The raw data in it contain 50 observations on “monthly expenditure on food of a household” because there are 50 households.
Activity
• Collect data of total weekly expenditure of your family for a year and arrange it in a table. See how many observations you have. Arrange the data monthly and find the number of observations.
3. CLASSIFICATION OF DATA
The groups or classes of a classification can be formed in various ways. Instead of classifying your books according to subjects (“History”, “Geography”, “Mathematics”, “Science” etc.), you could have classified them author-wise in an alphabetical order. Or, you could have also classified them according to the year of publication. The way you want to classify them would depend on your requirement.
Likewise the raw data could be classified in various ways depending on the purpose in hand. They can be grouped according to time. Such a classification is known as a Chronological Classification. In such a classification, data are classified either in ascending or in descending order with reference to time such as years, quarters, months, weeks, etc. The following example shows the population of India classified in terms of years. The variable ‘population’ is a Time Series as it depicts a series of values for different years.
Example 1
Population of India (in crores)
Year    Population (Crores)
1951    35.7
1961    43.8
1971    54.6
1981    68.4
1991    81.8
2001    102.7
In Spatial Classification the data are classified with reference to geographical locations such as countries, states, cities, districts, etc. Example 2 shows the yield of wheat in different countries.
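The chronological classification of Example 1 amounts to ordering the records by their time key. A minimal Python sketch, in which the dictionary name `population_by_year` and the shuffled starting order are assumptions chosen only for illustration:

```python
# Population of India in crores, copied from Example 1 (deliberately shuffled).
population_by_year = {
    1981: 68.4, 1951: 35.7, 2001: 102.7,
    1971: 54.6, 1961: 43.8, 1991: 81.8,
}

# Chronological classification: order the (year, population) pairs by year.
ascending = sorted(population_by_year.items())                 # earliest year first
descending = sorted(population_by_year.items(), reverse=True)  # latest year first

for year, crores in ascending:
    print(year, crores)
```

A spatial classification would be handled the same way, with the name of the place taking the role of the key instead of the year.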
Example 2
Yield of Wheat for Different Countries
Country    Yield of wheat (kg/acre)
America    1925
Brazil     127
China      893
Denmark    225
France     439
India      862
Activities
• In the time-series of Example 1, in which year do you find the population of India to be the minimum? Find the year when it is the maximum.
• In Example 2, find the country whose yield of wheat is slightly more than that of India’s. How much would that be in terms of percentage?
• Arrange the countries of Example 2 in the ascending order of yield. Do the same exercise for the descending order of yield.
Sometimes you come across characteristics that cannot be expressed quantitatively. Such characteristics are called Qualities or Attributes. For example, nationality, literacy, religion, gender, marital status, etc. They cannot be measured. Yet these attributes can be classified
following example, we find population of a country is grouped on the basis of the qualitative variable “gender”. An observation could either be a male or a female. These two characteristics could be further classified on the basis of marital status (a qualitative variable) as given below:
Example 3
Population
    Male: Married, Unmarried
    Female: Married, Unmarried
The classification at the first stage is based on the presence and absence of an attribute i.e. male or not male (female). At the second stage, each class, male and female, is further sub divided on the basis of the presence or absence of another attribute i.e. whether married or unmarried.
Activity
• The objects around can be grouped as either living or non-living. Is it a quantitative classification?
On the other hand, characteristics like height, weight, age, income, marks of students, etc. are quantitative in nature. When the collected data of such characteristics are grouped into
classes, the classification is a Quantitative Classification.
Example 4
Frequency Distribution of Marks in Mathematics of 100 Students
Marks     Frequency
0–10      1
10–20     8
20–30     6
30–40     7
40–50     21
50–60     23
60–70     19
70–80     6
80–90     5
90–100    4
Total     100
Example 4 shows quantitative classification of the data of marks in mathematics of 100 students given in Table 3.1 as a Frequency Distribution.
Activity
• Express the values of frequency of Example 4 as proportion or percentage of total frequency. Note that frequency expressed in this way is known as relative frequency.
• In Example 4, which class has the maximum concentration of data? Express it as percentage of total observations. Which class has the minimum concentration of data?
4. VARIABLES: CONTINUOUS AND DISCRETE
A simple definition of variable, which you have read in the last chapter, does not tell you how it varies. Different variables vary differently and depending on the way they vary, they are broadly classified into two types:
(i) Continuous and
(ii) Discrete.
A continuous variable can take any numerical value. It may take integral values (1, 2, 3, 4, ...), fractional values (1/2, 2/3, 3/4, ...), and values that are not exact fractions (√2 = 1.414, √3 = 1.732, ..., √7 = 2.645). For example, the height of a student, as he/she grows say from 90 cm to 150 cm, would take all the values in between them. It can take values that are whole numbers like 90 cm, 100 cm, 108 cm, 150 cm. It can also take fractional values like 90.85 cm, 102.34 cm, 149.99 cm etc. that are not whole numbers. Thus the variable “height” is capable of manifesting in every conceivable value and its values can also be broken down into infinite gradations. Other examples of a continuous variable are weight, time, distance, etc.
Unlike a continuous variable, a discrete variable can take only certain values. Its value changes only by finite “jumps”. It “jumps” from one value to another but does not take any intermediate value between them. For example, a variable like the “number of students in a class”, for different classes, would assume values that are only whole numbers. It cannot take
and 26. Instead its value could have been either 25 or 26. What we observe is that as its value changes from 25 to 26, the values in between them, the fractions, are not taken by it. But do not have the impression that a discrete variable cannot take any fractional value. Suppose X is a variable that takes values like 1/8, 1/16, 1/32, 1/64, ... Is it a discrete variable? Yes, because though X takes fractional values it cannot take any value between two adjacent fractional values. It changes or “jumps” from 1/8 to 1/16 and from 1/16 to 1/32. But it cannot take a value in between 1/8 and 1/16 or between 1/16 and 1/32.
Activity
• Distinguish the following variables as continuous and discrete: Area, volume, temperature, number appearing on a dice, crop yield, population, rainfall, number of cars on road, age.
Earlier we have mentioned that example 4 is the frequency distribution of marks in mathematics of 100 students as shown in Table 3.1. It shows how the marks of 100 students are grouped into classes. You will be wondering as to how we got it from the raw data of Table 3.1. But,
A frequency distribution is a comprehensive way to classify raw data of a quantitative variable. It shows how the different values of a variable (here, the marks in mathematics scored by a student) are distributed in different classes along with their corresponding class frequencies. In this case we have ten classes of marks: 0–10, 10–20, ..., 90–100. The term Class Frequency means the number of values in a particular class. For example, in the class 30–40 we find 7 values of marks from raw data in Table 3.1. They are 30, 37, 34, 30, 35, 39, 32. The frequency of the class 30–40 is thus 7. But you might be wondering why 40, which occurs three times in the raw data, is not included in the class 30–40. Had it been included the class frequency of 30–40 would have been 10 instead of 7. The puzzle would be clear to you if you are patient enough to read this chapter carefully. So carry on. You will find the answer yourself.
Each class in a frequency distribution table is bounded by Class Limits. Class limits are the two ends of a class. The lowest value is called the Lower Class Limit and the highest value the Upper Class Limit. For example, the class limits for the class 60–70 are 60 and 70. Its lower class limit is 60 and its upper class limit is 70. Class Interval or Class Width is
the difference between the upper class limit and the lower class limit. For the class 60–70, the class interval is 10 (upper class limit minus lower class limit).
The Class Mid-Point or Class Mark is the middle value of a class. It lies halfway between the lower class limit and the upper class limit of a class and can be ascertained in the following manner:
Class Mid-Point or Class Mark = (Upper Class Limit + Lower Class Limit)/2 ..... (1)
The class mark or mid-value of each class is used to represent the class. Once raw data are grouped into classes, individual observations are not used in further calculations. Instead, the class mark is used.
TABLE 3.3
The Lower Class Limits, the Upper Class Limits and the Class Mark
Class     Frequency    Lower Class Limit    Upper Class Limit    Class Mark
0–10      1            0                    10                   5
10–20     8            10                   20                   15
20–30     6            20                   30                   25
30–40     7            30                   40                   35
40–50     21           40                   50                   45
50–60     23           50                   60                   55
60–70     19           60                   70                   65
70–80     6            70                   80                   75
80–90     5            80                   90                   85
90–100    4            90                   100                  95
Frequency Curve is a graphic representation of a frequency distribution. Fig. 3.1 shows the diagrammatic presentation of the frequency distribution of the data in our example above. To obtain the frequency curve we plot the class marks on the X-axis and frequency on the Y-axis.
Fig. 3.1: Diagrammatic Presentation of Frequency Distribution of Data.
How to prepare a Frequency Distribution?
While preparing a frequency distribution from the raw data of Table 3.1, the following four questions need to be addressed:
1. How many classes should we have?
2. What should be the size of each class?
3. How should we determine the class limits?
4. How should we get the frequency for each class?
How many classes should we have?
Before we determine the number of classes, we first find out as to what extent the variable in hand changes in value. Such variations in variable’s value are captured by its range. The Range is the difference between the largest and the smallest values of the
variable. A large range indicates that the values of the variable are widely spread. On the other hand, a small range indicates that the values of the variable are spread narrowly. In our example the range of the variable “marks of a student” is 100 because the minimum marks are 0 and the maximum marks 100. It indicates that the variable has a large variation.

After obtaining the value of range, it becomes easier to determine the number of classes once we decide the class interval. Note that range is the sum of all class intervals. If the class intervals are equal then range is the product of the number of classes and class interval of a single class.

Range = Number of Classes × Class Interval ..... (2)

Activities
Find the range of the following:
• population of India in Example 1,
• yield of wheat in Example 2.

Given the value of range, the number of classes would be large if we choose small class intervals. A frequency distribution with too many classes would look too large. Such a distribution is not easy to handle. So we want to have a reasonably compact set of data. On the other hand, given the value of range, if we choose a class interval that is too large then the number of classes becomes too small. The data set then may be too compact and we may not like the loss of information about its diversity. For example, suppose the range is 100 and the class interval is 50. Then the number of classes would be just 2 (i.e. 100/50 = 2). Though there is no hard-and-fast rule to determine the number of classes, the rule of thumb often used is that the number of classes should be between 5 and 15. In our example we have chosen to have 10 classes. Since the range is 100 and the class interval is 10, the number of classes is 100/10 = 10.

What should be the size of each class?

The answer to this question depends on the answer to the previous question. The equality (2) shows that given the range of the variable, we can determine the number of classes once we decide the class interval. Similarly, we can determine the class interval once we decide the number of classes. Thus we find that these two decisions are inter-linked with one another. We cannot decide on one without deciding on the other.

In Example 4, we have the number of classes as 10. Given the value of range as 100, the class intervals are automatically 10 by the equality (2). Note that in the present context we have chosen class intervals that are equal in magnitude. However, we could have chosen class intervals that are not of equal magnitude. In that case, the classes would have been of unequal width.
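The arithmetic behind equality (2) can be tried out in a few lines of Python. This is only an illustrative sketch: the function names class_plan and within_rule_of_thumb are our own and do not come from the chapter, and the sketch assumes that all class intervals are equal.

```python
import math


def class_plan(minimum, maximum, class_interval):
    """Return (range, number of classes) for equal class intervals.

    Hypothetical helper that applies equality (2):
    Range = Number of Classes x Class Interval.
    """
    value_range = maximum - minimum
    # Round up so that the classes always cover the whole range.
    number_of_classes = math.ceil(value_range / class_interval)
    return value_range, number_of_classes


def within_rule_of_thumb(number_of_classes):
    """Check the rule of thumb of 5 to 15 classes mentioned in the text."""
    return 5 <= number_of_classes <= 15


# Marks example: minimum 0, maximum 100, class interval 10.
print(class_plan(0, 100, 10))      # (100, 10)
print(within_rule_of_thumb(10))    # True

# A class interval of 50 leaves only 2 classes, which is too few.
print(class_plan(0, 100, 50))      # (100, 2)
print(within_rule_of_thumb(2))     # False
```

Changing the class interval in the call shows directly how the two decisions are linked: once the range is known, fixing one of them fixes the other.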
How should we determine the class limits?

When we classify raw data of a continuous variable as a frequency distribution, we in effect group the individual observations into classes. The value of the upper class limit of a class is obtained by adding the class interval to the value of the lower class limit of that class. For example, the upper class limit of the class 20–30 is 20 + 10 = 30 where 20 is the lower class limit and 10 is the class interval. This method is repeated for other classes as well.

But how do we decide the lower class limit of the first class? That is to say, why is 0 the lower class limit of the first class: 0–10? It is because we chose the minimum value of the variable as the lower limit of the first class. In fact, we could have chosen a value less than the minimum value of the variable as the lower limit of the first class. Similarly, for the upper class limit of the last class we could have chosen a value greater than the maximum value of the variable. It is important to note that, when a frequency distribution is being constructed, the class limits should be so chosen that the mid-point or class mark of each class coincides, as far as possible, with any value around which the data tend to be concentrated.

In our example on marks of 100 students, we chose 0 as the lower limit of the first class: 0–10 because the minimum marks were 0. And that is why we could not have chosen 1 as the lower class limit of that class. Had we done that we would have excluded the observation 0. The upper class limit of the first class: 0–10 is then obtained by adding the class interval to the lower class limit of the class. Thus the upper class limit of the first class becomes 0 + 10 = 10. And this procedure is followed for the other classes as well.

Have you noticed that the upper class limit of the first class is equal to the lower class limit of the second class? And both are equal to 10. This is observed for other classes as well. Why? The reason is that we have used the Exclusive Method of classification of raw data. Under the method we form classes in such a way that the lower limit of a class coincides with the upper class limit of the previous class.

The problem we would face next is how to classify an observation that is not only equal to the upper class limit of a particular class but is also equal to the lower class limit of the next class. For example, we find observation 30 to be equal to the upper class limit of the class 20–30 and it is equal to the lower class limit of class 30–40. Then, in which of the two classes: 20–30 or 30–40 should we put the observation 30? We can put it either in class 20–30 or in class 30–40. It is a dilemma that one commonly faces while classifying data in overlapping classes. This problem is solved by the rule of classification in the Exclusive Method.
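The way class limits are built up, and the way the Exclusive Method settles the dilemma, can be sketched in Python as follows. This is a rough illustration and not the textbook's own procedure in code: the function names are our own, and keeping the highest value (100) in the last class is an assumption taken from the way Table 3.6 places the mark 100 in the class 90–100.

```python
def class_limits(lowest, class_interval, number_of_classes):
    """Build (lower, upper) class limits; upper = lower + class interval."""
    limits = []
    lower = lowest
    for _ in range(number_of_classes):
        upper = lower + class_interval
        limits.append((lower, upper))
        lower = upper  # the next class starts where this one ends
    return limits


def exclusive_class(value, limits):
    """Return the class of `value` under the Exclusive Method.

    The lower class limit is included and the upper class limit is excluded.
    """
    last = len(limits) - 1
    for position, (lower, upper) in enumerate(limits):
        if lower <= value < upper:
            return (lower, upper)
        # Assumption: the highest value is kept in the last class.
        if position == last and value == upper:
            return (lower, upper)
    raise ValueError(f"{value} lies outside the class limits")


limits = class_limits(0, 10, 10)
print(limits[:3])                     # [(0, 10), (10, 20), (20, 30)]
print(exclusive_class(30, limits))    # (30, 40)
print(exclusive_class(100, limits))   # (90, 100)
```

The observation 30 therefore goes to the class 30–40 and not to 20–30, because only the lower limit of a class is treated as part of it.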
lower class limit of the next class. In this way the continuity of the data is maintained. That is why this method of classification is most suitable in case of data of a continuous variable. Under the method, the upper class limit is excluded but the lower class limit of a class is included in the interval. Thus an observation that is exactly equal to the upper class limit, according to the method, would not be included in that class but would be included in the next class. On the other hand, if it were equal to the lower class limit then it would be included in that class. In our example on marks of students, the observation 40, which occurs three times in the raw data of Table 3.1, is not included in the class: 30–40. It is included in the next class: 40–50. That is why we find the frequency corresponding to the class 30–40 to be 7 instead of 10.

There is another method of forming classes and it is known as the Inclusive Method of classification.

Inclusive Method

In comparison to the exclusive method, the Inclusive Method does not exclude the upper class limit in a class interval. It includes the upper class limit in the class. Thus both class limits are parts of the class interval.

For example, in the frequency distribution of Table 3.4 we include in the class: 800–899 those employees whose income is either Rs 800, or between Rs 800 and Rs 899, or Rs 899. If the income of an employee is exactly Rs 900 then he is put in the next class: 900–999.

TABLE 3.4

Income (Rs)     Number of employees
800–899               50
900–999              100
1000–1099            200
1100–1199            150
1200–1299             40
1300–1399             10
Total                550

Adjustment in Class Interval

A close observation of the Inclusive Method in Table 3.4 would show that though the variable “income” is a continuous variable, no such continuity is maintained when the classes are made. We find a “gap” or discontinuity between the upper limit of a class and the lower limit of the next class. For example, between the upper limit of the first class: 899 and the lower limit of the second class: 900, we find a “gap” of 1. Then how do we ensure the continuity of the variable while classifying data? This is achieved by making an adjustment in the class interval. The adjustment is done in the following way:

1. Find the difference between the lower limit of the second class and the upper limit of the first class. For example, in Table 3.4 the lower limit of the second class is 900 and
the upper limit of the first class is 899. The difference between them is 1 (900 – 899 = 1).
2. Divide the difference obtained in (1) by two (1/2 = 0.5).
3. Subtract the value obtained in (2) from lower limits of all classes (lower class limit – 0.5).
4. Add the value obtained in (2) to upper limits of all classes (upper class limit + 0.5).

After the adjustment that restores continuity of data in the frequency distribution, Table 3.4 is modified into Table 3.5.

TABLE 3.5

Income (Rs)       Number of employees
799.5–899.5             50
899.5–999.5            100
999.5–1099.5           200
1099.5–1199.5          150
1199.5–1299.5           40
1299.5–1399.5           10
Total                  550

After the adjustments in class limits, the equality (1) that determines the value of class mark would be modified as follows:

Adjusted Class Mark = (Adjusted Upper Class Limit + Adjusted Lower Class Limit)/2.

TABLE 3.6
Tally Marking of Marks of 100 Students in Mathematics

Class     Observations                                Tally Mark                   Frequency   Class Mark
0–10      0                                           /                                1            5
10–20     10, 14, 17, 12, 14, 12, 14, 14              ///// ///                        8           15
20–30     25, 25, 20, 22, 25, 28                      ///// /                          6           25
30–40     30, 37, 34, 39, 32, 30, 35                  ///// //                         7           35
40–50     47, 42, 49, 49, 45, 45, 47, 44, 40, 44,     ///// ///// ///// ///// /       21           45
          49, 46, 41, 40, 43, 48, 48, 49, 49, 40,
          41
50–60     59, 51, 53, 56, 55, 57, 55, 51, 50, 56,     ///// ///// ///// ///// ///     23           55
          59, 56, 59, 57, 59, 55, 56, 51, 55, 56,
          55, 50, 54
60–70     60, 64, 62, 66, 69, 64, 64, 60, 66, 69,     ///// ///// ///// ////          19           65
          62, 61, 66, 60, 65, 62, 65, 66, 65
70–80     70, 75, 70, 76, 70, 71                      ///// /                          6           75
80–90     82, 82, 82, 80, 85                          /////                            5           85
90–100    90, 100, 90, 90                             ////                             4           95
Total                                                                                100

How should we get the frequency for each class?

In simple terms, frequency of an observation means how many times that observation occurs in the raw data. In our Table 3.1, we observe that the value 40 occurs thrice; 0 and 10 occur only once; 49 occurs five times and so on. Thus the frequency of 40 is 3, 0 is 1, 10 is 1, 49 is 5 and so on. But when the data are grouped into
class.

Finding class frequency by tally marking

A tally (/) is put against a class for each student whose marks are included in that class. For example, if the marks obtained by a student are 57, we put a tally (/) against class 50–60. If the marks are 71, a tally is put against the class 70–80. If someone obtains 40 marks, a tally is put against the class 40–50. Table 3.6 shows the tally marking of marks of 100 students in mathematics from Table 3.1.

The counting of tallies is made easier when four of them are put as //// and the fifth tally is placed across them. Tallies are then counted as groups of five. So if there are 16 tallies in a class, we put them as three groups of five and one single tally (///// ///// ///// /) for the sake of convenience. Thus frequency in a class is equal to the number of tallies against that class.

Loss of Information

The classification of data as a frequency distribution has an inherent shortcoming. While it summarises the raw data making it concise and comprehensible, it does not show the details that are found in raw data. There is a loss of information

further statistical calculations. In Example 4, the class 20–30 contains 6 observations: 25, 25, 20, 22, 25 and 28. So when these data are grouped as a class 20–30 in the frequency distribution, the latter provides only the number of records in that class (i.e. frequency = 6) but not their actual values. All values in this class are assumed to be equal to the middle value of the class interval or class mark (i.e. 25). Further statistical calculations are based only on the values of class mark and not on the values of the observations in that class. This is true for other classes as well. Thus the use of class mark instead of the actual values of the observations in statistical methods involves considerable loss of information.

Frequency distribution with unequal classes

By now you are familiar with frequency distributions of equal class intervals. You know how they are constructed out of raw data. But in some cases frequency distributions with unequal class intervals are more appropriate. If you observe the frequency distribution of Example 4, as in Table 3.6, you will notice that most of the observations are concentrated in classes 40–50, 50–60 and 60–70. Their respective frequencies are 21, 23 and 19. It means that out of 100 observations, 63 (21+23+19) observations are concentrated in these classes. These classes are densely populated with observations. Thus, 63 percent of data lie between 40 and 70. The remaining 37 percent of data are in classes 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100. These classes are sparsely populated with observations. Further you will also notice that observations in these classes deviate more from their respective class marks than those in other classes. But if classes are to be formed in such a way that class marks coincide, as far as possible, with a value around which the observations in a class tend to concentrate, then in that case unequal class interval is more appropriate.

Table 3.7 shows the same frequency distribution of Table 3.6 in terms of unequal classes. Each of the classes 40–50, 50–60 and 60–70 is split into two classes. The class 40–50 is divided into 40–45 and 45–50. The class 50–60 is divided into 50–55 and 55–60. And class 60–70 is divided into 60–65 and 65–70. The new classes 40–45, 45–50, 50–55, 55–60, 60–65 and 65–70 have class interval of 5. The other classes: 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100 retain their old class interval of 10. The last column of this table shows the new values of class marks for these classes. Compare them with the old values of class marks in Table 3.6. Notice that the observations in these classes deviated more from their old class mark values than their new class mark values. Thus the new class mark values are more representative of the data in these classes than the old values.

TABLE 3.7
Frequency Distribution of Unequal Classes

Class     Observations                                        Frequency   Class Mark
0–10      0                                                       1            5
10–20     10, 14, 17, 12, 14, 12, 14, 14                          8           15
20–30     25, 25, 20, 22, 25, 28                                  6           25
30–40     30, 37, 34, 39, 32, 30, 35                              7           35
40–45     42, 44, 40, 44, 41, 40, 43, 40, 41                      9           42.5
45–50     47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49         12           47.5
50–55     51, 53, 51, 50, 51, 50, 54                              7           52.5
55–60     59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55,        16           57.5
          56, 55, 56, 55
60–65     60, 64, 62, 64, 64, 60, 62, 61, 60, 62                 10           62.5
65–70     66, 69, 66, 69, 66, 65, 65, 66, 65                      9           67.5
70–80     70, 75, 70, 76, 70, 71                                  6           75
80–90     82, 82, 82, 80, 85                                      5           85
90–100    90, 100, 90, 90                                         4           95
Total                                                           100
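The claim that the new class marks represent the data better can be checked with a short Python sketch. It uses only the 21 observations listed against the class 40–50 in Table 3.6. The function names, and the use of the average absolute deviation from the class mark as the yardstick, are our own assumptions made for illustration; the chapter itself does not prescribe a particular measure.

```python
from bisect import bisect_right

# The 21 observations listed against the class 40-50 in Table 3.6.
marks_40_to_50 = [47, 42, 49, 49, 45, 45, 47, 44, 40, 44,
                  49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41]


def class_index(value, edges):
    """Position of the class holding `value` (Exclusive Method)."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} lies outside the class limits")
    return bisect_right(edges, value) - 1


def frequencies(values, edges):
    """Count the observations falling in each class."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        counts[class_index(value, edges)] += 1
    return counts


def class_marks(edges):
    """Mid-point of each class, as in equality (1)."""
    return [(lower + upper) / 2 for lower, upper in zip(edges, edges[1:])]


def mean_deviation_from_marks(values, edges):
    """Average distance between each observation and its class mark."""
    marks = class_marks(edges)
    total = sum(abs(v - marks[class_index(v, edges)]) for v in values)
    return total / len(values)


one_class = [40, 50]          # the old class 40-50
two_classes = [40, 45, 50]    # the new classes 40-45 and 45-50

print(frequencies(marks_40_to_50, two_classes))   # [9, 12]
print(class_marks(two_classes))                   # [42.5, 47.5]
print(mean_deviation_from_marks(marks_40_to_50, one_class))
print(mean_deviation_from_marks(marks_40_to_50, two_classes))
```

The last two lines print the average deviation under the old and the new classes. The second figure comes out smaller, which is what the text means when it says the new class marks are more representative.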
Fig. 3.2: Frequency Curve

Activity
• If you compare Figure 3.2 with Figure 3.1, what do you observe? Do you find any difference between them? Can you explain the difference?

Frequency array

So far we have discussed the classification of data for a continuous variable using the example of percentage marks of 100 students in mathematics. For a discrete variable, the classification of its data is known as a Frequency Array. Since a discrete variable takes only integral values and not the fractional values between two integral values, we have frequencies that correspond to each of its integral values.

The example in Table 3.8 illustrates a Frequency Array.

TABLE 3.8

Size of the household     Number of households
1                               5
2                              15
3                              25
4                              35
5                              10
6                               5
7                               3
8                               2
Total                         100

The variable “size of the household” is a discrete variable that only takes integral values as shown in the table. Since it does not take any fractional value between two adjacent integral values, there are no classes in this frequency array. Since there are no classes in a frequency array there would be no class intervals. As the classes are absent in a discrete frequency distribution, there is no class mark as well.

6. BIVARIATE FREQUENCY DISTRIBUTION

The frequency distribution of a single variable is called a Univariate Distribution. The example 3.3 shows the univariate distribution of the single variable “marks of a student”. A Bivariate Frequency Distribution is the frequency distribution of two variables.

Table 3.9 shows the frequency distribution of two variables, sales (in Rs lakhs) and advertisement expenditure (in Rs thousands), of 20 companies. The values of sales are classed in different columns
TABLE 3.9
Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure (in Thousand Rs) of 20 Firms

115–125 125–135 135–145 145–155 155–165 165–175 Total
62–64 2 1 3
64–66 1 3 4
66–68 1 1 2 1 5
68–70 2 2 4
70–72 1 1 1 1 4
Total 4 5 6 3 1 1 20

and the values of advertisement expenditure are classed in different rows. Each cell shows the frequency of the corresponding row and column values. For example, there are 3 firms whose sales are between Rs 135–145 lakhs and their advertisement expenditures are between Rs 64–66 thousands. The use of a bivariate distribution would be taken up in Chapter 8 on correlation.

7. CONCLUSION

The data collected from primary and secondary sources are raw or unclassified. Once the data are collected, the next step is to classify them for further statistical analysis. Classification brings order in the data. The chapter enables you to know how data can be classified through a frequency distribution in a comprehensive manner. Once you know the techniques of classification, it will be easy for you to construct a frequency distribution, both for continuous and discrete variables.
Recap
• Classification brings order to raw data.
• A Frequency Distribution shows how the different values of a variable are distributed in different classes along with their corresponding class frequencies.
• The upper class limit is excluded but lower class limit is included in the Exclusive Method.
• Both the upper and the lower class limits are included in the Inclusive Method.
• In a Frequency Distribution, further statistical calculations are based only on the class mark values, instead of values of the observations.
• The classes should be formed in such a way that the class mark of each class comes as close as possible to a value around which the observations in a class tend to concentrate.
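The adjustment of inclusive class limits described earlier in the chapter (half of the gap is subtracted from every lower limit and added to every upper limit) can be sketched in Python as below. The function names are our own, and the class limits are the income classes discussed under the Inclusive Method; treat it as an illustration under those assumptions rather than a fixed recipe.

```python
def adjust_inclusive_classes(classes):
    """Close the gaps between inclusive classes.

    Half of the gap between the first two classes is subtracted from
    every lower limit and added to every upper limit.
    """
    gap = classes[1][0] - classes[0][1]   # e.g. 900 - 899 = 1
    half_gap = gap / 2                    # e.g. 1/2 = 0.5
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]


def class_mark(lower, upper):
    """Mid-point of a class."""
    return (lower + upper) / 2


income_classes = [(800, 899), (900, 999), (1000, 1099),
                  (1100, 1199), (1200, 1299), (1300, 1399)]

adjusted = adjust_inclusive_classes(income_classes)
print(adjusted[0])                # (799.5, 899.5)
print(adjusted[1])                # (899.5, 999.5)
print(class_mark(*adjusted[0]))   # 849.5
```

After the adjustment the upper limit of each class equals the lower limit of the next one, so the continuity of the variable is restored.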
EXERCISES
    (b) The product of upper class limit and the lower class limit.
    (c) The ratio of the upper class limit and the lower class limit.
    (d) None of the above.
(ii) The frequency distribution of two variables is known as
    (a) Univariate Distribution
    (b) Bivariate Distribution
    (c) Multivariate Distribution
    (d) None of the above
(iii) Statistical calculations in classified data are based on
    (a) the actual values of observations
    (b) the upper class limits
    (c) the lower class limits
    (d) the class midpoints
(iv) Under Exclusive method,
    (a) the upper class limit of a class is excluded in the class interval
    (b) the upper class limit of a class is included in the class interval
    (c) the lower class limit of a class is excluded in the class interval
    (d) the lower class limit of a class is included in the class interval
(v) Range is the
    (a) difference between the largest and the smallest observations
    (b) difference between the smallest and the largest observations
    (c) average of the largest and the smallest observations
    (d) ratio of the largest to the smallest observation
2. Can there be any advantage in classifying things? Explain with an example from your daily life.
3. What is a variable? Distinguish between a discrete and a continuous variable.
4. Explain the ‘exclusive’ and ‘inclusive’ methods used in classification of data.
5. Use the data in Table 3.2 that relate to monthly household expenditure (in Rs) on food of 50 households and
    (i) Obtain the range of monthly household expenditure on food.
    (ii) Divide the range into appropriate number of class intervals and obtain the frequency distribution of expenditure.
    (iii) Find the number of households whose monthly expenditure on food is
        (a) less than Rs 2000
        (b) more than Rs 3000

    1 3 2 2 2 2 1 2 1 2 2 3 3 3 3
    3 3 2 3 2 2 6 1 6 2 1 5 1 5 3
    2 4 2 7 4 2 4 3 4 2 0 3 1 4 3

7. What is ‘loss of information’ in classified data?
8. Do you agree that classified data is better than raw data?
9. Distinguish between univariate and bivariate frequency distribution.
10. Prepare a frequency distribution by inclusive method taking class interval of 7 from the following data:

    28 17 15 22 29 21 23 27 18 12  7  2  9  4
     1  8  3 10  5 20 16 12  8  4 33 27 21 15
     3 36 27 18  9  2  4  6 32 31 29 18 14 13
    15 11  9  7  1  5 37 32 28 26 24 20 19 25
    19 20  6  9

Suggested Activity
• From your old mark-sheets find the marks that you obtained in mathematics in the previous classes. Arrange them year-wise. Check whether the marks you have secured in the subject are a variable or not. Also see if, over the years, you have improved in mathematics.
CHAPTER 4
Presentation of Data

Studying this chapter should enable you to:
• present data using tables;
• represent data using appropriate diagrams.

1. INTRODUCTION

You have already learnt in previous chapters how data are collected and organised. As data are generally voluminous, they need to be put in a compact and presentable form. This chapter deals with presentation of data precisely so that the voluminous data collected can be made readily usable and easily comprehended. There are generally three forms of presentation of data:
• Textual or Descriptive presentation
• Tabular presentation
• Diagrammatic presentation.

2. TEXTUAL PRESENTATION OF DATA

In textual presentation, data are described within the text. When the quantity of data is not too large this form of presentation is more suitable. Look at the following cases:

Case 1
In a bandh call given on 08 September 2005 protesting the hike in prices of petrol and diesel, 5 petrol pumps were found open and 17 were closed whereas 2 schools were closed and remaining 9 schools were found open in a town of Bihar.
females against 53 crore males. 74 crore people resided in rural India and only 28 crore lived in towns or cities. While there were 62 crore non-worker population against 40 crore workers in the entire country, urban population had an even higher share of non-workers (19 crores) against the workers (9 crores) as compared to the rural population where there were 31 crore workers out of a 74 crore population....

In both the cases data have been presented only in the text. A serious drawback of this method of presentation is that one has to go through the complete text of presentation for comprehension but at the same time, it enables one to emphasise certain points of the presentation.

3. TABULAR PRESENTATION OF DATA

In a tabular presentation, data are presented in rows (read horizontally) and columns (read vertically). For example see Table 4.1 below tabulating information about literacy rates. It has

information that relates an attribute of gender ("male", "female" or total) with a number (literacy percentages of rural people, urban people and total). The most important advantage of tabulation is that it organises data for further statistical treatment and decision-making. Classification used in tabulation is of four kinds:
• Qualitative
• Quantitative
• Temporal and
• Spatial

Qualitative classification

When classification is done according to qualitative characteristics like social status, physical status, nationality, etc., it is called qualitative classification. For example, in Table 4.1 the characteristics for classification are sex and location which are qualitative in nature.

TABLE 4.1
Literacy in Bihar by sex and location (per cent)

Sex         Location              Total
            Rural      Urban
Male        57.70      80.80      60.32
Female      30.03      63.30      33.57
Total       44.42      72.71      47.53

Source: Census of India 2001, Provisional Population Totals.

Quantitative classification

In quantitative classification, the data are classified on the basis of
production, income, etc are quantitative characteristics. Classes are formed by assigning limits called class limits for the values of the characteristic under consideration. An example of quantitative classification is Table 4.2.

TABLE 4.2
Distribution of 542 respondents by their age in an election study in Bihar

Age group (yrs)     No. of respondents     Per cent
20–30                      3                 0.55
30–40                     61                11.25
40–50                    132                24.35
50–60                    153                28.24
60–70                    140                25.83
70–80                     51                 9.41
80–90                      2                 0.37
All                      542               100.00

Source: Assembly election Patna central constituency 2005, A.N. Sinha Institute of Social Studies, Patna.

Here classifying characteristic is age in years and is quantifiable.

Activities
• Construct a table presenting data on preferential liking of the students of your class for Star News, Zee News, BBC World, CNN, Aaj Tak and DD News.
• Prepare a table of
  (i) heights (in cm) and
  (ii) weights (in kg) of students of your class.

may be in hours, days, weeks, months, years, etc. For example, see Table 4.3.

TABLE 4.3
Yearly sales of a tea shop from 1995 to 2000

Years     Sale (Rs in lakhs)
1995            79.2
1996            81.3
1997            82.4
1998            80.5
1999           100.2
2000            91.2

Data Source: Unpublished data.

In this table the classifying characteristic is year and takes values in the scale of time.

Activity
• Go to your library and collect data on the number of books in economics the library had at the end of the year for the last ten years and present the data in a table.

Spatial classification

When classification is done in such a way that place becomes the classifying variable, it is called spatial classification. The place may be a village/town, block, district, state, country, etc.

Here the classifying characteristic is country of the world. Table 4.4 is an example of spatial classification.
Germany                 5.6
Other EU               14.7
UK                      5.7
Japan                   4.9
Russia                  2.1
Other East Europe       0.6
OPEC                   10.5
Asia                   19.0
Other LDCs              5.6
Others                  9.5
All                   100.0
(Total Exports: US $ 33658.5 million)

Activity
• Construct a table presenting data collected from students of your class according to their native states/residential locality.

4. TABULATION OF DATA AND PARTS OF A TABLE

To construct a table it is important to learn first what are the parts of a good statistical table. When put together in a systematically ordered manner these parts form a table. The simplest way of conceptualising a table may be data presented in rows and columns along with some explanatory notes. Tabulation can be done using one-way, two-way or three-way classification depending upon the number of characteristics involved. A good table should essentially have the following:

number that distinguishes one table from another. It is given at the top or at the beginning of the title of the table. Generally, table numbers are whole numbers in ascending order if there are many tables in a book. Subscripted numbers like 1.2, 3.1, etc. are also in use for identifying the table according to its location. For example, Table number 4.5 may read as fifth table of the fourth chapter and so on. (See Table 4.5)

(ii) Title
The title of a table narrates about the contents of the table. It has to be very clear, brief and carefully worded so that the interpretations made from the table are clear and free from any ambiguity. It finds place at the head of the table succeeding the table number or just below it. (See Table 4.5).

(iii) Captions or Column Headings
At the top of each column in a table a column designation is given to explain figures of the column. This is called caption or column heading. (See Table 4.5)

(iv) Stubs or Row Headings
Like a caption or column heading each row of the table has to be given a heading. The designations of the rows are also called stubs or stub items, and the complete left column is known as
stub column. A brief description of the row headings may also be given at the left hand top in the table. (See Table 4.5).

(v) Body of the Table
Body of a table is the main part and it contains the actual data. Location of any one figure/data in the table is fixed and determined by the row and column of the table. For example, data in the second row and fourth column indicate that 25 crore females in rural India were non-workers in 2001. (See Table 4.5).

(vi) Unit of Measurement
The unit of measurement of the figures in the table (actual data) should always be stated along with the title if the unit does not change throughout the table. If different units are there for rows or columns of the table, these units must be stated along with ‘stubs’ or ‘captions’. If figures are large, they should be rounded up and the method

Table Number        Title
     ↓                ↓
Table 4.5   Population of India according to workers and non-workers by gender and location
                                                                        (Crore)   ← Units

                         Column Headings/Captions
                                    ↓
Location    Gender      Workers                        Non-worker    Total
                        Main    Marginal    Total
Rural       Male         17        3         20            18          38
            Female        6        5         11            25          36
            Total        23        8         31            43          74
Urban       Male          7        1          8             7          15
            Female        1        0          1            12          13
            Total         8        1          9            19          28
All         Male         24        4         28            25          53
            Female        7        5         12            37          49
            Total        31        9         40            62         102
    ↑
Row Headings/stubs

Source : Census of India 2001                           ← Source note
Foot note : Figures are rounded to nearest crore        ← Footnote

(Note : Table 4.5 presents the same data in tabular form already presented through case 2 in textual presentation of data)
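The total columns and total rows of a table such as Table 4.5 follow from its basic cells, and this can be verified with a small Python sketch. It uses the urban figures of Table 4.5 (main workers, marginal workers and non-workers, in crore). The function names complete_row and build_block are our own, chosen only for this illustration.

```python
# (main workers, marginal workers, non-workers) in crore, urban rows of Table 4.5
urban_cells = {"Male": (7, 1, 7), "Female": (1, 0, 12)}


def complete_row(main, marginal, non_workers):
    """Fill in the 'Total workers' and 'Total' columns of one row."""
    workers = main + marginal
    return (main, marginal, workers, non_workers, workers + non_workers)


def build_block(cells):
    """Complete each gender row and add the 'Total' row for a location."""
    rows = {gender: complete_row(*values) for gender, values in cells.items()}
    # Column-wise sum of the gender rows gives the 'Total' row.
    rows["Total"] = tuple(sum(column) for column in zip(*rows.values()))
    return rows


urban = build_block(urban_cells)
for gender, row in urban.items():
    print(gender, row)
# Male (7, 1, 8, 7, 15)
# Female (1, 0, 1, 12, 13)
# Total (8, 1, 9, 19, 28)
```

The printed rows agree with the urban block of Table 4.5, which shows how the location of each figure is fixed by its row and column.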
(vii) Source Note
It is a brief statement or phrase indicating the source of data presented in the table. If more than one source is there, all the sources are to be written in the source note. Source note is generally written at the bottom of the table. (See Table 4.5).

(viii) Footnote
Footnote is the last part of the table. Footnote explains the specific feature of the data content of the table which is not self explanatory and has not been explained earlier.

Activities
• How many rows and columns are essentially required to form a table?
• Can the column/row headings of a table be quantitative?

5. DIAGRAMMATIC PRESENTATION OF DATA

This is the third method of presenting data. This method provides the quickest understanding of the actual situation to be explained by data in comparison to tabular or textual presentations. Diagrammatic presentation of data translates quite effectively the highly abstract ideas contained in numbers into more concrete and easily comprehensible form.

important ones are the following:
(i) Geometric diagram
(ii) Frequency diagram
(iii) Arithmetic line graph

Geometric Diagram

Bar diagram and pie diagram come in the category of geometric diagram for presentation of data. The bar diagrams are of three types: simple, multiple and component bar diagrams.

Bar Diagram

Simple Bar Diagram
Bar diagram comprises a group of equispaced and equiwidth rectangular bars for each class or category of data. Height or length of the bar reads the magnitude of data. The lower end of the bar touches the base line such that the height of a bar starts from the zero unit. Bars of a bar diagram can be visually compared by their relative height and accordingly data are comprehended quickly. Data for this can be of frequency or non-frequency type. In non-frequency type data a particular characteristic, say production, yield, population, etc. at various points of time or of different states are noted and corresponding bars are made of the respective heights according to the values of the characteristic to construct the diagram. The values of the characteristics (measured or counted)
Activity
• You had constructed a table presenting the data about the students of your class. Draw a bar diagram for the same table.

Different types of data may require different modes of diagrammatical representation. Bar diagrams are suitable both for frequency type and non-frequency type variables and attributes. Discrete variables like family size, spots on a dice, grades in an examination, etc. and attributes such as gender, religion, caste, country, etc. can be represented by bar diagrams. Bar diagrams are more convenient for non-frequency data such as income-

A category that has a longer bar (literacy of Kerala) than another category (literacy of West Bengal), has more of the measured (or enumerated) characteristics than the other. Bars (also called columns) are usually used in time series data (food grain produced between 1980–2000, decadal variation in work participation
TABLE 4.6
Literacy Rates of Major States of India

                              2001                       1991
Major Indian States     Person  Male  Female      Person  Male  Female
Andhra Pradesh (AP)      60.5   70.3   50.4        44.1   55.1   32.7
Assam (AS)               63.3   71.3   54.6        52.9   61.9   43.0
Bihar (BR)               47.0   59.7   33.1        37.5   51.4   22.0
Jharkhand (JH)           53.6   67.3   38.9        41.4   55.8   31.0
Gujarat (GJ)             69.1   79.7   57.8        61.3   73.1   48.6
Haryana (HR)             67.9   78.5   55.7        55.8   69.1   40.4
Karnataka (KA)           66.6   76.1   56.9        56.0   67.3   44.3
Kerala (KE)              90.9   94.2   87.7        89.8   93.6   86.2
Madhya Pradesh (MP)      63.7   76.1   50.3        44.7   58.5   29.4
Chhattisgarh (CH)        64.7   77.4   51.9        42.9   58.1   27.5
Maharashtra (MR)         76.9   86.0   67.0        64.9   76.6   52.3
Orissa (OR)              63.1   75.3   50.5        49.1   63.1   34.7
Punjab (PB)              69.7   75.2   63.4        58.5   65.7   50.4
Rajasthan (RJ)           60.4   75.7   43.9        38.6   55.0   20.4
Tamil Nadu (TN)          73.5   82.4   64.4        62.7   73.7   51.3
Uttar Pradesh (UP)       56.3   68.8   42.2        40.7   54.8   24.4
Uttaranchal (UT)         71.6   83.3   59.6        57.8   72.9   41.7
West Bengal (WB)         68.6   77.0   59.6        57.7   67.8   46.6
India                    64.8   75.3   53.7        52.2   64.1   39.3
Fig. 4.1: Bar diagram showing literacy rates (person) of major states of India, 2001.

rate, registered unemployed over the years, literacy rates, etc.) (Fig 4.2).

Bar diagrams can have different forms such as multiple bar diagram and component bar diagram.

Activities
• How many states (among the major states of India) had higher female literacy rate than the national average in 2001?
• Has the gap between maximum and minimum female literacy rates over the states in two consecutive census years 2001 and 1991 declined?

Multiple Bar Diagram

Multiple bar diagrams (Fig.4.2) are used for comparing two or more sets of data, for example income and expenditure or import and export for different years, marks obtained in different subjects in different classes, etc.

Component Bar Diagram

Component bar diagrams or charts (Fig.4.3), also called sub-diagrams, are very useful in comparing the sizes of different component parts (the elements or parts which a thing is made up of) and also for throwing light on the relationship among these integral parts. For example, sales proceeds from different products, expenditure pattern in a typical Indian family (components being food, rent, medicine, education, power, etc.), budget outlay for receipts and expenditures, components of labour force, population etc. Component bar diagrams are usually shaded or coloured suitably.
Fig. 4.2: Multiple bar (column) diagram showing female literacy rates over two census years 1991 and 2001 by major states of India.

Interpretation: It can be very easily derived from Figure 4.2 that female literacy rate over the years was on increase throughout the country. Similar other interpretations can be made from the figure like the state of Rajasthan experienced the sharpest rise in female literacy, etc.

TABLE 4.7
Enrolment by gender at schools (per cent) of children aged 6–14 years in a district of Bihar

Gender    Enrolled (per cent)    Out of school (per cent)
Boy             91.5                      8.5
Girl            58.6                     41.4
All             78.0                     22.0

Data Source: Unpublished data

A component bar diagram shows the bar and its sub-divisions into two or more components. For example, the bar might show the total population of children in the age-group of 6–14 years. The components show the proportion of those who are enrolled and those who are not. A component bar diagram might also contain different component bars for boys, girls and the total of children in the given age group range, as shown in Figure 4.3. To construct a component bar diagram, first of all, a bar is constructed on the x-axis with its height equivalent to the total value of the bar [for per cent data the bar height is of 100 units (Figure 4.3)]. Otherwise the height is equated to total value of the bar and proportional heights of the components are worked out using unitary method. Smaller components are given priority in parting the bar.

Pie Diagram

A pie diagram is also a component

Fig. 4.3: Enrolment at primary level in a district of Bihar (Component Bar Diagram)
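The unitary method for a component bar can be sketched in Python. This is only an illustration under stated assumptions: the function name `component_heights` and the default bar height of 100 units are choices made here, and the figures are those of Table 4.7.

```python
def component_heights(values, bar_height=100.0):
    """Split a bar of height `bar_height` in proportion to `values` (unitary method)."""
    total = sum(values)
    return [value / total * bar_height for value in values]


# Table 4.7: (enrolled, out of school) in per cent, so every bar is 100 units high.
table_4_7 = {"Boy": (91.5, 8.5), "Girl": (58.6, 41.4), "All": (78.0, 22.0)}

for gender, parts in table_4_7.items():
    enrolled, out_of_school = component_heights(parts)
    # The dividing line between the two components sits at the height of the first one.
    print(f"{gender}: enrolled from 0 to {enrolled:.1f}, "
          f"out of school from {enrolled:.1f} to {enrolled + out_of_school:.1f}")
```

For data that are not in per cent, the same function can be called with the total value of the bar as `bar_height`.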
TABLE 4.8
Distribution of Indian population by their working status (crore)

Status             Population    Per cent    Angular Component
Marginal Worker         9           8.8            32°
Main Worker            31          30.4           109°
Non-Worker             62          60.8           219°
All                   102         100.0           360°

is also called a pie chart. The circle is divided into as many parts as there are components by drawing straight lines from the centre to the circumference.

Pie charts usually are not drawn with absolute values of a category. The values of each category are first expressed as percentage of the total value of all the categories. A circle in a pie chart, irrespective of its value of radius, is thought of having 100 equal parts of 3.6° (360°/100) each. To find out the angle, the component shall subtend at the centre of the circle, each percentage figure of every component is multiplied by 3.6°. An example of this conversion of percentages of components into angular components of the circle is shown in Table 4.8.

It may be interesting to note that data represented by a component bar diagram can also be represented equally well by a pie chart, the only requirement being that absolute values

Fig. 4.4: Pie diagram for different categories of Indian population according to working status 2001.

Activities
• Represent data presented through Figure 4.4 by a component bar diagram.
• Does the area of a pie have any bearing on total value of the data to be represented by the pie diagram?

Frequency Diagram

Data in the form of grouped frequency distributions are generally represented by frequency diagrams like histogram, frequency polygon, frequency curve and ogive.
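The conversion of category values into percentages and then into angles of a pie diagram can be sketched as below. The function name `pie_angles` is an illustrative choice; the population figures are those of Table 4.8.

```python
def pie_angles(values):
    """Return {category: (per cent of total, angle in degrees)}."""
    total = sum(values.values())
    result = {}
    for name, value in values.items():
        percent = value / total * 100             # share of the total value
        result[name] = (percent, percent * 3.6)   # each per cent is 3.6 degrees
    return result


# Population by working status, in crore (Table 4.8).
population = {"Marginal Worker": 9, "Main Worker": 31, "Non-Worker": 62}

for name, (percent, angle) in pie_angles(population).items():
    print(f"{name}: {percent:.1f} per cent, {angle:.0f} degrees")
```

Rounded to the nearest degree, the angles agree with the last column of Table 4.8 and add up to 360°.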
boundaries (along X-axis) and with areas proportional to the class frequency (Fig.4.5). If the class intervals are of equal width, which they generally are, the area of the rectangles are proportional to their respective frequencies. However, in some type of data, it is convenient, at times necessary, to use varying width of class intervals. For example, when tabulating deaths by age at death, it would be very meaningful as well as useful too to have very short age intervals (0, 1, 2, ..., yrs/ 0, 7, 28, ..., days) at the beginning when death rates are very high compared to deaths at most other higher age segments of the population. For graphical representation of such data, height for area of a rectangle is the quotient of height (here frequency) and base (here width of the class interval). When intervals are equal, that is, when all rectangles have the same base, area can conveniently be represented by the frequency of any interval for purposes of comparison. When bases vary in their width, the heights of rectangles are to be adjusted to yield comparable measurements. The answer in such a situation is frequency density (class frequency divided by width of the class interval) instead of absolute frequency.

(Rs)        earners (f)
45–49           2            2       85
50–54           3            5       83
55–59           5           10       80
60–64           3           13       75
65–69           6           19       72
70–74           7           26       66
75–79          12           38       59
80–84          13           51       47
85–89           9           60       34
90–94           7           67       25
95–99           6           73       18
100–104         4           77       12
105–109         2           79        8
110–114         3           82        6
115–119         3           85        3

Source: Unpublished data

Since histograms are rectangles, a line parallel to the base line and of the same magnitude is to be drawn at a vertical distance equal to frequency (or frequency density) of the class interval. A histogram is never drawn for a discrete variable/data. Since in an interval or ratio scale the lower class boundary of a class interval fuses with the upper class boundary of the previous interval, equal or unequal, the rectangles are all adjacent and there is no open space between two consecutive rectangles. If the classes are not continuous they are first converted into continuous classes as discussed in Chapter 3. Sometimes the common portion between two adjacent rectangles (Fig.4.6) is omitted giving a better impression of continuity. The resulting figure gives the impression of a double staircase.
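The idea of frequency density for unequal class intervals can be illustrated with a short Python sketch. The class boundaries and frequencies below are hypothetical figures chosen only for illustration; they do not come from any table in this chapter, and the function name is likewise an assumption.

```python
def frequency_density(classes):
    """classes: list of (lower boundary, upper boundary, frequency)."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical distribution with unequal class widths (not from the text).
classes = [(0, 10, 20), (10, 20, 30), (20, 40, 40), (40, 80, 40)]

for (lower, upper, freq), density in zip(classes, frequency_density(classes)):
    # Height of the rectangle = frequency density; width x height gives back the frequency.
    print(f"{lower}-{upper}: frequency {freq}, height of rectangle {density:.1f}")
```

The last two classes have the same frequency, yet the wider class gets the shorter rectangle, so that the areas, not the heights, remain proportional to the frequencies.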
The spacing and the width or the area of bars are all arbitrary. It is the height and not the width or the area of the bar that really matters. A single vertical line could have served the same purpose as a bar of same width. Moreover, in histogram no space is left in between two rectangles, but in a bar diagram some space must be left between consecutive bars (except in multiple bar or component bar diagram). Although the bars have the same width, the width of a bar is unimportant for the purpose of comparison. The width in a histogram is as important as its height. We can have a bar diagram both for discrete and

Fig. 4.5: Histogram for the distribution of 85 daily wage earners in a locality of a town.

coordinate of the dotted vertical line gives the mode.

Frequency Polygon

A frequency polygon is a plane bounded by straight lines, usually four or more lines. Frequency polygon is an alternative to histogram and is also derived from histogram itself. A frequency polygon can be fitted to a histogram for studying the shape of the curve. The simplest method of drawing a frequency polygon is to join the midpoints of the topside of the consecutive rectangles of the histogram. It leaves us with the two
ends away from the base line, denying the calculation of the area under the curve. The solution is to join the two end-points thus obtained to the base line at the mid-values of the two classes with zero frequency immediately at each end of the distribution. Broken lines or dots may join the two ends with the base line. Now the total area under the curve, like the area in the histogram, represents the total frequency or sample size.

Fig. 4.6: Frequency polygon drawn for the data given in Table 4.9

Frequency polygon is the most common method of presenting grouped frequency distribution. Both class boundaries and class-marks can be used along the X-axis, the distances between two consecutive class marks being proportional/equal to the width of the class intervals. Plotting of data becomes easier if the class-marks fall on the heavy lines of the graph paper. No matter whether class boundaries or midpoints are used in the X-axis, frequencies (as ordinates) are always plotted against the mid-point of class intervals. When all the points have been plotted in the graph, they are carefully joined by a series of short straight lines. Broken lines join midpoints of two intervals, one in the beginning and the other at the end, with the two ends of the plotted curve (Fig.4.6). When comparing two or more distributions plotted on the same axes, frequency polygon is likely to be more useful since the vertical and horizontal lines of two or more distributions may coincide in a histogram.

Frequency Curve

The frequency curve is obtained by drawing a smooth freehand curve passing through the points of the
frequency polygon as closely as possible. It may not necessarily pass through all the points of the frequency polygon but it passes through them as closely as possible (Fig. 4.7).

Fig. 4.7: Frequency curve for Table 4.9

Ogive

Ogive is also called cumulative frequency curve. As there are two types of cumulative frequencies, for example less than type and more than type, accordingly there are two ogives for any grouped frequency distribution data. Here in place of simple frequencies as in the case of frequency polygon, cumulative frequencies are plotted along y-axis against class limits of the frequency distribution. For less than ogive the cumulative frequencies are plotted against the respective upper limits of the class intervals whereas for more than ogives the cumulative frequencies are plotted against the respective lower limits of the class interval. An interesting feature of the two ogives together is that their intersection point gives the median (Fig. 4.8 (b)) of the frequency distribution. As the shapes of the two ogives suggest, less than ogive is never decreasing and more than ogive is never increasing.

TABLE 4.10
Frequency distribution of marks obtained in mathematics

Marks      Number of     ‘Less than’     ‘More than’
x          students f    cumulative      cumulative
                         frequency       frequency
0–20           6              6              64
20–40          5             11              58
40–60         33             44              53
60–80         14             58              20
80–100         6             64               6
Total         64
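The two cumulative frequency columns of Table 4.10, and the points at which the two ogives are plotted, can be obtained as in the following Python sketch. The variable names are illustrative; `itertools.accumulate` from the standard library is used for the running totals.

```python
from itertools import accumulate

# Class limits and frequencies from Table 4.10.
lower_limits = [0, 20, 40, 60, 80]
upper_limits = [20, 40, 60, 80, 100]
frequencies = [6, 5, 33, 14, 6]

# 'Less than' type: running total from the first class downwards.
less_than = list(accumulate(frequencies))
# 'More than' type: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# 'Less than' ogive is plotted against upper limits, 'more than' against lower limits.
less_than_points = list(zip(upper_limits, less_than))
more_than_points = list(zip(lower_limits, more_than))

print(less_than_points)
print(more_than_points)
```

The first list never decreases and the second never increases, which is the shape of the two ogives described above.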
Fig. 4.8(a): 'Less than' and 'More than' ogive for data given in Table 4.10

Fig. 4.8(b): ‘Less than’ and ‘More than’ ogive for data given in Table 4.10

Activity
• Can the ogive be helpful in locating the partition values of the distribution it represents?

Arithmetic Line Graph

An arithmetic line graph is also called time series graph and is a method of diagrammatic presentation of data. In it, time (hour, day/date, week, month, year, etc.) is plotted along x-axis and the value of the variable (time series data) along y-axis. The line graph obtained by joining these plotted points is called an arithmetic line graph (time series graph). It helps in understanding the trend, periodicity, etc. in a long term time series data.
TABLE 4.11
Value of Exports and Imports of India
(Rs in 100 crores)

Year        Exports    Imports
1977–78        54         60
1978–79        57         68
1979–80        64         91
1980–81        67        125
1982–83        88        143
1983–84        98        158
1984–85       117        171
1985–86       109        197
1986–87       125        201
1987–88       157        222
1988–89       202        282
1989–90       277        353
1990–91       326        432
1991–92       440        479
1992–93       532        634
1993–94       698        731
1994–95       827        900
1995–96      1064       1227
1996–97      1186       1369
1997–98      1301       1542
1998–99      1416       1761

Here you can see from Fig. 4.9 that for the period 1978 to 1999, although the imports were more than the exports all through, the rate of acceleration went on increasing after 1988–89 and the gap between the two (imports and exports) was widened after 1995.

6. CONCLUSION

By now you must have been able to learn how collected data could be presented using various forms of presentation: textual, tabular and diagrammatic. You are now also able to make an appropriate choice of the form of data presentation as well as the type of diagram to be used for a given set of data. Thus you can make presentation of data meaningful, comprehensive and purposeful.

Fig. 4.9: Arithmetic line graph for time series data given in Table 4.11 (X-axis: Year, 1978 to 1999; Y-axis: Values in Rs 100 crores, 0 to 2000; lines: Exports and Imports; Scale: 1cm=200 crores on Y-axis)
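The reading of Fig. 4.9 can be checked against the figures of Table 4.11 with a small Python sketch. It simply computes the gap (imports minus exports) for every year; the variable names are illustrative and the years are written with a plain hyphen.

```python
# (year, exports, imports) in Rs 100 crores, from Table 4.11.
trade = [
    ("1977-78", 54, 60), ("1978-79", 57, 68), ("1979-80", 64, 91),
    ("1980-81", 67, 125), ("1982-83", 88, 143), ("1983-84", 98, 158),
    ("1984-85", 117, 171), ("1985-86", 109, 197), ("1986-87", 125, 201),
    ("1987-88", 157, 222), ("1988-89", 202, 282), ("1989-90", 277, 353),
    ("1990-91", 326, 432), ("1991-92", 440, 479), ("1992-93", 532, 634),
    ("1993-94", 698, 731), ("1994-95", 827, 900), ("1995-96", 1064, 1227),
    ("1996-97", 1186, 1369), ("1997-98", 1301, 1542), ("1998-99", 1416, 1761),
]

# Gap between imports and exports for each year.
gaps = {year: imports - exports for year, exports, imports in trade}

for year, gap in gaps.items():
    print(year, gap)
```

The gap is positive in every year, and from 1995–96 onwards it is larger than in any earlier year of the table.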
Recap
• Data (even voluminous data) speak meaningfully through presentation.
• For small data (quantity) textual presentation serves the purpose better.
• For large quantity of data tabular presentation helps in accommodating any volume of data for one or more variables.
• Tabulated data can be presented through diagrams which enable quicker comprehension of the facts presented otherwise.

EXERCISES

Answer the following questions, 1 to 10, choosing the correct answer
1. Bar diagram is a
(i) one-dimensional diagram
(ii) two-dimensional diagram
(iii) diagram with no dimension
(iv) none of the above
2. Data represented through a histogram can help in finding graphically the
(i) mean
(ii) mode
(iii) median
(iv) all the above
3. Ogives can be helpful in locating graphically the
(i) mode
(ii) mean
(iii) median
(iv) none of the above
4. Data represented through arithmetic line graph help in understanding
(i) long term trend
(ii) cyclicity in data
(iii) seasonality in data
(iv) all the above
5. Width of bars in a bar diagram need not be equal (True/False).
6. Width of rectangles in a histogram should essentially be equal (True/False).
7. Histogram can only be formed with continuous classification of data (True/False).
10. Median of a frequency distribution cannot be known from the ogives. (True/False).
11. What kind of diagrams are more effective in representing the following?
(i) Monthly rainfall in a year
(ii) Composition of the population of Delhi by religion
(iii) Components of cost in a factory
12. Suppose you want to emphasise the increase in the share of urban non-workers and lower level of urbanisation in India as shown in Example 4.2. How would you do it in the tabular form?
13. How does the procedure of drawing a histogram differ when class intervals are unequal in comparison to equal class intervals in a frequency table?
14. The Indian Sugar Mills Association reported that, ‘Sugar production during the first fortnight of December 2001 was about 3,87,000 tonnes, as against 3,78,000 tonnes during the same fortnight last year (2000). The off-take of sugar from factories during the first fortnight of December 2001 was 2,83,000 tonnes for internal consumption and 41,000 tonnes for exports as against 1,54,000 tonnes for internal consumption and nil for exports during the same fortnight last season.’
(i) Present the data in tabular form.
(ii) Suppose you were to present these data in diagrammatic form which of the diagrams would you use and why?
(iii) Present these data diagrammatically.
15. The following table shows the estimated sectoral real growth rates (percentage change over the previous year) in GDP at factor cost.

Year          Agriculture and allied sectors    Industry    Services
(1)                        (2)                    (3)         (4)
1994–95                    5.0                     9.2         7.0
1995–96                   –0.9                    11.8        10.3
1996–97                    9.6                     6.0         7.1
1997–98                   –1.9                     5.9         9.0
1998–99                    7.2                     4.0         8.3
1999–2000                  0.8                     6.9         8.2

Represent the data as multiple time series graphs.
CHAPTER

Studying this chapter should enable you to:
• understand the need for summarising a set of data by one single number;
• recognise and distinguish between the different types of averages;
• learn to compute different types of averages;
• draw meaningful conclusions from a set of data;
• develop an understanding of which type of average would be most useful in a particular situation.

1. INTRODUCTION

In the previous chapter, you have read the tabular and graphic representation of the data. In this chapter, you will study the measures of central tendency which is a numerical method to explain the data in brief. You can see examples of summarising a large set of data in day to day life like average marks obtained by students of a class in a test, average rainfall in an area, average production in a factory, average income of persons living in a locality or working in a firm etc.

Baiju is a farmer. He grows food grains in his land in a village called Balapur in Buxar district of Bihar. The village consists of 50 small farmers. Baiju has 1 acre of land. You are interested in knowing the economic condition of small farmers of Balapur. You want to compare the economic
MEASURES OF CENTRAL TENDENCY 5 9
see if the land owned by Baiju is:
1. above average in ordinary sense (see the Arithmetic Mean below)
2. above the size of what half the farmers own (see the Median below)
3. above what most of the farmers own (see the Mode below)

In order to evaluate Baiju’s relative economic condition, you will have to summarise the whole set of data of land holdings of the farmers of Balapur. This can be done by use of central tendency, which summarises the data in a single value in such a way that this single value can represent the entire data. The measuring of central tendency is a way of summarising the data in the form of a typical or representative value.

There are several statistical measures of central tendency or “averages”. The three most commonly used averages are:
• Arithmetic Mean
• Median
• Mode

You should note that there are two more types of averages i.e. Geometric Mean and Harmonic Mean, which are suitable in certain situations. However, the present discussion will be limited to the three types of averages mentioned above.

The mean family income is obtained by adding up the incomes and dividing by the number of families.

$$\text{Rs } \frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \text{Rs } 1{,}547$$

It implies that on an average, a family earns Rs 1,547.

Arithmetic mean is the most commonly used measure of central tendency. It is defined as the sum of the values of all observations divided by the number of observations and is usually denoted by $\bar{X}$. In general, if there are N observations as $X_1, X_2, X_3, ..., X_N$, then the Arithmetic Mean is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + ... + X_N}{N} = \frac{\Sigma X}{N}$$

Where, $\Sigma X$ = sum of all observations and N = total number of observations.

How Arithmetic Mean is Calculated

The calculation of arithmetic mean can be studied under two broad categories:
1. Arithmetic Mean for Ungrouped Data.
2. Arithmetic Mean for Grouped Data.
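The definition of the arithmetic mean translates directly into a few lines of Python. The sketch below uses the six family incomes from the calculation above; the function name `arithmetic_mean` is an illustrative choice.

```python
def arithmetic_mean(observations):
    """Sum of all observations divided by the number of observations."""
    return sum(observations) / len(observations)


# Incomes (in Rs) of the six families.
incomes = [1600, 1500, 1400, 1525, 1625, 1630]

mean_income = arithmetic_mean(incomes)
print(f"Mean family income: Rs {mean_income:.0f}")
```

The sum of the incomes is Rs 9,280, so the mean is about Rs 1,546.67, which is reported as Rs 1,547 after rounding.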
Arithmetic mean by direct method is the sum of all observations in a series divided by the total number of observations.

Example 1

Calculate Arithmetic Mean from the data showing marks of students in a class in an economics test: 40, 50, 55, 78, 58.

$$\bar{X} = \frac{\Sigma X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = 56.2$$

The average marks of students in the economics test are 56.2.

Assumed Mean Method

If the number of observations in the data is more and/or figures are large, it is difficult to compute arithmetic

large number of observations as well as large numerical figures, you can use assumed mean method. Here you assume a particular figure in the data as the arithmetic mean on the basis of logic/experience. Then you may take deviations of the said assumed mean from each of the observation. You can, then, take the summation of these deviations and divide it by the number of observations in the data. The actual arithmetic mean is estimated by taking the sum of the assumed mean and the ratio of sum of deviations to number of observations. Symbolically,

Let, A = assumed mean
X = individual observations
N = total numbers of observations
d = deviation of assumed mean from individual observation, i.e. d = X – A
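The assumed mean method described in words above can be sketched in Python as follows. The marks are those of Example 1; the assumed mean of 55 and the function name are illustrative choices, not figures prescribed by the text.

```python
def assumed_mean_method(observations, assumed_mean):
    """Assumed mean plus the ratio of the sum of deviations to the number of observations."""
    deviations = [x - assumed_mean for x in observations]   # d = X - A
    return assumed_mean + sum(deviations) / len(observations)


# Marks from Example 1, with 55 taken as the assumed mean (an illustrative choice).
marks = [40, 50, 55, 78, 58]

print(round(assumed_mean_method(marks, 55), 1))
```

With A = 55 the deviations are –15, –5, 0, 23 and 3, which add up to 6, so the mean is 55 + 6/5 = 56.2, the same as by the direct method. Any other assumed mean gives the same answer.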
Then sum of all deviations is taken as $\Sigma d = \Sigma (X - A)$

Then find $\frac{\Sigma d}{N}$

Then add A and $\frac{\Sigma d}{N}$ to get $\bar{X}$

Therefore, $\bar{X} = A + \frac{\Sigma d}{N}$

You should remember that any value, whether existing in the data or not, can be taken as assumed mean. However, in order to simplify the calculation, centrally located value in the data can be selected as assumed mean.

Example 2

The following data shows the weekly income of 10 families.

Family:                   A     B     C     D     E      F    G     H      I     J
Weekly Income (in Rs):    850   700   100   750   5000   80   420   2500   400   360

Compute mean family income.

TABLE 5.1
Computation of Arithmetic Mean by Assumed Mean Method

Families    Income (X)    d = X – 850    d' = (X – 850)/10
A               850             0               0
B               700          –150             –15
C               100          –750             –75
D               750          –100             –10
E              5000         +4150            +415
F                80          –770             –77
G               420          –430             –43
H              2500         +1650            +165
I               400          –450             –45
J               360          –490             –49
              11160         +2660            +266

Arithmetic Mean using assumed mean method

$$\bar{X} = A + \frac{\Sigma d}{N} = 850 + \frac{2{,}660}{10} = \text{Rs } 1{,}116.$$

Thus, the average weekly income of a family by both methods is Rs 1,116. You can check this by using the direct method.

Step Deviation Method

The calculations can be further simplified by dividing all the deviations taken from assumed mean by the common factor ‘c’. The objective is to avoid large numerical figures, i.e., if d = X – A is very large, then find d'. This can be done as follows:

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula is given below:

$$\bar{X} = A + \frac{\Sigma d'}{N} \times c$$

Where d' = (X – A)/c, c = common factor, N = number of observations, A = Assumed mean.

Thus, you can calculate the arithmetic mean in the example 2, by the step deviation method,

$$\bar{X} = 850 + \frac{266}{10} \times 10 = \text{Rs } 1{,}116.$$

Calculation of arithmetic mean for Grouped data

Discrete Series

Direct Method

In case of discrete series, frequency against each of the observations is
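The step deviation method of Example 2 can be sketched in Python as below. The function name is an illustrative choice; the incomes, the assumed mean of 850 and the common factor of 10 are those used in Table 5.1.

```python
def step_deviation_mean(observations, assumed_mean, common_factor):
    """Mean = A + (sum of d' / N) * c, where d' = (X - A) / c."""
    step_deviations = [(x - assumed_mean) / common_factor for x in observations]
    return assumed_mean + sum(step_deviations) / len(observations) * common_factor


# Weekly incomes (in Rs) of the 10 families in Example 2.
incomes = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]

mean_income = step_deviation_mean(incomes, assumed_mean=850, common_factor=10)
print(f"Mean weekly income: Rs {mean_income:.0f}")
```

The step deviations add up to 266, as in the last column of Table 5.1, and the result agrees with the direct method (11160/10).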
$$\bar{X} = \frac{\Sigma fX}{\Sigma f}$$

Where, $\Sigma fX$ = sum of product of variables and frequencies.
$\Sigma f$ = sum of frequencies.

Example 3

Calculate mean farm size of cultivating households in a village for the following data.

Farm Size (in acres):             64   63   62   61   60   59
No. of Cultivating Households:     8   18   12    9    7    6

TABLE 5.2
Computation of Arithmetic Mean by Direct Method

Farm Size (X)    No. of cultivating    fX         d           fd
in acres         households (f)        (1 × 2)    (X – 62)    (2 × 4)
(1)              (2)                   (3)        (4)         (5)
64                    8                  512        +2          +16
63                   18                 1134        +1          +18
62                   12                  744         0            0
61                    9                  549        –1           –9
60                    7                  420        –2          –14
59                    6                  354        –3          –18
                     60                 3713        –3           –7

Arithmetic mean using direct method,

$$\bar{X} = \frac{\Sigma fX}{\Sigma f} = \frac{3713}{60} = 61.88 \text{ acres}$$

Therefore, the mean farm size in a village is 61.88 acres.

earlier, with a simple modification. Since frequency (f) of each item is given here, we multiply each deviation (d) by the frequency to get fd. Then we get $\Sigma fd$. The next step is to get the total of all frequencies i.e. $\Sigma f$. Then find out $\Sigma fd / \Sigma f$. Finally the arithmetic mean is calculated by

$$\bar{X} = A + \frac{\Sigma fd}{\Sigma f}$$

using assumed mean method.

Step Deviation Method

In this case the deviations are divided by the common factor ‘c’ which simplifies the calculation. Here we estimate $d' = \frac{d}{c} = \frac{X - A}{c}$ in order to reduce the size of numerical figures for easier calculation. Then get fd' and $\Sigma fd'$. Finally the formula for step deviation method is given as,

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c$$

Activity
• Find the mean farm size for the data given in example 3, by using step deviation and assumed mean methods.

Continuous Series

Here, class intervals are given. The process of calculating arithmetic mean
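The direct and assumed mean calculations for a discrete series can be sketched in Python using the data of Example 3. The function names are illustrative, and the assumed mean of 62 is the value used for the deviations in Table 5.2.

```python
def mean_direct(values, frequencies):
    """Sum of fX divided by the sum of f."""
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    return sum_fx / sum(frequencies)


def mean_assumed(values, frequencies, assumed_mean):
    """A plus the sum of fd divided by the sum of f, where d = X - A."""
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, frequencies))
    return assumed_mean + sum_fd / sum(frequencies)


# Example 3: farm sizes (in acres) and the number of cultivating households.
farm_sizes = [64, 63, 62, 61, 60, 59]
households = [8, 18, 12, 9, 7, 6]

print(round(mean_direct(farm_sizes, households), 2))
print(round(mean_assumed(farm_sizes, households, 62), 2))
```

Both calls give the same mean farm size, with the sum of fX equal to 3713 and the sum of fd equal to –7 over 60 households.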
be exclusive or inclusive or of unequal size. Example of exclusive class interval is, say, 0–10, 10–20 and so on. Example of inclusive class interval is, say, 0–9, 10–19 and so on. Example of unequal class interval is, say, 0–20, 20–50 and so on. In all these cases, calculation of arithmetic mean is done in a similar way.

Example 4

Calculate average marks of the following students using (a) Direct method (b) Step deviation method.

Direct Method

Marks:              0–10   10–20   20–30   30–40   40–50   50–60   60–70
No. of Students:      5      12      15      25       8       3       2

TABLE 5.3
Computation of Average Marks for Exclusive Class Interval by Direct Method

Mark     No. of          mid value    fm           d' = (m – 35)/10    fd'
(x)      students (f)    (m)          (2) × (3)
(1)      (2)             (3)          (4)          (5)                 (6)
0–10          5              5           25            –3               –15
10–20        12             15          180            –2               –24
20–30        15             25          375            –1               –15
30–40        25             35          875             0                 0
40–50         8             45          360             1                 8
50–60         3             55          165             2                 6
60–70         2             65          130             3                 6
             70                        2110                             –34

method formula:

$$\bar{X} = \frac{\Sigma fm}{\Sigma f} = \frac{2110}{70} = 30.14 \text{ marks}$$

Step deviation method

1. Obtain $d' = \frac{m - A}{c}$
2. Take A = 35, (any arbitrary figure), c = common factor.

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c = 35 + \frac{(-34)}{70} \times 10 = 30.14 \text{ marks}$$

An interesting property of A.M.

It is interesting to know and useful for checking your calculation that the sum of deviations of items about arithmetic mean is always equal to zero. Symbolically, $\Sigma (X - \bar{X}) = 0$.

However, arithmetic mean is affected by extreme values. Any large value, on either end, can push it up or down.

Weighted Arithmetic Mean

Sometimes it is important to assign weights to various items according to their importance, when you calculate the arithmetic mean. For example, there are two commodities, mangoes and potatoes. You are interested in finding the average price of mangoes (p1) and potatoes (p2). The arithmetic
mean will be $\frac{p_1 + p_2}{2}$. However, you might want to give more importance to the rise in price of potatoes (p2). To do this, you may use as ‘weights’ the quantity of mangoes (q1) and the quantity of potatoes (q2). Now the arithmetic mean weighted by the quantities would be $\frac{q_1 p_1 + q_2 p_2}{q_1 + q_2}$.

In general the weighted arithmetic mean is given by,

$$\frac{w_1 x_1 + w_2 x_2 + ... + w_n x_n}{w_1 + w_2 + ... + w_n} = \frac{\Sigma wx}{\Sigma w}$$

When the prices rise, you may be interested in the rise in the price of the commodities that are more important to you. You will read more about it in the discussion of Index Numbers in Chapter 8.

Activities
• Check this property of the arithmetic mean for the following example:
X: 4 6 8 10 12
• In the above example if mean is increased by 2, then what happens to the individual observations, if all are equally affected.
• If first three items increase by 2, then what should be the values of the last two items, so that mean remains the same.
• Replace the value 12 by 96. What happens to the arithmetic mean. Comment.

3. MEDIAN

The arithmetic mean is affected by the presence of extreme values in the data. If you take a measure of central tendency which is based on middle position of the data, it is not affected by extreme items. Median is that positional value of the variable which divides the distribution into two equal parts, one part comprises all values greater than or equal to the median value and the other comprises all values less than or equal to it. The Median is the “middle” element when the data set is arranged in order of the magnitude.

Computation of median

The median can be easily computed by sorting the data from smallest to largest and counting the middle value.

Example 5

Suppose we have the following observation in a data set: 5, 7, 6, 1, 8, 10, 12, 4, and 3.

Arranging the data, in ascending order you have:
1, 3, 4, 5, 6, 7, 8, 10, 12.

The “middle score” is 6, so the median is 6. Half of the scores are larger than 6 and half of the scores are smaller.

If there are even numbers in the data, there will be two observations which fall in the middle. The median in this case is computed as the
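The sort-and-count rule for the median of ungrouped data can be sketched in Python as below. The function name is an illustrative choice; for an even number of observations the sketch assumes the usual convention of taking the mean of the two middle observations.

```python
def median_ungrouped(values):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Observations from Example 5.
data = [5, 7, 6, 1, 8, 10, 12, 4, 3]
print(median_ungrouped(data))
```

Replacing the largest observation by a much larger one leaves the result unchanged, which illustrates that the median is not affected by extreme items.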
MEASURES OF CENTRAL TENDENCY 6 5
Example 6
The following data provides marks of 20 students. You are required to calculate the median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.
Arranging the data in ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.
You can see that there are two observations in the middle, namely 45 and 46. The median can be obtained by taking the mean of the two observations:

$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$

In order to calculate the median it is important to know the position of the median, i.e. the item/items at which the median lies. The position of the median can be calculated by the following formula:

$$\text{Position of median} = \frac{(N+1)}{2}\text{th item}$$

where N = number of items.
You may note that the above formula gives you the position of the median in an ordered array, not the median itself. Median is computed by the formula:

$$\text{Median} = \text{size of } \frac{(N+1)}{2}\text{th item}$$

Discrete Series
In case of discrete series the position of median, i.e. the (N+1)/2th item, can be located through cumulative frequency. The corresponding value at this position is the value of median.

Example 7
The frequency distribution of the number of persons and their respective incomes (in Rs) is given below. Calculate the median income.
Income (in Rs):      10    20    30    40
Number of persons:    2     4    10     4
In order to calculate the median income, you may prepare the frequency distribution as given below.

TABLE 5.4
Computation of Median for Discrete Series
Income (in Rs)   No. of persons (f)   Cumulative frequency (cf)
10               2                    2
20               4                    6
30               10                   16
40               4                    20

The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily located through cumulative frequency. The 10.5th observation lies in the c.f. of 16. The income corresponding to this is Rs 30, so the median income is Rs 30.

Continuous Series
In case of continuous series you have to locate the median class where
N/2th item [not (N+1)/2th item] lies. The median can then be obtained as follows:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h$$

where,
L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = magnitude of the median class interval.
No adjustment is required if the class intervals are of unequal size or magnitude.

Example 8
The following data relates to daily wages of persons working in a factory. Compute the median daily wage.
Daily wages (in Rs):  55–60  50–55  45–50  40–45  35–40  30–35  25–30  20–25
Number of workers:        7     13     15     20     30     33     28     14
The data is arranged in ascending order here.

TABLE 5.5
Computation of Median for Continuous Series
Daily wages (in Rs)   No. of Workers (f)   Cumulative Frequency
20–25                 14                   14
25–30                 28                   42
30–35                 33                   75
35–40                 30                   105
40–45                 20                   125
45–50                 15                   140
50–55                 13                   153
55–60                 7                    160

In the above illustration the median class is the class containing the (N/2)th item, i.e. the (160/2) = 80th item of the series, which lies in the 35–40 class interval. Applying the formula of the median:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h = 35 + \frac{(80 - 75)}{30} \times (40 - 35) = \text{Rs } 35.83$$

Thus, the median daily wage is Rs 35.83. This means that 50% of the
workers are getting less than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.
You should remember that median, as a measure of central tendency, is not sensitive to all the values in the series. It concentrates on the values of the central items of the data.

Activities
• Find the mean and median for each of the four series in Table 5.6. What do you observe?

TABLE 5.6
Mean and Median of different series
Series   X (Variable Values)   Mean   Median
A        1, 2, 3               ?      ?
B        1, 2, 30              ?      ?
C        1, 2, 300             ?      ?
D        1, 2, 3000            ?      ?

• Is median affected by extreme values? What are outliers?
• Is median a better method than mean?

Quartiles
Quartiles are the measures which divide the data into four equal parts, each portion containing an equal number of observations. Thus, there are three quartiles. The first Quartile (denoted by Q1) or lower quartile has 25% of the items of the distribution below it and 75% of the items greater than it. The second Quartile (denoted by Q2) or median has 50% of the items below it and 50% of the observations above it. The third Quartile (denoted by Q3) or upper Quartile has 75% of the items of the distribution below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which the central 50% of the data lies.

Percentiles
Percentiles divide the distribution into hundred equal parts, so you can get 99 dividing positions denoted by P1, P2, P3, ..., P99. P50 is the median value. If you have secured 82 percentile in a management entrance examination, it means that your position is below 18 percent of the total candidates who appeared in the examination. If a total of one lakh students appeared, where do you stand?

Calculation of Quartiles
The method for locating the Quartile is the same as that of the median in case of individual and discrete series. The values of Q1 and Q3 of an ordered series can be obtained by the following formulae, where N is the number of observations.

$$Q_1 = \text{size of } \frac{(N+1)}{4}\text{th item}$$

$$Q_3 = \text{size of } \frac{3(N+1)}{4}\text{th item}$$
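The positional rule used for the median and the quartiles of an ungrouped series can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that a fractional position (such as the 10.5th item) is resolved by moving the corresponding fraction of the way from the lower item to the next one.

```python
def value_at_position(ordered, position):
    """Return the item at a 1-based position in an ordered list.

    A fractional position such as 2.75 is resolved by linear interpolation:
    2nd item + 0.75 * (3rd item - 2nd item).
    """
    if position < 1 or position > len(ordered):
        raise ValueError("position lies outside the series")
    whole = int(position)              # rank of the lower neighbouring item
    fraction = position - whole        # how far to move towards the next item
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    return lower + fraction * (ordered[whole] - lower)


def median(values):
    """Size of the (N + 1)/2 th item of the ordered series."""
    ordered = sorted(values)
    return value_at_position(ordered, (len(ordered) + 1) / 2)


def quartile(values, k):
    """Size of the k(N + 1)/4 th item; k = 1 gives Q1 and k = 3 gives Q3."""
    ordered = sorted(values)
    return value_at_position(ordered, k * (len(ordered) + 1) / 4)


example_5 = [5, 7, 6, 1, 8, 10, 12, 4, 3]
example_6 = [25, 72, 28, 65, 29, 60, 30, 54, 32, 53,
             33, 52, 35, 51, 42, 48, 45, 47, 46, 33]

print(median(example_5))   # 6
print(median(example_6))   # 45.5
```

Sorting inside each function means the data need not be arranged beforehand. Calling quartile(values, 2) gives the same result as median(values), since 2(N+1)/4 = (N+1)/2.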
Example 9
Calculate the value of lower quartile from the data of the marks obtained by ten students in an examination.
22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Arranging the data in ascending order,
11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{size of } \frac{(N+1)}{4}\text{th item} = \text{size of } \frac{(10+1)}{4}\text{th item} = \text{size of 2.75th item}$$

= 2nd item + 0.75 (3rd item – 2nd item)
= 12 + 0.75 (14 – 12) = 13.5 marks.

Activity
• Find out Q3 yourself.

5. MODE
Sometimes, you may be interested in knowing the most typical value of a series or the value around which maximum concentration of items occurs. For example, a manufacturer would like to know the size of shoes that has maximum demand or the style of shirt that is more frequently demanded. Here, Mode is the most appropriate measure. The word mode has been derived from the French word "la Mode" which signifies the most fashionable values of a distribution, because it is repeated the highest number of times in the series.

Discrete Series
Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most frequently (twice) in the data.

Example 10
Look at the following discrete series:
Variable    10   20   30   40   50
Frequency    2    8   20   10    5
Here, as you can see, the maximum frequency is 20, so the value of mode is 30. In this case, as there is a unique value of mode, the data is unimodal. But the mode is not necessarily unique, unlike arithmetic mean and median. You can have data with two modes (bi-modal) or more than two modes (multi-modal). There may be no mode if no value appears more frequently than any other value in the distribution. For example, in a series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.

[Figure: Unimodal Data and Bimodal Data]

Continuous Series
In case of continuous frequency distribution, modal class is the class with largest frequency. Mode can be calculated by using the formula:

$$M_O = L + \frac{D_1}{D_1 + D_2} \times h$$

where L = lower limit of the modal class,
D1 = difference between the frequency of the modal class and the frequency of the class preceding the modal class (ignoring signs).
D2 = difference between the frequency of the modal class and the frequency of the class succeeding the modal class (ignoring signs).
h = class interval of the distribution.
You may note that in case of continuous series, class intervals should be equal and series should be exclusive to calculate the mode.

Example 11
Calculate the value of modal worker family's monthly income from the following data:
Income per month (in '000 Rs):  Below 50  Below 45  Below 40  Below 35  Below 30  Below 25  Below 20  Below 15
Number of families:                   97        95        90        80        60        30        12         4
As you can see this is a case of cumulative frequency distribution. In order to calculate mode, you will have to convert it into an exclusive series. In this example, the series is in the descending order. Grouping and Analysis table would be made to determine the modal class.

TABLE 5.7
Grouping Table
Income (in '000 Rs)   Group Frequency
                      I                 II     III    IV     V      VI
45–50                 97 – 95 = 2
40–45                 95 – 90 = 5       7             17
35–40                 90 – 80 = 10             15
30–35                 80 – 60 = 20      30                   35
25–30                 60 – 30 = 30             50                   60
20–25                 30 – 12 = 18      48            68
15–20                 12 – 4 = 8               26            56
10–15                 4                 12                          30

TABLE 5.8
Analysis Table
Columns   Class Intervals
          45–50   40–45   35–40   30–35   25–30   20–25   15–20   10–15
I                                           ×
II                                          ×       ×
III                                 ×       ×
IV                                  ×       ×       ×
V                                           ×       ×       ×
VI                          ×       ×       ×
Total       –       –       1       3       6       3       1       –

The value of the mode lies in the 25–30 class interval. By inspection also, it can be seen that this is the modal class.
Now L = 25, D1 = (30 – 18) = 12, D2 = (30 – 20) = 10, h = 5
Using the formula, you can obtain the value of the mode (in '000 Rs) as:

$$M_O = L + \frac{D_1}{D_1 + D_2} \times h = 25 + \frac{12}{12 + 10} \times 5 = 27.727, \text{ i.e. Rs } 27{,}727$$

Thus the modal worker family's monthly income is Rs 27,727.

Activities
• A shoe company, making shoes for adults only, wants to know the most popular size of shoes. Which average will be most appropriate for it?
• Take a small survey in your class to know the students' preference for Chinese food using an appropriate measure of central tendency.
• Can mode be located graphically?

6. RELATIVE POSITION OF ARITHMETIC MEAN, MEDIAN AND MODE
Suppose we express,
Arithmetic Mean = Me
Median = Mi
Mode = Mo
so that e, i and o are the suffixes. The relative magnitude of the three is Me > Mi > Mo or Me < Mi < Mo (suffixes occurring in alphabetical order). The median is always between the arithmetic mean and the mode.

7. CONCLUSION
Measures of central tendency or averages are used to summarise the data. An average specifies a single most representative value to describe the data set. Arithmetic mean is the most commonly used average. It is simple
describe the qualitative data. Median and mode can be easily computed
analysis and the nature of the distribution.

Recap
• The measure of central tendency summarises the data with a single value, which can represent the entire data.
• Arithmetic mean is defined as the sum of the values of all observations divided by the number of observations.
• The sum of deviations of items from the arithmetic mean is always equal to zero.
• Sometimes, it is important to assign weights to various items according to their importance.
• Median is the central value of the distribution in the sense that the number of values less than the median is equal to the number greater than the median.
• Quartiles divide the total set of values into four equal parts.
• Mode is the value which occurs most frequently.

EXERCISES
1. Which average would be suitable in the following cases?
(i) Average size of readymade garments.
(ii) Average intelligence of students in a class.
(iii) Average production in a factory per shift.
(iv) Average wages in an industrial concern.
(v) When the sum of absolute deviations from average is least.
(vi) When quantities of the variable are in ratios.
(vii) In case of open-ended frequency distribution.
2. Indicate the most appropriate alternative from the multiple choices provided against each question.
(i) The most suitable average for qualitative measurement is
(a) arithmetic mean
(b) median
(c) mode
(c) arithmetic mean
(d) geometric mean
(e) harmonic mean
(iii) The algebraic sum of deviation of a set of n values from A.M. is
(a) n
(b) 0
(c) 1
(d) none of the above
[Ans. (i) b (ii) c (iii) b]
3. Comment whether the following statements are true or false.
(i) The sum of deviation of items from median is zero.
(ii) An average alone is not enough to compare series.
(iii) Arithmetic mean is a positional value.
(iv) Upper quartile is the lowest value of top 25% of items.
(v) Median is unduly affected by extreme observations.
[Ans. (i) False (ii) True (iii) False (iv) True (v) False]
4. If the arithmetic mean of the data given below is 28, find (a) the missing frequency, and (b) the median of the series:
Profit per retail shop (in Rs):  0–10  10–20  20–30  30–40  40–50  50–60
Number of retail shops:            12     18     27      –     17      6
(Ans. The value of missing frequency is 20 and value of the median is Rs 27.41)
5. The following table gives the daily income of ten workers in a factory. Find the arithmetic mean.
Workers:               A    B    C    D    E    F    G    H    I    J
Daily Income (in Rs):  120  150  180  200  250  300  220  350  370  260
(Ans. Rs 240)
6. Following information pertains to the daily income of 150 families. Calculate the arithmetic mean.
Income (in Rs)    Number of families
More than 75      150
More than 85      140
More than 95      115
More than 105     95
More than 115     70
More than 125     60
More than 135     40
More than 145     25
(Ans. Rs 116.3)
7. The size of land holdings of 380 families in a village is given below. Find the median size of land holdings.
Size of Land Holdings (in acres):  Less than 100  100–200  200–300  300–400  400 and above
Number of families:                40             89       148      64       39
(Ans. 241.22 acres)
8. The following series relates to the daily income of workers employed in a firm. Compute (a) highest income of lowest 50% workers (b) minimum income earned by the top 25% workers and (c) maximum income earned by lowest 25% workers.
Daily Income (in Rs):  10–14  15–19  20–24  25–29  30–34  35–39
Number of workers:         5     10     15     20     10      5
(Hint: compute median, upper quartile and lower quartile.)
[Ans. (a) Rs 25.11 (b) Rs 29.19 (c) Rs 19.92]
9. The following table gives production yield in kg. per hectare of wheat of 150 farms in a village. Calculate the mean, median and mode production yield.
Production yield (kg. per hectare):  50–53  53–56  56–59  59–62  62–65  65–68  68–71  71–74  74–77
Number of farms:                         3      8     14     30     36     28     16     10      5
(Ans. mean = 63.82 kg. per hectare, median = 63.67 kg. per hectare, mode = 63.29 kg. per hectare)
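The continuous-series formulas of this chapter can be cross-checked against the printed answer to Exercise 9 with a short Python sketch. The function names are hypothetical. The sketch assumes that the classes are exclusive, listed in ascending order and, for the mode, of equal width with a single modal class that can be found by inspection (no grouping table is built).

```python
def grouped_mean(classes):
    """Arithmetic mean using class mid-points: sum(f * m) / sum(f)."""
    total = sum(f for _, _, f in classes)
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / total


def grouped_median(classes):
    """Median = L + (N/2 - c.f.) / f * h for the class holding the N/2 th item."""
    n = sum(f for _, _, f in classes)
    cumulative = 0                      # c.f. of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:     # the N/2 th item falls in this class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f


def grouped_mode(classes):
    """Mode = L + D1 / (D1 + D2) * h for the class with the largest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))         # modal class, located by inspection
    lower, upper, f = classes[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = abs(f - before), abs(f - after)
    return lower + d1 / (d1 + d2) * (upper - lower)


# Exercise 9: (lower limit, upper limit, number of farms)
yield_classes = [(50, 53, 3), (53, 56, 8), (56, 59, 14), (59, 62, 30),
                 (62, 65, 36), (65, 68, 28), (68, 71, 16), (71, 74, 10),
                 (74, 77, 5)]

print(round(grouped_mean(yield_classes), 2))    # 63.82
print(round(grouped_median(yield_classes), 2))  # 63.67
print(round(grouped_mode(yield_classes), 2))    # 63.29
```

The same grouped_median function reproduces the median daily wage of Rs 35.83 in Example 8 when the wage classes are entered in ascending order.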
CHAPTER
Measures of Dispersion
Studying this chapter should enable you to:
• know the limitations of averages;
• appreciate the need of measures of dispersion;
• enumerate various measures of dispersion;
• calculate the measures and compare them;
• distinguish between absolute and relative measures.

1. INTRODUCTION
In the previous chapter, you have studied how to sum up the data into a single representative value. However, that value does not reveal the variability present in the data. In this chapter you will study those measures, which seek to quantify variability of the data.
Three friends, Ram, Rahim and Maria are chatting over a cup of tea. During the course of their conversation, they start talking about their family incomes. Ram tells them that there are four members in his family and the average income per member is Rs 15,000. Rahim says that the average income is the same in his family, though the number of members is six. Maria says that there are five members in her family, out of which one is not working. She calculates that the average income in her family too, is Rs 15,000. They are a little surprised since they know that Maria's father is earning a huge salary. They go into details and gather the following data:
                 Ram      Rahim    Maria
Total income     60,000   90,000   75,000
Average income   15,000   15,000   15,000

Do you notice that although the average is the same, there are considerable differences in individual incomes?
It is quite obvious that averages try to tell only one aspect of a distribution, i.e. a representative size of the values. To understand it better, you need to know the spread of values also.
You can see that in Ram's family, differences in incomes are comparatively lower. In Rahim's family, differences are higher and in Maria's family they are the highest. Knowledge of only average is insufficient. If you have another value which reflects the quantum of
relative standards of living enjoyed by different strata of society.
Dispersion is the extent to which values in a distribution differ from the average of the distribution.
To quantify the extent of the variation, there are certain measures namely:
(i) Range
(ii) Quartile Deviation
(iii) Mean Deviation
(iv) Standard Deviation
Apart from these measures which give a numerical value, there is a graphic method for estimating dispersion.
Range and Quartile Deviation measure the dispersion by calculating the spread within which the values lie. Mean Deviation and Standard Deviation calculate the extent to which the values differ from the average.

2. MEASURES BASED UPON SPREAD OF VALUES
Range
Range (R) is the difference between the largest (L) and the smallest value (S) in a distribution. Thus,
R = L – S
Higher value of Range implies higher dispersion and vice-versa.
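As a minimal sketch of R = L – S, the Python lines below compare two series that share the same average but differ in spread. The income figures are hypothetical and chosen only for illustration; they are not the data gathered by the three friends.

```python
def value_range(values):
    """Range R = L - S: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical monthly incomes (in Rs); both series average Rs 15,000.
family_a = [12000, 14000, 16000, 18000]
family_b = [7000, 10000, 14000, 17000, 20000, 22000]

for name, incomes in (("A", family_a), ("B", family_b)):
    average = sum(incomes) / len(incomes)
    print(name, average, value_range(incomes))
```

Both series have the same average, yet the second has a Range of Rs 15,000 against Rs 6,000 for the first, which is the kind of difference an average alone cannot show.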
• If 50 is replaced by 150, what will be the Range?

Range: Comments
Range is unduly affected by extreme values. It is not based on all the values. As long as the minimum and maximum values remain unaltered, any change in other values does not affect range. It cannot be calculated for open-ended frequency distribution.
Notwithstanding some limitations, Range is understood and used frequently because of its simplicity. For example, we see the maximum and minimum temperatures of different cities almost daily on our TV screens and form judgments about the temperature variations in them.

Open-ended distributions are those in which either the lower limit of the lowest class or the upper limit of the highest class or both are not specified.

In such a situation, if the entire data is divided into four equal parts, each containing 25% of the values, we get the values of Quartiles and Median. (You have already read about these in Chapter 5).
The upper and lower quartiles (Q3 and Q1, respectively) are used to calculate Inter-Quartile Range which is Q3 – Q1.
Inter-Quartile Range is based upon the middle 50% of the values in a distribution and is, therefore, not affected by extreme values. Half of the Inter-Quartile Range is called Quartile Deviation. Thus:

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

Q.D. is therefore also called Semi-Inter Quartile Range.

Calculation of Range and Q.D. for ungrouped data
Example 1
Q1, the 3rd value is 29. [What will you do if these values are not in an order?]
Similarly, Q3 is the size of the 3(n+1)/4 th value, i.e. the 9th value, which is 51. Hence Q3 = 51.

$$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{51 - 29}{2} = 11$$

Do you notice that Q.D. is the average difference of the Quartiles from the median?

Activity
• Calculate the median and check whether the above statement is correct.

Calculation of Range and Q.D. for a frequency distribution.
Example 2
For the following distribution of marks scored by a class of 40 students, calculate the Range and Q.D.

TABLE 6.1
Class intervals (CI)   No. of students (f)
0–10                   5
10–20                  8
20–40                  16
40–60                  7
60–90                  4
                       40

Class-Intervals (CI)   Frequencies (f)   Cumulative Frequencies (c.f.)
0–10                   5                 5
10–20                  8                 13
20–40                  16                29
40–60                  7                 36
60–90                  4                 40
                       n = 40

Q1 is the size of the n/4 th value in a continuous series. Thus it is the size of the 10th value. The class containing the 10th value is 10–20. Hence Q1 lies in class 10–20. Now, to calculate the exact value of Q1, the following formula is used:

$$Q_1 = L + \frac{\frac{n}{4} - \text{c.f.}}{f} \times i$$

where L = 10 (lower limit of the relevant Quartile class),
c.f. = 5 (value of c.f. for the class preceding the Quartile class),
i = 10 (interval of the Quartile class), and
f = 8 (frequency of the Quartile class). Thus,

$$Q_1 = 10 + \frac{10 - 5}{8} \times 10 = 16.25$$

Similarly, Q3 is the size of the 3n/4 th value, i.e. the 30th value, which lies in class 40–60. Now using the formula for Q3, its value can be calculated as follows:

$$Q_3 = L + \frac{\frac{3n}{4} - \text{c.f.}}{f} \times i$$

$$Q_3 = 40 + \frac{30 - 29}{7} \times 20$$

$$Q_3 = 42.87$$

$$Q.D. = \frac{42.87 - 16.25}{2} = 13.31$$

In individual and discrete series, Q1 is the size of the (n+1)/4 th value, but in a continuous distribution, it is the size of the n/4 th value. Similarly, for Q3 and median also, n is used in place of n+1.

If the entire group is divided into two equal halves and the median calculated for each half, you will have the median of better students and the median of weak students. These medians differ from the median of the entire group by 13.31 on an average. Similarly, suppose you have data about incomes of people of a town. Median income of all people can be calculated. Now if all people are divided into two equal groups of rich and poor, medians of both groups can be calculated. Quartile Deviation will tell you the average difference between medians of these two groups belonging to rich and poor, from the median of the entire group.
Quartile Deviation can generally be calculated for open-ended distributions and is not unduly affected by extreme values.

3. MEASURES OF DISPERSION FROM AVERAGE
Recall that dispersion was defined as the extent to which values differ from their average. Range and Quartile Deviation do not attempt to calculate how far the values are from their average. Yet, by calculating the spread of values, they do give a good idea about the dispersion. Two measures which are based upon deviation of the values from their average are Mean Deviation and Standard Deviation.
Since the average is a central value, some deviations are positive and some are negative. If these are added as they are, the sum will not reveal anything. In fact, the sum of deviations from Arithmetic Mean is always zero. Look at the following two sets of values.
Set A : 5, 9, 16
Set B : 1, 9, 20
You can see that values in Set B are farther from the average and hence more dispersed than values in Set A. Calculate the deviations from Arithmetic Mean and sum them up. What do you notice? Repeat the same with Median. Can you comment upon the quantum of variation from the calculated values?
We shall now discuss them separately in detail.

Mean Deviation
Suppose a college is proposed for students of five towns A, B, C, D and E which lie in that order along a road. Distances of towns in kilometres from town A and number of students in these towns are given below:

Town   Distance from town A   No. of Students
A      0                      90
B      2                      150
C      6                      100
D      14                     200
E      18                     80
                              620

Now, if the college is situated in town A, 150 students from town B will have to travel 2 kilometres each (a total of 300 kilometres) to reach the college. The objective is to find a location so that the average distance travelled by students is minimum.
You may observe that the students will have to travel more, on an average, if the college is situated at town A or E. If on the other hand, it is somewhere in the middle, they are likely to travel less. The average distance travelled is calculated by
Mean Deviation.)

Activities
• Calculate the total distance to be travelled by students if the college is situated at town A, at town C, or town E and also if it is exactly half way between A and E.
• Decide where, in your opinion, the college should be established, if there is only one student in each town. Does it change your answer?

Calculation of Mean Deviation from Arithmetic Mean for ungrouped data.
Direct Method
Steps:
(i) The A.M. of the values is calculated.
(ii) The difference between each value and the A.M. is calculated. All differences are considered positive. These are denoted as |d|.
(iii) The A.M. of these differences (called deviations) is the Mean Deviation, i.e.

$$M.D. = \frac{\Sigma |d|}{n}$$

Example 3
Calculate the Mean Deviation of the following values: 2, 4, 7, 8 and 9.
X     |d|
2     4
4     2
7     1
8     2
9     3
      12

$$M.D.(\bar{x}) = \frac{12}{5} = 2.4$$

Assumed Mean Method
Mean Deviation can also be calculated by calculating deviations from an assumed mean. This method is adopted especially when the actual mean is a fractional number. (Take care that the assumed mean is close to the true mean).
For the values in example 3, suppose value 7 is taken as assumed mean, M.D. can be calculated as under:

Example 4
X     |d|
2     5
4     3
7     0
8     1
9     2
      11

In such cases, the following formula is used,

$$M.D.(\bar{x}) = \frac{\Sigma |d| + (\bar{x} - A\bar{x})(\Sigma f_B - \Sigma f_A)}{n}$$

where Σ|d| is the sum of absolute deviations taken from the assumed mean, x̄ is the actual mean, Ax̄ is the assumed mean, Σ fB is the number of values below the actual mean including the actual mean, and Σ fA is the number of values above the actual mean.
Substituting the values in the above formula:

$$M.D.(\bar{x}) = \frac{11 + (6 - 7)(2 - 3)}{5} = \frac{12}{5} = 2.4$$

Mean Deviation from median for ungrouped data.
Direct Method
Using the values in example 3, M.D. from the Median can be calculated as follows,
(i) Calculate the median which is 7.
(ii) Calculate the absolute deviations from median, denote them as |d|.
(iii) Find the average of these absolute deviations. It is the Mean Deviation.

Example 5
X     |d| = |X – Median|
2     5
4     3
7     0
8     1
9     2
      11
M.D. from Median is thus,

$$M.D.(\text{median}) = \frac{\Sigma |d|}{n} = \frac{11}{5} = 2.2$$

Short-cut method
To calculate Mean Deviation by short-cut method a value (A) is used to calculate the deviations and the following formula is applied.

$$M.D.(\text{Median}) = \frac{\Sigma |d| + (\text{Median} - A)(\Sigma f_B - \Sigma f_A)}{n}$$

where, A = the constant from which deviations are calculated. (Other notations are the same as given in the assumed mean method).

Mean Deviation from Mean for Continuous distribution

TABLE 6.2
Profits of companies (Rs in lakhs)   Number of Companies
Class-intervals                      frequencies
10–20                                5
20–30                                8
30–50                                16
50–70                                8
70–80                                3
                                     40

Steps:
(i) Calculate the mean of the distribution.
(ii) Calculate the absolute deviations |d| of the class midpoints from the mean.
(iii) Multiply each |d| value with its corresponding frequency to get f|d| values. Sum them up to get Σ f|d|.
(iv) Apply the following formula,

$$M.D.(\bar{x}) = \frac{\Sigma f|d|}{\Sigma f}$$

Mean Deviation of the distribution in Table 6.2 can be calculated as follows:

Example 6
C.I.     f     m.p.   |d|     f|d|
10–20    5     15     25.5    127.5
20–30    8     25     15.5    124.0
30–50    16    40     0.5     8.0
50–70    8     60     19.5    156.0
70–80    3     75     34.5    103.5
         40                   519.0

$$M.D.(\bar{x}) = \frac{\Sigma f|d|}{\Sigma f} = \frac{519}{40} = 12.975$$

Mean Deviation from Median

TABLE 6.3
Class intervals   Frequencies
20–30             5
30–40             10
40–60             20
60–80             9
80–90             6
                  50

The procedure to calculate Mean Deviation from the median is the same as it is in case of M.D. from Mean, except that deviations are to be taken from the median as given below:
Example 7
C.I.     f     m.p.   |d|    f|d|
20–30    5     25     25     125
30–40    10    35     15     150
40–60    20    50     0      0
60–80    9     70     20     180
80–90    6     85     35     210
         50                  665

(Here |d| is the absolute deviation of each class mid-point from the median, which is 50.)

$$M.D.(\text{Median}) = \frac{\Sigma f|d|}{\Sigma f} = \frac{665}{50} = 13.3$$

Mean Deviation: Comments
Mean Deviation is based on all values. A change in even one value will affect it. It is the least when calculated from the median, i.e. it will be higher if calculated from the mean. However it ignores the signs of deviations and cannot be calculated for open-ended distributions.

Standard Deviation
Standard Deviation is the positive square root of the mean of squared deviations from mean. So if there are five values x1, x2, x3, x4 and x5, first their mean is calculated. Then deviations of the values from mean are calculated. These deviations are then squared. The mean of these squared deviations is the variance. Positive square root of the variance is the standard deviation.
(Note that Standard Deviation is calculated on the basis of the mean only).

(i) Actual Mean Method
(ii) Assumed Mean Method
(iii) Direct Method
(iv) Step-Deviation Method

Actual Mean Method:
Suppose you have to calculate the standard deviation of the following values:
5, 10, 25, 30, 50

Example 8
X     d      d²
5     –19    361
10    –14    196
25    +1     1
30    +6     36
50    +26    676
      0      1270

Following formula is used:

$$\sigma = \sqrt{\frac{\Sigma d^2}{n}}$$

$$\sigma = \sqrt{\frac{1270}{5}} = \sqrt{254} = 15.937$$

Do you notice the value from which deviations have been calculated in the above example? Is it the Actual Mean?

Assumed Mean Method
For the same values, deviations may be calculated from any arbitrary value
Example 9
X     d      d²
5     –20    400
10    –15    225
25    0      0
30    +5     25
50    +25    625
      –5     1275

Formula for Standard Deviation

$$\sigma = \sqrt{\frac{\Sigma d^2}{n} - \left(\frac{\Sigma d}{n}\right)^2}$$

$$\sigma = \sqrt{\frac{1275}{5} - \left(\frac{-5}{5}\right)^2} = \sqrt{254} = 15.937$$

The sum of deviations from a value other than the actual mean is not equal to zero.

Direct Method
Standard Deviation can also be calculated from the values directly, i.e., without taking deviations, as shown below:

Example 10
X      x²
5      25
10     100
25     625
30     900
50     2500
120    4150

$$\sigma = \sqrt{\frac{\Sigma x^2}{n} - (\bar{x})^2}$$

$$\text{or } \sigma = \sqrt{\frac{4150}{5} - (24)^2}$$

$$\text{or } \sigma = \sqrt{254} = 15.937$$

Standard Deviation is not affected by the value of the constant from which deviations are calculated. The value of the constant does not figure in the standard deviation formula. Thus, Standard Deviation is Independent of Origin.

Step-deviation Method
If the values are divisible by a common factor, they can be so divided and standard deviation can be calculated from the resultant values as follows:

Example 11
Since all the five values are divisible by a common factor 5, we divide and get the following values:

x     x'    d       d²
5     1     –3.8    14.44
10    2     –2.8    7.84
25    5     +0.2    0.04
30    6     +1.2    1.44
50    10    +5.2    27.04
            0       50.80

(Steps in the calculation are same as in actual mean method).
The following formula is used to calculate standard deviation:

$$\sigma = \sqrt{\frac{\Sigma d^2}{n}} \times c$$
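The four methods for ungrouped data (actual mean, assumed mean, direct and step-deviation) should all lead to the same standard deviation for the values 5, 10, 25, 30 and 50. The Python sketch below checks this. The function names are hypothetical, and the step-deviation version is written in the form that divides deviations from an assumed value by a common factor c.

```python
from math import sqrt


def sd_actual_mean(values):
    """sigma = sqrt(sum(d^2) / n), with d taken from the actual mean."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)


def sd_assumed_mean(values, assumed):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d taken from an assumed mean."""
    n = len(values)
    d = [x - assumed for x in values]
    return sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)


def sd_direct(values):
    """sigma = sqrt(sum(x^2)/n - mean^2), without taking deviations."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


def sd_step_deviation(values, assumed, c):
    """Deviations are divided by the common factor c; the result is scaled back by c."""
    n = len(values)
    d = [(x - assumed) / c for x in values]
    return sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2) * c


values = [5, 10, 25, 30, 50]
print(round(sd_actual_mean(values), 3))             # 15.937
print(round(sd_assumed_mean(values, 25), 3))        # 15.937
print(round(sd_direct(values), 3))                  # 15.937
print(round(sd_step_deviation(values, 25, 5), 3))   # 15.937
```

Changing the assumed value from 25 to any other number leaves the result unchanged, which illustrates that Standard Deviation is independent of origin.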
$$\sigma = \sqrt{\frac{50.80}{5}} \times 5$$

$$\sigma = \sqrt{10.16} \times 5$$

$$\sigma = 15.937$$

Alternatively, instead of dividing the values by a common factor, the deviations can be divided by a common factor. Standard Deviation can be calculated as shown below:

Example 12
x     d      d'    d'²
5     –20    –4    16
10    –15    –3    9
25    0      0     0
30    +5     +1    1
50    +25    +5    25
             –1    51

Deviations have been calculated from an arbitrary value 25. Common factor of 5 has been used to divide deviations.

$$\sigma = \sqrt{\frac{\Sigma d'^2}{n} - \left(\frac{\Sigma d'}{n}\right)^2} \times c$$

$$\sigma = \sqrt{\frac{51}{5} - \left(\frac{-1}{5}\right)^2} \times 5$$

$$\sigma = \sqrt{10.16} \times 5 = 15.937$$

Standard Deviation in Continuous frequency distribution:
Like ungrouped data, S.D. can be calculated for grouped data by any of the following methods:
(i) Actual Mean Method
(ii) Assumed Mean Method
(iii) Step-Deviation Method

Actual Mean Method
For the values in Table 6.2, Standard Deviation can be calculated as follows:

Example 13
(1)      (2)   (3)   (4)     (5)      (6)       (7)
CI       f     m     fm      d        fd        fd²
10–20    5     15    75      –25.5    –127.5    3251.25
20–30    8     25    200     –15.5    –124.0    1922.00
30–50    16    40    640     –0.5     –8.0      4.00
50–70    8     60    480     +19.5    +156.0    3042.00
70–80    3     75    225     +34.5    +103.5    3570.75
         40          1620             0         11790.00

Following steps are required:
1. Calculate the mean of the distribution.

$$\bar{x} = \frac{\Sigma fm}{\Sigma f} = \frac{1620}{40} = 40.5$$

2. Calculate deviations of mid-values from the mean so that $d = m - \bar{x}$ (Col. 5)
3. Multiply the deviations with their
5. Apply the formula as under:

$$\sigma = \sqrt{\frac{\Sigma fd^2}{n}} = \sqrt{\frac{11790}{40}} = 17.168$$

Assumed Mean Method
For the values in example 13, standard deviation can be calculated by taking deviations from an assumed mean (say 40) as follows:

Example 14
(1)      (2)   (3)   (4)    (5)     (6)
CI       f     m     d      fd      fd²
10–20    5     15    –25    –125    3125
20–30    8     25    –15    –120    1800
30–50    16    40    0      0       0
50–70    8     60    +20    +160    3200
70–80    3     75    +35    +105    3675
         40                 +20     11800

The following steps are required:
1. Calculate mid-points of classes (Col. 3)
2. Calculate deviations of mid-points from an assumed mean such that d = m – Ax̄ (Col. 4). Assumed Mean = 40.
3. Multiply values of 'd' with corresponding frequencies to get 'fd' values (Col. 5). (Note that the total of this column is not zero since deviations have been taken from assumed mean).

$$\sigma = \sqrt{\frac{\Sigma fd^2}{n} - \left(\frac{\Sigma fd}{n}\right)^2}$$

$$\text{or } \sigma = \sqrt{\frac{11800}{40} - \left(\frac{20}{40}\right)^2}$$

$$\text{or } \sigma = \sqrt{294.75} = 17.168$$

Step-deviation Method
In case the values of deviations are divisible by a common factor, the calculations can be simplified by the step-deviation method as in the following example.

Example 15
(1)      (2)   (3)   (4)    (5)   (6)    (7)
CI       f     m     d      d'    fd'    fd'²
10–20    5     15    –25    –5    –25    125
20–30    8     25    –15    –3    –24    72
30–50    16    40    0      0     0      0
50–70    8     60    +20    +4    +32    128
70–80    3     75    +35    +7    +21    147
         40                       +4     472

Steps required:
1. Calculate class mid-points (Col. 3) and deviations from an arbitrarily chosen value, just like in the assumed mean method. In this example, deviations have been taken from the value 40. (Col. 4)
2. Divide the deviations by a common factor denoted as 'C'. C = 5 in the
86 STATISTICS FOR ECONOMICS
5. Sum up values in Col. 6 and Col. 7 to get Σfd' and Σfd'2 values.
6. Apply the following formula.

$$\sigma = \sqrt{\frac{\Sigma fd'^2}{\Sigma f} - \left(\frac{\Sigma fd'}{\Sigma f}\right)^2} \times c$$

or

$$\sigma = \sqrt{\frac{472}{40} - \left(\frac{4}{40}\right)^2} \times 5$$

or

$$\sigma = \sqrt{11.8 - 0.01} \times 5$$

or

$$\sigma = \sqrt{11.79} \times 5 = 17.168$$

Standard Deviation: Comments
Standard Deviation, the most widely used measure of dispersion, is based on all values. Therefore a change in even one value affects the value of standard deviation. It is independent of origin but not of scale. It is also useful in certain advanced statistical problems.

5. ABSOLUTE AND RELATIVE MEASURES OF DISPERSION
All the measures, described so far, are absolute measures of dispersion. They calculate a value which, at times, is difficult to interpret. For example, consider the following two data sets:

B, it is 30,000. The value of Range is much higher in Set B. Can you say that the variation in sales is higher for the departmental store? It can be easily observed that the highest value in Set A is double the smallest value, whereas for the Set B, it is only 30% higher. Thus absolute measures may give misleading ideas about the extent of variation, especially when the averages differ significantly.
Another weakness of absolute measures is that they give the answer in the units in which original values are expressed. Consequently, if the values are expressed in kilometers, the dispersion will also be in kilometers. However, if the same values are expressed in meters, an absolute measure will give the answer in meters and the value of dispersion will appear to be 1000 times larger.
To overcome these problems, relative measures of dispersion can be used. Each absolute measure has a relative counterpart. Thus, for Range, there is Coefficient of Range which is calculated as follows:

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

where L = Largest value
S = Smallest value
Similarly, for Quartile Deviation, it
For Mean Deviation, it is Coefficient of Mean Deviation.

$$\text{Coefficient of Mean Deviation} = \frac{M.D.(\bar{x})}{\bar{x}} \quad \text{or} \quad \frac{M.D.(\text{Median})}{\text{Median}}$$

Thus if Mean Deviation is calculated on the basis of the Mean, it is divided by the Mean. If Median is used to calculate Mean Deviation, it is divided by the Median.
For Standard Deviation, the relative measure is called Coefficient of Variation, calculated as below:

$$\text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$$

It is usually expressed in percentage terms and is the most commonly used relative measure of dispersion. Since relative measures are free from the units in which the values have been expressed, they can

value of dispersion. A graphical measure called Lorenz Curve is available for estimating dispersion. You may have heard of statements like ‘top 10% of the people of a country earn 50% of the national income while top 20% account for 80%’. An idea about income disparities is given by such figures. Lorenz Curve uses the information expressed in a cumulative manner to indicate the degree of variability. It is specially useful in comparing the variability of two or more distributions.
Given below are the monthly incomes of employees of a company.

TABLE 6.4
Incomes          Number of employees
0–5,000          5
5,000–10,000     10
10,000–20,000    18
20,000–40,000    10
40,000–50,000    7
Example 16
3. Express the grand totals of Col. 3 and 6 as 100, and convert the cumulative totals in these columns into percentages, as in Col. 4 and 7.
4. Now, on the graph paper, take the cumulative percentages of the variable (incomes) on Y-axis and cumulative percentages of frequencies (number of employees) on X-axis, as in figure 6.1. Thus each axis will have values from ‘0’ to ‘100’.
5. Draw a line joining Co-ordinate (0, 0) with (100, 100). This is called the line of equal distribution shown as line ‘OC’ in figure 6.1.
6. Plot the cumulative percentages of the variable with corresponding cumulative percentages of frequency. Join these points to get the curve OAC.

Studying the Lorenz Curve
OC is called the line of equal distribution, since it would imply a situation like, top 20% people earn 20% of total income and top 60% earn 60% of the total income. The farther the curve OAC from this line, the greater is the variability present in the distribution. If there are two or more curves, the one which is the farthest

8. CONCLUSION
Although Range is the simplest to calculate and understand, it is unduly affected by extreme values. QD is not affected by extreme values as it is based on only middle 50% of the data. However, it is more difficult to interpret. M.D. and S.D. both are based upon deviations of values from their average. M.D. calculates average of deviations from the average but ignores signs of deviations and therefore appears to be unmathematical. Standard Deviation attempts to calculate average deviation from mean. Like M.D., it is based on all values and is also applied in more advanced statistical problems. It is the most widely used measure of dispersion.
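The three routes to the standard deviation used in Examples 13 to 15 (actual mean, assumed mean, step deviation) can be checked against each other with a short sketch. The function names below are illustrative and not part of the text; the data are the mid-points and frequencies from those examples, with assumed mean 40 and common factor 5.

```python
from math import sqrt

# Grouped data from Examples 13 to 15: class mid-points (m) and frequencies (f).
mid_points = [15, 25, 40, 60, 75]
frequencies = [5, 8, 16, 8, 3]


def grouped_mean(m, f):
    """Arithmetic mean of grouped data: sum(f*m) / sum(f)."""
    return sum(fi * mi for fi, mi in zip(f, m)) / sum(f)


def sd_actual_mean(m, f):
    """Deviations taken from the actual mean: sqrt(sum(f*d^2) / n)."""
    n = sum(f)
    mean = grouped_mean(m, f)
    return sqrt(sum(fi * (mi - mean) ** 2 for fi, mi in zip(f, m)) / n)


def sd_assumed_mean(m, f, assumed_mean):
    """Deviations d = m - A: sqrt(sum(f*d^2)/n - (sum(f*d)/n)^2)."""
    n = sum(f)
    d = [mi - assumed_mean for mi in m]
    sum_fd = sum(fi * di for fi, di in zip(f, d))
    sum_fd2 = sum(fi * di ** 2 for fi, di in zip(f, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def sd_step_deviation(m, f, assumed_mean, common_factor):
    """Deviations d' = (m - A)/c; the result is scaled back up by c."""
    n = sum(f)
    d = [(mi - assumed_mean) / common_factor for mi in m]
    sum_fd = sum(fi * di for fi, di in zip(f, d))
    sum_fd2 = sum(fi * di ** 2 for fi, di in zip(f, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * common_factor


sd_1 = sd_actual_mean(mid_points, frequencies)
sd_2 = sd_assumed_mean(mid_points, frequencies, assumed_mean=40)
sd_3 = sd_step_deviation(mid_points, frequencies, assumed_mean=40, common_factor=5)

# Coefficient of Variation = Standard Deviation / Arithmetic Mean * 100
cv = sd_1 / grouped_mean(mid_points, frequencies) * 100

print(round(sd_1, 3), round(sd_2, 3), round(sd_3, 3), round(cv, 2))
```

All three functions should agree because the assumed mean and the common factor only change the origin and scale of the deviations, and the formulas undo both changes.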
Recap
• A measure of dispersion improves our understanding about the
behaviour of an economic variable.
• Range and Quartile Deviation are based upon the spread of values.
• M.D. and S.D. are based upon deviations of values from the average.
• Measures of dispersion could be Absolute or Relative.
• Absolute measures give the answer in the units in which data are expressed.
• Relative measures are free from these units, and consequently can be used to compare different variables.
• A graphic method, which estimates the dispersion from shape of a curve, is called Lorenz Curve.
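The points of a Lorenz Curve for the incomes in Table 6.4 can be sketched as below. The first two construction steps are not reproduced in this section, so the sketch assumes that Col. 3 holds running totals of class mid-points and Col. 6 holds running totals of frequencies; the variable and function names are illustrative only.

```python
from itertools import accumulate

# Table 6.4: income classes and number of employees in each class.
income_classes = [(0, 5000), (5000, 10000), (10000, 20000),
                  (20000, 40000), (40000, 50000)]
employees = [5, 10, 18, 10, 7]


def cumulative_percentages(values):
    """Running totals expressed as percentages of the grand total."""
    running = list(accumulate(values))
    grand_total = running[-1]
    return [100 * r / grand_total for r in running]


# Assumption: the variable is represented by class mid-points.
mid_points = [(low + high) / 2 for low, high in income_classes]

income_pct = cumulative_percentages(mid_points)    # plotted on the Y-axis
employee_pct = cumulative_percentages(employees)   # plotted on the X-axis

# Points of the curve, starting from the origin O and ending at C (100, 100).
lorenz_points = [(0.0, 0.0)] + list(zip(employee_pct, income_pct))

for x, y in lorenz_points:
    print(f"{x:6.1f}% of employees : {y:6.1f}% of income")
```

Every plotted point lies on or below the line of equal distribution; the wider the gap between the two, the greater the variability.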
EXERCISES
1. A measure of dispersion is a good supplement to the central value in understanding a frequency distribution. Comment.
2. Which measure of dispersion is the best and how?
3. Some measures of dispersion depend upon the spread of values whereas some calculate the variation of values from a central value. Do you agree?
4. In a town, 25% of the persons earned more than Rs 45,000 whereas 75% earned more than Rs 18,000. Calculate the absolute and relative values of dispersion.
5. The yield of wheat and rice per acre for 10 districts of a state is as under:
District  1   2   3   4   5   6   7   8   9   10
Wheat     12  10  15  19  21  16  18  9   25  10
Rice      22  29  12  23  18  15  12  34  18  12
Calculate for each crop,
(i) Range
(ii) Q.D.
(iii) Mean Deviation about Mean
(iv) Mean Deviation about Median
(v) Standard Deviation
(vi) Which crop has greater variation?
(vii) Compare the values of different measures for each crop.
6. In the previous question, calculate the relative measures of variation and indicate the value which, in your opinion, is more reliable.
7. A batsman is to be selected for a cricket team. The choice is between X and Y on the basis of their five previous scores which are:
X  25  85  40  80  120
Y  50  70  65  45  80
Which batsman should be selected if we want,
(i) a higher run getter, or
(ii) a more reliable batsman in the team?
8. To check the quality of two brands of lightbulbs, their life in burning hours was estimated as under for 100 bulbs of each brand.
Life (in hrs)   No. of bulbs
                Brand A   Brand B
0–50            15        2
50–100          20        8
100–150         18        60
150–200         25        25
200–250         22        5
Total           100       100
(i) Which brand gives higher life?
(ii) Which brand is more dependable?
9. Average daily wage of 50 workers of a factory was Rs 200 with a Standard Deviation of Rs 40. Each worker is given a raise of Rs 20. What is the new average daily wage and standard deviation? Have the wages become more or less uniform?
10. If in the previous question, each worker is given a hike of 10% in wages, how are the Mean and Standard Deviation values affected?
11. Calculate the Mean Deviation about Mean and Standard Deviation for the following distribution.
Classes    Frequencies
20–40      3
40–80      6
80–100     20
100–120    12
120–140    9
Total      50
12. The sum of 10 values is 100 and the sum of their squares is 1090. Find the Coefficient of Variation.
CHAPTER
7
Correlation

Studying this chapter should enable you to:
• understand the meaning of the term correlation;
• understand the nature of relationship between two variables;
• calculate the different measures of correlation;
• analyse the degree and direction of the relationships.

1. INTRODUCTION
In previous chapters you have learnt how to construct summary measures out of a mass of data and changes among similar variables. Now you will learn how to examine the relationship between two variables.
As the summer heat rises, hill stations are crowded with more and more visitors. Ice-cream sales become more brisk. Thus, the temperature is related to number of visitors and sale of ice-creams. Similarly, as the supply of tomatoes increases in your local mandi, its price drops. When the local harvest starts reaching the market, the price of tomatoes drops from a princely Rs 40 per kg to Rs 4 per kg or even less. Thus supply is related to price. Correlation analysis is a means for examining such relationships systematically. It deals with questions such as:
• Is there any relationship between two variables?
• If the value of one variable changes, does the value of the other also change?
• Do both the variables move in the same direction?
• How strong is the relationship?

2. TYPES OF RELATIONSHIP
Let us look at various types of relationship. The relation between movements in quantity demanded and the price of a commodity is an

effect interpretation. Others may be just coincidence. The relation between the arrival of migratory birds in a sanctuary and the birth rates in the locality cannot be given any cause and effect interpretation. The relationships are simple coincidence. The relationship between size of the shoes and money in your pocket is another such example. Even if such relationships exist, they are difficult to explain.
In another instance a third variable’s impact on two variables may give rise to a relation between the two variables. Brisk sale of ice-creams may be related to higher number of deaths due to drowning. The victims are not drowned due to eating of ice-creams. Rising temperature leads to brisk sale of ice-creams. Moreover, large number of people start going to swimming pools to beat the heat. This might have raised the number of deaths by drowning. Thus temperature is behind the high correlation between the sale of ice-creams and deaths due to drowning.

What Does Correlation Measure?
Correlation studies and measures the direction and intensity of relationship among variables. Correlation measures covariation, not causation. Correlation should never be
CORRELATION 93
value of one variable is found to change in one direction, the value of the other variable is found to change either in the same direction (i.e. positive change) or in the opposite direction (i.e. negative change), but in a definite way. For simplicity we assume here that the correlation, if it exists, is linear, i.e. the relative movement of the two variables can be represented by drawing a straight line on graph paper.

Types of Correlation
Correlation is commonly classified into negative and positive correlation. The correlation is said to be positive when the variables move together in the same direction. When the income rises, consumption also rises. When income falls, consumption also falls. Sale of ice-cream and temperature move in the same direction. The correlation is negative when they move in opposite directions. When the price of apples falls its demand increases. When the prices rise its demand decreases. When you spend more time in studying, chances of your failing decline. When you spend fewer hours in study, chances of your failing increase. These are instances of negative correlation. The variables move in opposite directions.

Karl Pearson’s coefficient of correlation and Spearman’s rank correlation.
A scatter diagram visually presents the nature of association without giving any specific numerical value. A numerical measure of linear relationship between two variables is given by Karl Pearson’s coefficient of correlation. A relationship is said to be linear if it can be represented by a straight line. Another measure is Spearman’s coefficient of correlation, which measures the linear association between ranks assigned to individual items according to their attributes. Attributes are those variables which cannot be numerically measured such as intelligence of people, physical appearance, honesty etc.

Scatter Diagram
A scatter diagram is a useful technique for visually examining the form of relationship, without calculating any numerical value. In this technique, the values of the two variables are plotted as points on a graph paper. The cluster of points, so plotted, is referred to as a scatter diagram. From a scatter diagram, one can get a fairly good idea of the nature of relationship. In a scatter diagram the degree of closeness of the scatter points and their overall direction enable us to examine the relationship. If all the points lie on a line, the correlation is perfect and is said to be unity. If the scatter points are widely dispersed around the line, the correlation is low. The correlation is said to be linear if the scatter points lie near a line or on a line.
Scatter diagrams spanning over Fig. 7.1 to Fig. 7.5 give us an idea of the relationship between two variables. Fig. 7.1 shows a scatter around an upward rising line indicating the movement of the variables in the same direction. When X rises Y will also rise. This is positive correlation. In Fig. 7.2 the points are found to be scattered around a downward sloping line. This time the variables move in opposite directions. When X rises Y falls and vice versa. This is negative correlation. In Fig. 7.3 there is no upward rising or downward sloping line around which the points are scattered. This is an example of no correlation. In Fig. 7.4 and Fig. 7.5 the points are no longer scattered around an upward rising or downward falling line. The points themselves are on the lines. This is referred to as perfect positive correlation and perfect negative correlation respectively.

Activity
• Collect data on height, weight and marks scored by students in your class in any two subjects in class X. Draw the scatter diagram of these variables taking two at a time. What type of relationship do you find?

Inspection of the scatter diagram gives an idea of the nature and intensity of the relationship.

Karl Pearson’s Coefficient of Correlation
This is also known as product moment correlation and simple correlation coefficient. It gives a precise numerical value of the degree of linear relationship between two variables X and Y. The linear relationship may be given by

$$Y = a + bX$$

This type of relation may be described by a straight line. The intercept that the line makes on the Y-axis is given by a and the slope of the line is given by b. It gives the change in the value of Y for a unit change in the value of X. On the other hand, if the relation cannot be represented by a straight line as in

$$Y = X^2$$

the value of the coefficient may be zero, as it is when the values of X are spread symmetrically around zero. It clearly shows that zero correlation need not mean absence of any type of relation between the two variables.
Let X1, X2, ..., XN be N values of X and Y1, Y2, ..., YN be the corresponding values of Y. In the subsequent presentations the subscripts indicating the unit are dropped for the sake of simplicity. The arithmetic means of X and Y are defined as

$$\bar{X} = \frac{\Sigma X}{N}; \quad \bar{Y} = \frac{\Sigma Y}{N}$$

and their variances are as follows

$$\sigma_x^2 = \frac{\Sigma (X - \bar{X})^2}{N} = \frac{\Sigma X^2}{N} - \bar{X}^2$$
Y respectively are the positive square roots of their variances. Covariance of X and Y is defined as

$$\text{Cov}(X, Y) = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\Sigma xy}{N}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are the deviations of the ith value of X and Y from their mean values respectively.
The sign of covariance between X and Y determines the sign of the correlation coefficient. The standard deviations are always positive. If the covariance is zero, the correlation coefficient is always zero. The product moment correlation or the Karl Pearson’s measure of correlation is given by

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$$ ...(1)

or

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma (X - \bar{X})^2}\,\sqrt{\Sigma (Y - \bar{Y})^2}}$$ ...(2)

or

$$r = \frac{\Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}\,\sqrt{\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}}}$$ ...(3)

or

$$r = \frac{N \Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{N \Sigma X^2 - (\Sigma X)^2}\,\sqrt{N \Sigma Y^2 - (\Sigma Y)^2}}$$ ...(4)

It means units of measurement are not part of r. r between height in feet and weight in kilograms, for instance, is 0.7.
• A negative value of r indicates an inverse relation. A change in one variable is associated with change in the other variable in the opposite direction. When price of a commodity rises, its demand falls. When the rate of interest rises the demand for funds also falls. It is because now funds have become costlier.
• If r is positive the two variables move in the same direction. When the price of coffee, a substitute of tea, rises the demand for tea also rises. Improvement in irrigation facilities is associated with higher yield. When temperature rises the sale of ice-creams becomes brisk.
• If r = 1 or r = –1 the correlation is perfect. The relation between them is exact.
• A high value of r indicates strong linear relationship. Its value is said to be high when it is close to +1 or –1.
• A low value of r indicates a weak linear relation. Its value is said to be low when it is close to zero.
• The value of the correlation coefficient lies between minus one and plus one, –1 ≤ r ≤ 1. If, in any exercise, the value of r is outside this range it indicates error in calculation.
• The value of r is unaffected by the change of origin and change of scale. Given two variables X and Y let us define two new variables.

$$U = \frac{X - A}{B}; \quad V = \frac{Y - C}{D}$$

where A and C are assumed means of X and Y respectively. B and D are common factors. Then

$$r_{xy} = r_{uv}$$

This property is used to calculate correlation coefficient in a highly simplified manner, as in the step deviation method.
As you have read in chapter 1, the statistical methods are no substitute for common sense. Here is another example, which highlights the need for understanding the data properly

correlation between the number of deaths and the number of doctors sent to the villages is found to be positive. Normally the health care facilities provided by the doctors are expected to reduce the number of deaths showing a negative correlation. This happened due to other reasons. The data relate to a specific time period. Many of the reported deaths could be terminal cases where the doctors could do little. Moreover, the benefit of the presence of doctors becomes visible after some time. It is also possible that the reported deaths are not due to the epidemic. A tsunami suddenly hits the state and death toll rises.
Let us illustrate the calculation of r by examining the relationship between years of schooling of the farmer and the annual yield per acre.

Example 1
No. of years of schooling of farmers    Annual yield per acre in ’000 (Rs)
0     4
2     4
4     6
6     10
8     10
10    8
12    7

Formula 1 needs the value of $\Sigma xy$, $\sigma_x$, $\sigma_y$
From Table 7.1 we get,

$$\Sigma xy = 42,$$

$$\sigma_x = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}} = \sqrt{\frac{112}{7}},$$

$$\sigma_y = \sqrt{\frac{\Sigma (Y - \bar{Y})^2}{N}} = \sqrt{\frac{38}{7}}$$

Substituting these values in formula (1)

$$r = \frac{42}{7 \sqrt{\frac{112}{7}} \sqrt{\frac{38}{7}}} = 0.644$$

The same value can be obtained from formula (2) also.

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma (X - \bar{X})^2}\,\sqrt{\Sigma (Y - \bar{Y})^2}}$$ ...(2)

$$r = \frac{42}{\sqrt{112}\,\sqrt{38}} = 0.644$$

Thus years of education of the farmers and annual yield per acre are positively correlated. The value of r is also large. It implies that the more years farmers invest in education, the higher the yield per acre tends to be. It underlines the importance of farmers’ education.
To use formula (3)

$$r = \frac{\Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}\,\sqrt{\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}}}$$ ...(3)

the values of the following expressions have to be calculated, i.e. $\Sigma XY$, $\Sigma X^2$, $\Sigma Y^2$.
Now apply formula (3) to get the value of r.
Let us now interpret different values of r. The correlation coefficient between marks secured in English and Statistics is, say, 0.1. It means that though the marks secured in the two subjects are positively correlated, the strength of the relationship is weak. Students with high marks in English may be getting relatively low marks in Statistics. Had the value of r been, say, 0.9, students with high marks in English would invariably get high marks in Statistics.
TABLE 7.1
Calculation of r between years of schooling of farmers and annual yield
Columns, in order: Years of Education (X); (X–X̄); (X–X̄)²; Annual yield per acre in ’000 Rs (Y); (Y–Ȳ); (Y–Ȳ)²; (X–X̄)(Y–Ȳ)
0 –6 36 4 –3 9 18
2 –4 16 4 –3 9 12
4 –2 4 6 –1 1 2
6 0 0 10 3 9 0
8 2 4 10 3 9 6
10 4 16 8 1 1 4
12 6 36 7 0 0 0
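The calculation in Table 7.1 can be reproduced with a short sketch that evaluates formula (2), based on deviations from the means, and formula (3), based on raw sums, for the same data. The function names are illustrative and not part of the text.

```python
from math import sqrt

# Example 1: years of schooling of farmers (X) and annual yield per acre
# in '000 Rs (Y).
schooling_years = [0, 2, 4, 6, 8, 10, 12]
annual_yield = [4, 4, 6, 10, 10, 8, 7]


def pearson_r_from_deviations(xs, ys):
    """Formula (2): deviations of X and Y from their arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / (sqrt(sum_x2) * sqrt(sum_y2))


def pearson_r_from_raw_sums(xs, ys):
    """Formula (3): only the sums of X, Y, XY, X^2 and Y^2 are needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


r_deviations = pearson_r_from_deviations(schooling_years, annual_yield)
r_raw_sums = pearson_r_from_raw_sums(schooling_years, annual_yield)
print(round(r_deviations, 3), round(r_raw_sums, 3))
```

Both functions are algebraically equivalent, so they should return the same value up to floating-point rounding.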
supply in the local mandi will be accompanied by lower price of vegetables. Had it been –0.1, large vegetable supply would be accompanied by a lower price, though not as low as the price when r is –0.9. The extent of price fall depends on the absolute value of r. Had it been zero there would have been no fall in price, even after large supplies in the market. This is also a possibility if the increase in supply is taken care of by a good transport network transferring it to other markets.

Activity
• Look at the following table. Calculate r between annual growth of national income at current price and the Gross Domestic Saving as percentage of GDP.

1993–94   17   23
1994–95   18   26
1995–96   17   27
1996–97   16   25
1997–98   12   25
1998–99   16   23
1999–00   11   25
2000–01   8    24
2001–02   10   23
Source: Economic Survey, (2004–05) Pg. 8,9

Step deviation method to calculate correlation coefficient.
When the values of the variables are large, the burden of calculation can be considerably reduced by using a property of r. It is that r is independent of change in origin and scale. This approach is also known as the step deviation method. It involves the transformation of the variables X and Y as follows:

$$U = \frac{X - A}{h}; \quad V = \frac{Y - B}{k}$$

where A and B are assumed means, h and k are common factors.
Then $r_{UV} = r_{XY}$
This can be illustrated with the exercise of analysing the correlation between price index and money supply.

Example 2
Price index (X)                  120    150    190    220    230
Money supply in Rs crores (Y)    1800   2000   2500   2700   3000

The simplification, using step deviation method, is illustrated below. Let A = 100; h = 10; B = 1700 and k = 100
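The claim that r is unaffected by a change of origin and scale can be checked numerically for Example 2. The sketch below uses the values A = 100, h = 10, B = 1700 and k = 100 chosen above; the helper names are illustrative only.

```python
from math import sqrt

# Example 2: price index (X) and money supply in Rs crores (Y).
price_index = [120, 150, 190, 220, 230]
money_supply = [1800, 2000, 2500, 2700, 3000]


def pearson_r(xs, ys):
    """Correlation coefficient from raw sums, as in formula (3)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


def step_deviations(values, assumed_mean, common_factor):
    """Transform each value to (value - assumed mean) / common factor."""
    return [(v - assumed_mean) / common_factor for v in values]


u = step_deviations(price_index, assumed_mean=100, common_factor=10)
v = step_deviations(money_supply, assumed_mean=1700, common_factor=100)

r_xy = pearson_r(price_index, money_supply)  # on the original, large values
r_uv = pearson_r(u, v)                       # on the small transformed values
print(u, v)
print(round(r_xy, 4), round(r_uv, 4))
```

The transformed series are small enough to work by hand, which is the whole point of the method; the two coefficients should coincide.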
deviation method

TABLE 7.3
U = (X – 100)/10    V = (Y – 1700)/100    U²     V²     UV
2                   1                     4      1      2
5                   3                     25     9      15
9                   8                     81     64     72
12                  10                    144    100    120
13                  13                    169    169    169

ΣU = 41; ΣV = 35; ΣU² = 423; ΣV² = 343; ΣUV = 378

Substituting these values in formula (3)

$$r = \frac{\Sigma UV - \frac{(\Sigma U)(\Sigma V)}{N}}{\sqrt{\Sigma U^2 - \frac{(\Sigma U)^2}{N}}\,\sqrt{\Sigma V^2 - \frac{(\Sigma V)^2}{N}}}$$ ...(3)

$$= \frac{378 - \frac{41 \times 35}{5}}{\sqrt{423 - \frac{(41)^2}{5}}\,\sqrt{343 - \frac{(35)^2}{5}}} = \frac{91}{\sqrt{86.8}\,\sqrt{98}} = 0.987$$

This strong positive correlation between price index and money supply is an important premise of monetary policy. When the money supply grows the price index also rises.

deviation method and see the simplification.

Spearman’s rank correlation
Spearman’s rank correlation was developed by the British psychologist C.E. Spearman. It is used when the variables cannot be measured meaningfully in the way that price, income, weight etc. can. Ranking may be more meaningful when the measurements of the variables are suspect. Consider the situation where we are required to calculate the correlation between height and weight of students in a remote village. Neither measuring rods nor weighing scales are available. The students can be easily ranked in terms of height and weight without using measuring rods and weighing scales.
There are also situations when you are required to quantify qualities such as fairness, honesty etc. Ranking may be a better alternative to quantification of qualities. Moreover, sometimes the correlation coefficient between two variables with extreme values may be quite different from the coefficient without the extreme values. Under these circumstances rank correlation provides a better alternative to simple correlation.
Rank correlation coefficient and simple correlation coefficient have the same interpretation. Its formula has been derived from simple correlation coefficient where individual values have been replaced by ranks. These ranks are used for the calculation of correlation. This coefficient provides a measure of linear association between ranks assigned to these units, not their values. It is the Product Moment Correlation between the ranks. Its formula is

$$r_k = 1 - \frac{6 \Sigma D^2}{n^3 - n}$$ ...(4)

where n is the number of observations and D the deviation of ranks assigned to a variable from those assigned to the other variable. When the ranks are repeated the formula is

$$r_k = 1 - \frac{6\left[\Sigma D^2 + \frac{(m_1^3 - m_1)}{12} + \frac{(m_2^3 - m_2)}{12} + \cdots\right]}{n(n^2 - 1)}$$

where $m_1, m_2, \ldots$ are the number of repetitions of ranks and $\frac{m_1^3 - m_1}{12}, \ldots$ their corresponding correction factors. This correction is needed for every repeated value of both variables. If three values are repeated, there will be a correction for each value. Every time $m_1$ indicates the number of times a value is repeated.
All the properties of the simple correlation coefficient are applicable here. Like the Pearsonian Coefficient of correlation it lies between 1 and –1. However, generally it is not as accurate as the ordinary method. This is due to the fact that all the information concerning the data is not utilised. The first differences of the values of the items in the series, arranged in order of magnitude, are almost never constant. Usually the data cluster around the central values with smaller differences in the middle of the array. If the first differences were constant then r and $r_k$ would give identical results. The first difference is the difference of consecutive values. Rank correlation is preferred to Pearsonian coefficient when extreme values are present. In general $r_k$ is less than or equal to r.
The calculation of rank correlation will be illustrated under three situations.
1. The ranks are given.
2. The ranks are not given. They have to be worked out from the data.
3. Ranks are repeated.

Case 1: When the ranks are given
Example 3
Five persons are assessed by three judges in a beauty contest. We have to find out which pair of judges has the nearest approach to common perception of beauty.

         Competitors
Judge    1   2   3   4   5
A        1   2   3   4   5
B        2   4   1   5   3
C        1   3   5   2   4

There are 3 pairs of judges necessitating calculation of rank correlation thrice. Formula (4) will be used:
A       B    D     D2
1       2    –1    1
2       4    –2    4
3       1    2     4
4       5    –1    1
5       3    2     4
Total              14

Substituting these values in formula (4)

$$r_s = 1 - \frac{6 \Sigma D^2}{n^3 - n}$$ ...(4)

$$= 1 - \frac{6 \times 14}{5^3 - 5} = 1 - \frac{84}{120} = 1 - 0.7 = 0.3$$

The rank correlation between A and C is calculated as follows:

A       C    D     D2
1       1    0     0
2       3    –1    1
3       5    –2    4
4       2    2     4
5       4    1     1
Total              10

Substituting these values in formula (4) the rank correlation is 0.5. Similarly, for judges B and C, ΣD² = 28, so the rank correlation between their rankings is 1 – 168/120 = –0.4. Thus, the perceptions of judges A and C are the closest. Judges B and C have very different tastes.

secured by 5 students in Economics and Statistics. Then the ranking has to be worked out and the rank correlation is to be calculated.

Student    Marks in Statistics (X)    Marks in Economics (Y)
A          85                         60
B          60                         48
C          55                         49
D          65                         50
E          75                         55

Student    Ranking in Statistics (Rx)    Ranking in Economics (RY)
A          1                             1
B          4                             5
C          5                             4
D          3                             3
E          2                             2

Once the ranking is complete formula (4) is used to calculate rank correlation.

Case 3: When the ranks are repeated
Example 5
The values of X and Y are given as
X  25  45  35  40  15  19  35  42
Y  55  60  30  35  40  42  36  48
In order to work out the rank correlation, the ranks of the values are worked out. Common ranks are given to the repeated items. The
assigned the rank next to the rank already assumed. The formula of Spearman’s rank correlation coefficient when the ranks are repeated is as follows

$$r_s = 1 - \frac{6\left[\Sigma D^2 + \frac{(m_1^3 - m_1)}{12} + \frac{(m_2^3 - m_2)}{12} + \cdots\right]}{n(n^2 - 1)}$$

where $m_1, m_2, \ldots$ are the number of repetitions of ranks and $\frac{m_1^3 - m_1}{12}, \ldots$ their corresponding correction factors.
X has the value 35 both at the 4th and 5th rank. Hence both are given the average rank, i.e.,

$$\frac{4 + 5}{2} = 4.5\text{th rank}$$

X     Y     Rank of X (R')    Rank of Y (R'')    Deviation in Ranking D = R'–R''    D2
25    55    6                 2                  4                                  16
45    80    1                 1                  0                                  0
35    30    4.5               8                  –3.5                               12.25
40    35    3                 7                  –4                                 16
15    40    8                 5                  3                                  9
19    42    7                 4                  3                                  9
35    36    4.5               6                  –1.5                               2.25
42    48    2                 3                  –1                                 1
Total                                                                               ΣD2 = 65.5

The necessary correction thus is

$$\frac{m^3 - m}{12} = \frac{2^3 - 2}{12} = 0.5$$

$$r_s = 1 - \frac{6\left[\Sigma D^2 + \frac{(m^3 - m)}{12}\right]}{n^3 - n}$$ ...(5)

Substituting the values of these expressions

$$r_s = 1 - \frac{6(65.5 + 0.5)}{8^3 - 8} = 1 - \frac{396}{504} = 1 - 0.786 = 0.214$$

Thus there is positive rank correlation between X and Y. Both X and Y move in the same direction. However, the relationship cannot be described as strong.

Activity
• Collect data on marks scored by 10 of your classmates in class IX and X examinations. Calculate the rank correlation coefficient between them. If your data do not have any repetition, repeat the exercise by taking a data set having repeated ranks. What are the circumstances in which rank correlation coefficient is preferred to simple correlation coefficient? If data are precisely measured will you still prefer rank correlation coefficient to simple correlation? When can you be indifferent to the choice? Discuss in class.

4. CONCLUSION
We have discussed some techniques for studying the relationship between
correlation such as Karl Pearson’s coefficient of correlation and Spearman’s rank correlation are strictly the measures of linear

correlation gives us an idea of the direction and intensity of change in a variable when the correlated variable changes.

Recap
• Correlation analysis studies the relation between two variables.
• Scatter diagrams give a visual presentation of the nature of relationship between two variables.
• Karl Pearson’s coefficient of correlation r measures numerically only linear relationship between two variables. r lies between –1 and 1.
• When the variables cannot be measured precisely Spearman’s rank correlation can be used to measure the linear relationship numerically.
• Repeated ranks need correction factors.
• Correlation does not mean causation. It only means covariation.
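Rank correlation, including the correction factor for repeated ranks, can be sketched as follows. The function names are illustrative and not part of the text; ranks are assigned from the largest value (rank 1) downwards, with tied values sharing the average rank, as in Example 5. The first call uses the ranks given by judges A and B in Example 3, the second the X and Y values of Example 5.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(values):
        first_position = ordered.index(value) + 1
        repetitions = ordered.count(value)
        rank_of[value] = first_position + (repetitions - 1) / 2
    return [rank_of[value] for value in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value that occurs m > 1 times."""
    counts = [values.count(value) for value in set(values)]
    return sum((m ** 3 - m) / 12 for m in counts if m > 1)


def rank_correlation(ranks_x, ranks_y, correction=0.0):
    """1 - 6 * (sum of D^2 + correction) / (n^3 - n)."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


def rank_correlation_from_values(xs, ys):
    """Work out the ranks first, then apply the corrected formula."""
    correction = tie_correction(xs) + tie_correction(ys)
    return rank_correlation(average_ranks(xs), average_ranks(ys), correction)


# Example 3: ranks given by judges A and B.
r_judges_ab = rank_correlation([1, 2, 3, 4, 5], [2, 4, 1, 5, 3])

# Example 5: the value 35 occurs twice in X, so one correction of 0.5 applies.
x_values = [25, 45, 35, 40, 15, 19, 35, 42]
y_values = [55, 60, 30, 35, 40, 42, 36, 48]
r_example_5 = rank_correlation_from_values(x_values, y_values)

print(round(r_judges_ab, 3), round(r_example_5, 3))
```

When no value is repeated the correction is zero and the second function reduces to the uncorrected formula.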
b EXERCISES
t o
1. The unit of correlation coefficient between height in feet and weight in
kgs is
(i) kg/feet
t
(ii) percentage
(iii) non-existent
o
2. The range of simple correlation coefficient is
n
(i) 0 to infinity
(ii) minus one to plus one
(iii) minus infinity to infinity
3. If rxy is positive the relation between X and Y is of the type
(i) When Y increases X increases
(ii) When Y decreases X increases
(iii) When Y increases X does not change
CORRELATION 105
d
5. Of the following three measures which can measure any type of relationship
(i) Karl Pearson’s coefficient of correlation
(ii) Spearman’s rank correlation
(iii) Scatter diagram
6. If precisely measured data are available the simple correlation coefficient is
(i) more accurate than rank correlation coefficient
(ii) less accurate than rank correlation coefficient
(iii) as accurate as the rank correlation coefficient
7. Why is r preferred to covariance as a measure of association?
8. Can r lie outside the –1 and 1 range depending on the type of data?
9. Does correlation imply causation?
10. When is rank correlation more precise than simple correlation coefficient?
11. Does zero correlation mean independence?
12. Can simple correlation coefficient measure any type of relationship?
13. Collect the price of five vegetables from your local market every day for a week. Calculate their correlation coefficients. Interpret the result.
14. Measure the height of your classmates. Ask them the height of their benchmate. Calculate the correlation coefficient of these two variables. Interpret the result.
15. List some variables where accurate measurement is difficult.
16. Interpret the values of r as 1, –1 and 0.
17. Why does rank correlation coefficient differ from Pearsonian correlation coefficient?
18. Calculate the correlation coefficient between the heights of fathers in inches (X) and their sons (Y)
X   65  66  57  67  68  69  70  72
Y   67  56  65  68  72  72  69  71
(Ans. r = 0.603)
19. Calculate the correlation coefficient between X and Y and comment on their relationship:
X   –3  –2  –1   1   2   3
Y    9   4   1   1   4   9
(Ans. r = 0)
Activity
• Use all the formulae discussed here to calculate r between India’s national income and export taking at least ten observations.
CHAPTER
Index Numbers

Studying this chapter should enable you to:
• understand the meaning of the term index number;
• become familiar with the use of some widely used index numbers;
• calculate an index number;
• appreciate its limitations.

1. INTRODUCTION
You have learnt in the previous chapters how summary measures can be obtained from a mass of data. Now you will learn how to obtain summary measures of change in a group of related variables.
Rabi goes to the market after a long gap. He finds that the prices of most commodities have changed. Some items have become costlier, while others have become cheaper. On his return from the market, he tells his father about the change in the price of each and every item he bought. It is bewildering to both. The industrial sector consists of many subsectors. Each of them is changing. The output of some subsectors is rising, while it is falling in others. The changes are not uniform. A description of the individual rates of change will be difficult to understand. Can a single figure summarise these changes? Look at the following cases:
Case 1
An industrial worker was earning a salary of Rs 1,000 in 1982. Today, he [...]
Case 2
You must be reading about the sensex in the newspapers. The sensex crossing 8000 points is, indeed, greeted with euphoria. When the sensex dipped 600 points recently, it eroded investors’ wealth by Rs 1,53,690 crores. What exactly is sensex?
Case 3
The government says inflation rate will not accelerate due to the rise in the price of petroleum products. How does one measure inflation?
These are a sample of questions you confront in your daily life. A study of the index number helps in analysing these questions.

2. WHAT IS AN INDEX NUMBER
An index number is a statistical device for measuring changes in the magnitude of a group of related variables. It represents the general trend of diverging ratios, from which it is calculated. It is a measure of the average change in a group of related variables over two different situations. The comparison may be between like categories such as persons, schools, hospitals etc. An index number also measures changes in the value of the variables such as prices of specified list of commodities, volume of [...]
Conventionally, index numbers are expressed in terms of percentage. Of the two periods, the period with which the comparison is to be made is known as the base period. The value in the base period is given the index number 100. If you want to know how much the price has changed in 2005 from the level in 1990, then 1990 becomes the base. The index number of any period is in proportion with it. Thus an index number of 250 indicates that the value is two and a half times that of the base period.
Price index numbers measure and permit comparison of the prices of certain goods. Quantity index numbers measure the changes in the physical volume of production, construction or employment. Though price index numbers are more widely used, a production index is also an important indicator of the level of the output in the economy.
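The base-period convention described above can be illustrated with a minimal Python sketch. The function name `index_number` and the two prices are hypothetical, chosen only so that the result reproduces the index of 250 mentioned in the text.

```python
def index_number(current_value, base_value):
    """Express current_value as a percentage of base_value (base period = 100)."""
    return current_value / base_value * 100


# Hypothetical prices: Rs 40 in the base period, Rs 100 in the current period.
base_price = 40
current_price = 100

index = index_number(current_price, base_price)
print(index)        # index of the current period relative to the base period
print(index / 100)  # the current value as a multiple of the base-period value
```

With these assumed prices the index is 250, that is, the current price is two and a half times the base-period price.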
[...] price index numbers.
Let us look at the following example:
Example 1
Calculation of simple aggregative price index
TABLE 8.1
Commodity   Base period   Current period   Percentage
            price (Rs)    price (Rs)       change
A           2             4                100
B           5             6                20
C           4             5                25
D           2             3                50
As you observe in this example, the percentage changes are different for every commodity. If the percentage changes were the same for all four items, a single measure would have been sufficient to describe the change. However, the percentage changes differ and reporting the percentage change for every item will be confusing. It happens when the number of commodities is large, which is common in any real market situation. A price index represents these changes by a single numerical measure.
There are two methods of constructing an index number. It can be computed by the aggregative method and by the method of averaging relatives.
[...]
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$
where P1 and P0 indicate the price of the commodity in the current period and base period respectively. Using the data from example 1, the simple aggregative price index is
$$P_{01} = \frac{4 + 6 + 5 + 3}{2 + 5 + 4 + 2} \times 100 = 138.5$$
Here, price is said to have risen by 38.5 percent.
Do you know that such an index is of limited use? The reason is that the units of measurement of prices of various commodities are not the same. It is unweighted, because the relative importance of the items has not been properly reflected. The items are treated as having equal importance or weight. But what happens in reality? In reality the items purchased differ in order of importance. Food items occupy a large proportion of our expenditure. In that case an equal rise in the price of an item with large weight and that of an item with low weight will have different implications for the overall change in the price index.
The formula for a weighted aggregative price index is
$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$
An index number becomes a weighted index when the relative
[...] each year is calculated. It thus measures the changing value of a fixed aggregate of goods. Since the total value changes with a fixed basket, the change is due to price change.
Various methods of calculating a weighted aggregative index use different baskets with respect to time.
Example 2
Calculation of weighted aggregative price index
TABLE 8.2
            Base period          Current period
Commodity   Price   Quantity     Price   Quantity
            P0      q0           P1      q1
A           2       10           4       5
B           5       12           6       10
C           4       20           5       15
D           2       15           3       10
$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{4 \times 10 + 6 \times 12 + 5 \times 20 + 3 \times 15}{2 \times 10 + 5 \times 12 + 4 \times 20 + 2 \times 15} \times 100 = \frac{257}{190} \times 100 = 135.3$$
This method uses the base period quantities as weights. A weighted aggregative price index using base period quantities as weights is also known as Laspeyres’ price index. It provides an explanation to the question that if the expenditure on base period basket of commodities was Rs 100, how much should be the expenditure in the current period on the same basket of commodities? As you can see here, the value of base period quantities has risen by 35.3 per cent due to price rise. Using base period quantities as weights, the price is said to have risen by 35.3 percent.
Since the current period quantities differ from the base period quantities, the index number using current period weights gives a different value of the index number.
$$P_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100 = \frac{4 \times 5 + 6 \times 10 + 5 \times 15 + 3 \times 10}{2 \times 5 + 5 \times 10 + 4 \times 15 + 2 \times 10} \times 100 = \frac{185}{140} \times 100 = 132.1$$
It uses the current period quantities as weights. A weighted aggregative price index using current period quantities as weights is known as Paasche’s price index. It helps in answering the question that, if the
[...] same basket of commodities. A Paasche’s price index of 132.1 is interpreted as a price rise of 32.1 percent. Using current period weights, the price is said to have risen by 32.1 per cent.

Method of Averaging relatives
When there is only one commodity, the price index is the ratio of the price of the commodity in the current period to that in the base period, usually expressed in percentage terms. The method of averaging relatives takes the average of these relatives when there are many commodities. The price index number using price relatives is defined as
$$P_{01} = \frac{1}{n} \sum \frac{P_1}{P_0} \times 100$$
where P1 and P0 indicate the price of the ith commodity in the current period and base period respectively. The ratio (P1/P0) × 100 is also referred to as price relative of the commodity. n stands for the number of commodities. In the current example
$$P_{01} = \frac{1}{4} \left( \frac{4}{2} + \frac{6}{5} + \frac{5}{4} + \frac{3}{2} \right) \times 100 = 148.75 \approx 149$$
Thus the prices of the commodities have risen by 49 percent.
[...]
$$P_{01} = \frac{\sum W \left( \frac{P_1}{P_0} \times 100 \right)}{\sum W}$$
where W = Weight.
In a weighted price relative index weights may be determined by the proportion or percentage of expenditure on them in total expenditure during the base period. It can also refer to the current period depending on the formula used. These are, essentially, the value shares of different commodities in the total expenditure. In general the base period weight is preferred to the current period weight. It is because calculating the weight every year is inconvenient. It also refers to the changing values of different baskets. They are strictly not comparable. Example 3 shows the type of information one needs for calculating weighted price index.
Example 3
Calculation of weighted price relatives index
TABLE 8.3
Commodity   Base year    Current year   Price      Weight
            price        price          relative   in %
            (in Rs)      (in Rs)
A           2            4              200        40
B           5            6              120        30
C           4            5              125        20
D           2            3              150        10
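The index formulas of this section can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, while the prices, quantities and weights are those of Tables 8.1, 8.2 and 8.3.

```python
# Data for commodities A, B, C, D from Tables 8.1, 8.2 and 8.3.
p0 = [2, 5, 4, 2]       # base period prices
p1 = [4, 6, 5, 3]       # current period prices
q0 = [10, 12, 20, 15]   # base period quantities
q1 = [5, 10, 15, 10]    # current period quantities
w = [40, 30, 20, 10]    # expenditure weights in %


def simple_aggregative(p0, p1):
    """Sum of current prices over sum of base prices, times 100."""
    return sum(p1) / sum(p0) * 100


def weighted_aggregative(p0, p1, q):
    """Value of basket q at current prices over its value at base prices.

    Passing base quantities gives Laspeyres' index,
    passing current quantities gives Paasche's index.
    """
    current_value = sum(p * x for p, x in zip(p1, q))
    base_value = sum(p * x for p, x in zip(p0, q))
    return current_value / base_value * 100


def price_relatives(p0, p1):
    """Price relative (P1 / P0) x 100 for every commodity."""
    return [b * 100 / a for a, b in zip(p0, p1)]


def simple_average_of_relatives(p0, p1):
    relatives = price_relatives(p0, p1)
    return sum(relatives) / len(relatives)


def weighted_average_of_relatives(p0, p1, weights):
    relatives = price_relatives(p0, p1)
    return sum(wi * ri for wi, ri in zip(weights, relatives)) / sum(weights)


results = {
    "Simple aggregative": simple_aggregative(p0, p1),
    "Laspeyres": weighted_aggregative(p0, p1, q0),
    "Paasche": weighted_aggregative(p0, p1, q1),
    "Simple average of relatives": simple_average_of_relatives(p0, p1),
    "Weighted average of relatives": weighted_average_of_relatives(p0, p1, w),
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Rounded to one decimal place, the first three values agree with the 138.5, 135.3 and 132.1 obtained in Examples 1 and 2; the simple average of relatives is 148.75, which the text rounds to 149. Using one function for both Laspeyres' and Paasche's index makes it explicit that the two differ only in the basket used as weights.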
$$P_{01} = \frac{40 \times 200 + 30 \times 120 + 20 \times 125 + 10 \times 150}{100} = 156$$
The weighted price index is 156. The price index has risen by 56 percent. The values of the unweighted price index and the weighted price index differ, as they should. The higher rise in the weighted index is due to the doubling of the most important item A in example 3.

Activity
• Interchange the current period values with the base period values, in the data given in example 2. Calculate the price index using Laspeyres’ and Paasche’s formula. What difference do you observe from the earlier illustration?

4. SOME IMPORTANT INDEX NUMBERS
Consumer price index
Consumer price index (CPI), also known as the cost of living index, measures the average change in retail prices. The CPI for industrial workers is increasingly considered the appropriate indicator of general inflation, which shows the most accurate impact of price rise on the cost of living of common people.

[...] manual employees (1984–85 as base) and CPI for agricultural labourers (base 1986–87). They are routinely calculated every month to analyse the impact of changes in the retail price on the cost of living of these three broad categories of consumers. The CPI for industrial workers and agricultural labourers are published by Labour Bureau, Shimla. The Central Statistical Organisation publishes the CPI number of urban non manual employees. This is necessary because their typical consumption baskets contain many dissimilar items.
The weight scheme in CPI for industrial workers (1982=100) by major commodity groups is given in the following table. In this scheme food has the largest weight. Food being the most important category, any rise in the food price will have a significant impact on CPI. This also explains the government’s frequent statement that oil price hike will not be inflationary.
Major Group                      Weight in %
Food                             57.00
Pan, supari, tobacco etc.        3.15
Fuel & light                     6.28
Housing                          8.67
Clothing, bedding & footwear     8.54
Misc. group                      16.36
General                          100.00
Source: Economic Survey, Government of India.

Consider the statement that the CPI
[...] commodities, he needs Rs 526 in January 2005 to be able to buy an identical basket of commodities. It is not necessary that he/she buys the basket. What is important is whether he has the capability to buy it.
Example 4
Construction of consumer price index number.
TABLE 8.4
Item    Weight in %   Base period   Current period   R = P1/P0 × 100   WR
        W             price (Rs)    price (Rs)       (in %)
Food    35            150           145              96.67             3383.45
Fuel    10            25            23               92.00             920.00
Cloth   20            75            65               86.67             1733.40
Rent    15            30            30               100.00            1500.00
Misc.   20            40            45               112.50            2250.00
Total                                                                  9786.85
$$CPI = \frac{\sum WR}{\sum W} = \frac{9786.85}{100} = 97.86$$
This exercise shows that the cost of living has declined by 2.14 per cent. What does an index larger than 100 indicate? It means a higher cost of living necessitating an upward adjustment in wages and salaries. The required rise equals the amount by which the index exceeds 100. If the index is 150, a 50 percent upward adjustment is required. The salaries of the employees have to be raised by 50 per cent.

[...] have any reference consumer category. It does not include items pertaining to services like barber charges, repairing etc.
What does the statement “WPI with 1993-94 as base is 189.1 in March, 2005” mean? It means that the general price level has risen by 89.1 percent during this period.

Industrial production index
The index number of industrial production measures changes in the level of industrial production comprising many industries. It includes the production of the public and the private sector. It is a weighted average of quantity relatives. The formula for the index is
$$IIP_{01} = \frac{\sum q_1 \times W}{\sum W} \times 100$$
In India, it is currently calculated every month with 1993–94 as the base. In table 8.6, you can see the index number of some industrial groupings along with their weights.
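Both the consumer price index of Example 4 and the general index of industrial production are weighted averages, so one small function covers them. The sketch below is an illustration under stated assumptions: `weighted_index` is a hypothetical name, the CPI inputs are the weights and prices of Table 8.4, and the industrial inputs are the group weights and group indices reported in Table 8.6 (treated here as the relatives to be averaged).

```python
def weighted_index(weights, relatives):
    """Weighted average of relatives: sum(W * R) / sum(W)."""
    total = sum(w * r for w, r in zip(weights, relatives))
    return total / sum(weights)


# Example 4 (Table 8.4): food, fuel, cloth, rent, miscellaneous.
cpi_weights = [35, 10, 20, 15, 20]
base_prices = [150, 25, 75, 30, 40]
current_prices = [145, 23, 65, 30, 45]
price_relatives = [p1 * 100 / p0 for p0, p1 in zip(base_prices, current_prices)]
cpi = weighted_index(cpi_weights, price_relatives)

# Table 8.6: mining and quarrying, manufacturing, electricity.
iip_weights = [10.47, 79.36, 10.17]
iip_group_indices = [155.2, 222.7, 196.7]
general_index = weighted_index(iip_weights, iip_group_indices)

print(f"CPI: {cpi:.2f}")
print(f"General index of industrial production: {general_index:.1f}")
```

The CPI comes out at about 97.87, in line with Example 4 up to the rounding of the price relatives, and the weighted average of the three industrial groups is close to the general index of 213.0 shown in Table 8.6.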
[...] production and the value of imports inclusive of import duty during the base year. It is available on a weekly basis. Commodities are broadly classified into three categories viz primary articles; fuel, power, light and lubricants; and manufactured products. The weight scheme is given below. The low weight of fuel, power, light and lubricants explains how the government can get away with such a statement that the oil price hike will not be inflationary at least in the short run.
TABLE 8.5
Category                          Weight in %   No. of items
Primary articles                  22.0          98
Fuel, power, light & lubricants   14.2          19
Manufactured products             63.8          318
Source: Economic Survey 2004–2005, Govt. of India, p–89

TABLE 8.6
Broad industrial grouping and their weights
Broad groupings        Weight in %   Index no. in May, 2005
Mining and quarrying   10.47         155.2
Manufacturing          79.36         222.7
Electricity            10.17         196.7
General index                        213.0
As the table shows, the growth performances of the broad industrial categories differ. The general index represents the average performance of [...]

Index number of agricultural production
Index number of agricultural production is a weighted average of quantity relatives. Its base period is the triennium ending 1981-82. In 2003–04 the index number of agricultural production was 179.5. It means that agricultural production has increased by 79.5 percent over the average of the three years 1979–80, 1980–81 and 1981–82. Foodgrains have a weight of 62.92 percent in this index.

SENSEX
You often come across a news item in a newspaper,
“Sensex breaches 8700 mark. BSE closes at 8650 points. Investor wealth rises by Rs 9,000 crore. The sensex broke the 8700 mark for the first time in its history but ended off the mark at 8650, also a new record closing level”.
The rise in sensex was at the highest level till date, which reflects the good health of the economy in general. As the share prices increase, reflected by the rise in sensex, the value of wealth of the shareholders also rises.
Look at another news item,
“Sensex dips 600 in 30 days flat. Rs 1,53,690 crore investor wealth eroded. While the sensex has lost 338
points in two consecutive days, it has eroded 6.8% or 598 points since October 4 when it hit an all time high at 8800 points. Investor wealth eroded by a staggering Rs 1,53,690 crore or 6.7% during the period.”
It shows that all is not well with the health of the economy. The investors may find it hard to decide whether to invest or not.

[...] value of the sensex is with reference to this period. It is the benchmark index for the Indian stock market. It consists of 30 stocks which represent 13 sectors of the economy and the companies listed are leaders in their respective industries. If the sensex rises, it indicates that the market is doing well and investors expect better earnings from companies. It also indicates a growing confidence of investors in the basic health of the economy.

Another useful index in recent years is the human development index. Very soon producers price index number will replace wholesale price index.

Producer Price Index
Producer price index number measures price changes from the producers’ perspective. It uses only basic prices including taxes, trade margins and transport costs. A Working Group on Revision of Wholesale Price Index (1993–94=100) is inter alia examining the feasibility of switching over from WPI to a PPI in India as in many countries.

5. ISSUES IN THE CONSTRUCTION OF AN INDEX NUMBER
You should keep certain important issues in mind, while constructing an index number.
• You need to be clear about the purpose of the index. Calculation of a volume index will be inappropriate, when one needs a value index.
[...] condition of the poor agricultural labourers. Thus the items to be included in any index have to be selected carefully to be as representative as possible. Only then you will get a meaningful picture of the change.
• Every index should have a base. This base should be as normal as possible. Extreme values should not be selected as base period. The period should also not belong to too far in the past. The comparison between 1993 and 2005 is much more meaningful than a comparison between 1960 and 2005. Many items in a 1960 typical consumption basket have disappeared at present. Therefore, the base year for any index number is routinely updated.
• Another issue is the choice of the formula, which depends on the nature of question to be studied. The only difference between the Laspeyres’ index and Paasche’s index is the weights used in these formulae.
• Besides, there are many sources of data with different degrees of reliability. Data of poor reliability will give misleading results. Hence, due care should be taken in the collection of data. If primary data are not being used, then the most reliable source of secondary data should be chosen.

[...] for the week. What problems do you encounter in applying both methods for the construction of a price index?

6. INDEX NUMBER IN ECONOMICS
Why do we need to use the index numbers? Wholesale price index number (WPI), consumer price index number (CPI) and industrial production index (IIP) are widely used in policy making.
• Consumer index number (CPI) or cost of living index numbers are helpful in wage negotiation, formulation of income policy, price policy, rent control, taxation and general economic policy formulation.
• The wholesale price index (WPI) is used to eliminate the effect of changes in prices on aggregates such as national income, capital formation etc.
• The WPI is widely used to measure the rate of inflation. Inflation is a general and continuing increase in prices. If inflation becomes sufficiently large, money may lose its traditional function as a medium of exchange and as a unit of account. Its primary impact lies in lowering the value of money. The weekly inflation rate is given by
$$\frac{X_t - X_{t-1}}{X_{t-1}} \times 100$$
where $X_t$ and $X_{t-1}$
refer to the WPI for the t-th and (t–1)-th weeks.
• CPI are used in calculating the purchasing power of money and real wage:
(i) Purchasing power of money = 1/Cost of living index
(ii) Real wage = (Money wage/Cost of living index) × 100
If the CPI (1982=100) is 526 in January 2005, the equivalent of a rupee in January, 2005 is given by
$$\text{Rs } \frac{100}{526} = 0.19$$
It means that it is worth 19 paise in 1982. If the money wage of the consumer is Rs 10,000, his real wage will be
$$\text{Rs } 10{,}000 \times \frac{100}{526} = \text{Rs } 1{,}901$$
It means Rs 1,901 in 1982 has the same purchasing power as Rs 10,000 in January, 2005. If he/she was getting Rs 3,000 in 1982 and gets Rs 10,000 now, he/she is worse off due to the rise in price. To maintain the 1982 standard of living the salary should be raised to Rs 15,780, obtained by multiplying the base period salary by the factor 526/100.
• Index of industrial production gives us a quantitative figure about the change in production in the industrial sector.
• Agricultural production index provides us a ready reckoner of the performance of agricultural sector.
• Sensex is a useful guide for investors in the stock market. If the sensex is rising, investors are optimistic of the future performance of the economy. It is an appropriate time for investment.

Where can we get these index numbers?
Some of the widely used index numbers routinely published in the Economic Survey, an annual publication of the Government of India, are WPI, CPI, Index Number of Yield of Principal Crops, Index of Industrial Production, Index of Foreign Trade.

Activity
• Check from the newspapers and construct a time series of sensex with 10 observations. What happens when the base of the consumer price index is shifted from 1982 to 2000?

7. CONCLUSION
Thus, the method of the index number enables you to calculate a single measure of change of a large number of items. Index numbers can be calculated for price, quantity, volume etc.
It is also clear from the formulae that the index numbers need to be interpreted carefully. The items to be included and the choice of the base period are important. Index numbers are extremely important in policy making as is evident by their various uses.
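The inflation, purchasing power and real wage calculations of Section 6 can be sketched in Python as follows. The function names are hypothetical, the two WPI values used for the inflation rate are made-up illustrative figures, and the purchasing power function follows the worked example (100/526), which treats the index as a percentage of a base of 100.

```python
def inflation_rate(x_t, x_prev):
    """Percentage change in the price index between two consecutive periods."""
    return (x_t - x_prev) / x_prev * 100


def purchasing_power(cpi):
    """Worth of one current rupee in base-period rupees (index base = 100)."""
    return 100 / cpi


def real_wage(money_wage, cpi):
    """Money wage expressed in base-period rupees."""
    return money_wage / cpi * 100


def wage_to_keep_living_standard(base_wage, cpi):
    """Current wage that buys what base_wage bought in the base period."""
    return base_wage * cpi / 100


# Hypothetical WPI values for two consecutive weeks.
print(inflation_rate(201.0, 200.0))

# Figures from the text: CPI (1982=100) of 526 in January 2005.
cpi_jan_2005 = 526
print(round(purchasing_power(cpi_jan_2005), 2))
print(round(real_wage(10000, cpi_jan_2005)))
print(wage_to_keep_living_standard(3000, cpi_jan_2005))
```

With the figures from the text, the rupee is worth about Rs 0.19 in 1982 terms, a money wage of Rs 10,000 corresponds to a real wage of about Rs 1,901, and a 1982 salary of Rs 3,000 has to rise to Rs 15,780 to keep the same standard of living.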
Recap
• An index number is a statistical device for measuring relative change in a large number of items.
• There are several formulae for working out an index number and every formula needs to be interpreted carefully.
• The choice of formula largely depends on the question of interest.
• Widely used index numbers are wholesale price index, consumer price index, index of industrial production, agricultural production index and sensex.
• The index numbers are indispensable in economic policy making.

EXERCISES
1. An index number which accounts for the relative importance of the items is known as
(i) weighted index
(ii) simple aggregative index
(iii) simple average of relatives
2. In most of the weighted index numbers the weight pertains to
(i) base year
(ii) current year
(iii) both base and current year
3. The impact of change in the price of a commodity with little weight in the index will be
(i) small
(ii) large
(iii) uncertain
4. A consumer price index measures changes in
(i) retail prices
(ii) wholesale prices
(iii) producers prices
5. The item having the highest weight in consumer price index for industrial workers is
(i) Food
(ii) Housing
(iii) Clothing
6. In general, inflation is calculated by using
(i) wholesale price index
(ii) consumer price index
(iii) producers’ price index
10. What does a consumer price index for industrial workers measure?
11. What is the difference between a price index and a quantity index?
12. Is the change in any price reflected in a price index number?
13. Can the CPI number for urban non-manual employees represent the changes in the cost of living of the President of India?
14. The monthly per capita expenditure incurred by workers for an industrial centre during 1980 and 2005 on the following items are given below. The weights of these items are 75, 10, 5, 6 and 4 respectively. Prepare a weighted index number for cost of living for 2005 with 1980 as the base.
Items             Price in 1980   Price in 2005
Food              100             200
Clothing          20              25
Fuel & lighting   15              20
House rent        30              40
Misc              35              65
15. Read the following table carefully and give your comments.
Industry               Weight in %   1996–97   2003–2004
General index          100           130.8     189.0
Mining and quarrying   10.73         118.2     146.9
Manufacturing          79.58         133.6     196.6
Electricity            10.69         122.0     172.6
16. Try to list the important items of consumption in your family.
17. If the salary of a person in the base year is Rs 4,000 per annum and the current year salary is Rs 6,000, by how much should his salary rise to maintain the same standard of living if the CPI is 400?
18. The consumer price index for June, 2005 was 125. The food index was 120 and that of other items 135. What is the percentage of the total weight given to food?
19. An enquiry into the budgets of the middle class families in a certain city gave the following information;
What is the cost of living index of 2004 as compared with 1995?
20. Record the daily expenditure, quantities bought and prices paid per unit of the daily purchases of your family for two weeks. How has the price change affected your family?
21. Given the following data-
Year      CPI of industrial   CPI of urban         CPI of agricultural   WPI
          workers             non-manual           labourers             (1993–94 = 100)
          (1982 = 100)        employees            (1986–87 = 100)
                              (1984–85 = 100)
1995–96   313                 257                  234                   121.6
1996–97   342                 283                  256                   127.2
1997–98   366                 302                  264                   132.8
1998–99   414                 337                  293                   140.7
1999–00   428                 352                  306                   145.3
2000–01   444                 352                  306                   155.7
2001–02   463                 390                  309                   161.3
2002–03   482                 405                  319                   166.8
2003–04   500                 420                  331                   175.9
Source: Economic Survey, Government of India, 2004–2005
(ii) Comment on the relative values of the index numbers.
(iii) Are they comparable?

Activity
• Consult your class teacher to make a list of widely used index numbers. Get the most recent data indicating the source. Can you tell what the unit of an index number is?
• Make a table of consumer price index for industrial workers in the last 10 years and calculate the purchasing power of money. How is it changing?