Module-3: Classification, Tabulation and
Graphical Representation
Dr. Akanksha Gupta
Classification
Classification is the process of arranging things or items in groups or classes according to
their resemblance, giving expression to the unity of attributes that may exist amongst
a diversity of individuals. Different modes of classification are:
• Geographical classification: Data are classified according to place, area or region.
• Chronological classification: Data are classified according to time, e.g., monthly,
yearly, etc.
• Qualitative classification: Data are classified according to the attributes of the
subjects or items, e.g., gender, qualification, colour, etc.
• Quantitative classification: Data are classified according to the magnitude of the
numerical values, e.g., age, income, height, weight, etc.
Objectives of Classification
• To present the facts in a simple manner.
• To highlight items which possess or do not possess certain attributes or qualities.
• To facilitate comparison between items.
• To find out the mutual relationship between certain measures and their effects.
• To provide a basis for tabulation.
Characteristics of Classification
• Exhaustive: The classes should be such that they cover every item of the set, i.e.,
they should be complete and non-overlapping.
• Stability: Classification should be uniform or standardized so that the results are
comparable in different studies.
• Homogeneity: The units of measurement of all classes should be the same. Only like
units should be accommodated in one class.
• Suitability: Classification should be done according to the objective of the study
only.
STAMN11: Descriptive Statistics Dr. Akanksha Gupta
• Arithmetic accuracy: The numbers of units or observations in all classes should
add up to the total number of units or observations.
Formation of Classes
Exclusive and inclusive class intervals: In exclusive class intervals, the upper limit
of a class is the lower limit of the next class, and the upper limit of a class is not included
in that class. In inclusive class intervals, both limits are included in the class, so the upper
limit of a class is not the lower limit of the next class; the lower limit of the next class is
generally greater by one unit of measurement. The inclusive approach is suitable when the
data are given in whole numbers. In all other cases the exclusive approach is suitable. To
simplify calculations, inclusive classes are sometimes converted into exclusive classes.
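One common way to carry out this conversion, which the notes do not spell out, is to take a correction factor equal to half the gap between the upper limit of one class and the lower limit of the next, subtract it from every lower limit and add it to every upper limit. The sketch below assumes that convention; the function name `to_exclusive` and the sample classes are illustrative only.

```python
def to_exclusive(classes):
    """Convert inclusive (lower, upper) class limits to exclusive class boundaries.

    Assumes at least two classes and a constant gap between consecutive classes.
    """
    # Gap between the upper limit of the first class and the lower limit of the second.
    gap = classes[1][0] - classes[0][1]
    correction = gap / 2
    return [(lower - correction, upper + correction) for lower, upper in classes]


inclusive = [(10, 19), (20, 29), (30, 39)]  # hypothetical whole-number classes
print(to_exclusive(inclusive))
# [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

After the conversion, the upper boundary of each class coincides with the lower boundary of the next, as required for exclusive classes.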
In quantitative classification, the number of classes depends upon the class interval. H.A.
Sturges suggested the following formula to determine the class interval and the number
of classes:
$$ i = \frac{L - S}{1 + 3.322 \log_{10} n} $$
where,
i: class interval
L: Largest observation
S: Smallest observation
n: total number of observations in the set.
The denominator of the above formula is equal to the number of classes.
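As a rough illustration of the formula, the following sketch computes the number of classes and the class interval for a data set. The function name `sturges` and the sample data are hypothetical; in practice both results would normally be rounded to convenient figures.

```python
import math


def sturges(observations):
    """Return (number of classes, class interval) using Sturges' formula."""
    n = len(observations)
    largest, smallest = max(observations), min(observations)
    k = 1 + 3.322 * math.log10(n)   # number of classes (the denominator)
    i = (largest - smallest) / k    # class interval
    return k, i


# Hypothetical data: 100 observations whose extremes are 57 and 520.
data = [57, 520] + [300] * 98
k, i = sturges(data)
print(round(k, 2), round(i, 1))
```

For n = 100 the denominator is 1 + 3.322 × 2 = 7.644, so about eight classes would be used.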
If only the mid-values of the classes are known, then the classes can be formed as follows:
Let $m$ be the mid-value of a class and $i$ be the difference between two consecutive
mid-values; the lower and upper class limits are then $m - \frac{i}{2}$ and $m + \frac{i}{2}$,
respectively.
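The rule can be sketched in a few lines. The function name `limits_from_midvalues` and the mid-values used are hypothetical, and the mid-values are assumed to be equally spaced.

```python
def limits_from_midvalues(midvalues):
    """Return (lower, upper) class limits for equally spaced mid-values."""
    i = midvalues[1] - midvalues[0]  # difference between consecutive mid-values
    return [(m - i / 2, m + i / 2) for m in midvalues]


print(limits_from_midvalues([15, 25, 35, 45]))  # hypothetical mid-values
# [(10.0, 20.0), (20.0, 30.0), (30.0, 40.0), (40.0, 50.0)]
```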
If, in grouped data, the lower limit of the first class and/or the upper limit of the
last class is not specified, the data are known as grouped data with open-end
class(es).
Tabulation
It is the process of presenting data collected through survey, experiment or record in
rows and columns so that it can be understood more easily and can be used for further
statistical analysis.
While classification arranges the data into groups according to their characteristics, each
group having a number of items attached to it, tabulation is the logical and systematic
arrangement of data in rows and columns. The primary objectives of tabulation are:
• To clarify the object of investigation.
• To reduce complexity of data.
• To economise space.
• To detect errors and omissions of data, if any.
• To depict the relation among data if it exists.
• To facilitate analysis and comparison of data.
Requisites of a standard table are:
• It should be suitable for the purpose.
• The table should be clear and complete.
• Table should be of adequate size.
• Units of measurements should be specified.
• Items should be arranged logically.
• Totals and sub-totals should be given.
Classification is meant for arranging the data into groups according to their characteristics,
where each group has a number of items attached to it. In the case of variables, it is given
in the form of a frequency distribution.
Tabulation is the logical and systematic arrangement of data in rows and columns. In a
table, data may be presented in modified form as well, e.g., in percent, proportion, total
or average values, etc.
Distribution of Variables
When observations, discrete or continuous, are available on a single characteristic of a
large number of individuals, often it becomes necessary to condense the data as far as
possible without losing any information of interest. An easy way to do so is to form
a frequency distribution.
A frequency distribution for qualitative data groups data into categories and records
the number of observations that fall into each category. The word ‘frequency’ is derived
from ‘how frequently’ a value of the variable occurs.
For quantitative data, a frequency distribution groups data into intervals called classes
and records the number of observations that fall into each class.
A cumulative frequency distribution records the number of observations that fall
below (above) the upper (lower) limit of each class.
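As an illustration, both types of cumulative frequency can be obtained with running sums. The sketch below uses the class frequencies of the grouped claims data given in the stem and leaf section of these notes; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of the grouped claims data (classes 50-99, 100-149, ..., 500-549).
frequencies = [1, 5, 4, 14, 22, 20, 14, 13, 6, 1]

# "Less than" type: observations up to the upper limit of each class.
less_than = list(accumulate(frequencies))

# "More than" type: observations at or above the lower limit of each class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)  # [1, 6, 10, 24, 46, 66, 80, 93, 99, 100]
print(more_than)  # [100, 99, 94, 90, 76, 54, 34, 20, 7, 1]
```

The last "less than" value and the first "more than" value both equal the total number of observations.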
To illustrate the construction of a frequency distribution with nominal data, Table 1
shows the weather for the month of February (2010) in Seattle, Washington.
From the frequency distribution, we can now readily observe that the most common type
of day in February was rainy since this type of day occurs with the highest frequency.
Table 1: Seattle Weather, February 2010
Sunday     Monday     Tuesday    Wednesday  Thursday   Friday     Saturday
-          1 Rainy    2 Rainy    3 Rainy    4 Rainy    5 Rainy    6 Rainy
7 Rainy    8 Sunny    9 Cloudy   10 Rainy   11 Rainy   12 Rainy   13 Rainy
14 Rainy   15 Rainy   16 Rainy   17 Sunny   18 Sunny   19 Sunny   20 Rainy
21 Sunny   22 Sunny   23 Rainy   24 Rainy   25 Rainy   26 Rainy   27 Rainy
28 Sunny   -          -          -          -          -          -
Table 2: Frequency Distribution for Seattle Weather, February 2010
Weather    Frequency
Cloudy     1
Rainy      20
Sunny      7
Total      28 days
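The tallying behind Table 2 can be sketched with a counter. The daily values below are read off Table 1; day 8 is taken as Sunny, which is an assumption made so that the counts agree with the totals in Table 2. The variable names are illustrative.

```python
from collections import Counter

# Days of February 2010 by type of weather, read off Table 1.
rainy_days = set(range(1, 8)) | set(range(10, 17)) | {20} | set(range(23, 28))
cloudy_days = {9}
# Assumption: every remaining day (8, 17-19, 21, 22, 28) is Sunny.

weather = []
for day in range(1, 29):
    if day in rainy_days:
        weather.append("Rainy")
    elif day in cloudy_days:
        weather.append("Cloudy")
    else:
        weather.append("Sunny")

freq = Counter(weather)              # category -> number of days
print(sorted(freq.items()))
# [('Cloudy', 1), ('Rainy', 20), ('Sunny', 7)]
print(freq.most_common(1))           # the most frequent type of day
# [('Rainy', 20)]
```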
Relative Frequency and Percent Frequency
We often want to compare data sets that differ in size. For example, we might want to compare
the weather in February to the weather in March. However, February has 28 days (except
during a leap year) and March has 31 days. In this instance, we would convert each
frequency distribution to a relative frequency distribution.
Table 3: Relative Frequency Distribution for Seattle Weather
Weather    Relative Frequency,    Relative Frequency,
           February 2010          March 2010
Cloudy     1/28 = 0.036           3/31 = 0.097
Rainy      20/28 = 0.714          18/31 = 0.581
Sunny      7/28 = 0.250           10/31 = 0.323
Total      1                      1 (subject to rounding)
From the relative frequency distribution, we can now conclude that the weather in Seattle
in both February and March was predominantly rainy. However, the weather in March
was a bit nicer in that approximately 32% of the days were sunny, as opposed to only
25% of the days in February.
The relative frequency for each category in a frequency distribution equals the
proportion (fraction) of observations in each category. A category’s relative frequency is
calculated by dividing the frequency by the total number of observations. The sum of
the relative frequencies should equal one.
The percent frequency for each category in a frequency distribution equals the percent
(%) of observations in each category; it equals the relative frequency of the category
multiplied by 100.
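Both definitions can be checked with a short calculation on the February frequencies from Table 2. This is a minimal sketch; the variable names are illustrative.

```python
frequencies = {"Cloudy": 1, "Rainy": 20, "Sunny": 7}  # Table 2
total = sum(frequencies.values())

# Relative frequency = frequency / total number of observations.
relative = {category: f / total for category, f in frequencies.items()}

# Percent frequency = relative frequency x 100.
percent = {category: 100 * r for category, r in relative.items()}

for category in frequencies:
    print(f"{category:<8}{relative[category]:.3f}{percent[category]:8.1f}%")
```

The relative frequencies add up to one and the percent frequencies to 100, apart from rounding in the printed values.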
Diagrammatic and Graphical Representation
We can visualize the information found in frequency distributions by constructing various
graphs. Graphical representations often portray the data more dramatically, as well as
simplify interpretation. A pie chart and a bar chart are two widely used graphical
representations of qualitative data, whereas histograms and polygons are graphical depictions
of frequency and relative frequency distributions for quantitative data. The advantage
of a visual display is that we can quickly see where most of the observations tend to
cluster, as well as the spread and shape of the data. The advantages of diagrammatic
representation of data are as follows:
• Diagrams give a bird’s-eye view of complex data.
• They leave a long-lasting impression.
• They are easy to understand.
• They save time and labour.
• They facilitate comparison.
Types of diagrams
One-dimensional diagram: Line diagram, simple bar diagram, multiple or compound
bar diagram, sub-divided or component bar diagram, percentage bar diagram, etc.
Two-dimensional diagram: Rectangle, circle, pie diagram, etc.
Three-dimensional diagram: Cube, cylinder, sphere, etc.
Non-dimensional diagram: Pictogram
Line Diagram
A line diagram is a one-dimensional diagram in which the height of the line represents
the frequency corresponding to the value of the item or a factor.
Figure 1: Line diagram
Bar Diagram and Sub-divided Bar Diagram
A bar diagram (Fig. 2) represents the magnitude of a single factor according to time
periods, places, items, etc. When the magnitude of the factor is given along with its
sub-factors, each bar is further sub-divided into components in proportion to the magnitudes
of the sub-factors; such a diagram is known as a sub-divided bar diagram (Fig. 3).
Figure 2: Bar diagram
Figure 3: Sub-divided Bar diagram
Multiple or Compound Bar Diagram
In a multiple bar diagram (Fig. 4), adjoining bars are drawn for each period or place,
one bar per factor, in the same order within every group and with heights proportional to
the values of the factors. Each bar of a group is given a different pattern or colour
to make the bars easily distinguishable, and the same patterns are retained in all the
groups. A constant distance is maintained between the groups of bars. Such a diagram is
also known as a compound bar diagram.
Figure 4: Multiple bar diagram
Overlapping Bar or Column Charts
In this type of chart, each column (bar) overlaps the next column (bar) by half of
its width. Each column (bar) is shown in a different pattern or colour. Such charts save
space, as the spread of each group of columns is considerably reduced (Fig. 5).
Figure 5: Overlapping bar diagram
Histogram, Frequency Polygon and Frequency Curve
A histogram (Fig. 6) is similar to a bar chart, but with a continuous scale. In other
words, a bar chart is called a histogram when it is used to present continuous numerical
data.
Figure 6: Histogram
The above histogram has equal group widths. In some situations it may be convenient
to have one or two wider groups at the extremes of the distribution. In such cases it
should be noted that it is the areas of the rectangles, not their heights, that are
proportional to the frequencies.
In drawing the histogram of a given continuous frequency distribution we first mark off
along the x-axis all the class intervals on a suitable scale. On each class interval erect
a rectangle whose height is proportional to the frequency of the corresponding class interval,
so that, for classes of equal width, the area of the rectangle is proportional to the
frequency of the class. If, however, the classes are of unequal width, then the height of
each rectangle must be proportional to the ratio of the class frequency to the class width.
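The rule for unequal class widths can be sketched as follows. The classes and frequencies are hypothetical, and the function name `bar_heights` is illustrative.

```python
def bar_heights(classes):
    """Return histogram heights (frequency / class width) for each class.

    `classes` is a list of (lower boundary, upper boundary, frequency) tuples.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical classes of unequal width.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

for (lower, upper, freq), height in zip(classes, bar_heights(classes)):
    width = upper - lower
    # Area = height x width, which recovers the class frequency.
    print(f"{lower}-{upper}: height={height:.2f}, area={height * width:.0f}, frequency={freq}")
```

Here the class 20-40 has the largest frequency (16), but its bar is lower than that of the class 10-20, because its frequency is spread over twice the width.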
When the mid-points of the tops of the adjacent bars of a histogram are joined in order,
then the graph of lines so obtained is called a frequency polygon (Fig. 7).
Figure 7: Frequency polygon
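The vertices of a frequency polygon are the class mid-points paired with the class frequencies. A minimal sketch, assuming equal-width classes and hypothetical data (the function name `polygon_points` is illustrative):

```python
def polygon_points(classes):
    """Return the (mid-point, frequency) vertices of a frequency polygon.

    `classes` is a list of (lower boundary, upper boundary, frequency) tuples.
    """
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]


# Hypothetical equal-width classes.
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 2)]
print(polygon_points(classes))
# [(5.0, 4), (15.0, 9), (25.0, 6), (35.0, 2)]
```

Joining these points in order by straight lines gives the polygon.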
A frequency curve (Fig. 8) is a graphical representation of frequencies corresponding to
their variate values by a smooth curve. A smoothed frequency polygon represents a
frequency curve.
Figure 8: Frequency Curve
Stem and Leaf Diagram
An alternative to the histogram is the stem and leaf display. It gives a visual
representation similar to the histogram but does not lose the detail of the individual data
points in the grouping. Construction steps:
• Arrange the data in ascending order.
• Determine the stems: The stem represents the leading digit(s) of each number.
For example, in the number 47, the stem is 4.
• Determine the leaves: The leaf is the last digit of each number. In 47, the leaf
is 7.
• List stems in a vertical column, in increasing order.
• Write the leaves corresponding to each stem in a row next to it.
• Label the display properly (e.g., “Stem = tens, Leaf = ones”).
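The construction steps above can be sketched as a small function. It assumes non-negative whole numbers with the last digit as the leaf; the function name `stem_and_leaf` and the data are hypothetical. For brevity, stems that have no observations are omitted from the output.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem and leaf display for non-negative integers."""
    rows = defaultdict(list)
    for value in sorted(values):        # arrange the data in ascending order
        stem, leaf = divmod(value, 10)  # leading digit(s) and last digit
        rows[stem].append(str(leaf))
    # List the stems in increasing order, with their leaves in a row beside them.
    lines = [f"{stem} | {''.join(rows[stem])}" for stem in sorted(rows)]
    lines.append("Stem = tens, Leaf = ones")
    return lines


# Hypothetical two-digit data.
print("\n".join(stem_and_leaf([47, 52, 41, 58, 63, 47, 55, 39])))
# 3 | 9
# 4 | 177
# 5 | 258
# 6 | 3
# Stem = tens, Leaf = ones
```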
For example, a sample of 100 claims (in Rs.) for damage due to water leakage on an
insurance company’s household contents policies might be as follows:
Figure 9: Claims in Rs.
These data might be summarised in the following grouped frequency distribution:
Group      Frequency
50-99      1
100-149    5
150-199    4
200-249    14
250-299    22
300-349    20
350-399    14
400-449    13
450-499    6
500-549    1
and displayed in the following stem and leaf diagram:
Stem   Leaves
0      7
1      11344
1      6888
2      0122333344444
2      566677777778899999999
3      00011122222233334444
3      55556677777799
4      000001122333444
4      677899
5      2
The stems are on the left, in units of Rs. 100, and the leaves are on the right, in units
of Rs. 10. The individual data points can therefore be read off the display, although
only rounded to the nearest Rs. 10.
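The rounding step can be sketched as below. The claim amounts are hypothetical, since the raw data are not reproduced here, and rounding halves upwards is an assumption; the function name `claim_to_stem_leaf` is illustrative.

```python
def claim_to_stem_leaf(amount):
    """Split a claim (in Rs.) into a Rs. 100 stem and a Rs. 10 leaf."""
    rounded = (amount + 5) // 10 * 10  # nearest Rs. 10, halves rounded up (assumption)
    return rounded // 100, (rounded % 100) // 10


for amount in (57, 247, 395):  # hypothetical claim amounts
    print(amount, claim_to_stem_leaf(amount))
# 57 (0, 6)
# 247 (2, 5)
# 395 (4, 0)
```

A claim of Rs. 247 belongs to the group 200-249 but is displayed as stem 2, leaf 5, and a claim of Rs. 395 belongs to the group 350-399 but is displayed as stem 4, leaf 0. This is why the number of leaves in a row can differ slightly from the frequency of the corresponding group.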