Lecture 2
Summary statistics
SHORT REVIEW:
Grouping is an operation that involves selecting and combining indicators with the
same characteristics from a population.
Groupings are made according to the task at hand. For example, if the standard of
living of the population is analyzed, the grouping is made by household income and
expenses; if the dynamics of population growth are studied, the grouping can be
based on indicators such as birth rate, mortality rate, family size, and number of
children per family.
Groupings are formed according to quantitative and qualitative characteristics.
Groupings are called simple if they are formed by one characteristic, and complex
or combined if they are formed by several characteristics.
The size of the interval is important in groupings by quantitative
characteristics. Intervals have a size (width) and lower and upper boundaries.
Intervals can be open or closed depending on their boundaries. Open intervals have
only a lower or only an upper boundary. Intervals that have both upper and lower
boundaries are closed intervals. Closed intervals can be the same or different in
size.
Grouping plays a significant role when we have to deal with large data sets.
Grouped data is data formed by arranging individual observations of a variable
into groups, so that a frequency distribution table of these groups provides a
convenient way of summarizing or analyzing the data. This information can also be
displayed using a pictograph or a bar graph.
Frequency distribution table for grouped data
When the collected data set is large, we can use the approach below to analyse it
easily using tally marks.
Example:
Consider the marks obtained by 50 students of class VII in an examination. The
maximum mark for the exam is 50.
23, 8, 13, 18, 32, 44, 19, 8, 25, 27, 10, 30, 22, 40, 39, 17, 25, 9, 15, 20, 30, 24, 29,
19, 16, 33, 38, 46, 43, 22, 37, 27, 17, 11, 34, 41, 35, 45, 31, 26, 42, 18, 28, 30, 22,
20, 33, 39, 40, 32
If we create a frequency distribution table with a row for each individual
observation, the table will be large. For easier understanding, we can group the
observations into classes such as 0 to 10, 10 to 20, etc.
Marks          Frequency
0 – 10         3
10 – 20        11
20 – 30        14
30 – 40        14
40 – 50        8
The distribution shown in the above table is known as a grouped frequency
distribution. It helps us to draw significant inferences such as:
(i) Many students have scored between 20 and 40, i.e. in the classes 20-30 and
30-40.
(ii) 8 students have scored 40 marks or more, i.e. they got 80% or more in the
examination.
In the above table, the groups 0-10, 10-20, 20-30, … are known as class intervals
(or classes). Note that 10 appears in both 0-10 and 10-20. Similarly, 20 appears in
both 10-20 and 20-30. But an observation of 10 or 20 cannot belong to two classes
at once. To avoid this inconsistency, we adopt the rule that a common observation
belongs to the higher class. This means that 10 belongs to the class interval
10-20 but not to 0-10. Similarly, 20 belongs to 20-30 but not to 10-20, etc.
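The tallying rule above can be sketched in Python. This is a minimal illustration, assuming equal classes of width 10 starting at 0; the names `class_of` and `frequency` are chosen here for the example and do not come from the lecture.

```python
from collections import Counter

# Marks of the 50 students from the example above.
marks = [
    23, 8, 13, 18, 32, 44, 19, 8, 25, 27, 10, 30, 22, 40, 39, 17, 25, 9, 15,
    20, 30, 24, 29, 19, 16, 33, 38, 46, 43, 22, 37, 27, 17, 11, 34, 41, 35,
    45, 31, 26, 42, 18, 28, 30, 22, 20, 33, 39, 40, 32,
]

CLASS_WIDTH = 10  # assumed: equal classes 0-10, 10-20, ...


def class_of(mark, width=CLASS_WIDTH):
    """Return the (lower, upper) limits of the class that holds `mark`.

    A mark equal to a shared limit goes to the higher class,
    so 10 is placed in 10-20 and not in 0-10.
    """
    lower = (mark // width) * width
    return (lower, lower + width)


# Count how many marks fall into each class.
frequency = Counter(class_of(m) for m in marks)

for lower, upper in sorted(frequency):
    print(f"{lower}-{upper}: {frequency[(lower, upper)]}")
```

Under this rule a mark of exactly 50 would fall into a class 50-60. No student in this data set scored 50, so the question of how to treat the top limit does not arise here.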
Consider a class, say 10-20, where 10 is the lower class limit and 20 is the upper
class limit. The difference between the upper and lower class limits is called the
class height, class size, or class width of the class interval.
Class Interval or Class Size
The class interval is very important when it comes to drawing histograms and
frequency diagrams. All the classes may have the same class size or they may have
different class sizes, depending on how you group your data. The class interval is
always a whole number.
Below is an example of grouped data where the classes have the same class
interval.
Age (years)    Frequency
0 – 9          12
10 – 19        30
20 – 29        18
30 – 39        12
40 – 49        9
50 – 59        6
60 – 69        0
Below is an example of grouped data where the classes have different class
intervals.
Age (years)    Frequency    Class Interval
0 – 9          15           10
10 – 19        18           10
20 – 29        17           10
30 – 49        35           20
50 – 79        20           30
Calculating Class Interval
Given a set of raw or ungrouped data, how would you group that data into suitable
classes that are easy to work with and at the same time meaningful?
The first step is to determine how many classes you want to have. Next, you
subtract the lowest value in the data set from the highest value in the data set
and then you divide by the number of classes that you want to have:
Class interval = (highest value - lowest value) / number of classes
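The calculation described above can be sketched in Python. The sample values below are made up purely for illustration and are not data from the lecture. The function names `class_width` and `class_limits` are likewise hypothetical. The sketch assumes whole-number data and rounds the width to the nearest whole number.

```python
def class_width(values, num_classes):
    """Range divided by the number of classes, rounded to a whole number."""
    raw_width = (max(values) - min(values)) / num_classes
    # int(x + 0.5) rounds halves up for positive x. Python's built-in round()
    # rounds halves to the nearest even number, so round(2.5) is 2.
    # max(1, ...) keeps the width from dropping below 1.
    return max(1, int(raw_width + 0.5))


def class_limits(values, num_classes):
    """Build (lower limit, upper limit) pairs starting at the lowest value."""
    width = class_width(values, num_classes)
    lowest = min(values)
    return [
        (lowest + i * width, lowest + (i + 1) * width - 1)
        for i in range(num_classes)
    ]


# Made-up whole-number sample (an assumption, not the lecture's raw data).
sample = [1, 4, 7, 12, 15, 18, 21, 25, 29]

width = class_width(sample, 10)    # (29 - 1) / 10 = 2.8, rounded to 3
limits = class_limits(sample, 10)  # starts at (1, 3) and ends at (28, 30)

print(width)
print(limits)
```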
Example 1:
Group the following raw data into ten classes.
Solution:
The first step is to identify the highest and lowest numbers in the data.
The class interval should always be a whole number, and yet in this case we have a
decimal number. The solution to this problem is to round off to the nearest whole
number. In this example, 2.8 gets rounded up to 3. So now our class width will be
3, meaning that we group the above data into classes of width 3, as in the table
below.
Number         Frequency
1 – 3          7
4 – 6          6
7 – 9          4
10 – 12        2
13 – 15        2
16 – 18        8
19 – 21        1
22 – 24        2
25 – 27        3
28 – 30        2
Class Limits and Class Boundaries
Class limits refer to the actual values that you see in the table. Taking the
table above as an example, 1 and 3 are the class limits of the first class. Class
limits are divided into two categories: lower class limit and upper class limit.
In the table above, for the first class, 1 is the lower class limit while 3 is the
upper class limit.
On the other hand, class boundaries are not always shown in the frequency table.
Class boundaries give the true class interval and, like class limits, are divided
into lower and upper class boundaries.
The relationship between the class boundaries and the class interval is given as
follows:
Class interval = upper class boundary - lower class boundary
For data recorded in whole numbers, as in the tables above, class boundaries are
related to class limits by the following relationships:
Upper class boundary = upper class limit + 0.5
Lower class boundary = lower class limit - 0.5
As a result of the above, the lower class boundary of one class is equal to the
upper class boundary of the previous class.
Class limits and class boundaries play separate roles when it comes to representing
statistical data diagrammatically as we shall see in a moment.
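As a rough illustration of how class boundaries follow from class limits, the Python sketch below uses the age table with unequal classes from earlier in this section. It assumes whole-number limits with a gap of 1 between consecutive classes, so each boundary lies 0.5 beyond the limit. The function names and the `gap` parameter are hypothetical, introduced only for this example.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class.

    `gap` is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1 for whole-number data).
    """
    half = gap / 2
    return (lower_limit - half, upper_limit + half)


def class_interval(lower_limit, upper_limit, gap=1):
    """Class interval = upper class boundary - lower class boundary."""
    lower_b, upper_b = class_boundaries(lower_limit, upper_limit, gap)
    return upper_b - lower_b


# Class limits from the age table with different class intervals.
limits = [(0, 9), (10, 19), (20, 29), (30, 49), (50, 79)]

boundaries = [class_boundaries(lo, hi) for lo, hi in limits]
intervals = [class_interval(lo, hi) for lo, hi in limits]

for (lo, hi), (lb, ub), size in zip(limits, boundaries, intervals):
    print(f"{lo}-{hi}: boundaries {lb} to {ub}, class interval {size}")
```

The computed class intervals match the Class Interval column of that table, and each lower boundary coincides with the upper boundary of the previous class.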
For example, consider the following grouping according to the size of farm plots
(in hectares): up to 3, 4-5, 6-10, 11-20, 21-50, 51-70, 71-100, 101-200, above
200. Both open and closed intervals appear here. Strictly speaking, the first and
last intervals are open and the rest are closed. In this grouping, the closed
intervals differ from each other in size. Let's look at another example. The
intervals of a point-based grading system are closed and equal in size, except for
the first interval: 0-50, 51-60, 61-70, 71-80, 81-90, 91-100.
Combined groups are formed according to two or more characteristics. In such
groupings, groups based on one characteristic are divided into subgroups based on
another characteristic.
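The distinction between open and closed intervals in the farm plot and grading examples above can be sketched in Python. The `Interval` type is a hypothetical helper written for this illustration. A missing limit is represented by `None`, and size is taken as the difference between the upper and lower limits. Only the grading intervals after the first one are included.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    lower: Optional[int]  # None means there is no lower limit
    upper: Optional[int]  # None means there is no upper limit

    def is_closed(self):
        """An interval is closed when it has both limits."""
        return self.lower is not None and self.upper is not None

    def size(self):
        """Upper limit minus lower limit, or None for an open interval."""
        if not self.is_closed():
            return None
        return self.upper - self.lower


# Farm plot sizes in hectares: "up to 3" and "above 200" are open.
farm = [
    Interval(None, 3), Interval(4, 5), Interval(6, 10), Interval(11, 20),
    Interval(21, 50), Interval(51, 70), Interval(71, 100),
    Interval(101, 200), Interval(200, None),
]

# Grading intervals after the first one.
grades = [
    Interval(51, 60), Interval(61, 70), Interval(71, 80),
    Interval(81, 90), Interval(91, 100),
]

farm_sizes = [i.size() for i in farm if i.is_closed()]
grade_sizes = [i.size() for i in grades]

print("farm intervals closed?", [i.is_closed() for i in farm])
print("farm sizes equal?", len(set(farm_sizes)) == 1)
print("grade sizes equal?", len(set(grade_sizes)) == 1)
```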