Lec 1 - Data, Tables and Graphs
Definition of Statistics
Statistics is a group of methods used to collect, analyze, present, and interpret data
and to make decisions.
Broadly Speaking
Different authors have defined statistics differently from time to time. The
reasons for a variety of definitions are primarily two.
First: In modern times the field of utility of statistics has widened considerably. In
ancient times statistics was confined only to the affairs of states but now it
embraces almost every sphere of human activities.
Types of Statistics
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.
Inferential Statistics consists of methods that use sample results to help make
decisions or predictions about a population.
The word “Statistics” seems to have been derived from the Latin word “status”, the
Italian word “statista”, or the German word “Statistik”, each of which means
“political state”. In ancient times, governments used to collect information
about the total population, land, wealth, total number of employees, soldiers, etc. to
get an idea of the manpower of the country for formulating the administrative set-up
and the fiscal, tax, levy and military policies of the government.
In modern times, Statistics is viewed not as a mere device for collecting
numerical data but as a means of developing sound techniques for handling and
analyzing them and for drawing valid inferences from them.
As such it is not confined to the affairs of the state but is constantly extending into
diverse spheres of life: social, economic and political. It now finds
wide application in almost all sciences, social as well as physical, such as biology,
psychology, education, business management etc. It is also applied in industry,
medical science, planning, accounting and auditing.
Limitations of Statistics
As a developing science, statistics and its techniques are widely used in every
branch of knowledge. Statistics is not a magical device, which gives solution to
problems. Educated and uneducated, rich and poor people are making use of
statistics in their day-to-day life but it has its own limitations.
Statistics does not deal with individual items.
Statistics deals with quantitative data only.
Statistics may lead to wrong conclusions in the absence of details.
Statistical laws are true only on average.
Statistics does not reveal the entire story.
Statistical data should be uniform and homogeneous.
Statistics is liable to be misused.
Ex: A small number of students drawn from all the students in a particular class
under study constitutes the sample.
Figure 1.1 Population and sample.
Data
The raw material of statistics is data. For our purposes we may define data as
numbers. A data collection consists of any number of related observations.
The two kinds of numbers that we use in statistics are numbers that result from
taking a measurement and numbers that result from the process of counting.
For example, when a nurse weighs a patient or takes a patient’s temperature, a
measurement, consisting of a number such as 73 kilograms or 99 degrees
Fahrenheit, is obtained. A different type of number is obtained when a hospital
administrator counts the number of patients (perhaps 25) on a given day.
Source of Data
Primary
Secondary
On-line
Basic Terms
An element or member of a sample or population is a specific subject or object
(for example, a person, firm, item, state, or country) about which the information
is collected.
Variable
Types of variables
Quantitative Variables
Discrete Variables
Continuous Variables
Qualitative or Categorical Variables
A variable that can be measured numerically is called a quantitative variable.
The data collected on a quantitative variable are called quantitative data.
Levels of Measurement
Variables can be classified on the basis of their level of measurement. The way we
classify variables greatly affects how we can use them in our analysis. The
classifications are:
Nominal
Ordinal
Interval
Ratio
Nominal level
A nominal measurement is created when names are used to establish categories
into which observations can be exclusively recorded.
For example, soft drinks may be classified as Coke, Pepsi, 7-Up. It is important
to remember that a nominal measurement carries no indication of order of
preference. An example is given in Table 1.2.
Ordinal level
Unlike a nominal measurement, an ordinal scale produces a distinct ordering or
arrangement of the data, i.e. the observations are ranked on the basis of some
criterion. Table 1.3, which lists the ratings of the company commander by the nurses
under her command, is an illustration of the ordinal level of measurement.
Interval level
It includes all the characteristics of the ordinal scale, but in addition, the distance
between values is a constant size. Temperature on the Fahrenheit scale is an
example.
Ratio level
This level has all the characteristics of the interval level: the distances between
numbers are of a known, constant size. In addition, it has a true zero point, so
ratios between values are meaningful.
For example, we can compare 30 units of sales made by Rahim to 90 units of
sales made by Karim, set up the ratio 90:30 and say that Karim sold three times as
much as Rahim.
Classification & Tabulation
Data recorded in the sequence in which they are collected and before they are
processed or ranked are called raw data.
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Frequency Distribution
A frequency distribution is a classification of the observations according to the
number of them possessing the same value of the variable. It is simply a table in
which the data are grouped into classes and the number of cases that fall in each
class is recorded.
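As a rough illustration, the raw data listed above can be tallied into an ungrouped frequency distribution. This is only a sketch; the variable names `raw_data` and `frequency` are introduced here for illustration and are not part of the lecture.

```python
from collections import Counter

# Raw data copied from the listing above (50 observations).
raw_data = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Count how many times each distinct value occurs.
frequency = Counter(raw_data)

# Print the distribution in increasing order of the value.
print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")

print("Total:", sum(frequency.values()))
```

Each row of the printed table pairs a value with the number of observations that share it, which is the sense of "frequency" used in the definition above.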
Frequency distribution of qualitative data
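Type of Employment              Number of Students
Private companies/businesses    44
Federal government              16
State/local government          23
Own business                    17
                                Sum = 100
Here the type of employment is the variable, each row is a category, and the
number of students in a category is its frequency. The second column is the
frequency column.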
Example 2-1
Somewhat None Somewhat Very Very None
Very Somewhat Somewhat Very Somewhat Somewhat
Very Somewhat None Very None Somewhat
Somewhat Very Somewhat Somewhat Very None
Somewhat Very Very Somewhat None Somewhat
Solution 2-1
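One way to work through this example is to tally the 30 responses and derive relative frequencies and percentages from the counts. The sketch below assumes the spellings in the data are normalized to "Very", "Somewhat" and "None"; the names `responses`, `counts` and `rows` are illustrative only.

```python
from collections import Counter

# The 30 responses of Example 2-1, row by row, with spelling normalized.
responses = [
    "Somewhat", "None", "Somewhat", "Very", "Very", "None",
    "Very", "Somewhat", "Somewhat", "Very", "Somewhat", "Somewhat",
    "Very", "Somewhat", "None", "Very", "None", "Somewhat",
    "Somewhat", "Very", "Somewhat", "Somewhat", "Very", "None",
    "Somewhat", "Very", "Very", "Somewhat", "None", "Somewhat",
]

counts = Counter(responses)
total = len(responses)

# Build one row per category: frequency, relative frequency, percentage.
rows = []
for category in ["Very", "Somewhat", "None"]:
    freq = counts[category]
    relative = freq / total
    rows.append((category, freq, relative, 100 * relative))

print(f"{'Category':<10} {'f':>3} {'Rel. freq.':>11} {'Percent':>8}")
for category, freq, relative, percent in rows:
    print(f"{category:<10} {freq:>3} {relative:>11.3f} {percent:>8.1f}")
print(f"{'Sum':<10} {total:>3}")
```

The relative frequency of a category is its frequency divided by the total number of responses, and the percentage is the relative frequency multiplied by 100.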
Figure 2.1 Bar graph for the frequency distribution of Table 2.3 (horizontal axis:
stress on job, with categories Very, Somewhat, None; vertical axis: frequency).
Pie Chart
A circle divided into portions that represent the relative frequencies or percentages
of a population or a sample belonging to different categories is called a pie chart.
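To make the definition concrete, the size of each portion can be computed from the category frequencies: the percentage is the relative frequency times 100, and the central angle is the relative frequency times 360 degrees. The sketch below uses the frequencies 10, 14 and 6 obtained by tallying the responses of Example 2-1; the dictionary names are illustrative assumptions.

```python
# Frequencies tallied from the 30 responses of Example 2-1.
frequencies = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequencies.values())

# For each category store (percentage, central angle in degrees).
slices = {}
for category, freq in frequencies.items():
    relative = freq / total
    slices[category] = (100 * relative, 360 * relative)

for category, (percent, angle) in slices.items():
    print(f"{category:<10} {percent:6.1f}%  {angle:6.1f} degrees")
```

The angles of all portions add up to 360 degrees, just as the percentages add up to 100.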
Figure 2.2 Pie chart for the percentage distribution of Table 2.4 (Very 33.3%,
Somewhat 46.7%, None 20%).
Frequency Distributions
Table 2.7 Weekly Earnings of 100 Employees of a Company
Weekly Earnings (dollars)    Number of Employees (f)
401 to 600                   9
601 to 800                   22
801 to 1000                  39
1001 to 1200                 15
1201 to 1400                 9
1401 to 1600                 6
Weekly earnings is the variable and the number of employees is the frequency
column. The third class is 801 to 1000 and its frequency is 39. The lower limit of
the sixth class is 1401 and its upper limit is 1600.
Class-limits
The class limits are the smallest (lowest) and the largest (highest) values that
can belong to the class.
For example, in the class 1401-1600, the lowest value is 1401 and the highest
value is 1600. These two values are known as the lower limit and the upper limit
of the class. Class limits should not be confused with class boundaries: a class
boundary is the midpoint between the upper limit of one class and the lower limit
of the next class, for example 1400.5 between the classes 1201-1400 and 1401-1600.
The frequency of a particular class is called the class frequency.
Class-Intervals
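The difference between the lower boundary and the upper boundary of a class is
known as the class interval (or class width). It equals the difference between the
lower limits of two consecutive classes.
For example, the class 1401-1600 has boundaries 1400.5 and 1600.5, so the
class interval is 200.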
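The class interval can be approximated as
$i = \dfrac{l - s}{k}$
where i is the class interval, l is the largest value, s is the smallest value, and k is
the number of classes. The number of classes can be chosen using Sturges' rule:
$k = 1 + 3.322 \log_{10} N$
where k is the number of classes and N is the total number of observations.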
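The rule for the number of classes given above can be evaluated numerically as in the following sketch. It assumes a base-10 logarithm (which is what the constant 3.322 implies) and rounds the result up to a whole number of classes; the rounding direction and the function name `sturges_classes` are assumptions made for illustration.

```python
import math


def sturges_classes(n):
    """Return 1 + 3.322 * log10(n), the unrounded number of classes."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.322 * math.log10(n)


# Evaluate the rule for a few sample sizes and round up to whole classes.
for n in (30, 50, 100):
    k = sturges_classes(n)
    print(f"N = {n:>3}: K = {k:.2f}, rounded up to {math.ceil(k)} classes")
```

The rule is only a guide; in practice the number of classes is often adjusted so that the class limits come out as convenient numbers.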
Class Midpoint
$\text{Class midpoint} = \dfrac{\text{Lower limit} + \text{Upper limit}}{2}$
Table 2.8 Class Boundaries, Class Widths, and Class
Midpoints for Table 2.7
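The quantities named in this caption can be computed from the class limits of Table 2.7. The sketch below takes each class boundary as the midpoint between the upper limit of one class and the lower limit of the next, the width as the difference between a class's two boundaries, and the midpoint as the average of its two limits. The variable names are illustrative.

```python
# Class limits (lower, upper) from Table 2.7, in dollars.
classes = [
    (401, 600), (601, 800), (801, 1000),
    (1001, 1200), (1201, 1400), (1401, 1600),
]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

rows = []
for lower, upper in classes:
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower + upper) / 2
    rows.append((lower_boundary, upper_boundary, width, midpoint))

print("Limits        Boundaries          Width  Midpoint")
for (lower, upper), (lb, ub, width, mid) in zip(classes, rows):
    print(f"{lower:>4}-{upper:<6}  {lb:>7.1f}-{ub:<8.1f}  {width:>5.0f}  {mid:>8.1f}")
```

For the first class, for instance, the boundaries are 400.5 and 600.5, the width is 200 and the midpoint is 500.5.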
Example:
Table 2.9 gives the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2002 season. Construct a frequency distribution
table.
Solutions
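$\text{Approximate width of each class} = \dfrac{230 - 124}{5} = 21.2$
Here 230 and 124 are the largest and smallest values and 5 is the number of
classes. Rounding 21.2 up to a convenient whole number gives a class width of 22.
The lower limit of the first class can be taken as 124 or any number less than
124. Suppose we take 124 as the lower limit of the first class. Then our classes will
be 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233.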
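The construction of the classes can be sketched in code. The helper below, `build_classes`, is a hypothetical function written for this illustration: it assumes whole-number data, rounds the approximate width up to the next integer, and starts the first class at the smallest value.

```python
import math


def build_classes(smallest, largest, k):
    """Return (approximate width, rounded width, list of (lower, upper) limits)."""
    approx_width = (largest - smallest) / k
    width = math.ceil(approx_width)  # round up to a whole-number width
    classes = []
    lower = smallest
    for _ in range(k):
        upper = lower + width - 1  # whole-number data: next class starts at upper + 1
        classes.append((lower, upper))
        lower = upper + 1
    return approx_width, width, classes


approx_width, width, classes = build_classes(124, 230, 5)
print(f"Approximate width: {approx_width:.1f}, width used: {width}")
for lower, upper in classes:
    print(f"{lower} - {upper}")
```

Because the width is rounded up, the last class extends slightly beyond the largest observed value (233 rather than 230), which guarantees that every observation falls in some class.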
A histogram is a graph in which classes are marked on the horizontal axis and
the frequencies are marked on the vertical axis. The frequencies are represented by
the heights of the bars. In a histogram, the bars are drawn adjacent to each other.
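As a rough text-only illustration, the frequencies of Table 2.7 can be drawn as rows of `#` characters. Note that this sketch is rotated relative to the definition above: the classes run down the page and the bar length, rather than the bar height, represents the frequency. The function name `text_histogram` is illustrative.

```python
# Classes and frequencies from Table 2.7.
table_2_7 = [
    ("401-600", 9), ("601-800", 22), ("801-1000", 39),
    ("1001-1200", 15), ("1201-1400", 9), ("1401-1600", 6),
]


def text_histogram(table):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for label, freq in table:
        lines.append(f"{label:<10}| {'#' * freq} ({freq})")
    return lines


lines = text_histogram(table_2_7)
for line in lines:
    print(line)
```

The bars are printed on consecutive lines with no gaps between them, mirroring the adjacent bars of a histogram.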
Figure 2.3 Frequency histogram for Table 2.10 (horizontal axis: total home runs,
with classes 124-145, 146-167, 168-189, 190-211, 212-233; vertical axis: frequency).
A cumulative frequency distribution gives the total number of values that fall
below the upper boundary of each class.
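For instance, the cumulative frequencies for Table 2.7 can be obtained by keeping a running total of the class frequencies. This is a minimal sketch; the upper boundaries are taken as the upper limit plus 0.5, and the variable names are illustrative.

```python
from itertools import accumulate

# Upper limits and frequencies of the six classes in Table 2.7.
upper_limits = [600, 800, 1000, 1200, 1400, 1600]
frequencies = [9, 22, 39, 15, 9, 6]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))

for upper, cum in zip(upper_limits, cumulative):
    print(f"Earnings below {upper + 0.5:>7.1f}: {cum:>3} employees")
```

The last cumulative frequency always equals the total number of observations, here 100.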
Example 2-7
Solution 2-7
Exercises:
2. The following are the sales (in thousand dollars) in a year of 50 companies.
30 55 27 45 56 48 45 49 32 57
37 55 52 34 54 42 32 59 35 46
32 26 40 28 53 54 29 42 42 54
39 56 59 58 49 53 30 53 21 34
52 57 43 46 54 31 22 31 24 24
From these data construct (i) a frequency distribution, (ii) a cumulative frequency
distribution and (iii) a relative frequency distribution. Draw an appropriate diagram.
3. Construct a frequency distribution for the following data (exports in thousand dollars) by
using a suitable class interval:
65 64 99 55 64 89 87 65 62 38
99 68 95 86 57 53 47 50 55 81
80 98 51 36 63 66 85 79 83 70
67 70 60 69 78 39 75 56 71 51
35 42 60 71 65 55 41 39 45 76
99 68 95 86 57 53 47 50 55 81
Also calculate the cumulative and relative frequencies.