06-09-2015
Classification, Collection &
Presentation of Data
RK Jana
Content
What is Statistics?
Classification of variables
Classification of data
Collection of data
Presentation of data
Data
Data are known or assumed facts and figures from
which conclusions can be drawn.
Data are used to make decisions, to support decisions already
made, to explain why certain events happened, and to make
predictions about future events.
Data are numbers with a context.
What is Statistics?
Statistics
In the singular sense: refers to the discipline concerned with
collecting, organizing, summarizing and analyzing data and
drawing conclusions from them.
In the plural sense: refers to data.
Branches of Statistics
Descriptive Statistics
Deals with the collection, organization and presentation of
data, and with numerical and graphical methods for finding
patterns in a data set.
Inferential Statistics
Deals with ways of making predictions or drawing
conclusions about population characteristics based on
collected data.
Some Basic Characteristics
Units (individuals)
Population
Variable
Observations
Data set
Univariate
Bivariate/Multivariate
Raw data
Classification of Variables
Classification of data depends on the type of
variable we measure.
Variable
Qualitative: Ordinal, Non-ordinal
Quantitative: Discrete, Continuous
Continued
Qualitative Variable: A variable that cannot be measured
numerically but is described by a quality or attribute is known
as a qualitative variable. Example: hair colour, cloudiness of
sky, quality of students etc.
Ordinal Variable: A qualitative variable whose values can be
arranged in a natural order is known as an ordinal variable.
Example: income group, effectiveness of medicine etc.
Non-ordinal Variable: A qualitative variable whose values cannot
be arranged in a natural order is known as a non-ordinal variable.
Example: blood group, religion, marital status etc.
Continued
Quantitative Variable: A variable that can be measured
numerically is known as a quantitative variable. Example:
temperature, height, house rent etc.
Discrete Variable: A quantitative variable that can assume only
a finite or countable number of distinct values is known as a
discrete variable. Example: number of customers visiting a
bank in a day, number of telephone calls received in a given
time interval etc.
Continuous Variable: A quantitative variable that can
assume any numerical value within an interval is known as a
continuous variable. Example: daily temperature, IQ of a
student, time taken to finish a job etc.
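The classification above can be illustrated with a small sketch. The example values and the `classify` helper are hypothetical and not part of the original slides; the rule used (text values are qualitative, an explicit category order marks an ordinal variable, whole-number counts are discrete) is a simplifying assumption.

```python
from enum import Enum


class VariableType(Enum):
    ORDINAL = "qualitative, ordinal"
    NON_ORDINAL = "qualitative, non-ordinal"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


def classify(values, category_order=None):
    """Classify one variable from its observed values (simplified rule).

    values: list of observations for the variable.
    category_order: for qualitative data, the natural order of the
        categories if one exists, otherwise None.
    """
    if all(isinstance(v, str) for v in values):
        return VariableType.ORDINAL if category_order else VariableType.NON_ORDINAL
    if all(isinstance(v, int) for v in values):
        return VariableType.DISCRETE
    return VariableType.CONTINUOUS


# Hypothetical observations for four variables.
income_group = ["low", "high", "middle", "low"]
blood_group = ["A", "O", "AB", "B"]
calls_per_hour = [3, 0, 7, 2]
temperature_c = [31.4, 29.8, 30.1]

order = ["low", "middle", "high"]
print(classify(income_group, category_order=order))
print(classify(blood_group))
print(classify(calls_per_hour))
print(classify(temperature_c))

# An ordinal variable can be sorted by its natural order.
print(sorted(income_group, key=order.index))
```

Only the ordinal variable can be meaningfully sorted; sorting blood groups alphabetically would impose an order that carries no meaning.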
Classification of Data
Data
Qualitative: Ordinal, Non-ordinal
Quantitative: Discrete, Continuous
An Alternative Classification of Data
Data
Frequency data
Non-frequency data: Time series data, Cross-sectional data
Continued
Time series data: Data collected on the same unit for the
same variable over different time periods are known as time
series data.
Example: Rice yield recorded for the last ten years, average price per
square foot of houses in Kolkata between 1990 and 2000, exports of your
company in the last five years etc.
Cross-sectional data: Data collected on different units for
the same time period are called cross-sectional data.
Example: Present price of ten cars of 2005 models.
Spatial data: Cross-sectional data that relate to different
geographic locations are known as spatial data.
Example: Total population of different states of India as per the 1991
census.
Data Collection
Data Source
Primary source
Secondary source
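The difference between time series and cross-sectional data can be seen by slicing one set of records in two ways. This is a minimal sketch; the farm names and yield figures are hypothetical, and `time_series` and `cross_section` are illustrative helper names.

```python
# Hypothetical records: (unit, year, rice yield in tonnes per hectare).
records = [
    ("Farm A", 2013, 2.9), ("Farm A", 2014, 3.1), ("Farm A", 2015, 3.0),
    ("Farm B", 2013, 2.4), ("Farm B", 2014, 2.6), ("Farm B", 2015, 2.8),
    ("Farm C", 2013, 3.3), ("Farm C", 2014, 3.2), ("Farm C", 2015, 3.5),
]


def time_series(records, unit):
    """Same unit, same variable, different time periods."""
    return [(year, value) for u, year, value in records if u == unit]


def cross_section(records, year):
    """Different units, same variable, same time period."""
    return [(u, value) for u, y, value in records if y == year]


print(time_series(records, "Farm A"))
print(cross_section(records, 2014))
```

If the units were states or districts rather than farms, the cross-section would also be spatial data.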
Continued
Method of collecting primary data
Direct observation method
Designed experiment method
Survey method
Surveys
Surveys can be carried out using a variety of methods,
for example telephone, mail questionnaires, personal
interviews, surveying records and direct observation.
To obtain samples that are unbiased, statisticians use the
following methods of sampling:
Random samples are selected by using chance methods or
random numbers.
Systematic samples are obtained by numbering each unit in
the population and then selecting every kth unit.
Stratified samples are selected by dividing the population into
groups (strata) according to some characteristic and then
taking samples from each group.
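The random, systematic and stratified methods described above can be sketched with Python's standard `random` module. The population of numbered households, the urban/rural split and the function names are hypothetical; the random starting point in the systematic sample is a common convention assumed here, not something the slides specify.

```python
import random


def simple_random_sample(population, n, rng):
    """Select n units purely by chance."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random start among the first k units, then every kth unit."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group units into strata, then sample randomly within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical population: 20 numbered households,
# odd numbers urban and even numbers rural.
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the sketch is repeatable


def area(unit):
    return "urban" if unit % 2 else "rural"


print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 4, rng))
print(stratified_sample(population, area, 3, rng))
```

The stratified sample always contains three urban and three rural households, whereas a simple random sample of six could by chance over-represent one group.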
Cluster samples are selected by dividing the population into
groups (clusters) and then taking a sample of whole groups.
Convenience samples use subjects that can be conveniently
polled or tested. They are typically used in student projects
and by journalists, but are not suitable for pollsters or medical
research.
Data Presentation
Textual Presentation
Tabular Presentation
Diagrammatic Presentation
Numerical Description
Textual Presentation
In this method the data are presented within a piece of
text that is brief and precise and follows a logical sequence.
This is not a very good method, as it is not effective for a large
mass of data. Data presented through this method do not
lend themselves directly to statistical analysis.
Tabular Presentation
In this method data are presented in the form of a
table comprising a number of rows and columns.
This method is very effective.
Components of a Table
Title
Caption (column headings)
Stub (row headings)
Body
Source & footnote
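As a minimal sketch of tabular presentation, the snippet below turns raw data into a frequency table that carries the components listed above. The donor data, the table title and the `frequency_table` helper are hypothetical; the layout is one possible choice, not a prescribed format.

```python
from collections import Counter

# Hypothetical raw data: blood groups of 12 donors.
raw = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A", "O", "B"]


def frequency_table(values, title, captions, source):
    """Return a plain-text table with title, caption, stub, body and source."""
    counts = Counter(values)
    total = sum(counts.values())
    # Title, then the caption row (column headings).
    lines = [title, f"{captions[0]:<12}{captions[1]:>10}{captions[2]:>10}"]
    for category in sorted(counts):      # stub: the row labels
        freq = counts[category]          # body: the figures
        lines.append(f"{category:<12}{freq:>10}{100 * freq / total:>10.1f}")
    lines.append(f"{'Total':<12}{total:>10}{100.0:>10.1f}")
    lines.append(f"Source: {source}")
    return "\n".join(lines)


print(frequency_table(
    raw,
    title="Table 1: Distribution of donors by blood group",
    captions=("Blood group", "Frequency", "Percent"),
    source="hypothetical data for illustration",
))
```

Unlike the same figures written out in a sentence, the table can be read row by row and used directly for further analysis.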
Diagrammatic Presentation
In this method data are presented using different
types of diagrams and charts.
Bar chart
Line chart
Area chart
Pie chart
Power of Charts
Give the reader a compact and structured synthesis.
Many details can be shown in a small area.
Give an immediate depiction of the differences and
patterns in a set of data.
The reader can immediately see major similarities and
differences.
Bar Chart
Bar charts compare the values of different items in
specific categories or at discrete points in time, e.g.
survival rates for boys and girls, compared
across grade levels and/or between urban and
rural areas.
Simple to create and easy to interpret.
Used to illustrate variable values that are distinct (i.e.
qualitative variables).
Example
Vertical Bar Chart
Horizontal Bar Chart
Normally, we use a horizontal bar chart when there are
variable values with long names, or
many variables.
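To illustrate why horizontal bars suit long category names, here is a minimal text-only sketch. The survival rates are hypothetical, and `horizontal_bar_chart` is an illustrative helper rather than a function from any charting library.

```python
def horizontal_bar_chart(data, width=40, symbol="#"):
    """Render {category: value} as a text horizontal bar chart.

    Bars are scaled so that the largest value spans `width` symbols.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical survival rates (%) to the last primary grade.
survival = {
    "Urban boys": 80,
    "Urban girls": 84,
    "Rural boys": 60,
    "Rural girls": 63,
}
print(horizontal_bar_chart(survival))
```

Each label sits on its own row, so names of any length stay readable; in a vertical chart the same labels would have to be rotated or abbreviated.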
Line Graph
Line graphs show the progression of values over time, e.g. the
number of schools in operation over time, or gross and net
admission rates for boys and girls over time.
Easier for the eye to follow curves for different series.
Give a clearer picture of the development over time.
Good for answering the following questions:
In what periods were the changes large?
When were the turning points?
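The two questions a line graph answers can also be computed directly from the series. This is a minimal sketch; the school counts are hypothetical, and `largest_changes` and `turning_points` are illustrative helper names.

```python
def largest_changes(series, top=2):
    """Periods with the largest absolute change from the previous period."""
    changes = [
        (later_period, later_value - earlier_value)
        for (_, earlier_value), (later_period, later_value)
        in zip(series, series[1:])
    ]
    return sorted(changes, key=lambda c: abs(c[1]), reverse=True)[:top]


def turning_points(series):
    """Periods where the series switches between rising and falling."""
    points = []
    for (_, a), (period, b), (_, c) in zip(series, series[1:], series[2:]):
        if (b - a) * (c - b) < 0:  # direction changes at this period
            points.append(period)
    return points


# Hypothetical number of schools in operation, by year.
schools = [(2008, 120), (2009, 132), (2010, 150), (2011, 142),
           (2012, 138), (2013, 160), (2014, 165)]

print(largest_changes(schools))   # [(2013, 22), (2010, 18)]
print(turning_points(schools))    # [2010, 2012]
```

On a plotted line these appear as the steepest segments and as the peaks and troughs of the curve.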
Area Graphs
Area graphs show the actual value each series
contributes to the total.
Best for showing patterns over time, e.g. how total
enrolment changed over time due to enrolment
changes in urban and rural schools respectively, or how
the total number of school-age children, consisting of those
in and those out of school, grew over time.
Good for illustrating situations with only a few parts
that have simple development patterns.
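An area graph stacks the series on top of one another, so each series occupies a band whose height is its contribution to the total. The sketch below computes those bands; the enrolment figures are hypothetical and `stack_series` is an illustrative helper.

```python
def stack_series(series_by_name):
    """Return the (lower, upper) band of each series per period, plus totals.

    Series are stacked in dictionary order; the upper edge of the last
    band is the total for that period.
    """
    names = list(series_by_name)
    n_periods = len(series_by_name[names[0]])
    baseline = [0] * n_periods
    bands = {}
    for name in names:
        upper = [b + v for b, v in zip(baseline, series_by_name[name])]
        bands[name] = list(zip(baseline, upper))
        baseline = upper
    return bands, baseline  # baseline now holds the totals


# Hypothetical enrolment (thousands) in three successive years.
enrolment = {"Urban": [40, 45, 52], "Rural": [60, 62, 61]}
bands, totals = stack_series(enrolment)
print(bands["Rural"])   # [(40, 100), (45, 107), (52, 113)]
print(totals)           # [100, 107, 113]
```

Here the total rises every year even though rural enrolment dips in the last year, because urban enrolment grows faster.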
Pie Chart
Suitable for illustrating percentage distributions of
qualitative variables, e.g. the breakdown of the
annual education budget into categories of
expenditure such as teacher salaries, school
construction, etc.
Displays the contribution of each value to a total.
Best suited for overviews.
Should not have too many sectors.
Example
Source: GMR 2002
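The size of each pie sector follows from simple arithmetic: a category's share of the total gives its percentage, and the same share of 360 degrees gives its angle. The budget figures below are hypothetical and `pie_sectors` is an illustrative helper.

```python
def pie_sectors(amounts):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(amounts.values())
    return {
        category: (100 * value / total, 360 * value / total)
        for category, value in amounts.items()
    }


# Hypothetical annual education budget, in millions.
budget = {
    "Teacher salaries": 120,
    "School construction": 40,
    "Learning materials": 24,
    "Administration": 16,
}
for category, (percent, angle) in pie_sectors(budget).items():
    print(f"{category:<20}{percent:6.1f}%{angle:8.1f} degrees")
```

With many small categories the angles become too narrow to compare by eye, which is why a pie chart should not have too many sectors.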
Before Preparing Charts
Who is the target audience?
What is their level of understanding?
What are their interests?
Role of charts in conveying your message
What is my question before making this graph?
Trends
Contrast
Achievement, way forward
Absolute, relative
Magnitude, percentage
How will the charts be presented?
In color or B&W
In a publication, or as a presentation using an overhead projector
What chart is the best?
Bar, Pie, Maps?
Compare various styles
After Making Charts
Is it easy to understand?
Too fancy, too dull, too much, too little
Does this give the message that I would like to convey?
Can this chart be misinterpreted?
Am I giving the wrong message?
Is it self-contained?
Title
Legend
Axis title
Scale
Sources
Other relevant information
Is the chart in the right place?