Chapter 1 and 2
1. INTRODUCTION
1.1 Definition and classifications of statistics
1.1.1 Definition:
We can define statistics in two ways.
1. Plural sense (layman's definition).
It is an aggregate or collection of numerical facts.
Not every aggregate of numbers can be called statistical data. We say that an aggregate of numbers
is statistical data when the numbers are
❖ Comparable
❖ Measurable
❖ Collected for a well-defined objective.
2. Singular sense (formal definition)
Statistics is defined as the science of collecting, organizing, presenting, analyzing and interpreting
numerical data for the purpose of assisting in making a more effective decision.
Statistics is a subject that deals with numbers and figures describing a certain situation. It primarily
deals with numerical data obtained by survey and summarizes these data in such a way that the
summary gives a good indication about the nature of the data.
1.1.2 Classifications:
Depending on how data can be used, statistics is sometimes divided into two main areas or
branches.
1. Descriptive Statistics: is concerned with summary calculations, graphs, charts and tables.
❖ It consists of collection, organization, summarization and presentation of data.
❖ It deals with describing data without attempting to infer anything that goes beyond
the given set of data.
2. Inferential Statistics: is a method used to generalize from a sample to a population. For
example, the average income of all families (the population) in Ethiopia can be estimated
from figures obtained from a few hundred (the sample) families.
❖ It is important because statistical data usually arise from samples.
❖ Statistical techniques based on probability theory are required.
For example,
1. The average age of students in Dilla University is 20.5 years.
2. The average income of all families (the population) in Ethiopia can be estimated from
figures obtained from a few hundred (the sample) families.
3. There is a relationship between smoking tobacco and an increased risk of developing
cancer.
1.2 Stages in Statistical Investigation
There are five stages or steps in any statistical investigation.
1. Collection of data: the process of measuring, gathering and assembling the raw data upon
which the statistical investigation is to be based.
Data can be collected in a variety of ways; one of the most common methods is through
the use of surveys. Surveys can also be conducted in different ways; three of the most
common methods are
❖ Telephone survey
❖ Mailed questionnaire
❖ Personal interview
Exercise: discuss the advantage and disadvantage of the above three methods with
respect to each other.
2. Organization of data: Summarization of data in some meaningful way, e.g. in table
form.
3. Presentation of the data: The process of re-organization, classification, compilation,
and summarization of data to present it in a meaningful form.
4. Analysis of data: The process of extracting relevant information from the summarized
data, mainly through the use of elementary mathematical operations.
5. Inference of data: The interpretation of the various statistical measures obtained
through the analysis of the data, using those methods by which
conclusions are formed and inferences are made.
❖ Statistical techniques based on probability theory are required
1.3 Definitions of some terms
A. Statistical Population: It is the collection of all possible observations of a specified
characteristic of interest (possessing a certain common property) that is under study. An example
is all of the students in the DU4101 course this term.
B. Sample: It is a subset of the population, selected using some sampling technique in such a
way that it represents the population.
C. Sampling: The process or method of sample selection from the population.
D. Sample size: The number of elements or observations to be included in the sample.
E. Census: Complete enumeration or observation of the elements of the population, that is, the
collection of data from every element in a population.
F. Parameter: Characteristic or measure obtained from a population.
G. Statistic: Characteristic or measure obtained from a sample.
H. Variable: It is an item of interest that can take on many different numerical values.
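The terms above can be illustrated with a short Python sketch. The population here is hypothetical (1,000 made-up student ages generated at random), and the sample size of 50 is an arbitrary choice; the point is only to show a parameter computed from the whole population next to a statistic computed from a sample.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 students (made-up values).
random.seed(42)
population = [random.randint(18, 30) for _ in range(1000)]

# Parameter: a measure obtained from the whole population (a census).
population_mean = statistics.mean(population)

# Sampling: select a simple random sample without replacement.
sample_size = 50
sample = random.sample(population, sample_size)

# Statistic: the same measure obtained from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; inferential statistics deals with how far apart the two are likely to be.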
1.4 Applications, Uses and Limitations of statistics
1.4.1 Applications of statistics:
❖ In almost all fields of human endeavor.
❖ Almost all human beings in their daily life are subjected to obtaining numerical
facts, e.g. about prices.
❖ Applicable in some processes, e.g. invention of certain drugs, extent of environmental
pollution.
❖ In industries, especially in the quality control area.
1.4.2 Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The following
are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishes (supplies) a technique of comparison.
5. Estimating unknown population characteristics.
6. Testing and formulating hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
1.4.3 Limitations of statistics
As a science statistics has its own limitations. The following are some of the limitations:
❖ Deals with only quantitative information.
❖ Deals with only aggregates of facts and not with individual data items.
❖ Statistical data are only approximately, not mathematically, correct.
❖ Statistics can be easily misused and therefore should be used by experts.
1.5 Types of Variables and Measurement scales
1.5.1 Types of variables
A variable is a characteristic of an object that can have different possible values. There are two
types of variables.
1. Qualitative Variables are nonnumeric variables and can't be measured. Examples
include gender, religious affiliation, color, beauty and state of birth.
Qualitative variables are also called categorical variables.
2. Quantitative Variables are numerical variables and can be measured.
Examples: balance in checking account, number of children in family, height, income,
temperature etc.
Quantitative variables can be further classified as:
❖ Discrete variables and
❖ Continuous variables
I. Discrete variables are variables whose values are counts,
e.g. number of students, family size (number of household members), number of pages of a book.
II. Continuous variables are variables that can have any value within an interval,
e.g. weight, volume, length, etc.
1.5.2 Measurement scales
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify
and apply the proper statistical method for their analysis and inferences. Measurement scale refers
to the property of value assigned to the data based on the properties of order, distance and fixed
zero.
In mathematical terms measurement is a functional mapping from the set of objects {Oi} to the
set of real numbers {M(Oi)}.
The goal of measurement systems is to structure the rule for assigning numbers to objects in such
a way that the relationship between the objects is preserved in the numbers assigned to the objects.
The different kinds of relationships preserved are called properties of the measurement system.
Order
The property of order exists when an object that has more of the attribute than another object is
given a bigger number by the rule system. This relationship must hold for all objects in the "real
world".
The property of ORDER exists when, for all i, j:
if Oi > Oj, then M(Oi) > M(Oj).
Distance
The property of distance is concerned with the relationship of differences between objects. If a
measurement system possesses the property of distance, it means that the unit of measurement
means the same thing throughout the scale of numbers. That is, an inch is an inch, no matter where
it falls, whether immediately ahead or a mile down the road.
More precisely, an equal difference between two numbers reflects an equal difference in the "real
world" between the objects that were assigned the numbers. In order to define the property of
distance in the mathematical notation, four objects are required: Oi, Oj, Ok, and Ol . The difference
between objects is represented by the "-" sign; Oi - Oj refers to the actual "real world" difference
between object i and object j, while M(Oi) -M(Oj) refers to differences between numbers.
The property of DISTANCE exists, for all i, j, k, l
If Oi-Oj ≥ Ok- Ol then M(Oi)-M(Oj) ≥ M(Ok)-M(Ol ).
Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of the
attribute in question is assigned the number zero by the system of rules. The object does not need
to really exist in the "real world", as it is somewhat difficult to visualize a "man with no height".
The requirement for a rational zero is this: if objects with none of the attribute did exist, would they
be given the value zero? Defining O0 as the object with none of the attribute in question, the
definition of a rational zero becomes:
The property of FIXED ZERO exists if M(O0) = 0.
The property of fixed zero is necessary for ratios between numbers to be meaningful.
SCALE TYPES
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio, and each
possesses different properties of measurement systems.
Nominal Scales
Nominal scales are measurement systems that possess none of the three
properties stated above.
❖ Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
❖ No arithmetic and relational operation can be applied.
Examples:
❖ Political party preference (Republican, Democrat, or Other)
❖ Sex (Male or Female)
❖ Marital status (married, single, widowed, divorced)
❖ Country code
❖ Regional differentiation of Ethiopia.
Ordinal Scales
Ordinal Scales are measurement systems that possess the property of order, but not the property
of distance. The property of fixed zero is not important if the property of distance is not satisfied.
❖ Level of measurement which classifies data into categories that can be ranked. Differences
between the ranks are not meaningful.
❖ Arithmetic operations are not applicable but relational operations are
applicable.
❖ Ordering is the sole property of ordinal scale.
Examples:
❖ Letter grades (A, B, C, D, F).
❖ Rating scales (Excellent, very good, Good, Fair, poor).
❖ Military status
❖ Ranks in race, etc.
Interval Scales
❖ Interval scales are measurement systems that possess the properties of Order and
distance, but not the property of fixed zero.
❖ Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
❖ Interval scale data convey better information than ordinal and nominal scale data.
❖ All arithmetic operations except division and multiplication are applicable.
❖ Relational operations are also possible.
Examples:
❖ IQ
❖ Temperature in °F.
Ratio Scales
❖ Ratio scales are measurement systems that possess all three properties: order, distance, and
fixed zero. The added power of a fixed zero allows ratios of numbers to be meaningfully
interpreted; e.g. the ratio of Bekele's height to Martha's height is 1.32, whereas this is not
possible with interval scales.
❖ Level of measurement which classifies data that can be ranked, differences are
meaningful, and there is a true zero. True ratios exist between the different units of
measure.
❖ All arithmetic and relational operations are applicable.
❖ This measurement scale provides better information than interval scale of measurement.
❖ Zero measurement indicates absence of the quantity being measured.
Examples:
❖ Weight
❖ Height
❖ Number of students
❖ Age
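The difference between interval and ratio scales can be checked numerically. The sketch below is only an illustration: the temperatures and heights are made-up values, and the helper functions `celsius_to_fahrenheit` and `cm_to_inches` are introduced here for the example. It shows that a ratio of temperatures changes when the unit changes (no fixed zero), while equal differences stay equal (distance), and that a ratio of heights does not depend on the unit (fixed zero).

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale: temperature (made-up readings in degrees Celsius).
t1_c, t2_c = 20.0, 10.0
ratio_c = t1_c / t2_c
ratio_f = celsius_to_fahrenheit(t1_c) / celsius_to_fahrenheit(t2_c)
print("Temperature ratio in C:", ratio_c)   # 2.0
print("Temperature ratio in F:", ratio_f)   # 68/50, not 2.0

# Distance is preserved: equal differences in C remain equal in F.
diff_a = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
diff_b = celsius_to_fahrenheit(40.0) - celsius_to_fahrenheit(30.0)
print("Equal differences stay equal:", diff_a == diff_b)

# Ratio scale: height (made-up values in centimetres).
h1_cm, h2_cm = 180.0, 150.0
ratio_cm = h1_cm / h2_cm
ratio_in = cm_to_inches(h1_cm) / cm_to_inches(h2_cm)
print("Height ratio in cm:    ", ratio_cm)
print("Height ratio in inches:", ratio_in)
```

Because the Fahrenheit and Celsius zeros are arbitrary, "twice as hot" has no fixed meaning, whereas "1.2 times as tall" holds in any unit of length.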
Exercise:
The following is a list of different attributes and rules for assigning numbers to objects. Try
to classify the different measurement systems into one of the four types of scales (nominal, ordinal,
interval and ratio).
1) A response to the statement "Abortion is a woman's right" where "Strongly
Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" =4, and "Strongly
Agree" = 5, as a measure of attitude toward abortion.
2) Times for swimmers to complete a 50-meter race
3) Months of the year Meskerm, Tikimit…
4) Socioeconomic status of a family when classified as low, middle and upper
classes.
5) Blood type of individuals, A, B, AB and O.
6) Pollen counts provided as numbers between 1 and 10 where 1 implies there is
almost no pollen and 10 that it is rampant, but for which the values do not
represent an actual count of grains of pollen.
7) Region numbers of Ethiopia (1, 2, 3 etc.)
8) The number of students in a college
9) The net salary of a group of workers
10) The height of the men in the same town
11) Your checking account number as a name for your account.
12) Your checking account balance as a measure of the amount of money you have
in that account.
13) The order in which you were eliminated in a spelling bee as a measure of your
spelling ability.
14) Your score on the first statistics test as a measure of your knowledge of
statistics.
15) Your score on an individual intelligence test as a measure of your intelligence.
16) The distance around your forehead measured with a tape measure as a measure
of your intelligence.
CHAPTER 2
2. Methods of data collection and presentation
2.1. Methods of data collection
Raw data: are collected data, which have not been organized numerically.
Examples: 25,10,32,18,6,93,4.
An array: is an arrangement of raw numerical data in ascending or descending order of magnitude.
➢ It enables us to know the range of the data set easily.
Any scientific investigation requires data related to the study. The required data can be obtained from
either a primary source or a secondary source.
There are two sources of data:
1. Primary Data
❖ Data measured or collected by the investigator or the user directly from
the source.
❖ Two activities involved: planning and measuring.
A) Planning:
➢ Identify source and elements of the data.
➢ Decide whether to consider sample or census.
➢ If sampling is preferred, decide on sample size, selection method, etc.
➢ Decide measurement procedure.
➢ Set up the necessary organizational structure.
B) Measuring: there are different options.
➢ Focus Group
➢ Telephone Interview
➢ Mail Questionnaires
➢ Door-to-Door Survey
➢ Mall Intercept
➢ New Product Registration
➢ Personal Interview and
➢ Experiments are some of the sources for collecting the primary data.
2. Secondary Data
❖ Data gathered or compiled from published and unpublished sources or files.
❖ When our source is secondary data, check that:
✓ The type and objective of the situations are known.
✓ The purpose for which the data were collected is compatible with the
present problem.
✓ The nature and classification of the data are appropriate to our problem.
✓ There are no biases or misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentages of values in each class by using $\% = \frac{f}{n} \times 100$,
where f is the frequency of the class and n is the total number of values.
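The tally, count and percentage steps above can be sketched in Python. The data set below is hypothetical (made-up blood types of 25 individuals), chosen only to have a small number of categories; `collections.Counter` does the tallying and counting.

```python
from collections import Counter

# Hypothetical raw data: blood types of 25 individuals (made-up values).
data = ["A", "B", "B", "AB", "O", "O", "O", "B", "AB", "B",
        "B", "B", "O", "A", "O", "A", "O", "O", "O", "AB",
        "AB", "A", "O", "B", "A"]

n = len(data)            # total number of values
counts = Counter(data)   # Steps 2 and 3: tally and count each class

# Step 4: percentage of values in each class, % = f / n * 100.
percentages = {cls: f / n * 100 for cls, f in counts.items()}

print(f"{'Class':<6}{'Tally':<12}{'Frequency':>10}{'Percent':>9}")
for cls in sorted(counts):
    f = counts[cls]
    print(f"{cls:<6}{'/' * f:<12}{f:>10}{percentages[cls]:>9.1f}")
```

The frequencies should add up to n and the percentages to 100, which is a quick check on the table.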
5. Pick a suitable starting point less than or equal to the minimum value. The starting point is
called the lower limit of the first class. Continue to add the class width to this lower limit to get
the rest of the lower limits.
6. To find the upper limit of the first class, subtract U (one unit of measurement) from the lower limit of the second class.
Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to
the upper limits. The boundaries are also halfway between the upper limit of one class and the
lower limit of the next class. Depending on what you're trying to accomplish, it may not be necessary to find the boundaries.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not
be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies
Example*:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 4: Find the class width: $w = \frac{R}{k} = \frac{33}{6} = 5.5 \approx 6$ (rounding up).
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
E.g. for class 1: lower class boundary = $6 - \frac{U}{2} = 5.5$
upper class boundary = $11 + \frac{U}{2} = 11.5$
Step 9: Write the numeric values for the tallies in the frequency column.
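The construction in example * can be reproduced with a short Python sketch. It takes k = 6 classes and U = 1 from the worked steps above and assumes the starting point is the minimum value 6 (any starting point less than or equal to the minimum would be allowed); the variable names are chosen here for illustration.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

U = 1                      # unit of measurement (the data are whole numbers)
k = 6                      # number of classes, as used in the example
R = max(data) - min(data)  # range: 39 - 6 = 33
w = math.ceil(R / k)       # class width: 5.5 rounded up to 6

# Lower limits start at the minimum (an assumption) and increase by w.
lower_limits = [min(data) + i * w for i in range(k)]
# Each upper limit is one unit below the next lower limit.
upper_limits = [lo + w - U for lo in lower_limits]

rows = []
cumulative = 0
for lo, hi in zip(lower_limits, upper_limits):
    f = sum(lo <= x <= hi for x in data)   # class frequency
    cumulative += f                        # less-than cumulative frequency
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - U / 2, hi + U / 2),
        "frequency": f,
        "cumulative": cumulative,
        "relative": f / len(data),
    })

for row in rows:
    print(row)
```

With these choices the class limits come out as 6-11, 12-17, 18-23, 24-29, 30-35 and 36-41, and the last cumulative frequency equals the number of observations, 20.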
Bar Charts:
➢ A set of bars (thick lines or narrow rectangles) representing some magnitude over time or
space.
➢ They are useful for comparing aggregates over time or space.
➢ Bars can be drawn either vertically or horizontally.
➢ There are different types of bar charts. The most common being:
❖ Simple bar chart
❖ Deviation or two-way bar chart
❖ Broken bar chart
❖ Component or subdivided bar chart.
❖ Multiple bar charts.
Simple Bar Chart
➢ Simple bar charts are used to display data on one variable.
➢ They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.
Example: The following data represent sales by product, 1957-1959, of a given company
for three products A, B, C.
Product    Sales ($) 1957    Sales ($) 1958    Sales ($) 1959
A          12                14                18
B          24                21                18
C          24                35                54
Solutions:
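As a rough text-only sketch of a simple bar chart, the Python snippet below draws the sales of product A for the three years, one horizontal bar per year. The choice of product A is arbitrary, and the helper `text_bar_chart` is a name introduced here for illustration; a plotting library would normally be used for a real chart.

```python
# Sales of product A by year, taken from the table above.
sales_a = {1957: 12, 1958: 14, 1959: 18}


def text_bar_chart(values, symbol="#"):
    """Return a horizontal bar chart, one bar per label, as a string."""
    lines = []
    for label, value in values.items():
        # The bar length represents the magnitude of the quantity.
        lines.append(f"{label} | {symbol * value} {value}")
    return "\n".join(lines)


chart = text_bar_chart(sales_a)
print(chart)
```

All bars have the same breadth (one text line), so only their length carries information, which is the defining feature of a simple bar chart.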
Histogram
➢ A graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class
limits are sometimes used as the quantity on the X axis.
Example: Construct a histogram to represent the previous data (example *).
Frequency Polygon:
➢ A line graph. The frequency is placed along the vertical axis and class midpoints are
placed along the horizontal axis. It is customary to add the next higher and lower class intervals
with a corresponding frequency of zero; this is to make it a complete polygon.
Example: Draw a frequency polygon for the above data (example *).
Solutions:
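The points of the frequency polygon can be computed as in the sketch below. The class limits and frequencies are assumptions: they are what the steps of example * give with k = 6, w = 6 and a starting point of 6. One empty class is added at each end so that the polygon closes on the horizontal axis.

```python
# Assumed class limits and frequencies for example * (k = 6, w = 6, start = 6).
lower_limits = [6, 12, 18, 24, 30, 36]
upper_limits = [11, 17, 23, 29, 35, 41]
frequencies = [2, 2, 7, 4, 3, 2]
w = 6  # class width

# Class midpoint (class mark) = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Add the next lower and next higher class with frequency zero.
xs = [midpoints[0] - w] + midpoints + [midpoints[-1] + w]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))
for x, y in points:
    print(f"midpoint {x:>5}: frequency {y}")
```

Joining these points in order with straight line segments gives the frequency polygon.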
Ogive (cumulative frequency polygon)
➢ A graph showing the cumulative frequency (less than or more than type) plotted against
upper or lower class boundaries respectively. That is, class boundaries are plotted along
the horizontal axis and the corresponding cumulative frequencies are plotted along the
vertical axis. The points are joined by a freehand curve.
Example: Draw an ogive curve (less than type) for the above data. (Example *)
i) Less than type cumulative frequency
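The less than type cumulative frequencies can be computed as in the following sketch. The upper class boundaries and frequencies are assumptions derived from example * (k = 6, w = 6, starting point 6, U = 1); `itertools.accumulate` produces the running totals.

```python
from itertools import accumulate

# Assumed upper class boundaries and frequencies for example *.
upper_boundaries = [11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
frequencies = [2, 2, 7, 4, 3, 2]

# Less-than cumulative frequency: number of values below each upper boundary.
less_than_cf = list(accumulate(frequencies))

# The curve starts at the lowest class boundary with cumulative frequency 0.
lowest_boundary = 5.5
points = [(lowest_boundary, 0)] + list(zip(upper_boundaries, less_than_cf))

for boundary, cf in points:
    print(f"less than {boundary:>5}: {cf}")
```

Plotting these points, with boundaries on the horizontal axis and cumulative frequencies on the vertical axis, gives the less than type ogive.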
The expression $\sum_{i=1}^{N} X_i$ is read, "the sum of X sub i from i equals 1 to N." It means "add up all the numbers."
Example: Suppose the following were scores made on the first homework assignment for five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the
summation could be written:
$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$
The "i=1" in the bottom of the summation notation tells where to begin the sequence of summation.
If the expression were written with "i=3", the summation would start with the third number in the
set. For example:
$\sum_{i=3}^{N} X_i = X_3 + X_4 + \dots + X_N$
In the example set of numbers, this would give the following result:
$\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5 = 7 + 6 + 8 = 21$
The "N" in the upper part of the summation notation tells where to end the sequence of summation.
If there were only three scores then the summation and example would be:
$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$
Sometimes if the summation notation is used in an expression and the expression must be written
a number of times, as in a proof, then a shorthand notation for the shorthand notation is employed.
When the summation sign "Σ" is used without additional notation, then "i=1" and "N" are assumed.
For example:
$\sum X = \sum_{i=1}^{N} X_i$
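The summation examples in this section can be checked with a short Python sketch. The helper `summation` is a name introduced here for illustration; it uses 1-based, inclusive limits so that it reads like the notation, with "i=1" and "N" assumed when no limits are given.

```python
# Homework scores from the example.
X = [5, 7, 7, 6, 8]
N = len(X)


def summation(values, start=1, end=None):
    """Sum values from index start to index end (1-based, inclusive)."""
    if end is None:
        end = len(values)   # default upper limit is N
    return sum(values[start - 1:end])


total = summation(X)                 # i = 1 to N
from_third = summation(X, start=3)   # i = 3 to N
first_three = summation(X, end=3)    # i = 1 to 3

print(total)        # 33
print(from_third)   # 21
print(first_three)  # 19
```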
PROPERTIES OF SUMMATION