DESCRIPTIVE STATISTICS
Statistics is a branch of mathematics that deals with the collection, organization, analysis,
and interpretation of quantitative data, as well as with problems such as experiment design and
decision-making (Roc-Narag, 2010).
Statistics is the language of research. It gives meaning to the data collected during the
research proper and helps the researcher describe those data accurately through measures such
as the mean, the standard deviation, and the relationship of one factor with another. It also
helps the researcher draw conclusions and answer the questions posed at the start of the
research.
The four steps in statistics are:
a. Collection of data- the process of gathering information through experiments, interviews,
surveys, questionnaires, and similar means
b. Presentation of data- the process of organizing the collected data into tables, graphs, and
text
c. Analysis of data- the process of extracting important information from the collected data
using an appropriate statistical tool
d. Interpretation of data- the process of drawing answers from the analyzed data. Conclusions
about a large group can be formulated from the data gathered from a small group.
Statistics can be further divided into two branches:
1. Descriptive Statistics- the branch of statistics that deals with the collection,
summarization, and presentation of data
2. Inferential Statistics- the branch of statistics that deals with generalizing from samples
to populations, testing hypotheses, determining relationships among variables, and predicting
future outcomes
Research deals with variables. Variables are attributes that can assume different values. They
can be classified in two ways:
1. According to Functional Relationship
a. Independent Variable- a variable that the researcher can control or manipulate
b. Dependent Variable- a variable that the researcher does not manipulate directly; it is
observed or measured as it responds to changes in the independent variable
2. According to Continuity of Values
a. Continuous Variable- a variable obtained by measuring; it can assume any value within an
interval, including decimal values.
Examples: height, mass, distance, voltage, electric current
b. Discrete or Discontinuous Variable- a variable whose values are obtained by counting; it
cannot take decimal values.
Examples: number of samples, number of seeds, number of students
Data are the observations or measurements of variables. There are two kinds of data that can
be observed or measured:
1. Qualitative Data- data that can be categorized according to some characteristic or
attribute
Examples: Sex- male, female
          Year Level- Grade 7, Grade 8, Grade 9, Grade 10 …
          Employment Status- regular, probationary, part-time
2. Quantitative Data- data that are numerical in nature and are obtained through measuring
or counting
Examples: mass- 10 kg, 500 mg
          temperature- 37.5°C, 273.15 K
          height- 150 cm, 5 feet
Measurement is used to convert qualitative data into quantitative data so that they can be
treated statistically.
                     Qualitative Data        Quantitative Data
Size of an object    Small, Medium, Large    Measure the actual length, height and width of the
                                             object; for example, the height of the plant is 12 cm.
Levels of Measurement
Variables can be classified according to how they are categorized, ordered or counted.
Nominal level of measurement
Data at this level are categories that cannot be ordered or ranked.
Examples: gender (male, female)
civil status (single, married, widowed, separated)
zip code (4800, 4801, 4802, 4803, 4804, 4805)
Ordinal level of measurement
Data at this level can be ranked or ordered, but exact differences between the ranks do not
exist.
Examples: size of a body (small, medium, large)
type of quiz bee test (easy, average, difficult)
rating scale (excellent, very satisfactory, satisfactory, needs improvement)
Interval level of measurement
Data at this level can be ranked or ordered, and exact differences between values exist.
However, there is no true zero.
Examples: temperature (37°C and 38°C differ by exactly one degree), but 0°C does not mean that
there is no heat at all. (In physics, the Third Law of Thermodynamics states that absolute zero
is unattainable.)
IQ score (an IQ of 110 is one point higher than an IQ of 109), but you cannot say that an
unintelligent person has zero IQ.
Ratio level of measurement
Data at this level can be ranked or ordered, and exact differences between values exist. In
addition, there is a meaningful zero.
Examples: salary, weight, velocity, displacement, mass, time, force, speed
POPULATION VS SAMPLE
Population- the totality of subjects (individuals, objects, places, reactions, and events) with
common characteristics that are being studied
Sample- a group of subjects selected from a population
PARAMETER VS STATISTIC (singular, without the final s)
Parameter- a characteristic or measurement obtained by using all the values in a specific
population
Statistic- a characteristic or measurement obtained by using the values from a sample
Researchers may draw samples from a population to gain information. However, if the population
is small, sampling is unnecessary, since the whole population can be used to gather the data
needed to answer the research questions.
Researchers use descriptive statistics to describe a situation based on the data. Once the data
are collected through surveys, interviews, experiments, and other means, they are summarized and
organized into tables, graphs, and charts so that we can have a clearer view of what the data
mean. Aside from tables, graphs, and charts, statistical methods are also used to summarize
data. These include the measures of average (also called measures of central tendency), the
measures of variation, and the measures of position.
MEASURES OF CENTRAL TENDENCY
The measures of central tendency, also known as the measures of average, include the mean,
median, mode, and midrange.
a. THE MEAN
The mean is the sum of the values divided by the total number of values. The symbol $\bar{X}$
represents the sample mean.
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$
where n represents the total number of values in the sample. For a population, the Greek letter μ
(mu) is used for the mean.
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
where N represents the total number of values in the population.
You can solve for the mean by simply adding the values of the data and dividing by the total
number of values.
SAMPLE PROBLEM 1
The data represent the number of days off per year for a sample of individuals selected from
nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
SOLUTION:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$
$$\bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9}$$
$$\bar{X} \approx 30.7 \text{ days}$$
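The computation in Sample Problem 1 can also be sketched in a few lines of Python. This is only an illustration; the names `days_off` and `sample_mean` are made up for this sketch and do not come from the module.

```python
# Days off per year for the nine sampled individuals (Sample Problem 1).
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_days = sample_mean(days_off)
print(sum(days_off))        # 276
print(round(mean_days, 1))  # 30.7
```

The rounding to one decimal place follows the worked solution; the unrounded quotient is 276 ÷ 9.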
SAMPLE PROBLEM 2
Consider the frequency distribution table below to compute the mean. The data represent the
number of miles run during one week for a sample of 20 runners.
SOLUTION:
STEP 1: Make a table as shown.
A B C D
Class Frequency (f) Midpoint Xm f * Xm
5.5- 10.5 1
10.5- 15.5 2
15.5- 20.5 3
20.5- 25.5 5
25.5- 30.5 4
30.5- 35.5 3
35.5- 40.5 2
n = 20
STEP 2: Find the midpoint Xm of each class (Column C)
Xm = (5.5 + 10.5) ÷ 2 = 16 ÷ 2 = 8
Xm = (10.5 + 15.5) ÷ 2 = 26 ÷ 2 = 13
Xm = (15.5 + 20.5) ÷ 2 = 36 ÷ 2 = 18
Xm = (20.5 + 25.5) ÷ 2 = 46 ÷ 2 = 23
Xm = (25.5 + 30.5) ÷ 2 = 56 ÷ 2 = 28
Xm = (30.5 + 35.5) ÷ 2 = 66 ÷ 2 = 33
Xm = (35.5 + 40.5) ÷ 2 = 76 ÷ 2 = 38
STEP 3: Multiply the midpoint and the frequency (Column D)
STEP 4: Add the values in column D
STEP 5: Divide the sum by n to get the mean
A B C D
Class Frequency (f) Midpoint Xm f * Xm
5.5- 10.5 1 8 8
10.5- 15.5 2 13 26
15.5- 20.5 3 18 54
20.5- 25.5 5 23 115
25.5- 30.5 4 28 112
30.5- 35.5 3 33 99
35.5- 40.5 2 38 76
n = 20 Σf * Xm = 490
Mean = ∑(f * Xm) ÷ n = 490 ÷ 20 = 24.5 miles
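The five steps for the grouped mean can be sketched in Python as follows. This is a minimal sketch that assumes each class is stored as a hypothetical (lower boundary, upper boundary, frequency) tuple; the names `classes` and `grouped_mean` are illustrative only.

```python
# (lower boundary, upper boundary, frequency) for each class in Sample Problem 2.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mean(rows):
    """Mean of grouped data: sum of f * Xm divided by n."""
    n = sum(f for _, _, f in rows)                       # total frequency
    total = sum(f * (lo + hi) / 2 for lo, hi, f in rows)  # column D total
    return total / n


midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
print(midpoints)              # [8.0, 13.0, 18.0, 23.0, 28.0, 33.0, 38.0]
print(grouped_mean(classes))  # 24.5
```

Because the midpoint stands in for every value in its class, the result is an estimate of the mean of the original 20 observations rather than the exact mean.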
b. THE MEDIAN
The median is the midpoint of the data array. A data array is a data set that has been arranged
in order. Its symbol is MD.
Here are the steps in getting the median:
STEP 1: Arrange the data in order.
STEP 2: Select the middle point. If the number of values is even, the median is the mean of the
two middle values.
SAMPLE PROBLEM 3
Find the median. Six customers purchased these numbers of hardbound books: 1, 7, 3, 3, 4, 2
STEP 1: Arrange the data in order
1, 2, 3, 3, 4, 7
STEP 2: Select the middle point. There are six values, so the median is the mean of the third
and fourth values.
MD = (3 + 3) ÷ 2 = 6 ÷ 2 = 3
SAMPLE PROBLEM 4
Find the median of the number of typhoons in the Philippines for the past seven years:
20, 18, 25, 23, 17, 19, 16
STEP 1: Arrange the data in order
16, 17, 18, 19, 20, 23, 25
STEP 2: Select the middle point. There are seven values, so the median is the fourth value.
MD = 19
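The two steps for the median translate directly into code. The sketch below is illustrative; the function name `median` and the variable names are assumptions made for this example. It covers both the even case (Sample Problem 3) and the odd case (Sample Problem 4).

```python
def median(values):
    """Return the middle point of the ordered data array."""
    ordered = sorted(values)  # STEP 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value is the median.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


books = [1, 7, 3, 3, 4, 2]                # Sample Problem 3
typhoons = [20, 18, 25, 23, 17, 19, 16]   # Sample Problem 4

print(median(books))     # 3.0
print(median(typhoons))  # 19
```

Python's standard library also provides `statistics.median`, which follows the same rule.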
c. THE MODE
The mode is the value that occurs most often in a set of data. Take note that a set of data can
have more than one mode or no mode at all.
Unimodal- a data set with one value that occurs with the greatest frequency
Bimodal- a data set with two values that occur with the same greatest frequency
Multimodal- a data set with more than two values with the same greatest frequency
SAMPLE PROBLEM 5
Find the mode for the number of teachers per school for 10 selected secondary schools in Virac,
Catanduanes.
25, 110, 234, 5, 78, 74, 15, 22, 45, 30
Answer: The data set has no mode because no value occurs more than once. It is wrong to answer
0 (zero), because zero can itself be a data value; for example, a temperature of 0°C.
SAMPLE PROBLEM 6
Find the modal class (the term used when the mode comes from grouped data).
A             B
Class         Frequency (f)
5.5- 10.5     1
10.5- 15.5    2
15.5- 20.5    3
20.5- 25.5    5
25.5- 30.5    4
30.5- 35.5    3
35.5- 40.5    2
              n = 20
Answer: The modal class is 20.5- 25.5 because it has the highest frequency (5).
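Finding the mode amounts to counting how often each value occurs. The sketch below is one possible way to do this in Python, assuming the convention used in this module that a data set in which no value repeats has no mode. The names `modes`, `teachers`, and `class_freq` are illustrative, not part of the module.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the greatest frequency.

    An empty list means "no mode": either there are no values,
    or every value occurs exactly once.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Sample Problem 5: teachers per school; no value repeats.
teachers = [25, 110, 234, 5, 78, 74, 15, 22, 45, 30]
print(modes(teachers))  # []

# Sample Problem 6: the modal class is the class with the highest frequency.
class_freq = {
    "5.5-10.5": 1, "10.5-15.5": 2, "15.5-20.5": 3, "20.5-25.5": 5,
    "25.5-30.5": 4, "30.5-35.5": 3, "35.5-40.5": 2,
}
modal_class = max(class_freq, key=class_freq.get)
print(modal_class)  # 20.5-25.5
```

Returning an empty list rather than 0 keeps "no mode" distinct from a data set whose mode happens to be the value zero.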
d. THE MIDRANGE
The midrange is a very rough estimate of the average. It is the sum of the highest value and the
lowest value in the data set, divided by 2. Because it uses only these two values, the midrange
is strongly affected by an extremely high or low value in the data set. Its symbol is MR.
In getting the midrange, simply add the highest value and the lowest value in the data set and
divide the sum by 2.
$$MR = \frac{\text{highest value} + \text{lowest value}}{2}$$
SAMPLE PROBLEM 7
Find the midrange of the scores in the quarterly examination of ten students of Grade 9 STE.
45, 40, 32, 12, 41, 18, 20, 37, 30, 49
SOLUTION:
MR = (49 + 12) ÷ 2
MR = 30.5
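As a quick check of Sample Problem 7, the midrange can be computed with Python's built-in `max` and `min`. The names `scores` and `midrange` are chosen for this sketch only.

```python
# Quarterly examination scores of the ten students (Sample Problem 7).
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]


def midrange(values):
    """Return (highest value + lowest value) / 2."""
    return (max(values) + min(values)) / 2


print(max(scores), min(scores))  # 49 12
print(midrange(scores))          # 30.5
```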
So far, how do you find the measures of central tendency? Aren’t they easy? Don’t worry if some
of it seems confusing; you’ll understand it better as you apply it in your actual research. Read
the table below, adapted from Allan G. Bluman’s book Elementary Statistics. I hope it helps you
further understand the measures of central tendency. Keep going!
Uses and Properties of Central Tendency
The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from
the same population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data
values.
5. The mean cannot be computed for the data in a frequency distribution that
has an open-ended class.
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values
fall into the upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely
low values.
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as religious
preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or
the mode may not exist for a data set.
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
Adapted from “Elementary Statistics” by Allan G. Bluman p. 116
MEASURES OF VARIATION
Another set of descriptive statistics is the measures of variation. These measure how spread
out a set of data is. A data set with less variability is more consistent. The measures of
variation include the range, variance, and standard deviation.
THE RANGE
It is the difference between the highest value and the lowest value in the data set. Its symbol is
R.
R = highest value – lowest value
The range is also largely affected by extremely high or low values.
SAMPLE PROBLEM 8
Find the range of the scores in the quarterly examination of ten students of Grade 9 STE.
45, 40, 32, 12, 41, 18, 20, 37, 30, 49
SOLUTION:
R = highest value – lowest value
R = 49 – 12
R = 37
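The same computation can be sketched in Python. The function is named `value_range` here (an arbitrary choice for this sketch) so that it does not hide Python's built-in `range`.

```python
# Quarterly examination scores of the ten students (Sample Problem 8).
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]


def value_range(values):
    """Return highest value minus lowest value."""
    return max(values) - min(values)


print(value_range(scores))  # 37
```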
POPULATION VARIANCE AND STANDARD DEVIATION
Another meaningful way of measuring the variability of a given data set is through the variance
and standard deviation. We will start with the population variance and standard deviation.
The variance is the average of the squares of the distance of each value from the mean. Its
symbol is σ² (σ is the lowercase Greek letter sigma). The formula for the population variance is:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where: X is the individual value
µ is the population mean
N is the population size
The standard deviation is the square root of the variance. The symbol for the population
standard deviation is σ.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
SAMPLE PROBLEM 9
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will
last before fading. The testing lab makes 6 gallons of each paint to test. Since different
chemical agents are added to each group and only six cans are involved, these two groups
constitute two small populations. The results (in months) are shown.
Brand A Brand B
10 35
60 45
50 40
40 35
30 30
20 25
Compute the variance and standard deviation for each brand.
SOLUTION:
STEP 1: Get the population mean (µ) for each brand.
BRAND A: µ = (10 + 60 + 50 + 40 + 30 + 20) ÷ 6 = 210 ÷ 6 = 35
BRAND B: µ = (35 + 45 + 40 + 35 + 30 + 25) ÷ 6 = 210 ÷ 6 = 35
STEP 2: Subtract the mean from each data value
BRAND A BRAND B
10 – 35 = -25 35 – 35 = 0
60 – 35 = 25 45 – 35 = 10
50 – 35 = 15 40 – 35 = 5
40 – 35 = 5 35 – 35 = 0
30 – 35 = -5 30 – 35 = -5
20 – 35 = -15 25 – 35 = -10
STEP 3: Square each difference
BRAND A            BRAND B
(-25)² = 625       0² = 0
25² = 625          10² = 100
15² = 225          5² = 25
5² = 25            0² = 0
(-5)² = 25         (-5)² = 25
(-15)² = 225       (-10)² = 100
STEP 4: Get the sum of the squares
BRAND A BRAND B
625 + 625 + 225 + 25 + 25 + 225 0 + 100 + 25 + 0 + 25 + 100
= 1750 = 250
STEP 5: Get the population variance by dividing the sum of the squares by the population size
BRAND A            BRAND B
σ² = 1750 ÷ 6      σ² = 250 ÷ 6
σ² ≈ 291.7         σ² ≈ 41.7
STEP 6: Get the population standard deviation by taking the square root of the variance
BRAND A            BRAND B
σ = √291.7         σ = √41.7
σ ≈ 17.1           σ ≈ 6.5
Interpretation: The two brands have the same mean, 35 months. However, even though they have
the same mean, they do not have the same variance and standard deviation.
What does this mean? Brand A has a higher standard deviation than Brand B. Therefore, the
data obtained for Brand B are more consistent and less varied than those for Brand A.
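The steps of the paint example can be sketched in Python as shown below. The function name `population_variance` and the variable names are assumptions made for this illustration. The cross-check uses `statistics.pvariance` from Python's standard library, which computes the same population variance.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    mu = sum(values) / len(values)                    # STEP 1: population mean
    squared = [(x - mu) ** 2 for x in values]         # STEPS 2-3: subtract, square
    return sum(squared) / len(values)                 # STEPS 4-5: sum, divide by N


brand_a = [10, 60, 50, 40, 30, 20]
brand_b = [35, 45, 40, 35, 30, 25]

var_a = population_variance(brand_a)
var_b = population_variance(brand_b)
sd_a = math.sqrt(var_a)   # STEP 6: square root of the variance
sd_b = math.sqrt(var_b)

print(round(var_a, 1), round(sd_a, 1))  # 291.7 17.1
print(round(var_b, 1), round(sd_b, 1))  # 41.7 6.5

# Cross-check against the standard library's population variance.
assert math.isclose(var_a, statistics.pvariance(brand_a))
assert math.isclose(var_b, statistics.pvariance(brand_b))
```

Both brands share a mean of 35 months, so only the spread distinguishes them: the smaller standard deviation for Brand B reflects its more consistent results.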
SAMPLE VARIANCE AND STANDARD DEVIATION
We can get the variance of a sample using the formula below. Its symbol is s².
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$
where:
n is the sample size
∑X is the sum of all the values in the data set
∑X² is the sum of the squares of all the values in the data set
s² is the sample variance
The sample standard deviation is the square root of the sample variance. Its symbol is s.
$$s = \sqrt{s^2} = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}}$$
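The formula for the sample variance can be checked against the definitional form, in which the sum of the squared distances from the sample mean is divided by n − 1. That definitional form is not stated in this module; it is brought in here as an assumption from standard statistics, with $\bar{X} = \sum X / n$ denoting the sample mean. A short derivation shows that the two forms agree:

```latex
\begin{align*}
\sum (X - \bar{X})^2
  &= \sum X^2 - 2\bar{X}\sum X + n\bar{X}^2 \\
  &= \sum X^2 - \frac{2(\sum X)^2}{n} + \frac{(\sum X)^2}{n} \\
  &= \sum X^2 - \frac{(\sum X)^2}{n} \\[4pt]
s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}
  &= \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}
   = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}
\end{align*}
```

The last step multiplies the numerator and the denominator by n. The practical advantage of the final form is that the mean does not have to be computed and subtracted from every value first.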
SAMPLE PROBLEM 10
Find the sample variance and standard deviation of the scores in the quarterly examination of
ten students of Grade 9 STE.
45, 40, 32, 12, 41, 18, 20, 37, 30, 49
SOLUTION:
STEP 1: Add the values in the set of data
∑X = 45 + 40 + 32 + 12 + 41 + 18 + 20 + 37 + 30 + 49
∑X = 324
STEP 2: Square the values in the data set, then get the sum
∑X² = 45² + 40² + 32² + 12² + 41² + 18² + 20² + 37² + 30² + 49²
∑X² = 2025 + 1600 + 1024 + 144 + 1681 + 324 + 400 + 1369 + 900 + 2401
∑X² = 11868
STEP 3: Substitute the values into the equation to compute the sample variance
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$
$$s^2 = \frac{10(11868) - (324)^2}{10(10 - 1)}$$
$$s^2 = \frac{118680 - 104976}{10(9)}$$
$$s^2 = \frac{13704}{90}$$
$$s^2 \approx 152.3$$
STEP 4: Compute the sample standard deviation
s = √s²
s = √152.3
s ≈ 12.3
Take note that (∑X)² is different from ∑X². (∑X)² means that you add the values first and then
square the sum, while ∑X² means that you square the values first and then add them.
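The difference between the two sums, and the steps of Sample Problem 10, can be seen in a short Python sketch. The names `sample_variance_shortcut`, `sum_x`, and `sum_x2` are illustrative only. The result is compared with `statistics.variance` from the standard library, which computes the sample variance with n − 1 in the denominator.

```python
import math
import statistics


def sample_variance_shortcut(values):
    """Sample variance: (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)                  # add first ...
    sum_x2 = sum(x * x for x in values)  # ... versus square first, then add
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]

sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
print(sum_x ** 2)  # 104976, the square of the sum
print(sum_x2)      # 11868, the sum of the squares

s2 = sample_variance_shortcut(scores)
s = math.sqrt(s2)
print(round(s2, 1), round(s, 1))  # 152.3 12.3

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(scores))
```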
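VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA
$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$
where f is the frequency of a class, Xm is the midpoint of that class, and n is the total
frequency.
The standard deviation is the square root of this variance.
$$s = \sqrt{s^2}$$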
SAMPLE PROBLEM 11
Consider the frequency distribution table below to compute the variance and standard deviation.
The data represent the number of miles run during one week for a sample of 20 runners.
SOLUTION:
STEP 1: Make a table as shown.
A             B                C              D         E
Class         Frequency (f)    Midpoint Xm    f * Xm    f * Xm²
5.5- 10.5     1
10.5- 15.5    2
15.5- 20.5    3
20.5- 25.5    5
25.5- 30.5    4
30.5- 35.5    3
35.5- 40.5    2
              n = 20
STEP 2: Get the midpoint by adding the lower and upper boundaries of the class interval, then
dividing the sum by two. Write your answer in column C.
(5.5 + 10.5) ÷ 2 = 16 ÷ 2 = 8
Do the same for the other intervals.
A             B                C              D         E
Class         Frequency (f)    Midpoint Xm    f * Xm    f * Xm²
5.5- 10.5     1                8
10.5- 15.5    2                13
15.5- 20.5    3                18
20.5- 25.5    5                23
25.5- 30.5    4                28
30.5- 35.5    3                33
35.5- 40.5    2                38
              n = 20
STEP 3: Multiply the frequency (f) and midpoint (Xm). Write your answer on column D and get
the sum.
A             B                C              D                 E
Class         Frequency (f)    Midpoint Xm    f * Xm            f * Xm²
5.5- 10.5     1                8              8
10.5- 15.5    2                13             26
15.5- 20.5    3                18             54
20.5- 25.5    5                23             115
25.5- 30.5    4                28             112
30.5- 35.5    3                33             99
35.5- 40.5    2                38             76
              n = 20                          ∑ f * Xm = 490
STEP 4: Square the midpoint and multiply it by the frequency. Write your answer in column E.
1 * 8² = 64
2 * 13² = 338
Do the same for the succeeding intervals, then get the sum.
A             B                C              D         E
Class         Frequency (f)    Midpoint Xm    f * Xm    f * Xm²
5.5- 10.5     1                8              8         64
10.5- 15.5    2                13             26        338
15.5- 20.5    3                18             54        972
20.5- 25.5    5                23             115       2645
25.5- 30.5    4                28             112       3136
30.5- 35.5    3                33             99        3267
35.5- 40.5    2                38             76        2888
              n = 20                                    ∑(f * Xm²) = 13310
STEP 5: Substitute the values into the equation for the variance
$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$
$$s^2 = \frac{20(13310) - (490)^2}{20(20 - 1)}$$
$$s^2 = \frac{266200 - 240100}{20(19)}$$
$$s^2 = \frac{26100}{380}$$
$$s^2 \approx 68.7$$
STEP 6: Get the square root for the standard deviation
s = √s²
s = √68.7
s ≈ 8.3
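The table-based procedure of Sample Problem 11 can be sketched in Python as follows. This is a minimal illustration that assumes each class is stored as a hypothetical (lower boundary, upper boundary, frequency) tuple; the names `classes` and `grouped_variance` are not part of the module.

```python
import math

# (lower boundary, upper boundary, frequency) for each class.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_variance(rows):
    """Sample variance of grouped data using class midpoints."""
    n = sum(f for _, _, f in rows)
    # Column D: f * Xm, and column E: f * Xm squared, summed over all classes.
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in rows)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


s2 = grouped_variance(classes)
s = math.sqrt(s2)
print(round(s2, 1))  # 68.7
print(round(s, 1))   # 8.3
```

As with the grouped mean, the midpoints stand in for the individual values, so the result approximates the variance of the original 20 observations.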
USES OF VARIANCE AND STANDARD DEVIATION
1. They are used to determine the spread of the data and to identify which
data are more variable. The greater the variance and standard deviation,
the more variable the data are.
2. They are used to test the consistency of a variable. The smaller the
variance and standard deviation, the more consistent the variable.
3. They are also used to determine if the values fall within the specified
interval in a distribution.
4. They are used in inferential statistics.
Adapted from “Elementary Statistics” by Allan G. Bluman p. 132
The terms used in this module are defined as follows:
Descriptive Statistics- the branch of statistics that deals with the collection, summarization,
and presentation of data
Inferential Statistics- the branch of statistics that deals with generalizing from samples to
populations, testing hypotheses, determining relationships among variables, and predicting
future outcomes
Independent Variable- a variable that the researcher can control or manipulate
Dependent Variable- a variable that the researcher does not manipulate directly; it is observed
or measured as it responds to changes in the independent variable
Continuous Variable- a variable obtained by measuring; it can assume any value within an
interval, including decimal values
Discrete or Discontinuous Variable- a variable whose values are obtained by counting; it cannot
take decimal values
Qualitative Data- data that can be categorized according to some characteristic or attribute
Quantitative Data- data that are numerical in nature and are obtained through measuring or
counting
Measurement- used to convert qualitative data into quantitative data so that they can be
treated statistically
Population- the totality of subjects (individuals, objects, places, reactions, and events)
with common characteristics that are being studied
Sample- a group of subjects selected from a population
Parameter- a characteristic or measurement obtained by using all the values in a specific
population
Statistic- a characteristic or measurement obtained by using the values from a sample
Mean- the sum of the values divided by the total number of values
Median- the midpoint of the data array
Data array- a data set that has been arranged in order
Mode- the value that occurs most often in a set of data
Midrange- a very rough estimate of the average; it is the sum of the highest value and the
lowest value in the data set, divided by 2
Range- the difference between the highest value and the lowest value in the data set
Variance- the average of the squares of the distance of each value from the mean
Standard deviation- the square root of the variance