Statistics
Branch of Applied Mathematics
Collection
Organization
Presentation
Analysis of data
Brief History
Derived from the Latin word ‘status’ or the
Italian word ‘statista’.
Both words mean ‘political
state’ or government.
IMPORTANCE OF STATISTICS
Categories of Statistics
1. DESCRIPTIVE STATISTICS
2. INFERENTIAL STATISTICS
Terminologies in Statistics
Population
Sample
Parameter
Statistic
Data – qualitative
     – quantitative
Terminologies in statistics
Variables – discrete or continuous
- dependent and independent
EXAMPLES
1. If a class consists of male and female students,
then gender is a variable in this class.
2. Height is a variable because different
people have different heights.
Discrete Variable
One that can assume only a finite or countable
number of values. In other words, it can assume
specific values only. The values of a
discrete variable are obtained through
the process of counting.
Continuous Variable
One that can assume infinitely many values
within a specified interval. The values
of a continuous variable are obtained
through measuring.
Scales of Measurement
• Nominal
• Ordinal
• Interval
• Ratio
Population and sample
N = size of the population
n = size of the sample
Ways to Determine the Sample Size
1. 30% of the population
2. Slovin’s Formula
$$ n = \frac{N}{1 + Ne^{2}} $$
where:
N = population size
n = sample size
e = margin of error (in education, the margin of error used is
not greater than 0.05)
3. Lynch’s Formula
$$ n = \frac{N z^{2} p(1-p)}{N d^{2} + z^{2} p(1-p)} $$
where:
z = the value of the normal variable
(1.96) for a reliability level of 0.95
p = largest possible proportion (0.5)
d = sampling error
N = population
n = sample size
Apply Slovin’s and Lynch’s formulas
Given:
1. N = 2600
2. N = 3450
3. N = 10050
4. N = 335
5. N = 23456
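The exercise above can be checked with a short script. This is a minimal sketch that assumes the usual forms of the two formulas, n = N / (1 + Ne²) for Slovin and n = Nz²p(1−p) / (Nd² + z²p(1−p)) for Lynch. The exercise does not state e or d, so 0.05 is assumed for both, and rounding up to the next whole respondent is also an assumption. The function names `slovin` and `lynch` are illustrative.

```python
import math

def slovin(N, e=0.05):
    """Slovin's formula, n = N / (1 + N e^2), rounded up (assumed convention)."""
    return math.ceil(N / (1 + N * e ** 2))

def lynch(N, d=0.05, z=1.96, p=0.5):
    """Lynch's formula, n = N z^2 p(1-p) / (N d^2 + z^2 p(1-p)), rounded up."""
    numerator = N * z ** 2 * p * (1 - p)
    denominator = N * d ** 2 + z ** 2 * p * (1 - p)
    return math.ceil(numerator / denominator)

# Population sizes from the exercise; e = d = 0.05 is an assumption.
populations = [2600, 3450, 10050, 335, 23456]
for N in populations:
    print(f"N = {N}: Slovin n = {slovin(N)}, Lynch n = {lynch(N)}")
```

For N = 2600 this gives 347 with Slovin's formula and 335 with Lynch's formula.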
Sampling Technique
• Probability sampling technique
• Non-probability sampling technique
Some Probability Sampling Techniques
1. Random sampling
- Lottery Method
- Table of random numbers
LOTTERY METHOD
Suppose Mrs. Cruz wants to send five students to attend a 2-day training or
seminar in basic computer programming. To avoid bias in selecting these
five students from her 40 students, she can use the lottery method. This is
done by assigning a number to each student and then writing these numbers on
pieces of paper. These pieces of paper are then rolled or folded and
placed in a box called the lottery box. The box should be thoroughly shaken,
and then five pieces of paper are picked or drawn from it. The
students who were assigned the numbers chosen will be sent to the
training. In this case, the selection of students is done without bias.
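The lottery draw can be imitated on a computer. The sketch below uses Python's standard `random.sample`, which draws without replacement, just like pulling folded papers from the box. The function name `lottery_draw` and the seed value are illustrative choices only.

```python
import random

def lottery_draw(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct student numbers from 1..population_size."""
    rng = random.Random(seed)                       # seed only makes the demo repeatable
    numbers = list(range(1, population_size + 1))   # the "pieces of paper"
    return sorted(rng.sample(numbers, sample_size)) # draw without replacement

chosen = lottery_draw(40, 5, seed=2024)
print("Students sent to the training:", chosen)
```

Every student has the same chance of being drawn, which is what makes the selection unbiased.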
TABLE OF RANDOM NUMBERS
Mrs. Cruz wants to select 5 students from her
40 students. Again, we will assign a number to
each student, say from 1 to 40.
31871 60770 59235 41702
87134 52839 17850 37359
06728 16314 81076 42172
95646 67486 65167 86819
44085 87246 47378 98338
Some Probability Sampling Technique
2. Systematic Sampling
To draw the members or elements of the sample
using this method, we have to select a random
starting point, then draw successive elements from
the population. In other words, we pick every nth
element of the population as a member of the sample
when we use this method.
EXAMPLE
Continuation of example
The next step is to write the numbers
1, 2, 3, 4, 5, 6, 7 and 8 on pieces of
paper and draw one number by lottery.
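A minimal sketch of systematic sampling follows. It assumes a population of 40 numbered students and a sample of 5, so the sampling interval is k = 40 / 5 = 8, which agrees with drawing the starting number from 1 to 8. These sizes are assumptions, since the example's own figures are not shown here.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start in the first k elements, then every k-th element."""
    k = len(population) // sample_size   # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)            # 1-based random starting point
    return [population[i - 1] for i in range(start, start + k * sample_size, k)]

students = list(range(1, 41))            # hypothetical class list, numbered 1..40
sample = systematic_sample(students, 5, seed=7)
print(sample)
```

Whatever the starting number is, consecutive members of the sample are always 8 positions apart.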
Some probability sampling technique
3. Stratified Random Sampling
This is used when the population has
different groups or categories.
The word stratified comes from the root
word “strata”, which means groups or
categories (the singular form is “stratum”).
Example
There are 5000 families in a barangay which are
categorized as high, average and low income
families.
Find the sample size first. (Say you have computed 200 as the sample
size.)
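One common way to split the sample among the strata is proportional allocation, where each stratum contributes in proportion to its size. The sketch below assumes this approach. The stratum counts (500, 1500, 3000) are hypothetical figures chosen only so that they add up to 5000 families; they are not taken from the example.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Rounding can make the parts miss the total by one; adjust by hand if so.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical breakdown of the 5000 families (illustration only).
families = {"high": 500, "average": 1500, "low": 3000}
allocation = proportional_allocation(families, 200)
print(allocation)
```

Within each stratum, the allotted number of families is then drawn by simple random sampling.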
Some Probability Sampling
4. Cluster Sampling
In this sampling, groups or clusters
are randomly selected instead of
individuals.
- Sometimes called area sampling
Some Probability Sampling
5. Multi-Stage Sampling
This is a combination of several
sampling techniques and is usually used for a
very large population, say the entire
country.
Multi-stage
This is done by starting the selection
of the members of the sample using
cluster sampling and then dividing each
cluster or group into strata. Then, from
each stratum individuals are drawn
randomly using simple random sampling.
Some Non-Probability sampling
1. Convenience Sampling
A researcher who wishes to investigate
the most popular noontime show may just
interview the respondents through the
telephone/cellphone.
Some Non-Probability Sampling
2. Quota Sampling
Similar to stratified sampling.
Difference: the selection of the
members of the sample is not done
randomly.
Example
Suppose we want to determine the teenagers’
most favorite brand of T-shirt. If there are
1000 female and 1000 male teenagers in the
population and we want to draw 150 members
for our sample, we can select 75 female and
75 male teenagers from the population
without using randomization.
Theoretical Sampling
- A useful method of getting information from
a sample of the population that the researcher
believes knows more about the subject.
- This approach is common in qualitative
research where statistical inference is not
required.
Purposive Sampling
- based on criteria set by the researcher
Snowball Technique
Sequential sampling
Activity
Type of Sampling | When to use it? | Advantage/s | Disadvantage/s
Probability
Non-Probability
…
Data Collection
The world is full of potential data. However, only the
relevant and specifically required data are considered/
needed to investigate a research problem.
Data may be gathered from two general sources
1) Primary
2) Secondary
Primary sources are those from which
information is gathered directly from the original
source, or which are based on direct or first-hand
experiences.
a. Interview
b. Questionnaires
c. Personal Accounts and Diaries
d. Observations and physical surveys
e. Standard scales and tests like mental ability tests
and other tests developed by professionals or groups.
f. Internet
Secondary sources are those from which
information is gathered from published or unpublished
materials that were previously collected by other
individuals or agencies and are used for a purpose other than the
original purpose for which they were collected.
a. Libraries and archives
b. Museums and collections
c. Government departments and commercial and
professional bodies
d. The field which includes ancient cities, buildings,
archaeological digs, etc.
e. The internet
Methods of Data Collection
1. Direct/Interview Method
2. Indirect or questionnaire method
3. Registration or Records Review
4. Observation Method
5. Experimentation Method
Ten Commandments of Data Collection (by: Neil J. Salkind)
1. Begin thinking about the type of data you will have to collect.
2. Think about where you will be obtaining the data.
3. Make sure that the data collection form you are using is clear and easy to use.
4. Make a duplicate copy of the data file and keep it in a separate location.
5. Do not rely on other people to collect or transfer your data unless you
personally have trained them and are confident that they understand the data
collection process as well as you do.
6. Plan a detailed schedule of when and where you will be collecting your data.
7. Cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview.
9. Never discard original data.
10. Follow the previous 9.
Activity 4. Methods of Collecting Data
Fill out the table by citing five research titles; give instances where you may use primary and
secondary sources of information; and indicate at least one method of
data collection that may be applied and describe how the method is to be carried out.
Research Title | Possible Primary Source | Possible Secondary Source | Method of Data Collection | Description
Presentation of data
• Textual
• Tabular
• Graphical
Below are the test scores of the 45
students who took the Statistics math
test
30 18 17 50 12 43 35 40 9
37 41 21 20 31 35 46 10 36
19 18 13 28 16 42 27 28 31
40 48 40 39 32 32 26 13 3
26 15 14 10 38 35 34 29 20
Arrange the scores from lowest to highest
scores to facilitate the enumeration of the
important characteristics of the data.
Arranging a mass of data manually is quite
tedious, but a computer can do it
easily.
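As a small illustration of letting the computer do the arranging, the sketch below sorts the 45 listed scores with Python's built-in `sorted` and reports a few characteristics that become easy to read once the data are in order.

```python
scores = [
    30, 18, 17, 50, 12, 43, 35, 40, 9,
    37, 41, 21, 20, 31, 35, 46, 10, 36,
    19, 18, 13, 28, 16, 42, 27, 28, 31,
    40, 48, 40, 39, 32, 32, 26, 13, 3,
    26, 15, 14, 10, 38, 35, 34, 29, 20,
]

ordered = sorted(scores)   # lowest to highest
print(ordered)
print("Number of scores:", len(ordered))
print("Lowest score:", ordered[0])
print("Highest score:", ordered[-1])
```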
Computer Stem-and-leaf plot
Stem-and-leaf plot
In a two-digit number, the stem consists of
the first digit, and the leaf consists of the
second digit.
In a three-digit number, the stem consists of
the first two digits, and the leaf consists of
the last digit.
In a one-digit number, the stem is zero.
Sample stem-and-leaf plot
Test Scores of Students in Mathematics 2
12 23 42 12 25
 9 53 23  8 17
10 24 31 41 25
11 36 11 21 23
12 45 16 23 34
13 25 13 23 47

Stem   Leaves
0      8, 9
1      0, 1, 1, 2, 2, 2, 3, 3, 6, 7
2      1, 3, 3, 3, 3, 3, 4, 5, 5, 5
3      1, 4, 6
4      1, 2, 5, 7
5      3
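The sample plot can be reproduced with a few lines of code. This sketch handles one- and two-digit scores, taking the tens digit as the stem (zero for one-digit numbers) and the units digit as the leaf. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (tens digit); leaves are the units digits."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 23 -> stem 2, leaf 3; 9 -> stem 0, leaf 9
        plot[stem].append(leaf)
    return dict(plot)

math2_scores = [
    12, 23, 42, 12, 25,
    9, 53, 23, 8, 17,
    10, 24, 31, 41, 25,
    11, 36, 11, 21, 23,
    12, 45, 16, 23, 34,
    13, 25, 13, 23, 47,
]

plot = stem_and_leaf(math2_scores)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

The printed rows match the stems 0 to 5 and their leaves in the sample plot.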
Construct a stem-and-leaf plot to present the data below:
Below are the test scores of the 45 students who
took the Statistics math test
30 18 17 50 12 43 35 40 9
37 41 21 20 31 35 46 10 36
19 18 13 28 16 42 27 28 31
40 48 40 39 32 32 26 13 3
26 15 14 10 38 35 34 29 20
Statistical Table
PARTS:
1. TABLE NUMBER
This is for easy reference to the table.
2. TABLE TITLE
It briefly explains the content of the table.
3. COLUMN HEADER
It describes the data in each column.
4. ROW CLASSIFIER
It shows the classes or categories.
5. BODY
This is the main part of the table.
6. SOURCE NOTE
This is placed below the table when the data
written are not original.
Frequency Distribution Table
1. Ungrouped Data: an arrangement of data from
lowest to highest which shows the frequency
of occurrence of each value in a set.
2. Grouped Data: consists of class intervals and
frequencies, class marks, and class boundaries.
Table 3.2
Ungrouped Frequency Distribution for the Ages
of 50 Students Enrolled in Statistics
Age Frequency
14 4
15 13
16 25
17 5
18 2
19 1
N 50
Steps in constructing a frequency distribution
table:
1. Decide on the desired number of classes
2. Determine the class width (i)
3. Unless otherwise specified, always start the lowest
class with the lowest value of the raw data, in order to
minimize the errors.
4. Tally the frequencies for each class, until the highest
value is reached.
5. The last class interval can go beyond the highest
value in the observations as long as the obtained class width is
followed.
Problem Exercise:
The following are the entrance examination scores of the 60 students
19 31 36 26 34 32
44 33 37 39 45 21
24 38 40 42 39 32
43 18 24 32 49 33
33 33 40 24 46 22
29 33 37 30 43 43
26 39 57 30 40 33
25 33 48 39 34 29
29 37 39 35 41 29
23 32 48 28 45 19
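The steps for constructing a frequency distribution table can be sketched in code using the 60 entrance examination scores. The choice of 8 classes is an assumption made only for illustration; the class width is taken as the range divided by the number of classes, rounded up, and the lowest class starts at the lowest score.

```python
import math

def frequency_distribution(data, num_classes):
    """Return (class width, list of (lower limit, upper limit, frequency))."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)   # step 2: class width
    table = []
    lower = low                                     # step 3: start at the lowest value
    while lower <= high:                            # step 4: stop once the highest value is covered
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
        lower += width
    return width, table

exam_scores = [
    19, 31, 36, 26, 34, 32, 44, 33, 37, 39, 45, 21,
    24, 38, 40, 42, 39, 32, 43, 18, 24, 32, 49, 33,
    33, 33, 40, 24, 46, 22, 29, 33, 37, 30, 43, 43,
    26, 39, 57, 30, 40, 33, 25, 33, 48, 39, 34, 29,
    29, 37, 39, 35, 41, 29, 23, 32, 48, 28, 45, 19,
]

width, table = frequency_distribution(exam_scores, num_classes=8)  # 8 classes is an assumption
print("Class width:", width)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With 8 classes the width works out to 5, giving class intervals from 18-22 up to 53-57.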
RELATIVE FREQUENCY
DISTRIBUTION
IT SHOWS THE RELATIONSHIP OF EACH CLASS
TO THE ENTIRE SET OF DATA
TABLE 3.4
RELATIVE FREQUENCY DISTRIBUTION FOR THE
ENTRANCE EXAMINATION SCORES OF 60
STUDENTS
CUMULATIVE FREQUENCY
DISTRIBUTION
IT IS A TABLE THAT SHOWS THE NUMBER OF CASES
FALLING BELOW A PARTICULAR VALUE.
TYPES: <CF AND >CF
TABLE 3.4
CUMULATIVE FREQUENCY DISTRIBUTION FOR
THE ENTRANCE EXAMINATION SCORES OF 60
STUDENTS
Class interval frequency <cf >cf
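Relative and cumulative frequencies are simple running computations on the frequency column. The sketch below uses the age frequencies of Table 3.2 (50 students) as input; the variable names are illustrative.

```python
from itertools import accumulate

ages = [14, 15, 16, 17, 18, 19]
freqs = [4, 13, 25, 5, 2, 1]          # frequencies from Table 3.2
N = sum(freqs)

relative = [round(100 * f / N, 2) for f in freqs]        # percent of the whole set
less_cf = list(accumulate(freqs))                        # <cf: running total from the lowest value
greater_cf = list(accumulate(reversed(freqs)))[::-1]     # >cf: running total from the highest value

print("Age   f   rf(%)  <cf  >cf")
for age, f, rf, lcf, gcf in zip(ages, freqs, relative, less_cf, greater_cf):
    print(f"{age:>3} {f:>3} {rf:>7} {lcf:>4} {gcf:>4}")
```

The last entry of the <cf column and the first entry of the >cf column both equal N.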
Guide for Interpretation in
Frequencies and Percentages
Highest percentage in the table | Description of the percentage result (used to start the table interpretation, but used only once per table)
100% | All of the respondents (__%)
97%-99% | Almost all of the respondents (__%)
86%-96% | Most of the respondents (__%)
76%-85% | Great majority of the respondents (__%)
51%-75% | Majority of the respondents (__%)
50% | Half of the respondents (__%)
49% and below | A great percentage of the respondents (__%)
              | A great number of the respondents (__%)
              | A substantial percentage of the respondents (__%)
              | A marked percentage of the respondents (__%)
Table 1
Work Related Profile of the
Respondents
Variables f %
School’s Division
Ilocos Sur 45 65.22
Vigan City 18 26.09
Candon City 6 8.70
Total 69 100.00
Number of Years in Teaching Mathematics
25.7 – 32 5 7.25
19.3 – 25.6 6 8.70
12.9 – 19.2 4 5.80
6.5 – 12.8 0 0
0.0 – 6.4 54 78.26
Total 69 100.00
Number of Times Attending Seminars, Trainings and Workshops
Related to Classroom Assessment and Evaluation
15 – 21 2 2.90
8 – 14 0 0
0 – 7 67 97.10
Total 69 100.00
GRAPHICAL PRESENTATION
1. Bar Chart-horizontal or vertical
2. Histogram
3. Frequency Polygon
4. Pie Chart
5. Ogive
What is the level of performance of the students?
What is the average weight of the respondents?
What is the level of research skills of the respondents?
What is the center of the distribution?
MEASURES OF CENTRAL TENDENCY
MEAN FOR UNGROUPED DATA
Find the mean of the test results of the Grade 9 students
87 50 37 67 68 70 51 70 73 78
Find the average grade of student A considering the number of
units of each of his subjects.
Grade: 88 89 83 85 80 82 90 89
Units:  3  3  2  3  3  4  3  3
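Both exercises can be checked with a few lines. The sketch assumes the usual definitions: the mean is the sum of the values divided by their number, and the weighted mean is the sum of each grade times its units divided by the total units.

```python
scores = [87, 50, 37, 67, 68, 70, 51, 70, 73, 78]
mean_score = sum(scores) / len(scores)            # sum of x divided by N

grades = [88, 89, 83, 85, 80, 82, 90, 89]
units = [3, 3, 2, 3, 3, 4, 3, 3]
weighted_mean = sum(g * u for g, u in zip(grades, units)) / sum(units)

print("Mean score:", mean_score)                          # 65.1
print("Weighted mean grade:", round(weighted_mean, 2))    # 85.71
```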
MEAN FOR GROUPED DATA
STEPS IN USING THE CLASSMARK FORMULA
Distribution of the test scores of the students in Mathematics
examination
Class Interval f
12-18 3
19-25 12
26-32 10
33-39 26
40-46 17
47-53 14
54-60 5
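A sketch of the classmark computation for this table follows. It assumes the usual classmark formula, mean = Σfx / N, where x is the midpoint (classmark) of each class interval.

```python
classes = [(12, 18), (19, 25), (26, 32), (33, 39), (40, 46), (47, 53), (54, 60)]
freqs = [3, 12, 10, 26, 17, 14, 5]

def classmark_mean(classes, freqs):
    """Mean for grouped data: sum of f * classmark divided by total frequency."""
    classmarks = [(low + high) / 2 for low, high in classes]   # 15, 22, 29, ...
    total_fx = sum(f * x for f, x in zip(freqs, classmarks))
    return total_fx / sum(freqs)

mean = classmark_mean(classes, freqs)
print(round(mean, 2))   # 37.37
```

Here Σfx = 3251 and N = 87, so the mean is about 37.37.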
STEPS USING THE CODED DEVIATION
Distribution of the test scores of the students in Mathematics
examination
Class Interval f
12-18 3
19-25 12
26-32 10
33-39 26
40-46 17
47-53 14
54-60 5
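The coded deviation computation can be sketched as follows, assuming the usual formula mean = A + (Σfd / N) i, where A is the classmark of a chosen class (the assumed mean), d is the coded deviation of each class (..., -2, -1, 0, 1, 2, ...), and i is the class size. Choosing the 33-39 class as the origin is an arbitrary choice; any class gives the same mean.

```python
classes = [(12, 18), (19, 25), (26, 32), (33, 39), (40, 46), (47, 53), (54, 60)]
freqs = [3, 12, 10, 26, 17, 14, 5]

def coded_deviation_mean(classes, freqs, assumed_index):
    """Mean = A + (sum of f*d / N) * i, with d coded relative to the assumed class."""
    width = classes[0][1] - classes[0][0] + 1                 # class size i = 7
    low, high = classes[assumed_index]
    assumed_mean = (low + high) / 2                           # A, the classmark of the chosen class
    codes = [j - assumed_index for j in range(len(classes))]  # d = ..., -1, 0, 1, ...
    total_fd = sum(f * d for f, d in zip(freqs, codes))
    return assumed_mean + total_fd / sum(freqs) * width

mean = coded_deviation_mean(classes, freqs, assumed_index=3)  # origin at class 33-39
print(round(mean, 2))   # 37.37
```

With the origin at 33-39, A = 36 and Σfd = 17, so the mean is 36 + (17/87)(7), or about 37.37, the same value the classmark formula gives.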
CHARACTERISTICS OF THE MEAN
• Most appropriate measure of central tendency when the data are in the
interval or ratio scale.
• It lies between the largest and smallest values or measurements.
• There is only one value for the mean for a given set of measurements.
• The mean is easily affected by extreme values because all values
contribute to the average.
MEDIAN FOR UNGROUPED DATA
Arrange the data in an array and pick the middle value. If the number of values is even, take the mean of the two middle values.
MEDIAN FOR GROUPED DATA
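$$ Md = L + \left( \frac{\frac{N}{2} - {<}cf}{f_m} \right) i $$
where:
L = lower class boundary of the median
class
N = total frequency
<cf = less than cumulative frequency of the class
immediately before the median class
i = size of the class interval
fm = frequency of the median class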
STEPS USING THE FORMULA OF MEDIAN FOR
GROUPED DATA
• Construct the less than cumulative frequency.
• Determine the median class. This is the class interval
containing one-half of the total frequency N/2 in the less than
cumulative frequency column.
• Use the formula
COMMON ERROR IN PICKING UP THE <CF
First data set (one class has zero frequency):
f    <cf
2    2
5    7
0    7
3    10
N/2 = 10/2 = 5, so we take the first 7.

Second data set:
Raw data: 30, 31, 32, 32, 34, 35, 36, 37, 37, 37
ci       f    <cf
30-31    2    2
32-33    2    4
34-35    2    6
36-37    4    10
• N/2 = 10/2 = 5, HENCE WE TAKE THE <CF LOWER THAN 5
Find the medians of the two data sets above to verify our claim.
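The claim can be verified for the data set with class intervals 30-31 to 36-37. The sketch below assumes the usual grouped-median formula, Md = L + ((N/2 − <cf) / fm) i, and assumes integer class limits so that the lower class boundary is the lower limit minus 0.5. The median class is the first class with nonzero frequency whose <cf reaches N/2, and the <cf used in the formula is that of the class just before it.

```python
from statistics import median

def grouped_median(classes, freqs):
    """Median for grouped data; classes are (lower limit, upper limit) pairs."""
    N = sum(freqs)
    half = N / 2
    cf_before = 0                              # <cf of the class before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= half:    # first class whose <cf reaches N/2
            L = lower - 0.5                    # lower class boundary (integer limits assumed)
            i = upper - lower + 1              # class size
            return L + (half - cf_before) / f * i
        cf_before += f
    raise ValueError("empty distribution")

raw = [30, 31, 32, 32, 34, 35, 36, 37, 37, 37]
classes = [(30, 31), (32, 33), (34, 35), (36, 37)]
freqs = [2, 2, 2, 4]

print("Grouped median:", grouped_median(classes, freqs))   # 34.5
print("Raw-data median:", median(raw))                     # 34.5
```

Using <cf = 4 gives 33.5 + ((5 − 4) / 2)(2) = 34.5, which agrees with the median of the raw data, (34 + 35) / 2.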
CHARACTERISTICS OF THE MEDIAN
• It is appropriate for interval data
• The median lies between the highest and lowest measurements
• There is only one value for the median in a given set of
measurements
• The median is not influenced by extreme values
• The median is used when the middle value is desired. It is the
value where 50% or half of the distribution lies above it and 50%
lies below it
Mode
most frequently occurring value/s
To get the mode/s of a data set, pick the
most frequently occurring value/s.
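For ungrouped data, picking the most frequent value or values can be done with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency. The two lists reuse data that appear elsewhere in these notes.

```python
from statistics import multimode

grade9_scores = [87, 50, 37, 67, 68, 70, 51, 70, 73, 78]
raw_values = [30, 31, 32, 32, 34, 35, 36, 37, 37, 37]

print(multimode(grade9_scores))   # [70]
print(multimode(raw_values))      # [37]
```

If two or more values share the highest frequency, all of them are returned, which corresponds to a bimodal or multimodal data set.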
MODE
STEPS TO FIND THE MODE USING THE
FORMULA
• Find the modal class.
• Use the formula to find the mode.
IN SOME CASES THE HIGHEST
FREQUENCY IS REPEATED
CONSIDER….
Class interval f
25-29 11
30-34 25
35-39 25
40-44 13
45-49 14
i=5 N=88
CHARACTERISTICS OF THE MODE
• The mode is the most appropriate measure of central tendency if the data is
nominal in scale.
• The mode is the least reliable among the three measures of central
tendency because its value is undefined in some distributions.
• The mode is used when we want to find the value which occurs most often.
• The mode is a quick approximation of the average. The mode is sometimes
called an inspection average.
PROBLEM EXERCISE
Below is the distribution of the daily salary of workers in Landmark Corp.
Compute the average daily salary of the workers using the classmark
formula and the coded formula. Also find the median and the mode.
Salary f
100-119 4
120-139 10
140-159 12
160-179 13
180-199 17
200-219 12
220-239 10
240-259 23
260-279 21
280-299 2
300-319 3
MEASURES OF POSITION (QUANTILES)
•QUARTILE
•DECILE
•PERCENTILE
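Correspondence among quartiles, deciles, and percentiles (highest to lowest):
P90 = D9
P80 = D8
P75 = Q3 (upper)
P70 = D7
P60 = D6
P50 = D5 = Q2 = MEDIAN
P40 = D4
P30 = D3
P25 = Q1 (lower)
P20 = D2
P10 = D1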
INTERQUARTILE RANGE
• DIFFERENCE BETWEEN Q3 AND Q1
WAYS TO DO SUCH (FOR UNGROUPED DATA)
1. MENDENHALL AND SINCICH METHOD
2. LINEAR INTERPOLATION
NOTE: These methods sometimes (but not always) produce the
same results.
SIMILARITIES OF THE MENDENHALL AND SINCICH
METHOD AND THE LINEAR INTERPOLATION
METHOD
• ARRANGE THE DATA IN ASCENDING ORDER
• SAME FORMULAS IN FINDING THE POSITION
DIFFERENCES BETWEEN THE MENDENHALL AND
SINCICH METHOD AND THE LINEAR INTERPOLATION
METHOD
• PROCESS IN CHOOSING THE VALUE/S
• SOMETIMES THE RESULTING VALUE/S
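Locating the position of the value
The Quartile for Ungrouped Data
$$ \text{Position of } Q_k = \frac{k(n+1)}{4} $$
The Decile for Ungrouped Data
$$ \text{Position of } D_k = \frac{k(n+1)}{10} $$
The Percentile for Ungrouped Data
$$ \text{Position of } P_k = \frac{k(n+1)}{100} $$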
EXAMPLE
1, 3, 7, 7, 16, 21, 27, 30, 31
Compute for Q1 and Q3
Mendenhall and Sincich Method
1, 3, 7, 7, 16, 21, 27, 30, 31
Position of Q1 = 1(9 + 1)/4 = 2.5
Position of Q3 = 3(9 + 1)/4 = 7.5
NOTE 1: The computed value 2.5 becomes 3 after rounding
up. The lower quartile value (Q1) is the third data element.
So, Q1 = 7.
NOTE 2: The computed value 7.5 becomes 7 after rounding
down. The upper quartile value (Q3) is the 7th data element.
So, Q3 = 27.
Linear Interpolation Method
1, 3, 7, 7, 16, 21, 27, 30, 31
Compute for Q1 and Q3
Position of Q1 = 2.5 and position of Q3 = 7.5.
Since the results are decimal numbers, interpolation is needed.
For Q1
Steps for Interpolation
1. Subtract the 2nd data value from the 3rd data value.
7 – 3 = 4
2. Multiply the result by the decimal part of the
position of Q1.
4(0.5) = 2
3. Add the result in step 2 to the 2nd data value.
3 + 2 = 5, so Q1 = 5
For Q3
1. Subtract the 7th data value from the 8th data value.
30 – 27 = 3
2. Multiply the result by the decimal part of the
position of Q3.
3(0.5) = 1.5
3. Add the result in step 2 to the 7th data value.
27 + 1.5 = 28.5, so Q3 = 28.5
NOTE: The Mendenhall and Sincich method and linear
interpolation also apply to deciles and percentiles.
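Both methods can be sketched in code and checked against the worked example. The position formula k(n + 1)/parts is assumed (it gives 2.5 and 7.5 for the nine values above). For the Mendenhall and Sincich method, only the quartile rule described in the notes is implemented: a position ending in .5 is rounded up for Q1 and down for Q3; other positions are rounded to the nearest integer, which is an assumption. Function names are illustrative.

```python
import math

def position(k, parts, n):
    """Position of the k-th quantile when the data are split into `parts` parts."""
    return k * (n + 1) / parts

def mendenhall_sincich_quartile(data, k):
    """Q1 (k=1) rounds a .5 position up; Q3 (k=3) rounds a .5 position down."""
    if k not in (1, 3):
        raise ValueError("this sketch covers Q1 and Q3 only")
    values = sorted(data)
    pos = position(k, 4, len(values))
    if pos - math.floor(pos) == 0.5:
        index = math.ceil(pos) if k == 1 else math.floor(pos)
    else:
        index = round(pos)                     # assumed: nearest whole position
    return values[index - 1]

def interpolated_quantile(data, k, parts):
    """Linear interpolation between the two values around a decimal position."""
    values = sorted(data)
    pos = position(k, parts, len(values))
    lower = math.floor(pos)
    if lower < 1:
        return values[0]
    if lower >= len(values):
        return values[-1]
    fraction = pos - lower                     # decimal part of the position
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

data = [1, 3, 7, 7, 16, 21, 27, 30, 31]
ms_q1, ms_q3 = mendenhall_sincich_quartile(data, 1), mendenhall_sincich_quartile(data, 3)
li_q1, li_q3 = interpolated_quantile(data, 1, 4), interpolated_quantile(data, 3, 4)
print("Mendenhall and Sincich:", ms_q1, ms_q3, "IQR =", ms_q3 - ms_q1)
print("Linear interpolation:  ", li_q1, li_q3, "IQR =", li_q3 - li_q1)
```

The two methods give Q1 = 7 and Q3 = 27 versus Q1 = 5 and Q3 = 28.5, so the interquartile range also differs (20 versus 23.5). Calling `interpolated_quantile(data, k, 10)` or `interpolated_quantile(data, k, 100)` gives deciles and percentiles.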
Problem Exercise:
Find the Q1, Q3, D3, D7, P32, P46, and P76 of the given data below using
Mendenhall and Sincich method and Linear Interpolation Method.
23 12 16 28 25 32 34 24 43 47 44 35
24 12 16 18 29 40 45 47 42 33
The Quartile for Grouped Data
$$ Q_k = L + \left( \frac{\frac{kN}{4} - {<}cf}{f_{Q_k}} \right) i $$
where:
k = quartile number (k = 1, 2, 3)
N = total frequency
<cf = cumulative frequency of the class below the Qk class
fQk = frequency of the quartile class
L = lower class boundary of the Qk class
i = class size
Find Q2 of the given data below
2N/4 = 2(127)/4 = 63.5, so the Q2 class is 200-219 (57th-68th salary).
Salary     f     <cf
100-119    4     4
120-139    10    14
140-159    12    26
160-179    13    39
180-199    17    56     (<cf)
200-219    12    68     (Q2 class: L = 199.5, fQ2 = 12, 57th-68th salary)
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127
i = 20, N = 127
The Decile for Grouped Data
$$ D_k = L + \left( \frac{\frac{kN}{10} - {<}cf}{f_{D_k}} \right) i $$
where:
k = decile number (k = 1, 2, 3, 4, 5, 6, 7, 8, 9)
N = total frequency
<cf = cumulative frequency of the class below the Dk class
fDk = frequency of the decile class
L = lower class boundary of the Dk class
i = class size
Find D3 of the given data below
3N/10 = 3(127)/10 = 38.1, so the D3 class is 160-179 (27th-39th salary).
Salary     f     <cf
100-119    4     4
120-139    10    14
140-159    12    26     (<cf)
160-179    13    39     (D3 class: L = 159.5, fD3 = 13, 27th-39th salary)
180-199    17    56
200-219    12    68
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127
i = 20, N = 127
The Percentile for Grouped Data
$$ P_k = L + \left( \frac{\frac{kN}{100} - {<}cf}{f_{P_k}} \right) i $$
where:
k = percentile number (k = 1, 2, 3, …, 98, 99)
N = total frequency
<cf = cumulative frequency of the class below the Pk class
fPk = frequency of the percentile class
L = lower class boundary of the Pk class
i = class size
Find P9 of the given data below
9N/100 = 9(127)/100 = 11.43, so the P9 class is 120-139 (5th-14th salary).
Salary     f     <cf
100-119    4     4      (<cf)
120-139    10    14     (P9 class: L = 119.5, fP9 = 10, 5th-14th salary)
140-159    12    26
160-179    13    39
180-199    17    56
200-219    12    68
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127
i = 20, N = 127
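The grouped quartile, decile, and percentile computations share one pattern, so a single function can cover all three. The sketch assumes the usual form, quantile = L + ((kN/parts − <cf) / f) i, with parts equal to 4, 10, or 100, and assumes integer class limits so that L is the lower limit minus 0.5. It is applied to the three classes marked in the salary tables: the 200-219 class (which holds the 63.5th salary, that is, position 2N/4), the 160-179 class for D3, and the 120-139 class for P9.

```python
def grouped_quantile(classes, freqs, k, parts):
    """k-th quantile of grouped data split into `parts` parts (4, 10, or 100)."""
    N = sum(freqs)
    target = k * N / parts                      # position of the quantile
    cf_before = 0                               # <cf of the class before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:   # first class whose <cf reaches the position
            L = lower - 0.5                     # lower class boundary (integer limits assumed)
            i = upper - lower + 1               # class size
            return L + (target - cf_before) / f * i
        cf_before += f
    raise ValueError("k must be smaller than parts")

salary_classes = [(100 + 20 * j, 119 + 20 * j) for j in range(11)]   # 100-119 ... 300-319
salary_freqs = [4, 10, 12, 13, 17, 12, 10, 23, 21, 2, 3]

q2 = grouped_quantile(salary_classes, salary_freqs, 2, 4)     # 199.5 + (7.5/12)(20)
d3 = grouped_quantile(salary_classes, salary_freqs, 3, 10)    # 159.5 + (12.1/13)(20)
p9 = grouped_quantile(salary_classes, salary_freqs, 9, 100)   # 119.5 + (7.43/10)(20)
print(round(q2, 2), round(d3, 2), round(p9, 2))               # 212.0 178.12 134.36
```

For comparison, the position of Q1 is N/4 = 31.75, which falls in the 160-179 class; `grouped_quantile(salary_classes, salary_freqs, 1, 4)` returns about 168.35.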
PROBLEM EXERCISE
Below is the distribution of the daily salary of workers in Landmark Corp.
Find Q3, D1, D3, D6, D8, P4, P23, P75, P80
Salary f
100-119 4
120-139 10
140-159 12
160-179 13
180-199 17
200-219 12
220-239 10
240-259 23
260-279 21
280-299 2
300-319 3