
Fundamentals of Statistics Overview

Statistics is the branch of applied mathematics dealing with the collection, organization, analysis, interpretation and presentation of data. It involves both descriptive statistics, which summarize and describe characteristics of data, and inferential statistics, which draw conclusions from data. Key terms in statistics include population, sample, parameter, statistic, variables, and different types of data and variables.

Uploaded by

Erika Quindipan

Statistics

Branch of Applied Mathematics


Collection
Organization
Presentation
Analysis of data
Brief History

Derived from the Latin word ‘status’ or the Italian word ‘statista’.

The meaning of these words is ‘political state’ or government.
IMPORTANCE OF STATISTICS
Categories of Statistics

1. DESCRIPTIVE STATISTICS

2. INFERENTIAL STATISTICS
Terminologies in Statistics

Population
Sample
Parameter
Statistic
Data - Qualitative
     - Quantitative
Terminologies in statistics

Variables - discrete or continuous
          - dependent and independent
EXAMPLES

1. If a class consists of male and female students, then gender is a variable in this class.

2. Height is a variable because different people have different heights.
Discrete Variable

One that can assume only a finite or countable number of values. In other words, it can assume specific values only. The values of a discrete variable are obtained through the process of counting.
Continuous Variable

One that can assume infinitely many values within a specified interval. The values of a continuous variable are obtained through measuring.
Scales of Measurement

• Nominal
• Ordinal
• Interval
• Ratio
Population and Sample

N = population size
n = sample size
Ways to Determine the Sample Size

1. 30% of the population

2. Slovin’s Formula

$$n = \frac{N}{1 + Ne^2}$$

where:
N = population size
n = sample size
e = margin of error (in education, the margin of error used is not greater than 0.05)

3. Lynch’s Formula

$$n = \frac{N z^2 \, p(1 - p)}{N d^2 + z^2 \, p(1 - p)}$$

where:
z = the value of the normal variable (1.96) for reliability
p = largest possible proportion (0.5)
d = sampling error
N = population size
n = sample size
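The two sample-size formulas can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, the default margin of error and sampling error of 0.05 are assumptions taken from the upper bound mentioned above, and rounding the result up to a whole respondent is also an assumption.

```python
import math

def slovin(N, e=0.05):
    """Slovin's formula: n = N / (1 + N e^2), rounded up (assumed convention)."""
    return math.ceil(N / (1 + N * e ** 2))

def lynch(N, d=0.05, z=1.96, p=0.5):
    """Lynch's formula: n = N z^2 p(1-p) / (N d^2 + z^2 p(1-p)), rounded up."""
    q = z ** 2 * p * (1 - p)
    return math.ceil(N * q / (N * d ** 2 + q))

if __name__ == "__main__":
    N = 2600
    print("Slovin:", slovin(N))  # 347
    print("Lynch: ", lynch(N))   # 335
```

With the same population and the same 0.05 error, Lynch's formula gives a slightly smaller sample than Slovin's, because z² p(1 − p) = 0.9604 is just under 1.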
Apply Slovin’s and Lynch’s Formulas
Given:
1. N = 2600
2. N = 3450
3. N = 10050
4. N = 335
5. N = 23456
Sampling Technique

• Probability sampling technique

• Non-probability sampling technique


Some Probability Sampling Techniques

1. Random sampling
- Lottery Method
- Table of random numbers
LOTTERY METHOD
Suppose Mrs. Cruz wants to send five students to attend a 2-day training or
seminar in basic computer programming. To avoid bias in selecting these
five students from her 40 students, she can use the lottery method. This is
done by assigning a number to each student and then writing these numbers on
pieces of paper. Then, these pieces of paper are rolled or folded and
placed in a box called the lottery box. The box should be thoroughly shaken,
and then five pieces of paper are drawn from it. The
students who were assigned the numbers drawn will be sent to the
training. In this case, the selection of students is done without bias.
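The lottery method described above can be imitated with Python's standard `random` module. This is only a sketch: the function name and the `seed` parameter are hypothetical, and shuffling a list stands in for shaking the lottery box.

```python
import random

def lottery_draw(class_size=40, winners=5, seed=None):
    """Draw `winners` distinct student numbers from 1..class_size."""
    rng = random.Random(seed)               # seed only makes a run repeatable
    box = list(range(1, class_size + 1))    # one slip of paper per student
    rng.shuffle(box)                        # "shake" the lottery box
    return sorted(box[:winners])            # pick the first five slips

if __name__ == "__main__":
    print(lottery_draw())
```

Every student has the same chance of being drawn, which is what makes the selection unbiased.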
TABLE OF RANDOM NUMBERS

Mrs. Cruz wants to select 5 students from her 40 students. Again, we will assign a number to each student, say from 1 to 40.

31871 60770 59235 41702
87134 52839 17850 37359
06728 16314 81076 42172
95646 67486 65167 86819
44085 87246 47378 98338
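The slide does not spell out how the table is read, so the sketch below assumes one common convention: read the digits row by row in pairs, keep a pair if it is between 01 and 40 and has not been chosen yet, and stop after five students. The function name is hypothetical.

```python
TABLE = [
    "31871 60770 59235 41702",
    "87134 52839 17850 37359",
    "06728 16314 81076 42172",
    "95646 67486 65167 86819",
    "44085 87246 47378 98338",
]

def pick_from_table(rows, population=40, needed=5):
    """Read two-digit numbers left to right; keep those in 1..population."""
    digits = "".join("".join(rows).split())   # drop the spaces between groups
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population and number not in chosen:
            chosen.append(number)
        if len(chosen) == needed:
            break
    return chosen

if __name__ == "__main__":
    print(pick_from_table(TABLE))  # [31, 16, 7, 23, 17]
```

Under this reading, the pairs 31, 87, 16, 07, 70, 59, 23, 54, 17 are examined in turn, and the students numbered 31, 16, 7, 23 and 17 are selected. A different starting point or reading direction would give a different, equally valid sample.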
Some Probability Sampling Techniques
2. Systematic Sampling
To draw the members or elements of the sample
using this method, we select a random
starting point and then draw successive elements from
the population at a fixed interval. In other words, we pick every kth
element of the population as a member of the sample
when we use this method.
EXAMPLE
Continuation of example

The next step is to write the numbers 1, 2, 3, 4, 5, 6, 7 and 8 on pieces of paper and draw one number by lottery.
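A systematic sample can be sketched as below. The sketch assumes the sampling interval is k = N // n, which gives k = 8 for 5 students out of 40 and would match drawing a starting number from 1 to 8; the function name and `seed` parameter are hypothetical.

```python
import random

def systematic_sample(N, n, seed=None):
    """Pick every k-th element of 1..N after a random start in 1..k."""
    k = N // n                           # sampling interval (assumed k = N // n)
    rng = random.Random(seed)
    start = rng.randint(1, k)            # the number "drawn by lottery"
    return [start + i * k for i in range(n)]

if __name__ == "__main__":
    print(systematic_sample(40, 5))
```

If the start drawn is 3, for instance, the sample is students 3, 11, 19, 27 and 35.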
Some Probability Sampling Techniques

3. Stratified Random Sampling

This is used when the population has different groups or categories.
The word stratified comes from the root word “strata”, which means groups or categories (the singular form is stratum).
Example
There are 5000 families in a barangay, which are categorized as high-, average- and low-income families.
Find first the sample size. (Say you have computed 200 as the sample size.)
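One common way to continue such an example is proportional allocation, where each stratum contributes to the sample in proportion to its size. The stratum sizes below are hypothetical, since the slide does not give them; only the totals of 5000 families and 200 sampled families come from the example.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

if __name__ == "__main__":
    # Hypothetical breakdown of the 5000 families (not given in the source).
    families = {"high": 500, "average": 1500, "low": 3000}
    print(proportional_allocation(families, 200))
    # {'high': 20, 'average': 60, 'low': 120}
```

Families are then drawn at random within each stratum. With less convenient stratum sizes the rounded shares may not add up exactly to the sample size and need a small manual adjustment.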
Some Probability Sampling Techniques

4. Cluster Sampling
In this sampling, groups or clusters are randomly selected instead of individuals.
- Sometimes called Area Sampling
Some Probability Sampling Techniques

5. Multi-Stage Sampling
This is a combination of several sampling techniques and is usually used for a very large population, say the entire country.
Multi-stage

This is done by starting the selection of the members of the sample using cluster sampling and then dividing each cluster or group into strata. Then, from each stratum, individuals are drawn randomly using simple random sampling.
Some Non-Probability Sampling Techniques

1. Convenience Sampling
A researcher who wishes to investigate the most popular noontime show may just interview the respondents through the telephone/cellphone.
Some Non-Probability Sampling

2. Quota Sampling
Similar to stratified sampling.
Difference: the selection of the
members of the sample is not done
randomly.
Example

Suppose we want to determine the teenagers’ favorite brand of T-shirt. If there are 1000 female and 1000 male teenagers in the population and we want to draw 150 members for our sample, we can select 75 female and 75 male teenagers from the population without using randomization.
Theoretical Sampling
- A useful method of getting information from a sample of the population that the researcher believes knows more about a subject.
- This approach is common in qualitative research, where statistical inference is not required.
Purposive Sampling

- based on criteria set by the researcher

Snowball Technique

Sequential sampling
Activity

Type of Sampling    When to use it?    Advantage/s    Disadvantage/s

Probability

Non-Probability


Data Collection
The world is full of potential data. However, only the
relevant and specifically required data are considered/
needed to investigate a research problem.

Data may be gathered from two general sources:
1) primary
2) secondary
Primary sources are those from which information is gathered directly from the original source, or which are based on direct or first-hand experiences.
a. Interview
b. Questionnaires
c. Personal Accounts and Diaries
d. Observations and physical surveys
e. Standard scales and tests like mental ability tests
and other tests developed by professionals or groups.
f. Internet
Secondary sources are those from which information is gathered from published or unpublished materials that were previously collected by other individuals or agencies and are used for a purpose other than the one for which they were originally collected.
a. Libraries and archives
b. Museums and collections
c. Government departments and commercial and
professional bodies
d. The field which includes ancient cities, buildings,
archaeological digs, etc.
e. The internet
Methods of Data Collection

1. Direct/Interview Method
2. Indirect or questionnaire method

3. Registration or Records Review

4. Observation Method

5. Experimentation Method
Ten Commandments of Data Collection (by: Neil J. Salkind)
 
1. Begin thinking about the type of data you will have to collect.
2. Think about where you will be obtaining the data.
3. Make sure that the data collection form you are using is clear and easy to use.
4. Make a duplicate copy of the data file and keep it in a separate location.
5. Do not rely on other people to collect or transfer your data unless you
personally have trained them and are confident that they understand the data
collection process as well as you do. 
6. Plan a detailed schedule of when and where you will be collecting your data.
7. Cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview.
9. Never discard original data.
10. Follow the previous 9.
Activity 4. Methods of Collecting Data
Fill out the table by citing five research titles; give instances where you may use primary and secondary sources of information; and indicate at least one method of data collection that may be applied and describe how the method is to be carried out.

Research Title    Possible Primary Source    Possible Secondary Source    Method of Data Collection    Description
Presentation of data

• Textual
• Tabular
• Graphical
Below are the test scores of the 45 students who took the Statistics math test:
30 18 17 50 12 43 35 40 9

37 41 21 20 31 35 46 10 36

19 18 13 28 16 42 27 28 31

40 48 40 39 32 32 26 13 3

26 15 14 10 38 35 34 29 20
Arrange the scores from lowest to highest to facilitate the enumeration of the important characteristics of the data.

Arranging a mass of data manually is quite tedious, but using computers for this purpose is easy.

Computer Stem-and-leaf plot


Stem-and-leaf plot
In a two-digit number, the stem consists of the first digit, and the leaf consists of the second digit.
In a three-digit number, the stem consists of the first two digits, and the leaf consists of the last digit.
In a one-digit number, the stem is zero.
Sample stem-and-leaf plot

Test Scores of Students in Mathematics 2

12 23 42 12 25
 9 53 23  8 17
10 24 31 41 25
11 36 11 21 23
12 45 16 23 34
13 25 13 23 47

Stem   Leaves
0      8, 9
1      0, 1, 1, 2, 2, 2, 3, 3, 6, 7
2      1, 3, 3, 3, 3, 3, 4, 5, 5, 5
3      1, 4, 6
4      1, 2, 5, 7
5      3
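The sample plot can be reproduced with a few lines of Python. This sketch assumes whole-number scores, so that the last digit is the leaf and everything before it is the stem; the function name is hypothetical.

```python
from collections import defaultdict

SCORES = [12, 23, 42, 12, 25, 9, 53, 23, 8, 17,
          10, 24, 31, 41, 25, 11, 36, 11, 21, 23,
          12, 45, 16, 23, 34, 13, 25, 13, 23, 47]

def stem_and_leaf(values):
    """Group the sorted values by stem (all digits except the last)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # one-digit numbers get stem 0
        plot[stem].append(leaf)
    return dict(plot)

if __name__ == "__main__":
    for stem, leaves in sorted(stem_and_leaf(SCORES).items()):
        print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Sorting before grouping is what puts the leaves of each stem in increasing order, so the plot doubles as an ordered array of the data.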
Construct a stem-and-leaf plot to present the data below:

Below are the test scores of the 45 students who took the Statistics math test:

30 18 17 50 12 43 35 40 9

37 41 21 20 31 35 46 10 36

19 18 13 28 16 42 27 28 31

40 48 40 39 32 32 26 13 3

26 15 14 10 38 35 34 29 20
Statistical Table
PARTS:
1. TABLE NUMBER - This is for easy reference to the table.
2. TABLE TITLE - It briefly explains the content of the table.
3. COLUMN HEADER - It describes the data in each column.
4. ROW CLASSIFIER - It shows the classes or categories.
5. BODY - This is the main part of the table.
6. SOURCE NOTE - This is placed below the table when the data written are not original.
Frequency Distribution Table

1. Ungrouped Data - an arrangement of data from lowest to highest which shows the frequency of occurrence of each value in a set.
2. Grouped Data - consists only of class intervals and frequencies, classmarks and class boundaries.
Table 3.2
Ungrouped Frequency Distribution for the Ages
of 50 Students Enrolled in Statistics

Age Frequency
14 4
15 13
16 25
17 5
18 2
19 1
N 50
Steps in constructing a frequency distribution
table:
1. Decide on the desired number of classes
2. Determine the class width (i)
3. Unless otherwise specified, always start the lowest
class with the lowest value of the raw data, in order to
minimize the errors.
4. Tally the frequencies for each class, until the highest
value is reached.
5. The last class interval can go beyond the highest
value in the observations, as long as the obtained class width is
followed.
Problem Exercise:
The following are the entrance examination scores of the 60 students
19 31 36 26 34 32
44 33 37 39 45 21
24 38 40 42 39 32
43 18 24 32 49 33
33 33 40 24 46 22
29 33 37 30 43 43
26 39 57 30 40 33
25 33 48 39 34 29
29 37 39 35 41 29
23 32 48 28 45 19
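The construction steps can be tried on these 60 scores. The sketch below makes two assumptions that the slides leave open: the desired number of classes is taken as 8, and the class width for whole-number data is computed as (highest − lowest + 1) divided by the number of classes, rounded up. The function name is hypothetical.

```python
import math

SCORES = [19, 31, 36, 26, 34, 32, 44, 33, 37, 39, 45, 21,
          24, 38, 40, 42, 39, 32, 43, 18, 24, 32, 49, 33,
          33, 33, 40, 24, 46, 22, 29, 33, 37, 30, 43, 43,
          26, 39, 57, 30, 40, 33, 25, 33, 48, 39, 34, 29,
          29, 37, 39, 35, 41, 29, 23, 32, 48, 28, 45, 19]

def frequency_table(data, num_classes):
    """Return (class width, [((lower, upper), frequency), ...])."""
    low, high = min(data), max(data)
    width = math.ceil((high - low + 1) / num_classes)    # step 2 (assumed rule)
    table = []
    lower = low                                          # step 3: start at lowest value
    while lower <= high:                                 # steps 4-5: go until highest is covered
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)    # tally
        table.append(((lower, upper), freq))
        lower += width
    return width, table

if __name__ == "__main__":
    width, table = frequency_table(SCORES, num_classes=8)
    print("class width i =", width)
    for (lower, upper), freq in table:
        print(f"{lower}-{upper}: {freq}")
```

With 8 classes the width comes out as 5, giving classes 18-22, 23-27, and so on up to 53-57. A different choice for the number of classes changes the table but not the procedure.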
RELATIVE FREQUENCY
DISTRIBUTION
IT SHOWS THE RELATIONSHIP OF EACH CLASS
TO THE ENTIRE SET OF DATA
TABLE 3.4
RELATIVE FREQUENCY DISTRIBUTION FOR THE
ENTRANCE EXAMINATION SCORES OF 60
STUDENTS
CUMULATIVE FREQUENCY
DISTRIBUTION
IT IS A TABLE THAT SHOWS THE NUMBER OF CASES
FALLING BELOW A PARTICULAR VALUE.

TYPES: <CF AND >CF


TABLE 3.4
CUMULATIVE FREQUENCY DISTRIBUTION FOR
THE ENTRANCE EXAMINATION SCORES OF 60
STUDENTS
Class interval frequency <cf >cf
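As a small illustration of these columns, the sketch below computes the relative frequency (in percent), the less-than cumulative frequency (<cf) and the greater-than cumulative frequency (>cf) for the ages in Table 3.2. The function name is hypothetical.

```python
from itertools import accumulate

AGES = [14, 15, 16, 17, 18, 19]
FREQS = [4, 13, 25, 5, 2, 1]

def relative_and_cumulative(freqs):
    """Return (relative frequency in %, <cf, >cf) for a list of frequencies."""
    n = sum(freqs)
    rel = [100 * f / n for f in freqs]
    less_cf = list(accumulate(freqs))                        # running total from the top
    greater_cf = list(accumulate(reversed(freqs)))[::-1]     # running total from the bottom
    return rel, less_cf, greater_cf

if __name__ == "__main__":
    rel, less_cf, greater_cf = relative_and_cumulative(FREQS)
    for row in zip(AGES, FREQS, rel, less_cf, greater_cf):
        print(*row)
```

For example, the row for age 16 reads f = 25, relative frequency 50%, <cf = 42 and >cf = 33: 42 students are 16 or younger, and 33 are 16 or older.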
Guide for Interpretation in Frequencies and Percentages

Highest percentage in the table    Description of the percentage result (used to start the table interpretation, but only once per table)
100%                               All of the respondents (__%)
97%-99%                            Almost all of the respondents (__%)
86%-96%                            Most of the respondents (__%)
76%-85%                            Great majority of the respondents (__%)
51%-75%                            Majority of the respondents (__%)
50%                                Half of the respondents (__%)
49% and below                      A great percentage of the respondents (__%)
                                   A great number of the respondents (__%)
                                   A substantial percentage of the respondents (__%)
                                   A marked percentage of the respondents (__%)
Table 1
Work-Related Profile of the Respondents

Variables                                      f     %
School’s Division
  Ilocos Sur                                   45    65.22
  Vigan City                                   18    26.09
  Candon City                                  6     8.70
  Total                                        69    100.00
Number of Years in Teaching Mathematics
  25.7 – 32                                    5     7.25
  19.3 – 25.6                                  6     8.70
  12.9 – 19.2                                  4     5.80
  6.5 – 12.8                                   0     0
  0.0 – 6.4                                    54    78.26
  Total                                        69    100.00
Number of Times Attending Seminars, Trainings and Workshops
Related to Classroom Assessment and Evaluation
  15 – 21                                      2     2.90
  8 – 14                                       0     0
  0 – 7                                        67    97.10
  Total                                        69    100.00
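The guide can be applied mechanically once the percentages are computed from the frequencies. In the sketch below the function names are hypothetical, and the handling of values that fall between the listed ranges (for example 75.5% or 96.5%) is an assumption: each range is treated as starting at its lower bound.

```python
def percent(f, total):
    """Percentage of respondents, rounded to two decimals."""
    return round(100 * f / total, 2)

def describe(pct):
    """Opening phrase for the highest percentage in a table."""
    if pct >= 100:
        return "All of the respondents"
    if pct >= 97:
        return "Almost all of the respondents"
    if pct >= 86:
        return "Most of the respondents"
    if pct >= 76:
        return "Great majority of the respondents"
    if pct > 50:
        return "Majority of the respondents"
    if pct == 50:
        return "Half of the respondents"
    return "A great percentage of the respondents"

if __name__ == "__main__":
    # Highest frequency in each part of Table 1, out of 69 respondents.
    for label, f in [("Ilocos Sur", 45), ("0.0 - 6.4 years", 54), ("0 - 7 seminars", 67)]:
        p = percent(f, 69)
        print(f"{describe(p)} ({p}%): {label}")
```

For instance, 45 of 69 respondents is 65.22%, so the interpretation of the first part would open with "Majority of the respondents (65.22%)".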
GRAPHICAL PRESENTATION
1. Bar Chart-horizontal or vertical
2. Histogram
3. Frequency Polygon
4. Pie Chart
5. Ogive
What is the level of performance of the students?

What is the average weight of the respondents?

What is the level of research skills of the respondents?

What is the center of the distribution?


MEASURES OF CENTRAL TENDENCY
MEAN FOR UNGROUPED DATA
Find the mean of the test results of the Grade 9 students

87 50 37 67 68 70 51 70 73 78

Find the average grade of student A considering the number of units of his subjects.

Grade   88 89 83 85 80 82 90 89
Units    3  3  2  3  3  4  3  3
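Both problems can be checked with a short script. The ordinary mean is the sum of the values divided by their number, and the weighted mean is the sum of each grade times its units divided by the total units; the function names below are hypothetical.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum of value x weight divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

if __name__ == "__main__":
    test_results = [87, 50, 37, 67, 68, 70, 51, 70, 73, 78]
    grades = [88, 89, 83, 85, 80, 82, 90, 89]
    units = [3, 3, 2, 3, 3, 4, 3, 3]
    print(round(mean(test_results), 2))            # 65.1
    print(round(weighted_mean(grades, units), 2))  # 85.71
```

The test results sum to 651 over 10 students, and the grades give 2057 grade-units over 24 units, which is where the two averages come from.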
MEAN FOR GROUPED DATA
STEPS IN USING THE CLASSMARK FORMULA
Distribution of the test scores of the students in Mathematics
examination

Class Interval f
12-18 3
19-25 12
26-32 10
33-39 26
40-46 17
47-53 14
54-60 5
STEPS USING THE CODED DEVIATION
Distribution of the test scores of the students in Mathematics
examination

Class Interval f
12-18 3
19-25 12
26-32 10
33-39 26
40-46 17
47-53 14
54-60 5
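The two approaches can be compared on this distribution. The sketch assumes the usual forms of the formulas, which the slides do not show: mean = Σfx / N for the classmark formula, and mean = A + (Σfd / N)·i for the coded deviation formula, where x is the classmark, A is the classmark of an assumed-mean class, d is the number of classes away from that class, and i is the class size. The function names are hypothetical.

```python
INTERVALS = [(12, 18), (19, 25), (26, 32), (33, 39), (40, 46), (47, 53), (54, 60)]
FREQS = [3, 12, 10, 26, 17, 14, 5]

def mean_classmark(intervals, freqs):
    """Classmark formula: sum of f*x over N, with x the class midpoint."""
    marks = [(lo + hi) / 2 for lo, hi in intervals]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

def mean_coded(intervals, freqs, assumed_index):
    """Coded deviation formula: A + (sum of f*d / N) * i."""
    marks = [(lo + hi) / 2 for lo, hi in intervals]
    width = intervals[1][0] - intervals[0][0]              # class size i
    a = marks[assumed_index]                               # assumed mean A
    coded = [k - assumed_index for k in range(len(marks))] # d = ..., -1, 0, 1, ...
    return a + width * sum(f * d for f, d in zip(freqs, coded)) / sum(freqs)

if __name__ == "__main__":
    print(round(mean_classmark(INTERVALS, FREQS), 2))               # 37.37
    print(round(mean_coded(INTERVALS, FREQS, assumed_index=3), 2))  # 37.37
```

Taking the class 33-39 as the assumed-mean class gives A = 36 and Σfd = 17, so the coded mean is 36 + (17/87)(7), the same value as Σfx / N = 3251/87. Any other choice of assumed-mean class gives the same result.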
CHARACTERISTICS OF THE MEAN
• Most appropriate measure of central tendency when the data are in the
interval or ratio scale.
• It lies between the largest and smallest values or measurements.
• There is only one value for the mean for a given set of measurements.
• The mean is easily affected by extreme values because all values
contribute to the average.
MEDIAN FOR UNGROUPED DATA
Arrange the data in an array and pick the middle value. If the number of values is even, take the average of the two middle values.

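MEDIAN FOR GROUPED DATA

$$\text{Median} = L + \left(\frac{\frac{N}{2} - {<}cf}{f_m}\right) i$$

where:
L = lower class boundary of the median class
N = total frequency
<cf = less than cumulative frequency of the class above (preceding) the median class
i = size of the class interval
fm = frequency of the median class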
STEPS USING THE FORMULA OF MEDIAN FOR
GROUPED DATA
• Construct the less than cumulative frequency.
• Determine the median class. This is the class interval
containing one-half of the total frequency N/2 in the less than
cumulative frequency column.
• Use the formula
COMMON ERROR IN PICKING UP THE <CF

f    <cf
2    2
5    7
0    7
3    10

N/2 = 10/2 = 5, so we take the first 7.

Raw data: 30, 31, 32, 32, 34, 35, 36, 37, 37, 37

ci       f    <cf
30-31    2    2
32-33    2    4
34-35    2    6
36-37    4    10

• N/2 = 10/2 = 5; hence, we take the <cf lower than 5 (here, 4).

Find the medians of the two data sets above to verify our claim.
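The claim can be verified for the second table with a short script. The sketch assumes the usual grouped-median formula, Median = L + ((N/2 − <cf) / fm)·i, and whole-number class limits, so that the lower class boundary is the lower limit minus 0.5. The function name is hypothetical.

```python
import statistics

def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - <cf) / fm) * i for a frequency distribution."""
    n = sum(freqs)
    cum = 0                                  # <cf of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= n / 2:       # first class whose <cf reaches N/2
            boundary = lower - 0.5           # lower class boundary L
            return boundary + (n / 2 - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

if __name__ == "__main__":
    raw = [30, 31, 32, 32, 34, 35, 36, 37, 37, 37]
    print(statistics.median(raw))                              # 34.5
    print(grouped_median([30, 32, 34, 36], [2, 2, 2, 4], 2))   # 34.5
```

The median class is 34-35, the first class whose <cf (6) reaches N/2 = 5, and the <cf used in the formula is the 4 of the class before it: 33.5 + ((5 − 4)/2)(2) = 34.5, which agrees with the median of the raw data.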
CHARACTERISTICS OF THE MEDIAN
• It is appropriate for interval data
• The median lies between the highest and lowest measurements
• There is only one value for the median in a given set of
measurements
• The median is not influenced by extreme values
• The median is used when the middle value is desired. It is the
value where 50% or half of the distribution lies above it and 50%
lies below it
Mode
The most frequently occurring value/s.

To get the mode/s of a data set, pick the most frequently occurring value/s.
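For ungrouped data, Python's standard library can list the mode or modes directly. The sketch uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest count; the data are the Grade 9 test results used earlier and a small made-up list with two modes.

```python
from statistics import multimode

if __name__ == "__main__":
    test_results = [87, 50, 37, 67, 68, 70, 51, 70, 73, 78]
    print(multimode(test_results))      # [70]

    # A hypothetical list in which two values tie for the highest frequency.
    print(multimode([1, 2, 2, 3, 3]))   # [2, 3]
```

A data set with two modes is called bimodal; when every value occurs equally often, all values are returned, which is the case where the mode is not informative.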
MODE
STEPS TO FIND THE MODE USING THE
FORMULA
• Find the modal class.
• Use the formula to find the mode.
IN SOME CASES THE HIGHEST
FREQUENCY IS REPEATED
CONSIDER….

Class interval f
25-29 11
30-34 25
35-39 25
40-44 13
45-49 14
i=5 N=88
CHARACTERISTICS OF THE MODE
• The mode is the most appropriate measure of central tendency if the data is
nominal in scale.
• The mode is the least reliable among the three measures of central
tendency because its value is undefined in some distributions.
• The mode is used when we want to find the value which occurs most often.
• The mode is a quick approximation of the average. The mode is sometimes
called an inspection average.
PROBLEM EXERCISE
Below is the distribution of the daily salary of workers in Landmark Corp.
Compute the average daily salary of the workers using the classmark
formula and the coded formula. Find also the median and the mode
Salary f
100-119 4
120-139 10
140-159 12
160-179 13
180-199 17
200-219 12
220-239 10
240-259 23
260-279 21
280-299 2
300-319 3
MEASURES OF POSITION (QUANTILES)

• QUARTILE
• DECILE
• PERCENTILE

Quartile       Decile    Percentile
               D9        P90
               D8        P80
Q3 (upper)               P75
               D7        P70
               D6        P60
MEDIAN  Q2     D5        P50
               D4        P40
               D3        P30
Q1 (lower)               P25
               D2        P20
               D1        P10
INTERQUARTILE RANGE

• DIFFERENCE BETWEEN Q3 AND Q1


WAYS TO FIND THE QUARTILES (FOR UNGROUPED DATA)

1. MENDENHALL AND SINCICH METHOD
2. LINEAR INTERPOLATION

NOTE: These methods sometimes (but not always) produce the same results.
SIMILARITIES OF THE MENDENHALL AND SINCICH METHOD AND THE LINEAR INTERPOLATION METHOD
• ARRANGE THE DATA IN ASCENDING ORDER
• SAME FORMULAS IN FINDING THE POSITION

DIFFERENCES BETWEEN THE MENDENHALL AND SINCICH METHOD AND THE LINEAR INTERPOLATION METHOD
• PROCESS IN CHOOSING THE VALUE/S
• SOMETIMES THE RESULTING VALUE/S
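Locating the position of the value (n = number of values, k = the quantile wanted)

The Quartile for Ungrouped Data

$$\text{Position of } Q_k = \frac{k(n + 1)}{4}$$

The Decile for Ungrouped Data

$$\text{Position of } D_k = \frac{k(n + 1)}{10}$$

The Percentile for Ungrouped Data

$$\text{Position of } P_k = \frac{k(n + 1)}{100}$$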


EXAMPLE
1, 3, 7, 7, 16, 21, 27, 30, 31
Compute for Q1 and Q3
Mendenhall and Sincich Method

1, 3, 7, 7, 16, 21, 27, 30, 31

Position of Q1 = 1(9 + 1)/4 = 2.5
Position of Q3 = 3(9 + 1)/4 = 7.5

NOTE 1: The computed value 2.5 becomes 3 after rounding up. The lower quartile value (Q1) is the 3rd data element. So, Q1 = 7.

NOTE 2: The computed value 7.5 becomes 7 after rounding down. The upper quartile value (Q3) is the 7th data element. So, Q3 = 27.
Linear Interpolation Method

1, 3, 7, 7, 16, 21, 27, 30, 31

Compute for Q1 and Q3

Position of Q1 = 2.5 and position of Q3 = 7.5. Since the results are decimal numbers, interpolation is needed.

For Q1
Steps for Interpolation
1. Subtract the 2nd data value from the 3rd data value.
7 – 3 = 4
2. Multiply the result by the decimal part obtained as the position of Q1.
4(0.5) = 2
3. Add the result in step 2 to the 2nd (smaller) data value.
3 + 2 = 5, so Q1 = 5

For Q3
1. Subtract the 7th data value from the 8th data value.
30 – 27 = 3
2. Multiply the result by the decimal part obtained as the position of Q3.
3(0.5) = 1.5
3. Add the result in step 2 to the 7th data value.
27 + 1.5 = 28.5, so Q3 = 28.5
NOTE: Mendenhall and Sincich Method and linear
Interpolation are still applicable to decile and percentile.
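Both methods can be sketched in Python for the nine values used in the example. The position is taken as k(n + 1)/4, consistent with the positions 2.5 and 7.5 in the example. For the Mendenhall and Sincich method the sketch covers the quartiles only and assumes the rounding rule shown in the notes: round to the nearest position, with a half rounded up for Q1 and down for Q3. All function names are hypothetical.

```python
import math

def position(k, parts, n):
    """Position of the k-th quantile when the data are split into `parts`."""
    return k * (n + 1) / parts

def mendenhall_sincich_quartiles(data):
    """Q1 and Q3 by rounding the position (half up for Q1, half down for Q3)."""
    x = sorted(data)
    n = len(x)

    def nearest(pos, half_up):
        whole = math.floor(pos)
        frac = pos - whole
        if frac > 0.5 or (frac == 0.5 and half_up):
            whole += 1
        return min(max(whole, 1), n)     # stay inside the data

    q1 = x[nearest(position(1, 4, n), half_up=True) - 1]
    q3 = x[nearest(position(3, 4, n), half_up=False) - 1]
    return q1, q3

def interpolated_quantile(data, k, parts):
    """Quantile by linear interpolation between the two neighboring values."""
    x = sorted(data)
    n = len(x)
    pos = position(k, parts, n)
    whole = math.floor(pos)
    frac = pos - whole
    if whole < 1:
        return x[0]
    if whole >= n:
        return x[-1]
    return x[whole - 1] + frac * (x[whole] - x[whole - 1])

if __name__ == "__main__":
    data = [1, 3, 7, 7, 16, 21, 27, 30, 31]
    print(mendenhall_sincich_quartiles(data))                               # (7, 27)
    print(interpolated_quantile(data, 1, 4), interpolated_quantile(data, 3, 4))  # 5.0 28.5
```

The interpolation function also handles deciles and percentiles by passing 10 or 100 as `parts`, for example `interpolated_quantile(data, 3, 10)` for D3.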
Problem Exercise:
Find the Q1, Q3, D3, D7, P32, P46, and P76 of the given data below using
Mendenhall and Sincich method and Linear Interpolation Method.

23 12 16 28 25 32 34 24 43 47 44 35
24 12 16 18 29 40 45 47 42 33
The Quartile for Grouped Data

$$Q_k = L + \left(\frac{\frac{kN}{4} - {<}cf}{f_{Q_k}}\right) i$$

k = nth quartile (n = 1, 2, and 3)
N = total frequency
<cf = cumulative frequency of the class before the Qk class
fQk = frequency of the quartile class
L = lower class boundary of the Qk class
i = class size
Find Q2 of the given data below
2N/4 = 2(127)/4 = 63.5, so Q2 falls in the class 200-219 (57th-68th salary).

Salary     f     <cf
100-119    4     4
120-139    10    14
140-159    12    26
160-179    13    39
180-199    17    56     (<cf = 56)
200-219    12    68     (Q2 class: L = 199.5, fQ2 = 12)
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127

i = 20, N = 127
The Decile for Grouped Data

$$D_k = L + \left(\frac{\frac{kN}{10} - {<}cf}{f_{D_k}}\right) i$$

k = nth decile (n = 1, 2, 3, 4, 5, 6, 7, 8, and 9)
N = total frequency
<cf = cumulative frequency of the class before the Dk class
fDk = frequency of the decile class
L = lower class boundary of the Dk class
i = class size
Find D3 of the given data below
3N/10 = 3(127)/10 = 38.1, so D3 falls in the class 160-179 (27th-39th salary).

Salary     f     <cf
100-119    4     4
120-139    10    14
140-159    12    26     (<cf = 26)
160-179    13    39     (D3 class: L = 159.5, fD3 = 13)
180-199    17    56
200-219    12    68
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127

i = 20, N = 127
The Percentile for Grouped Data

$$P_k = L + \left(\frac{\frac{kN}{100} - {<}cf}{f_{P_k}}\right) i$$

k = nth percentile (n = 1, 2, 3, 4, 5, 6, 7, 8, 9, …, 98, and 99)
N = total frequency
<cf = cumulative frequency of the class before the Pk class
fPk = frequency of the percentile class
L = lower class boundary of the Pk class
i = class size
Find P9 of the given data below
9N/100 = 9(127)/100 = 11.43, so P9 falls in the class 120-139 (5th-14th salary).

Salary     f     <cf
100-119    4     4      (<cf = 4)
120-139    10    14     (P9 class: L = 119.5, fP9 = 10)
140-159    12    26
160-179    13    39
180-199    17    56
200-219    12    68
220-239    10    78
240-259    23    101
260-279    21    122
280-299    2     124
300-319    3     127

i = 20, N = 127
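Quartiles, deciles and percentiles for grouped data all follow one pattern, so a single function can cover them. The sketch assumes the form value = L + ((kN/parts − <cf) / f)·i, with parts = 4, 10 or 100, and whole-number class limits so that the lower class boundary is the lower limit minus 0.5. The function name is hypothetical.

```python
LOWER_LIMITS = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300]
FREQS = [4, 10, 12, 13, 17, 12, 10, 23, 21, 2, 3]

def grouped_quantile(lower_limits, freqs, width, k, parts):
    """k-th quantile of a frequency distribution split into `parts` parts."""
    n = sum(freqs)
    target = k * n / parts                   # kN/4, kN/10 or kN/100
    cum = 0                                  # <cf of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:      # first class whose <cf reaches the target
            boundary = lower - 0.5           # lower class boundary L
            return boundary + (target - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

if __name__ == "__main__":
    print(round(grouped_quantile(LOWER_LIMITS, FREQS, 20, 2, 4), 2))    # Q2 = 212.0
    print(round(grouped_quantile(LOWER_LIMITS, FREQS, 20, 3, 10), 2))   # D3 = 178.12
    print(round(grouped_quantile(LOWER_LIMITS, FREQS, 20, 9, 100), 2))  # P9 = 134.36
```

For D3, for example, the target is 38.1, the class is 160-179 with <cf = 26 before it and f = 13, so D3 = 159.5 + ((38.1 − 26)/13)(20), about 178.12.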
PROBLEM EXERCISE
Below is the distribution of the daily salary of workers in Landmark Corp.
Find Q3, D1, D3, D6, D8, P4, P23, P75, P80

Salary f
100-119 4
120-139 10
140-159 12
160-179 13
180-199 17
200-219 12
220-239 10
240-259 23
260-279 21
280-299 2
300-319 3
