Week 6-8 Module
LEARNING MODULE
GEC 4:
MATHEMATICS IN
THE MODERN
WORLD
(Weeks 6-8)
PREPARED BY:
VISION
The Technological University of the Philippines shall be the premier state university
with recognized excellence in engineering and technology at par with leading universities in
the ASEAN region.
MISSION
The University shall provide higher and advanced vocational, technical, industrial,
technological and professional education and training in industries and technology, and in
practical arts leading to certificates, diplomas and degrees.
It shall provide progressive leadership in applied research, developmental studies in
technical, industrial, and technological fields and production using indigenous materials; effect
technology transfer in the countryside; and assist in the development of small-and-medium
scale industries in identified growth center. (Reference: P.D. No. 1518, Section 2)
QUALITY POLICY
The Technological University of the Philippines shall commit to provide quality higher
and advanced technological education; conduct relevant research and extension projects;
continually improve its value to customers through enhancement of personnel competence and
effective quality management system compliant to statutory and regulatory requirements; and
adhere to its core values.
CORE VALUES
TABLE OF CONTENTS
TUP Vision, Mission, Quality Policy, and Core Values
Table of Contents
Overview
Learning Guide (Week No. 6)
Topic
Expected Competencies
Content/Technical Information
Progress Check
References
Learning Guide (Week No. 7)
Topic/s
Expected Competencies
Content/Technical Information
Progress Check
References
Learning Guide (Week No. 8)
Topic/s
Expected Competencies
Content/Technical Information
Progress Check
References
REFERENCES
About the Authors
OVERVIEW
It seems that we just started the semester and here we are now in the middle of the term.
This time, we will focus on statistics. This is important as you are going to conduct your own
mini research.
In Week 6, we will discuss measures of central tendency, dispersion, and
location, and exploratory data analysis. We will also give an introduction to research.
In Week 7, we will discuss probabilities and the normal distribution.
In Week 8, we will discuss linear regression and correlation. The topics were
also shortened to give you more time for the mini research.
May you find these ideas helpful for your research.
LEARNING GUIDE
Week No.: 6
CONTENT/TECHNICAL INFORMATION
Starting March 24, 2020, TUPVisayas and Negros Women for Tomorrow Foundation
(NWTF), an industry partner, provided regular updates of the total number of delivered face
shields. On April 1, there was a total of 955 deliveries. The daily deliveries in the rightmost
column are obtained by subtracting the previous day's total from the current day's
total. For example, 1755 – 955 = 800. The questions are: Which month (April or May)
has higher deliveries? What is the average donation in each month? What do the data look like
when presented graphically?
When speaking about average, it is the arithmetic mean that comes to mind. But strictly
speaking, the word average means the location of the center of the data set. For many, the
measures of central tendency are mean, median and mode. But Bluman and Triola included
midrange as another measure of center.
Measure of center is only one of the important characteristics of data. Triola included
variation, distribution, outliers and time as other important characteristics. Variation, along
with the measures of position can also summarize the data. The distribution or the nature of the
spread of data over the range of values is another characteristic that will be discussed in this
section. The presence of outliers in a data set is also an important consideration especially in
getting the measure of center. Lastly, time is an important consideration in data gathering
because, to some extent, it has an effect on the data.
A. Measures of Central Tendency
1. Mean – the sum of all data values divided by the number of values. The mean is
often used as the measure of central tendency because it involves all the
elements in the data set. It is used when the data are interval or ratio. It is
also used in computing other statistical tools like the standard deviation. However,
it is sensitive to extreme data values called outliers.
2. Median – the middlemost value when the data are arranged in an array. This can be used
with ordinal data.
3. Mode – the value that occurs most frequently in the data set. This is the only
measure of central tendency which can be used with nominal data.
4. Midrange – the mean of the highest and the lowest score. This is seldom used and
can be sensitive to outliers.
Example 1. A family has four members. Their monthly salaries are P14 000, P20 000, P15
000 and P18 000. Find the measures of central tendency.
1. Mean: $\mu = \dfrac{\sum x}{N} = \dfrac{14\,000 + 20\,000 + 15\,000 + 18\,000}{4} = \dfrac{67\,000}{4} = \text{P}16\,750.00$
We use the formula above because we are sure that this pertains to a
population. Formulas for a population use Greek letters. In this case, $\mu$ is the
Greek letter mu and $\Sigma$ is the uppercase Greek letter sigma, which denotes summation.
2. Median: First we arrange the numbers in an array. An array is the arrangement of
numbers in either ascending or descending order.
P14 000, P15 000, P18 000, P20 000
We get the position of the middle value by using the formula $\dfrac{N+1}{2} = \dfrac{4+1}{2} = 2.5$.
That means that the middle value is between the 2nd and 3rd terms.
Median $= \dfrac{15\,000 + 18\,000}{2} = \text{P}16\,500.00$
Example 2. These are the scores of randomly selected students in a 30-item test:
3, 2, 5, 3, 5, 4, 30
1. Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{3+2+5+3+5+4+30}{7} = \dfrac{52}{7} = 7.4$
We use the formula above because this is just sample data. The key words
are “randomly selected”. The round-off rule by Triola is to carry one more decimal
place than the original data. In this case, the data are whole numbers, so we round
to the nearest tenth.
2. Median: First we arrange the numbers in an array. 2, 3, 3, 4, 5, 5, 30
We get the position of the middle value by using the formula $\dfrac{n+1}{2} = \dfrac{7+1}{2} = 4$.
The median is 4. It is the fourth term of the array.
3. Mode: 3 and 5. They both have the highest frequency. Since there are two,
we call the data set bimodal. If there is only one mode, it is called unimodal.
4. Midrange $= \dfrac{2+30}{2} = 16$
From the data, we can see that 30 is a value that is too different from all
other scores. It is unusually high compared to the others. Extreme scores, too
low or too high, are called outliers. Outliers affect the mean and the midrange, which is
why, in their presence, we use the median.
When given a data set, inspection is the first step. Then use the appropriate
statistical tool.
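The four measures in Example 2 can also be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the module.

```python
from statistics import mean, median, multimode

# Scores from Example 2 (a sample of a 30-item test)
scores = [3, 2, 5, 3, 5, 4, 30]

sample_mean = mean(scores)                    # 52 / 7
sample_median = median(scores)                # middle value of the sorted scores
modes = sorted(multimode(scores))             # every value with the highest frequency
midrange = (min(scores) + max(scores)) / 2    # mean of lowest and highest score

print(round(sample_mean, 1), sample_median, modes, midrange)
```

Removing the outlier 30 from the list and rerunning shows how strongly the mean and midrange react, while the median barely moves.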
Figure 2
Finding mean using Microsoft Excel
2. Median
a. Highlight the column of the data set.
b. Click the “Sort and Filter” on the upper right side of Excel sheet.
c. Click “Sort Smallest to Largest”.
d. Find the middle value if the number of elements in the data set is odd.
e. If the number of elements in the data set is even, get the mean of the two
middle values.
Figure 3
Finding median using Microsoft Excel
B. Measures of Variability
Figure 4
Comparison of variability of two data sets
Note: The narrow peak has a standard deviation of 10 while the flatter curve has a standard deviation of 50. The spread of
the data corresponds to its variability.
Source: Brown, J. R., n. d.
Both data sets have 5 as mean, median, mode and midrange but they are different.
That is why, aside from measures of central tendency, we also report the measures of
variability, especially the standard deviation.
1. Range – the difference between the highest and the lowest scores
2. Standard Deviation – the square root of the variance, which averages the squared
deviations from the mean.
Formulas:
Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Let us consider the data sets from the previous page as sample data sets. This is the
step-by-step process.

Data Set A                      Data Set B
 x    x − x̄    (x − x̄)²          x    x − x̄    (x − x̄)²
 0     −5        25              4     −1         1
 2     −3         9              4     −1         1
 3     −2         4              5      0         0
 5      0         0              5      0         0
 5      0         0              5      0         0
 7      2         4              5      0         0
 8      3         9              6      1         1
10      5        25              6      1         1
      Σ(x − x̄)² = 76                  Σ(x − x̄)² = 4

Data Set A: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{76}{7}} = \sqrt{10.857} \approx 3.295$
Data Set B: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{4}{7}} = \sqrt{0.571} \approx 0.756$
Data set B is clearly less variable than data set A. We can also say it is more
homogeneous or less dispersed.
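As a cross-check on the hand computation, the sketch below uses Python's standard `statistics` module on the two data sets. Note that 76/7 ≈ 10.857 and 4/7 ≈ 0.571 are the sample variances; the sample standard deviation is their square root. The variable names are illustrative.

```python
from statistics import mean, pstdev, stdev, variance

data_a = [0, 2, 3, 5, 5, 7, 8, 10]
data_b = [4, 4, 5, 5, 5, 5, 6, 6]

for name, data in (("A", data_a), ("B", data_b)):
    print(
        name,
        mean(data),                   # both data sets have mean 5
        round(variance(data), 3),     # sample variance, divides by n - 1
        round(stdev(data), 3),        # sample standard deviation (like STDEV.S)
        round(pstdev(data), 3),       # population standard deviation (like STDEV.P)
    )
```

The smaller standard deviation of Data Set B agrees with the statement that it is the more homogeneous of the two.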
1. Below the scores, type “=stdev.p”. A list of suggestions will appear and you can simply click
“STDEV.P”.
Figure 5
Finding the population standard deviation step 1
2. Type an open parenthesis followed by the location of your data set. In this example, the
data are in cells C1 to C9. Then type a close parenthesis.
Figure 6
Finding the population standard deviation step 2
To find the sample standard deviation, follow the same sequence as for the population
standard deviation, but this time use “stdev.s”.
Figure 7
Finding the sample standard deviation
Figure 8
Finding the variance
Figure 9
Finding your relative position in the group
Measures of relative position are used less often than measures of center, but this
concept is also important.
1. Quartile – divides the data set into four equal parts
Example: 0, 2, 3, 5, 5, 7, 8, 10
There are eight numbers, so there are exactly 2 numbers in each partition.
Formula: $Q_k = \dfrac{k(n+1)}{4}$th term
a. $Q_1 = \dfrac{1(n+1)}{4}$th term $= \dfrac{1(8+1)}{4} = \dfrac{9}{4} = 2.25$, or the 2nd term $= 2$
b. $Q_2 = \dfrac{2(n+1)}{4}$th term $= \dfrac{2(8+1)}{4} = \dfrac{18}{4} = 4.5$th term
c. $Q_3 = \dfrac{3(n+1)}{4}$th term $= \dfrac{3(8+1)}{4} = \dfrac{27}{4} = 6.75$, or the 7th term $= 8$
Interquartile Range = Q3 – Q1 = 8 – 2 = 6
Formula: $D_k = \dfrac{k(n+1)}{10}$th term
$D_3 = \dfrac{3(8+1)}{10}$th term $= \dfrac{27}{10} = 2.7$, or the third term $= 3$
$D_7 = \dfrac{7(8+1)}{10}$th term $= \dfrac{63}{10} = 6.3$, or the sixth term $= 7$
$D_9 = \dfrac{9(8+1)}{10}$th term $= \dfrac{81}{10} = 8.1$, or the eighth term $= 10$
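Formula: $P_k = \dfrac{k(n+1)}{100}$th term
$P_{30} = \dfrac{30(8+1)}{100}$th term $= \dfrac{270}{100} = 2.7$, or the third term $= 3$, which is the same position as $D_3 = \dfrac{3(8+1)}{10} = 2.7$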
Median also divides the data set into two equal parts. It is both a measure of
central tendency and of location or position. It is equal to the following:
Median = Q2 = D5 = P50
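The position formulas for quartiles, deciles, and percentiles share one pattern, k(n+1) divided by the number of parts. The sketch below applies that pattern to the example data. The function name `value_at_position` is hypothetical, and two choices in it are assumptions made for illustration: a fractional position is rounded to the nearest term as in the worked examples, and a position ending in .5 is taken as the mean of the two neighboring terms, as is done for the median. It does not guard against positions that fall outside the data.

```python
from fractions import Fraction


def value_at_position(sorted_data, k, parts):
    """Value at the k(n+1)/parts-th term (1-based) of an already sorted list."""
    n = len(sorted_data)
    pos = Fraction(k * (n + 1), parts)           # exact position, e.g. 9/4 = 2.25
    lower = pos.numerator // pos.denominator     # whole-number part of the position
    frac = pos - lower
    if frac == Fraction(1, 2):
        # Halfway between two terms: average them, as with the median
        return (sorted_data[lower - 1] + sorted_data[lower]) / 2
    nearest = lower if frac < Fraction(1, 2) else lower + 1
    return sorted_data[nearest - 1]


data = [0, 2, 3, 5, 5, 7, 8, 10]

q1 = value_at_position(data, 1, 4)
q2 = value_at_position(data, 2, 4)
q3 = value_at_position(data, 3, 4)
d3 = value_at_position(data, 3, 10)
d7 = value_at_position(data, 7, 10)
d9 = value_at_position(data, 9, 10)
p50 = value_at_position(data, 50, 100)

print(q1, q2, q3, q3 - q1, d3, d7, d9, p50)
```

Spreadsheet functions and other software may interpolate between terms instead of rounding the position, so their quartiles can differ slightly from these values.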
Example: Suppose the following are the prelim scores of randomly selected students.
Find the measures of central tendency and variation then compare.
A. 34, 61, 68, 71, 73, 73, 85, 86, 89, 90
B. 60, 65, 65, 67, 73, 73, 73, 80, 84, 90
Mean
Group A: $\bar{x} = \dfrac{34+61+68+71+73+73+85+86+89+90}{10} = \dfrac{730}{10} = 73$
Group B: $\bar{x} = \dfrac{60+65+65+67+73+73+73+80+84+90}{10} = \dfrac{730}{10} = 73$
Median
Group A: Location: $\dfrac{n+1}{2}$th score $= \dfrac{10+1}{2} = 5.5$th score; $\tilde{x} = 73$
Group B: Location: $\dfrac{n+1}{2}$th score $= \dfrac{10+1}{2} = 5.5$th score; $\tilde{x} = 73$
Mode
Group A: $\hat{x} = 73$
Group B: $\hat{x} = 73$
[Figure: horizontal box plots of Group A and Group B on a common score axis from 0 to 100]
The dot in Group A is called the outlier. If you review the data set, it is 34. The lowest
score after the outlier is 61. The first quartile is 68 and the third quartile is 86. The highest
number in the distribution is 90. Based on the position of the median, the box is right
skewed: the median (73) lies closer to the first quartile than to the third quartile.
For Group B data, the lowest score is 60. The first quartile is 65 (the third score, by the
quartile position formula), the median is 73 and the third quartile is 80. The highest score
is 90. This time, the data is roughly symmetric because the left and right lines are almost of
the same length.
The vertical box will provide the same information as the horizontal box plot.
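The outlier in Group A can also be confirmed numerically. The sketch below assumes the common 1.5 × IQR fence rule for flagging outliers. That rule is a convention that the module does not state, so treat it as an assumption. The quartiles 68 and 86 are the values quoted for Group A.

```python
# Group A prelim scores and the quartiles quoted in the discussion
group_a = [34, 61, 68, 71, 73, 73, 85, 86, 89, 90]
q1, q3 = 68, 86

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr       # values above this are flagged as outliers

outliers = [x for x in group_a if x < lower_fence or x > upper_fence]
inside = [x for x in group_a if lower_fence <= x <= upper_fence]

# The whiskers of the box plot end at the most extreme non-outlier scores
whisker_low, whisker_high = min(inside), max(inside)

print(outliers, whisker_low, whisker_high)
```

With these inputs the only flagged score is 34, and the whiskers end at 61 and 90, which matches the description of the Group A box plot.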
INTRODUCTION TO RESEARCH
Research - careful, systematic, patient study and investigation in some field of knowledge.
Research Types
• Action research
• Survey research
• Historical research
• Experimental research
• Correlational research
• Ethnographic research
• Causal-comparative research
• The basic ethical question for all researchers to consider is whether any physical or
psychological harm could come to anyone as a result of the research.
• All respondents/ participants/ informants in a research study should be assured that
any data collected from or about them will be held in confidence.
• The term deception, as used in research, refers to intentionally misinforming the
respondents/ participants/ informants as to some or all aspects of the research topic.
• Plagiarism is the act of misrepresenting someone else’s work as one’s own.
Unintentional plagiarism can be avoided through the proper use and citation of
published and unlisted sources.
Informed Consent – obtained from research participants to protect them from harm if there
is a possibility of risk exposure.
• All participants should be assured that any data collected from or about them will be
confidential. They have the right to withdraw from the study or to request that data
collected about them will not be used.
Exemptions from Guidelines (U.S. Department of Health and Human Services)
1. Research conducted in educational settings, such as instructional strategy research or
studies on the effectiveness of educational techniques, curricula, or classroom
management methods.
2. Research using educational tests (cognitive, diagnostic, aptitude, and achievement),
provided that subjects remain anonymous.
3. Survey or interview procedures, except where all of the following conditions prevail:
a. Participants could be identified.
b. Participants’ responses, if they became public, could place the participants at risk
of criminal or civil charges or could affect the participants’ financial or
occupational standing.
c. Research involves “sensitive aspects” of the participant’s behavior, such as
illegal conduct, drug use, sexual behavior, or alcohol use.
4. Observation of public behavior (including observation by participants), except where
all three of the conditions listed in item 3 above are applicable.
5. The collection or study of documents, records, existing data, pathological specimens,
or diagnostic specimens if these sources are available to the public or if the
information obtained from the sources remains anonymous.
Types of Sources
1. General reference tools - include indexes or abstracts
2. Primary sources – researchers report the results of their studies directly to the reader.
3. Secondary sources - refer to publications in which authors describe the work of
others.
Variable — a noun that stands for variation within a class of objects
Data – the kinds of information researchers obtain on the subjects of their research.
Instrumentation – the whole process of preparing to collect data. It involves the selection or
design of the instruments and the procedures and the conditions under which the
instruments will be administered.
Acquiring an Instrument
▪ Find and administer a previously existing instrument of some sort
▪ Administer an instrument the researcher personally developed or had developed by
someone else.
Types of Instruments
• There are many types of researcher-completed instruments. Some of the more
commonly used are rating scales, interview schedules, observation forms, tally sheets,
flowcharts, performance checklists, anecdotal records, and time-and-motion logs.
• Many types of instruments are completed by the subjects of a study rather than the
researcher. Some of the more commonly used of this type are questionnaires; self-
checklists; attitude scales; personality inventories; achievement, aptitude, and
performance tests; and projective and socio-metric devices.
• The types of items or questions used in subject-completed instruments can take many
forms, but they all can be classified as either selection or supply items.
Norm-Referenced Versus Criterion-Referenced Instruments
Norm-Referenced Instruments – provide scores that compare individual scores to the
scores of an appropriate reference group
Criterion-Referenced Instruments –based on a specific target for each learner to achieve
Usability
▪ How easy will it be to use any instrument the researcher designs or selects?
▪ How long will it take to administer?
▪ Are the directions clear?
▪ Is it appropriate for the ethnic or other groups to whom it will be administered?
3. Obtain more information on the details of the study—that is, where and when it takes
place, extraneous events that occur, and so on. This helps control for location,
instrumentation, history, subject attitude, and implementation threats.
4. Choose an appropriate design. The proper design can do much to control these threats
to internal validity.
EXTERNAL VALIDITY (GENERALIZABILITY) – the extent that the results of a study
can be generalized from a sample to a population.
PROGRESS CHECK
From each section, 5 were randomly selected and their partial scores were checked. Which
section performed best? Which section is the most homogeneous? Which is the most
heterogeneous?
REFERENCES:
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.
Fraenkel, J., Wallen, N., & Hyun, H. (2012). How to Design and Evaluate Research in
Education (10th Ed.). New York: The McGraw-Hill Companies, Inc.
Montgomery, D. & Runger, G. (2007). Applied Statistics and Probability for Engineers (4th
Ed.). Danvers: John Wiley & Sons (Asia) Pte Ltd.
Triola, M. (2012). Elementary Statistics (Custom Edition). Pearson.
LEARNING GUIDE
Week No.: 7
EXPECTED COMPETENCIES: At the end of this lesson, you must have:
1. classified various distribution shapes;
2. identified the characteristics of a normal distribution;
3. solved for the z-score and probability values; and
4. classified curves with regard to skewness and kurtosis.
CONTENT/TECHNICAL INFORMATION
You are familiar with the “COVID” curves below. In these curves, the x-axis represents the
number of infections. Their shapes look like a normal distribution curve, but they are not. We
will present to you the characteristics of the normal curve and you will determine the reason
why we cannot consider the figures below as normal curves.
Figure 1
The “COVID” Curve
Distribution Shapes
Let us suppose that in a certain program, there are six sections. Look at the shapes of
the distributions of each of these sections. Take note that this is NOT an accurate histogram
because the first bar should start with 5 on the x-axis. This graph overlapped the values of zero
and five on the origin. Besides, the x-axis should use the class boundaries.
Figure 2
Bell-shaped
Figure 3
Right skewed
Figure 4
Left skewed
Figure 5
Uniform
Figure 6
Bimodal
Figure 7
U-shaped
The first distribution shape is bell-shaped, which is also known as the normal
distribution or Gaussian distribution, as it is named after Carl Friedrich Gauss (1777-1855)
who derived its equation. You can see this curve on the encircled portion of the bill in Figure
8, which honors Gauss.
Figure 8
The German bill that displays Gauss and the normal distribution
Source: banknotes.com
We will focus first on the normal distribution. Bluman (2012) defines the normal
distribution as a continuous, symmetric, bell-shaped distribution of a variable. According to
Montgomery and Runger (2014), the normal distribution is the most widely used model for a
continuous measurement. An example is an automotive engineer who may plan to study the
average pull-off force measurements from several connectors. When a random experiment is
replicated, the average result over the replicates tends to follow a normal distribution.
2. The mean, median and mode are equal and are located at the center of the distribution.
The normal distribution curve below (figure 10) has a mean, median and mode which
are all equal to 28. If we are going to solve for the mean of all scores, it will be 28. You
can see that 28 is also at the middle of the numbers when arranged from lowest to
highest, which is the median. The mode is the highest point in the distribution.
Figure 8
The normal distribution
4. The curve is symmetric about the mean, that is, its shape is the same on both sides of
a vertical line passing through the center. (Figure 10)
5. The curve is continuous; there are no gaps or holes. For each value of X, there is a
corresponding value of Y.
The normal distributions presented earlier came from quiz scores.
Scores are treated as continuous because they measure how much knowledge is
attained by a group of students. Discrete variables have different distributions, as we
have discussed in learning guide 2.
6. The curve is asymptotic. It never touches the x axis. Theoretically, no matter how far
the curve extends in either direction, it never meets the x axis—but it gets
increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or 100%. This fact
may seem unusual, since the curve never touches the x axis, but one can prove it
mathematically by using calculus.
We use a table of values to identify the area under the normal curve (See table 1 on
the next page.)
8. The area under the part of a normal curve that lies within 1 standard deviation of the
mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or
95%; and within 3 standard deviations, about 0.997, or 99.7%. See figure 11.
Figure 9
The empirical (or 68-95-99.7) rule
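The three percentages of the empirical rule can be reproduced from the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the dictionary name `areas` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# Area within k standard deviations of the mean: P(-k < z < k)
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% quoted in the rule.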
To identify the area of the normal curve easily, we standardize the distribution. The
standard normal distribution is a normal probability distribution with a mean of 0 and a
standard deviation of 1. The total area under its density curve is equal to 1. The formula
is $z = \dfrac{x - \mu}{\sigma}$ for population data or $z = \dfrac{x - \bar{x}}{s}$ for sample data. See figure 12.
Figure 10
Converting to a standard normal distribution
Figure 11
Finding for the area under the normal curve
Figure 11 shows how to get the area of the normal distribution. The green example
shows that if z = 2.01, the area under the normal curve is 0.9778 or 97.78%. The red example
shows that if z = 1.27, the area under the normal curve is 0.8980 or 89.80%.
Figure 12
Interpreting z-scores
Example 1
Suppose that the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 9 milliamperes and a variance of 4 (milliamperes)². What is the
probability that a measurement (a) is below 8 milliamperes, (b) is between 8 and 12 milliamperes,
and (c) exceeds 12 milliamperes?
Given:
Let x denote the current in milliamperes.
x̅ = 9 s² = 4
Required:
a. P (x < 8) b. P (8 < x < 12) c. P (x > 12)
Formula: $z = \dfrac{x - \bar{x}}{s}$
Since the given is expressed as a variance, and we know that the variance is the square of the
standard deviation, our standard deviation is equal to 2.
Solution:
a. $z_1 = \dfrac{x - \bar{x}}{s} = \dfrac{8 - 9}{2} = \dfrac{-1}{2} = -0.5$
Since our z-score is negative, let us take the value from the table (Figure 18)
on the next page. P (z < -0.5) = 0.3085 or 30.85%
To illustrate this, we can see in the figure that -0.5 is to the left of zero, before -1. We shade
the left portion because we are interested in the scores which are less than -0.5. To make
it easy for you, the shading will correspond to the “arrow”. For example, in the less-than sign <,
the pointed portion is on the left. Or simply, for less than, shade the left part.
Figure 13
P (z < -0.5) = 30.85%
Z Score
b. P (8 < x < 12). We are looking for the values between 8 and 12. Since we already
know the value of P (x < 8), let us find the value of P (x < 12). Take note that
the values in the normal distribution table always give the area to the left. (See Figure 19.)
$z_2 = \dfrac{x - \bar{x}}{s} = \dfrac{12 - 9}{2} = \dfrac{3}{2} = 1.5$; P (z < 1.5) = 0.9332 = 93.32%
P (8 < x < 12) = P (-0.5 < z < 1.5) = 93.32% – 30.85% = 62.47%
Figure 14
P (-0.5 < z < 1.5) = 62.47%
c. P (x > 12). Since we know that P (x < 12) = P (z < 1.5) = 0.9332, we subtract
it from 1. This subtraction from one is because the total area under the normal curve is
1 or 100%. Thus P (x > 12) = P (z > 1.5) = 1 – P (z < 1.5) = 100% – 93.32% = 6.68%.
Figure 15
P (z > 1.5) = 6.68%
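The three probabilities of Example 1 can be verified without the table. The sketch below uses `statistics.NormalDist` from the Python standard library with the example's mean of 9 and standard deviation of 2; the variable names are illustrative.

```python
from statistics import NormalDist

# Current in milliamperes: mean 9, variance 4, so standard deviation 2
current = NormalDist(mu=9, sigma=2)

p_below_8 = current.cdf(8)                       # (a) P(x < 8)
p_between = current.cdf(12) - current.cdf(8)     # (b) P(8 < x < 12)
p_above_12 = 1 - current.cdf(12)                 # (c) P(x > 12)

print(f"{p_below_8:.4f} {p_between:.4f} {p_above_12:.4f}")
```

The three areas add up to 1, which is a quick way to catch an arithmetic slip in any one of them.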
Figure 16
The standard normal distribution part 1
Source: Bluman
Figure 17
The standard normal distribution part 2
Source: Bluman
Example 2
The line width for semiconductor manufacturing is assumed to be normally distributed with a
mean of 0.4 micrometer and a standard deviation of 0.04 micrometer. What is the probability
that (a) a line width is greater than 0.52 micrometer? (b) a line is between 0.32 and 0.35
micrometer? and (c) the line width of 90% of samples is below what value?
Given:
Let x denote the line width in micrometers.
μ = 0.4 σ = 0.04
Required:
a. P (x > 0.52) b. P (0.32 < x < 0.35) c. the value below which the line width of 90% of samples falls
Formula: $z = \dfrac{x - \mu}{\sigma}$
a. $z_1 = \dfrac{x - \mu}{\sigma} = \dfrac{0.52 - 0.4}{0.04} = \dfrac{0.12}{0.04} = 3.0$
Figure 18
P (z > 3.0) = 0.13%
Figure 19
P (-2.00 < z < -1.25) = 8.28%
To answer this, we are going to find 90% in our normal distribution table. There is
no exact value of 90%; therefore, we are going to consider the area nearest to 0.9,
which is 0.8997 and corresponds to z = 1.28, as shown in figure 22.
Source: Bluman
Then we are going to substitute this into our formula, $z = \dfrac{x - \mu}{\sigma}$:
$1.28 = \dfrac{x - 0.4}{0.04}$
$x = 1.28(0.04) + 0.4 = 0.0512 + 0.4 = 0.4512$
Therefore, the line width of 90% of samples of the semiconductor is below 0.4512 micrometer.
Thus, we can use the derived formula $x = z\sigma + \mu$ to find the value of x.
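All three parts of Example 2 can be sketched with `statistics.NormalDist`, whose `inv_cdf` method plays the role of reading the table backwards. The variable names are illustrative. Because the code uses the exact z value rather than the rounded table value 1.28, its answer to part (c) differs from the hand computation in the fourth decimal place.

```python
from statistics import NormalDist

# Line width in micrometers: mean 0.4, standard deviation 0.04
width = NormalDist(mu=0.4, sigma=0.04)

p_a = 1 - width.cdf(0.52)                  # (a) P(x > 0.52), i.e. P(z > 3.0)
p_b = width.cdf(0.35) - width.cdf(0.32)    # (b) P(0.32 < x < 0.35), i.e. P(-2.00 < z < -1.25)
x_90 = width.inv_cdf(0.90)                 # (c) width below which 90% of samples fall

print(f"{p_a:.4f} {p_b:.4f} {x_90:.4f}")
```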
PROGRESS CHECK
1. _______________________
2. _______________________
3. _______________________
4. _______________________
5. _______________________
Note: Images from Emory Oxford College
II. Match column A with column B. Write the CAPITAL letter on the space provided
before each number. (5 points)
Column A
______1. The left side is a mirror image of its right side
______2. The tails approach the x-axis but they do not meet.
______3. There are no gaps.
______4. The peakedness or flatness of a distribution
______5. Has only one mode
Column B
A. Asymptotic
B. Continuous
C. Kurtosis
D. One-modal
E. Skewness
F. Symmetric
G. Unimodal
III. Show neat and complete solution. (20 points)
The average fuel efficiency of U.S. light vehicles (cars, SUVs, minivans, vans, and
light trucks) for 2005 was 21 miles per gallon (mpg). If the standard deviation of
the population was 2.8 mpg and the gas ratings were normally distributed, (a) what is the
probability that the mean fuel efficiency of a random sample of 25 light vehicles is under 18 mpg?
(b) between 20 and 24 mpg?
Given:
Required:
Solution:
a.
b.
IV. Why is it that the “COVID curve” cannot be considered as a normal distribution?
(Content: 8 points, organization of ideas: 2 points)
V. What are the important concepts about the Central Limit Theorem? (10 points)
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.
Devore, J. (2012). Probability and Statistics for Engineering and the Sciences (8th Ed.).
Brooks/Cole, Cengage Learning
Glen, S. (n.d.). Kurtosis: Definition, Leptokurtic, Platykurtic. StatisticsHowTo.com
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers (6th
Ed.). John Wiley & Sons, Inc.
Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/
Watkins, J. (n.d.). An Introduction to the Science of Statistics: From Theory to
Implementation.
LEARNING GUIDE
Week No.: 8
CONTENT/TECHNICAL INFORMATION
One of the commonly used statistical tools is correlation. Bluman (2012) defines
correlation as a statistical method used to determine whether a linear relationship exists
between variables. Pierce (2020) emphasizes that correlation is NOT causation. This means
that a correlation between two variables does not, by itself, show that one causes the
other.
Figure 1
Correlation
(Source: mathsisfun.com)
Bluman explains that there are two variables in a simple relationship: the independent
(explanatory or predictor) variable and the dependent (response) variable. The determination
of x and y variables is not always clear-cut and may sometimes be arbitrary. As you can see on
Figure 1, a relationship can either be positive or negative. A positive relationship occurs when
both variables increase or decrease at the same time. For example, height and weight generally
form a positive relationship. Usually, the taller the person, the heavier he or she is. In a negative
relationship, as one variable increases, the other decreases, and vice-versa. For example, many
studies show that test anxiety has a negative relationship with test scores.
According to Ott and Longnecker (2004), the stronger the correlation, the better x
predicts y.
1. Highlight the data set that you are checking for relationship.
2. Click “Insert”.
3. Click “Chart”
4. Choose which scatterplot you will use.
5. Once you have the scatterplot, you can right-click the points and click “add trendline”
so that you can clearly see the relationship of your data sets.
Figure 2
Constructing Scatterplot using Microsoft Excel
This is the scatterplot when we graph the prelim and midterm examination scores. The
correlation is r = 0.649922. We can see that the trendline is also rising to the right, which is
consistent with a strong positive correlation.
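Figure 3
The Scatterplot of the Two Variables in Table 1 on Page 7
[Scatterplot of the prelim and midterm examination scores with trendline; the horizontal axis runs from 0 to 100 and the vertical axis from 0 to 80]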
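For readers who want to see what the correlation coefficient computes, the sketch below implements Pearson's r from its definition. The function name `pearson_r` and the five pairs of values are hypothetical and serve only as an illustration; they are not the prelim and midterm scores used in this module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship. In Excel, the CORREL function returns the same quantity.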
Historically, Karl Pearson, after whom the correlation coefficient is named, made the following
contributions to statistics (Bluman):
• pioneered research in the area of correlation
• Histogram
• Mode
• Introduced the statistical concepts of the range, standard deviation, and coefficient of
variation
Regression Analysis
- Used if the value of correlation coefficient is significant. In the determination of the
equation of the regression line, the researcher is able to see the trend and make
predictions based on the data.
Assumptions:
1. The relationship between x and y is linear.
2. The y is distributed normally at each value of x.
3. The variance of y at every value of x is the same (homogeneity of variance)
4. The observations are independent.
Guidelines:
1. If there is no significant linear correlation, don’t use regression to predict.
2. When using the regression equation for predictions, stay within the scope of the
available sample data.
3. A regression equation based on old data is not necessarily valid at present.
4. Don’t make predictions about a population that is different from the population
from which the sample data set was drawn.
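The equation of the regression line can be sketched with the least-squares formulas for the slope and intercept. The function name `fit_line` and the data are hypothetical and serve only as an illustration. Following guideline 2, the prediction is made at an x value inside the range of the sample data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
predicted = intercept + slope * 4         # prediction at x = 4, inside the range 1 to 5

print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

In Excel, the SLOPE and INTERCEPT functions, or the trendline equation option of a scatterplot, give the same two numbers.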
Figure 4
Regression Analysis Using Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.649922
R Square             0.422399
Adjusted R Square    0.39031
Standard Error       9.818107
Observations         20

ANOVA
              df    SS          MS          F           Significance F
Regression     1    1268.886    1268.886    13.16337    0.001922972
Residual      18    1735.114    96.39523
Total         19    3004
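The entries of the summary output are related to one another, and the relationships can be checked from the ANOVA sums of squares alone. The sketch below uses the standard formulas for simple linear regression with one predictor; the variable names are illustrative, and the numbers are copied from the output.

```python
from math import sqrt

# Values copied from the ANOVA part of the summary output
n = 20
ss_regression = 1268.886
ss_residual = 1735.114
df_regression = 1
df_residual = n - 2

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total                  # share of variation explained
multiple_r = sqrt(r_square)                          # equals |r| with one predictor
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = sqrt(ms_residual)                   # standard error of the estimate

print(round(multiple_r, 6), round(r_square, 6), round(adjusted_r_square, 5))
print(round(f_stat, 5), round(standard_error, 6))
```

An R Square of about 0.42 means that roughly 42% of the variation in the dependent variable is explained by the regression line. Because Significance F (0.0019) is below 0.05, the linear relationship is significant at the 5% level.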
PROGRESS CHECK
During the second semester of SY 2019 – 2020, students in Mathematics in the Modern
World were asked to rate their extent of preparation and level of difficulty. Five classes were
considered for this study. Suppose 15 students from each class were randomly selected, (1) is
there a significant relationship between their prelim and midterm scores? (2) is there a
significant relationship between their extent of preparation and prelim scores? (3) is there a
significant relationship between their extent of preparation and midterm scores? (4) is there a
significant relationship between their level of difficulty and prelim scores? (5) is there a
significant relationship between their level of difficulty and midterm scores? (6) is there a
significant relationship between the extent of preparation and level of difficulty in their
prelim scores? (7) is there a significant relationship between the extent of preparation and
level of difficulty in their midterm scores?
Legend:
PES – prelim exam scores
MES – midterm exam scores
EP – Extent of preparation
LD – level of difficulty
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.
Devore, J. (2012). Probability & Statistics for Engineering and the Sciences (8th Ed.).
Brooks/Cole, Cengage Learning
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers
(6th Ed.). John Wiley & Sons, Inc.
Ott, L. & Longnecker, M. (2004). A First Course in Statistical Methods. Thomson-
Brooks/Cole
Pierce, R. (2019). Percentiles. http://www.mathsisfun.com/data/percentiles.html
Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/
www.ablebits.com
https://www.statisticshowto.com/excel-regression-analysis-output-explained/
www.statstutor.ac.uk. Pearson’s Correlation.
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.
Devore, J. (2012). Probability & Statistics for Engineering and the Sciences (8th Ed.).
Brooks/Cole, Cengage Learning
Fraenkel, J., Wallen, N., & Hyun, H. (2012). How to Design and Evaluate Research in
Education (10th Ed.). New York: The McGraw-Hill Companies, Inc.
Glen, S. (n.d.). Kurtosis: Definition, Leptokurtic, Platykurtic. StatisticsHowTo.com
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers
(6th Ed.). John Wiley & Sons, Inc.
Ott, L. & Longnecker, M. (2004). A First Course in Statistical Methods. Thomson-
Brooks/Cole
Pierce, R. (2019). Percentiles. http://www.mathsisfun.com/data/percentiles.html
Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/
Watkins, J. (n.d.). An Introduction to the Science of Statistics: From Theory to
Implementation.
www.ablebits.com
https://www.statisticshowto.com/excel-regression-analysis-output-explained/
www.statstutor.ac.uk. Pearson’s Correlation.