Week 5A - Statistics Handout

STATISTICS

Prepared by: Mr. Vince William A. Cabotaje, LPT


Top 2, LEPT January 2022

VARIABLES
A variable is any information, attribute, characteristic, number, or quantity that describes a person, place,
event, thing, or idea and that can be measured or counted.

[Classification diagram] Variables are divided into Qualitative Variables and Quantitative Variables; Quantitative Variables are further divided into Discrete and Continuous.

Qualitative Variables
✓ expresses a categorical attribute
✓ answers the question “what kind?”
Examples: Religion, Gender, Hair Color

Quantitative Variables
✓ numerical data
✓ answers the questions “how much” and “how many”

Discrete
• counted; finite or countably infinite
Examples: Number of students enrolled in CBRC

Continuous
• measured; uncountably infinite
Examples: Height, Weight, Temperature
LEVELS OF MEASUREMENT
MEASUREMENT – assignment of numerals according to rules that give these numerals quantitative meaning.

1. Nominal – merely aims to identify or label a class of a variable.
   Example: numbers carried on the backs of athletes
2. Ordinal – numbers are used to express ranks or to denote position in an ordering.
   Example: Oliver ranked 1st in his class while Donna ranked 2nd
3. Interval – assumes equal intervals or distances between any two values, starting at an arbitrary zero.
   Examples: Fahrenheit and Centigrade measures of temperature
   ❖ The zero point does not mean an absolute absence of warmth or cold, and a zero on a test does not mean a complete absence of learning.
4. Ratio – has all the characteristics of the interval scale except that it has an absolute zero point.
   Examples: height, weight
   ❖ A zero weight means no weight at all.

HYPOTHESIS
A hypothesis is a tentative explanation for a set of facts that can be tested by further investigation. There are
basically two types, namely, the null hypothesis and the alternative hypothesis.
Null Hypothesis
The null hypothesis is generally denoted by 𝐻0. It states the opposite of what the investigator or
experimenter predicts or expects: that there is no actual relationship between the variables.

$H_0: \bar{x}_1 = \bar{x}_2$

Alternative Hypothesis
The alternative hypothesis is generally denoted by 𝐻𝑎. It states a potential result or outcome that the
investigator or researcher may expect. It is categorized into two types: the directional alternative hypothesis
and the non-directional alternative hypothesis.

$H_a: \bar{x}_1 \neq \bar{x}_2$
o Directional Hypothesis – a kind that specifies the direction of the expected findings. Sometimes this
type of alternative hypothesis is developed to examine the relationship among the variables rather
than a comparison between the groups.
o Non-directional Hypothesis – a kind in which no definite direction of the expected findings is
specified.
GRAPHS
1. Line Graph
Line graphs illustrate how related data changes over a specific period of time.
One axis might display a value, while the other axis shows the timeline. Line
graphs are useful for illustrating trends such as temperature changes during
certain dates.

2. Bar Graph
Bar graphs offer a simple way to compare numeric values of any kind, including
inventories, group sizes and financial predictions. Bar graphs can be either
horizontal or vertical. One axis represents the categories, while the other
represents the value of each category. The height or length of each bar relates
directly to its value. Marketing companies often use bar graphs to display ratings
and survey responses.

3. Pictograph
A pictograph uses pictures or symbols to display data instead of bars. Each
picture represents a certain number of items. Pictographs can be useful when
you want to display data in a highly visual presentation such as an infographic.
For example, you could use a picture of a book to display how many books a
store sold over a period of a few months.

4. Histogram
A histogram is another type of bar graph that illustrates the distribution of
numeric data across intervals (bins) rather than categories. People often use histograms to illustrate
statistics. For example, a histogram might display how many people belong to a
certain age range within a population. The height or length of each bar in the
histogram shows how many people are in each category.

5. Area Graph
Area graphs show a change in one or more quantities over a certain period of
time. They often help when displaying trends and patterns. Similar to a line
graph, area graphs use dots connected by a line. However, an area graph involves
coloring between the line and the horizontal axis. You can use several lines and
colors between each one to show how multiple quantities add up to a whole.

6. Scatter Plot
Scatter plots use dots to depict the relationship between two different variables.
Someone might use a scatter plot graph to show the relationship between a
person’s height and weight, for example. The process involves plotting one
variable along the horizontal axis and the other variable along the vertical axis.
The resulting scatter plot demonstrates how much one variable affects the other.
If there is no correlation, the dots appear in random places on the graph. If there
is a strong correlation, the dots are close together and form a line through the
graph.
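For illustration, here is a minimal Python sketch (assuming matplotlib is installed; the category labels and data values are made up) that draws a bar graph, a histogram, and a scatter plot:

# Minimal sketch: bar graph, histogram, and scatter plot from hypothetical data.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Bar graph: compares numeric values across categories
axes[0].bar(["A", "B", "C"], [12, 7, 9])
axes[0].set_title("Bar graph")

# Histogram: distribution of numeric data across bins
ages = [21, 22, 22, 23, 25, 25, 26, 28, 30, 31, 34, 35, 40, 41, 45]
axes[1].hist(ages, bins=5)
axes[1].set_title("Histogram")

# Scatter plot: relationship between two variables (e.g., height vs. weight)
heights = [150, 155, 160, 165, 170, 175, 180]
weights = [48, 52, 55, 60, 65, 70, 78]
axes[2].scatter(heights, weights)
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()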
MEASURES OF CENTRAL TENDENCY/LOCATION/POINT

• Mean
The mean (or average) is the most popular and well known measure of central tendency. It can be used with
both discrete and continuous data, although its use is most often with continuous. The mean is equal to the
sum of all the values in the data set divided by the number of values in the data set.
$\bar{x} = \dfrac{\sum x}{n}$
• Median
The median is the middle score for a set of data that has been arranged in order of magnitude. The median is
less affected by outliers and skewed data.

If 𝑛 is odd: $Md = \left(\dfrac{n+1}{2}\right)^{\text{th}}$ term

If 𝑛 is even: $Md$ = the average of the $\left(\dfrac{n}{2}\right)^{\text{th}}$ and $\left(\dfrac{n}{2}+1\right)^{\text{th}}$ terms

• Mode
The mode is the most frequent score in the data set. On a bar chart or histogram, it corresponds to the highest
bar. You can, therefore, sometimes consider the mode to be the most popular option.

Example:
The given data show the scores obtained by different players in a match. What are the mean, median,
and mode of the given data?
80, 52, 40, 52, 70, 1, 8
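A worked solution to this example: arranging the scores in order gives 1, 8, 40, 52, 52, 70, 80. The mean is (80 + 52 + 40 + 52 + 70 + 1 + 8) / 7 = 303 / 7 ≈ 43.29. Since n = 7 is odd, the median is the ((7 + 1) / 2)th = 4th term of the ordered data, which is 52. The mode is 52, since it appears twice while every other score appears only once.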

MEASURES OF VARIABILITY

• Range – the difference between the highest and lowest values


𝑹𝒂𝒏𝒈𝒆 = 𝑯𝒊𝒈𝒉𝒆𝒔𝒕 𝑽𝒂𝒍𝒖𝒆 − 𝑳𝒐𝒘𝒆𝒔𝒕 𝑽𝒂𝒍𝒖𝒆

• Interquartile Range – the range of the middle half of a distribution


𝑰𝑸𝑹 = 𝑸𝟑 − 𝑸𝟏
• Variance – the average of the squared distances from the mean

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

• Standard Deviation – the square root of the variance; roughly, the typical distance of the data from the mean

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Interpretations:
Larger standard deviation → data are more spread out, farther from the mean, and heterogeneous.
Smaller standard deviation → data are more clustered, closer to the mean, and homogeneous.
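As a rough illustration, here is a minimal Python sketch using only the standard library's statistics module (the data are the hypothetical scores from the earlier example; note that different quartile conventions can give slightly different IQR values):

# Minimal sketch: range, IQR, sample variance, and sample standard deviation.
import statistics

data = [80, 52, 40, 52, 70, 1, 8]             # hypothetical scores (same as the example above)

data_range = max(data) - min(data)            # Range = Highest Value - Lowest Value
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default 'exclusive' method)
iqr = q3 - q1                                 # IQR = Q3 - Q1
s_squared = statistics.variance(data)         # sample variance, divides by n - 1
s = statistics.stdev(data)                    # sample standard deviation = sqrt(variance)

print(data_range, iqr, s_squared, s)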

MEASURES OF RELATIVE POSITIONS

• Percentiles
Percentiles are measures that divide an arranged data set (from smallest to largest) into 100 equal parts. They are
denoted by 𝑃𝑘 , where 𝑘 = 1, 2, 3, …, 99.
Example: a score at P70 is higher than 70% of the scores in the data set.

• Quartiles
Quartiles are special cases of percentiles. These are measures that divide an arranged data set (from smallest
to largest) into four equal parts. There are three quartiles: 𝑄1 the lower quartile, 𝑄2 the median or middle
quartile, and 𝑄3 the upper quartile.
First Quartile 𝑄1 = 𝑃25
Second Quartile 𝑄2 = 𝑃50
Third Quartile 𝑄3 = 𝑃75

• Deciles
Deciles are also special cases of percentiles. These are measures that divide an arranged data set (from
smallest to largest) into ten equal parts. They are denoted as 𝐷𝑘 , where 𝑘 = 1, 2, … ,9
First Decile 𝐷1 = 𝑃10
Second Decile 𝐷2 = 𝑃20
Third Decile 𝐷3 = 𝑃30
Fourth Decile 𝐷4 = 𝑃40
Fifth Decile 𝐷5 = 𝑃50 and so on…
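For illustration, here is a minimal Python sketch (assuming NumPy is installed and using hypothetical, already-arranged scores; different software may apply slightly different interpolation rules):

# Minimal sketch: locating percentiles, quartiles, and deciles with NumPy.
import numpy as np

data = np.array([1, 8, 40, 52, 52, 70, 80])    # hypothetical scores, arranged from smallest to largest

p70 = np.percentile(data, 70)                  # 70th percentile (P70)
q1, q2, q3 = np.percentile(data, [25, 50, 75]) # quartiles: Q1 = P25, Q2 = P50, Q3 = P75
d1 = np.percentile(data, 10)                   # first decile: D1 = P10

print(p70, q1, q2, q3, d1)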
• z-scores
The standard score or z-score measures how many standard deviations a given value 𝑥 is above or below the
mean. A positive z-score indicates that the score or observed value is above the mean, whereas a negative z-
score indicates that the score or observed value is below the mean.
$z = \dfrac{x - \bar{x}}{s}$
where
𝑧 = standard score 𝑥 = raw score or observed value
𝑥̅ = sample mean 𝑠 = sample standard deviation
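For example, with hypothetical values: if a student's raw score is x = 85 on a test where the sample mean is x̄ = 75 and the sample standard deviation is s = 5, then z = (85 − 75) / 5 = 2, so the score lies two standard deviations above the mean.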

CORRELATION
Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning
they change together at a constant rate). It's a common tool for describing simple relationships without making a
statement about cause and effect.
STATISTICAL TESTS

• z-test
In a z-test, we assume the sample is normally distributed. A z-score is calculated with population parameters
such as population mean and population standard deviation. We use this test to validate a hypothesis that
states the sample belongs to the same population. This is used for hypothesis testing with large sample sizes.
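As a rough sketch of the idea (the sample mean, population values, and sample size below are all hypothetical, and SciPy is assumed to be available), a one-sample z statistic can be computed directly and converted to a two-tailed p-value:

# Minimal sketch: one-sample z-test with hypothetical population parameters.
import math
from scipy.stats import norm

sample_mean = 72.0   # hypothetical sample mean
pop_mean = 70.0      # hypothesized population mean under H0
pop_std = 8.0        # known population standard deviation
n = 100              # large sample size

z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))  # z = (sample mean - mu0) / (sigma / sqrt(n))
p_value = 2 * norm.sf(abs(z))                             # two-tailed p-value

print(z, p_value)    # z = 2.5, p ≈ 0.012, so H0 would be rejected at α = 0.05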

• t-test
We use a t-test to compare the mean of two given samples. Like a z-test, a t-test also assumes a normal
distribution of the sample. When we don’t know the population parameters (mean and standard deviation), we
use a t-test. This is also used for hypothesis testing with small sample sizes.

Three Versions of a t-test


1. Independent sample t-test: compares the means of two independent groups
2. Paired sample t-test: compares means from the same group at different times
3. One sample t-test: tests the mean of a single group against a known mean
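A minimal SciPy sketch of the three versions listed above, using hypothetical score lists:

# Minimal sketch: the three versions of the t-test with SciPy.
from scipy import stats

group_a = [70, 72, 68, 75, 71]   # hypothetical scores, group A
group_b = [65, 66, 70, 64, 68]   # hypothetical scores, group B
before  = [60, 62, 65, 58, 63]   # hypothetical pre-test scores (one group)
after   = [66, 65, 70, 61, 68]   # hypothetical post-test scores (same group)

t1, p1 = stats.ttest_ind(group_a, group_b)   # 1. independent sample t-test
t2, p2 = stats.ttest_rel(before, after)      # 2. paired sample t-test
t3, p3 = stats.ttest_1samp(group_a, 70)      # 3. one sample t-test against a known mean of 70

print(p1, p2, p3)    # compare each p-value with the chosen significance level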

• Chi-Square
A chi-square test is a statistical test used to compare observed results with expected results. We use the chi-
square test to compare categorical variables. The purpose of this test of significance is to determine the
difference between the observed and expected frequencies of certain observations.

Two Types of Chi-Square Test


1. Goodness of fit test: determines if a sample matches the population
2. Chi-square test for two independent variables: used to compare two variables in a contingency
table to check whether the observed frequencies fit those expected if the variables were independent

✓ A small chi-square value means that the data fit.
✓ A large chi-square value means that the data do not fit.
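A minimal SciPy sketch of both types, using made-up observed frequencies and a made-up contingency table:

# Minimal sketch: chi-square goodness-of-fit and chi-square test for two variables.
from scipy import stats

# Goodness of fit: do the observed counts match the expected counts?
observed = [18, 22, 20, 40]   # hypothetical observed frequencies
expected = [25, 25, 25, 25]   # expected frequencies under H0 (same total as observed)
chi2_gof, p_gof = stats.chisquare(observed, f_exp=expected)

# Two variables: contingency table of hypothetical counts
table = [[30, 10],
         [20, 40]]
chi2_ind, p_ind, dof, expected_freq = stats.chi2_contingency(table)

print(p_gof, p_ind)   # small p-values indicate the data do not fit H0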

• ANOVA
ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the
means of more than two groups. We use ANOVA to compare three or more samples with a single test.

Two Major Types of ANOVA


1. One-way ANOVA: used to compare the difference between three or more samples/groups of a single
independent variable.
2. MANOVA: Multivariate Analysis of Variance allows us to test the effect of one or more independent
variables on two or more dependent variables. In addition, MANOVA can also detect the difference in
correlation between dependent variables given the groups of independent variables.
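A minimal SciPy sketch of a one-way ANOVA on three hypothetical groups (a MANOVA generally needs a dedicated library such as statsmodels, so it is not shown here):

# Minimal sketch: one-way ANOVA comparing three hypothetical groups.
from scipy import stats

group1 = [85, 86, 88, 75, 78]   # hypothetical scores, group 1
group2 = [82, 80, 79, 81, 77]   # hypothetical scores, group 2
group3 = [90, 91, 93, 89, 94]   # hypothetical scores, group 3

f_stat, p_value = stats.f_oneway(group1, group2, group3)   # one-way ANOVA
print(f_stat, p_value)   # a small p-value suggests at least one group mean differs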

• Pearson-r
The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a
number between –1 and 1 that measures the strength and direction of the relationship between two variables.
It involves determining whether the variables have some degree of linear relationship.
• Spearman-rho
A Spearman correlation coefficient is also referred to as Spearman rank correlation or Spearman's rho. It is
typically denoted either with the Greek letter rho (ρ), or rs. Like all correlation coefficients, Spearman's rho
measures the strength of association between two variables.

• Kendall-tau
Kendall’s Tau is a non-parametric measure of relationships between columns of ranked data. This is a statistical
tool used to measure the ordinal association between two measured quantities and determine the strength of
dependence of the given variables. The Tau correlation coefficient returns a value between −1 and 1, where:
0 is no relationship, and
±1 is a perfect relationship.
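A minimal SciPy sketch computing all three coefficients on the same pair of hypothetical variables:

# Minimal sketch: Pearson r, Spearman's rho, and Kendall's tau with SciPy.
from scipy import stats

heights = [150, 155, 160, 165, 170, 175, 180]   # hypothetical heights (cm)
weights = [48, 52, 55, 61, 60, 70, 78]          # hypothetical weights (kg)

r, p_r = stats.pearsonr(heights, weights)         # Pearson r: linear correlation
rho, p_rho = stats.spearmanr(heights, weights)    # Spearman's rho: rank correlation
tau, p_tau = stats.kendalltau(heights, weights)   # Kendall's tau: ordinal association

print(r, rho, tau)   # each coefficient lies between -1 and 1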

• Tukey’s Test
Tukey's honestly significant difference test (Tukey's HSD) is used to test differences among sample means
for significance. Tukey's HSD tests all pairwise differences while controlling the probability of making one
or more Type I errors.
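As a sketch, assuming a recent SciPy version that provides scipy.stats.tukey_hsd, the pairwise comparisons for three hypothetical groups might be obtained like this:

# Minimal sketch: Tukey's HSD pairwise comparisons (requires a recent SciPy).
from scipy import stats

group1 = [85, 86, 88, 75, 78]   # hypothetical groups, e.g., the same ones compared by ANOVA
group2 = [82, 80, 79, 81, 77]
group3 = [90, 91, 93, 89, 94]

result = stats.tukey_hsd(group1, group2, group3)   # all pairwise comparisons
print(result)   # mean differences with adjusted p-values for each pair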

• Mann Whitney U Test


The Mann Whitney U test, sometimes called the Mann Whitney Wilcoxon Test or the Wilcoxon Rank Sum Test,
is used to test whether two samples are likely to derive from the same population (i.e., that the two populations
have the same shape).

• Wilcoxon Rank Sums Test


Wilcoxon rank-sum test is used to compare two independent samples, while Wilcoxon signed-rank test is used
to compare two related samples, matched samples, or to conduct a paired difference test of repeated
measurements on a single sample to assess whether their population mean ranks differ.
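A minimal SciPy sketch of these rank-based tests on hypothetical samples:

# Minimal sketch: Mann-Whitney U, Wilcoxon rank-sum, and Wilcoxon signed-rank tests.
from scipy import stats

group_a = [12, 15, 14, 10, 18, 20]   # hypothetical independent sample A
group_b = [22, 25, 17, 24, 16, 29]   # hypothetical independent sample B
before  = [12, 15, 14, 10, 18, 20]   # hypothetical paired measurements (before)
after   = [14, 18, 13, 15, 21, 24]   # hypothetical paired measurements (after)

u_stat, p_u = stats.mannwhitneyu(group_a, group_b)   # Mann-Whitney U test (independent samples)
rs_stat, p_rs = stats.ranksums(group_a, group_b)     # Wilcoxon rank-sum test (independent samples)
w_stat, p_w = stats.wilcoxon(before, after)          # Wilcoxon signed-rank test (paired samples)

print(p_u, p_rs, p_w)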

TYPES OF ERROR
o Type I error
A Type I error, also called alpha error (𝛼 𝑒𝑟𝑟𝑜𝑟), is committed when the researcher rejects a null
hypothesis when in fact it is true.

o Type II error
A Type II error, also called beta error (𝛽 𝑒𝑟𝑟𝑜𝑟), is committed when the researcher accepts the null
hypothesis when in fact it is false. If the researcher fails to reject a true null hypothesis, then no error is
committed.
Additional Resources:

How to Compute for Mean, Median, and Mode in a Grouped Data


https://www.youtube.com/watch?v=pk7Bj_xAzg4

How to Compute for Range, Variance, and Standard Deviation in a Grouped Data
https://www.youtube.com/watch?v=QfeRTKDFTM4

ALL RIGHTS RESERVED.

This material is protected by copyright. No part of it may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording,
or otherwise) without prior written permission of the author. Copyright infringement is punishable
under Philippine law.
