
Stat-Reviewer Notes

The document discusses levels of measurement in statistics including nominal, ordinal, interval, and ratio scales. It defines what constitutes a level of measurement and explains why it is important for determining appropriate statistical analyses and interpreting data. Nominal measurement assigns values that name attributes uniquely without order. Ordinal measurement ranks attributes but distances between ranks have no meaning. Interval and ratio measurements have meaningful distances between values, with ratio also having a true absolute zero.

Uploaded by

Eleonor Dapig
Copyright
© All Rights Reserved

Levels of Measurement

Nominal
Ordinal
Interval
Ratio

Definition

Qualities of Variables
Exhaustive -- Should include all possible responses.
Mutually exclusive -- No respondent should be able to have two attributes simultaneously (for example, employed vs. unemployed -- a respondent who is looking for a second job while employed could be counted as both).

What Is Level of Measurement?


The relationship of the values that are assigned to the attributes for a variable

Why Is Level of Measurement Important?

Helps you decide what statistical analysis is appropriate on the values that were assigned
Helps you decide how to interpret the data from that variable

Nominal Measurement
The values “name” the attribute uniquely.
The value does not imply any ordering of the cases, for example, jersey numbers in football.
Even though player 32 has a higher number than player 19, you can’t say from the data that he’s greater than or more than the other.

Ordinal Measurement
When attributes can be rank-ordered…
Distances between attributes do not have any meaning, for example, code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.S. degree; 3=some college; 4=college degree; 5=post college.
Is the distance from 0 to 1 the same as 3 to 4?

Interval Measurement
When distance between attributes has meaning, for example, temperature (in Fahrenheit) -- the distance from 30 to 40 is the same as the distance from 70 to 80.
Note that ratios don’t make any sense -- 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).

Ratio Measurement
Has an absolute zero that is meaningful
Can construct a meaningful ratio (fraction), for example, number of clients in past six months
It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months.”

Ethics in Research
• All research involving human subjects should be conducted in accordance with four basic ethical principles,
namely respect for persons, beneficence, non-maleficence, and justice.
• It is usually assumed that these principles guide the conscientious preparation of proposals for
scientific studies (World Health Organization).

In practice, these ethical principles mean that as a researcher, you need to:
• Obtain informed consent from potential research participants;
• Minimize the risk of harm to participants;
• Protect their anonymity and confidentiality;
• Avoid using deceptive practices; and
• Give participants the right to withdraw from your research.


TEST SCORES OR PERFORMANCE RATINGS AFFECT OUR LIVES!

• Can “punish” you when you do poorly.
• Point you toward and away from a school, curriculum or program.
• Identify strengths or weaknesses in mental ability and various skills.
• Accompany you on job interviews and influence a job or career choice.
• Serve as an aid in decision-making for job promotion, training and performance evaluation.

ERRORS IN MEASUREMENT
Refers to the collective influence of all of the factors on a test score or measurement beyond those specifically
measured by the test or measurement.
An error may be defined as the difference between the measured value and the actual value.

SCALES OF MEASUREMENT

Statistics is a branch of mathematics that deals with the collection, organization and interpretation of data.

4 BASIC COMPONENTS
Asking questions.

Collecting appropriate data.

Analyzing the data.

Interpreting the results

Within statistics, there are two main categories:

1. Descriptive Statistics: In Descriptive Statistics you are describing, presenting, summarizing and organizing your
data (population), either through numerical calculations or graphs or tables.

2. Inferential statistics: Inferential Statistics are produced by more complex mathematical calculations and allow us to
infer trends and make assumptions and predictions about a population based on a study of a sample taken from it.

The Normal Distribution (The Normal Curve)

The normal distribution is a continuous probability distribution that is symmetrical on both
sides of the mean, so the right side of the center is a mirror image of the left side.

The area under the normal distribution curve represents probability and the total area under
the curve sums to one.

Most of the continuous data values in a normal distribution tend to cluster around the mean,
and the further a value is from the mean, the less likely it is to occur. The tails are asymptotic,
which means that they approach but never quite meet the horizontal axis (i.e. the x-axis).

For a perfectly normal distribution the mean, median and mode will be the same value,
visually represented by the peak of the curve.

The Key Features of the Normal Distribution


1. Normal distributions are symmetric around their mean.
2. The mean, median, and mode of a normal distribution are equal.
3. The area under the normal curve is equal to 1.0.
4. Normal distributions are denser in the center and less dense in the tails.
5. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).
6. Approximately 68% of the area of a normal distribution is within one standard deviation of the mean.
7. Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
Why is the normal distribution important?

The Empirical Rule (68-95-99.7)

• Approximately 68% of data falls within one standard deviation of the mean. This means there is a 68% probability of randomly selecting a score between -1 and +1 standard deviations from the mean.
• Approximately 95% of the values fall within two standard deviations of the mean. This means there is a 95% probability of randomly selecting a score between -2 and +2 standard deviations from the mean.
• Approximately 99.7% of data will fall within three standard deviations of the mean. This means there is a 99.7% probability of randomly selecting a score between -3 and +3 standard deviations from the mean.
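
The 68-95-99.7 figures can be checked numerically. The sketch below is only an illustration, not part of the original notes: it uses the fact that the area of a normal curve within k standard deviations of the mean equals erf(k / sqrt(2)). The function name area_within is a hypothetical helper.

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of scores within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within +/-{k} SD: {area_within(k) * 100:.2f}%")
```

The printed values are slightly more precise than the rounded 68, 95 and 99.7 quoted in the rule.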

CENTRAL LIMIT THEOREM


As the sample size increases, the distribution of sample means approximates a bell-shaped curve (i.e. the normal distribution curve).
• A sufficiently large sample can be used to estimate the parameters of a population, such as the mean and standard deviation.
• A sample size equal to or greater than 30 is commonly treated as large enough for the central limit theorem to hold.
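
A small simulation can make the central limit theorem concrete. The sketch below is an assumption-laden illustration: the population (the six faces of a fair die), the sample size of 30, the number of trials and the helper name sample_means are all chosen for the example. Even though the population is flat rather than bell-shaped, the means of repeated samples cluster around the population mean of 3.5.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

population = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
means = sample_means(population, n=30, trials=2000)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("population SD / sqrt(30):", round(statistics.pstdev(population) / 30 ** 0.5, 3))
```

The spread of the sample means should come out close to the population standard deviation divided by the square root of the sample size, which is why larger samples give more stable estimates.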

DESCRIPTIVE STATISTICS
• Descriptive statistics is a quantitative tool for the description and summary of data. It includes the presentation of a trend, a distribution, specific features or a certain statistical measure (e.g. the mean)
• The goal is to find a graph or a few computed scores that represent the entire distribution as well as possible
• Descriptive statistics differs from inferential statistics, which uses sample data to make predictions about a population

DESCRIBING DATA
• Distribution: set/group of scores arrayed for recording or study
• Used to analyze the (non-)deviation from the normal curve
• Graphs
• Measures of Central Tendency
• Raw Score: Straightforward, unmodified accounting of performance that is usually numerical; also called ungrouped data
• Need to be transformed into meaningful results

FREQUENCY DISTRIBUTION
• All scores listed alongside the number of times each score occurred
• Simple frequency distribution: data have not been grouped (raw scores)
• Grouped frequency distribution: data have been grouped into class intervals

GROUPED FREQUENCY DISTRIBUTION

• Test-score intervals (class intervals) replace actual test scores
• Number of class intervals used and size/width of each class interval are for the test user to decide
• Made on the basis of convenience
• Easy to read summary of the data
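
As a rough illustration of the two kinds of frequency distribution, the sketch below tallies a small set of made-up test scores, first score by score and then in class intervals. The scores, the interval width of 10 and the function names are assumptions for the example only.

```python
from collections import Counter

scores = [72, 85, 85, 90, 67, 72, 85, 58, 93, 72]  # hypothetical raw scores

def simple_frequency(scores):
    """Each distinct score alongside the number of times it occurred."""
    return dict(sorted(Counter(scores).items()))

def grouped_frequency(scores, width=10):
    """Counts per class interval; the width is the test user's choice."""
    counts = Counter((s // width) * width for s in scores)
    return {f"{low}-{low + width - 1}": n for low, n in sorted(counts.items())}

print(simple_frequency(scores))
print(grouped_frequency(scores))
```

A narrower or wider interval would give a different summary of the same data, which is why the choice of width is left to convenience.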

GRAPH
• Diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data
• Histogram: graph with vertical lines drawn at true limits of each test score
• Bar Graph: numbers indicating frequency also appear on the y-axis and reference to some categorization appears on the x-axis
• Frequency polygon: expressed using a continuous line connecting the points where test scores or class intervals (x-axis) meet frequencies (y-axis)
• And many more in advanced Statistics!

MEASURES OF CENTRAL TENDENCY

• Central tendency is a statistical measure that attempts to determine the single value, usually located in the center of a distribution, that is most typical or most representative of the entire set of scores
• Measure of central tendency: statistic that indicates the average or midmost score between the extreme scores in a distribution
• Mean
• Median
• Mode
MEAN, MEDIAN, MODE

ARITHMETIC MEAN or AVERAGE

• Denoted by the symbol X̄ (“X bar”)
• Most commonly used measure of central tendency; takes into account the actual numerical value of every score
• Can only be used with interval and ratio data
• Computed by adding all the scores in the distribution and dividing the sum by the number of scores
• Influenced by extreme scores

MEDIAN
• Defined as the middle score in a distribution
• 50th percentile
• Another commonly used measure of central tendency
• Order the scores in a list by magnitude, in either ascending or descending order
• If the total number of scores is odd, the median is the score exactly in the middle (the halfway point), with one half of the remaining scores lying above it and the other half lying below it
• If the total number of scores is even, the median is the average of the two middle scores
• Can be used with ordinal, interval and ratio data
• Resistant to extreme scores, in contrast to the arithmetic mean

MODE
• Most frequently occurring score in a distribution of scores
• If two scores tie for most frequent, the distribution is bimodal
• A distribution might also have no single mode, as in a rectangular distribution where every score occurs equally often
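
The three measures of central tendency can be compared on one small data set. The sketch below uses Python's standard statistics module; the scores are made up for illustration and include one extreme value (40) to show how it pulls the mean but not the median.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]  # hypothetical scores with one extreme value

mean = statistics.mean(scores)        # sum of scores divided by number of scores
median = statistics.median(scores)    # middle score of the ordered list
modes = statistics.multimode(scores)  # list of the most frequent score(s)

print("mean:", round(mean, 2))
print("median:", median)
print("mode(s):", modes)

# Two scores tied for most frequent give a bimodal result.
print("bimodal example:", statistics.multimode([1, 1, 2, 2, 3]))
```

Here the mean lands well above most of the scores because of the single extreme value, while the median and mode stay at 5.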

BIMODAL & RECTANGULAR DISTRIBUTION

VARIABILITY
• Variability provides a quantitative measure of the differences between scores in a distribution and describes the degree to which the scores are spread out or clustered together
• Variability is a tool to describe the distribution
• Variability measures how well an individual score represents the entire distribution

MEASURES OF VARIABILITY
• Variability: indication of how scores in a distribution are scattered or dispersed
• Measures of variability: statistics that describe the amount of variation in a distribution
• Range
• Interquartile range
• Semi-interquartile range
• Mean deviation
• Standard deviation
• Variance
RANGE
• The range is the distance covered by the scores of the distribution, from the smallest score to the largest score
• Equal to the difference between the highest and the lowest scores
• It provides a quick but gross (rough) description of the spread of scores
• One extreme score can radically alter the value of the range
• When its value is based on extreme scores in a distribution, the resulting description of variation may be understated or overstated

INTERQUARTILE RANGE
• Quartile: refers to a specific point
• Quarter: refers to an interval
• An individual score can fall at the third quartile or in the third quarter
• Interquartile range: measure of variability equal to the difference between Q3 and Q1
• Semi-interquartile range: equal to the interquartile range divided by 2
• In a perfectly symmetrical distribution, Q1 and Q3 are exactly the same distance from the median
• If the distances are unequal, there is a lack of symmetry (skewness)
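
To see these measures side by side, the sketch below computes the range, the quartiles, the interquartile range and the semi-interquartile range for a small made-up set of scores. It assumes the "inclusive" quartile method of Python's statistics.quantiles; textbooks and software use slightly different quartile conventions, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics

scores = [4, 7, 8, 10, 12, 15, 18, 21, 40]  # hypothetical scores, already ordered

score_range = max(scores) - min(scores)

# Q1, Q2 (the median) and Q3 under the "inclusive" convention (an assumption).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
semi_iqr = iqr / 2

print("range:", score_range)
print("Q1, median, Q3:", q1, q2, q3)
print("interquartile range:", iqr)
print("semi-interquartile range:", semi_iqr)
print("median - Q1:", q2 - q1, " Q3 - median:", q3 - q2)
```

In this example Q3 lies farther from the median than Q1 does, which hints at a lack of symmetry, and the single score of 40 inflates the range far more than it affects the interquartile range.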
A QUARTERED DISTRIBUTION

MEAN or AVERAGE DEVIATION

• The mean deviation is a statistical measure of the average absolute deviation of values from the mean in a sample
• Used to describe the amount of variability in a distribution
• Value of x is obtained by subtracting the mean from the score (X - mean = x)
• Bars on each side of x, written |x|, indicate that it is the absolute value of the deviation score
• All absolute deviation scores are summed and divided by the total number of scores (n) to arrive at the average deviation
• Rarely used because deletion of algebraic signs renders it a useless measure for the purposes of any further operations
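
The steps for the average deviation can be sketched in a few lines. The function name and the sample scores below are hypothetical, chosen only to show the arithmetic: subtract the mean from each score, drop the sign, sum, and divide by n.

```python
def mean_deviation(scores):
    """Average of the absolute deviation scores |X - mean|."""
    mean = sum(scores) / len(scores)
    absolute_deviations = [abs(x - mean) for x in scores]
    return sum(absolute_deviations) / len(scores)

scores = [2, 4, 6, 8, 10]  # hypothetical scores with a mean of 6
print("mean deviation:", mean_deviation(scores))
```

For these scores the absolute deviations are 4, 2, 0, 2 and 4, so the average deviation is 12 / 5.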
VARIANCE
• The variance describes “how spread out from the center” scores are
• Variance is the mean of the squared deviations of the scores from their mean
• Informally: how far a set of numbers are spread around their average value

STANDARD DEVIATION
• Measure of variability equal to the square root of the variance
• Also named SD or σ
• Widely used measure in psychological research
• To make meaningful descriptions and interpretations, the test-score distribution should be approximately normal
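
The link between variance and standard deviation can be shown with a short calculation. The sketch below treats a made-up set of scores as a whole population (dividing by n); the sample version, which divides by n - 1, is shown for comparison. The function name and data are assumptions for illustration.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with a mean of 5

def population_variance(scores):
    """Mean of the squared deviations from the mean (divides by n)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

variance = population_variance(scores)
standard_deviation = math.sqrt(variance)  # SD is the square root of the variance

print("variance:", variance)
print("standard deviation:", standard_deviation)

# The standard library agrees; statistics.variance divides by n - 1 instead.
print("statistics.pvariance:", statistics.pvariance(scores))
print("sample variance (n - 1):", round(statistics.variance(scores), 3))
```

Because the deviations are squared, the variance is in squared units; taking the square root returns the measure to the original score units.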
STANDARD DEVIATION AND GRADES

Measures of Shape
SKEWNESS
• Graphs can deviate from the normal because of skew
• A skewed distribution is asymmetrical, meaning the right and left sides are not identical; the mean, median and mode are at different positions
• The nature and extent to which symmetry is absent
• Can be positive or negative

POSITIVE SKEWNESS
• When relatively few of the scores fall at the high end of the distribution
• Examination results may indicate that the test was too difficult
• More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores

NEGATIVE SKEWNESS
• When relatively few of the scores fall at the low end of the distribution
• May indicate that the test was too easy
• Test needs more items of a higher level of difficulty
SKEWNESS

KURTOSIS
• Kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable
• Kurtosis describes the shape of a probability distribution
• Term used to refer to the steepness of the distribution
PLATYKURTIC
• Relatively flat
• Distribution is more clustered around the mean
• Has a relatively smaller standard deviation
• Kurtosis is less than 3 (kurtosis < 3)
• Negative excess kurtosis (kurtosis minus 3 is below zero); relatively little of the data is in the peak
• Happens when the distribution is uniform: no single mode; mean and median coincide
• Rolling a die or flipping a coin

LEPTOKURTIC
• Has kurtosis greater than a mesokurtic distribution
• Sometimes identified by peaks that are tall and thin
• Tails are thick and heavy
• Kurtosis is greater than 3 (kurtosis > 3)

MESOKURTIC
• Has tails shaped roughly like a normal distribution
• Neither high nor low; it is considered to be the baseline for the two other classifications
• Kurtosis is equal to 3 (kurtosis = 3)
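
Skewness and kurtosis can both be computed from the deviations around the mean. The sketch below is one common moment-based definition (population moments, with kurtosis reported on the scale where a normal distribution equals 3); statistical packages often apply sample corrections or report excess kurtosis instead, so treat the function names and the data as illustrative assumptions.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment scaled by SD cubed; 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment scaled by variance squared; 3 for a normal curve."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

die_faces = [1, 2, 3, 4, 5, 6]        # uniform (rectangular) distribution
right_tail = [1, 2, 2, 3, 3, 3, 10]   # hypothetical scores with one high value

print("die: skewness", round(skewness(die_faces), 3), "kurtosis", round(kurtosis(die_faces), 3))
print("right-tailed data: skewness", round(skewness(right_tail), 3))
```

The die faces give a kurtosis below 3 (platykurtic) and zero skewness, while the data set with one high score gives a positive skewness.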
Key Differences
SKEWNESS
1. The characteristic of a frequency distribution that ascertains its symmetry about the mean.
2. A measure of the degree of lopsidedness in the frequency distribution.
3. An indicator of lack of symmetry, i.e. both left and right sides of the curve are unequal, with respect to the
central point.
4. Shows how much and in which direction, the values deviate from the mean.
KURTOSIS
1. The relative pointedness of the standard bell curve, defined by the frequency distribution.
2. A measure of degree of “tailed-ness” in the frequency distribution.
3. A measure of data, that is either peaked or flat, with respect to the probability distribution.
4. Explains how tall and sharp the central peak is.

NORMAL CURVE
• Bell-shaped, smooth, mathematically defined curve that is highest at the center.
• From the center, it tapers on both sides approaching the x-axis asymptotically (approaches, but never touches, the axis)

CHARACTERISTICS OF ALL NORMAL DISTRIBUTIONS

• 50% of the scores occur above the mean and 50% of the scores occur below the mean.
• Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
• Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
• Approximately 68% of all scores occur within +/- 1 standard deviation of the mean.
• Approximately 95% of all scores occur within +/- 2 standard deviations of the mean.

STANDARD SCORE
• Raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
THE NORMAL CURVE

Z-Score
• The z-score is computed as the difference between the score and the mean, divided by the standard deviation: z = (X - mean) / SD
• Why would we use the mean and SD?
• The mean perfectly balances the positive and negative deviation scores
• SD is an indicator of how much variability there is in a set of scores
• Mean and SD help us to interpret a distribution of scores by telling us the “center” of scores and how much scores vary around that center
• A z-score re-expresses a raw score using the mean and the SD of its distribution
• The z-score has two main functions:
• Locating a specific score in a distribution of scores
• Allowing comparisons between two scores from different distributions
• The deviation is expressed in SD units
• The difference between the score and the mean is the deviation score
• Indicates how far the given score is above or below the mean
• If score > mean → positive z
• If score < mean → negative z
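
Both uses of the z-score can be sketched briefly. In the example below, the function name z_score and the two sets of test results (an English score and a Math score with their own means and SDs) are made up for illustration.

```python
def z_score(score, mean, sd):
    """Deviation score (score - mean) expressed in standard deviation units."""
    return (score - mean) / sd

# Locating a score within its own distribution.
z_english = z_score(70, mean=60, sd=15)

# Comparing scores from two different distributions (hypothetical values).
z_math = z_score(80, mean=75, sd=10)

print("English z:", round(z_english, 3))
print("Math z:   ", round(z_math, 3))
print("below-average score:", z_score(45, mean=60, sd=15))
```

Although the raw Math score is higher, the English score sits farther above its own mean in SD units, so it is the stronger performance relative to its group. A score below the mean gives a negative z.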

Try this!
A tutor sets a piece of English Literature coursework for the 50 students in his class, assuming that the data is
normally distributed with a mean of 60 out of 100 items and a standard deviation of 15. One student, Sarah, has
asked the tutor if, by scoring 70 out of 100, she has done well.
The tutor has also faced a dilemma. In the next academic year, he must choose which of his students have
performed well enough to be entered into an advanced English Literature class. He decides to use the coursework
scores as an indicator of the performance of his students. As such, he feels that only those students that are in the
top 10% of the class should be entered into the advanced English Literature class.

1. How well did Sarah perform in her English Literature coursework compared to the other 50 students?
2. Which students came in the top 10% of the class?

1. How well did Sarah perform in her English Literature coursework compared to the other 50 students?
• What percentage (or number) of students scored higher than Sarah and what percentage (or number) of students scored lower than Sarah?
• z = (70 - 60) / 15 = 0.667; the standard normal table gives an area of .2514 above z = 0.67
• .2514 x 100 = 25.14%
• Around 25% of the class got a better mark than Sarah (about 13 students).
• 100% - 25.14% = 74.86% (Sarah did better than a large proportion of students).

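2. Which students came in the top 10% of the class?

What mark would a student have to achieve to be in the top 10% of the class and qualify for the advanced English Literature class?

In the standard normal table (SNT), the closest value to .9 is 0.8997.
0.8997 lies in row 1.2 (y-axis) and column 0.08 (x-axis).
z = 1.2 + 0.08 = 1.28
Mark = mean + (z x SD) = 60 + (1.28 x 15) = 79.2, so a student needs a mark above 79.2 (80 or more if marks are whole numbers) to be in the top 10%.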
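
The same two answers can be checked without a printed table. The sketch below uses NormalDist from Python's standard statistics module with the mean of 60 and standard deviation of 15 given in the problem; the variable names are illustrative. Because it does not round z to two decimals, the results differ slightly from the table-based figures.

```python
from statistics import NormalDist

coursework = NormalDist(mu=60, sigma=15)  # distribution assumed in the problem

# Question 1: share of the class below and above Sarah's score of 70.
share_below = coursework.cdf(70)
share_above = 1 - share_below

# Question 2: the mark that separates the top 10% from the rest.
cutoff = coursework.inv_cdf(0.90)

print(f"scored lower than Sarah:  {share_below:.2%}")
print(f"scored higher than Sarah: {share_above:.2%}")
print(f"students above Sarah (of 50): {share_above * 50:.1f}")
print(f"top 10% cutoff mark: {cutoff:.1f}")
```

The cutoff agrees with the hand calculation to within rounding, and the share above Sarah comes out near 25%, matching the table lookup.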
Another example

The 4-digit number (actually 5 digits in the table above) in the body of the table is P(z < Z), where the Z value is row A.B plus the 0.0C from the column.
This is the area to the left of z = Z in the normal distribution.
So for Z value A.BC:
P(z < A.BC) = the 4-digit or 5-digit number at the intersection of the row and column
For example, for P(z < 1.96), you read at the intersection of row 1.9 and column 0.06, which gives Z value 1.96:
P(z < 1.96) = 0.97500
To get the area to the right: P(z > Z) = 1 - P(z < Z)
To get the area between two values: P(X < z < Y) = P(z < Y) - P(z < X)
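
The three table rules can be reproduced with a short calculation. The sketch below is an illustration only: it computes the left-tail area P(z < Z) from the error function instead of reading it from a table, and the function names are hypothetical.

```python
import math

def area_left(z):
    """P(z < Z): area to the left of Z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_right(z):
    """P(z > Z) = 1 - P(z < Z)."""
    return 1 - area_left(z)

def area_between(x, y):
    """P(X < z < Y) = P(z < Y) - P(z < X)."""
    return area_left(y) - area_left(x)

print("P(z < 1.96) =", round(area_left(1.96), 4))
print("P(z > 1.96) =", round(area_right(1.96), 4))
print("P(-1.96 < z < 1.96) =", round(area_between(-1.96, 1.96), 4))
```

The first value matches the table entry at row 1.9 and column 0.06, and the last one recovers the familiar 95% central area.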
