
Understanding Statistics in Education

Uploaded by

ineedmoney0072

UNIT ONE

NATURE OF STATISTICS

"Statistics... the most important science in the whole world: for upon it depends the practical
application of every other science and of every art: the one science essential to all political
and social administration, all education, and all organization based on experience, for it only
gives results of our experience." (Florence Nightingale). It is important to understand
statistics so that we can make sound judgments when a person or a company
presents us with an argument backed by data. Data are numbers with a context. To perform
statistics properly, we must always keep the meaning of our data in mind.

What is Statistics?

Statistics is concerned with the process of finding out about real phenomena by
collecting and making sense of data. Its focus is on extracting meaningful patterns
from the variation which is always present in the data. An important feature is the
quantification of uncertainty so that we can make firm decisions and yet know how
likely we are to be right.
Statistics (singular) is the discipline that concerns the collection, organization,
display, analysis, interpretation and presentation of data.
Statistics is also defined as a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data.

From the definitions above, Educational Statistics can be defined as:

1. the body of numbers or data in the field of education. For example, school statistics –
number of pupils in a school, number of teachers in a school, number of textbooks in a
school, number of teachers in a region, number of pupils in a district.
2. the study of the methods and procedures used in collecting, organizing, analyzing and
interpreting a body of numbers related to education for information and decision
making.

Descriptive and Inferential Statistics

Statistical data are analysed in two different ways: descriptively and inferentially.

Descriptive Statistics
Descriptive statistics is concerned with describing or summarizing the numerical
properties of data. Some of the methods of descriptive statistics include classification,
tabulation, graphical representation and calculation of certain indicators like the mean, median
and range, which summarise certain important features of the data.
Descriptive methods do not allow conclusions to be drawn beyond the data at hand, but they
provide useful information about the nature of a specific group of individuals.
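
As a minimal sketch of the indicators named above, the following Python lines compute the mean, median and range of a small set of scores. The scores are hypothetical values chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical test scores for one class (illustrative values only).
scores = [12, 15, 15, 18, 20, 22, 25]

score_mean = mean(scores)                # arithmetic average
score_median = median(scores)            # middle value of the sorted scores
score_range = max(scores) - min(scores)  # spread between the extremes

print(f"mean={score_mean:.2f}, median={score_median}, range={score_range}")
```

These three numbers describe this class only; they say nothing about other classes unless inferential methods are applied.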

Inferential Statistics
Inferential Statistics is also known as statistical inference. It is concerned with drawing
scientific inferences that generalize results from the study of a few particular
cases. The methods of statistical inference help in generalizing the results of a sample to the
entire population from which the sample is drawn.
Examples of inferential methods include the chi-square test, the t-test and ANOVA.

Why Statistics?
In order to put the topic into context, let me start by listing some of the uses of statistics
before zeroing in on the specific uses in the teaching profession.
(a) Statistics helps in providing a better understanding of a phenomenon.
(b) Statistics helps in systematically inquiring into an issue.
(c) Statistics helps in collecting appropriate data.
(d) Statistics helps in presenting complex data in a suitable tabular or visual form,
through charts and diagrams, for easy understanding of the data.
(e) Statistics helps in understanding the nature and pattern of variability of a
phenomenon through observations.
(f) Statistics helps in drawing valid conclusions (inferences).

Uses of Statistics in the Teaching Profession

Statistics is important to the teaching profession because it helps teachers know when
teaching has been effective. Teachers can use the results of assignments/homework,
tests and examinations to determine whether the class understands the material or whether
more of it needs to be covered.

Statistics are important to teachers for several reasons, and not just for the obvious one of
checking on students and their progress in school.

These reasons could include: ensuring that the quality of education is kept high; monitoring
students’ progress; monitoring the teacher’s progress or success; and checking the
effectiveness of a subject.

Statistics are produced for the size of a school or college; the number of pupils or students
enrolled by gender; the composition of teachers by gender, age or qualification; workload;
the number of classes or periods taught per week; and trend analysis (enrolment, pass rates, etc.).

It is necessary that those involved in the provision of education at various levels have some of
the statistical skills and reasoning necessary to interpret and use information about
institutions of learning (in this case, schools and colleges), teachers/lecturers, and
pupils/students to improve the education system.

Statistics such as achievement trends over time, or comparison data for provinces and
comparable systems, can help educators develop ways to improve student learning. Educators
need a sufficient understanding of statistics to make use of them in the
prevention of errors in decision-making.

1. Monitoring quality of education

The quality of education is heavily dependent on the performance of teachers. Teachers can
play a very important role in monitoring progress made towards achieving this goal,
specifically with reference to a range of indicators.

The first and possibly the most important reason why teachers use statistics is so that they are
able to monitor pupils’/students’ progress throughout the term, semester or year. By giving
pupils/students homework/assignments, tests and end of term/semester/year examinations,
teachers are able to keep track of pupils’/students’ performance.

2. Guidance and Counselling

A teacher plays many roles, and one of these is guidance and counselling. This is a skill
that every teacher must have, as teachers are always involved in guiding and counselling
pupils/students. There is a positive correlation between guidance and counselling and career
decision-making: effective guidance and counselling have a positive influence on students’
career decision-making. For this to happen, the use of statistics is necessary.

3. School Attendance

Encouraging regular school attendance is one of the most powerful ways of preparing
children for success both in school and in life. When you make school attendance a priority,
you help your children get better grades, develop good life habits, and have a better chance of
graduating from school/college. When students are absent for some days, their grades and
their numeracy and reading skills may be affected. Every teacher keeps a record of the class
attendance of their pupils/students. The attendance records are then analysed using statistics.

School attendance ratios can be calculated. Specifically, the Net Attendance Ratio (NAR) and
the Gross Attendance Ratio (GAR) are often calculated. NAR indicates participation in primary
schooling for the population aged 7-13 and in secondary schooling for the population aged 14-18.
GAR measures participation at each level of schooling among
those of any age from 5 to 24 years. All of these calculations begin with classroom records.
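
The text does not give formulas for these ratios. The sketch below assumes the convention commonly used in household surveys: both ratios are percentages of the official school-age population, with NAR counting only attendees of the official age and GAR counting attendees of any age from 5 to 24. The function name and all counts are hypothetical.

```python
def attendance_ratios(attending_official_age, attending_any_age, official_age_population):
    """Return (NAR, GAR) as percentages for one level of schooling.

    Assumed convention: both ratios use the official school-age
    population as the denominator.
    """
    nar = 100 * attending_official_age / official_age_population
    gar = 100 * attending_any_age / official_age_population
    return nar, gar

# Hypothetical primary-level counts (not real survey data).
nar, gar = attendance_ratios(
    attending_official_age=800,    # children aged 7-13 attending primary school
    attending_any_age=950,         # primary attendees of any age from 5 to 24
    official_age_population=1000,  # all children aged 7-13
)
print(f"NAR = {nar:.1f}%, GAR = {gar:.1f}%")
```

Under this convention GAR can exceed 100% when many over-age or under-age children attend a level, while NAR cannot.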

4. Examination Results Analysis

Analysis of examination results requires the use of statistics. For example, early this year it
was announced that “54 schools scored a hundred per cent pass rate. 2016 recorded an
increased proportion of pupils obtaining BECE with a 4.9 per cent shoot up from 2015. The
per cent of boys who obtained full certificates was higher than that of girls pegged at 63.95
per cent and 59.57 per cent respectively.”

The results indicate that grant-aided schools topped the performance list, followed by private
and public schools. It is clear that in this examination results announcement, statistics were
used to compare the performance of pupils between 2015 and 2016, between boys and girls,
across provinces, and by ownership of schools. This implies that the statistical
concepts of proportion and ranking were used. Proportions were used to standardise the
results and so enable comparison, since the number of pupils differed from one province
to another. This analysis can also be done by teachers at the class level.
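
The idea of standardising with proportions before ranking can be sketched as follows. The province names and counts are hypothetical and serve only to show why raw counts would mislead.

```python
# Hypothetical candidate and pass counts for two provinces (illustrative only).
results = {
    "Province A": {"candidates": 2000, "passed": 1200},
    "Province B": {"candidates": 500, "passed": 350},
}

# Convert raw counts to percentages so provinces of different size are comparable.
pass_rates = {
    name: 100 * r["passed"] / r["candidates"] for name, r in results.items()
}

# Rank provinces from the highest pass rate to the lowest.
ranking = sorted(pass_rates, key=pass_rates.get, reverse=True)

for position, name in enumerate(ranking, start=1):
    print(f"{position}. {name}: {pass_rates[name]:.1f}%")
```

Province A has more passes in absolute terms, yet Province B ranks first once the counts are expressed as proportions.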

5. Determination of Student: teacher/lecturer Ratio

The pupil/student: teacher/lecturer ratio expresses the relationship between the number of students
enrolled in a course/subject, school, college, or university and the number of
teachers/lecturers serving them. For example, a course/subject with
a student: teacher ratio of 30:1 indicates that there are 30 pupils/students for every one
teacher/lecturer. Class size and the student: teacher/lecturer ratio are, along with
pupils’/students’ learning time, much-discussed aspects of the quality of education provision.

Smaller classes are often seen as beneficial because they allow teachers/lecturers to focus
more on the needs of individual pupils/students.
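
A minimal sketch of the calculation, assuming a hypothetical school with 600 pupils and 20 teachers (the function name is illustrative):

```python
def student_teacher_ratio(students, teachers):
    """Return the number of students per one teacher."""
    if teachers <= 0:
        raise ValueError("teachers must be a positive number")
    return students / teachers

# Hypothetical school: 600 pupils and 20 teachers.
ratio = student_teacher_ratio(600, 20)
print(f"{ratio:.0f}:1")
```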

6. Teachers’/lecturers’ Records

Teachers/lecturers use statistics in keeping attendance registers and performance assessment
results (assignments/homework, tests, and examinations). Statistics can be helpful to teachers
and lecturers in any number of situations, as they make it possible to analyse sets of data and
come to informed conclusions about those data.

The statistics that are gathered by teachers and lecturers in classrooms can have
great effects on education institutions and can lead to many improvements that would
probably otherwise have been overlooked.

If these statistics are looked at and analysed properly, then people will have the power to
improve in the weak areas. If this goes on every year, the quality of education will continue
to improve every year.

The most important reason why teachers/lecturers use statistics is that they are able to
monitor students’ progress throughout the school term, semester, or year. Statistics can also
be used by education institutions, in general, to assess how well the students are doing in a
particular subject or course of study.

Statistics can also show where there is possible room for improvement, and by analysing these
data, improvements can be implemented as quickly as possible.

7. Education Attainment and School Attendance Ratios

Statistics are used in the calculation of educational attainment and school attendance ratios. In a
Demographic and Health Survey, educational attainment is one of the variables considered in the
background characteristics, since it is believed to be one of the most influential factors
affecting people’s knowledge, attitudes, and behaviours in various aspects of life.
Educational attainment is reported separately for females and males for comparison.

8. Pupil/Student and Teacher/Lecturer Attrition

Student attrition is the number of students who leave a programme of study before it is
finished. Teacher attrition is the number of teachers/lecturers who do not continue with their
work.

Teachers/lecturers are lost for a number of reasons, such as being assigned to non-teaching
jobs, expiry of contract, resignation, dismissal, retirement and death. Statistics on
attrition are very important to decision-makers in formulating education policy.

Variables

A variable is a characteristic that varies from person to person, text to text, or
object to object. Simply put, variables are features or qualities that change (Mack & Gass,
2005). A value is an assigned number or label representing the attribute of a given individual
or object. For example, marital status as a variable can be broken down into categories and
given values as never married - 1, married - 2, divorced - 3 and widowed - 4. The number of
children in a family as a variable can be given the values 0, 1, 2, 3, 4 etc. Height can take on
values such as 1.2 metres, 1.7 metres, 2.0 metres and 2.2 metres. Religious affiliation can be
broken down into categories and given values as Christian – 1, Muslim – 2, Traditionalist – 3,
Buddhist – 4.
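
Assigning values to categories can be sketched in Python as a simple lookup. The codes follow the marital status example above; the six responses are hypothetical.

```python
from collections import Counter

# Value labels taken from the marital status example in the text.
MARITAL_STATUS_CODES = {
    "never married": 1,
    "married": 2,
    "divorced": 3,
    "widowed": 4,
}

# Hypothetical responses from six people.
responses = ["married", "never married", "married", "widowed", "divorced", "married"]

coded = [MARITAL_STATUS_CODES[r] for r in responses]  # replace labels with codes
counts = Counter(coded)                               # frequency of each code

print(coded)
print(counts)
```

The codes are only labels: code 4 is not "twice" code 2, so counting is meaningful here but averaging is not.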

Variables can be classified as ordered or unordered.


For ordered variables, the attributes differ in magnitude along a quantitative dimension. For
example, the number of pupils in a class can be 20, 25, 32, 40, 45 etc. where 25 is greater
than 20 and 40 is less than 45.
For unordered variables, the attributes are classified into two or more mutually exclusive
categories that are qualitatively different. For example, gender is classified into male and
female. Face-to-face centres of Jackson College of Education in Brong-Ahafo Region in
Ghana are Bechem, Berekum, Drobo, Dormaa Ahenkro and Sunyani.

Variables can also be classified as follows:

1. Categorical, Continuous or Discrete
Categorical variables are variables that can take on only specific values within a defined
range of values. For example, gender can be male or female.

In contrast with categorical variables, continuous variables are variables that can take on
any value along a continuum, and they can be measured with a greater degree of precision.
Examples are age, income, weight and height. The type of data produced therefore
differs from one category of variable to another.

Discrete variables are countable in a finite amount of time. For example, you can count the
change in your pocket. Variables that can only take on a finite number of values are called
"discrete variables." All qualitative variables are discrete. Some quantitative variables are
discrete, such as performance rated as 1, 2, 3, 4, or 5, or temperature rounded to the nearest
degree.
2. Qualitative versus Quantitative variables
Qualitative variables are those that vary in kind. Rating something as ‘attractive’ or not,
‘helpful’ or not, or ‘consistent’ or not are examples of qualitative variables.

In contrast, reporting the number of times something happened or the number of times someone
engages in a particular behaviour are examples of quantitative variables, because they provide
information regarding the amount of something (Marczyk, DeMatteo & Festinger, 2005).

Scales of Measurement
Depending upon the traits/attributes/characteristics and the way they are measured, different
kinds of data result representing different scales of measurement.

Level of measurement refers to the particular way that a variable is measured within
scientific research, and scale of measurement refers to the particular tool that a researcher
uses to sort the data in an organized way, depending on the level of measurement that they
have selected. Choosing the level and scale of measurement is an important part of the
research design process because both are necessary for systematically measuring and
categorizing data, and thus for analysing the data and drawing valid conclusions from
them.

Within science, there are four commonly used levels and scales of measurement: nominal,
ordinal, interval, and ratio. Each level of measurement and its corresponding scale is able
to measure one or more of the four properties of measurement, which include identity,
magnitude, equal intervals, and a minimum value of zero.

There is a hierarchy of these different levels of measurement. With the lower levels of
measurement (nominal, ordinal), assumptions are typically less restrictive and data analyses
are less sensitive. At each level of the hierarchy, the current level includes all the qualities of
the one below it in addition to something new. In general, it is desirable to have higher levels
of measurement (interval or ratio) rather than a lower one. Let’s examine each level of
measurement and its corresponding scale in order from lowest to highest in the hierarchy.

The Nominal Level and Scale

A nominal scale is used to name the categories within the variables you use in your research.
This kind of scale provides no ranking or ordering of values; it simply provides a name for
each category within a variable so that you can track them among your data. That is,
it satisfies the measurement property of identity, and identity alone.

Common examples within sociology include the nominal tracking of sex (male or
female), race (White, Black, Hispanic, Asian, American Indian, etc.), and class (poor,
working-class, middle class, upper class). Of course, there are many other variables one can
measure on a nominal scale.

The nominal level of measurement is also known as a categorical measure and is considered
qualitative in nature. When doing statistical research and using this level of measurement,
one would use the mode, or the most commonly occurring value, as a measure of central
tendency.

The Ordinal Level and Scale

Ordinal scales are used when a researcher wants to measure something that is not easily
quantified, like feelings or opinions. Within such a scale the different values for a variable are
progressively ordered, which is what makes the scale useful and informative. It satisfies both
the properties of identity and of magnitude. However, it is important to note that such a
scale is not quantifiable: the precise differences between the various categories are
unknowable.

Within sociology, ordinal scales are commonly used to measure people's views and opinions
on social issues, like racism and sexism, or how important certain issues are to them in the
context of a political election. For example, if a researcher wants to measure the extent to
which a population believes that racism is a problem, they could ask a question like "How big
a problem is racism in our society today?" and provide the following response options: "it's
a big problem," "it is somewhat a problem," "it is a small problem," and "racism is not a
problem."

When using this level and scale of measurement, it is the median which denotes the central
tendency.

The Interval Level and Scale

Unlike nominal and ordinal scales, an interval scale is a numeric one that allows for ordering
of variables and provides a precise, quantifiable understanding of the differences between
them (the intervals between them). This means that it satisfies the three properties of identity,
magnitude, and equal intervals.

Age is a common variable that sociologists track using an interval scale, like 1, 2, 3, 4, etc.
One can also turn non-interval, ordered variable categories into an interval scale to
aid statistical analysis. For example, it is common to measure income as a range, like ¢0-
¢9,999; ¢10,000-¢19,999; ¢20,000-¢29,999, and so on. These ranges can be turned into
intervals that reflect the increasing level of income, by using 1 to signal the lowest category,
2 the next, then 3, etc.

Interval scales are especially useful because they not only allow for measuring the frequency
and percentage of variable categories within our data, they also allow us to calculate the mean,
in addition to the median and mode. Importantly, with the interval level of measurement, one can
also calculate the standard deviation.

The Ratio Level and Scale

The ratio scale of measurement is nearly the same as the interval scale. It differs in
that it has an absolute zero, and so it is the only scale that satisfies all four properties
of measurement.

A sociologist would use a ratio scale to measure actual earned income in a given year, not
divided into categorical ranges, but ranging from $0 upward. Anything that can be measured
from absolute zero can be measured with a ratio scale, for example the number of
children a person has, the number of elections a person has voted in, or the number of friends
who are of a race different from the respondent.

With the ratio scale, one can run all the statistical operations that can be done with the
interval scale, and more. In fact, it is so called because one can create ratios and fractions
from the data when one uses a ratio level of measurement and scale.
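
The measure of central tendency that suits each level can be sketched with Python's standard `statistics` module. All data values below are hypothetical, and the ordinal response options are listed from lowest to highest so that their positions can serve as ranks.

```python
from statistics import mean, median_low, mode, stdev

# Nominal: categories have no order, so only the mode is meaningful.
sex = ["female", "male", "female", "female", "male"]
nominal_mode = mode(sex)

# Ordinal: ordered categories, so the median category can be found.
levels = ["not a problem", "small problem", "somewhat a problem", "big problem"]
answers = ["big problem", "small problem", "somewhat a problem",
           "big problem", "somewhat a problem"]
ranks = [levels.index(a) for a in answers]   # position in the ordering
ordinal_median = levels[median_low(ranks)]   # middle rank mapped back to its label

# Interval and ratio: numeric values, so the mean and standard deviation apply.
incomes = [1200, 1500, 1800, 2100]
income_mean = mean(incomes)
income_sd = stdev(incomes)                   # sample standard deviation

# Ratio only: a true zero makes "times as much" statements valid.
income_ratio = max(incomes) / min(incomes)

print(nominal_mode, "|", ordinal_median, "|", income_mean, round(income_sd, 1), income_ratio)
```

`median_low` is used for the ordinal data because it always returns one of the observed ranks, which can be mapped back to a category label.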

Comparison

To give a better overview, the values in 'Mathematical operators', 'Advanced operations'
and 'Central tendency' are only the ones that each level of measurement introduces. The
complete list for a level includes the values of the previous levels. This is inverted for the
'Measure property'.
Incremental   Measure property             Mathematical   Advanced     Central tendency
progress                                   operators      operations
Nominal       Classification, membership   =, ≠           Grouping     Mode
Ordinal       Comparison, level            >, <           Sorting      Median
Interval      Difference, affinity         +, −           Yardstick    Mean, deviation
Ratio         Magnitude, amount            ×, /           Ratio        Geometric mean, coefficient of variation

UNIT TWO

DATA MANAGEMENT AND REPRESENTATION

Data are a set of facts and provide a partial picture of reality. Whether data are being
collected with a certain purpose or collected data are being utilized, questions regarding what
information the data are conveying, how the data can be used, and what must be done to
include more useful information must constantly be kept in mind.

Since most data are available to researchers in a raw format, they must be summarized,
organized, and analyzed to usefully derive information from them. Furthermore, each data set
needs to be presented in a certain way depending on what it is used for. Planning how the
data will be presented is essential before appropriately processing raw data.

First, a question for which an answer is desired must be clearly defined. The more detailed
the question is, the more detailed and clearer the results are.

Sources and Data Collection

The type of data collected and how the associated sampling takes place depend on the
statistical question asked.

There are a number of aspects of data collection that need to be considered when carrying out
a statistical investigation. Draw the frequency distribution table and use it to analyse data.
3. Explain and use appropriate text, graphs and tables to summarise your data.

• Choose the questions (e.g. "Do you like …?", "How tall is …?") that will determine
  the type of data collected (e.g. categorical, numerical).
• Choose a sample size that will allow confidence in the conclusion.
  Use repeated sampling to demonstrate how much variation occurs in samples of
  different sizes.
• Avoid bias in data collection.
  Explore sources of bias in various contexts where data are collected.
• Understand the importance of random selection.
  Explore various methods of random sampling.
• Know the relationship between samples and populations.
  Learn to define populations and samples in various statistical contexts.

Methods of Data Collection:


a. Obtrusive data collection methods directly obtain information from those
being evaluated, e.g. interviews, surveys, focus groups, observation, case studies,
questionnaires.
b. Unobtrusive data collection methods do not collect information directly
from examinees, e.g. document analysis, observation at a distance.
Data Collection Tools: participatory methods, records and secondary data, observation,
surveys and interviews, focus groups, diaries, journals, self-reported checklists, expert
judgment and the Delphi technique. Other tools include scales (weight), tape measures,
stopwatches, chemical tests (e.g. quality of water), health testing tools (e.g. blood pressure)
and citizen report cards.

DATA MANAGEMENT TECHNIQUES


Data Management is concerned with “looking after” and processing data. It involves looking
after field data sheets, entering data into computer files, checking and correcting the raw data,
preparing data for analysis, and documenting and archiving the data and meta-data. Data
Management is the consolidation of data (and meta-data) in a way that is easy to manipulate,
retrieve and maintain.

WHY is Data Management Important?


• Ensures data for analysis are of high quality so that conclusions are correct.
• Good Data Management allows further use of the data in the future and enables
  efficient integration of results with other studies.
• Good Data Management leads to improved processing efficiency, improved data
  quality and improved meaningfulness of the data.

Data Management Problems


• Lack of skills – the inability to use software or set up data checking procedures
• Multiple copies of files
• No one with responsibility for checking data
• No clear policy on archiving or making data available
• Lack of documentation
• Multiple entries of the same data
• Hand pre-processing of data

Three data management techniques will be discussed in this unit, namely
data trimming, data winsorizing and bootstrapping.

Data trimming is the process of removing or excluding extreme values, or outliers, from
a data set. Data trimming is used for a number of reasons and can be accomplished using
various approaches. As social scientists, communication researchers often work with data
sets that may require the removal of outliers to strengthen a statistic and accomplish a
number of research goals. It is important to understand the impact outliers can have on
data and the approaches available to eliminate or censor these extreme values without
compromising the data set.
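
One common approach to trimming, removing a fixed percentage of values from each end of the sorted data, can be sketched as follows. The function name, the 10% setting and the scores are assumptions made for illustration, not a prescribed procedure.

```python
from statistics import mean

def trim(values, percent_per_tail):
    """Drop the given percentage of values from each end of the sorted data."""
    ordered = sorted(values)
    k = len(ordered) * percent_per_tail // 100  # number of values to drop per tail
    return ordered[k:len(ordered) - k]

# Hypothetical scores with one very low and one very high value.
scores = [12, 45, 48, 50, 52, 55, 58, 60, 62, 100]

trimmed = trim(scores, 10)

print(mean(scores))   # mean of all ten scores
print(mean(trimmed))  # mean after dropping the lowest and highest score
```

Note that the trimmed data set is smaller than the original: the extreme values are discarded rather than adjusted.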

Data Winsorizing or winsorization is the transformation of statistics by limiting extreme
values in the statistical data to reduce the effect of possibly spurious outliers. Winsorization is
a way to minimize the influence of outliers in your data by either:

• Assigning the outlier a lower weight, or
• Changing the value so that it is close to other values in the set.

Basic Method to Winsorize by Hand

1. Analyze your data to make sure the outlier isn’t a result of measurement error or
some other fixable error.

2. Decide how much Winsorization you want. This is specified as a total percentage of
untouched data. For example, if you want to Winsorize the top 5% and bottom 5% of
data points, this is equal to 100% – 5% – 5% = 90% Winsorization. An 80%
Winsorization means that 10% is modified from each tail area.
3. Replace the extreme values by the maximum and/or minimum values at the threshold.

For example:

• The following data set has several extremes (0.1, 1, 99 and 125):
  {0.1,1,12,14,16,18,19,21,24,26,29,32,33,35,39,40,41,44,99,125}
  Mean = 33.405.
• After modifying the top and bottom 10% (each modified value is set to the
  nearest retained value):
  {12,12,12,14,16,18,19,21,24,26,29,32,33,35,39,40,41,44,44,44}
  80% Winsorized mean = 27.75.

Note that winsorizing is not equivalent to simply excluding data, which is a simpler
procedure, called trimming or truncation, but is a method of censoring data. In a trimmed
estimator, the extreme values are discarded; in a winsorized estimator, the extreme values
are instead replaced by certain percentiles (the trimmed minimum and maximum).
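
The hand method above can be sketched in Python so that both means in the example can be recomputed. The function name and its percentage parameter are illustrative; the data set is the one from the example.

```python
from statistics import mean

def winsorize(values, percent_per_tail):
    """Replace the given percentage of values in each tail with the nearest kept value."""
    ordered = sorted(values)
    k = len(ordered) * percent_per_tail // 100  # number of values to modify per tail
    if k == 0:
        return ordered
    low, high = ordered[k], ordered[-k - 1]     # smallest and largest retained values
    return [min(max(v, low), high) for v in ordered]

data = [0.1, 1, 12, 14, 16, 18, 19, 21, 24, 26,
        29, 32, 33, 35, 39, 40, 41, 44, 99, 125]

winsorized = winsorize(data, 10)  # modify 10% in each tail: an 80% Winsorization

print(round(mean(data), 3))        # mean of the original data
print(round(mean(winsorized), 3))  # 80% Winsorized mean
```

Unlike trimming, the Winsorized data set keeps all 20 values; only the two smallest and two largest are changed.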

Bootstrapping is any test or metric that relies on random sampling with replacement.
Bootstrapping allows assigning measures of accuracy (defined in terms of bias, variance,
confidence intervals, prediction error or some other such measure) to sample estimates. This
technique allows estimation of the sampling distribution of almost any statistic using random
sampling methods. Generally, it falls in the broader class of resampling methods.

Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by
measuring those properties when sampling from an approximating distribution. One standard
choice for an approximating distribution is the empirical distribution function of the observed
data. In the case where a set of observations can be assumed to be from an independent and
identically distributed population, this can be implemented by constructing a number of
resamples with replacement of the observed data set (each of equal size to the observed data
set).

A bootstrap sample is a smaller sample that is “bootstrapped” from a larger sample.


Bootstrapping is a type of resampling where large numbers of smaller samples of the same
size are repeatedly drawn, with replacement, from a single original sample.

For example, let’s say your sample was made up of ten numbers: 49, 34, 21, 18, 10, 8, 6, 5, 2,
1. You randomly draw three numbers 5, 1, and 49. You then replace those numbers into the
sample and draw three numbers again. Repeat the process of drawing x numbers B times.
Usually, original samples are much larger than this simple example, and B can reach into the
thousands. After a large number of iterations, the bootstrap statistics are compiled into a
bootstrap distribution. You’re replacing your numbers back into the pot, so your resamples
can have the same item repeated several times (e.g. 49 could appear a dozen times in a dozen
resamples).

Bootstrapping is loosely based on the law of large numbers, which states that if you sample
over and over again, your data should approximate the true population data. This works,
perhaps surprisingly, even when you’re using a single sample to generate the data.

• An empirical bootstrap sample is drawn from observations.
• A parametric bootstrap sample is drawn from a parameterized distribution (e.g. a normal distribution).
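
An empirical bootstrap of the mean can be sketched as follows, using the ten numbers from the example above. The function name, the choice of B = 2000 resamples, the seed and the percentile interval are assumptions made for illustration. By default each resample has the same size as the original sample, but a smaller size (such as 3, as in the example) can be passed instead.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_resamples, resample_size=None, seed=None):
    """Return the mean of each resample drawn with replacement from `sample`."""
    rng = random.Random(seed)
    size = len(sample) if resample_size is None else resample_size
    return [
        mean(rng.choices(sample, k=size))  # choices() samples with replacement
        for _ in range(n_resamples)
    ]

sample = [49, 34, 21, 18, 10, 8, 6, 5, 2, 1]

# B = 2000 resamples; the sorted means form the bootstrap distribution.
means = sorted(bootstrap_means(sample, n_resamples=2000, seed=1))

# Rough 95% percentile interval for the mean, read off the bootstrap distribution.
lower = means[len(means) * 25 // 1000]
upper = means[len(means) * 975 // 1000 - 1]

print(f"sample mean = {mean(sample)}, bootstrap 95% interval = ({lower}, {upper})")
```

Fixing the seed only makes the sketch reproducible; in practice the interval will vary slightly from run to run.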

PRESENTATION OF DATA

Ungrouped vs. Grouped Data

Data can be classified as grouped or ungrouped. Ungrouped data are data that are not
organized, or if arranged, could only be from highest to lowest or lowest to highest.
Grouped data are data that are organized and arranged into different classes or categories.
Presentation of data refers to the organization of data into tables, graphs or charts, so that
logical and statistical conclusions can be derived from the collected measurements.

Principles of Data Presentation


(a) To arrange the data in such a way that it should create interest in the reader’s mind at first
sight.
(b) To present the information in a compact and concise form without losing important details.
(c) To present the data in a simple form so that conclusions can be drawn directly by viewing
the data.
(d) To present it in such a way that it can help in further statistical analysis.

Data may be presented using three methods:


A. Textual
B. Tabular
C. Graphical.

Textual Presentation
Data can be presented using paragraphs or sentences. It involves enumerating important
characteristics, emphasizing significant figures and identifying important features of data.
• The data gathered are presented in paragraph form.
• Data are written and read.
• It is a combination of texts and figures.

Example:
Of the sample of 150 people interviewed, the following complaints were noted: 27 for lack of
books in the library, 25 for a dirty playground, 20 for lack of laboratory equipment, and 17 for
a poorly maintained university building.
Example.
You are asked to present the performance of your section in the Statistics test. The following
are the test scores of your class:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46

Solution: First, arrange the data in order so that you can identify the important characteristics.
This can be done in two ways: rearranging from lowest to highest or using a stem-and-leaf plot.
Below is the rearrangement of the data from lowest to highest:
9 17 18 20 23 23 24 25
26 27 28 29 34 34 35 35
37 38 38 38 38 39 39 39
42 43 43 44 44 45 45 45
46 46 46 48 49 50 50 50

The data can then be presented in text form as follows:
In the Statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen students got a
score of 40 and above, while only 3 got 19 and below. Generally, the students performed
well in the test, with 23 (57.5%) getting a passing score of 38 and above.
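
The figures quoted in a textual presentation can be checked directly from the raw scores. The sketch below, with illustrative variable names, builds a simple stem-and-leaf plot for the 40 test scores and counts the groups mentioned in the summary.

```python
from collections import defaultdict

scores = [
    34, 42, 20, 50, 17,  9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

ordered = sorted(scores)

# Stem-and-leaf plot: the tens digit is the stem, the units digit is the leaf.
stems = defaultdict(list)
for s in ordered:
    stems[s // 10].append(s % 10)
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))

# Counts used in the textual summary.
n = len(scores)
perfect = sum(s == 50 for s in scores)       # perfect scores
forty_plus = sum(s >= 40 for s in scores)    # scores of 40 and above
below_twenty = sum(s <= 19 for s in scores)  # scores of 19 and below
passed = sum(s >= 38 for s in scores)        # passing scores (38 and above)

print(n, perfect, forty_plus, below_twenty, passed, f"{100 * passed / n:.1f}%")
```

Recomputing the counts this way guards against arithmetic slips when the percentages are written into the text.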

Graphical Presentation: Kinds of Graphs or Diagrams

1. Bar graph – used to show relationships/comparisons between groups.
2. Pie or circle graph – shows percentages effectively.
3. Line graph – most useful in displaying data that change continuously over time.
4. Pictograph (or pictogram) – uses small identical pictures or figures of objects, called
isotypes, in making comparisons. Each picture represents a definite quantity.

Graphical methods can be grouped according to whether the data are quantitative or qualitative.
For quantitative data: Histogram
Frequency polygon
Frequency curve
Line chart
Normal distribution curve
Cumulative distribution curve
Scatter diagram

For qualitative data: Map diagram
Bar chart
Pie chart
Pictogram

Bar Chart/Graph

Data from nominal scales (categorical data) are represented in graphic form
with bar graphs. Bar graphs give a pictorial description of the data and emphasize
how groups compare with one another. They are used to compare the sizes of the various
parts. The height of the bars, not their area, is the basis for the comparisons.
Data are presented as rectangular bars of equal breadth. Each bar represents one
variant/attribute. A suitable scale should be indicated, and the scale starts from zero. The
width of the bars and the gaps between them should be equal throughout. The length of each
bar is proportional to the magnitude/frequency of the variable. Bar graphs are either column
(vertical) or horizontal. Column graphs are more popular in education. Column bar graphs are
simple, compound (multiple) or component. Examples are shown below.

Figure 2 is a compound column bar graph showing school enrolment at Ayeduase Basic
School by gender.

Component Bar Chart


It is also known as a composite or stacked bar chart. It is used when a set of data combines to
form a total. The total is the length/height of the bar. It allows for visual comparisons
between different components, i.e. how the components contribute to the total for the category.

Figure 2: School enrolment at Ayeduase Basic School by gender.

Constructing bar graphs/charts


1. Draw two axes, a vertical and horizontal. Label the vertical axis by the source of the
values/scores e.g. enrolment, points etc. Label the horizontal axis by the names of the
categories.
2. Divide the vertical scale into points, considering the lowest and the highest values.
Choose an appropriate scale such that the bars are neither too tall nor too short; the scale
must start from zero.
3. Construct equally wide and equally spaced bars for each category, with the height of each
bar being the value/score for that category. The categories are named along the horizontal
axis.
4. Where computer software such as Microsoft Excel or SPSS is not available, it is
recommended that graph sheets be used.
5. Shade/colour the bars to differentiate bars and components.

Strengths and Limitations

1. Bar charts are easy to draw.
2. Values can be read easily from the vertical axis.
3. Comparisons can be made easily and the significance of the information is easily grasped.
4. They are not effective for data from interval and ratio scales of measurement.
5. Extreme values distort comparisons especially if some bars are too short and some very
tall.
6. For component bar graphs, too many subgroups make the graph crowded.

Uses

Teachers can use bar graphs in several ways. Enrolment by classes, courses and subjects
and inter-house competitions can be represented by bar graphs.

Pie Chart
Pie charts use nominal or categorical data. A pie chart is drawn as a circle
of 360° sliced into sectors ("pies"). Each sector is cut at an angle at the centre of the circle.
The angle corresponds to the data for each category or group. Pie charts give a pictorial view
of the contributions of the parts that make up a whole. An example is shown below.

Table 1: Performance in Jackson SRC games in Ashanti Region

Centre       Total Points   Degrees
Louis        120            72
KASS         100            60
Agogo        130            78
Akrokerri    80             48
Mampong      170            102
Total        600            360

To calculate each degree:
For Louis centre: $\frac{120}{600} \times 360^\circ = 72^\circ$
For KASS centre: $\frac{100}{600} \times 360^\circ = 60^\circ$

Figure 3: Performance in Jackson SRC games in Ashanti Region

Constructing pie charts
1. Calculate the degree equivalent of the value for each category/group by dividing the total
points for each group by the overall total points and multiplying the result by 360°. For
example, for Louis centre above we have $\frac{120}{600} \times 360^\circ = 72^\circ$, and for
KASS centre we have $\frac{100}{600} \times 360^\circ = 60^\circ$.
2. Use a pair of compasses and a protractor to draw the circle and the sectors based on the
degrees calculated.
3. Shade/colour the sectors to differentiate one from the other.
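
Step 1 can be sketched in Python as a check on hand calculations. This is an illustrative sketch only; the function name `pie_angles` is an assumption, and the points are those listed for the five centres in Table 1.

```python
# Total points for each centre, as listed in Table 1.
points = {"Louis": 120, "KASS": 100, "Agogo": 130, "Akrokerri": 80, "Mampong": 170}


def pie_angles(values):
    """Return the sector angle in degrees for each category.

    angle = (category total / overall total) x 360
    """
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}


angles = pie_angles(points)
for centre, angle in angles.items():
    print(f"{centre:<10} {points[centre]:>4} {angle:6.1f}")
print("Sum of angles:", sum(angles.values()))
```

The angles should always add up to 360°, which is a useful check before drawing the sectors with a protractor.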

Strengths and limitations


1. Individual parts of the whole are seen and can be compared.
2. It provides a visual impression of the proportion that each part contributes to the overall
total.
3. Angles are harder to compare.
4. They are not easy to draw especially where statistical software and computers are not
available.
5. Data that are in continuous form and of ratio and interval scales are not appropriate.
6. The values of each component cannot be read from the chart but must be provided.
7. It is not useful where there are many parts as these parts become too small.
8. It only gives a visual impression but not the details of the data.

Uses
Pie charts can be used by teachers and educational practitioners to present examination results
by the number of passes in various subjects, and school enrolment by class, form or subject.

Line graphs

Line graphs are best suited to data that are related to time. Time could be in days, weeks,
months or years. Line graphs show changes in the data over a period of time. Data
from interval and ratio scales are most appropriate. Line graphs can be simple or
compound. Simple line graphs give a pictorial description of the data. Compound line
graphs compare group data over a period of time.

Examples are shown below.

Table 2: Attendance at monthly teachers’ workshops


Month Total
January 120
February 85
March 100
April 150
May 90
June 85
July 100
August 60
September 90
October 75
November 100
December 150

Figure 4 is a simple line graph showing attendance at a monthly teachers’ workshop.

Figure 4: Attendance at monthly teachers’ workshops

Table 3: Attendance at monthly teachers’ workshops by gender

            Attendance
Month       Female   Male
Jan         60       60
Feb         45       40
March       60       40
April       80       70
May         50       40
June        40       45
July        50       50
August      25       35
Sep         40       50
Oct         30       35
Nov         50       50
Dec         90       60

Figure 5 is a compound line graph showing attendance at monthly teachers’ workshops by gender.

Figure 5: Attendance at monthly teachers’ workshops by gender

Constructing line graphs


1. Draw two axes, a vertical and horizontal. Label the vertical axis by the source of the
values/scores e.g. attendance, enrolment, points etc. Label the horizontal axis by the time
period e.g. months, days, weeks etc.
2. Divide the vertical scale into points, considering the lowest and the highest values.
Choose an appropriate scale such that the graph is neither too tall nor too flat; the scale
must start from zero.
3. Plot the value/quantity for each time period on the graph and join consecutive points with
straight lines.
4. Where computer software such as Microsoft Excel or SPSS is not available, it is
recommended that graph sheets be used.

Strengths and Limitations


1. Values can be read easily from the vertical axis.
2. Comparisons can be made easily and the significance of the information easily grasped.
3. They are less appropriate for nominal scale data.
4. Graphs are distorted where there are extreme values.

Uses
Teachers and educational practitioners can use line graphs in several ways. Examination
results over a period of years in a subject, total school enrolment as well as enrolment by
subjects and courses for a period of time can be represented by line graphs.

Tabulation

Tables are devices used to present data in a simple form. Tabulation is probably the
first step before the data are used for analysis or interpretation.

General principles of designing tables


a) Tables should be numbered, e.g. Table 1, Table 2, etc.
b) A title must be given to each table; it should be brief and self-explanatory.
c) The headings of columns or rows should be clear and concise.
d) The data must be presented according to size or importance, or chronologically,
alphabetically, or geographically.
e) If percentages or averages are to be compared, they should be placed as close together as
possible.
f) No table should be too large.
g) Most people find a vertical arrangement better than a horizontal one because it
is easier to scan the data from top to bottom than from left to right.
h) Footnotes may be given, where necessary, providing explanatory notes or additional
information.

Types of tables
1) Simple tables: Measurements of a single set are presented
2) Complex tables: Measurements of multiple sets are presented

Simple Table

When characteristics and their values are presented in the form of a table, it is known as a
simple table, e.g. Table 5, which shows the infant mortality rate of selected regions in Ghana
in 2004.

Table 5: The Infant mortality rate of selected regions in Ghana in 2004


Name of region Infant mortality rate
Central 90
Western 60
Eastern 26
Northern 60

Frequency distribution table

In the frequency distribution table, the data is first split up into convenient groups (class
interval) and the number of items (frequency) which occur in each group is shown in adjacent
columns. Hence it is a table showing the frequency with which the values are distributed in
different groups or classes with some defined characteristics.

Rules for construction of frequency table

1) The class interval should not be too large or too small.
2) The number of classes formed should be more than 8 and fewer than 15.
3) The class intervals should be equal and uniform throughout the classification.
4) After construction of the table, a proper and clear heading should be given to it.
5) The base or source of the data should be mentioned, with the pattern of analysis, in a
footnote at the end of the table.
Example: grouped, relative, and cumulative frequency distributions of serum cholesterol
levels in 200 men.

Table 6: Frequency distributions of serum cholesterol levels in 200 men

Interval     Frequency (f)   Relative f (%)   Cumulative f
251 – 260    5               2.5              200.0
241 – 250    13              6.5              195.0
231 – 240    19              9.5              182.0
221 – 230    18              9.0              163.0
211 – 220    38              19.0             145.0
201 – 210    72              36.0             107.0
191 – 200    14              7.0              35.0
181 – 190    12              6.0              21.0
171 – 180    5               2.5              9.0
161 – 170    4               2.0              4.0
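
The relative and cumulative columns of Table 6 follow mechanically from the frequency column. The sketch below, in Python, shows one way to reproduce them; it is illustrative only, and the variable names are assumptions. Classes are listed from the bottom class upward because the cumulative frequency is accumulated from the bottom class.

```python
from itertools import accumulate

# Classes and frequencies from Table 6, bottom class first.
intervals = ["161-170", "171-180", "181-190", "191-200", "201-210",
             "211-220", "221-230", "231-240", "241-250", "251-260"]
freqs = [4, 5, 12, 14, 72, 38, 18, 19, 13, 5]

total = sum(freqs)                           # total frequency (200 men)
relative = [100 * f / total for f in freqs]  # relative frequency as a percentage
cumulative = list(accumulate(freqs))         # running total from the bottom class

# Print with the highest class at the top, following the table's layout.
for interval, f, r, c in reversed(list(zip(intervals, freqs, relative, cumulative))):
    print(f"{interval:<9} {f:>3} {r:>5.1f} {c:>4}")
```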

Features
1. Class. A group of scores.
2. Class interval. The range within which a group of scores lie. It has a number at the
beginning and at the end. E.g. 90- 95.
3. Unequal Class Interval. These result where there are differences in the range of the
intervals. E.g. 91 – 95, 91 - 100
4. Open-ended classes. These are classes with a limit at only one end, e.g. 90
and above, 45 and below, below 46, above 90.
5. Class limits. The endpoints of a class interval. The smaller number is the lower limit
and the bigger number is the upper limit.
6. Class boundaries. The exact or real limits of a class interval. The lower-class
boundaries are obtained by subtracting 0.5 from the lower-class limit. The upper-class
boundaries are obtained by adding 0.5 to the upper-class limits. A class interval with
limits of 91 – 95 produces class boundaries of 90.5 - 95.5
7. Class size/class width. The number of distinct/discrete scores within a class interval.
They are obtained by finding the difference between successive lower-class limits or
upper-class limits in cases of equal class intervals. They can also be obtained by finding
the difference between successive class marks in cases of equal class intervals or between
class boundaries for each interval.
8. Class mark: The midpoint for each class interval.
9. Frequency: The number of distinct scores from the given data that can be found in a
class interval.
10. Cumulative frequency. The successive sum of the frequencies starting from the
frequency of the bottom class.
11. Cumulative percentage frequency. The successive sum of the percentage frequencies
starting from the percentage frequency of the bottom class. It is also obtained by
expressing each cumulative frequency as a percentage.
12. Relative frequency. It is obtained by dividing each frequency by the total frequency.

13. Cumulative relative frequency. The successive sum of the relative frequencies starting
from the relative frequency of the bottom class.
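
The class boundaries, class mark and class width defined above can be illustrated with the class 91 – 95. This is a minimal Python sketch that assumes whole-number scores, so that the 0.5 adjustment described in item 6 applies; the function name `class_features` is hypothetical.

```python
def class_features(lower_limit, upper_limit):
    """Return boundaries, class mark and width for a class of whole-number scores."""
    lower_boundary = lower_limit - 0.5   # subtract 0.5 from the lower class limit
    upper_boundary = upper_limit + 0.5   # add 0.5 to the upper class limit
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "class_mark": (lower_limit + upper_limit) / 2,   # midpoint of the class
        "width": upper_boundary - lower_boundary,        # difference between boundaries
    }


features = class_features(91, 95)
print(features)
```

For 91 – 95 this gives boundaries of 90.5 and 95.5, a class mark of 93 and a width of 5, which matches the five distinct scores 91, 92, 93, 94 and 95.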

Constructing a grouped frequency distribution table


1. Draw a table with four columns with the headings – Class, Class mark, Tally, Frequency.
2. Determine the range i.e., the difference between the highest score and the lowest score.
For example, from the raw scores of the 40 students in the Statistics examination, the
highest score is 93 and the lowest is 49. The range becomes 93 – 49 = 44.
3. Arbitrarily decide on the class size. Popular sizes are 3, 5, 7 and 10. Odd-numbered class
sizes make computations easier. In Education, the most popular sizes are 5 and 10.
4. Determine the approximate number of classes by dividing the range by the class size. E.g.
suppose a size of 5 is taken. The approximate number of classes would be $\frac{44}{5} = 8.8$,
which is rounded to 9 classes. Generally, the number of classes is between 5 and 20.
Alternatively, arbitrarily decide on the number of classes. This is normally between 10 and
20 for large data sets. Determine the approximate class width by dividing the range by the
number of classes. Suppose the number of classes taken is 10. The approximate class size
becomes $\frac{44}{10} = 4.4$, which is rounded to 5.
5. Identify the highest value/score and write it down.
6. To obtain the topmost class, decide whether to start with the lower or the upper limit
of the class interval. Determine the values closest to the highest value identified in Step 5
that are multiples of the class size. Choose one of the values as a limit and use the class
size to determine the other limit. For example, if the class size is 5 and the highest score is
83, then the closest values are 80 and 85. A possible lower limit is 80 and a possible
upper limit is 85. Selecting the lower limit, the topmost class becomes 80 – 84; selecting
the upper limit, the topmost class becomes 81 – 85.
7. Complete the first column with the rest of the classes using equal class sizes, then the
second column with the class marks.
8. Tally the scores in the third column, which is the tally column. Take the scores one by
one and place slashes (/) or tally marks in the respective classes. Where the tallies reach
five, bind them into one unit to facilitate counting.
9. Count the number of slashes or tally marks and write them in the frequency column. Add
the frequencies and put the total at the bottom of the frequency column.
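
The steps above can be sketched in Python. This is an illustration under stated assumptions, not a prescribed procedure: it reuses the 40 Statistics test scores from the textual presentation example earlier in this unit, takes a class size of 10, and fixes the topmost class by its upper limit (the first multiple of the class size at or above the highest score). The function name `grouped_frequency` is hypothetical.

```python
import math

# The 40 Statistics test scores from the textual presentation example.
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]


def grouped_frequency(values, class_size):
    """Return (lower limit, upper limit, class mark, frequency) rows, highest class first."""
    highest, lowest = max(values), min(values)
    # Assumption: the topmost class ends at the first multiple of class_size
    # that is greater than or equal to the highest score.
    upper = math.ceil(highest / class_size) * class_size
    rows = []
    while upper >= lowest:
        lower = upper - class_size + 1
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, (lower + upper) / 2, frequency))
        upper -= class_size
    return rows


table = grouped_frequency(scores, class_size=10)
print("Class     Mark  Freq")
for lower, upper, mark, frequency in table:
    print(f"{lower:>2} - {upper:<4} {mark:>5} {frequency:>4}")
print("Total", sum(row[3] for row in table))
```

With a class size of 10 the range of 41 gives five classes, none of which has a zero frequency, so the table is consistent with the points to note that follow.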

Points to note in constructing Frequency Distributions


1. In Education, the highest class intervals/classes are at the top, so this convention must be
followed in constructing the frequency distribution table.
2. Use mutually exclusive classes. Make sure that an observation falls into one and only one
class. Classes must not overlap at the class limits. For example, 50 – 60 and 60 – 70
overlap at the class limit of 60.
3. There should be no class with a zero frequency. If this occurs, it is recommended that the
class size is changed. Preferably increase class size.
4. Open-ended classes should be avoided. These classes have only the lower limit if it is the
class at the top, or the upper limit if it is the class at the bottom. For example, 51 and
above, 20 and below.

5. Aim at classes with equal sizes or width. This facilitates the interpretation of the
information from the frequency distribution.
6. The number of classes should not be too small (i.e. not less than 5) and not too large (i.e.
not more than 20). Where the number of classes is less than 5, class size should be
reduced but when the number of classes is more than 20, the class size should be
increased.

GRAPHIC REPRESENTATIONS OF FREQUENCY DISTRIBUTIONS

Graphs are useful methods for presenting simple data.


1. They have a powerful impact on the imagination of people.
2. They give information at a glance.
3. Diagrams are better retained in memory than statistical tables.
4. However, graphs cannot be substituted for statistical tables, because graphs
cannot be treated mathematically whereas tables can.
5. Whenever graphs are compared, any difference in scale should be noted.
6. It should be remembered that much of the detail and accuracy of the original data is lost in
charts and diagrams; for a thorough study, we have to go back to the original
data.

Histogram

Histograms use data from the ratio or interval scale and depend on frequency distributions.
They use the classes and the frequencies from the frequency distribution table. They are used
for quantitative, continuous variables, that is, variables which have no gaps, e.g. age,
weight, height, blood pressure, blood sugar etc. A histogram consists of a series of blocks.
The class intervals are given along the horizontal axis and the frequencies along the vertical
axis. An example is shown below.

To construct a histogram
1. Draw two axes, a vertical and a horizontal. Label the vertical axis as frequency and the
horizontal axis as scores/classes.
2. Select an appropriate scale on the vertical axis considering the highest/largest value.
When using a graph sheet, the scale should be such that the bars are neither too tall nor too
short.
3. Use class midpoints/marks, class boundaries or class limits to label the points on the
horizontal axis.
4. Draw bars of equal width representing the classes from the frequency distribution table,
with heights corresponding to the frequencies.

Importance
1. It gives a pictorial description of the raw data, providing information about the nature of
the data.
2. It gives the direction of performance in terms of academic performance (i.e. skewness).

[Figure: two histograms with frequency on the vertical axis and classes on the horizontal axis.
Left: skewed to the right (group performance tends to be low). Right: skewed to the left (group
performance tends to be high).]

3. It provides an estimate of the most typical score. This is the intersection of the two
diagonals of the tallest bar.
Frequency Polygon

Frequency polygon uses data from ratio or interval scales and depends on frequency
distributions. It uses the classes and the frequencies from the frequency distribution table.
An example is shown below.

[Figure: a frequency polygon, with frequency on the vertical axis and classes on the horizontal
axis.]

To construct a frequency polygon;

1. Draw two axes, a vertical and horizontal. Label the vertical axis by frequency and the
horizontal axis scores/classes.
2. Select an appropriate scale on the vertical axis considering the highest/largest value.
When using a graph sheet, the scale should be such that the polygon is not too pointed or
too short.

3. Use class midpoints/marks or class boundaries or class limits to label the points on the
horizontal axis.
4. Plot, at the midpoint of each class (the midpoint of the top of each histogram bar), a point
whose height is the class frequency. Join consecutive points with straight lines.
5. Where the line has not touched the horizontal axis, extend the line one class in that
direction so that the polygon touches the horizontal axis.

Importance
1. It gives a pictorial description of the raw data, providing information about the nature of
the data.
2. It provides an estimate of the most typical score. This is the point on the horizontal axis
where the highest point of the polygon is located.

[Figure: a frequency polygon with the most typical score marked on the horizontal axis below
its highest point.]

3. It is used to compare the performance of groups. E.g. Performance in a class test for
Forms 1 and 2 can be shown as follows.

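[Figure: frequency polygons for Form 1 and Form 2 drawn on the same axes, with the Form 2
polygon lying further to the right. The values 10 and 20 are marked on the horizontal axis.]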

The diagram shows that Form 2 class, which is more to the right, performs better. The most
typical scores, where the highest point of the polygon is located can be used to confirm the
comparisons. Where the total frequencies are not the same, use relative frequencies in place
of the actual frequencies to draw the polygon.

4. It gives the direction of performance (skewness). Consider three classes, A, B, C.

[Figure: frequency polygons for classes A, B and C. A: positive skewness, skewed to the right
(tends to score low marks). B: normal. C: negative skewness, skewed to the left (tends to score
high marks).]

Cumulative Percentage Frequency Polygon (Ogive)
Ogives are drawn from frequency distribution tables. Data from ratio or interval
scales are most appropriate.

Plot the graph using the upper-class boundaries of each class against the cumulative
percentage frequencies.

[Figure: an ogive, with cumulative percentage frequency (0 to 100) on the vertical axis and
classes (0 to 80) on the horizontal axis.]

To construct an ogive:
1. Obtain the cumulative percentage frequencies.
2. Mark the cumulative percentage frequencies on the vertical axis. Choose an
appropriate scale, on a graph sheet, such that the ogive is not distorted.
3. Label the horizontal axis as scores or classes.
4. Plot, at the upper class boundary of each class, the corresponding cumulative
percentage frequency. Join consecutive points with straight lines.
5. Extend the line one class to the left so that the polygon touches the horizontal axis.

Importance
1. It is used to compare distributions of performance, especially distributions
where the class/group sizes are not the same. Generally, the graph that lies more to the
right shows better performance. The median score, read off at the cumulative percentage
frequency of 50, is also used in the comparison.

Given the following performances in a test, draw two ogives. Which school performed better?

            School A                    School B
Classes     Frequency   Cum. % Freq     Frequency   Cum. % Freq
91 – 100    1           100             7           100
81 – 90     2           99              17          95.3
71 – 80     11          97              30          84
61 – 70     24          86              25          64
51 – 60     20          62              15          47.3
41 – 50     16          42              11          37.3
31 – 40     12          26              19          30
21 – 30     8           14              14          17.3
11 – 20     4           6               6           8
1 – 10      2           2               6           4
Total       100                         150

2. It is used to determine percentiles and percentile ranks. Later in the course, you will learn
how to obtain the percentiles and percentile ranks
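
The cumulative percentage frequencies in the School A and School B table can be reproduced from the frequency columns, which is a useful check before plotting the two ogives. The Python sketch below is illustrative only; the function names are assumptions. It also locates the class in which each school's cumulative percentage first reaches 50, that is, the class containing the median.

```python
from itertools import accumulate

classes = ["1-10", "11-20", "21-30", "31-40", "41-50",
           "51-60", "61-70", "71-80", "81-90", "91-100"]   # bottom class first
school_a = [2, 4, 8, 12, 16, 20, 24, 11, 2, 1]
school_b = [6, 6, 14, 19, 11, 15, 25, 30, 17, 7]


def cumulative_percentages(freqs_bottom_up):
    """Cumulative frequency of each class expressed as a percentage of the total."""
    total = sum(freqs_bottom_up)
    return [round(100 * c / total, 1) for c in accumulate(freqs_bottom_up)]


def median_class(freqs_bottom_up):
    """First class (from the bottom) whose cumulative percentage reaches 50."""
    for label, pct in zip(classes, cumulative_percentages(freqs_bottom_up)):
        if pct >= 50:
            return label


cum_a = cumulative_percentages(school_a)
cum_b = cumulative_percentages(school_b)
print(cum_a)
print(cum_b)
print(median_class(school_a), median_class(school_b))
```

On these figures the median of School A falls in the class 51 – 60 and that of School B in the class 61 – 70.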

Box and Whisker Plot


It is used to compare distributions by noting the 10th percentile (P10), first quartile (Q1),
median (Q2), third quartile (Q3), and 90th percentile (P90).

A box and whisker plot is drawn below. Later in the course, you will learn how to obtain the
percentiles and quartiles.

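[Figure: a box and whisker plot. The box runs from Q1 to Q3 with Q2 marked inside it, and the
whiskers extend to P10 and P90.]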

An example.

Assume that the following values were obtained for two classes, Form 2A and Form 2B, in a
class test in Mathematics.

          P10   Q2   P90
Form 2A   15    42   73
Form 2B   29    56   93

The information is presented below by two box and whisker plots.

[Figure: box and whisker plots for Form 2A and Form 2B drawn on a common scale from 0 to 100,
each marked with P10, Q1, Q2, Q3 and P90. The plot for Form 2B lies further to the right.]

It can be observed that the P10, Q1, Q2, Q3, and P90 values are greater in Form 2B than in Form
2A. This means that performance is better in Form 2B than in Form 2A.
Also, note that the graph for Form 2B has moved more to the right, towards higher values, than
that of Form 2A.
UNIT THREE
APPLICATION OF THE CENTRE OF A DISTRIBUTION
We have discussed some interesting features of a quantitative data set and learned how to
look for them in pictures (graphs).

The description of statistical data may be quite elaborate or quite brief depending on two
factors: the nature of data and the purpose for which the same data have been collected.
While describing data statistically or verbally, one must ensure that the description is neither
too brief nor too lengthy. The measures of central tendency enable us to compare two or more
distributions pertaining to the same time period or within the same distribution over time.

This unit focuses on numerical summaries of the centre of the distribution.


In this unit, we will discuss some ways of generating mathematical summaries of
distributions. The formulas that follow make use of some statistical notation. We will
always use n to refer to the total number of cases in our data set. When referring to the
distribution of a variable we will use a single letter (other than n). Several of our formulas
involve summations, represented by the ∑ symbol. This is shorthand for the sum of a set of
values.

MEASURES OF CENTRAL TENDENCY/LOCATION

What is “central tendency,” and why do we want to know the central tendency of a group of
scores? Let us first try to answer these questions intuitively. Then we will proceed to a more
formal discussion.

ACTIVITY
Imagine this situation: You are in a class with just four other students, and the five of you
took a 5-point quiz. Today your tutor is walking around the room, handing back the quizzes.
He/she stops at your desk and hands you your paper. Written in bold black ink on the front is
“3/5.”

How do you react?

Are you happy with your score of 3 or disappointed? How do you decide? You might
calculate your percentage correct, realize it is 60%, and be appalled. But it is more likely that
when deciding how to react to your performance, you will want additional information. What
additional information would you like? If you are like most students, you will immediately
ask your classmates, “What did you get?” and then ask the tutor, “How did the class do?” In
other words, the additional information you want is how your quiz score compares to other
students' scores.

You, therefore, understand the importance of comparing your score to the class distribution
of scores. Should your score of 3 turn out to be among the higher scores, then you'll be
pleased after all. On the other hand, if 3 is among the lower scores in the class, you won't be
quite so happy. This idea of comparing individual scores to a distribution of scores is
fundamental to statistics. So let's explore it further by reading about the three different ways
of defining the centre of a distribution. All three are called measures of central tendency.

These measures are also called Averages. They provide single values which are used to
summarise a set of observations/data. The three main measures are the Mean, Median
and Mode.
1. They are used as single scores to describe data.
2. They help to know the level of performance by comparing with a given standard of
performance. Performance may be above average or below average where the average is
a standard such as the mean or median.
3. They give the direction of student performance.
Where Mean >Median, the distribution is skewed to the right (positive skewness) showing
that performance tends to be low.
Where Mean < Median, the distribution is skewed to the left (negative skewness) showing
that performance tends to be high.
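
The mean and median comparison can be sketched in Python using the standard `statistics` module. This is an illustration only: the function name `skew_direction` is hypothetical, and the data are the 40 Statistics test scores from Unit Two.

```python
from statistics import mean, median

# The 40 Statistics test scores used in the textual presentation example.
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]


def skew_direction(values):
    """Compare the mean with the median to suggest the direction of skewness."""
    m, mdn = mean(values), median(values)
    if m > mdn:
        return "positive (skewed to the right)"
    if m < mdn:
        return "negative (skewed to the left)"
    return "none indicated (mean equals median)"


print(mean(scores), median(scores), skew_direction(scores))
```

Here the mean (36.225) is below the median (38), which suggests negative skewness, that is, performance tends to be high.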

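Illustration

[Figure: two skewed distribution curves. Positive skewness: the mode, median and mean appear
in that order from left to right. Negative skewness: the mean, median and mode appear in that
order from left to right.]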

THE MEAN ($\bar{X}$)

There are three types of mean: arithmetic, geometric and harmonic. In Education, the
arithmetic mean is the most useful.

The Arithmetic Mean

It is the sum of the observations divided by the total number of observations,
i.e. add the values and divide by the number of observations.
E.g. 4 + 2 + 3 + 1 + 5 = 15, so Mean = $\frac{15}{5} = 3$.

Methods

The arithmetic mean ($\bar{X}$) can be obtained from both ungrouped and grouped data. It
can also be easily obtained using Microsoft Excel.

1. Ungrouped data

Given the following scores, 15, 12, 10, 10, 9, 20, 14, 11, 13, 16, to obtain the mean, all the
scores are added and divided by the total number of observations. The mean is represented
by the symbol $\bar{X}$.

$\bar{X} = \frac{15 + 12 + 10 + 10 + 9 + 20 + 14 + 11 + 13 + 16}{10} = \frac{130}{10} = 13$

Generally, the letter X is used with a subscript to differentiate the numbers as follows.
E.g. 15, 12, 10, 10, 9, 20, 14, 11, 13, 16
$X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9$, and $X_{10}$

The formula used for ungrouped data is $\bar{X} = \frac{\sum X}{n}$ or $\bar{X} = \frac{\sum X}{N}$,
where the Greek letter ∑ (sigma) shows summation and n or N is the total number of observations.

2. Grouped data
Two methods can be used: the long method and the coding method. Both methods
are used with frequency distributions.

Long method: $\bar{X} = \frac{\sum fx}{n}$ or $\bar{X} = \frac{\sum fx}{N}$, where f is the frequency and x the class mark.

Example using the long method


Scores     Midpoint (x)   Freq (f)   fx
46 – 50    48             4          192
41 – 45    43             6          258
36 – 40    38             10         380
31 – 35    33             12         396
26 – 30    28             8          224
21 – 25    23             7          161
16 – 20    18             3          54
Total                     50         1665

Long method: $\bar{X} = \frac{\sum fx}{n} = \frac{1665}{50} = 33.3$

Coding method: $\bar{X} = AM + \left(\frac{\sum fd}{n}\right) i$, which is used for distributions with equal class intervals.
AM is the assumed mean, f is the frequency, d is the code for each class, n is the total
frequency and i is the class size.
To use the coding method, class intervals must have the same size. The class in the
middle or the class with the highest frequency is chosen for the code of 0, and its class mark
is taken as the assumed mean. Classes above the zero-coded class are given positive codes and
those below are given negative codes in steps of 1.

Example using the coding method


Scores     Midpoint (x)   Freq (f)   Code (d)   fd
46 – 50    48             4          3          12
41 – 45    43             6          2          12
36 – 40    38             10         1          10
31 – 35    33             12         0          0
26 – 30    28             8          -1         -8
21 – 25    23             7          -2         -14
16 – 20    18             3          -3         -9
Total                     50                    3

Coding method: $\bar{X} = AM + \left(\frac{\sum fd}{n}\right) i = 33 + \frac{3 \times 5}{50} = 33 + \frac{15}{50} = 33.3$
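
Both methods should give the same mean for the same frequency distribution. The following Python sketch, which is illustrative and uses assumed variable names, applies the long method and the coding method to the example distribution, taking the class mark 33 of the class 31 – 35 as the assumed mean and 5 as the class size.

```python
# Class marks and frequencies from the worked example, highest class first.
class_marks = [48, 43, 38, 33, 28, 23, 18]
freqs = [4, 6, 10, 12, 8, 7, 3]
n = sum(freqs)                                   # total frequency

# Long method: sum of f*x divided by n.
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
long_mean = sum_fx / n

# Coding method: AM + (sum of f*d / n) * i.
assumed_mean = 33                                # class mark of the zero-coded class
class_size = 5
codes = [(x - assumed_mean) // class_size for x in class_marks]
sum_fd = sum(f * d for f, d in zip(freqs, codes))
coded_mean = assumed_mean + sum_fd * class_size / n

print(sum_fx, long_mean)
print(codes, sum_fd, coded_mean)
```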
OPTIONAL
Using Microsoft Excel

1. Open Excel.
2. Type the data to be used into one column, if the data are not yet entered.
3. Click an empty cell where you want the result to be and type in Mean.
4. Click the empty cell directly below where you typed Mean.
5. Click the white space to the right of the fx symbol.
6. Type in =AVERAGE(cell where the data begin:cell where the data end),
e.g. =AVERAGE(B2:B32). This means that the data begin at cell B2
and end at cell B32.
7. Press Enter. The mean is given in the empty cell clicked.

An example is shown below.

Properties of the Mean

1. The mean is influenced by every score or value that makes it up. If a score is changed,
the value of the mean changes.
3, 4, 2, 4, 7 Mean = 4
3, 4, 7, 4, 7 Mean = 5. Changing the score 2 to 7 has changed the mean from 4 to 5.

2. The mean is very sensitive to extreme scores (outliers).


4, 2, 3, 6, 5 Mean = 4
4, 2, 23, 6, 5 Mean = 8. All the other scores are below 7, and the presence of 23, an outlier,
has moved the mean from 4 to 8.

3. The mean is a function of the sum (or aggregate or total) of the scores.

$\bar{X} = \frac{\sum X}{N}$, so $N\bar{X} = \sum X$. This implies that the number of observations multiplied
by the mean gives the sum of the scores.

Of the three measures, it is the only one that is a function of the sum of the scores.
It is also possible to calculate the mean for a combined group if only the means and numbers
of scores (N) are available.

e.g. Mr Mensah’s class: Mean = 5, N = 20
Ms Addo’s class: Mean = 6, N = 30

Mean for the total group: $\bar{X} = \frac{(5 \times 20) + (6 \times 30)}{50} = \frac{280}{50} = 5.6$

4. If the mean is subtracted from each individual score and the differences are summed, the
result is 0. For example, for the scores 4, 2, 3, 6, 5 with a mean of 4:
4 – 4 = 0
2 – 4 = -2
3 – 4 = -1
6 – 4 = 2
5 – 4 = 1
Sum = 0 + (-2) + (-1) + 2 + 1 = 0
The distance of a score from the mean is known as the deviation.

5. If the same value is added to or subtracted from every number in a set of scores, the mean
goes up or down by that value.
For example, given 8, 2, 10, 4, $\bar{X} = 6$.
Now add 2 to each score: 10, 4, 12, 6, $\bar{X} = 8$, i.e. 6 + 2.

6. If each score is multiplied or divided by the same value, the mean is multiplied or divided
by that value.
For example, given 8, 2, 10, 4, $\bar{X} = 6$.
Now multiply each score by 3: 24, 6, 30, 12, $\bar{X} = 18$, i.e. 6 × 3.
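
Several of these properties can be verified numerically. The Python sketch below is illustrative only; it uses the standard `statistics.mean` function and the small data sets quoted in the properties above, with assumed variable names.

```python
from statistics import mean

# Property 4: deviations from the mean sum to zero.
scores = [4, 2, 3, 6, 5]
deviations = [x - mean(scores) for x in scores]
deviation_sum = sum(deviations)

# Properties 5 and 6: adding a constant shifts the mean; multiplying scales it.
base = [8, 2, 10, 4]
shifted = [x + 2 for x in base]
scaled = [x * 3 for x in base]

# Property 3: combined mean from group means and group sizes
# (Mr Mensah's class: mean 5, N 20; Ms Addo's class: mean 6, N 30).
combined_mean = (5 * 20 + 6 * 30) / (20 + 30)

print(deviations, deviation_sum)
print(mean(base), mean(shifted), mean(scaled))
print(combined_mean)
```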

Uses of the mean


1. It is useful when the actual magnitude of the scores is needed to get an average. E.g. total
sales for a new product, selecting a student to represent a whole class in a competition.
2. It is useful for further statistical work e.g. standard deviation, correlation coefficient.
3. It is useful when the scores are symmetrically distributed (i.e. normal).
4. It provides a direction of performance, compared with other measures of location
especially the median. Where Mean >Median, the distribution is skewed to the right
(positive skewness) showing that performance tends to be low and where Mean < Median,
the distribution is skewed to the left (negative skewness) showing that performance tends
to be high.
5. It serves as a standard of performance with which individual scores are compared. For
example, for normally distributed scores, where the mean is 56, an individual score of 80
can be said to be far above average. Also, performance can be described as just above
average or far below average or just below average.

THE MEDIAN (Mdn)


It is a score such that approximately one-half (50%) of the scores are above it and one-half
(50%) are below it when the scores are arranged in order of size.
E.g. Given the scores 8, 4, 9, 1, 3, the median, after arranging the scores in order as 1,
3, 4, 8, 9, is 4.

For an odd number of scores, the median occupies the $\frac{n+1}{2}$th position.
For an even number of scores, the $\frac{n+1}{2}$th position falls half-way between the two
middle scores, so the median is the mean of the two middle scores.

The median can be obtained from both ungrouped and grouped data and also from Microsoft
Excel.

To find the median from ungrouped data


1. Arrange all observations in order of size from smallest to largest or vice versa.

2. If the number of observations, n, is odd, the median is the number at the centre, i.e. the
number at the $\frac{n+1}{2}$th position.
3. If the number of observations, n, is even, the median is the mean of the two centre
observations.

Examples

1. For odd set of numbers


Given a set of observations as: 8 11 26 7 12 9 6 20 14
Note that there are 9 observations, which is odd.
1. Rearrange the scores in sequential order: 6 7 8 9 11 12 14 20 26
2. Find the $\frac{n+1}{2}$th position, i.e. $\frac{9+1}{2} = \frac{10}{2} = 5$th position.
3. The score at the 5th position is 11. The median is therefore 11.

2. For even set of numbers


Given a set of numbers as: 48 50 36 54 62 71 69 45 58 32
1. Rearrange the scores in sequential order: 32 36 45 48 50 54 58 62 69 71
2. Find the $\frac{n+1}{2}$th position, i.e. $\frac{10+1}{2} = \frac{11}{2} = 5\frac{1}{2}$th position. This means that the
median lies half-way between the 5th and 6th positions.
3. The score at the 5th position is 50 and at the 6th position is 54. Half-way between 50
and 54 is $\frac{50 + 54}{2} = \frac{104}{2} = 52$. The median is therefore 52.
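
The steps for ungrouped data can be sketched in Python as below. The function name `median_ungrouped` is illustrative; the even-numbered example uses the ordered list of ten scores shown in step 1 above.

```python
def median_ungrouped(scores):
    """Median of raw scores: the middle value, or the mean of the two middle values."""
    ordered = sorted(scores)          # step 1: arrange in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: score at the (n + 1)/2 th position
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two centre scores

odd_median = median_ungrouped([8, 11, 26, 7, 12, 9, 6, 20, 14])
even_median = median_ungrouped([32, 36, 45, 48, 50, 54, 58, 62, 69, 71])
print(odd_median, even_median)
```

Python's built-in `statistics.median` follows the same rule and could be used instead.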

To find the median from grouped data


Example
Classes Midpoint Freq Cum Freq
X f cf
46 – 50 48 4 50
41 – 45 43 6 46
36 – 40 38 10 40
31 – 35 33 12 30
26 – 30 28 8 18
21 – 25 23 7 10
16 – 20 18 3 3
Total 50

Step 1. Identify the median class. It is the class that contains the middle score. Find the
value of $\frac{N}{2}$, where N is the total frequency. This is the position of the middle score. In
the cumulative frequency column, find the number equal to this position or, failing that, the
smallest number that is greater than it. From the table above, $\frac{N}{2} = \frac{50}{2} = 25$, therefore the
number is 30. The class that this number belongs to is the median class. From the table
above, the median class is 31 – 35.
Step 2. Use the formula below to obtain the Median.

$\text{Mdn} = L_1 + \left(\frac{\frac{N}{2} - cf}{f_{mdn}}\right) i$, where
L1 is the lower-class boundary of the median class
N is the total frequency
cf is the cumulative frequency of the class just below the median class
i is the class size/width
fmdn is the frequency of the median class
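
As a rough sketch, the grouped-data formula can be applied to the frequency table above in Python. The function name and the way the table is passed in (lower class boundaries paired with frequencies, in ascending order) are assumptions made for illustration.

```python
def grouped_median(classes, width):
    """classes: (lower_class_boundary, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    cf = 0  # cumulative frequency of the classes below the current class
    for lower, f in classes:
        if cf + f >= n / 2:   # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("no classes supplied")

table = [(15.5, 3), (20.5, 7), (25.5, 8), (30.5, 12),
         (35.5, 10), (40.5, 6), (45.5, 4)]
mdn = grouped_median(table, width=5)
print(round(mdn, 1))  # 33.4
```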

Substituting the values in the table in the formula above, we have:

$\text{Mdn} = 30.5 + \left(\frac{\frac{50}{2} - 18}{12}\right) \times 5 = 30.5 + \left(\frac{25 - 18}{12}\right) \times 5 = 30.5 + \frac{7}{12} \times 5 = 30.5 + (0.583 \times 5) = 30.5 + 2.9 = 33.4$
OPTIONAL

To find the median from Microsoft Excel


1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Median.
4. Click the empty cell directly below where you typed Median.
5. Click white space to the right of the fx symbol.
6. Type in =MEDIAN(cell where the data begins:cell where the data ends).
E.g. =MEDIAN(B2:B32). This means that the data begins at cell B2 and ends at
cell B32.
7. Press Enter. The median is given in the empty cell clicked.

[Figure: Excel worksheet showing the MEDIAN function applied to a column of data]

Features of the median
1. It is not influenced by extreme scores. For example, the median for the following
numbers, 2, 3, 4, 5, 6 is 4. If 6 changes to 23 as an extreme score, the median remains 4.
2. It does not use all the scores in a distribution but uses only one value.
3. It has limited use for further statistical work.
4. It can be used when there is incomplete data at the beginning or end of the distribution.
5. It is appropriate for data from ordinal, interval and ratio scales.
6. Where there are very few observations, the median is not representative of the data.
7. Where the data set is large, it is tedious to arrange the data in an array for ungrouped data
computation of the median.

Uses of the median


1. It is used as the most appropriate measure of location when there is reason to believe that
the distribution is skewed.
2. It is used as the most appropriate measure of location when there are extreme scores that
would distort the mean, e.g. the typical income in a company of senior and junior staff.
3. It is useful when the exact midpoint of the distribution is wanted.
4. It provides a standard of performance for comparison with individual scores when the
score distribution is skewed. For example, if the median score is 60 and an individual
student obtains 55, performance can be said to be below average/median. Also
performance can be described as just above average or far below average or just below
average.
5. It can be compared with the mean to determine the direction of student performance.
Where Median < Mean, the distribution is skewed to the right (positive skewness)
showing that performance tends to be low and where Median > Mean, the distribution is
skewed to the left (negative skewness) showing that performance tends to be high.

THE MODE
It is the number that occurs most frequently in a distribution.
Given the following scores, 1, 2, 4, 6, 4, 6, 7, 2, 4 the number that occurs most frequently is 4.
This is the Mode. This number appears 3 times.
Given the following scores, 11, 22, 14, 26, 34, 6, 27, 12, 40, every number occurs exactly
once, so no number occurs more frequently than the others. There is, therefore, no mode.

1. The main advantage is that it is the only measure that is useful for a nominal scale.
2. It is used when there is a need for a rough estimate of the measure of location.
3. It is used when there is the need to know the most frequently occurring value e.g. dress
styles.
4. It is not useful for further statistical work because a distribution can be bimodal or
trimodal, or have no mode at all.
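
Finding the mode amounts to counting how often each score occurs. A minimal Python sketch is given below; the function name `modes` and the convention of returning an empty list when every score occurs only once are assumptions made for illustration.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), or an empty list if no score repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:                      # every score occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 4, 6, 4, 6, 7, 2, 4]))            # [4]
print(modes([11, 22, 14, 26, 34, 6, 27, 12, 40]))    # []
```

A list with two or three entries corresponds to a bimodal or trimodal distribution.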

UNIT FOUR

MEASURES OF VARIABILITY

In the previous unit, we explained the measures of central tendency. These measures do not
indicate the extent of dispersion or variability in a distribution. Dispersion or variability
takes us one step further in understanding the pattern of the data. Further, a high degree of
uniformity (i.e. a low degree of dispersion) is often a desirable quality. If in education
there is a high degree of variability in examination scores, then it can be assumed that
performance is not uniform.

Some important definitions of dispersion are given below:


1. "Dispersion is the measure of the variation of the items." -A.L. Bowley
2. "The degree to which numerical data tend to spread about an average value is called the
variation or dispersion of the data." -Spiegel
3. "Dispersion or spread is the degree of the scatter or variation of the variable about a central
value." -Brooks & Dick
4. "The measurement of the scatterness of the mass of figures in a series about an average is
called measure of variation or dispersion." -Simpson & Kafka

WHAT IS DISPERSION?
It is clear from above that dispersion (also known as scatter, spread or variation) measures the
extent to which the items vary from some central value. Since measures of dispersion give an
average of the differences of various items from an average, they are also called averages of
the second order. An average is more meaningful when it is examined in the light of
dispersion.

The measures mainly used in education are:

1. The range
2. The Variance
3. The Standard Deviation
4. The Quartile Deviation (Semi-interquartile range)
They are used as single scores to describe individual differences in terms of achievement.

For example: 48, 51, 47, 50 Total = 196 Mean = 49 …..(i)
30, 72, 90, 4 Total = 196 Mean = 49 …..(ii)
However, a closer look at the two sets of data shows that the distribution within each set is
not the same. Where the scores cluster around the mean, performance is said to be
homogeneous as in (i). Where the scores move away from the mean, performance is said to
be heterogeneous as in (ii).

THE RANGE
It is the difference between the highest and the lowest values in a set of data.
e.g.: 48, 51, 47, 50 Total = 196 Mean = 49 …..(i) Range: 51 – 47 = 4
30, 72, 90, 4 Total = 196 Mean = 49 …..(ii) Range: 90 – 4 = 86

Features
1. It is easy to compute.
2. It is easy to interpret.
3. It is a crude measure of dispersion and does not take into account all the data/scores.
4. It ignores the spread of all the scores.
5. It uses only two values and does not consider how the other scores relate to each other.
6. The range does not consider the typical observations in the distribution but concentrates
only on the extreme values.
7. It can give a distorted picture of the variation within a set of data.
8. Different distributions can have the same range which would give misleading conclusions.

Uses
1. When data is too scanty or too scattered to justify the computation of a more precise
measure.
2. When knowledge of extreme scores or total spread is all that is needed.

VARIANCE & STANDARD DEVIATION


The variance is always considered together with the standard deviation. It is the square of the
standard deviation. Both variance and standard deviation are computed for both ungrouped
and grouped data. Microsoft Excel is also useful in obtaining the variance and standard
deviations.

Ungrouped data
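This is based on raw data. It is computed by using the following formulae.

Variance ($S^2$, $\sigma^2$)

1. $\text{Var}(S^2) = \frac{\sum (X - \bar{X})^2}{n}$
2. $\text{Var}(S^2) = \frac{\sum X^2}{n} - \bar{X}^2$
3. $\text{Var}(S^2) = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$

Standard Deviation ($S$, $\sigma$)

$\text{Std. Dev}(S) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ or $\text{Std. Dev}(S) = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$ or $\text{Std. Dev}(S) = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$
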

Given a set of data as 48 51 50 47 and the mean of the distribution as 49, the variance and the
standard deviation could be computed as follows:

$\bar{X} = \frac{196}{4} = 49$

X        X − X̄     (X − X̄)²     X²
48       −1         1             2304
51        2         4             2601
47       −2         4             2209
50        1         1             2500
Total               10            9614

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{10}{4}} = \sqrt{2.5} = 1.58$; Variance = $1.58^2 = 2.5$

OR

$SD = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2} = \sqrt{\frac{9614}{4} - 49^2} = \sqrt{2403.5 - 2401.0} = \sqrt{2.5} = 1.58$; Variance = $1.58^2 = 2.5$

OR

$SD = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{9614}{4} - \left(\frac{196}{4}\right)^2} = \sqrt{2403.5 - 2401.0} = \sqrt{2.5} = 1.58$
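
The equivalence of the deviation formula and the sum-of-squares formula can be checked with a short Python sketch. It uses the population formulae (division by n) as in these notes; the variable names are illustrative.

```python
from math import sqrt

scores = [48, 51, 47, 50]
n = len(scores)
mean = sum(scores) / n                                # 49.0

# Formula 1: mean of the squared deviations
var_dev = sum((x - mean) ** 2 for x in scores) / n    # 10 / 4
# Formula 2: mean of the squares minus the square of the mean
var_sq = sum(x * x for x in scores) / n - mean ** 2   # 2403.5 - 2401.0

sd = sqrt(var_dev)
print(var_dev, var_sq, round(sd, 2))
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities directly.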

Grouped data:
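This is based on a frequency distribution of the scores.

Long method: $SD = \sqrt{\frac{\sum f(X - \bar{X})^2}{n}}$, $\text{Var} = \frac{\sum f(X - \bar{X})^2}{n}$

Short method: $SD = \sqrt{\frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2}$, $\text{Var} = \frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2$

Coding method: $SD = i\sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2}$. This is useful with equal class intervals.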

Using the short method

Scores     Midpoint   Freq
           X          f      X²      fX      fX²
46 – 50    48         4      2304    192     9216
41 – 45    43         6      1849    258     11094
36 – 40    38         10     1444    380     14440
31 – 35    33         12     1089    396     13068
26 – 30    28         8      784     224     6272
21 – 25    23         7      529     161     3703
16 – 20    18         3      324     54      972
Total                 50             1665    58765

Short method:

$SD = \sqrt{\frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2} = \sqrt{\frac{58765}{50} - \left(\frac{1665}{50}\right)^2} = \sqrt{1175.3 - 1108.89} = \sqrt{66.41} = 8.15$


Using the coding method

Scores     Midpoint   Freq   Code
           X          f      d      d²     fd     fd²
46 – 50    48         4      3      9      12     36
41 – 45    43         6      2      4      12     24
36 – 40    38         10     1      1      10     10
31 – 35    33         12     0      0      0      0
26 – 30    28         8      −1     1      −8     8
21 – 25    23         7      −2     4      −14    28
16 – 20    18         3      −3     9      −9     27
Total                 50                   3      133
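
Both the short method and the coding method can be checked with a few lines of Python. In this sketch the assumed mean (33, the midpoint coded 0) and the class width (5) are read from the tables; the variable names are illustrative.

```python
from math import sqrt

midpoints = [48, 43, 38, 33, 28, 23, 18]
freqs = [4, 6, 10, 12, 8, 7, 3]
n = sum(freqs)                                               # 50

# Short method: uses sum of fX and sum of fX^2
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # 1665
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # 58765
sd_short = sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Coding method: d = (midpoint - assumed mean) / class width
assumed_mean, width = 33, 5
codes = [(x - assumed_mean) // width for x in midpoints]     # 3, 2, 1, 0, -1, -2, -3
sum_fd = sum(f * d for f, d in zip(freqs, codes))            # 3
sum_fd2 = sum(f * d * d for f, d in zip(freqs, codes))       # 133
sd_coded = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

print(round(sd_short, 2), round(sd_coded, 2))  # 8.15 8.15
```

The two methods agree because coding only shifts and rescales the midpoints, and the class width is restored by the final multiplication.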

Coding Method

$SD = i\sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = 5\sqrt{\frac{133}{50} - \left(\frac{3}{50}\right)^2} = 5\sqrt{2.66 - 0.0036} = 5 \times 1.63 = 8.15$

OPTIONAL
Using Microsoft Excel

Standard Deviation
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Std. Dev.
4. Click the empty cell directly below where you typed Std. Dev.
5. Click white space to the right of the fx symbol.

6. Type in =STDEVPA(cell where the data begins:cell where the data ends).
E.g. =STDEVPA(B2:B32). This means that the data begins at cell B2 and ends
at cell B32.
7. Press Enter. The standard deviation is given in the empty cell clicked.

Variance
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Variance.
4. Click the empty cell directly below where you typed Variance.
5. Click white space to the right of the fx symbol.
6. Type in =VARPA(cell where the data begins:cell where the data ends).
E.g. =VARPA(B2:B32). This means that the data begins at cell B2 and ends at
cell B32.
7. Press Enter. The variance is given in the empty cell clicked.

[Figure: Excel worksheet showing the standard deviation and variance functions applied to a column of data]

Features of the variance/standard deviation

1. The standard deviation/variance of a constant is zero.
E.g. 18 18 18 18 18 18 has a standard deviation/variance of zero.
2. It is not resistant. It is affected by extreme scores or outliers. For example, using the
formulae above (dividing by n), the scores 5, 8, 10, 6, 7 give SD = 1.7 and Var = 3.0.
However, if the scores become 5, 8, 10, 6, 24, then SD = 6.9 and Var = 47.8.
3. The standard deviation/variance is independent of change of origin. If each score in a set
of data is reduced or increased by the same amount, the standard deviation/variance of the
new set of data does not change. For example, given the data 48 51 47 50 with a standard
deviation of 1.58. If 5 points is added to each score to obtain 53 56 52 55, the standard
deviation/variance remains unchanged.
4. The standard deviation/variance is not independent of change of scale. If each score in a
set of data is multiplied or divided by the same amount, say a constant k, the resulting
standard deviation equals the old standard deviation multiplied or divided by k, and the
resulting variance equals the old variance multiplied or divided by k². For example, the data
1 2 3 4 5 have a variance of 2 and a standard deviation of 1.41. If each score is multiplied
by 10 to obtain 10 20 30 40 50, the standard deviation becomes 1.41 x 10 = 14.1 and the
variance becomes 10² x 2 = 200.
5. It uses every value in the distribution.
6. It is difficult to calculate for open-ended distributions.
7. It is affected by extreme values. It gives more weight to extreme values.
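
The change-of-origin and change-of-scale properties can be illustrated with Python's `statistics.pstdev` and `statistics.pvariance` (population formulae, dividing by n). This is a sketch only; the data set 48 51 47 50 is reused and the constants 5 and 10 are chosen for illustration.

```python
from math import isclose
from statistics import pstdev, pvariance

scores = [48, 51, 47, 50]
shifted = [x + 5 for x in scores]    # change of origin: add 5 to every score
scaled = [x * 10 for x in scores]    # change of scale: multiply every score by 10

# Adding a constant leaves the standard deviation unchanged
same_sd = isclose(pstdev(shifted), pstdev(scores))

# Multiplying by k multiplies the SD by k and the variance by k squared
sd_ratio = pstdev(scaled) / pstdev(scores)
var_ratio = pvariance(scaled) / pvariance(scores)

print(same_sd, round(sd_ratio, 2), round(var_ratio, 2))
```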

USES
1. It is used as the most appropriate measure of variation/dispersion when there is reason to
believe that the distribution is normal.
2. It helps to find out the variation in achievement among a group of students. i.e. it
determines if a group is homogeneous or heterogeneous.
Where the standard deviation is relatively small, the group is believed to be homogeneous,
i.e. performing at about the same level. On the other hand, where the standard
deviation is relatively large, the group is believed to be heterogeneous, i.e. performing at
different levels.
To be more precise, the coefficient of variation (CV) is computed.

$CV = \frac{s}{\bar{x}} \times 100$. If the value of CV is greater than 33, the group is heterogeneous;
otherwise it is homogeneous.
With this information, the teacher has to adopt a teaching method to suit each group.
3. It is helpful in computing other statistics e.g. standard scores, correlation coefficients.
4. It is useful in determining the reliability of test scores. The split-half correlation method
or internal consistency methods use the standard deviation of the scores.
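
As an illustration of the coefficient of variation, the sketch below applies it to the two data sets (i) and (ii) used earlier in this unit, taking the standard deviation with divisor n. The function name is an assumption made for illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(scores):
    """CV = (standard deviation / mean) x 100, using the population SD."""
    return pstdev(scores) / mean(scores) * 100

cv_i = coefficient_of_variation([48, 51, 47, 50])    # set (i)
cv_ii = coefficient_of_variation([30, 72, 90, 4])    # set (ii)
print(round(cv_i, 1), round(cv_ii, 1))  # 3.2 69.2
```

With the cut-off of 33 stated above, set (i) would be classed as homogeneous and set (ii) as heterogeneous, even though both have a mean of 49.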

In most score interpretations in education and for descriptive statistics, the standard
deviation is preferred to variance because
1. the standard deviation (S), is the natural measure of spread or variation for normal
distributions
2. the variance (S2) involves squaring the deviations and does not have the same unit of
measurement as the original observations.

QUARTILE DEVIATION

It is also called the semi-inter quartile range and it depends on quartiles.


Quartiles divide distributions into 4 equal parts. Practically there are 3 quartiles.
The QD is half the distance between the first quartile (Q1) and the third quartile (Q3).

Method: $QD = \frac{Q_3 - Q_1}{2}$

Computing Quartiles from Ungrouped data

There are two methods – the median method and the formula method.

The median method

1. First arrange the scores in a sequential order.

2. Find the median (i.e. the score at the $\frac{n+1}{2}$th position) for the data set. The median
divides the distribution into two equal parts.
3. Find the median for the first half/part. This median becomes Q1, the first quartile.
4. Find the median for the second half/part. This median becomes the Q3, third quartile.

Example.
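Given the following scores: 8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9.

Arrange in ascending order as: 4, 6, 7, 8, 9, 10, 12, 13, 18, 22, 25

The median is the 6th score, 10. The median of the first half (4, 6, 7, 8, 9) is 7, so Q1 = 7.
The median of the second half (12, 13, 18, 22, 25) is 18, so Q3 = 18.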

The Formula method

Q1 is the score at the $\frac{1}{4}(n+1)$th position.
Q3 is the score at the $\frac{3}{4}(n+1)$th position.

Given the following scores, 8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9, after arranging them in
ascending order as 4, 6, 7, 8, 9, 10, 12, 13, 18, 22, 25:

Q1 = $\frac{1}{4}(n+1)$th position → $\frac{1}{4}(12)$ = 3rd position, so Q1 = 7
Q3 = $\frac{3}{4}(n+1)$th position → $\frac{3}{4}(12)$ = 9th position, so Q3 = 18

$QD = \frac{Q_3 - Q_1}{2} = \frac{18 - 7}{2} = \frac{11}{2} = 5.5$
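
The formula method can be sketched in Python as follows. The notes only show whole-number positions; the linear interpolation used here for fractional positions is an assumption, as are the function names.

```python
def value_at_position(ordered, position):
    """Return the score at a 1-based position in an ordered list.

    Fractional positions are resolved by linear interpolation between
    neighbouring scores (an assumption, not stated in the notes).
    """
    if position <= 1:
        return ordered[0]
    if position >= len(ordered):
        return ordered[-1]
    lower = int(position)
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def quartile_deviation(scores):
    ordered = sorted(scores)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)       # (1/4)(n + 1)th position
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)   # (3/4)(n + 1)th position
    return q1, q3, (q3 - q1) / 2

q1, q3, qd = quartile_deviation([8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9])
print(q1, q3, qd)
```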

Computing Quartiles from grouped data.

$Q_1 = L_{Q1} + \left(\frac{\frac{N}{4} - cf}{f_{Q1}}\right) i$, where

LQ1 is the lower-class boundary of the lower quartile class
N is the total frequency
cf is the cumulative frequency of the class just below the lower quartile class
i is the class size/width
fQ1 is the frequency of the lower quartile class

$Q_3 = L_{Q3} + \left(\frac{\frac{3N}{4} - cf}{f_{Q3}}\right) i$, where

LQ3 is the lower-class boundary of the upper quartile class
N is the total frequency
cf is the cumulative frequency of the class just below the upper quartile class
i is the class size/width
fQ3 is the frequency of the upper quartile class

Given the distribution below:

Example
Classes Midpoint Freq Cum Freq
X f cf
46 – 50 48 4 50
41 – 45 43 6 46
36 – 40 38 10 40
31 – 35 33 12 30
26 – 30 28 8 18
21 – 25 23 7 10
16 – 20 18 3 3
Total 50

Step 1. Identify the quartile class. It is the class that contains the quartile of interest.
Find the value of $\frac{N}{4}$ for the lower quartile and $\frac{3N}{4}$ for the upper quartile (where N is the
total frequency) as positions. In the cumulative frequency column, find the number
equal to the position or, failing that, the smallest number that is greater than the position.
From the table above, $\frac{N}{4} = \frac{50}{4} = 12.5$, therefore the number is 18, and $\frac{3N}{4} = \frac{150}{4} = 37.5$,
therefore the number is 40. The classes that these numbers belong to are the quartile classes.
From the table above, the lower quartile class is 26 – 30 and the upper quartile class is 36 – 40.
Step 2. Use the formulae below to obtain the lower and upper quartiles.

$Q_1 = L_{Q1} + \left(\frac{\frac{N}{4} - cf}{f_{Q1}}\right) i = 25.5 + \left(\frac{\frac{50}{4} - 10}{8}\right) \times 5 = 25.5 + \left(\frac{12.5 - 10}{8}\right) \times 5 = 25.5 + \left(\frac{2.5}{8}\right) \times 5 = 25.5 + 1.5625 = 27.06$

$Q_3 = L_{Q3} + \left(\frac{\frac{3N}{4} - cf}{f_{Q3}}\right) i = 35.5 + \left(\frac{\frac{150}{4} - 30}{10}\right) \times 5 = 35.5 + \left(\frac{37.5 - 30}{10}\right) \times 5 = 35.5 + \left(\frac{7.5}{10}\right) \times 5 = 35.5 + 3.75 = 39.25$

$QD = \frac{Q_3 - Q_1}{2} = \frac{39.25 - 27.06}{2} = \frac{12.19}{2} = 6.095$
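
The grouped-data quartiles can be checked with a Python sketch along the following lines. The function name and the table layout (lower class boundaries paired with frequencies, in ascending order) are assumptions made for illustration. Because the sketch does not round Q1 to 27.06 before subtracting, its QD is about 6.09.

```python
def grouped_quantile(classes, width, proportion):
    """classes: (lower_class_boundary, frequency) pairs in ascending order.

    proportion is 0.25 for Q1, 0.5 for the median and 0.75 for Q3.
    """
    n = sum(f for _, f in classes)
    target = proportion * n           # N/4 or 3N/4
    cf = 0                            # cumulative frequency below the current class
    for lower, f in classes:
        if cf + f >= target:          # first class whose cumulative frequency reaches the target
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("no classes supplied")

table = [(15.5, 3), (20.5, 7), (25.5, 8), (30.5, 12),
         (35.5, 10), (40.5, 6), (45.5, 4)]
q1_grouped = grouped_quantile(table, 5, 0.25)
q3_grouped = grouped_quantile(table, 5, 0.75)
qd_grouped = (q3_grouped - q1_grouped) / 2
print(round(q1_grouped, 2), round(q3_grouped, 2), round(qd_grouped, 2))
```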

OPTIONAL
Using Microsoft Excel

First/Lower Quartile

1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Q1 (Lower Quartile).
4. Click the empty cell directly below where you typed Q1.
5. Click white space to the right of the fx symbol.
6. Type in =QUARTILE(cell where the data begins:cell where the data ends,1).
E.g. =QUARTILE(B2:B32,1). This means that the data begins at cell B2 and ends at
cell B32, and 1 means the first or lower quartile.
7. Press Enter. The Q1, first/lower quartile, is given in the empty cell clicked.

Third/Upper Quartile

1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Q3 (Upper Quartile).
4. Click the empty cell directly below where you typed Q3.
5. Click white space to the right of the fx symbol.
6. Type in =QUARTILE(cell where the data begins:cell where the data ends,3).
E.g. =QUARTILE(B2:B32,3). This means that the data begins at cell B2 and ends at
cell B32, and 3 means the third or upper quartile.
7. Press Enter. The Q3, third/upper quartile, is given in the empty cell clicked.

[Figure: Excel worksheet showing the QUARTILE function applied to a column of data]

Features of the Quartile Deviation


1. For skewed distributions, where the median is used as a measure of location the
quartile deviation is a better measure of variability.
2. The quartile deviation is a measure of individual differences. It helps to find out the
variation in achievement among a group of students. i.e. it determines if a group is
homogeneous or heterogeneous.
Where the quartile deviation is relatively small, the group is believed to be homogeneous,
i.e. performing at about the same level. On the other hand, where the quartile
deviation is relatively large, the group is believed to be heterogeneous, i.e. performing at
different levels.
To be more precise, the coefficient of variation (CV) is computed.

$CV = \frac{QD}{Mdn} \times 100$. If the value of CV is greater than 33, the group is heterogeneous;
otherwise it is homogeneous.
With this information, the teacher has to adopt a teaching method to suit each group.
3. It does not make use of all the information provided by the scores.

UNIT FIVE

MEASURES OF RELATIVE POSITION


A test score in and of itself is usually difficult to interpret. For example, if you learned that
your score on a measure of shyness was 35 out of a possible 50, you would have little idea
how shy you are compared to other people. More relevant is the percentage of people with
lower shyness scores than yours. This percentage is called a percentile. If 65% of the scores
were below yours, then your score would be the 65th percentile. Statisticians often talk about
the position of a value, relative to other values in a set of data. The most common measures
of position are percentiles, quartiles, and standard scores (aka, z-scores).

Two main kinds of measure are considered here: percentiles and percentile ranks, and Z scores
and T scores. Z scores and T scores are often referred to as standard scores.
The main purpose of these measures is to describe an individual’s position in relation to a
known group or the norm group.

PERCENTILES
Definition: They are points in a distribution below which a given percent, P, of the cases lie.
• There are 99 percentiles that divide a distribution into 100 equal parts.
• Percentiles are individual scores.
Notation: P40 = 60. Sixty is the score below which 40% of the scores lie in a specific
group after the scores have been arranged sequentially. This means that a
student who obtains a score of 60 has done better than 40% of the members in
the specific group.
P75 = 50. Fifty is the score below which 75% of the scores lie in a specific
group after the scores have been arranged sequentially. This means that a
student who obtains a score of 50 has done better than 75% of the members in
the specific group.
• A score in one group may be a different percentile in another group.
For example, in Statistics Quiz 1, a student with a score of 15 may be at P90 in the Social
Science group but the same score may put the student at P85 in the Home Economics
group.
• P50 is the same as the median. P25 is the first quartile and P75 is the third quartile.

PERCENTILE RANKS
Definition: The percentage of cases falling below a given point on the measurement scale. It
is the position, on a scale of 100, at which an individual score lies.
Notation: PR of 60 = 75. Seventy-five is the position for a score of 60 when the
distribution is divided into 100 parts. This means that a student who obtains a
score of 60 has 75% of the scores falling below him/her in the group.
The easiest way to obtain percentiles and percentile ranks is to use the ogive (cumulative
percentage graph).
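
Where the raw scores are available, a percentile rank can also be computed directly from the definition above. The sketch below is illustrative only: the function name and the twenty scores are hypothetical, not data from these notes.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the group that fall below the score x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

group = list(range(1, 21))          # hypothetical scores 1, 2, ..., 20
pr = percentile_rank(group, 16)     # 15 of the 20 scores fall below 16
print(pr)  # 75.0
```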

[Figure: ogive (cumulative percentage curve); vertical axis: cumulative percentage, 0 to 100; horizontal axis: Scores, 0 to 45]

From the ogive, P60 = 34. PR of a score of 26 is 40.

STANDARD SCORES (Z, T)


It indicates the number of standard deviation units an individual score is above or below the
mean of each group. It represents an individual score that has been transformed into a
common standard using the mean and the standard deviation.

Formula: $Z = \frac{X - \bar{X}}{s}$, $T = 50 + 10Z$, where the T scale has a mean of 50 and a standard deviation of 10.

Example. Given that a student obtained 15 in a quiz with a mean of 12 and a standard
deviation of 2, the Z and T scores become
$Z = \frac{15 - 12}{2} = 1.5$ and $T = 50 + 10(1.5) = 65$
• For Z scores, 0 is the mean score. Positive scores are scores above the mean (average)
and negative scores are scores below the mean (average).
• An individual’s performance can be described as far above average, above average,
just above average, just below average and far below average.
• In the case of T scores, 50 is the mean score. Scores greater than 50 are above average
and scores less than 50 are below average.
• In practice, Z scores usually range between −4 and +4 while T scores usually range
between 10 and 90.

Self-Practice

1. A student had a Z score of 2.5. The mean for the class was 60 with a standard deviation
of 4.0. What was the student’s observed score?

$Z = \frac{X - \bar{X}}{s}$ → $2.5 = \frac{X - 60}{4}$ → $10 = X - 60$ → $X = 10 + 60 = 70$

2. A student obtained a raw score of 70 in an examination. If the raw score gives her a Z-
score of 3.5, what would be the class mean if it is known that the standard deviation is 5.0?

$Z = \frac{X - \bar{X}}{s}$ → $3.5 = \frac{70 - \bar{X}}{5}$ → $17.5 = 70 - \bar{X}$ → $\bar{X} = 70 - 17.5 = 52.5$
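
The standard-score formulae and the two self-practice answers can be checked with a small Python sketch. The function names are illustrative, not part of the notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations a raw score lies above or below the mean."""
    return (x - mean) / sd

def t_score(z):
    """T score: a standard score rescaled to mean 50 and standard deviation 10."""
    return 50 + 10 * z

def raw_score(z, mean, sd):
    """Recover the observed score from a Z score."""
    return mean + z * sd

def class_mean(x, z, sd):
    """Recover the class mean from a raw score and its Z score."""
    return x - z * sd

z = z_score(15, 12, 2)           # 1.5
t = t_score(z)                   # 65.0
x = raw_score(2.5, 60, 4.0)      # 70.0 (Self-Practice 1)
m = class_mean(70, 3.5, 5.0)     # 52.5 (Self-Practice 2)
print(z, t, x, m)
```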

USES

1. It helps the teacher to know an individual’s position in relation to the rest of the class.
A student with a Z score of 3.2 is performing far above average.
2. It enables the teacher to compare student’s performances in different subjects to know
individual strengths and weaknesses.

E.g.                    Mathematics    Social Studies
Mean                    50             60
Standard deviation      2.5            4.0
Observed Score          55             55
Salome’s Z score        2.0            −1.25

Salome has done better in Mathematics than Social Studies, considering the class
performance.
3. It helps the teacher to guide and counsel the student to choose the correct course for a
future career and vocation

                        English    Maths    Pre-Tech
Mean                    80         70       75
Standard Deviation      6.0        2.0      4.0
Observed Score          85         76       80
George’s Z score        0.83       3.0      1.25

George is more likely to succeed in a Maths-related course.

UNIT SIX

MEASURES OF RELATIONSHIPS

Are stock prices related to the price of gold? Is unemployment related to stealing? Is the
academic performance of students related to attendance? Correlation can answer these
questions, and there is no statistical technique more useful or more abused than correlation.

Concept

Natural relationships exist in the world. Parents and children as well as twins have things in
common. Males are normally attracted to females and rain results in good harvest.

In education, absenteeism tends to go with performance in class tests and examinations.


Studies have also shown that females generally do better than males in the reading subjects
while males generally tend to do better than females in the science-related subjects like
Physics, Chemistry and Mathematics.

The concept of correlation provides information about the extent of the relationship between
two variables. Two variables are correlated if they tend to ‘go together’. For example, if
high scores on one variable tend to be associated with high scores on a second variable, then
both variables are correlated.

Correlations aim at identifying relationships between variables and at predicting
performance based on known results.

The statistical summary of the degree and direction of the linear relationship or association
between any two variables is given by the coefficient of correlation. Correlation coefficients
range between -1.0 and +1.0. Correlation coefficients are normally represented by the
symbols, r and ρ (rho).

Scatter plots
A scatter plot or scatter diagram shows the nature of the relationship between any 2 variables.
To obtain a scatter plot, marks are made on a graph representing the intersection of the two
variables. Scatter plots could either be linear or curvilinear.

Examples

Linear relationship
Assumptions

1. The variables are random. Neither the values of X nor Y are predetermined.
2. The relationship between the variables is linear.
3. The probability distribution of X’s, given a fixed Y, is normal, i.e. the sample is
drawn from a joint normal distribution.
4. The standard deviation of X’s, given each value of Y is assumed to be the same, just
as the standard deviation of Y’s given each value of X is the same.

Assume the following scores in two tests.


Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
X 14 16 15 10 9 18 18 14 12 13 15 18 10 12 16 20 15 12 14 10
Y 10 12 15 10 12 15 15 12 14 14 14 10 12 15 10 12 15 15 10 14

Y = 10, X = 14, 10, 18, 16, 14; Distribution of X is normal
Y = 12, X = 16, 9, 14, 10, 20; Distribution of X is normal
Y = 14, X = 12, 13, 15, 10; Distribution of X is normal
Y = 15, X = 15, 18, 18, 12, 15, 12; Distribution of X is normal

Nature of the linear relationship

The relationship is described by direction and degree.

(a) Direction: Positive, (+) High values go with high values and low values go
with low values.
Negative (─) High values go with low values and low values go with
high values.

(b) Degree:
    High (strong):     r > 0.60 or r < −0.60
    Moderate (mild):   0.40 ≤ r ≤ 0.60 or −0.60 ≤ r ≤ −0.40
    Low (weak):        0 < r < 0.40 or −0.40 < r < 0
    Perfect:           r = 1.0 or r = −1.0
    Zero:              r = 0.0
Some examples

[Figures: scatter plots illustrating (1) perfect linear positive correlation, (2) perfect linear negative correlation, (3) zero linear correlation, (4) high linear positive correlation, (5) moderate linear positive correlation, (6) low linear positive correlation]

Commonly Used Types

1. Pearson Product Moment correlation coefficient (r). This is applicable when both
variables are continuous in nature. It uses interval and ratio scale data. For example,
the relationship between test scores and age of students.
2. Spearman’s rank correlation coefficient (ρ). This is suitable when both variables are
ranked, or are continuous and have been converted to ranks. It uses ordinal scale data.
For example, the relationship between ranks in terms of school attendance and age.
3. Phi coefficient (φ). This is used when both variables are natural dichotomies. It is
also applicable for nominal data. For example, the relationship between gender and
political party affiliation.
4. Point biserial correlation coefficient (rpb). This is applicable when one variable is
continuous and the other is a natural dichotomy. It combines nominal scale data with
either interval or ratio scale data. For example, the relationship between gender and
test scores

Coefficient of Determination (r2)


It is the square of the correlation coefficient. It is the proportion of the variance in Y
accounted for by X. An r of 0.71 gives r2 to be 0.50. This means that 50% of the variance in
Y is associated with variability in X. For example, if the correlation between class
attendance and performance in Statistics is 0.8, then class attendance explains 64% of the
variation in the scores in performance.

Causation and correlation


The presence of a correlation between two variables does not necessarily mean that there
exists a causal relationship between the two variables. A very strong or high relationship
between two variables does not imply that one causes the other. A cause-and-effect
relationship cannot be established purely from a correlation coefficient.

Computational examples

1. The Pearson product moment r (for interval and ratio scales)

$r = \frac{\text{Covariance}(X, Y)}{S_X \cdot S_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n \, S_X \cdot S_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}$ ..... (1)

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$ ..... (2)

Student Quiz 1 Quiz 2


XX (X  X )2 Y Y (Y  Y ) 2 ( X  X )( Y  Y )
No. X Y
1 4 6 -2 4 -1 1 2
2 8 8 2 4 1 1 2
3 10 9 4 16 2 4 8
4 7 7 1 1 0 0 0
5 6 8 0 0 1 1 0
6 3 2 -3 9 -5 25 15
7 8 9 2 4 2 4 4
8 5 10 -1 1 3 9 -3
9 5 6 -1 1 -1 1 1
10 4 5 -2 4 -2 2 4
Total 60 70 44 50 33

Note: X = 6 and Y  7

Using Formula 1:

$$ r = \frac{33}{\sqrt{(44)(50)}} = \frac{33}{46.9} = 0.70 $$
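
As a check on the hand calculation, the sketch below computes r for the ten Quiz 1 and Quiz 2 scores with both Formula 1 (deviation form) and Formula 2 (raw score form). The function and variable names are illustrative assumptions; only the scores come from the table.

```python
from math import sqrt

quiz1 = [4, 8, 10, 7, 6, 3, 8, 5, 5, 4]    # X
quiz2 = [6, 8, 9, 7, 8, 2, 9, 10, 6, 5]    # Y


def pearson_r_deviation(x, y):
    """Formula 1: sum of cross-products of deviations over the root of the sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


def pearson_r_raw(x, y):
    """Formula 2: the same coefficient computed from raw-score totals."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


print(round(pearson_r_deviation(quiz1, quiz2), 2))
print(round(pearson_r_raw(quiz1, quiz2), 2))
```

Both forms are algebraically equivalent, so they should agree apart from floating-point rounding.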

2. The Spearman rank correlation coefficient (ρ): For ordinal scale variables

$$ \rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} $$

Given the following scores:

Student   Quiz 1   Quiz 2   Quiz 1   Quiz 2   d (Q1 − Q2
No.       X        Y        Ranks    Ranks    ranks)       d²
1         44       16       8.5      7.5       1.0         1.00
2         48       18       2.5      4.5      -2.0         4.00
3         50       19       1        2.5      -1.5         2.25
4         47       17       4        6        -2.0         4.00
5         46       18       5        4.5       0.5         0.25
6         43       12       10       10        0.0         0.00
7         48       19       2.5      2.5       0.0         0.00
8         45       20       6.5      1         5.5         30.25
9         45       16       6.5      7.5      -1.0         1.00
10        44       15       8.5      9        -0.5         0.25
Total                                                      43.00

$$ \rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6(43)}{10(100 - 1)} = 1 - \frac{258}{990} = 1 - 0.26 = 0.74 $$

The Phi Coefficient (φ) for nominal scale variables

$$ \phi = \sqrt{\frac{\chi^2}{n}} $$

where n is the total number of observations. This is used when there are only two
sub-categories for rows as well as columns, i.e. a 2x2 table.

The Contingency Coefficient (C) for nominal scale variables

$$ C = \sqrt{\frac{\chi^2}{n + \chi^2}} $$

This is used when there are more than two sub-categories for either the rows or the columns,
i.e. 2x3, 3x3, 2x4, 3x4, etc.

The formula for calculating the χ² value is as follows.

$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$

where O_ij is the observed count in each cell and E_ij is the expected count in each cell.
The expected count is obtained by the formula:

E_ij = [(ith row total) (jth column total)] / Grand total

Example 1: Association between Gender and Passing Driving Test (2x2)

Using the Phi Coefficient

                 Gender
Result      Male         Female       Total
Pass        150 (125)    100 (125)    250
Fail         50 (75)     100 (75)     150
Total       200          200          400

The figures in brackets are the expected counts in each cell.


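$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(150 - 125)^2}{125} + \frac{(100 - 125)^2}{125} + \frac{(50 - 75)^2}{75} + \frac{(100 - 75)^2}{75} $$

$$ = \frac{625}{125} + \frac{625}{125} + \frac{625}{75} + \frac{625}{75} = 5 + 5 + 8.33 + 8.33 = 26.66 $$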

2 26.66
Φ= = = 0.365
n 200

The result shows that there is a weak positive association between gender and passing a
driving test.
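
The expected counts, the χ² value and φ for the 2x2 table can be reproduced with the sketch below. The function name is an illustrative assumption, and n is taken to be the grand total of the table (400 candidates).

```python
from math import sqrt

# Observed counts: rows are Pass / Fail, columns are Male / Female.
driving = [[150, 100],
           [50, 100]]


def chi_square(observed):
    """Return (chi-square, n) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count E_ij
            chi2 += (o - e) ** 2 / e
    return chi2, n


chi2, n = chi_square(driving)
phi = sqrt(chi2 / n)
print(round(chi2, 2), n, round(phi, 3))
```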

Example 2: Association between Halls of Residence and Region of Birth (3x3)

Using the Contingency Coefficient

                        Hall of Residence
Region of Birth    Hall 1      Hall 2      Hall 3      Total
Region 1           40 (30)     30 (30)     30 (40)     100
Region 2           50 (45)     40 (45)     60 (60)     150
Region 3           30 (45)     50 (45)     70 (60)     150
Total              120         120         160         400

The figures in brackets are the expected counts in each cell.
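$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$

$$ = \frac{(40-30)^2}{30} + \frac{(30-30)^2}{30} + \frac{(30-40)^2}{40} + \frac{(50-45)^2}{45} + \frac{(40-45)^2}{45} + \frac{(60-60)^2}{60} + \frac{(30-45)^2}{45} + \frac{(50-45)^2}{45} + \frac{(70-60)^2}{60} $$

$$ = \frac{100}{30} + \frac{0}{30} + \frac{100}{40} + \frac{25}{45} + \frac{25}{45} + \frac{0}{60} + \frac{225}{45} + \frac{25}{45} + \frac{100}{60} $$

= 3.33 + 0.00 + 2.50 + 0.56 + 0.56 + 0.00 + 5.00 + 0.56 + 1.67 ≈ 14.17 (summing the unrounded terms)

$$ C = \sqrt{\frac{\chi^2}{n + \chi^2}} = \sqrt{\frac{14.17}{400 + 14.17}} = \sqrt{\frac{14.17}{414.17}} = \sqrt{0.0342} = 0.185 $$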

The result shows that there is a very weak association between region of birth and hall of
residence.
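
The 3x3 calculation involves nine terms, so it is easy to slip when working by hand. The sketch below rebuilds the expected counts from the row and column totals and then computes χ² and C for the hall of residence table. The function names are illustrative assumptions.

```python
from math import sqrt

# Observed counts: rows are Region 1-3, columns are Hall 1-3.
halls = [[40, 30, 30],
         [50, 40, 60],
         [30, 50, 70]]


def expected_counts(observed):
    """E_ij = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_coefficient(observed):
    """Return (C, chi-square) where C = sqrt(chi2 / (n + chi2))."""
    expected = expected_counts(observed)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    n = sum(sum(row) for row in observed)
    return sqrt(chi2 / (n + chi2)), chi2


c, chi2 = contingency_coefficient(halls)
print(expected_counts(halls))
print(round(chi2, 2), round(c, 3))
```

Unlike r, the coefficient C cannot be negative, so it describes the strength of an association but not its direction.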

Uses of correlation in education

1. It is useful for selection and placement. For example, if mathematics scores relate well
with scores in chemistry, then mathematics scores can be used for selection into a
chemistry class without conducting a chemistry selection examination.
2. It is used to determine the reliability of standardized and classroom tests. The
Spearman-Brown split-half method uses correlation coefficients.
3. It aids in the provision of evidence for the validity of assessment instruments.
Construct and criterion-related validity evidence is obtained through the computation
of the correlation between two variables.
4. It puts the teacher in a position to predict the future performance of a student. An
established relationship between two subjects is often used as the basis for predicting
performance, but not with 100% certainty. For example, if those with aggregate 6
from WASHSCE have been found in the University of Cape Coast to be obtaining First
Class degrees, then it can be predicted that anyone with WASHSCE aggregate 6
would do well in the University.
5. It is useful for research purposes. A study of the relationship between study habits and
the academic performance of students in the University of Cape Coast would use
correlations.

UNIT SEVEN

SIMPLE REGRESSION ANALYSIS


Purpose
Simple regression is concerned with the prediction of the value of a dependent random
variable, Y, on the basis of a known measurement of an independent controlled variable,
X. For example, UCC may predict final degree classification (1st class, 2nd Upper, 2nd
Lower), Y, from SSSCE grades, X. A teacher may predict the performance of a student in a
final examination, Y, from performance in a class quiz, X.
Prediction and correlation are closely related. The degree of correlation between any two
variables determines the usefulness of prediction. A correlation of 0.9 between two
variables would produce more accurate predictions than a correlation of 0.6.

Conditions/Assumptions
1. The possible values of the independent variable, X, are fixed in advance.
2. The true relationship between the variables, X and Y, is linear and is expressed by the
regression equation
$Y = a + bX + e_i$,
where a and b are population parameters to be estimated and $e_i$ is the random error. The
equation is the line of regression of Y on X; a is the Y intercept and b is the regression
coefficient or the slope of the regression line.
3. The probability distribution of the Y values, given a fixed X, is normal.

Estimating the parameters, a, b


The most common method of obtaining the parameters is the Least Squares Method. The
least squares method is named such because the sum of squares of the vertical deviations of
the points from this line is less than the sum of squares of the vertical deviations from any
other line.

STEPS
1. The first step is to present the variables on a scatter diagram to be sure that the relationship
between the variables is linear.
2. Normal equations are solved to obtain the equations for the parameter estimates in raw
score form.
Y = a + bX
∑Y = na + b∑X
∑XY = a∑X + b∑X²

Slope (regression coefficient):

$$ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \quad \text{OR} \quad b = r\,\frac{S_y}{S_x} $$

The regression coefficient shows the change in Y, the dependent variable, when X, the
independent variable, increases by 1 unit.

Intercept:

$$ a = \frac{\sum Y - b\sum X}{n} \quad \text{OR} \quad a = \bar{Y} - b\bar{X} $$

The intercept is the point on the Y axis where X, the independent variable, has a value of 0.

Example
The following scores were obtained in Quiz 1 and Final Examination.

Quiz 1   Final Exam
X        Y            XY      X²
18       75           1350    324
12       55           660     144
10       45           450     100
20       85           1700    400
15       65           975     225
15       65           975     225
14       60           840     196
10       60           600     100
12       50           600     144
11       50           550     121
18       70           1260    324
16       75           1200    256
9        45           405     81
13       60           780     169
17       70           1190    289

∑X = 210   ∑Y = 930   ∑XY = 13535   ∑X² = 3098

r = 0.93   Sy = 11.77   Sx = 3.36   Ȳ = 62   X̄ = 14

$$ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{13535 - 15(14)(62)}{3098 - 15(14)^2} = \frac{13535 - 13020}{3098 - 2940} = \frac{515}{158} = 3.26 $$

OR

$$ b = 0.93\left(\frac{11.77}{3.36}\right) = 3.26 $$

$$ a = \frac{\sum Y - b\sum X}{n} = \frac{930 - 3.26(210)}{15} = \frac{245.4}{15} = 16.4 \quad \text{OR} \quad a = \bar{Y} - b\bar{X} = 62 - 3.26(14) = 16.4 $$

Estimated equation: Ŷ = 16.4 + 3.26X

Final Exam score = 16.4 + 3.26 (Quiz 1 Score)

Use in prediction
After obtaining the estimates of a and b, the least squares regression line can be drawn using
two values for X (including X = 0 to obtain the intercept). Corresponding Y values are
obtained for the X values and these values are used to draw the estimated regression line.
Values can then be read from the regression line to obtain the predicted values.

Method 1
Given Ŷ = 16.4 + 3.26X,
select two values, say 0 and 10, for X and compute the corresponding Y values.

For example:
X = 0, Y = 16.4 + 3.26(0) = 16.4
X = 10, Y = 16.4 + 3.26(10) = 49.0
Plot the points (0, 16.4) and (10, 49.0) on a graph sheet and draw a
straight line through them. Then estimate any value of Y for a given X value from the regression line.

[Figure: the estimated regression line Ŷ = 16.4 + 3.26X drawn through (0, 16.4) and (10, 49.0)]

Method 2
The estimated regression equation can be used by substituting the given X values to obtain
the predicted values for Y.

Given Ŷ = 16.4 + 3.26X.

i. What would be the exam score for a student who obtains 12.5 in Quiz 1?
Ŷ = 16.4 + 3.26(12.5) = 57.15 ≈ 57

ii. A student obtained 72 in her exam. However, she did not take part in Quiz 1.
What would be an estimate of her Quiz 1 score?

72 = 16.4 + 3.26X
3.26X = 72 − 16.4

$$ X = \frac{72 - 16.4}{3.26} = 17.06 \approx 17 $$
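
The slope, intercept and both predictions can be recomputed directly from the fifteen (Quiz 1, Final Exam) pairs in the table. This is a minimal sketch; the function names `least_squares`, `predict_exam` and `estimate_quiz` are illustrative assumptions. Because the code carries full precision rather than rounding b and a first, its results may differ slightly in the last digit from a hand calculation.

```python
quiz = [18, 12, 10, 20, 15, 15, 14, 10, 12, 11, 18, 16, 9, 13, 17]    # X
exam = [75, 55, 45, 85, 65, 65, 60, 60, 50, 50, 70, 75, 45, 60, 70]   # Y


def least_squares(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_xx = sum(p * p for p in x)
    b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares(quiz, exam)


def predict_exam(quiz_score):
    """Method 2 (i): substitute X into the estimated equation."""
    return a + b * quiz_score


def estimate_quiz(exam_score):
    """Method 2 (ii): solve the estimated equation for X given Y."""
    return (exam_score - a) / b


print(f"b = {b:.2f}, a = {a:.2f}")
print(f"Predicted exam score for Quiz 1 = 12.5: {predict_exam(12.5):.1f}")
print(f"Estimated Quiz 1 score for exam = 72: {estimate_quiz(72):.1f}")
```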

UNIT EIGHT

THE NORMAL DISTRIBUTION


Nature
It is often regarded as the most important of all the statistical distributions and is
sometimes referred to as the 'mother' of all distributions.

The horizontal axis is measured in terms of standard deviation units. The values decrease to
the left and increase to the right from the centre.

Suppose the standard deviation is 4 with a mean of 21. The distribution takes the form below.

Standard deviation units:   -3σ   -2σ   -1σ    μ    +1σ   +2σ   +3σ
Score:                        9    13    17    21    25    29    33
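
The scores along the horizontal axis follow from the relationship X = μ + zσ, and a score is converted back to standard deviation units with z = (X − μ)/σ. The sketch below uses the mean of 21 and standard deviation of 4 from the text; the function names are illustrative assumptions.

```python
def raw_score(z, mean, sd):
    """Score that lies z standard deviations from the mean."""
    return mean + z * sd


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


axis = [raw_score(z, 21, 4) for z in range(-3, 4)]
print(axis)                  # scores at -3, -2, -1, 0, +1, +2, +3 SD units
print(z_score(29, 21, 4))    # how many SD units 29 is above the mean
```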

Symbol
A variable which is distributed normally has the symbol X ~ N(μ, σ²), where μ is the
mean and σ² the variance. This is read as 'the variable, X, is distributed as normal with
a given mean and a given variance'.

Features
1. It is a bell-shaped curve.
2. It is unimodal.
3. It is symmetrical.
4. It is asymptotic.
5. The total area under the curve is 1.0.
6. The mean, mode and median are all equal.
7. When the values of a normal distribution have been converted to standard z-scores, a
standard normal curve is obtained. The standard normal curve has a mean of 0 and a
standard deviation (and variance) of 1.

Symbol: Z ~ N(0, 1), read as 'the variable, Z, is distributed as normal with a
mean of 0 and a variance of 1'.

[Figure: standard normal curve with the horizontal axis marked -3, -2, -1, 0, 1, 2, 3]
Mean = mode = median = 0
The mean corresponds to a Z value of 0.

8. Areas under the normal curve. Note that these areas are obtained from the table on
normal distributions. Refer to Appendix A to follow the areas.

     One tail (μ to μ + zσ)                Two tail (μ − zσ to μ + zσ)
1.   μ + 1σ     = 0.3413 (34.13%)          μ ± 1σ     = 0.6826 (68.26%)
2.   μ + 2σ     = 0.4772 (47.72%)          μ ± 2σ     = 0.9544 (95.44%)
3.   μ + 3σ     = 0.4987 (49.87%)          μ ± 3σ     = 0.9974 (99.74%)
4.   μ + 1.645σ = 0.4500 (45.00%)          μ ± 1.645σ = 0.90 (90%)   (1.65 is also used)
5.   μ + 1.96σ  = 0.4750 (47.50%)          μ ± 1.96σ  = 0.95 (95%)
6.   μ + 2.575σ = 0.4950 (49.50%)          μ ± 2.575σ = 0.99 (99%)   (2.58 is also used)
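
These areas can also be computed without a table. The sketch below uses the error function from Python's standard `math` module; the relation "area from 0 to z = 0.5 · erf(z/√2)" is a standard identity for the standard normal curve, and the function name is an illustrative assumption.

```python
from math import erf, sqrt


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z."""
    return 0.5 * erf(z / sqrt(2))


for z in (1, 2, 3, 1.645, 1.96, 2.575):
    one_tail = area_mean_to_z(z)
    two_tail = 2 * one_tail          # the curve is symmetrical about the mean
    print(f"z = {z:<6} one tail = {one_tail:.4f}   two tail = {two_tail:.4f}")
```

Values computed this way can differ from the table in the fourth decimal place, because doubling a rounded table entry (for example 2 × 0.3413) is not quite the same as rounding the doubled exact area.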

Basic applications
Finding Probabilities
1. The distribution for a Statistics examination is normal with a mean of 60 and variance
of 64 (i.e. X ~ N(60, 64)). A student is selected at random from the class. What is the
probability that the student selected obtains a score above 68? Above 76? Below 52?

$$ P(X > 68) = P\left(Z > \frac{68 - 60}{8}\right) = P\left(Z > \frac{8}{8}\right) = P(Z > 1) $$

[Figure: standard normal curve with the area to the right of z = 1 shaded]
= 0.5000 − 0.3413
= 0.1587
2. The distribution for a Statistics examination is normal with a mean of 60 and variance
of 64 (i.e. X ~ N(60, 64)). A student is selected at random from the class. What is the
probability that the student selected obtains a score between 52 and 76? Between 68
and 76?

$$ P(52 < X < 76) = P\left(\frac{52 - 60}{8} < Z < \frac{76 - 60}{8}\right) = P\left(\frac{-8}{8} < Z < \frac{16}{8}\right) = P(-1 < Z < 2) $$

[Figure: standard normal curve with the area between z = −1 and z = 2 shaded]
= 0.3413 + 0.4772
= 0.8185
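
All the probabilities asked for in the two questions above, including the parts not worked out in full, can be obtained from the cumulative normal distribution. The sketch below builds it from `math.erf`; the function name `normal_cdf` and the variable names are illustrative assumptions.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """P(X < x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


mean, sd = 60, sqrt(64)   # X ~ N(60, 64): the variance is 64, so the SD is 8

p_above_68 = 1 - normal_cdf(68, mean, sd)
p_above_76 = 1 - normal_cdf(76, mean, sd)
p_below_52 = normal_cdf(52, mean, sd)
p_52_to_76 = normal_cdf(76, mean, sd) - normal_cdf(52, mean, sd)
p_68_to_76 = normal_cdf(76, mean, sd) - normal_cdf(68, mean, sd)

for label, p in [("P(X > 68)", p_above_68), ("P(X > 76)", p_above_76),
                 ("P(X < 52)", p_below_52), ("P(52 < X < 76)", p_52_to_76),
                 ("P(68 < X < 76)", p_68_to_76)]:
    print(f"{label:<16} = {p:.4f}")
```

Results computed this way may differ from table-based sums in the fourth decimal place, since each table entry has already been rounded.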

Finding performance levels

1. Given that a distribution of scores is normal, with a mean of 16 and a standard deviation of 2,
about what percent of students obtained scores less than 12? More than 14?

$$ P(X < 12) = P\left(Z < \frac{12 - 16}{2}\right) = P\left(Z < \frac{-4}{2}\right) = P(Z < -2) $$

[Figure: standard normal curve with the area to the left of z = −2 shaded]
= 0.5000 − 0.4772
= 0.0228 (About 2%. Actual 2.28%)

2. Given that a distribution is normal, with a mean of 50 and a standard deviation of 10,
approximately how many students from a class of 2000 obtained scores above
70? Between 40 and 60?

$$ P(X > 70) = P\left(Z > \frac{70 - 50}{10}\right) = P\left(Z > \frac{20}{10}\right) = P(Z > 2) $$

= 0.5000 − 0.4772
= 0.0228
Number of students: 0.0228 x 2000 = 45.6 ≈ 46

3. In a promotion examination, a pass mark was fixed at 40. Given that the distribution is
normal, with a mean of 50 and a standard deviation of 5.1, approximately how many
students failed from a class of 400?

$$ P(X < 40) = P\left(Z < \frac{40 - 50}{5.1}\right) = P\left(Z < \frac{-10}{5.1}\right) = P(Z < -1.96) $$

[Figure: standard normal curve with the area to the left of z = −1.96 shaded]
= 0.5000 − 0.4750
= 0.025
Number of students: 0.025 x 400 = 10
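
Turning a probability into an expected number of students is a single multiplication by the class size. The sketch below does this for the class of 2000 and the promotion examination, including the "between 40 and 60" part that is not worked out above. The helper names are illustrative assumptions, and the counts are left unrounded so that the rounding step stays visible.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """P(X < x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


def expected_count(probability, class_size):
    """Expected number of students: probability times class size."""
    return probability * class_size


# Class of 2000 with mean 50 and standard deviation 10.
above_70 = expected_count(1 - normal_cdf(70, 50, 10), 2000)
between_40_and_60 = expected_count(normal_cdf(60, 50, 10) - normal_cdf(40, 50, 10), 2000)

# Promotion examination: mean 50, standard deviation 5.1, pass mark 40, class of 400.
failed = expected_count(normal_cdf(40, 50, 5.1), 400)

print(f"Above 70:          {above_70:.1f}")
print(f"Between 40 and 60: {between_40_and_60:.1f}")
print(f"Failed (below 40): {failed:.1f}")
```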

APPENDIX A
NORMAL DISTRIBUTION TABLE
This z-table (normal distribution table) shows the area under the standard normal curve between z = 0 and a
given positive value of z. For the area to the left of z, use the second table below instead.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998

3.6 0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.7 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.8 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999

Cumulative (Left-Tail) Z-Table
This table shows the area to the left of Z, that is, the cumulative area up to Z. To find the
area between z = 0 and a positive value of z, use the first table (above) instead.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
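
Any row of either table can be regenerated and checked numerically. The sketch below is one way to do this with the standard library; `table_row` is a hypothetical helper, not part of the course material. With `cumulative=True` it gives the area to the left of z, otherwise the area between z = 0 and z.

```python
from math import erf, sqrt


def cumulative_area(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def table_row(z_tenths, cumulative=True):
    """Ten table entries for z = z_tenths + 0.00, 0.01, ..., 0.09, to four decimals."""
    offset = 0.0 if cumulative else 0.5
    return [round(cumulative_area(z_tenths + h / 100) - offset, 4) for h in range(10)]


print(table_row(0.0))                     # first row of the cumulative table
print(table_row(1.0, cumulative=False))   # row for z = 1.0 of the mean-to-z table
```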
