ESSENTIAL STATISTICS FOR APPLIED LINGUISTICS
Using R or JASP
HANNEKE LOERTS, WANDER LOWIE AND BREGTJE SETON
© Hanneke Loerts, Wander Lowie and Bregtje Seton, under exclusive licence
to Macmillan Education Limited 2013, 2020
Any person who does any unauthorized act in relation to this publication
may be liable to criminal prosecution and civil claims for damages.
The authors have asserted their rights to be identified as the authors of this
work in accordance with the Copyright, Designs and Patents Act 1988
Red Globe Press® is a registered trademark in the United States, the United
Kingdom, Europe and other countries.
This book is printed on paper suitable for recycling and made from fully
managed and sustained forest sources. Logging, pulping and manufacturing
processes are expected to conform to the environmental regulations of the
country of origin.
A catalogue record for this book is available from the British Library.
A catalog record for this book is available from the Library of Congress.
CONTENTS
PART 1
1. Types of Research
1.1 Introduction
1.2 Hypothesis generating vs. hypothesis testing
1.3 Case studies vs. group studies
1.4 Description vs. explanation
1.5 Non-experimental vs. experimental
1.6 Process research vs. product research
1.7 Longitudinal vs. cross-sectional
1.8 Qualitative vs. quantitative
1.9 In-situ/naturalistic research vs. laboratory research
1.10 The approaches taken in this book
3. Descriptive Statistics
3.1 Introduction
3.2 Statistics: Means versus relationships
3.3 Describing datasets: Means and dispersion
3.4 A different view on variability
3.5 Frequency distributions
4. Statistical Logic
4.1 Introduction
4.2 Independent versus dependent variables
4.3 Statistical decisions
4.4 The sample and the population
4.5 Degrees of freedom
4.6 Checking assumptions
PART 2-R
Practicals in R/RStudio
Getting ready to start using R and RStudio
Why use R?
Download and start using R and RStudio
PART 2-JASP
Practicals in JASP
Getting ready to start using JASP
Why use JASP?
Download and start using JASP
References
Index
Packages used in 2-R
Functions used in 2-R
PREFACE: HOW TO USE THIS BOOK
the use of codes or syntax. This might initially cause frustration, especially for
students who are unfamiliar with scripts or programming languages. The prac-
ticals in Part 2 of this book are, however, also designed for those who have no
programming or statistical experience whatsoever. The structure of the practical
assignments is built up in such a way that you are first taken by the hand in a
step-by-step procedure and after that will be asked to apply the newly developed
skills on your own. Again, following this procedure will give you the biggest
chance of success. R might have a steep learning curve, but is invaluable for
most students and PhD students in Linguistics and especially those who would
like to conduct quantitative research using larger datasets. JASP is a program
that bears more resemblance to SPSS in the sense that it has a user interface
and buttons you can press, without having to use any code. An advantage of
JASP compared to SPSS, apart from the fact that it is free, is that it is much more
user-friendly, that it also allows for so-called Bayesian analyses, and that the
program is based on R, which leads to R-like graphs and tables. The disadvantage
of JASP is that it offers less functionality than R. However, for basic anal-
yses, JASP is a very useful program. In addition to the theoretical part (Part 1)
and the practical assignments (Part 2), the companion website (macmillanihe.
com/loerts-esfal) contains How To units for both JASP and R/RStudio includ-
ing step-by-step explanations of how to perform the various tests discussed.
In our own research methodology course we always try to show that
everyone can learn to understand and apply basic statistics. In the past thir-
teen years, every single student who made a serious effort to acquire statis-
tical knowledge and skills has managed to do so using this approach, even
students who had initially considered themselves 'hopeless cases' in this
respect! So there is hope, but a serious effort is required. For students who
really like the topic or who need to learn more about more advanced tech-
niques, we will offer opportunities to broaden their horizons with references
to advanced statistics, so that this group will be sufficiently challenged.
The preparation of this new edition of Essential Statistics for Applied Lin-
guistics would not have been possible without the valuable feedback from
our students (from the MA Linguistics and the Research Master in Lan-
guage and Cognition in Groningen) on several draft versions of the book.
We are particularly grateful to the patient and constructive student assis-
tants who helped us in setting up and testing the practicals and the How To
units, and for their editorial assistance: Marith Assen and Mara van der Ploeg.
Thank you very much!
It may be obvious, but the data that are used as examples throughout the
book, in the practical assignments (Part 2) and in the How To units online,
are mostly made-up data to illustrate a point. The authors will not accept
any claims based on these data.
Hanneke Loerts
Wander Lowie
Bregtje Seton
PART 1
1 TYPES OF RESEARCH
1.1 Introduction
The field of Applied Linguistics (AL) is a large one and this means that
applied linguists are interested in many issues. Here is a random list show-
ing a variety of topics:
• The effectiveness of early bilingual education: how effective is an early
start?
• The relation between characteristics of linguistic input and language
development in early bilingual children.
• Assessment of problems of elderly migrants learning Swedish.
• The lag in development in language proficiency of migrant children or
children in deprived areas where a local dialect is spoken.
• The storage of multiple languages in our head: language selection and
language separation (How do we keep our languages apart?).
• Can you ‘block’ a language while listening the way you do while speaking?
• The impact of learning a third language on skills in the first language
(L1) and the second language (L2).
• The role of interaction in the language classroom: who is talking, what
is the input?
• What is the impact of ICT on language learning?
• How can a threatened language be protected?
• Are Dutch people better at learning English than, for example, German
learners?
• Why are prepositions in an L2 so difficult to learn?
• How can a forgotten language be reactivated?
This list could be extended for pages. A quick look at the Current Contents
list of journals in Arts and Humanities, which shows the tables of contents
of over 1000 journals, will make clear that creativity has no limits and
that even for journals focusing on second language development, the range
of issues is breathtaking.
ACTIVITY 1.1
There are many topics of research, but the range of types of research
in AL is much more limited. In this chapter we want to give a systematic
overview of different types of research: what are relevant distinctions and
how are different types related? There will be no single optimal method for
all research topics, since each topic can be worked out in different ways. For
your understanding of the research literature, it may be useful to become
acquainted with the major categories, so you know what to expect and
how to evaluate the use of a particular design. Also, some familiarity with
research terminology and categorization will be helpful in finding relevant
research on your own topic.
For clarity’s sake, we will make use of contrasting pairs of types of
research, but it should be stressed from the outset that these contrasts are
actually far ends on a continuum rather than distinct categories, and that
the contrasts are all dimensions that may partly overlap.
such as the use of gestures with motion verbs. This means that for relatively
unexplored topics, we may first have to run some exploratory studies to
generate hypotheses that can be tested in subsequent research.
The next step in the research cycle is to test the hypotheses we have gen-
erated. In research reports, we often see phrases like ‘In this study, we test
the hypothesis that …’. However, the formulation of appropriate hypotheses
is not always obvious. For instance, if someone claims to ‘test the hypothesis
that after puberty a native level of proficiency can no longer be attained’,
then we may wonder what that actually means: is that true for every learner,
no matter what? If only one single individual can be found who can achieve
this, is the hypothesis then falsified? A hypothesis needs to be narrowed
down as far as possible to show what will be tested and what outcomes
count as support or counter evidence.
ACTIVITY 1.2
Formulating a research hypothesis
It is not easy to formulate a research hypothesis that is not too broad
and not too narrow. The more specific the hypothesis is, the better
the chance to test it successfully. The development of a research
hypothesis typically goes in stages. Consider a hypothesis like the
following:
Still, this is rather broad and some concepts are not clear, such as the
definition of ‘elderly’ and ‘middle-aged’. For the hypothesis this will
do, but in the description of the population the age range will have to
be made clear. Likewise, do you also want to include elderly people
suffering from dementia or other diseases? And do you want to test
every part of the language system? Maybe it is better to limit the
study to syntax, morphology, lexicon, or fluency. And do you want
to look at all second languages? How about the level of education,
which is likely to play a role? Narrowing the hypothesis down further
could result in something like:
‘Healthy elderly people forget words in their first second language more
quickly than education-matched middle-aged people.’
at the learning results of the group that used method 1 and the one that used
method 2 together, we will find some learners who improved a lot, some less,
and some not at all. If we look at the groups for methods 1 and 2 separately,
we may find that for one method most learners have improved a lot, while for
the other method learners improved only slightly or not at all. Even though
there is a great deal of variation in the two groups taken together because
there will be good and bad learners, there may be less variation within the
groups than between the groups. In statistical terms, this is referred to as
‘variance explained’, variance being a specific type of variation. In this par-
ticular example, the variation between the groups can largely be explained
by the different methods they were using. The goal in experiments is to
explain as much variation as possible, as that will tell us to what extent we
can explain a given effect. Again, this is not an explanation in the theoretical
sense, but it is a description of the effect of one variable on another.
ACTIVITY 1.3
The supermarket
You are standing in line at the checkout with a shopping trolley full
of groceries. You are late for an important meeting. There is one
man in front of you. What do you say to the man in front of you?
The use of such tasks allows for systematic variation of the variables, but it
is of course not a natural setting or real conversation. Ideally, data from such
controlled experiments should be validated through a comparison with real
conversational data.
that seem to have an effect at the individual level to testing that effect on a
larger sample to get an estimate of its strength and then back to the individ-
ual level again to study the impact in more detail. An example could be the
role of motivation: its effect may be suggested when we look at the learning
process of an individual learner who indicates why she was motivated to
invest time and energy in learning a language at some moments in time and
not at all at other moments. To know more about the potential strength of
such a factor, we may then do a study on a larger sample of similar learners
that we compare at one point in time (a product study). With that informa-
tion, we can go back to individuals and see how it affects them. The general
pattern will be less detailed and typically will not give us information about
change of motivation over time in the same way that an individual case
study can.
ACTIVITY 1.4
years, and transcripts of the recordings are analysed with respect to rele-
vant aspects, such as mean length of utterance and lexical richness. But also
other types of development can be studied longitudinally: Hansen et al.
(2010) looked at the attrition of Korean and Japanese in returned missionar-
ies who typically acquired the foreign language up to a very high level, used
it in their work as missionaries, but hardly ever used it after they returned.
This study is unusual, because it is longitudinal with only two moments
of measurement in 10 years. In many longitudinal studies there are more
moments of measurement with smaller time intervals. Longitudinal studies
often take a long time; even three-year data collection periods may be too short to
cover a significant part of the developmental process. And funding agencies
are not very keen on financing projects that take more than four or five
years to generate results. Therefore, the number of longitudinal studies is
small, but those projects (like the European Science Foundation study on
untutored L2 development in different countries (see Klein & Perdue, 1992;
Becker & Carroll, 1997)) have had a major impact on the field.
Because of the time/money problem that characterizes longitudinal stud-
ies, many researchers use cross-sectional designs. In cross-sectional research,
individuals in different phases of development are compared at one moment
in time. For the study of the development of morphology in French as an
L2, a researcher may compare first, third, and fifth graders in secondary
schools in Denmark. Rather than follow the same group of learners for four
years as they progress from first to fifth grade, different groups in the three
grades are compared at one moment in time.
Both longitudinal designs and cross-sectional designs can have their
problems. In longitudinal studies, the number of participants is generally
very small because a large corpus of data is gathered on that one (or very small
group of) individual(s). Large numbers of participants would make both
the data collection procedure and the processing and analysis of the data
extremely demanding and time-consuming. Small numbers, on the other
hand, mean that the findings may be highly idiosyncratic and difficult to
generalize. As we discussed in 1.6, this may not be a problem in studies that
use the uniqueness of the individual's development as the central issue, as is
normally the case in CDST approaches to language development. Another
problem of longitudinal studies is subject mortality, that is the dropping out
of subjects in a study. With each new measurement, there is a risk of subjects
dropping out, and the longer and more demanding the study, the higher the
risk of drop-out. An additional problem is that in such studies drop-out is
typically not random, but selective or biased: in a study on acquisition or
attrition, subjects that do not perform well will be more likely to lose their
motivation and drop out than more successful ones, leaving a biased sample
that is even less generalizable.
Cross-sectional designs can also be problematic, because the assumption
that the three groups that are compared behave like one group tested lon-
gitudinally may not be true. There may be specific characteristics of differ-
ent age groups, such as changes in the school curriculum, natural disasters,
From these definitions it follows that the two approaches differ fundamen-
tally in epistemological terms and with respect to the research methods used.
Qualitative research is holistic, trying to integrate as many relevant aspects as
possible into one study. It is also by definition interpretative and therefore
in the eyes of its opponents ‘soft’. In qualitative research a number of tech-
niques may be used, such as diaries of learners, interviews, observations, and
introspective methods such as think-aloud protocols (see Mackey & Gass,
2005, Chapter 6, and Brown & Rodgers, 2002, Chapters 2–4 for discussions
of various methods). One of the main problems is the lack of objectivity in
those methods: in all these methods, the researchers interpret what is going
on, and some form of credibility can only be achieved through combinations
of data (triangulation) and the use of intersubjective methods, in which the
interpretations of several ‘judges’ are tested for consistency. All of this may
not satisfy the objections raised by hard-core quantitativists. For them, only
objective quantitative data are real ‘hard’ data. Such data are claimed to be
ACTIVITY 1.5
ACTIVITY 1.6
Description vs. explanation
Non-experimental vs. experimental
Longitudinal vs. cross-sectional
Qualitative vs. quantitative
2 SYSTEMATICITY IN STATISTICS: VARIABLES
2.1 Introduction
“A Pearson r revealed that there was a significant negative relation-
ship between age of acquisition and proficiency level r(27) = −.47,
p = .01.”
Many students tend to skip the results sections of research reports, because
they do not understand the meaning of sentences like the one exemplified
above. The unfortunate consequence of this is that they will not be able to
fully assess the merits and weaknesses of those studies. Moreover, they will
not be able to report on their own studies in a conventional and appropriate
way. This is not only unfortunate, but also unnecessary. Once the underlying
principles of statistics are clear, understanding statistics is a piece of cake!
The purpose of this book is to come to an understanding of some ele-
mentary statistics, rather than providing a full-fledged statistics course.
We will demonstrate why it is necessary for many studies to apply statistics
and which kind of statistic is normally associated with which kind of study.
After studying this chapter and the next three chapters, you will already be
able to understand many of the statistical reports in articles and apply some
very basic statistics to your own investigations. We will do this by encour-
aging you to think about problems in second language research through a
set of tasks. Wherever possible, we will ask you to try and find your own
solutions to these problems before we explain how they are conventionally
solved. Obviously, this learning-by-doing approach will only work if you
work seriously on the tasks, rather than skip to the solution right away!
In the previous chapter, we discussed the relationship between theory
and empirical observations for different types of research. In the current
chapter, we will move on to a more practical application of these observa-
tions: doing and interpreting research. Since an understanding of research
based on traditional theories is crucial for a full appreciation of the field of
Applied Linguistics, we will focus on the more traditional methodologies,
and will occasionally refer to approaches that are more appropriate for the
investigation of non-linear development.
One of the most important characteristics of any type of research is that
it is carried out systematically and according to research conventions that
are generally agreed upon. The purpose of the current chapter is to discuss
the most relevant of these conventions and to outline the systematicity of
empirical research.
ACTIVITY 2.1
[Figure for Activity 2.1: three scatterplots (A, B, and C) of Score (y-axis) against AoA (x-axis); plot A shows no clear relationship, plot B an ambiguous pattern, and plot C a clear relationship.]
The ‘correct’ answers to the questions in Activity 2.1 are not very relevant.
The purpose of the exercise is to make you aware that it can be very difficult
to make decisions based on empirical observations. In the above activity, you
probably noticed that it is relatively easy to say something about the relation-
ship between the two variables on the basis of plot C and the absence of such
a relationship on the basis of plot A. The data in plot B, however, are more
difficult to base conclusions on. The question is where to draw the line between
ACTIVITY 2.2
The answers to the questions asked in Activity 2.2 may vary wildly
and may yield interesting discussions about the validity of empirical stud-
ies. However, the most important point arising from this activity is that
there is not one correct answer to these questions and that it is always the
researcher’s responsibility to carefully operationalize constructs into varia-
bles. Another important point is that all of the variables operationalized in
Activity 2.2 take a synchronic perspective, as is evident from the addi-
tional phrase ‘at one point in time’. It will be obvious that research designs
for the investigation of the development of these factors over time may be
considerably more complex and require different methods and techniques
from the ones needed to investigate one time slice only.
There are two more points we need to discuss in relation to variables:
the type of variables and their function in an empirical study. Answering
the questions in Activity 2.3 below will make you aware of the differences
that can arise between different types of variables. The different functions
of variables in statistics will come back in Chapter 4.
ACTIVITY 2.3
Each of the four questions in Activity 2.3 includes a variable that rep-
resents an underlying construct. For our purpose, the underlying con-
struct is not relevant this time, as we will concentrate on the variables
themselves and the operations we are allowed to carry out with different
types of variables. The calculation of the first one, a person’s height as
measured in metres, will not pose any problems. The average height can
be calculated as the sum of the values divided by the number of values:
(1.58 + 1.64 + 1.70)/3. The answer is 1.64 m. The second question is
ACTIVITY 2.4
3 DESCRIPTIVE STATISTICS
3.1 Introduction
Basically, two types of statistics can be distinguished: descriptive and induc-
tive statistics. Inductive statistics help us in making decisions about the results
of a study. They usually involve hypothesis testing, in which the results of a
small sample are generalized to a larger population; something we will return
to in Chapter 4. Descriptive statistics are used to describe the characteristics
of a particular dataset or to describe patterns of development in longitudinal
analyses. This chapter will focus on the most important notions of descrip-
tive statistics that are commonly used in Applied Linguistics: means, ranges,
standard deviations, z-scores, and frequency distributions. These descriptives
are used to describe observations for two different kinds of perspectives on data:
means and relationships. ‘Means’ are used when we want to compare different
groups, like different language backgrounds or different sexes. Statistics within
the ‘means’ perspective are used to compare the mean value of one group (e.g.
males) to another (e.g. females) to see whether they differ. ‘Relationships’, on the
other hand, are used when we are not interested in contrasting groups, but want
to say something about the relationship between variables measured within a
group, such as reading skills and writing skills, or age and proficiency scores.
Figure 3.1 Scatterplot of proficiency Score (y-axis) against AoA (x-axis) for learners with a German or Spanish L1 background
ACTIVITY 3.1
For the first question of Activity 3.1, you are interested in examin-
ing whether L2 proficiency score decreases as age of acquisition increases.
In other words, you are interested in the relationship between two interval
variables and you are mainly focusing on individual data points. Such a
relationship can nicely be visually examined using a scatterplot such as the
one in Figure 3.1. By contrast, when looking at the second question,
you are not necessarily interested in individual data points, but rather in
comparing the data of two different groups. In this case, these groups are
divided on the basis of the variable ‘L1 background’, a nominal variable
with two levels. Although you might be able to say something about the
Figure 3.2 Boxplot
This boxplot visualizes the difference between proficiency scores of the people with a German L1 background (in light grey) and those with a Spanish L1 background (in dark grey).
difference in scores between the German (light grey) and Spanish (dark grey)
data points on the basis of the scatterplot, a look at how the group per-
forms in general is likely to be more useful. Consider, for example, the
boxplot in Figure 3.2.
The boxplot is based on exactly the same dataset as the scatterplot
in Figure 3.1, but the focus is completely different. A scatterplot shows a
relationship between all available data points, while a boxplot summarizes
the data points for a group of individuals taken together. As you may have
realized after having completed Activity 3.1, this focus depends on the meas-
urement scales of the variables. From Chapter 5 onwards, we will further
discuss the differences between the statistical tests that are needed to answer
questions that either focus on a relationship between interval variables or on
group comparisons. First, we will focus on describing and visualizing data,
as such summaries are a crucial first step in understanding and getting to
know your data, regardless of your research questions and the focus of your
research.
of that dataset. The mean can therefore also be seen as a simple statis-
tical model (Field et al., 2012, p. 36) in that it constitutes a summary
or simplified approximation of the real dataset. To calculate the mean
value, add the values in the dataset and divide the sum by the number
of items in that dataset. This operation is summarized by the following
formula:
X=
∑X (3.1)
N
–
where: X = mean
Σ = sum (add up)
X = items in the dataset
N = number of items
ACTIVITY 3.2
The last dataset in Activity 3.2 clearly shows that it is not possible to per-
form mean calculations for each and every possible dataset. As explained in
Chapter 2, means can only be calculated for interval data and not for ordinal
or nominal data. Therefore, we will exclude this dataset from our further
discussion. The answer to the first question in Activity 3.2 shows that all
the datasets have the same mean value. This goes to show that although a
mean value may reveal one important characteristic of a dataset, it certainly
does not give the full picture. Two additional descriptive statistics relating
to the central tendency are the mode and the median. The mode requires
no calculations, as it is simply the value that occurs most frequently. Some
distributions have more than one mode (which are then called bimodal,
trimodal, etc.) and some have none. The median is the point in the dataset
that is in the middle: half of the data are below this value and half of the
data are above.
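If you would like to check these descriptives yourself in R (the practicals in Part 2 introduce R properly), here is a minimal sketch using a made-up set of scores; note that base R has no built-in function for the statistical mode, so it is derived from a frequency table here:

scores <- c(3, 5, 6, 6, 7, 9)   # made-up data
mean(scores)                     # the mean: 6
median(scores)                   # the middle value: 6
# the mode: the value that occurs most frequently (returned as text)
names(sort(table(scores), decreasing = TRUE))[1]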
Looking at the mean, mode, and median provides some useful informa-
tion about the dataset, but it does not tell us exactly how similar (as in the
second set) or different (as in the fourth set) all the items in the set are to
each other. In other words, the mean as a statistical model may represent all
data values perfectly (as in the second set) or it may not be such a good rep-
resentation (as in the fourth set). To find out more and describe the dataset in
a better way, we need details about the dispersion of the data. One of these is
the range of a dataset, which is the difference between the highest and lowest
value in the set. To calculate the range, take the highest value and subtract
the lowest value.
ACTIVITY 3.3
What are the mode, the median and the range in Activity 3.2?
Provide the mean, the mode, the median, and the range of the
following dataset:
1, 2, 2, 2, 3, 9, 9, 12
The first dataset from Activity 3.2 has no mode, as all values occur
only once. In the second set the mode is 6, in the third it is 4, in the
fourth it is 1, and in the final one it is D. The medians in the datasets
in Activity 3.2 are 6, 6, 6, and 4 respectively. The ranges of the data-
sets in Activity 3.2 are 6 (9 – 3), 0 (6 – 6), 6 (10 – 4), and 13 (14 – 1)
respectively.
One of the problems of the range is that it is rather strongly affected
by extreme values. For instance, for the dataset 1, 2, 3, 4, 4, 5, 5, 6, 6,
8, 29 (henceforth variable a) the range will be 28, but that is not very
telling for this dataset. What is therefore often done is to cut off the
extreme values (25% of the values on either end) and then take the range
(see also Figure 3.3). This is called the interquartile range. To calculate
the interquartile range, you will have to divide the data into four equal
parts. First, determine the median. The median will divide the data into
two equal parts. Then take the median again, but now for each of these
parts. The median of the lower half is called the lower quartile; the
median of the upper part is called the upper quartile. The range between
the lower quartile and the upper quartile is called the interquartile
range. In the example above, the median is 5, the lower quartile is 3,
the upper quartile is 6, and so the interquartile range is (6 − 3) = 3. The
quartiles divide the dataset into four equal parts:
1 2 3 4 4 5 5 6 6 8 29
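These calculations are easy to verify in R. A minimal sketch using the dataset above; be aware that R's quantile() function interpolates, so its quartiles can differ slightly from the hand calculation shown here:

a <- c(1, 2, 3, 4, 4, 5, 5, 6, 6, 8, 29)
diff(range(a))   # the range: 28
median(a)        # 5
quantile(a)      # the quartiles
IQR(a)           # the interquartile range: 2.5 by R's default method, 3 by hand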
ACTIVITY 3.4
Figure 3.3 Boxplots of two variables (a and b) with annotations indicating the minimum, lower quartile, median, interquartile range, upper quartile, maximum, and an outlier
ACTIVITY 3.5
X    X − X̄    Distance
3    3 − 6    −3
4    4 − 6    −2
5    5 − 6    −1
6    6 − 6    0
7    7 − 6    1
8    8 − 6    2
9    9 − 6    3
Σ             0
The next step would be to divide the sum by the number of items. How-
ever, in this case, we have a problem because the sum is zero, which is caused
by the positive and negative values that are set off against each other. One
way to solve this problem is to apply a common mathematical trick: we
square all the values and get rid of the negative values.1 We are allowed to
1 An alternative solution is to take the absolute values of the numbers. This can also be
done, but the result is a different measure. We will only discuss the most commonly
used way of calculating average dispersion.
do so, as long as we undo this operation by taking the square root at the very
end of the calculations. The following table illustrates the squaring of each
distance from the mean (Table 3.2):
Table 3.2 Squared distances from the mean

X    X − X̄    Distance    Squared distance
3    3 − 6    −3          9
4    4 − 6    −2          4
5    5 − 6    −1          1
6    6 − 6    0           0
7    7 − 6    1           1
8    8 − 6    2           4
9    9 − 6    3           9
Σ             0           28

Notes: For each and every individual value in the dataset, we subtract the mean value (X − X̄) and square the result. Then we take the sum (Σ) of the squares, divide it by N − 1, and take the square root of that value.
The number of items in the list (N) is 7 and, for reasons that will be
explained in the next chapter, statistical formulas often use the number of
items minus 1 (N – 1). So, to get the variance (s2) of this dataset, we take the
sum of all squared distances and divide it by (N – 1), so the result so far is 28/6
= 4.67. The final step is to take the square root of 4.67, which is 2.16. So, the
average dispersion of the first dataset in Activity 3.5 is 2.16. This value is com-
monly referred to as the standard deviation (SD). The complete formula for the
standard deviation reads as follows and should now be completely transparent:
SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \qquad (3.2)
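The whole calculation can be checked in R. A minimal sketch using the dataset from Table 3.2:

x <- c(3, 4, 5, 6, 7, 8, 9)
sum((x - mean(x))^2)                     # sum of squared distances: 28
sum((x - mean(x))^2) / (length(x) - 1)   # the variance: 4.67
sd(x)                                    # the SD: 2.16 (sd() also divides by N - 1)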
ACTIVITY 3.6
dataset will emerge. Imagine that someone told you that a group of
German learners on average scored 74.7 (SD = 16.9) on a proficiency
test and another group of Spanish learners overall scored 59.7 (SD =
21.1). This information should now be enough for you to visualize
an overall higher score for the German learners, but an overall larger
spread in scores for the Spanish. In other words, this is a very brief,
but informative written summary of the visual information we saw in
the boxplot in Figure 3.2.
The SD thus provides information about how similar or different the
individual values in a dataset are. Moreover, it can sometimes be used to
provide information about how an individual item in the dataset is related
to the whole dataset: we can calculate how many standard deviations an
individual score is away from the mean. The number of standard deviations
for an individual item is called the z-score: (the score – the mean)/SD. Let
us illustrate this with another example. Suppose a group of students has
taken a 100-item language test. Now suppose the mean score for all the
students is 60 (correct items) and the standard deviation is 8. A person who
has a score of 68 can then be said to have a score that is exactly 1 standard
deviation above the mean. That person’s z-score is therefore +1. The z-score
of someone who scored 56 on that test is half a standard deviation below
the mean, which is −0.5, and a person who scored 80 has a z-score of 2.5
(80 − 60 = 20; 20/8 = 2.5). The advantage of the z-score is that it gives an
immediate insight into how an individual score must be valued relative to
the entire dataset.
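Z-scores are easy to compute in R. A minimal sketch using the test example above (mean 60, SD 8) and three made-up individual scores:

test_scores <- c(68, 56, 80)   # made-up individual scores
(test_scores - 60) / 8          # z-scores: 1.0, -0.5, and 2.5
# scale() also standardizes, but against the sample's own mean and SD:
scale(test_scores)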
ACTIVITY 3.7
Figure 3.4 Min-Max variability analysis depicting the change over time of 'don't V' negation for one learner (Jorge)
Notes: the dots represent the percentage of use of this type of negation in each of the recordings; the top line is the maximum value and the bottom line the minimum value
Score    Frequency
2        1
3        1
4        3
5        7
6        11
7        9
8        5
9        2
10       1
ACTIVITY 3.8
Figure 3.5 Frequency distributions
[The same score data visualized in three panels: a plot of the number of occurrences per score (1-10), and two histograms of Frequency against Score.]
The different kinds of graphs help us visualize how the data are dispersed.
This brings us to an interesting phenomenon that characterizes frequency
distributions that are based on natural data like human behaviour. If there
are very many data points, the frequency distribution will often result in the
same bell-shaped line graph that is commonly referred to as a normal distri-
bution. This phenomenon will be used as a reference point for almost all of
the statistics discussed in the remainder of this chapter. Figure 3.6 provides
an example of the normal distribution.
Figure 3.6 Frequency polygon representing a normal distribution
This frequency polygon (a graph type based on a histogram used to visualize the overall pattern of frequency distributions) represents a very large number of scores on a vocabulary test (x-axis: Vocabulary Score, 42-78; y-axis: Number of students).
Figure 3.7 The normal distribution with SDs and percentile scores
ACTIVITY 3.9
100 people have taken a reading test. The test contains 11 items. The
results are as follows:
Score Frequency
11 2
10 1
9 2
8 15
7 20
6 30
5 13
4 7
3 5
2 3
1 2
0 0
ACTIVITY 3.10
[Figure for Activity 3.10: the distributions of four variables, A, B, C, and D.]
4 STATISTICAL LOGIC
4.1 Introduction
In the previous chapter, we discussed descriptive statistics that can be
used to get a first impression of the data. The next step could be to apply
inductive statistics, which will help us evaluate the outcomes of the study.
To illustrate this, let us go back to the data in Activity 2.1 concerning the
relationship between age of acquisition and proficiency score. We could
add some descriptive statistics to the data, but the descriptives, in com-
bination with the scatterplot, will not provide us with the tools to esti-
mate how certain we can be that the observed relationship is not based
on a coincidence. Moreover, since it is generally impossible to test entire
populations, researchers normally select a small representative sample.
Inductive statistics will help us in generalizing the findings from a sample
to a larger population. However, there are some important assumptions
underlying inductive statistics, which require testing by descriptive sta-
tistics. For instance, many inductive statistical tests constitute so-called
parametric tests that have been developed with the assumption that the
data that are analysed approximately follow a normal distribution. This
means that when a sample shows data that are not normally distributed,
we will have to select tests that are not based on this assumption, so-called
non-parametric tests. In this chapter, we will introduce the logic behind
inductive statistics and discuss the issue of generalization from samples to
populations. Before we can explain the logic behind inductive statistics,
there are a few other concepts that need to be introduced, which have to
do with operationalization.
ACTIVITY 4.1
It can be argued that the dependent variable (like the vocabulary knowl-
edge in our example above and running time in Activity 4.1) changes as a
result of the independent variable (the presence or absence of instruction
or the amount of training). What is measured is normally the dependent
variable; what we want to investigate is the influence of the independent
variable. In many experimental studies, more than one independent var-
iable is included at the same time. In more complex designs, more than
one dependent variable may be included. For instance, in our example of
instruction and vocabulary knowledge, we could include an additional
independent variable, such as the L1 of the participants. In this study,
we would then have one dependent variable (vocabulary knowledge) and
two independent variables (instruction, L1). The number of independ-
ent variables we can include is not limited, but the interpretation of the
data may become increasingly complex. Sometimes, researchers explicitly
decide not to include a variable in a study if they are not interested in
the effects of this variable and want to exclude any effect such a variable
may have on the dependent variable. This can be done by not changing
the variable, or in other words, keeping the variable constant. Such a var-
iable is then referred to as a control variable. Applied to the vocabulary
study described above, the researcher may decide to include only female
learners in the study. We can then say that gender is a control variable in
this study. The dependency relationship between variables is not equally
relevant to all types of statistical studies, such as studies examining rela-
tionships between variables; this is something we will return to in the next
chapter (Section 5.2).
ACTIVITY 4.2
Figure 4.1 Boxplots showing the dispersion in the data of the instructed group (light grey) and the uninstructed group (dark grey) in three hypothetical situations (panels A, B, and C; y-axis: Score)
ACTIVITY 4.3
ACTIVITY 4.4
Table 4.1 Decision table for the cheating example

                    Really cheated?
Decision made       Yes         No
Excluded            OK          α-error
Not excluded        β-error     OK
4.3.2 Hypotheses
To avoid logical errors, one of the first steps of conducting a statistical study
is to formulate specific research hypotheses about the relationship between
the dependent and independent variables. When we apply this reasoning to
the strongly simplified example about vocabulary acquisition by instructed
learners and uninstructed learners, we can formulate three possible hypoth-
eses: instructed learners do better, uninstructed learners do better, or there
is no difference between the instructed and uninstructed learners. For each
of these hypotheses, a decision table can be set up like the one in Table 4.1.
However, the main hypothesis that is tested is the null hypothesis (H0), which
states that there is no difference. The other two possible hypotheses are
referred to as alternative hypotheses (H1 and H2): the instructed learners per-
form better, or the uninstructed learners perform better. The reason for stat-
ing the null hypothesis is that it is impossible to prove a hypothesis right, but
it is possible to prove a hypothesis wrong; in other words, we can only reject
a hypothesis. This principle of falsification, which was already briefly intro-
duced in Section 1.2, is best illustrated with another example. Suppose we
want to investigate whether all swans are white. If we formulate our hypoth-
esis as ‘all swans are white’, we can accept this hypothesis only after making
sure that all swans in the world are indeed white. However, we can easily
reject this hypothesis as soon as we have seen one non-white swan. Therefore,
hypotheses are always formulated in such a way that we are able to attempt to
reject them, rather than accept them. In research terms, this means that we
will normally test the rejectability of the null hypothesis. In our vocabulary
learning example, we would thus test the hypothesis that there is no differ-
ence between instructed and uninstructed L2 learners. If we can reject this
hypothesis beyond reasonable doubt, we can implicitly accept the alternative
hypothesis that there is a difference. The mean scores of the instructed and
the uninstructed learners will then reveal which group performs better.
Suppose one group does indeed score better than the other. If the groups
we are testing are our only concern, we need no further analyses. But how
can we be sure that this difference in scores in the sample can be generalized
to an actual difference in reality? In other words, if you performed the test
again and again, with new people in the instructed and uninstructed group,
would you find the same difference? Or, to put it differently, how can we
say beyond reasonable doubt that the findings in our sample were not found
just by chance? And what do we mean by beyond reasonable doubt? These
questions are related to the alpha error that was introduced in the previous
paragraph. The alpha error concerns the possibility that we incorrectly reject
the null hypothesis (and thereby implicitly incorrectly accept the alternative
hypothesis). This is illustrated in Table 4.2.
To avoid the possibility that we reject the H0 incorrectly, we try to calculate
the probability that these scores were obtained by chance. In other
words, we want to obtain a value that signifies the
exact probability of us committing an α-error, that is the chance of finding
an effect in our study while it does not exist in reality. If we assume that the
distribution of scores in our groups follows the normal distribution, then we
can quantify which scores are the least likely to have occurred. Recall the
frequency polygon of the normal distribution (see Figure 3.7). The small
areas on either side of the mean beyond 2 SDs refer to a little over 2% of
the scores. These scores are less likely to occur than the scores around the
mean, and might have occurred by chance. Therefore, scores in this area
are related to the conventionally accepted probability of an alpha-type error
(abbreviated as p for probability), which is 2.5% on either side of the distri-
bution, and 5% in total.
To elaborate on this a little further, let us go back to the data in Figure
3.6 concerning vocabulary scores. This normally distributed dataset is plot-
ted once more in Figure 4.2 on the left. Imagine we tested a second group of
students who received a new teaching method as compared to our original
group and they scored 96 on average, as visualized on the right in 4.2. Our
original group had a mean score of 60 (SD = 9) and, as we discussed on the
basis of the normal distribution, a score below 42 (−2 SD) or above 78 (+2
SD) would be highly unlikely (around 5% in total).

Table 4.2 Decision table for testing the null hypothesis

                   Reality
Decision made      H0 false    H0 true
H0 rejected        OK          α-error
H0 accepted        β-error     OK

Figure 4.2 Two normal distributions (x-axis: scores from 42 to 114; y-axis: Number of students): the original group (mean 60) on the left and the group taught with the new method (mean 96) on the right, with the regions marked 'Do not reject H0' and 'Reject H0'.

If a person from our original group told us that they scored 96, we would
highly doubt their statement as the chance of such a score occurring is
extremely small. By contrast, if a person from the group who received the new teaching method
told us that they scored 96, we would probably immediately believe them.
Using that same reasoning, imagine someone cannot remember which
group they were in, but really needs to know what teaching method they
received. The only thing they do know is that their score was 105. In that
case, you can quite safely state that they differ from the original group and
hence tell them that they most likely received the new teaching method.
But what if someone came up to you and said they scored 78? In that case,
you cannot really say that there is a bigger chance of this score occurring
in the first or the second group as the chances of obtaining such a score are
approximately equal in both groups (around 2%). If you said they belonged
to the original group, you might be committing a beta error, that is saying
that they do not differ from the original group when they in fact do if they
were in the new group. If you told them that they were in the new group,
you might be committing an alpha error, that is saying that they differ from
the original group when in fact they may not as they were in reality part
of the original group. You can probably imagine that the degree of overlap
between the curves determines the probability of making these kinds of
mistakes. More specifically stated, the chance of making an alpha error
decreases the further the two curves are apart. When performing inductive
statistics, we want to quantify the probability of making an alpha error and
this probability is reflected in the p-value.
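These tail probabilities can be computed directly in R with pnorm(). A minimal sketch using the original group (mean 60, SD 9):

1 - pnorm(78, mean = 60, sd = 9)        # chance of scoring above 78: about 2%
pnorm(42, mean = 60, sd = 9)            # chance of scoring below 42: about 2%
2 * (1 - pnorm(78, mean = 60, sd = 9))  # both tails together: about 5%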
4.3.3 Significance
In the previous paragraphs, we have explained how we can quantify ‘beyond
reasonable doubt’ justified by general observations about the normal distri-
bution. Yet, the interpretation of ‘reasonable’ varies among disciplines and
is strongly based on conventions. The conventionally accepted maximum
chance of incorrectly rejecting the null hypothesis is 5% or less (α = .05),
but in cases of life or death, such as clinical trials, the chance of making an
alpha error is usually set to 1% or less (α = .01). It is important to realize
that it is the researcher’s choice to set the alpha level and thereby define the
term ‘beyond reasonable doubt’. However, researchers will have to follow
the conventions. For instance, it may seem attractive to set the alpha level
at 50%, so that we make it easy to reject the H0, but logically this will not
be accepted. Conversely, making it difficult to reject the H0 will make
a research conclusion more convincing. The selected chance of incorrectly
rejecting the null hypothesis is closely related to the level of significance, that
is the p-value. A significant result is one that is acceptable within the scope
of the statistical study and a report should include both an interpretation of
the significance as well as the exact probability of this interpretation being
incorrect. An expression like ‘a statistical analysis showed that the difference
between the two groups was significant’ must always be followed by, for
example, ‘p = .02’. This should be interpreted as ‘I accept that there is a dif-
ference between the two groups, and the chance that I have made an error
in assuming this is approximately 2%’.
ACTIVITY 4.5
We have now seen that in statistics we can never say that a hypothesis is
true; we can only express the chance of making the wrong decision. Con-
ventionally, if that chance is smaller than 5% or 1% (depending on how
serious it is to make the wrong decision), this is taken to be acceptable. In
Figure 3.7 (of the normal distribution), we saw that values between z = –2
and z = 2 make up approximately 95% of the scores of the entire population
(95.45%, to be precise). What z-scores do you think will belong to the 95%
confidence interval? The answer should be 'slightly less than 2', which also
corresponds with the two lines reflecting the values 42.36 and 77.64 in
the left curve in Figure 4.2. To be precise, these lines reflect the z-scores
–1.96 and 1.96, the critical values that together cut off the most extreme
5% of the distribution. So on either side, in the 'tails' of the distribution, we find 2.5%
beyond these points. If a new group of people scored anywhere between
42.36 and 77.64, we would not reject H0 and we would conclude that there
is no significant difference between this new group and our original group
with respect to their vocabulary scores. We would reject H0, however, if a
new group’s average exceeded the critical values, that is if they scored lower
than 42.36 or higher than 77.64. If the starting point is that the chance of
making the wrong decision should be smaller than 1%, then the critical
values (z-scores) for being 99% certain about a decision are –2.58 and 2.58
(or 36.78 and 83.22 in our vocabulary example).
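These critical values can be recovered in R with qnorm(), the mirror image of pnorm(). A minimal sketch for the vocabulary example (mean 60, SD 9):

qnorm(c(0.025, 0.975))                     # the z-scores -1.96 and 1.96
qnorm(c(0.025, 0.975), mean = 60, sd = 9)  # 42.36 and 77.64
qnorm(c(0.005, 0.995), mean = 60, sd = 9)  # approximately 36.8 and 83.2 (99% level)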
ACTIVITY 4.6
Think of two different null hypotheses that test one aspect of second
language acquisition. One of these should clearly be a ‘two-tailed’
problem (so there are two alternative hypotheses) and one should be
a ‘one-tailed’ problem (so there is only one alternative hypothesis).
For each, formulate the null hypotheses and the alternative
hypotheses as precisely as possible.
ACTIVITY 4.7
SE = \frac{\sigma}{\sqrt{N}} \qquad (4.1)
ACTIVITY 4.8
A statistic that you will regularly come across in JASP and R output is
the standard error of the mean (SE). This number expresses how well
the sample mean matches the population mean. It should officially
be calculated by dividing the standard deviation of the population
(σ) by the square root of the sample size. Since we mostly do not
know σ, the standard deviation of the sample (SD) is normally used
to estimate the value of SE.
What is the SE when the SD is 9 in a test with 9 participants?
What is the SE when the SD is 10 in a test with 100 participants?
Explain in your own words what the effect is of sample size on
the value of the SE.
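The questions in Activity 4.8 can be checked with a one-line function in R; a minimal sketch:

se <- function(s, n) s / sqrt(n)  # the sample SD divided by the square root of N
se(9, 9)                          # 3
se(10, 100)                       # 1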
For statistical calculations, the difference between the sample and the
population has some important consequences. The generalizability of the
results found in the sample to the entire population will ‘automatically’ be
taken into account in the calculations done by, for instance, JASP or R.
We have already seen one example of this: in Equation 3.2 of the SD, we
used N – 1 for the number of participants in the group, rather than simply
N, and the reason for this is explained in the following section.
ACTIVITY 4.9
How many participants does each group have, where the numbers in
each group are the same, for the following dfs and number of groups:
df = 28, 2 groups?
df = 57, 3 groups?
(such as the difference between the groups), we want to be 80% certain that
we really find that effect in our sample. This number, 80%, is the desired
power of an experiment: the power is 1 – β, since β is .20. Let us briefly
return to the two groups we compared in Figure 4.2. Imagine that the two
groups are indeed different, but their scores overlap a lot more. The area of
overlap would also constitute an increased chance of making a beta error
(β) and incorrectly concluding that the groups do not differ. In the current
situation, a person scoring 75 would most likely belong to the old teaching
method group as the chance of obtaining such a score with the new method
is very small (or at least smaller than 5%). If, however, the mean of the
second group was 87, then the likelihood of obtaining a score of 75 in the
old teaching method group is the same, but obtaining such a
score in the new teaching method group has become less unlikely. There is
thus a bigger chance that we will not be able to reject H0. Not only has this
increase in overlap led to an increase in β, but the power has also decreased.
If, on the other hand, there was no overlap whatsoever between the two
curves, then β would be 0 as we would always correctly reject H0. Conse-
quently, the power of the experiment, the chance of finding the difference
between the two groups, would be 100%.
The power of an experiment is logically related to the number of par-
ticipants in a sample. Theoretically, if the entire population were tested,
we would also be 100% certain that we find an effect that really exists.
The smaller the samples are, the weaker the power and the more difficult it
is to demonstrate an existing effect.
The chance that we find an existing effect is not only dependent on
the sample size, but also on the size of the effect. A big effect can be found
with a limited sample size, but to demonstrate a small effect, we will need
really large samples. The following numbers are suggested for this by Field
et al. (2012, p. 59) based on work by Cohen (1992). Using an alpha level
of .05 and a beta level of .2, we would need 783 participants (per group in
a means analysis) to detect a small effect size, 85 participants to detect a
medium effect size, and 28 participants to detect a large effect size. This
means that to assess meaningfulness based on the power of an experiment,
we will have to calculate the effect size of the experiment and we will have to
be able to interpret the effect size. But what is a ‘small’ and what is a ‘large’
effect size? There are several statistical calculations that can be used for this
purpose. We will discuss this when we dig more deeply into inductive sta-
tistics in Chapters 5, 6, and 7.
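Sample sizes like the ones mentioned above can be computed in R, for instance with the pwr package (assuming it is installed; it is not part of base R). A minimal sketch for a small correlation effect:

library(pwr)
# n needed to detect a small effect (r = .1) at alpha = .05 with power = .80
pwr.r.test(r = 0.1, sig.level = 0.05, power = 0.80)  # n comes out at roughly 780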
For now, it is important to realize that the smaller the effect, the larger the
sample needs to be in order to actually find the effect. In other words, trying
to find a small effect with a limited number of participants may be a waste of
time, as it is like finding a needle in a haystack. At the same time, if we are
looking for a large effect, it is a waste of time to try and find a large sample.
Therefore, it is important for the evaluation of the meaningfulness of a result
that the effect size be calculated. In the past years, it has become more and
more common to report effect sizes in addition to the probability that the
observed difference or relationship was based on chance. In fact, the American
Psychological Association (APA) recommends always reporting effect sizes
(VandenBos, 2010). In the following chapters, you will thus also learn how effect sizes
can be calculated for the different statistical tests we will discuss.
ACTIVITY 4.10
ACTIVITY 4.11
5 ASSESSING RELATIONSHIPS AND COMPARING GROUPS
5.1 Introduction
In Chapter 3, we already introduced two different kinds of perspectives on
data: comparing groups and looking at relationships. In the coming three
chapters, we will elaborate on specific types of statistical tests within, and
to a certain degree slightly beyond, these two categories. The steps that
need to be taken when conducting inferential statistics, as summarized in
Section 4.9, apply to all these tests. Moreover, more or less the same princi-
ples hold for these tests. The only differences between the various statistical
tests concern the researcher’s perspective and, consequently, the scale and
number of the variables involved in the study.
It should be noted that our broad division in the present chapter, com-
paring groups versus investigating relationships, is one possible way of pre-
senting these statistics and some of the types may overlap. For each of the
parametric options, we will also give you the non-parametric alternatives.
Remember that for parametric tests, you will always need to make sure that
your data are normally distributed, show homogeneity of variance, and are
measured on an interval scale. If one or more of these assumptions are vio-
lated, you should opt for a (non-parametric) alternative. In addition, as
we argued in Chapter 4, virtually all parametric statistics are based on the
assumption of linear relationships between variables.
Figure 5.1 Scatterplot showing the relationship between hours of instruction
(x-axis) and proficiency score (y-axis)
The more closely $r_{xy}$ approaches 1 or −1, the stronger the relationship is.
Evans (1996) suggests the following interpretation:
• .00–.19: very weak;
• .20–.39: weak;
• .40–.59: moderate;
• .60–.79: strong;
• .80–1.0: very strong.
A positive correlation, as in Figure 5.1, means that if one variable goes up,
the other also goes up. A negative relationship, however, means that if the
values of one variable go up, the other one goes down. In SLD research, neg-
ative correlations are sometimes reported for the starting age of L2 learning
and the level of L2 proficiency in adulthood. The older a person is when he
or she starts learning the language, the lower the L2 proficiency later in life
tends to be (also see Figure 3.1). Besides the strength of a correlation, the
output of a Pearson r analysis also reports on the significance of the correla-
tion, that is, the estimated chance of incorrectly rejecting the H0 that there
is no relationship, and the 95% Confidence Interval (CI). The CI provides
two values showing that, if we were to repeat the correlation analysis using
different samples, the correlation coefficient would lie somewhere between
the estimated upper and lower bound CI values approximately 95% of the
time. In the example mentioned above, on the relationship between the
hours of instruction and L2 English proficiency, the analysis yields the out-
put in Tables 5.1 for R and 5.2 for JASP.
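For R users, such an analysis can be obtained with the built-in cor.test() function. A minimal sketch, assuming a data frame dat with the variables hours_instruction and proficiency (the names used in the regression examples of Chapter 6):

cor.test(dat$hours_instruction, dat$proficiency, method = "pearson")
# the output reports r, the p-value, and the 95% confidence interval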
From the output in Tables 5.1 and 5.2, it follows that the two variables
in our example show a rather strong correlation of .78 (bottom lines of the
R output and the first figure in the JASP output). The H0 can be rejected,
as the level of significance, that is, the p-value, is smaller than .001 (the chance of incorrectly rejecting H0 is thus less than 0.1%) and the 95% confi-
dence interval [0.63, 0.88] shows that the relationship is highly likely to be
a strong positive one.
A common pitfall of correlation studies is that a correlation is interpreted
as a causal relation. Although it is tempting to say that the increasing proficiency scores are caused by the hours of instruction, a simple correlation study certainly cannot establish this. The correlated variables are not in
a dependency relation. Therefore, the distinction between dependent and
independent variables is not relevant for correlation studies. To determine
causality, advanced statistical methods are required, referred to as causal mod-
elling, but this type of statistics goes beyond the scope of the current book.
ACTIVITY 5.1
Student R L
1 20 65
2 40 69
3 60 73
4 80 77
5 100 80
6 120 84
7 140 89
8 160 95
ACTIVITY 5.2
Below you see two scatterplots, one with a small sample (left) and
one with a larger sample (right).
decision as well as information on the size of the effect. Therefore, the APA
citation format now also explicitly requires researchers to include effect sizes in their scientific papers (VandenBos, 2010).
Every test has its own effect size measure and the effect size for a correlation is the easiest, as the r-value itself is already an expression of the strength of the relationship. This value is sometimes squared to obtain an effect size that is related to the amount of variance that can be explained by the variables in our experiment. The explained variance is thus calculated by taking r². This means that if r = .50, r² = .50² = .25. So then 25% of the variance is explained by the variables in our experiment. This is considered a large effect. Conventionally, the interpretation of effect sizes is as follows:
• r = .10 (r² = 1%) is considered a small effect
• r = .30 (r² = 9%) is considered a medium size effect
• r = .50 (r² = 25%) or higher is considered a large effect
In the example datasets in Activity 5.2, the effect size r² of plot A is thus .98² = 96%. The effect size r² of plot B, on the other hand, is .32² = 10%.
So, as the dots in plot A almost follow a perfectly straight line, without a lot
of variance, the amount of variance in this dataset that can be explained by
the relationship between x and y is 96%. It should be clear that the relation-
ship between variables x and y in plot B cannot explain all the data points as
there is more spread in the data.
For every statistical test, we will introduce the most commonly used
effect size measure. The interpretation of these, however, largely resembles
the one given here.
Table 5.3 shows that the participants with a lower SES used ‘haven’t
got’ 58 times and ‘don’t have’ 30 times. The participants with a higher
SES used ‘haven’t got’ 68 times and ‘don’t have’ 62 times. At face value, it looks as if there is a difference in the use of these constructions for the participants in the lower SES group, but that the two constructions are used about equally often in the higher SES group. The question we want to answer, how-
ever, is if this difference is significant. What is the chance of incorrectly
rejecting H0?
The statistic that is most appropriate for this type of data is the chi-
square (or χ2) analysis. The chi-square calculates the number of occurrences
in a particular cell relative to the margins of that cell, that is, the total num-
ber of instances of each construction and the total number of participants in
each group. The value of the statistic, χ2, runs from 0 to infinity; the most
common values reported are between 0 and 10. The (simplified) R output
of this chi-square analysis can be found in Table 5.4, with the same JASP
output in Table 5.5.
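As a sketch, the same analysis can be run in R directly on the counts from Table 5.3 (note that, for 2 × 2 tables, R applies Yates' continuity correction by default):

# observed frequencies: rows = SES group, columns = construction
ses_counts <- matrix(c(58, 30,
                       68, 62),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(SES = c("lower", "higher"),
                                     Construction = c("haven't got", "don't have")))
chisq.test(ses_counts)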
Figure 5.2 Barplot of the use of ‘haven’t got’ and ‘don’t have’ by people with a
lower and higher socio-economic status
ACTIVITY 5.3
$$\frac{(A + B)(A + C)}{\text{TotalFreq}}$$
f) Now calculate the expected value for each cell. Are any of these expected values below 5?
g) The next step is to calculate $(F_E - F_O)$ for each cell. For some cells, this is bound to result in negative values. To neutralize the negatives, square the outcome of each cell: $(F_E - F_O)^2$.
h) Finally, divide this value by $F_E$, so
$$\frac{(F_E - F_O)^2}{F_E}$$
i) Then add up this value for each cell (A, B, C and D), so
$$\sum \frac{(F_E - F_O)^2}{F_E}$$
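The steps in this activity can be checked with a few lines of R. A minimal sketch using the observed counts from Table 5.3:

obs <- matrix(c(58, 30,
                68, 62), nrow = 2, byrow = TRUE)
# expected frequency per cell: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
# the chi-square statistic: the sum of (FE - FO)^2 / FE over all cells
sum((expected - obs)^2 / expected)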
Although it is unlikely that anyone will ever have to calculate the value of t manually, we will give the calculation in order to provide more insight:
$$t = \frac{\bar{X}_{Group1} - \bar{X}_{Group2}}{\sqrt{\dfrac{SD_{Group1}^2}{N_{Group1}} + \dfrac{SD_{Group2}^2}{N_{Group2}}}} \tag{5.1}$$
ACTIVITY 5.4
Explain in your own words what the formula for the t-test does.
In your description, do not use technical terms (like ‘standard
deviation’ or ‘variance’) but language that people without a
background in statistics would also understand.
When the value of the statistic has been calculated, the next step is to
interpret that value by determining the probability (the chance of incorrectly
rejecting H0) associated with that value. The simplest way is to feed the data
into a computer program like JASP or R and to read the probability from
the output. To illustrate the t-test, let us return to our example of the Span-
ish and German learners of English that we also contrasted in the boxplots
in Chapter 3 (Figure 3.2). When we compare the two groups using an inde-
pendent samples t-test, R provides us with the output in Table 5.6, which
is slightly simplified here (the output from JASP will follow in Table 5.8).
Before dealing with the more general output of a t-test for both
JASP and R users, we will first briefly look at the R output in
Table 5.6. R users will see that the table first reveals our code for the
t-test, followed by the output that starts with the test that we have conducted: a two sample t-test, which is the same as an independent samples or Student t-test. It then shows that we have looked at Score by Group (first line) and that, at face value, the scores in Group 2 are higher than the scores in Group 1 (bottom lines of the output):

sample estimates:
mean in group 1 mean in group 2
       55.18750        71.24242

However, we will need to apply inductive statistics to estimate
the probability of the difference between the groups. In other words,
can the H0 (that there is no difference between the groups) be rejected beyond reasonable doubt? The answer to this question can be found in the first few lines in Table 5.6 or, for JASP users, in Table 5.8, which contain the most important information in the output for the t-test: the value for t, for df, and for p, where you can see that the calculated value of t is −4.853. For this value, the chance of incorrectly rejecting the H0 that is related to these specific conditions is .000008348 (R output), which is smaller than .001 (JASP output). As this value is extremely small, the H0 that there is no difference between the means of the two groups can be rejected. Given the relatively large difference between the groups (55 vs. 71), this outcome is not surprising. Of course, the chance of us making a mistake can never be exactly .00 and we therefore report that the chance of incorrectly rejecting the H0 is smaller than .001 (i.e. p < .001).
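For R users, a minimal sketch of the call that produces output like Table 5.6, assuming a data frame dat with the columns Score and Group:

t.test(Score ~ Group, data = dat, var.equal = TRUE)  # Student's (two sample) t-test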
You may also come across another parametric version of the t-test: the
one sample t-test. Although it is used when the researcher has collected
data from one sample, the goal is still to compare two groups. It can be
used, for example, to compare the mean score of a sample (e.g. the IQ
score of Linguistics students) to a known value (e.g., the average IQ of the
population).
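A sketch of such a one sample t-test in R, assuming a vector iq with the students' IQ scores and taking 100 (the conventional population mean) as the known value:

t.test(iq, mu = 100)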
For R users, it is important to note that the output in Tables 5.6 and 5.7 is not identical. When you compare the output here to the simplified version
provided in Table 5.6, you will notice that we previously not only left out the
95% confidence interval for the difference in means, but also that almost all
the values are slightly different. For JASP users, the same holds when com-
paring the first line (the Student t-test) to the second line (Welch’s t-test) in
the output. In Table 5.6, we used var.equal=TRUE in R when performing
the t-test and we have changed this to FALSE in the code in Table 5.7 (also
see the first line that repeats the code to perform this t-test). This code refers
to the assumption of homogeneity, the equality of variances (also see Sec-
tion 4.6). In R, TRUE is used in case your variances are equal and FALSE in
case they are not comparable. When equal variances are not assumed and Welch's adjustment is used, the degrees of freedom (df) are adapted slightly.1
Since the t-test is very sensitive to violations of homogeneity of variance, especially in cases with unequal sample sizes, researchers often first run a Levene's test to assess whether the variances in the groups are approximately similar. If this test is significant, this means that the assumption is violated. Although it is good practice to test homogeneity of variance using Levene's test, Welch's adjustment can easily be used in both JASP and R to correct for unequal variances. Because of this, some statisticians recommend using this Welch's t-test in all situations (also see Field et al., 2012, p. 373).
1 The actual formula for this adaptation is extremely difficult to understand, even to statisticians (e.g. Field et al., 2012), but relates to power. The more people we test, the more power our study has. If we cannot benefit from the assumption of homogeneity of variance and especially if our sample sizes are very different, we have to adjust the degrees of freedom to be somewhere between the larger and the smaller sample we are testing. That is what the Welch-Satterthwaite correction to the degrees of freedom does for us.
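A sketch of both steps in R, again assuming the data frame dat from above (Group should be a factor; leveneTest() comes from the add-on package car):

library(car)
leveneTest(Score ~ Group, data = dat)                 # homogeneity of variance
t.test(Score ~ Group, data = dat, var.equal = FALSE)  # Welch's t-test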
When comparing the interval scores of two groups of subjects, such as the
Spanish and the German learners of English, we are dealing with one inter-
val dependent variable that might be influenced by one nominal variable (‘L1
background’) with two levels (‘Spanish’ and ‘German’). We therefore have to
conduct an independent samples t-test that is based on a calculation of the
t-statistic (also see the formula in 5.1). As can be seen in Table 5.9, however,
there is also a dependent or paired version of the t-test in which the scores of
the two groups are related to one another. Imagine we are not comparing Spanish and German learners, but instead we focus on the Spanish learners
by comparing their scores on an English proficiency test before and after
they followed an English course. In this particular example, every learner
has been tested twice and the data of the groups we are comparing are thus
related, dependent, or ‘paired’. As the two groups mostly consist of the same
people, the assumption of homogeneity of variance does not have to be met for the paired t-test. Additionally, the paired version does not assume that
the data within the two different groups are normally distributed. Instead,
the difference in scores should approximate normality.
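A sketch of this paired version in R, assuming hypothetical columns score_before and score_after containing the two scores of the same learners:

t.test(dat$score_before, dat$score_after, paired = TRUE)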
Before being able to conduct an independent or paired samples t-test,
we need to look at our data using descriptive statistics and test the assump-
tions. If one or more of these assumptions have been violated, we have to
use (non-parametric) alternatives. As we have seen above, Welch is used
when variances are not equal or homogeneous in an independent samples
t-test. When the data are not normally distributed, the Mann-Whitney
U test is used to compare groups. The most commonly used non-parametric
alternative for the dependent t-test is the Wilcoxon signed-rank test (also see
Table 5.9).2 The Mann-Whitney and the Wilcoxon are generally interpreted
as comparing medians as opposed to means.
2 Do note that the Mann-Whitney is sometimes also referred to as the Wilcoxon rank-
sum test (or even as the Mann-Whitney-Wilcoxon or Wilcoxon-Mann-Whitney test).
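Both non-parametric alternatives are available through wilcox.test() in R. A sketch using the assumed variables from the examples above:

wilcox.test(Score ~ Group, data = dat)                        # Mann-Whitney U
wilcox.test(dat$score_before, dat$score_after, paired = TRUE) # Wilcoxon signed-rank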
$$r^2 = \frac{t^2}{t^2 + df} \tag{5.2}$$
So if t = 2 and df = 36, then r² = .10 (a medium effect), which means that 10% of the variance in the data can be explained by the impact of the independent variable. The effect size of the same t-value will be smaller if the sample gets bigger. For example, if t = 2 and df = 136, then r² = .029 (a small effect), which suggests that only 2.9% of the variance can be explained. At the same time, larger t-values will result in larger effect sizes if sample sizes remain the same. The interpretation of this effect size is identical to the one for correlations explained in 5.2.1.
Cohen’s d is very similar, but reflects the difference in means of the two
groups divided by the average of their standard deviations. So, a value of 1
simply means that the means of the groups differ by one SD and a d of .5
reveals a mean difference of half an SD. The difference in interpretation is thus related to either a difference in terms of standard deviations (Cohen's d) or the amount of variance explained (r²). Although they are both often used, the values for different effect sizes differ slightly, as exemplified in Table 5.10.
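A small sketch of both calculations in R; the group means come from Table 5.6, while the standard deviations are invented for illustration:

t_val <- 2
df_val <- 36
t_val^2 / (t_val^2 + df_val)   # equation 5.2: r^2 = .10, a medium effect

m1 <- 55.19; m2 <- 71.24       # group means (Table 5.6)
s1 <- 12; s2 <- 14             # assumed standard deviations
(m2 - m1) / ((s1 + s2) / 2)    # Cohen's d: difference in means / average SD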
6 SIMPLE AND MULTIPLE
LINEAR REGRESSION
ANALYSES
6.1 Introduction
When we are doing research, we are usually interested in testing the dif-
ference between groups or determining the relationships between two var-
iables. However, especially in Applied Linguistics, it is often worthwhile
to predict the outcome based on one or various independent variables.
We might additionally want to know what the exact influence is of the
independent variable and, in case of multiple predictor variables, which of
these contributes most to the outcome. For example, a language teacher
may want to estimate what the influence is of motivation and time spent
doing homework on learning a second language. To find the answer to this
question, we can do a regression analysis.
You will see that regression is very similar to correlations as it can establish
the strength of a relationship between two numeric variables. Interestingly,
regression can also be compared to means analyses, as it is also a technique
that allows us to compare the means of two (or more) groups. In a sense, regres-
sion, and then we are talking about multiple regression (as discussed in Sec-
tion 6.3), allows us to combine what we can do with correlations (Pearson r)
and mean comparisons (t-tests or ANOVAs) and can be seen as a technique
used to predict the value of a certain dependent variable as a function of one or
more nominal and/or continuous independent variables. Importantly, multi-
ple regression additionally allows us to examine which of these variables con-
tributes most to the outcome by controlling for the effect of the other variables.
As experimental researchers, who traditionally mostly used to choose
ANOVAs (an extension of the t-test that will be discussed in detail in
Chapter 7) to compare various groups, are increasingly using more flexible
and advanced regression techniques to analyse their data, we will spend an
entire chapter on regression. It should be noted, however, that regression is a
very broad field and we will only introduce you to the basics here.
[Figure 6.1: two straight lines plotted in the X-Y plane, one solid with a positive slope and one dashed with a negative slope]
ACTIVITY 6.1
Figure 6.1 shows two perfectly straight lines: one reflecting a positive
correlation and one reflecting a negative one. For every increase in X for the
solid line, the Y value goes up by 2 points. For the dashed line, however,
the Y value goes down by 1 point for every increase in X. The value of Y
thus increases more rapidly in response to a change in X in case of the solid
line as compared to the dashed line. This exact steepness of a line is often
81
ESSENTIAL STATISTICS FOR APPLIED LINGUISTICS
referred to as the slope and finding this number is exactly what is aimed
for when performing a regression analysis, as it tells us exactly how much
a change in a predictor variable (X) affects the value of the outcome varia-
ble (Y). When performing a regression analysis, the null hypothesis that is
being tested is that the slope equals 0 (there is no relationship or effect of
the IV) and the alternative hypothesis is that the slope is not 0 (there is a
relationship or an effect of the IV).
There is, however, another important difference between the lines,
which concerns their starting point. The solid line crosses the y-axis at a
lower value, 2 to be precise, than the dashed line, which starts at 5. In addi-
tion to the slope, this intercept, that is the point at which the line crosses the
y-axis, helps us characterize and summarize a straight line. Before explain-
ing these terms in more detail, let us go back to the relationship between
hours of instruction and proficiency.
ACTIVITY 6.2
Figure 6.2 Scatterplot with regression line through the data (the slope and the intercept are indicated in the figure)
In Activity 6.2, we took the scatterplot and drew a so-called regression line
through the data points. This regression line will be the closest-fitting straight line that can summarize, model, and therefore predict the data. In Figure 6.2,
the regression line starts around the proficiency score of 19. If we want to cre-
ate a formula for a regression line, we need several elements. First, we need the
starting point, that is the intercept, also called the constant, which is referred
to as b0 in the formula. Secondly, we need to know the exact steepness of the
line, that is the slope, b1 in our formula. So if we want to create a model that
can predict a specific value (i) of our outcome variable (Y), we take the slope
(b1) multiplied by the number of hours of instruction (X) and we add this to
the intercept (b0). The slope and intercept are often referred to as the regression
coefficients. Of course, the actual students in this study deviate from the line
(the model). As you can see, for example, there is one person in the sample who
had exactly 200 hours of instruction. Approximately how far away is this person from our model? This deviation of a particular person from the model
is referred to as the error (ε). So to express the difference between the real data
and the model, we need to add the error of a particular person to the equation
to get the real data points. With this equation (see 6.1) we can then calculate
the outcome on the y-axis for each specific participant (if we know the error).
The equation for the model then becomes:
We can run a simple regression analysis to calculate the exact numbers for
the intercept and the slope. Tables 6.1 and 6.2 show the output from R and
JASP, revealing that the constant or the intercept is 19.572, and the slope is
(almost) 0.100.
Table 6.1 R output table for the linear model (lm) and its
regression coefficients
Call:
lm(formula = proficiency ~ hours_instruction)
Coefficients:
(Intercept) hours_instruction
19.57208 0.09976
Table 6.2 First half of the JASP Output table for the linear model
and its regression coefficients
So, to illustrate the use of such a model using the equation we intro-
duced before, let us fill in the details for the person who had 200 hours of
instruction:
$$\text{Outcome } Y_i = b_0 + b_1 X_i + \varepsilon_i$$
$$\text{Proficiency score}_i = 19.572 + 0.100 \times 200 + \varepsilon_i$$
The above calculation reveals that, after 200 hours of instruction, the model
predicts a proficiency score of 39.57. Of course, a model is a simplification
and you can see in Figure 6.2 that the person who received exactly 200
hours of instruction scored higher than the model would predict (around
47 or 48).
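A sketch of this calculation in R, using the model from Table 6.1 and assuming the data frame dat from before:

model1 <- lm(proficiency ~ hours_instruction, data = dat)
coef(model1)  # intercept of about 19.572 and slope of about 0.100
# fitted value for a learner with 200 hours of instruction: about 39.57
predict(model1, newdata = data.frame(hours_instruction = 200))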
An important concept in regression concerns these residuals, that is,
the error-term (ε) in the equation above, which are simply the deviations
of each data point from the model and very similar to the deviations away
from the mean that we used to calculate the SD in Chapter 3. In other,
more statistical, words, the residuals refer to the difference between the
observed values (the actual data) and the fitted values (as predicted by
the model). The person who received 200 hours of instruction, for exam-
ple, had a fitted value of about 39, but an observed value of about 48.
ACTIVITY 6.3
Using the formula above, calculate what score the model would
predict for the following people. Take into account that you do not
know the error and you cannot add that to the formula:
John had 300 hours of instruction: what is his proficiency score
according to the model?
Elena had 17 hours of instruction: what is her proficiency score
according to the model?
Belinda has a proficiency score of 50: how many hours of
instruction should she have had according to the model?
Can you think of a problem with this kind of data? What would
have happened to the regression line if you had only measured up
to 200 instruction hours? What do you think would happen if we had
more data from people with more hours of instruction? Would the
slope go up or down?
Figure 6.3 Scatterplots with regression lines through the data with relatively
small (A) and larger residuals (B)
ACTIVITY 6.4
The straight lines plotted through the data points represent the best-fitting lines and are also referred to as the regression model. As negative and positive deviations cancel each other out, the sum of squared differences is often used to assess how well the line fits the data. The smaller the squared differences, the better and more representative a model is, and it is exactly this line with the lowest sum of squared differences that is aimed for when performing a regression analysis (the so-called least squares approach).
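In R, the quantity that the least squares approach minimizes can be inspected directly for a fitted model, such as model1 from the sketch above:

sum(residuals(model1)^2)  # the sum of squared differences
# (deviance(model1) gives the same value for lm models)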
Let us go back to our model that predicted proficiency on the basis of
hours of instruction. We can obtain a full picture of the model in R or in
JASP. The regression formula in R (Table 6.3, box A) and the output of the
models in both R and JASP are shown in Tables 6.3 and 6.4, respectively.
Table 6.3 R summary table for the linear model and its
regression coefficients with (A) a repetition of the code and
the regression formula; (B) the residuals of the model; (C) the
coefficients table and; (D) a summary of the model
Call:
A
lm(formula = proficiency ~ hours_instruction)
Residuals:
Min 1Q Median 3Q Max B
-33.669 -7.310 0.379 6.907 36.162
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.57208 3.79896 5.152 8.25e-06 ***
hours_instruction 0.09976 0.01278 7.809 2.04e-09 *** C
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 6.4 JASP tables for the linear model and its regression
coefficients
of our data. The nice thing about regression is that we can add multi-
ple independent variables that are interval (as in the example above), but
we can also add categorical independent variables to predict the value of
our dependent variable. The latter is not (yet) possible in JASP, so when
working in JASP, we can only analyse categorical independent variables
with an ANOVA (or a so-called ANCOVA when we also have continuous
independent variables). For the sake of clarity, we will now stick with an
example containing continuous predictors only, but an example of a multi-
ple regression model in R with both continuous and categorical predictors
can be found in the How To unit (for R) on multiple regression on the
companion website.
Imagine that the students in the previously mentioned example not only
had different amounts of instruction, but that they also had varying levels
of foreign language anxiety. The output of a multiple regression model with
this additional explanatory variable can be found in Table 6.5 (R output)
and Table 6.6 (JASP output).
Table 6.5 R output table for the linear multiple regression model
and its coefficients
Call:
lm(formula = proficiency ~ hours_instruction + anxiety)
Residuals:
Min 1Q Median 3Q Max
-33.885 -6.189 0.193 6.620 37.523
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.23170 5.77635 5.061 1.17e-05 ***
hours_instruction 0.08982 0.01305 6.880 4.10e-08 ***
anxiety -0.13911 0.06471 -2.150 0.0382 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The R output in Table 6.5 again starts with a repetition of the model and its code and information on the residuals. After that, we find the coefficients table (the third table in Table 6.6), which can tell us whether the explanatory variables have an effect on proficiency and how big this effect is.
We find an estimate of the intercept (that slightly changed with the added
variable) and the estimates of the slopes for the two different independent
variables with their accompanying standard error values. As explained in
Table 6.6 JASP output table for the linear multiple regression
model and its coefficients
Section 6.2, the table also provides t-values and the corresponding p-values
for each independent variable, denoting whether the effect is significantly
different from 0.
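A sketch of the model behind Tables 6.5 and 6.6, with a fitted value for an invented learner who had 200 hours of instruction and an anxiety score of 50:

model2 <- lm(proficiency ~ hours_instruction + anxiety, data = dat)
# roughly 29.23 + 0.0898 * 200 - 0.139 * 50, so a score of about 40
predict(model2, newdata = data.frame(hours_instruction = 200, anxiety = 50))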
ACTIVITY 6.5
Can you tell by looking at Table 6.5 or Table 6.6 whether there is a
significant effect of foreign language anxiety?
Remember that the slope (the estimate) tells us how much the
dependent variable goes up or down for every added value to the
predictor. So, from Table 6.5 and Table 6.6, we can deduce that
the proficiency score goes up by about 0.09 for every added hour
of instruction. What is the effect of anxiety? Is this what you would
have expected?
In this particular example, we could say that the base proficiency score is 29.23 (the intercept). Now, the slope for hours of instruction is 0.09. This means that for every additional hour of instruction, the proficiency score goes up by 0.09. So, after 300 hours of instruction, this model predicts that the proficiency score will be around 56 (29.23 + (300*0.09)), which is slightly different from the prediction based on the simple model. The model additionally shows that the effect of anxiety is significant (p = .038) and the direction of the effect is a negative one: with every added value on the anxiety scale used, proficiency is expected to decrease by 0.14. This value of the slope can be used to make specific predictions with respect to
ACTIVITY 6.6
Using the equation above and the output presented in Table 6.5 or
Table 6.6, what would be the expected proficiency score for someone
who had 100 hours of instruction and an anxiety score of 30? And
what about their classmate who also had 100 hours of instruction,
but a maximum anxiety score of 120?
1 Traditionally, the r-squared value is often written in lower case (r²) when reporting the results of simple linear regression and a capital letter is often used (R²) for multiple and non-linear regression. These two values are, however, identical.
Figure 6.4 Scatterplots with regression lines through the data with a homoscedastic (A) and a heteroscedastic pattern (B)
The regression line in the left plot (A) in Figure 6.4 obviously presents
a better-fitting model with overall lower residuals than plot B on the right.
Homoscedasticity is, however, not related to the closeness of the actual data
points to the regression line. Instead, it refers to the equality of the closeness
across the entire regression line. In the left plot (A), the residuals form a band
around the regression line: some points are above the line and some are below,
but their deviation from the line is approximately the same for lower and higher
values. In the right plot (B), on the other hand, the actual data points are close
to the line for lower values, but deviate increasingly the higher the values get.
To put it differently, you could say that the model in B is good at predicting
lower values, but it is not good at predicting values in higher ranges. The devi-
ations from the model are thus not equal across the regression line in plot B.
Homoscedasticity is thus similar to the homogeneity of variance assumption and requires that the residuals or errors, that is the differences between the observed values and the ones fitted by the model, have a constant variance across the regression line.
Multicollinearity is very specific to multiple regression and refers to a
situation in which multiple variables relate to the same underlying variable.
Imagine you are testing the effects of proficiency on the motivation level of
students and you have included both listening proficiency and reading pro-
ficiency. The two proficiency measures are likely to correlate and when you
enter both of them as independent explanatory variables, you will likely find
that they both impact motivation level. But how much does each of these
variables contribute? Is reading proficiency a more important predictor for
motivation than listening proficiency, or is it the other way round? The
problem with predictor variables that correlate is that you cannot answer
these questions. Reading correlates with listening and so they are, together,
influencing motivation. In this particular situation, you will not be able to
assess the individual contribution of each of these variables. This is referred
to as multicollinearity and should be avoided.
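Multicollinearity is commonly diagnosed with variance inflation factors (VIFs), for example via vif() in the car package. A sketch with invented variable names for the example above:

library(car)
vif(lm(motivation ~ reading_prof + listening_prof, data = dat))
# VIFs well above rules of thumb such as 5 or 10 signal collinear predictors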
Finally, the errors in a regression, that is, the residuals, should be nor-
mally distributed (see Levshina, 2015 for a more detailed explanation of
each of these assumptions). Luckily, R and JASP have functions to check
each of these assumptions and they will be explained in detail in Practical 5
and the How To units on Regression on the companion website.
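As a quick sketch for R users, two of these checks for a fitted model (model1 from the sketches above):

plot(model1)                     # diagnostic plots, e.g. residuals against fitted values
shapiro.test(residuals(model1))  # a formal test of the normality of the residuals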
Regression can also be done to predict the outcome of ordinal or cate-
gorical data, in which case you cannot use the regular parametric version.
Imagine, for example, that a group of participants was asked to perform a lexical decision task in which they saw both actual words and non-words pop up on a computer screen. For every word, they had to decide whether it
was a real word or not and they had to do this as fast as possible. When the
question is whether it is possible to predict the response times for the words,
we can use the regression analysis outlined above, but when the question
is about predicting the accuracy for each word, we would need to opt for
logistic regression, as we would be predicting a binary outcome (correct vs.
incorrect).
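A sketch of such a logistic regression in R with glm(); the data frame and variable names are invented, and accuracy is assumed to be coded 0/1:

glm(accuracy ~ word_frequency, data = lexdec, family = binomial)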
As you have probably realized, a dataset can influence the model greatly,
depending on what was measured. In addition to a linear regression line to
fit the data, it is also possible to choose to use different regression lines, for
example a logarithmic model. Just make sure that you are aware of the choices
you make. For more information about regression and the more advanced
versions (such as logistic regression and mixed-effects regression, which we
will briefly touch upon in Chapter 8), we recommend Baayen (2008).
The present chapter has given an introduction to regression analyses.
When we are interested in the relationship between two interval variables,
we can use a Pearson r correlation. When, however, we want to predict one
variable on the basis of a continuous predictor variable, we should use sim-
ple regression instead. When the design is somewhat more complicated, for
example because it contains multiple independent variables that are nom-
inal and/or interval, we can opt for a multiple regression analysis. In addi-
tion to revealing which variables significantly impact the values of the
dependent variable, it can also determine how much each variable impacts
the dependent variable. Please remember that when you are dealing with
multiple independent nominal variables and an interval dependent variable,
you can choose either regression or more complicated versions of ANOVA.
This latter group of tests will be discussed in more detail in the upcoming
chapter. For now, we suggest you work through Practical 5 to get some
hands-on experience performing a regression analysis and to practise some
of the other tests that you have learned thus far.
7 ADDITIONAL
STATISTICS FOR
GROUP COMPARISONS
7.1 Introduction
When you start to get the hang of research, you will probably start to add
more groups or more variables to your research designs. For example, you
might want to examine the influence of L1 background on proficiency level
in English as a second language, but you want to compare three or more lan-
guage backgrounds. Or you might want to look at the differences between
two groups that receive a different type of instruction while learning French
as a second language, but you additionally want to look at the effect of their
L1. Or maybe you want to investigate the effect of instruction, but you want
to do so comparing vocabulary scores to listening and reading skills. Or you
may want to investigate the differences between these instruction groups at
different points in time on a vocabulary test. In these situations, you would
not only have two levels of your independent variable instruction, but you
would have either an additional independent variable (L1) or more than one
dependent variable.
In the very first example, when comparing three or more groups, you
cannot use the t-test that we used to compare two groups, but you will have
to use Analysis of Variance (ANOVA, also see Table 7.1). Various versions of
the ANOVA exist that can deal with almost all possible combinations of
designs consisting of more than two groups. The second situation, in which
the effects of both instruction and L1 are examined, allows for a multiple
regression with independent categorical predictors, but traditionally most
experimental linguists would choose to perform a two-way ANOVA on these
data (also see Table 7.1). In the third example, vocabulary scores, listening
skills, and reading skills are measures that are independent of each other, although
they might correlate (the more vocabulary knowledge, the better the listen-
ing and reading skills). This can be tested using a Multivariate ANOVA
(see Table 7.1). In the final example, the different measures of vocabulary
knowledge at different times cannot be called independent from each other,
as they are repeatedly measuring the same thing. The latter should there-
fore be tested with a so-called Repeated Measures design, which treats the
related dependent variables as different levels of one dependent variable.
We will not give detailed examples for all means analyses here; the prin-
ciples and interpretations of these are largely the same as those for the t-test
and the one-way ANOVA that will be discussed in detail in Section 7.2
below. After introducing the basic one-way ANOVA, we will look into fac-
torial (n-way) ANOVAs and alternatives in case of violations of one or more
of the assumptions.
1   2    1   Independent samples t-test   t
1   1    2   Paired-samples t-test        t
2   Any  1   two-way factorial ANOVA      F
N   Any  1   n-way factorial ANOVA        F
first languages, say Spanish, Chinese, and Sutu, would each form a different
level of the variable L1 background. The null hypothesis of this study would
be that there is no difference between any of these groups. The descriptive
statistics of this fictitious study are shown in Table 7.2 and a boxplot of the
data can be found in Figure 7.1.
ACTIVITY 7.1
Figure 7.1 Boxplot showing the spread in scores for the Chinese, Spanish, and
Sutu learners of English
At face value, the mean proficiency scores for the Chinese learners are the
highest; the scores for the Spanish learners are the lowest. The Sutu are the
most homogeneous (smallest SD), and the Chinese learners show the largest
differences within their group. The question we want to answer is whether
the H0 can be rejected. The appropriate test, as illustrated in Table 7.1, is the
one-way Analysis of Variance (ANOVA). This is a test that calculates F, which
represents the ratio of the variance between the groups to the variance
within the groups, which is very similar to t. To reject the H0, we would obvi-
ously prefer a large difference between the groups, while the variance within
the groups (as expressed by the SD) should be as small as possible:
$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}} \tag{7.1}$$
This equation shows just this. Like the value of t, the value of F increases
with increasing between-group differences, but decreases with increasing
SDs within groups. So the greater the value of F is, the more likely it is
that we can reject H0. Whether or not we can actually reject H0 not only
depends on the value of F, but also on the sample size (as expressed by df )
and on the significance value. Running a one-way ANOVA in R on the data
of the Chinese, Spanish, and Sutu learners using aov()1 yields the result in Table 7.3. The outcome of the same test in JASP is shown in Table 7.4.
1 Some might prefer to use lm(), but here we provide an example using aov() to show the more ‘traditional’ ANOVA output and to be able to do planned contrasts later on.
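A minimal sketch of that R call, assuming a data frame dat with the column Score and the factor L1 (levels Chinese, Spanish, and Sutu):

Model1 <- aov(Score ~ L1, data = dat)
summary(Model1)  # the F value, dfs, and p-value as in Table 7.3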
In this case you will see that there is a df referring to the number of
groups minus one (3 − 1 = 2) and one referring to the number of participants
minus the number of groups (37 − 3 = 34). Tables 7.3 and 7.4 additionally
show that, for an .05 level of significance, the H0 can be rejected. This sig-
nals a difference in English proficiency scores between the groups. But this
is not the end of our analysis, because the output does not reveal whether all
groups differ from one another or whether only one group differs from the
other two. We would thus want to know if the scores for each of the groups
differ significantly from both of the other groups. To test this, it would be
tempting to do three t-tests, one for each L1 pair. However, we cannot use
multiple comparisons on the same dataset, because every time we run the
test we are again allowing the 5% chance of making the wrong decision.
If we ran several t-tests on the same sample, we would therefore ‘capitalize
on chance’ and the eventual level of significance would be more than 5%.
Therefore, any program that can calculate the F for a one-way ANOVA
also provides the opportunity to run a post-hoc test. A post-hoc test does
the same as a t-test, but includes a correction for the multiple comparisons.
A post-hoc test makes two-by-two comparisons between all the levels of
the independent variable. Table 7.5 is the output of a post-hoc analysis for
the current example where we used the Tukey Honest Significant Differences
(HSD) test, which provides ‘honest’ or corrected p-values:
> TukeyHSD(Model1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

$`L1`
                      diff         lwr        upr     p adj
Spanish-Chinese -14.467949 -26.1237780  -2.812119 0.0122250
Sutu-Chinese     -3.333333 -15.2199936   8.553327 0.7725194
Sutu-Spanish     11.134615  -0.5212139  22.790445 0.0635628

Table 7.5 shows the differences in means for every L1 pair (diff) and also the lower (lwr) and upper (upr) boundaries revealing that, if we were to repeat the test on different samples, R estimates that the mean difference would lie somewhere between those two values in approximately 95% of the cases (95% CI). The last column contains the adjusted p-values for every L1 pair comparison and shows us that the mean proficiency of the Spanish and Chinese learners differs significantly at p = .012, but that no difference is found between the Sutu and the Chinese (p = .77). Judging by the p-values, the difference between the Spanish and Sutu learners is not very convincing with p = .064 and might at the most be reported as a trend towards a significant difference.
The output for the Tukey HSD in JASP can be seen in Table 7.6 and shows approximately the same outcomes, but with additional information on the t-value, which can be compared to the t-value in a t-test, and the Standard Error. Note that the differences between the groups are positive here, when they were negative in R, and the other way around. This just depends on which group is subtracted from which group, but makes no difference for the interpretation of the differences.
ACTIVITY 7.2
The lower (lwr) and upper (upr) bound values in Table 7.5 and
Table 7.6 reveal that, if we repeatedly test other participants
in each language group, the mean difference is estimated to
fall between those two values in approximately 95% of the
cases. If the table had not provided any p-values, would you
be able to use these lower and upper values to assess whether
there is a significant difference in means between the groups?
To put it differently, what is the difference between the lower
and upper values for the case in which the p-value is .012 and
the case with a p-value of .77? And can you explain why the
p-value of the last comparison would be lower if the lower and
upper values were both negative or both positive instead of
containing the 0 value?
Suppose we added an additional independent variable to the
research design discussed in this section, for example gender.
In that case, would you still be able to analyse the data using a
one-way ANOVA? If not, which statistic should you choose? Use
Table 7.1 to help you make a decision.
Looking closely at the output in Tables 7.5. and 7.6 will help you to
interpret how R and JASP calculated the p-values. For the Chinese and
the Spanish, the mean difference is estimated to fall on one side, that is
either positive (JASP) or negative (R), in approximately 95% of the cases. If
all estimated differences fall on the same side, the adjusted p-value will be
below .05. If, however, the upper and lower values include a 0, this means
that the mean difference might also be 0. Hence, the difference is unlikely
to be significant.
If our independent variable contains more than two levels, such as when
we are comparing three groups, we should opt for a one-way ANOVA.
When our design is somewhat more complicated, for example because it
contains multiple independent variables that are nominal, we should opt for
a factorial ANOVA, which will be explained in the next section.
ACTIVITY 7.3
Let us assume you are adding yet another variable to the analyses
in the design described above: gender. What are the dependent and
the independent variables and which statistic should be used to test
the significance of the difference between these groups (consult
Table 7.1)?
[Figure 7.2: plot of mean Scores for the explicit and implicit teaching methods across the L1 groups]
Figure 7.3 Example in which there are no main effects, but there is an
interaction between language background and teaching method
ACTIVITY 7.4
[Four panels (A-D): plots of mean Scores for the explicit and implicit teaching methods across the Chinese, Spanish, and Sutu L1 groups]
The upper left plot (A) clearly shows no difference between the two
teaching methods, but there is a difference between L1 backgrounds with
Spanish learners scoring lower as compared to the other two groups. Such
a main effect of L1 is absent in plot B in the upper right corner: if you
compare the average score of all Chinese, Spanish, and Sutu learners, there
is hardly a difference. Similarly, the overall average scores of implicit versus
explicit learners are very comparable. This upper right plot, however, does
show an interaction effect, that is a differential effect of teaching method
for the different L1 groups, with Chinese and Sutu learners benefiting more
from explicit teaching methods while the Spanish seem to gain more from
implicit teaching.
The lower right plot (D) shows a main effect of teaching method: when
we average across the L1 groups, there will be an overall higher score for
Figure 7.4 Example in which there is a main effect of teaching method and an
interaction between L1 background and teaching method
Table 7.7 R output for the factorial ANOVA:
Response: Score
Sum Sq Df F value Pr(>F)
(Intercept) 128400 1 1301.5357 < 2.2e-16 ***
L1 364 2 1.8465 0.1666484
Teaching 1234 1 12.5100 0.0007877 ***
L1:Teaching 2714 2 13.7574 1.208e-05 ***
Residuals 5919 60
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In Tables 7.7 and 7.8, score is modelled on the basis of both L1 and teaching
method and the asterisk (*) and/or colon (:) signify an interaction between
these two variables. The results reveal that there is no main effect of L1
(p = .167), but there is an effect of teaching method (p < .001). As can be
seen in Figure 7.4, the students receiving explicit instruction did better
overall. This is, however, not the case for the Sutu learners, who performed
slightly better in the implicit group. This differential effect of teaching
method for the different L1 groups is confirmed in the output by the pres-
ence of a significant interaction between L1 and teaching (p < .001). We
can thus conclude, based on the statistical output and the visualization of
the data in Figure 7.4, that the effect of teaching method is not the same
for the different L1 backgrounds: while both the Chinese and the Spanish
seem to perform a lot better when the instruction is explicit, the Sutu learn-
ers show a slightly higher score in the implicit teaching condition instead.
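A sketch of a call that yields output like Table 7.7, assuming the factors L1 and Teaching in the data frame dat; the Anova() function with Type III sums of squares comes from the car package:

library(car)
options(contrasts = c("contr.sum", "contr.poly"))  # sum-to-zero contrasts for Type III tests
Model2 <- aov(Score ~ L1 * Teaching, data = dat)   # '*' fits both main effects plus their interaction
Anova(Model2, type = 3)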
ACTIVITY 7.5
Regular test                 Assumption violated            Equivalent most commonly used
Independent Samples t-test   homogeneity of variance        Welch's t-test
                             independent observations       Paired Samples t-test
                             interval data / normal
                             distribution / independent
                             observations                   Wilcoxon signed-rank
2 The attentive reader will have noticed that Table 7.9 contains no row for the factorial
ANOVA. The simple reason is that there is no suitable (non-parametric) equivalent
for this test. Luckily, ANOVAs are generally used by researchers who carefully set up
their design. In these cases, ANOVA is known to be able to deal with non-normal data
pretty well.
In Chapter 7, you have learned about the ANOVA and the more compli-
cated versions of ANOVAs in studies in which you are using several inde-
pendent variables (factorial ANOVAs). We have also given an overview of
the various (non-parametric) alternatives for mean comparisons. We suggest
you now try Practical 6 before reading the final chapter of this book.
8 SUMMARY AND
CONCLUDING REMARKS
8.1 Introduction
In this book, we focused on the most basic and most common statistical
tests used within the field of Linguistics and Applied Linguistics. Based
on these chapters, you should be able to understand and judge the statis-
tics done in most studies conducted in the field. In this final chapter, we
will summarize the different statistical tests we have discussed, their most
important aspects, assumptions, and how to report the results of each par-
ticular test. Where appropriate, we will also briefly mention more advanced
statistical tests and provide references to explore such tests for future use.
Table 8.1 Overview of the tests discussed in this book
[Table 8.1 pairs the scale of the “Dependent” Variable with the scale of the “Independent” Variable to give the appropriate test; for example, a nominal dependent variable calls for a Chi-Square analysis]
the residuals of the model, these assumptions are generally only tested after
fitting the regression model.
The results of a regression are often displayed in a table with additional
information concerning the model and the variance it explained, as in Table
8.5. Consequently, the table is referred to in the report, as in the following
example:
be italicized (VandenBos, 2010). The careful reader of the report below will
notice that homogeneity of variance has also been implicitly dealt with in
the report by mentioning Welch’s t-test:
Also remember that research goes beyond being able to analyse data: an
important issue concerns the interpretation of your results in light of the
bigger picture. In the activities, when a result was statistically significant,
we have regularly asked whether you would consider the result meaningful
as well. In most cases where we illustrated research with very few cases, your
answer would correctly have been ‘no’. If we want to say something sensible
about the difference between two groups, it will be obvious that we need
more than about ten participants per group! The number of participants
needed to make an experiment meaningful depends on the β-error, the
effect size, and the power of an experiment, as we explained in Chapter 4.
There we saw that to demonstrate the existence of a small effect, the group
size should be at least about 780 participants. In Applied Linguistics this is
rather exceptional. We can only hope that the effects we are looking for are
large effects, so that we can make do with about 30 participants per group
to make it meaningful.
When the decision is made to conduct a certain study in such a way that
the data can be analysed according to a certain statistical tradition, then
there should be an awareness of the written and unwritten rules that this
tradition brings. It is important to remember that if there is a significant
effect, then this does not directly ‘prove’ your hypothesis; it only ‘supports’
it within the context of the study. As we explained in Section 4.8, we should
interpret the concept of significance with great care for several reasons.
Conversely, if there is no significant effect, then it should be clear that it is
not acceptable to say that there is a difference, but it is also rather prelimi-
nary to claim that this ‘proves’ that there is no difference at all. The absence
of a significant effect could be due to many factors, like limited power and/
or sample size, or a flaw in the study. The only time when we could even
possibly begin to think about claiming to ‘prove’ something is after repli-
cating the same outcome (once or maybe even more than once) and finding
a significant effect in each replication.
Nevertheless, even when all conditions of a statistical study have been
met, the validity is ensured, and the study is sufficiently reliable, the appli-
cation of statistics is no more than one way of evaluating research data.
Although it may be valuable, a statistical study is not the final answer to just
any research problem. There are many ways to analyse data, which makes
the choice of which statistics to use a subjective one. A researcher needs to
give good arguments for why he or she has decided to use a specific statis-
tical measure. And most importantly, careful interpretation of the research
findings is required, which should be reflected in our reports. We should
avoid the all-or-nothing interpretation of significance, always report the
descriptive statistics as well as effect sizes and sample sizes (and preferably
the power), and always be sufficiently tentative in our conclusion.
Finally, all the statistics discussed in this book are limited to the descrip-
tion of a synchronic situation, a measurement at one point in time. For
the investigation of development over time, more advanced techniques and
statistics are required that all have their own limitations and drawbacks.
One severe limitation of quantitative studies in general is that they are
strongly geared toward generalization of human behaviour. This becomes
obvious when we realize that the very basis of statistical argumentation is
created by the normal distribution. When the emphasis on generalization
becomes too strong, it may obscure the variation of individuals in a sample.
In CDST-based research it is precisely this variation that may reveal the true
nature of language development. We cannot do without statistical methods
if we want to make generalizations about human behaviour, but paramet-
ric statistics do not possess the magical power to provide a solution to all
research questions.
With this book we have tried to give a very basic introduction into the
world of statistics. We hope you now feel confident enough to continue
doing statistics by yourself.
We recommend you now perform Practical 7, which contains 8 problems and the corresponding datasets to answer these problems. You should be able to carry out all the steps necessary to conduct the correct analyses and answer these research questions in a report that follows the conventions.
The problems are also listed in Activity 8.1 below, so you can already answer
some of the questions needed to determine the correct statistical test to use
for each dataset.
ACTIVITY 8.1
PART 2-R
PRACTICALS IN R/RSTUDIO

GETTING READY TO START USING R AND RSTUDIO
Why use R?
R (R Core Team, 2018) is becoming more and more popular, and is used
for processing and analysing data as well as for creating graphics. A lot of
analyses that can be done in R can also be done in SPSS, but R is more
capable of working with larger datasets and performing more advanced sta-
tistical analyses. One of the most important benefits of R is that it is free
software that runs on Windows, Mac, and a variety of UNIX platforms
(e.g. Linux). R is free in that (1) you do not have to pay for it and (2) anyone
can download and modify the code. Moreover, R is actively maintained
and continuously improved and can be extended by add-on packages for
specific purposes that are continuously created and made freely available by
excellent programmers around the world.
When compared to a program such as SPSS or JASP, where the user
primarily has to learn where to find the menu options for the specific calcu-
lations and analyses he or she wants to perform, R has one major disadvantage in that users normally have to give R instructions in the form of code or syntax. This might initially cause frustration, especially for students who are unfamiliar with programming and programming languages. R therefore has a steep learning curve, but is invaluable for most (graduate) students in Linguistics, especially those who want to conduct quantitative research using larger datasets.
Figure R.2 Select Show All Panes in order to see the four windows (in case they
do not show automatically)
When using RStudio, there are four windows that you will use (also
see Figure R.1): in the Console, you can type commands and see output;
in the Editor window, you can write collections of commands and store
them as a file (script and/or output file) so you can use them later and neatly
present your work to others; in the Workspace window, you can see all
the active objects (environment tab), and the history tab shows you a list
of commands used so far; finally, the Plots and files window shows all the
files and folders in your default workspace (files tab) and any graphs you
make (plots tab). The plots and files window additionally contains tabs with
information about packages, a topic that will be discussed in more detail
in Practical 1, as well as tabs linking to online help, manuals and tutorials
(help tab), and local web content (viewer).
In the next section, Practical 1, we will use R as a simple calculator to get
familiar with the program.
R PRACTICAL 1: EXPLORING R AND RSTUDIO AND ENTERING VARIABLES (CHAPTER 2)
In this practical, you will become familiar with the statistical program
R and the RStudio interface. You will perform simple calculations, practise
defining variables, enter data in R, and open and save a dataset. You will
also learn to work with a package called ‘R Markdown’ (Allaire et al., 2018)
that allows you to turn your calculations, analyses, and interpretations into
an easy-to-read document. All this will prepare you for the statistical analy-
ses we will be carrying out in the following practicals.
For this practical, we assume that you know how to open RStudio. It
is thus important that you have read, carried out, and understood ‘Getting
ready to start using R and RStudio’ before starting this practical.
Open RStudio, read the instructions, and enter the codes below (typed
in grey in this font) to perform some simple calculations. The grey font
will be used for all code and text that is either typed in or provided by RStudio.
> 11+11
[1] 22
The answer is preceded by [1] which simply indicates that this is the first
(and, in this case also the only) element in your answer.
It is important to note that the commands can be entered into R by directly copying the examples from the book. The ‘>’ character in the window is a prompt produced by R and is not part of the command, so you should not copy that character!
Let us say you want to work with the outcome of this calculation, in this case the number 22. You might just retype 11 + 11, and that would definitely be doable for this single example, but you can also store the outcome of your calculation in a variable. In R, this is done using the assignment operator <- (or the equals sign =). In our example this would look as follows:
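> x <- 11+11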
When you ask R now to show x by simply typing x and pressing enter it will
show the outcome:
> x
[1] 22
Our object x is now stored in memory, linked to the answer of our calcula-
tion, and can be used as such in all sorts of analyses and calculations. Try
the following:
> 2*x
[1] 44
These variables can in turn also be stored in other variables, for example as follows (here, y is an arbitrary name for the new variable):
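> y <- 2*x
> y
[1] 44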
You will see that R keeps track of the variables you created and lists them as
well in the workspace view in the top-right window.
Using objects in this way may seem a bit redundant when you only have
one number and/or one object, but it is very useful when you are working
with large datasets. Programming basically means that you get the com-
puter to do the work for you, something we already did in the above exam-
ples, and this is even more important when you are working with large
datasets. A typical dataset, in which you, for example, need to look at the sum of each person’s scores, may contain anywhere between 20 and a million rows of data. It would be very cumbersome if we had to calculate the
sum of scores for each individual person and this is exactly where a program
like R can help.
Before looking at a real dataset, let us look at a sequence of elements of
the same type, often referred to as a vector in R.
We will create our first vector, a simple list of numbers, using the c com-
mand (where c comes from concatenate or combine). Type in the following
code (without the ‘>’ character):
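> List <- c(2, 4, 8, 10, 12)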
What R has done now is create an object called List containing 5 numbers. If you ask R to show the list, it will provide all numbers you entered
previously:
> List
[1] 2 4 8 10 12
Now, R does not do anything else unless you ask R to do something. We could, for example, create a new list in which we add another number (say 14) without having to type all the numbers in again, simply by combining the existing List with the new number. (Note that your vectors and their characteristics are again added in the workspace window.)
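> List <- c(List, 14)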
> str(List)
num [1:6] 2 4 8 10 12 14
In this case, R tells us that our object called List is numeric (num) and
thus consists of numbers, and that there is one dimension with numbers
containing a total of 6 numbers ([1:6]). If you simply want to know whether
an object is numeric or not, you can also formulate a question to R:
> is.numeric(List)
[1] TRUE
In the example above, R has confirmed that List is indeed numeric by
answering our question with TRUE. A vector can be of various types and
this answer constitutes another important type: the logical type (consisting
134
Exploring R and RStudio and Entering Variables (Chapter 2)
of the values TRUE and/or FALSE). We know our List is not made up of
the values true and false, but we can check this to be sure:
> is.logical(List)
[1] FALSE
If you need help on a particular function, type ?function without the parentheses. So, for example, type ?is.logical. Based on this command, R provides information on logical vectors in the ‘Plots and files’ window (the Help tab) in the bottom-right corner of RStudio (also see Figure R.1 in ‘Getting ready to start using R’).
In R, we can use [] brackets to ask for specific information from a data
frame or vector. Try the following:
> List[3]
[1] 8
By adding [3] you are asking R to provide you with the third element in the
vector, which in this case is number 8.
Note that the first element of a vector can be found using the number 1, as opposed to many other programming languages, in which the first element is element 0.
It can be useful to create scripts in R containing a series of commands
that can be executed in one go. This script is saved with a .R extension, but
it is basically a simple text file that contains the commands you could also
enter in the R Console. Any line or combination of lines from a script can be
executed by selecting the line(s) and pressing Ctrl+Enter/Return or pressing
the run button (see Figure R.3).
You can add code to an R script by typing it in the Editor window (the upper window in Figure R.3). A short example script could look like this (the exact calculations are arbitrary):
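# Calculations with several elements need brackets
(11+7)/3
# Store the outcome in a variable and show it
x <- (11+7)/3
x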
Figure R.3 Select the script and press Ctrl+Return or the run button to
execute the code
Select the entire script and press Ctrl+Return or the run button, as in Figure R.3. You will see in the Console window, which should resemble the one in Figure R.3, that R has executed all lines that contain code. When the cursor is on a specific line and you press Ctrl+Return, R will just run that specific line.
The script above shows you how to perform simple calculations in R (do note that you need brackets, as you would in any equation with several elements). It also shows that you can add text to explain your script by starting a line with the # character. RStudio automatically prints this additional information in a separate colour (i.e. green in the Editor window), making it easy for you to see what in your script constitutes code and what constitutes textual information. Adding explanations to your script is very useful for others who might want to copy your script, but also for yourself. You will often forget why you have done something with your data in that particular way, and we highly recommend adding such information to the scripts that you use for your analyses. If, for whatever reason, you later decide to change one specific part of the analyses (e.g. by deleting one item), you only have to add one line and a short explanation and run the entire script again to see your new and improved results.
We will soon open a bigger data frame to practise working with these
codes in more detail. First, we will install R Markdown (Allaire et al., 2018),
a package that you will be using to create easy-to-read HTML files contain-
ing all your answers to the R practicals.
> install.packages("rmarkdown")
> library("rmarkdown")
With the library command you should not get a ‘response’ or ‘output’ from
R because you are just asking R to open a library and load the code in the
background, not to give you any information back.
Now that you have loaded the code for the R Markdown package, you will be able to create a new R Markdown file by clicking on the icon in the top left and then selecting ‘R Markdown’ from the list (also see Figure R.4 below) or via ‘New File’ in the ‘File’ menu.
You will be presented with a dialog box as presented in Figure R.5. If
R mentions that this requires a newer version to be installed, click OK to
continue.
1 There are many different packages to help you perform functions, do certain tests,
and help you make better plots. Some of them will be discussed, but please see this
link for an overview of all the packages currently available: https://cran.r-project.org/
web/packages/available_packages_by_name.html
The dialog also visible in Figure R.5 provides you with the option to give the
file a title (note that this is not the file name) and you can also specify the author
of the document and the output type. Simply add your name under ‘Author’ and,
as mentioned, we suggest using HTML for the output (PDF and Word are the
other two available options). Do make sure that you save the RMD file, that
is, the Markdown file, in the folder where you will perform your analyses from,
otherwise you will have problems loading the data we will use later on. Click on
‘File’ and ‘Save as’ and save the file in a folder (e.g. R Practicals) where you will save
all data and scripts for these practicals and name the file ‘Prac1-yourinitials.rmd’.
Once we have created the file, we see the output in the Editor, as is also visible in Figure R.6.
Figure R.6 Default R Markdown script with the header (box A) and a piece
of code for the file to be made (marked by the square). Box B additionally
shows how to create a new R chunk containing code
The header tells us the title, author, and date of the file (see box A in
Figure R.6). Output tells us what form the output will be in, in this case
HTML. Next we see a code to set some global options for the document
and then, below box A in Figure R.6, a lot of text that can be deleted as we
will create our own. There are three things important to remember when
writing an R Markdown document:
1. # is used to indicate a header (## a smaller header and so on);
2. R code must be written between ```{r} and ``` as in the example below:
```{r}
R code
```
You do not necessarily have to type this: you can also click on ‘Insert’ to
insert such a new R chunk (also see box B in Figure R.6).
3. ‘Normal’ text that you want in your Markdown document can simply be written down as it is.
Remove the text in the RMD file in the Editor window that is below the
header and add ‘Practical 1: test’ as the header of the file (use ## Practical 1:
test) and add a simple calculation below the header in an R chunk, for example:
```{r}
20*10
```
Figure R.8 R Markdown script with a line of code asking for the current
working directory
If you run the chunk of code in Figure R.8, the directory you
are currently working in will show up in the Console window as
well as in your HTML output. Make sure this directory corre-
sponds to the one where you saved the data that we will be working
with.
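The line of code in Figure R.8 asks for the working directory with R’s built-in getwd() function:

> getwd()

To import data from an Excel file, you can use the ‘readxl’ package (with ‘FileName.xlsx’ below as a placeholder for your own file):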
> install.packages("readxl")
> library("readxl")
> data <- read_excel("FileName.xlsx")
Note that you only have to install a package once through the Console.
On the other hand, you will have to load a package every time you are
going to use it (e.g. in your Markdown file).
It is also possible to import SPSS files into R. In order to do so, you need the ‘memisc’ package (Elff, 2018), and to import a SAV file you need a line of code along the following lines (a sketch, with ‘FileName.sav’ as a placeholder for your own file):
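> library("memisc")
> data <- as.data.set(spss.system.file("FileName.sav"))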
Please note that there are almost always several different packages availa-
ble for doing what you want to do. Googling will usually help you make
a well-informed decision.
While Excel and SPSS files require packages, .csv (comma separated
values file) and .txt (tab-delimited text file) formats can be opened with-
out installing packages and we would recommend that you use data in
one of these formats. You can save a file as a .txt or .csv format from
Word, Excel, or most other text editors you may be using.
a. The data file we want to open is a CSV file called ‘Data-Practical1.csv’. We can do this by adding the following code in an R chunk (remember: without the “>”):
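> Practical1 <- read.csv("Data-Practical1.csv")

To check whether the file is actually in your working directory, you can list the files in that directory: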
> dir()
If it is not in the list, locate the file on your computer and copy-paste it
to the working directory.
b. Now that we have imported data into R, we need to check whether our data have been imported correctly. Using the head() function, we can see the first six rows of our dataset. Alternatively, tail() will show us the last part of the dataset.
> head(Practical1)
> tail(Practical1)
If we want to see more than the first six lines, for example the first 10 lines, we can add the number of lines we would like to see as follows:
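> head(Practical1, 10)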
c. For any variable, or file, we can ask R what type of data it is.
> class(Practical1)
[1] "data.frame"
> names(Practical1)
[1] "participant" "age" "gender" "profsc"
We now have a better feel for the data and what it looks like, but we
are still missing crucial information that is needed in order to fully
interpret and analyse this dataset.
Use str() to look at the structure of the data:
> str(Practical1)
The real difference between integer and numeric is that integer variables are whole numbers, while numeric variables can contain decimals. For this introductory course, it is not necessary to distinguish between numeric and integer values. Moreover, R makes sure your data type is stored correctly as either numeric or integer.
a. Now realize that not all variables in the data frame relate to actual numbers. We have used numbers to label different levels, as, for example, for the variable gender. We have to tell R that gender does not relate to numbers, but that it is actually a factor, a nominal variable with numeric labels for the two levels. R needs to know where to find this variable, and we use Practical1$gender to tell R that we want to use the variable gender from the dataset Practical1:
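> Practical1$gender <- as.factor(Practical1$gender)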
The code above tells R to take the variable gender from the data
frame ‘Practical1’ and regard it as a factor.
b. If you now use the str() code again, R will correctly report that
gender is a factor with 2 levels: ‘1’ and ‘2’. We currently do not know
what the numbers ‘1’ and ‘2’ refer to, but this is crucial for an analy-
sis comparing the genders and their proficiency scores.
> str(Practical1)
d. Run your code in the Markdown file and press ‘knit’ to create an updated HTML file.
As we will use this dataset again in R during the next practical, we will save our file in one possible R format (RDS) that will remember our changes, using code like the following (with ‘Practical1.rds’ as an example file name):
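> saveRDS(Practical1, "Practical1.rds")

If you would like some extra interactive practice with the basics covered so far, you can try the ‘swirl’ package, which offers tutorials from within R itself: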
> install.packages("swirl")
> library("swirl")
> swirl()
R PRACTICAL 2: DESCRIPTIVE STATISTICS (CHAPTER 3)
In this practical you will become familiar with some more functions of
R. You will use the data that you also used during Practical 1 to perform
some first analyses involving descriptive statistics. Please note that one of
the implicit assignments in Practical 1 was to change all variables to the
appropriate type (assignment C-3c). If you have not turned participant into
a factor, please make sure to correct this before continuing to work on that
dataset during this second practical.
Part A
1. CREATE A MARKDOWN FILE and OPEN THE DATA IN R
a. Start a new Markdown file, name it Prac2-yourinitials.rmd, and save it
in the folder where you want to work from (e.g. the ‘R Practicals’ folder
we used for Practical 1). Remember that you need to save your Mark-
down in the folder where you also save(d) the data you will be using.
b. Delete unnecessary text and add an informative heading for the sec-
tions and questions using #-signs (e.g. ‘## Part A’ followed by ‘### 1.
Open the data’, ‘#### 1a’ and so on), to return an output similar to
the HTML one on the right in the viewer of Figure R.9.
c. Open the RDS file we saved last time by using code like the following (adjust the file name if you chose a different one):
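> Practical1 <- readRDS("Practical1.rds")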
Figure R.9 R Markdown script (left) with the corresponding HTML output (right) after knitting the file. The script can be executed in various ways. You can select the code and press Ctrl+Return, you can click the run button, or you can click on the ‘run current chunk’ button next to your line of code. The output is obtained by pressing the knit button right above the script.
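The most important descriptives can be obtained with the following functions (the last line is a common trick to find the mode):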
> mean(File$VariableName)
> median(File$VariableName)
> range(File$VariableName)
> sd(File$VariableName)
> which.max(tabulate(File$VariableName))
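You can find the lowest and highest scores using: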
> min(File$VariableName)
> max(File$VariableName)
Or using:
> range(File$VariableName)
Your labels are now not really nice, but you can easily change that using xlab and ylab, as in the code below. Additionally, main adds a title above the graph. For example (a sketch, assuming you are labelling a histogram of the proficiency scores; adjust the names to your own plot):
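> hist(Practical1$profsc, xlab = "Proficiency score",
    ylab = "Frequency", main = "Proficiency scores")

To get descriptives per group, you can use a formula with the tilde operator, along these lines (a template; fill in your own file, score, and factor names):

> aggregate(Score ~ Factor, data = File, FUN = mean)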
Using the above formula, you will get means for the Score (profi-
ciency in this case) grouped by a Factor (gender in this case) as the
tilde (~) basically means ‘depends on’. In Chapter 4 of Part 1, we
discuss this dependency relationship in more depth. Try to fill in the
above formula and report on your results.
b. Which group has higher proficiency scores, the male or the female
participants?
c. Which group scored more homogeneously? You can use the formula
above to answer this question, but you need to change the value that
will be reported by the formula.
Part B
1. ENTERING DATA MANUALLY
a. Enter the following 4 datasets in R, and provide the mean, the mode,
the median, the range, and the standard deviation. As there is very
little data in these datasets, you can enter them manually using the c
command we also used in Practical 1 to create our vector called List.
You can simply name the variables ‘a’, ‘b’, ‘c’, and ‘d’.
a. 3, 4, 5, 6, 7, 8, 9
b. 6, 6, 6, 6, 6, 6, 6
c. 4, 4, 4, 6, 7, 7, 10
d. 1, 1, 1, 4, 9, 12, 14
An easier way to obtain the descriptives in one go is the describe() function from the ‘psych’ package. The library() line loads the function from the package, so the second line of code below should be added to your Markdown file:
> install.packages("psych")
> library("psych")
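After loading the package you can, for example, run:

> describe(a)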
Part C
We will use a larger sample of data containing information on the motiva-
tion to learn French and the score on a French Proficiency test. The data
can be downloaded either as a ready-to-use R data file (Practical2C.rds)
with correct variable types and structures or as the original CSV file
(Data-Practical2C.csv) in which variable types and labels still have to be
added and altered. Please use the CSV file in case you are up for a challenge.
In case you already had some difficulty, feel free to use the RDS file and
continue with step 1b.
b. Save the RDS file in the folder you are currently working in (if you
have completed step 1a) and open it using the code we used before.
Check the structure of the file (str()) and look at the data itself to get
a feel for the data.
2. DESCRIPTIVE STATISTICS
a. Find out the mean proficiency score, the median, and the standard deviation for the group of students as a whole and then for the different motivation groups. One way to create the subsets needed for this is the following (using the square-bracket indexing from Practical 1):
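> High <- Practical2C$Proficiency[Practical2C$Motivation == "high"]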
By using the above formula, you are asking R to create a subset called
‘High’ that consists of the variable Proficiency from the file Practi-
cal2C, but only the scores for the data points for which motivation
equals (==) ‘high’.
Not sure what the names of the levels were? Check them as follows:
> levels(Practical2C$Motivation)
Once you have calculated all the scores, it is important to also report
them in an informative way. In order to create a table in R Mark-
down, you should copy, and potentially expand, the text below and
then replace the x’s with the actual values you calculated. Note that
this should not be added in your Markdown file as code within an R
chunk, but just as plain text.
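A template could look like this (add or rename rows to match the motivation levels in your data):

| Group | Mean | Median | SD |
|-------|------|--------|----|
| All students | x | x | x |
| High motivation | x | x | x |
| Low motivation | x | x | x |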
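A histogram of the proficiency scores, plotted as probabilities rather than counts, can be created as follows (a sketch; fill in your own variable):

> hist(Practical2C$Proficiency, prob=TRUE)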
If you were to use the above code without prob=TRUE, you would get a histogram with the raw frequencies that together add up to the total number of occurrences in the dataset. By adding the code, we make sure to plot probabilities instead of counts. Probability, or density, plots are often more useful when looking at distributions, as they are unaffected by the number of bins or bars used.
b. Add a distribution curve to your histogram by adding the following
line directly beneath your histogram function:
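> lines(density(Practical2C$Proficiency))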
R PRACTICAL 3: CALCULATIONS USING R (CHAPTER 4)
In this practical you will review some descriptive statistics and you will
learn to get a first impression about the normality of a distribution. Finally,
you will do some first ‘real’ statistics.
Part A
In the file ‘Data-Practical3a.csv’, you will find the results of the English Pho-
netics I exam of the student cohort 2000 in the English Department in Gro-
ningen. The scores are specified per question (Q1, Q2, etc.), and we are going
to assume that these scores are measured on an interval scale. Since these are
the real results, we have replaced the names by numbers for discretion. The
questions below all refer to this file. Create one Markdown file containing all
your answers and in which you also show and explain informative tables and
graphs. We know that some would prefer to just copy-paste code and move
on to the next question, but please always also answer the questions based on
the output by reporting on it in the text you add in the Markdown.
4. USING Z-SCORES
a. Calculate and report on the z-scores of the TOTAL scores of the fol-
lowing students: 11, 33, 44, and 55. You can calculate z-scores in R by using scale(). So in our case it would be something like the following (assuming you stored the data in an object called Practical3A):
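> scale(Practical3A$TOTAL)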
This calculates z-scores for the whole column and will also print all
z-scores in your output. You can use this list and look for the correct
students, but you can also add the z-scores to your existing dataset in
a column called z-score, and subsequently create subsets as we did at
the beginning of this practical to find the 4 students. You can create
the new column as follows:
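# 'zscore' is used here because a hyphen in a column name would be read as a minus
> Practical3A$zscore <- scale(Practical3A$TOTAL)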
Apart from seeing the values for skewness and kurtosis you also see
values for Skew.2SE and Kurt.2SE. These are the values of skewness
and kurtosis divided by two times the standard error (SE) of the dis-
tribution (note that this is different from the SE of the mean). You
do not have to know exactly how this works, but for samples that are
quite small (up to, say, 30), we can assume that values of Skew.2SE and Kurt.2SE between -1 and 1 indicate an acceptably normal distribution (also see the overview table below).
Can we say that the data of the two teachers are normally distributed?
> shapiro.test(File$Variable)
Do note that you will need to run this test for both teacher groups!
With the above code, we are testing whether the distribution for
each group is different from the normal distribution. If the significance
value is (well) above .05, then you can assume that the data are nor-
mally distributed. In this case, do the data show a normal distribution?
| Check | Samples < 30 | Samples > 30 and < 200 | Samples > 200 |
|-------|--------------|------------------------|---------------|
| Histogram | Good to check, but will probably not look normally distributed | Good to check | Very important to check because it will give you the best information |
| Skewness and kurtosis | - | Between -1 and 1 | - |
| Skew.2SE and Kurt.2SE | Between -1 and 1 | Between -1.29 and 1.29 | - |
| Normality tests | Shapiro-Wilk | Shapiro-Wilk | - |
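Levene’s test for homogeneity of variance can be run with, for example, the leveneTest() function from the ‘car’ package (a sketch; fill in your own file and variable names):

> leveneTest(File$Variable, File$Group)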
The F value seen in the R output is the outcome of the Levene test (similar to the ANOVA’s F-value that will come back in later practicals), and the most important part for our purposes is expressed under Pr(>F). If this value, that is, the significance level, is smaller than .05, you can NOT assume equal variances. If it is bigger than .05, you can assume equal variances.
In this case, are your groups equal in variance?
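The independent samples t-test itself is run with code along these lines (a template; fill in your own variables):

> t.test(File$DV ~ File$IV, var.equal = FALSE)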
In the above formula, you will recognize the ∼-operator, signifying that there is a numeric (y) dependent variable or outcome whose result may depend on a binary (x) variable, factor, or predictor. Which variable is the factor or predictor variable, that is, the independent one? And which one is the dependent variable?
The final part (var.equal = TRUE/FALSE) of the code can be
added to specify equal variances. If Levene’s test was significant, and
equal variances cannot be assumed, the var.equal should be set to
FALSE. If, however, equal variances can be assumed, you can add
TRUE instead. If we leave it off completely then the default FALSE
is used, which is mostly considered to be the more conservative and
hence the better option anyway.
The output shows us some interesting numbers, but it also contains some information that is currently redundant. We see the p-value, which is the chance of incorrectly rejecting the null hypothesis (the chance of getting an alpha error!). If you added var.equal = FALSE, you will notice that the degrees of freedom (df) is a funny number, 127.08. Do not worry about this: it is due to the way it is calculated in R (using Welch’s test for unequal variances).
What is the chance of incorrectly rejecting the null hypothesis?
b. What is the conclusion you would draw with regard to the research
question in 2b? Would you reject the H0? What is the chance of
incorrectly rejecting the H0, and what does this mean? Is your con-
clusion about the H0 in line with what you would expect from the
descriptives?
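The same t-test can also be written in a second way (a sketch, with y1 and y2 as the two columns of scores):

> t.test(File$y1, File$y2)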
In this second version, the ∼ is replaced by a comma and the choice of one
or the other depends on the way in which your data is formatted. If your
data has one column for the IV distinguishing groups and one column
for the DV scores (the more common long format), then you should use
the first version with the ∼-operator. If, however, your data contains one
column with all scores for group 1 and another column containing all
scores for group 2 (also known as the wide format), such as y1 being
the scores for teacher A and y2 containing the scores for teacher B, you
should use a comma as in the second version instead.
We have now practised the most common version of the t-test, but
you will only have to change the code slightly to perform one of the
other versions. If your aim is to perform a paired samples t-test, the only thing you need to add to the code is the argument paired = TRUE:
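> t.test(File$y1, File$y2, paired = TRUE)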
For a one sample t-test, you would compare your variable to a theoretical
mean μ (mu), which by default is 0 (but can be changed to any value by
simply replacing the 0):
> t.test(File$VariableName, mu = 0)
The interpretation is almost identical for all three versions of the t-test.
Part B
The file ‘Data-Practical3b.csv’ contains the results of a vocabulary test
(interval scores) for participants from two different motivation levels. The
data result from an experiment in which motivation was a nominal inde-
pendent variable and vocabulary score an interval dependent. Using all the
tools and knowledge you have used so far, determine if there is a (significant)
effect of Motivation on the vocabulary scores and report on it. Please make
sure to turn the motivation variable into a factor first. Also: do not forget
to look at the descriptives of your data, to plot the data, and to include your
interpretation of the effect in the report.
R PRACTICAL 4: INDUCTIVE STATISTICS (CHAPTER 5)
In this practical we will take the next step in applying inductive statistics.
You will do a simple means analysis and a correlation analysis. You will
also learn how you should report the results of these statistical calculations.
This practical contains two more advanced assignments on correlation for
reliability.
Create a new Markdown file and add an appropriate heading for part A.
| Student | R | L |
|---------|-----|-----|
| 1 | 20 | 65 |
| 2 | 40 | 69 |
| 3 | 60 | 73 |
| 4 | 80 | 77 |
| 5 | 100 | 80 |
| 6 | 120 | 84 |
| 7 | 140 | 89 |
| 8 | 160 | 95 |
We will enter the above data in R manually using the long format, that is, each row is one student, following the instructions below. These are the steps you need to carry out:
• Create a variable called Student using the following code:
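> Student <- seq(1, 8)

The other two columns can be entered with c(), after which everything can be combined into one data frame (ReadListen is an arbitrary name):

> R <- c(20, 40, 60, 80, 100, 120, 140, 160)
> L <- c(65, 69, 73, 77, 80, 84, 89, 95)
> ReadListen <- data.frame(Student, R, L)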
5. At face value, do you think reading and listening, as plotted in the graph,
are related?
6. We want to know if we can conclude that reading skills and listening
comprehension are significantly related. To determine this, you will
have to calculate a Pearson r (or r xy). Before you do this, however, you
should realize that Pearson r is a parametric test that requires your data
to be normally distributed. As the sample is very small, a histogram will
probably not look normally distributed. For small samples (n < 30) it
is better to look at skewness, kurtosis, and the Shapiro-Wilk (also see
Practical 3A). Are the data approximately normally distributed?
7. If the assumption of normality has been met, please continue to perform
the Pearson r using the following code:
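> cor.test(ReadListen$R, ReadListen$L)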
Although the r-value itself already says something about the strength of the relationship, some people prefer to also calculate r-squared, which is simply obtained by squaring the r-value.
What is the value of rxy? Is this a strong correlation? What is the chance of incorrectly rejecting your H0? What do you decide? What is the effect size?
8. When reporting on results like these in scientific journals, there are
particular rules and regulations on what and how to report. Here, we
follow the most recent guidelines of the American Psychological Asso-
ciation, which states that we should report the value of the test statistic
(r in this case), the degrees of freedom, the exact p-value (unless it is less
than .001), and the size and direction of the effect (VandenBos, 2010).
Based on this, write a sentence that you could include in the Results
section of an article about the outcome of your test. It will be some-
thing like this (please choose an option or fill in the correct numbers
between the {accolades}):
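For example, along these lines:

A {significant/non-significant} {positive/negative} correlation was found between reading and listening scores, r({df}) = {value}, p = {value}.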
As you can see in the sentence, both the direction of the result AND
the significance or p-value are reported. Note that the r is only for the
Pearson r analysis. When you do a Spearman’s Rho, you preferably use
the Greek symbol ρ (or write out rho), and when you do a Kendall’s Tau,
you preferably use the Greek symbol τ (or write out tau). In Markdown,
such symbols can be created by typing: $\rho$ or $\tau$.
Apart from the r-value and the p-value, you will see that there is another value mentioned between brackets behind the r. This number refers to the degrees of freedom (df). In a correlation analysis, you can simply calculate the degrees of freedom by taking the number of participants minus 2. The lower and upper limits of the 95% confidence interval can be taken directly from the output.
Do not include the entire R output in your results section. The results from the table are usually reported in the text, as in the sentence above. It is very helpful, however, to add the scatterplot to your report and to refer to it in an additional sentence in which you also explicitly mention the direction of the effect. Conventionally, charts are included for significant results only.
Questions:
1. List the variables included in this study and, for each variable, say what
its function is (dependent, independent, etc.) and its type (nominal,
ordinal, interval).
2. How would you formulate H0 and Ha?
3. Which statistical test could be used?
Please note that there are two ways in which data can be entered for our
analysis. The first and usually most common option also for other datasets
is when the data are organized for each individual case. An example of this
type of data organization would be the format in Table R.4, which is generally
referred to as long format (such as the data used in Part A of this practical).
The test we will be doing is also very easy to replicate afterwards, because
it does not require the whole dataset; a contingency table as in Table R.3
with the total frequencies for each of the cells is enough.
4. The example code below shows exactly how to add the data in R using
the contingency table format. You start by creating a data frame, using
the cbind-function to combine rows and columns. Every c-element
in the list represents a column, while every number within the c-ele-
ment represents the value of a row for that specific column. Enter
the data into R by using the following lines of code based on cbind() to
put (or bind) numbers into columns:
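# The numbers below are placeholders only - use the cell frequencies from Table R.3
> Table <- cbind(c(10, 20), c(30, 40))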
If you simply type the name of your table, Table, to display it, you will
see that the columns and rows do not have names yet. We can change
that by adding the correct names in these codes:
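> rownames(Table) <- c("Row1", "Row2")   # replace with your real category names
> colnames(Table) <- c("Column1", "Column2")

The chi-square test itself can then be run with CrossTable(), a function from the ‘gmodels’ package; the expected and chisq arguments ask for the expected frequencies and the chi-square value:

> install.packages("gmodels")
> library("gmodels")
> CrossTable(Table, expected = TRUE, chisq = TRUE)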
This will provide the chi-square value (chisq) and the expected values.
The output contains a table with many versions for our cell frequencies,
and below you will find the actual chi-square value (you do not need the
one with Yates correction).
Although we are dealing with a non-parametric test, we do have to check
some assumptions before conducting the actual test. One assumption is that
every subject only contributes to one of the cells, which can normally be
checked by comparing the number of subjects to the total of all cells. In this
particular example, you can assume that this one has been met.
Secondly, as mentioned in Section 5.2.3, in a 2×2 table, none of the expected frequencies should be lower than 5. Do note that in a larger table, the expected counts must be at least 1, and no more than 20% of the cells may have an expected count below 5. If your expected cell frequencies are below 5, you should look at the outcome based on the Yates correction.
You should be able to find the expected values in the output pro-
vided by the above code. You will get quite a large output table, but note
that the very first part explains what we can find in each cell, which
includes the raw and expected counts. Has the assumption concerning
the expected frequencies been met?
6. The actual results of the chi-square test can be found in the very bottom
line of the output. Can you reject the null hypothesis?
1 You can also use R’s built-in chisq.test(), but we chose CrossTable() as it
provides more details on the expected counts and percentages as well.
7. What is the effect size? You can use assocstats() from the ‘vcd’ pack-
age (Meyer et al., 2017) and look for phi – φ (for variables with 2 levels)
or Cramer’s V (for variables with more than 2 levels). A value of .1 is
considered a small effect, .3 a medium effect, and .5 a large effect.
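For example (assuming your contingency table is stored as Table):

> install.packages("vcd")
> library("vcd")
> assocstats(Table)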
8. A template for reporting the results of a chi-square would be (please
choose an option or fill in the correct numbers between the {accolades}):
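For example, along these lines:

A chi-square test showed a {significant/non-significant} association between {variable 1} and {variable 2}, χ2({df}) = {value}, p = {value}, φ = {value}.

The table can be visualized with a barplot (a sketch):

> barplot(Table, beside = TRUE)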
You might also want to add a legend to explain the colours in your
graph:
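> legend("topright", legend = rownames(Table),
    fill = grey.colors(2))   # grey.colors(2) matches barplot's default shading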
If you look up barplot in the help menu, what other way is there to add
a legend to the barplot?
9. What happens if you remove beside=TRUE from the code (or change it
to beside=FALSE)?
Data:
Girls: 17, 16, 14, 19, 18, 17, 16, 15, 16, 15, 19
Boys: 16, 15, 13, 19, 15, 14, 13, 12
2 It is always good to plot a histogram of the data because it gives you a good impres-
sion of the spread of the scores. However, with samples that are smaller than, say, 30,
the histogram is not the best way to check normality. For this, we really need to look
at the values for Skew.2SE and Kurt.2SE, and the Shapiro-Wilk outcome (also
see Practical 3A).
r² = t² / (t² + df)
We can calculate this effect size from our t-test using the procedure below. Following Field et al. (2012, p. 385), we will store our t-test output as an object by using the following code (you can simply use your previously used t-test formula for this):
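> Test <- t.test(File$y1, File$y2)   # 'Test' is an arbitrary object name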
We then store both the value of the t-statistic, which is stored in our
output as a [statistic], and our degrees of freedom, which is stored as
a [parameter]. Use the following code to save those values as t and df:
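> t <- Test$statistic[[1]]
> df <- Test$parameter[[1]]
> r2 <- t^2 / (t^2 + df)
> r2

Alternatively, you can obtain Cohen’s d directly with the cohen.d() function from the ‘effsize’ package: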
> cohen.d(FileName$DependentVariable,
FileName$IndependentVariable)
Note that the ‘psych’ package also has a cohen.d code, so when you
load the ‘effsize’ package while ‘psych’ has already been loaded in the
library, you will get the message: The following object is masked
from 'package:psych': cohen.d. In this way, R gives priority to
the package that was loaded last if there are overlapping functions.
Use d or r2 to report on the effect size for this study. What would that mean for the number of participants you need to get enough power? See also Section 5.2.2 in Chapter 5 of Part 1.
12. Reporting on results of statistical studies has to be done according to
fixed conventions. It is important to include descriptives per group as
well as the important statistical values (t, df, p, r 2/d). Also note that
statistical notations should be reported in italic (VandenBos, 2010) and
that it is common practice to include a (reference to) a plot in your
report. Below is the format that you should use for t-test results. Please
choose an option or fill in the correct numbers between the {accolades}:
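For example, along these lines:

The {group 1} group (M = {value}, SD = {value}) scored {significantly/not significantly} {higher/lower} than the {group 2} group (M = {value}, SD = {value}), t({df}) = {value}, p = {value}, d = {value}.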
13. What can you say about the meaningfulness of this outcome?
14. Is there any additional information you would like to have about this
study?
> wilcox.test(DV~IV)
In case of a paired version and a violation of normality, you should use:
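> wilcox.test(File$y1, File$y2, paired = TRUE)

For the reliability assignment, Cronbach’s alpha can be obtained with the alpha() function: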
> psych::alpha(VariableName)
The psych:: part makes sure that R uses alpha from the ‘psych’ package. This is important because there is also an alpha function in the ‘ggplot2’ package (Wickham, 2016), which would take priority over the one from ‘psych’ if ‘ggplot2’ was loaded more recently. So, just to make sure, add the package name when a function is present in multiple packages.
3. The output we get is quite big so let us work our way through it. The
value of alpha at the top is Cronbach’s Alpha, which tells us the overall
reliability of the variable. As you can see there are two alphas, but you
should look at the raw alpha. Do you think this is a reliable test?
4. Now we will check the individual items. The column raw_alpha in
the table below the one we just discussed gives us the alpha statistic if we
were to delete each item in turn. We basically want to find those values
that are greater than the overall alpha value. Would removing any of the
items substantially improve the reliability of the test?
5. The next table in our output provides us with more information about
each item. The column labelled r.drop will tell us what the correlation
would be between that particular item and the scale total if that particu-
lar item was not part of the scale total. So basically, this is a correlation
such as the one we have seen in part B of this Practical, and you might be tempted to look at the r in the second or third column. This
regular r-value, however, is problematic because the item we are inter-
ested in is included in the scale total here. This means that, of course,
there will be some kind of correlation because an item will always corre-
late with itself. In short: we have to look at r.drop as opposed to r.
But what does this statistic tell us and what should we look out for? Sim-
ilar to ‘normal’ correlations, the higher the value of r.drop, the higher
the correlation with the other items. And that is what we want: for the
item to correlate with the overall score from the scale. So as a rule of
thumb, r.drop values below .3 are problematic and the item should be
removed (Field et al., 2012).
Which items should be removed because of problematic r.drop
correlations?
R PRACTICAL 5: REGRESSION/MISCELLANEOUS ASSIGNMENTS (CHAPTER 5/6)
This practical consists of three assignments, two containing tests that you
have not yet performed and one that should be at least somewhat familiar
to you.
Questions:
4 32
5 33
4 28
7 48
3 24
4 24
6 32
7 41
7 42
7 38
6 42
2 16
3 18
1 16
5 36
4 34
10. This part of the assignment is slightly more advanced. As you have read
in Part 1, to get the power of 0.8 (1-β), we need about 28 participants
to get a large effect. You now know how to find effect sizes, but you
can also calculate power in R, by using the ‘pwr’ package (Champely,
2018). For the sake of the calculation, you can assume that the outcome
of your statistical test was a parametric one, and you can fill in the
observed statistic as if it were the parametric one.
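A sketch of such a power calculation, assuming the outcome was treated as a correlation (the r filled in here is a placeholder; n is the number of participants):

> install.packages("pwr")
> library("pwr")
> pwr.r.test(n = 16, r = 0.5, sig.level = 0.05)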
1. Download the data file from the website, save it, and inspect it.
2. What are the (independent and dependent) variables and what kind of
measures (nominal, ordinal, interval) are used for the variables?
3. Formulate the relevant statistical hypotheses.
4. Plot the data in a scatterplot with the independent variable on the x-axis
and the dependent variable on the y-axis. What do you see in the plot?
5. Which statistical test could be used to predict score on the basis of age?
6. Apply the statistical test you chose using the following code:
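A linear regression model can be fitted with lm() (a sketch; ‘score’ and ‘age’ stand for the column names in your file):

> model <- lm(score ~ age, data = FileName)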
Remember that R only provides information when you ask for it. You
will thus have to ask for a summary() of the model to get the results.
The summary() code will provide the regression coefficients and corre-
sponding significance levels for the different coefficients. It also provides
the effect size, an F-value, and the degrees of freedom for the model you
built.
7. Can you reject H0? If you have a problem interpreting the results, the
explanation in Winter’s tutorial (2013) might help.
8. What is the effect size?
9. Remember that this is not the end of the story; you have to check the
assumptions! You should:
a. Check whether the relationship is linear by plotting the data (you
can assess this on the basis of the scatterplot you made before);
b. Assess whether the residuals all deviate in a similar way from the
model (‘homoscedasticity’), for which you can use the following
code to plot the fitted values on the x-axis and the residuals on the
y-axis:
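> plot(fitted(model), residuals(model))   # residuals against fitted values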
> ncvTest(model)
> qqnorm(residuals(model))
The above code plots the residuals of our model in a sorted order
against data from a normal distribution. In other words, two sets
of quantiles, the quantiles from our data and quantiles from a nor-
mal distribution, are plotted against one another. This is why this
plot is referred to as a Q-Q or Quantile-Quantile plot. The dots
should approximately follow a straight line if both of the plotted sets
of quantiles come from the same distribution. You can additionally
check normality of the residuals by using a Shapiro-Wilk test.
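For example:

> shapiro.test(residuals(model))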
10. Report on the results of this study in the way that it is conventionally done
in research papers. Mostly, you will see that researchers also report on the
F-value to say something about how well the model accounts for the varia-
tion in the dependent variable and whether the model built was significant
(also see Winter, 2013), but that is obviously not enough. According to
Field et al. (2012), it is best to report the (unstandardized) coefficients,
which are the estimated effects of the predictor variables, and all associ-
ated values in a table (including standardized betas). For now, we will use
a (slightly simplified) table with the intercept, the regression coefficients
(‘estimates’ in the R output), their standard errors, t-values, and p-values
in your report. Note that this is basically the output you already obtained
when asking for a model summary. You will often see (both in articles as
well as in R output) that significance is denoted by asterisks, with * for p < .05, ** for p < .01, and *** for p < .001. The table for your report could then be structured as follows:

| | b (estimate) | SE | t | p |
|-----------|--------------|----|----|----|
| Intercept | | | | |
| Factor | | | | |
Below you will find an example report that you can use:
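For example, along these lines:

A simple regression analysis showed that age {significantly/non-significantly} predicted the scores, b = {value}, SE = {value}, t = {value}, p = {value}, R2 = {value}.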
Plot the data in a scatterplot with different colours for the people in the
different groups using the following code from the ‘ggplot2’ package
(Wickham, 2016):
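> library("ggplot2")
> ggplot(FileName, aes(x = IV, y = DV, colour = Group)) +
    geom_point()

Then fit a regression model with the nominal predictor (a sketch; ‘type’ is the instruction variable with the levels ‘guided’ and ‘none’):

> model2 <- lm(DV ~ type, data = FileName)
> summary(model2)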
Take a look at the output and please note that the effect of a nominal/
categorical predictor in a regression model is always more difficult to
interpret than the effects of continuous variables. We have a variable
with two levels (‘guided’ and ‘none’), but only one level is shown in the
output (‘typenone’). This reveals that R used the guided writing group
as a reference against which the group that received no instruction is compared.
Can you reject H0?
8. We have not tested all potential effects yet. To add the interaction term,
we can use the following code (remember to use a different name for this
model, so you can compare the two):
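> model3 <- lm(DV ~ IV1 * IV2, data = FileName)
> summary(model3)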
The asterisk makes sure to test the interaction. For variables added in
an interaction, the main effects are also always calculated, so we do not
have to add those separately.
Can you reject H0?
9. As you know, interpreting interactions works best on the basis of a plot
comparing the effect of one independent variable within the levels of the
other independent variable. A useful package for this is the ‘visreg’ pack-
age (Breheny & Burchett, 2017). Install it, load it, and use the following
code to plot the interaction:
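> visreg(model3, xvar = "IV1", by = "IV2")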
The xvar part will add IV1 on the x-axis and the by=IV2 part will
create separate graphs for the levels of the variable that you enter there.
10. What is the effect size?
11. One additional and very useful way to find out whether the addition
of a variable, or in this case an interaction, makes your model better
is to use ANOVAs to compare models that differ only with respect to
one added variable (see for example Baayen, 2008, Chapter 6, p. 183
onwards). You could, for example, compare the model without the
interaction to the model with the interaction with an ANOVA using
the following code:
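> anova(model2, model3)   # the model without vs. the model with the interaction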
The result of the ANOVA shows the RSS, the residual sum of squares,
which refers to the amount of variance in the data that cannot be
explained by the model. The lower the RSS value, the better the model
is at explaining the variation in the dependent variable. What you really
want to know, however, is whether the addition of a variable or interac-
tion term significantly improves the model, which can be interpreted on
the basis of the p-value. Is the interaction term a significant improvement
or is it better to stick with the simpler model with only the main effects?
12. Remember that this is not the end of the story: we have to check the
assumptions! You should:
a. Check whether the relationship is linear by plotting the data;
b. Assess whether the residuals all deviate in a similar way from the
model (‘homoscedasticity’);
c. Assess whether the residuals are normally distributed by creating a
histogram or Q-Q plot of the residuals. If you have forgotten how
to do this, please check Practical 5B.
d. For multiple regression, it is additionally important to check
(multi)collinearity. As explained in Section 6.4, this is generally
best assessed using common sense. You could, however, also check
(multi)collinearity in different ways depending on the measurement
scales of your variables:
i. If your IVs are interval, create a scatterplot and perform a
Pearson r;
ii. In case your IVs are nominal, create a barplot and perform a chi-
square test;
iii. In case you have one nominal and one interval variable, check-
ing multicollinearity becomes a bit more problematic. We added
some solutions to this in ‘How To: Multiple Regression Analysis’
on the companion website.
We would not want your IVs to correlate and so, in all the above situa-
tions, we would not want to obtain significant test results.
In addition to the above solution, you may also use a test for this
using the code below from the ‘car’ package (Fox & Weisberg, 2011).
Do make sure to leave out an interaction, however, as an interaction
term always correlates highly with the separate factors as well.
> car::vif(model)
The general rule of thumb is that the VIF-scores should preferably not
exceed 5, but definitely not exceed 10.
13. Now report on the results of this study in the way that it is convention-
ally done in research papers (also see Part B).
Note that, while the unstandardized coefficients (i.e. the estimates) that we
obtained suffice for us, especially since we are including an interaction term,
some journals suggest that you include the standardized coefficients in the report
as well if the model only contains main effects. As mentioned in Section 6.3 of
Part 1, checking and reporting on these standardized beta coefficients is espe-
cially useful in case of a multiple regression, as these values allow us to assess the
relative importance of the different predictor variables. Fortunately, standard-
ized beta coefficients can easily be obtained using the lm.beta() function from
the ‘QuantPsyc’ package (Fletcher, 2012).
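For example:

> install.packages("QuantPsyc")
> library("QuantPsyc")
> lm.beta(model)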
R PRACTICAL 6: MORE ADVANCED GROUP COMPARISONS (CHAPTER 7)
In this practical you will carry out some special versions of the t-test and
the ANOVA.
| No instruction | Lectures | GW |
|----------------|----------|----|
| 34 | 65 | 68 |
| 58 | 54 | 87 |
| 56 | 43 | 94 |
| 47 | 57 | 69 |
| 35 | 65 | 81 |
| 31 | 49 | 75 |
| 55 | 74 | 94 |
| 65 | 79 | 78 |
| 61 | 54 | 63 |
| 27 | 65 | 78 |
Questions:
1. List the variables in the study – if relevant, say which variables are
dependent and which are independent.
2. What kind of measures (nominal, ordinal, interval) are used for the
variables?
3. In the case of independent variables, how many levels does each inde-
pendent variable have?
4. Add the data to a data frame in R. You can do this manually in R by
following the steps below (or you can save the data as a CSV file first
after entering it into Excel):
a. Use seq() to enter a sequence with numbers for all participants;
b. Remember that we advise adding one column for each variable,
which means that you cannot import the data in the same format as
182
More Advanced Group Comparisons (Chapter 7)
Use the c-command that you used before to enter Type and Score as
separate columns.
c. Put all variables together in one data frame as you did in Practical
4A, and do not forget to check whether the data was entered cor-
rectly with labels added to the factor type when necessary.
5. Formulate the statistical hypotheses.
6. Which statistical test could be used?
7. Create a boxplot for your data; what do you see?
8. Provide a table with the following descriptive statistics for each group:
mean, minimum, maximum, and standard deviation. In order to do
so, you need to tell R to look at the scores for each type of instruction.
You can create subsets, as you did in Practical 2, but you can also use
the by() function (remember that R needs to know where to find your
variables!):
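> by(FileName$Score, FileName$Type, describe)   # a sketch; use your own names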
The above code will provide you with all the values from the ‘describe’
function you also used before. If you prefer, you can also ask R to only
provide a value such as the mean or the SD.
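9. Now apply the one-way ANOVA, storing the model in an object first (a sketch; fill in your own names):

> Test <- aov(Score ~ Type, data = FileName)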
> summary(Test)
After you save the test, you can simply check a summary of the output
(summary(Test)) by entering the object name you chose, as in the sec-
ond line of code above. Can you reject the H0?
Now that we know whether the model is significant or not, we need
to carry out some post-hoc contrasts. After all, we do want to know
which of the groups differ significantly from which of the other groups.
When there is homogeneity of variance, and the sample sizes are the
same, Tukey’s post-hoc test can be done. To use this test, you can simply
type:
> TukeyHSD(Test)
The output table will help you to assess whether the groups differ signif-
icantly from each other. The column labelled diff provides the differ-
ence in means for the two groups.
What to do in case of violations of normality or homogeneity?
If you want to compare the scores of more than two groups and the
assumption of normality is violated, you should opt for a non-parametric
Kruskal-Wallis and the code for this test would be:
> kruskal.test(DV~IV)
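If, instead, the assumption of homogeneity of variance is violated, you can use Welch’s F test, which does not assume equal variances: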
> oneway.test(DV~IV)
10. What is the effect size? You can use the calculation in Chapter 7, but
you can also use the following code where ‘m1’ is the name of the object
containing the ANOVA output:
> summary.lm(m1)$r.squared
Do note that this is the r2 value. To get the r value, you need to take the
square root of this value. Note that in statistics papers, you might also see
other effect sizes, such as eta-squared (η²), omega-squared (ω²) or Cohen’s
f. Eta-squared is exactly the same as r-squared, so when you read eta-squared
you can just interpret it as you would interpret r2. Omega-squared is said
to be a bit more reliable, since eta-squared tends to be biased with small
samples. Cohen’s f, like Cohen’s d, is a bit more difficult to interpret, but
can be used to calculate power with the ‘pwr’ package that was used in Part
A, question 10, of Practical 5. To check the power of our ANOVA study,
however, we need the effect size Cohen’s f and we can obtain that using
the ‘sjstats’ package (Lüdecke, 2018). Once we have obtained that value,
we can do a power analysis using the ‘pwr’ package (Champely, 2018).
We have to fill in the number of groups (k), the number of participants
per group (n), the value of Cohen’s f (f), and the significance level (i.e. the
p-value). Do remember to install and load the appropriate packages. If you
are interested in performing this analysis, we recommend you have a look
at the ‘sjstats’ package (Lüdecke, 2018) and the information on his website
(Lüdecke, 2017): https://strengejacke.wordpress.com/2017/07/25/effect-siz
e-statistics-for-anova-tables-rstats/.
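A sketch of such a power analysis (with placeholder values; k is the number of groups, n the number of participants per group, and f the Cohen’s f value you obtained):

> library("pwr")
> pwr.anova.test(k = 3, n = 10, f = 0.4, sig.level = 0.05)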
11. What can you say about the meaningfulness of this outcome?
12. Report on the results of this study in the way that it is conventionally
done in research papers using the following format for main effects:
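For example, along these lines:

There was a {significant/non-significant} effect of {IV} on {DV}, F({df1}, {df2}) = {value}, p = {value}, η² = {value}.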
And the following format is an example of how you could report the
actual group differences according to the post-hoc comparisons:
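For example:

Tukey post-hoc comparisons showed that the {fill in group} group (M = {fill in mean}, SD = {fill in SD}) scored significantly {higher/lower} than the {fill in group} group (M = {fill in mean}, SD = {fill in SD}), p = {fill in exact p-value or < .001}.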
Note that the report always contains descriptives per group, the important statistical values (F, df, p, and η²), as well as an explicit
interpretation of the direction and size of the effect and a (reference to)
a boxplot or a descriptives plot.
In the above example, we are reporting SD instead of SE. Which one
to choose often also depends on the criteria of the journal in which you
will publish your work.
1. Download the data from the website and open and inspect the data.
2. What are the variables, what are their functions, and what kind of meas-
ures (nominal, ordinal, interval) are used for the variables?
3. Formulate the relevant statistical hypotheses (note that you need one
pair of hypotheses for every independent variable or combination of var-
iables (i.e. potential interactions)!).
4. Which statistical test could be used?
5. Now before performing the test, we would like to create a table with
descriptives and a plot to visualize the data. Remember that we are now
not only interested in main effects, but also in interaction effects, that is, combined effects of the two independent variables. You know how to
retrieve the means for the two levels of each independent variable, but
we would like to get the descriptives for every combination of the levels.
In order to achieve this, you have to slightly adapt the by() code we used
before to include a list() function that will split up our data into four
groups based on the combination of our two independent variables. Use
the following formula as a basis:
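A minimal sketch (assuming your data frame is called data, with the dependent variable in a column Score and the two independent variables in columns Subtitles and Proficiency; describe() again comes from the psych package):

> by(data$Score, list(data$Subtitles, data$Proficiency), describe)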
6. The easiest way to put all four groups into one boxplot would be to use:
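One possibility, using the same hypothetical names as above, would be:

> boxplot(Score ~ Subtitles * Proficiency, data = data)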
There is, however, another way to create a nice boxplot and for that
we need the package ‘ggplot2’ (Wickham, 2016), which was also
used to create most of the Figures in Part 1 of this book. So, we will
install the package in the Console and load it in our Markdown
file.
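For example:

> install.packages("ggplot2")   # once, in the Console
> library(ggplot2)              # at the top of your Markdown file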
Now fill in the correct names and use the following function, putting each variable where the explanation above indicates (you do not have to fully understand it, but do give it a try!):
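A sketch along these lines (the data frame and variable names are again the hypothetical ones used above):

> ggplot(data, aes(x = Subtitles, y = Score, fill = Proficiency)) + geom_boxplot()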
For a first try this does look nice! However, this plot is not yet exactly
what we want. First of all, let us alter the labels by adding the following
directly after the formula for the graph:
> + ggtitle("Title") + labs(x = "NameXAxis-IV1", y = "NameYAxis-DV", fill = "NameIV2")
Of course you can change colours, angles and much more – try exper-
imenting with this yourself! And do not forget to add an informative
caption!
7. Now it is almost time to apply the statistical test you chose, but do not
forget to check assumptions first. One of these assumptions is homoge-
neity of variance and, since we are interested in an interaction between
the language of the subtitles and the proficiency of the learners, we
would want to compare the variances of the four groups. You can do so by combining the leveneTest() function from the 'car' package (Fox & Weisberg, 2011) with the base R interaction() function, as in the following code:
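A minimal sketch (hypothetical names as before):

> library(car)
> leveneTest(data$Score ~ interaction(data$Subtitles, data$Proficiency))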
Note that contrasts are a little more complicated when you have varia-
bles with 3 levels. If you want or need to know more on why and how
to do this, please read Levshina (2015, pp. 185–186) and/or Field et al.
(2012, pp. 414–425).
We can now fit our ANOVA model using the code we used before and
then use Anova() from the ‘car’ package to specifically ask for the Type
III results:
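A sketch of these two steps (assuming the model is stored in an object called m2, and that the contrasts have been set as discussed above):

> m2 <- aov(Score ~ Subtitles * Proficiency, data = data)
> Anova(m2, type = "III")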
Can you reject the different H0s that you formulated before?
8. As mentioned in Chapter 7, most people would use omega-squared (ω2)
as an effect size measure for ANOVAs and we will do the same here.
Luckily for us, the ‘sjstats’ package (Lüdecke, 2018) will easily provide
the effect size for you, so install and load the package and use the fol-
lowing code:
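A minimal sketch (again assuming the model object is called m2):

> library(sjstats)
> omega_sq(m2, partial = TRUE)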
This code will provide us with the partial omega-squared values for all independent variables and the interaction separately. The argument partial=TRUE requests the partial version of ω², which is the one we need when we have multiple independent variables: it assesses the effect size of each effect while partialling out the other effects.
The interpretation of (partial) ω² is as follows (Kirk, 1996; in Field et al., 2012):
• ωp² = .01 = small
• ωp² = .06 = medium
• ωp² = .14 = large
What is the effect size for the two main effects and the interaction?
9. What can you say about the meaningfulness of this outcome?
10. Report on the results of this study in the way that it is convention-
ally done in research papers. You can base your report on the example
below.
R PRACTICAL 7: EXAM PRACTICE
In this practical you will practise for an exam. Below, you will find a list with 8 problems (the same as those in Activity 8.1). Choose at least 2 of the
following problems (you are welcome to do them all), and work these out
in detail.
Include the following points in your answers to each of the problems
below:
• List the variables in the study – if relevant, say which variables are
dependent and which are independent.
• For each of the variables determine its scale (nominal, ordinal,
interval).
• In the case of independent variables, how many levels does each inde-
pendent variable have?
• Identify the appropriate perspective: assessing relationships, comparing
means, predicting an outcome, or a combination of these; then choose
the most appropriate statistical test.
• Formulate the relevant research hypotheses (H0 and H1/H2).
• Report on the results of this study in the way that it is conventionally
done in research papers. Your report of the outcome must include:
° Descriptive statistics;
° Value of the test-statistic;
° Value of df;
° Significance (the exact p-value or < .001) and the 95% CI, if possible;
° Direction of the effect including, if applicable, descriptive statistics;
° Effect sizes;
° If applicable, also report on the assumptions, for example linearity, homogeneity of variance, and normality of the distribution;
° Do not forget to illustrate your answer with tables and figures.
° Reflect on the meaningfulness of the outcome.
a) A researcher wants to investigate if motivation affects the pronun-
ciation of English by Dutch learners. To investigate the possible
effect of motivation on pronunciation, she makes recordings of 24
Sports
YES NO
stress1 20 19
stress2 24 17
stress3 28 14
PART 2-JASP
PRACTICALS IN JASP
GETTING READY TO
START USING JASP
This section briefly explains why a student or researcher would want to use
JASP (JASP Team, 2018) as opposed to, for example, SPSS, and how to down-
load and open the program. The most important components are also briefly
discussed, and after this short introduction to JASP, you can start doing your
first calculations in the program by following the instructions in Practical 1.
As JASP is often being updated, some features might be slightly different from
what we present here. The version used for this book is JASP 0.10.2 (July 2019).
Note that whenever you choose one of the analyses from the menu in
JASP, this analysis will stay in your Output, unless you remove it. You can
remove it either by clicking on the cross on the top right of the Options
menu, or by hovering over the title of the output part, clicking on the little triangle arrow pointing down next to the title, and selecting Remove.
In the next section, Practical 1, we will use JASP to explore the data file
further.
Figure J.3 Drag one of the three dots to move the options screen
JASP PRACTICAL 1: EXPLORING JASP AND ENTERING VARIABLES (CHAPTER 2)
In this practical you will become familiar with the statistical program
JASP. You will practise defining variables, entering data via Excel, and
opening and saving a dataset. You will also learn how to make an easy-to-
read report in JASP. All this will prepare you for the statistical analyses you
will be carrying out in the following practicals.
For this practical, we assume that you know how to open JASP. It is thus
important that you have read, carried out, and understood ‘Getting ready
to start using JASP’ before starting this practical.
JASP PRACTICAL 2: DESCRIPTIVE STATISTICS (CHAPTER 3)
In this practical you will become familiar with some more functions of
JASP. You will use the data that you entered in the previous practical and
do some first analyses involving descriptive statistics. You will use the JASP
file to answer questions that will be asked later in this practical.
Part A
1. OPEN THE JASP FILE FROM PRACTICAL 1A
2. FIRST CALCULATIONS: DESCRIPTIVE STATISTICS
a. During this first step, we want to find the descriptives for the
variable age. You should still see the Descriptive Statistics Option
menu in your JASP file, which should look like the screenshot in
Figure J.5.
b. Click on the variable age and move it to the Variables box on the
right. You will see that the table in the right half of the screen sud-
denly gives you various statistics. In the options menu, to the right
of the title Descriptive Statistics, you can see three symbols, a pen,
an i symbol, and an x. The x is to close the analysis and delete it
from your file, the i is to get more information on the options in the
menu, and the pen is to change the title. Click on the pen to change
the title to ‘Descriptive Statistics on Age’. To the left of the title,
you will find a little arrow pointing down. Click on this to close the
options menu.
c. What are the minimum age, the maximum age, the mean age,
and the standard deviation? Report on your findings by adding a
note to the JASP output. Also give your table a clear and explana-
tory caption above the table: ‘Table 1. Descriptive statistics of the
variable Age.’
d. The default selected descriptives are the mean, the standard deviation,
and the minimum and the maximum values. If you want to add more,
click on the Output table, and you will see that the Descriptives menu
opens again. This is how you can always make changes to existing
‘analyses’ without having to redo everything. Now click on Statistics
in the Descriptives menu to see more central tendency and dispersion
measures to select from.
e. Which age occurs most often? Answer this question in your JASP
report.
f. What does the little superscript a mean? Hint: check the data file to
help you find the complete answer to the previous question (manual
counting).
g. Close the options menu again by clicking on the arrow to the left of
the title. Also make sure you regularly save your file by clicking Ctrl+s
(Cmd+s on a Mac OS).
3. MORE DESCRIPTIVES
a. By opening a new Descriptives menu, find out the mean, the median,
the mode, the range, and the standard deviation of the proficiency
score in your data. Do not forget to change the title of your analysis
to e.g. ‘Descriptive Statistics on Proficiency’. Report on your findings
in the JASP File. Please keep in mind that we use a dot (and not a
comma) to report decimals (e.g. 0.5) and that we do not always have
to report all decimals (use common sense!). JASP often uses 2 or 3
decimals by default. If you want to decide how many decimals JASP
reports, you can change the decimals by going to the JASP menu and
selecting Preferences > Results and checking the box that says Fix the
number of decimals. As a general guideline you can use the informa-
tion in Table J.1.
Part B
1. Open the data file that you made in Practical 1B.
2. Provide the mean, the mode, the median, the range, and the standard
deviation of each of the four datasets.
3. Report your findings in the JASP file by adding comments and adding
a caption (above the table).
4. Do you agree with JASP’s calculation of the mode for variable A?
Part C
We will use a large sample of data containing information on the motivation to learn French and the score on a French proficiency test. The data for this part can be found on the website. The file is called 'Data-Practical2C.csv'.
1. OPEN THE FILE IN JASP
Check the variable types and the data itself and make sure the labels
are added to the levels of the variable Motivation. Do remember that
we are now dealing with an ordinal variable. The labels for Motivation
are:
2. DESCRIPTIVE STATISTICS
a. Find out the mean proficiency score, the median, mode, and standard
deviation for the group of students as a whole and then for the differ-
ent motivation groups. As in the previous parts, report on the answers
in the JASP file.
b. Make a boxplot with the different motivation groups. Judging from
the boxplot, do you think the groups will differ from one another?
Report on your findings.
JASP PRACTICAL 3: CALCULATIONS USING JASP (CHAPTER 4)
In this practical you will review some descriptive statistics and you will
learn to get a first impression about the normality of a distribution. Finally,
you will do some first ‘real’ statistics.
Part A
4. USING Z-SCORES
a. JASP allows you to calculate and add a new variable, by clicking on
the plus-symbol that can be found next to the last column of your
dataset. We are going to calculate z-scores of the TOTAL score by
creating such a formula. Click on the plus symbol, name your new
variable ‘z-scores’, and drag and drop the following to the formula
window:
• TOTAL
• -
• mean
• TOTAL
• /
• σy
• TOTAL
It should now read: (TOTAL - mean(TOTAL))/σy TOTAL. After
you click on Compute column, a new column should appear in your
data file.
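For example, a (hypothetical) student with a TOTAL of 80, when the mean TOTAL is 70 and the standard deviation is 5, would get z = (80 − 70)/5 = 2.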
b. Report on the z-scores of the following students: 11, 33, 44, and 55.
5. INDUCTIVE STATISTICS
a. So far you have only done the descriptive statistics. What is your
first impression about the difference between the groups of the two
teachers?
b. We will now go through a first example of hypothesis testing. The ques-
tion we want to answer is whether there is a difference between the
total scores of the students of teacher A and the total scores of the
students of teacher B. We are going to find out if your impressions in
a. are correct. What is the null hypothesis belonging to the research
question?
c. The statistical test we will carry out to test the null hypothesis is the
t-test. We will discuss this test later on in more detail; this is just to
give you some first hands-on experience. The test we want to use is
the independent samples t-test. That is because the two samples we
took (from the two teachers) are not related (there are different stu-
dents in each group). Which variable is the grouping variable, that is, the independent variable? And which one is the dependent variable?
d. The independent samples t-test that we will be using is a parametric
test. We will thus first have to check for normality. In Practical 2, we
already saw that the closer to 0 the values of skewness and kurtosis
are, the closer they are to the normal distribution. Apart from this, it would also be nice to know when the values of skewness and kurtosis are close enough to the normal distribution. For this, we will have to make our own calculation.
Go to Descriptives > Descriptive Statistics and add the independent var-
iable in the Split box and the dependent variable in the Variables box.
Select the Skewness and Kurtosis boxes under Statistics > Distribution.
In the table you will not only get the values for skewness and kurtosis,
but also the Standard Errors of skewness and kurtosis. We are going to
divide the skewness and kurtosis values by their standard errors. You
do not have to know exactly how this works, but this is as if you are
calculating a z-score for the skewness and kurtosis values. For samples
that are quite small (say, up to 30), we can assume that outcomes of skewness/SEskewness and kurtosis/SEkurtosis between –1.96 and 1.96 are close enough to a normal distribution (also see Field et al., 2012, p. 175). For samples that are a bit larger (say, between 30 and 200), it is fine if they stay within the –2.58 to 2.58 range. We have
a sample of 130 students, so for this group it is fine if the values are
between –2.58 and 2.58. What can you say about skewness and kur-
tosis now? Can we say that the data of the two teachers are normally
distributed?
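For example, a (hypothetical) skewness of 0.80 with a standard error of 0.35 gives 0.80/0.35 ≈ 2.29: acceptable for our sample of 130 students (within ±2.58), but not for a small sample (outside ±1.96).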
e. The t-test compares the scores of the two groups, so that you can esti-
mate the difference between them, that is whether the null hypothesis
can be rejected or accepted. In JASP, we conduct a t-test by going to
T-Tests > Independent Samples T-Test. Add the dependent variable and
the independent variable to their respective boxes.
f. You do not need to change anything under Hypothesis, but what do
you think the different options here are meant for?
g. Under Additional Statistics, tick the option Descriptives and Descrip-
tives plots. What is your first impression about the difference between
the groups of the two teachers?
As you now know, you can also perform a Shapiro-Wilk for each group
and make sure that these are non-significant to ascertain that your data
are normally distributed. In general, it is best to use the Shapiro-Wilk
to check for a normally distributed dataset. However, when sample sizes are large, the Shapiro-Wilk test is often too strict, and it is very easy to get a significant value for this test. What is advised with samples larger than 200 is to have a good look at the histograms, for example (also see Table J.2).

Table J.2 How to check for normality with different sample sizes

Check | Samples < 30 | Samples > 30 and < 200 | Samples > 200
Histogram | Good to check, but will probably not look normally distributed | Good to check | Very important to check because it will give you the best information
Skewness and kurtosis | – | Between –1 and 1 | –
Skewness and kurtosis divided by their Standard Errors | Between –1.96 and 1.96 | Between –2.58 and 2.58 | –
Normality tests | Shapiro-Wilk | Shapiro-Wilk | –
It is always important to use various ways to check your data for
normality and it should be clear now that the correct way of testing nor-
mality highly depends on your sample size. Table J.2 will give a rough
guideline on how to check for normality with different sample sizes.
8. INDUCTIVE STATISTICS
a. Now that we have checked the assumptions, we can look at the out-
comes of the t-test. You will find this information in the first table
of this analysis (under Independent Samples T-test in your output).
The output shows us some interesting numbers, but it also contains
some information that is currently redundant. For now, we will focus on the significance value, or the p-value, of the t-test, which indicates the chance of incorrectly rejecting the null hypothesis (the chance of making an alpha error!).
b. What is the chance of incorrectly rejecting the null hypothesis con-
cerning the two teachers?
c. What is the conclusion you would draw with regard to the research
question in 2a? Would you reject the H0? What is the chance that
we go wrong in our decision to reject H0 (the α-error)? Is your con-
clusion about the H0 in line with what you would expect from the
descriptives?
We have now practised the most common version of the t-test, but you
will only have to click on one of the other options in the T-Test menu of
JASP to perform one of the other versions. If your aim is to perform a
paired samples t-test, the only difference when compared to the above
information is that you would have one group of participants who have
two scores, so the scores would be next to each other in two columns
instead of in one column.
For a one sample t-test, you would compare your variable to a theoret-
ical mean μ (mu), which by default is 0 (but can be changed to any value
by simply replacing the 0 under Test value in JASP).
The interpretation is almost identical for all three versions of the t-test.
After you have reported on everything, make sure you save your
JASP file.
Part B
The file ‘Data-Practical3b.csv’ contains the results of a vocabulary test
(interval scores) for participants from two different motivation levels. The
data result from an experiment in which motivation was a nominal inde-
pendent variable and vocabulary score an interval dependent. Using all the
tools and knowledge you have used so far, determine if there is a (significant)
effect of motivation on the vocabulary scores and report on it. Please make
sure to turn the Motivation variable into a factor first. Also: do not forget to
look at the descriptives of your data, to plot the data, and to include your
interpretation of the effect in the report.
JASP PRACTICAL 4: INDUCTIVE STATISTICS (CHAPTER 5)
In this practical you will take the next step in applying inductive statistics.
You will do a simple means analysis and a correlation analysis. You will
also learn how you should report the results of these statistical calculations.
This practical contains two more advanced assignments on correlation for
reliability.
Student R L
1 20 65
2 40 69
3 60 73
4 80 77
5 100 80
6 120 84
7 140 89
8 160 95
As you can see in the sentence, both the direction of the result AND
the significance or p-value are reported. Note that the r is only for the
are organized for each individual case. An example of this type of data
organization would be the format in Table J.4, which is generally referred
to as long format (such as the data used in Part A of this practical).
The second option is when you only have the total frequencies for
each of the cells in the contingency table, such as the one in Table J.3.
For this practical, we will use the raw data in the long format. These
can be found in the file ‘Prac4B_data.csv’. The data file consists of val-
ues of 1s and 2s. For social class, 1 is high and 2 is low social class. For
the reply, 1 is ‘haven’t got’ and 2 is ‘don’t have’. Open the file in JASP.
5. Go to Frequencies > Contingency Tables and move one variable to Rows
and the other to Columns. To check the expected values, select Expected
under Cells. Under Statistics, you want to deselect the χ² (chi-square) symbol, since we are not interested in that yet.
Although we are dealing with a non-parametric test, we do have to check
some assumptions before conducting the actual test. One assumption is
that every subject only contributes to one of the cells, which can normally
be checked by comparing the number of subjects to the total of all cells.
In this particular example, you can assume that this one has been met.
Secondly, as mentioned in Section 5.2.3, in a 2x2 table, none of the
expected frequencies in the table should be lower than 5. Do note that
in a larger table, the expected counts must be at least 1, and no more
than 20% of the cells are allowed to be less than 5. If your expected cell
frequencies are below 5, you should look at the outcome based on the χ2
continuity correction, which is also known as the Yates correction. With
samples smaller than about 30, it is advised to use the Likelihood ratio test.
You should be able to find the expected values in the output pro-
vided. Has this assumption been met?
6. Go to Frequencies > Contingency Tables again, and move the variables
again. At this point we want to make sure that the χ 2 symbol is selected
and that Expected is deselected. It is also useful to select the different
Percentages (Row, Column, and Total), as these will give us the relative
frequencies in each cell.
The actual results of the chi-square test can be found in the second table
of the output. Can you reject the null hypothesis?
7. What is the effect size? Click on Phi and Cramer’s V under Statistics. Phi
(φ) is used for variables with 2 levels and Cramer’s V for variables with
more than 2 levels. A value of .1 is considered a small effect, .3 a medium
effect, and .5 a large effect.
8. In other statistics programs it is sometimes possible to visualize the
results of a chi-square analysis in a barplot. In JASP this is not quite
possible (yet), but it is possible to make two histograms under Descrip-
tive Statistics, one for social class and one for reply. Try to make these
histograms. Which values in the contingency table do the bars in these
histograms correspond to?
9. A template for reporting the results of a chi-square would be (please choose an option or fill in the correct numbers between the {curly brackets}):
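For example:

A chi-square analysis showed that the relation between {fill in IV1} and {fill in IV2} was {significant/not significant}, χ²({fill in df}) = {fill in χ²-value}, p = {fill in exact p-value or < .001}. The effect was {small/medium/large}, {φ/Cramer's V} = {fill in value}.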
Answer the following questions in your JASP file, but carefully consider
the first four questions before you enter the data in a spreadsheet and open
them in JASP:
1. What are the dependent and independent variables and what kind of
measures (nominal, ordinal, or interval/scale) are used for the variables?
2. How many levels does the independent variable have?
3. Formulate your statistical hypotheses (H0 and Ha).
4. Which statistical test could be used? (Consult Table 8.1 in Chapter 8.)
5. Taking the previous questions in consideration, enter the data in Excel
using the format you used before. Tip: the two columns (Girls and Boys)
in the data are not necessarily the variable columns in JASP. Remember that
columns should represent variables, not levels of variables! Once you
have entered the data, do not forget to check whether all variables have
the correct scale.
6. Provide the following descriptive statistics for both groups: means, min-
imum, maximum, standard deviations.
7. What are your first impressions about the difference between the boys
and the girls?
8. Create a boxplot to visualize the results.
9. We will test the statistical significance of this experiment, but we first
have to check the assumptions:
a. Check the distribution of the data by looking at the histogram,1
and the skewness and kurtosis values, and by performing the Shap-
iro-Wilk as we did in the previous practical.
b. Also test homogeneity of variance using Levene’s test.
c. Now run the test (see Practical 3 if you have forgotten how to do
this).
d. Carefully study the first table in the JASP output. This contains the
values for t, the degrees of freedom (df ) and the level of significance,
that is, the p-value. The degrees of freedom are related to the sample
size: for each group, this is the sample size minus one. These two
values together form the value for degrees of freedom. What is the
value of t? Which degrees of freedom are applied to this test? What
is the level of significance? Can you reject H0?
1 It is always good to plot a histogram of the data because it gives you a good impres-
sion of the spread of the scores. However, with samples that are smaller than, say,
30, the histogram is not the best way to check normality. For this, we really need to
look at the values for skewness and kurtosis, and the Shapiro-Wilk outcome (also see
Practical 3A).
Do note that we can often only use the SD of the sample as we do not
know the standard deviation of the population (σ).
g. The descriptive plots option will generate a line graph with error bars
that represent the 95% Confidence Intervals.
10. We are using an independent samples t-test. Why do you have to use
this test rather than the one sample t-test or the paired samples t-test?
(Explain this in your JASP file.)
11. As we explained in Chapter 4, it is important not only to look at the
values of the test statistic and its corresponding p-values but also to look
at effect sizes. Finding a significant difference does not automatically
mean that this difference is meaningful or important. The effect size, as
you will have deduced by its name, measures the size or magnitude of
the effect (also see Chapter 5). Please note that there are various effect
sizes, but we will use r 2 as it is used relatively often and is also relatively
easy to understand. The formula for the effect size r 2, which was also
discussed in more detail in Chapter 5 (Section 5.3.3), is:

r² = t² / (t² + df)
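For instance, with (hypothetical) values t = 2.5 and df = 38, this gives r² = 6.25 / (6.25 + 38) ≈ .14.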
JASP PRACTICAL 5: REGRESSION/MISCELLANEOUS ASSIGNMENTS (CHAPTER 5/6)
This practical consists of two assignments, one containing a test that you
have not yet performed and one that should be at least somewhat familiar
to you.
4. Plot the data in a scatterplot with the independent variable on the x-axis
and the dependent variable on the y-axis. What do you see in the plot?
5. Which statistical test could be used to predict score on the basis of age?
6. Apply the statistical test you chose by going to Regression > Linear Regres-
sion, and adding the variables to their respective boxes. The most impor-
tant part of the output is to be found in the Coefficients table, which will
provide estimates of the intercept and slope AND the accompanying
standard error values. You will also find t-values and the corresponding
p-values. The Model Summary gives the effect size. The F-value and the
degrees of freedom for the model you built can be found in the ANOVA
table. This table actually gives the same results as if you were to analyse
the same data with a so-called ANCOVA (Analysis of Covariance).
7. Can you reject H0? If you have a problem interpreting the results, the
explanation in Winter’s tutorial (2013) might help.
8. What is the effect size?
9. Remember that this is not the end of the story; you have to check the
assumptions! You should:
a. Check whether the relationship is linear by plotting the data (you
can assess this on the basis of the scatterplot you made before);
b. Assess whether the residuals all deviate in a similar way from the
model (‘homoscedasticity’) by going to Plots and selecting the Resid-
uals vs. predicted option.
As discussed in Chapter 6 of Part 1, residuals are the differences between
the observed values (the actual data) and the fitted values (as predicted
by the model). A residuals plot is a plot in which the original scatterplot
is slightly tilted and flipped and the model line would be the horizontal
line at 0 (suggesting no deviation from the model). Homoscedasticity
refers to the residuals being equally spread along the entire regression line. We prefer not to see any odd patterns in this residual plot, as we want the differences between the observed and fitted values to remain roughly constant along the line. In other words, if the plot does NOT show a particular pattern, this means that all residuals vary more or less equally.
c. Assess whether the residuals are normally distributed by creating a
histogram or a Q-Q plot of the residuals, two options below the
residual plots option. For the histogram, it is best to tick Standard-
ized residuals, as this will probably give you a better picture. The
Q-Q plot will also plot the standardized residuals. In such a Q-Q
plot, two sets of quantiles, the quantiles from our data and quantiles
from a normal distribution, are plotted against one another. This is
why this plot is referred to as a Q-Q or Quantile-Quantile plot. The
dots should approximately follow a straight line if both of the plotted
sets of quantiles come from the same distribution. If the histogram
Below you will find an example report and table that you can use:
JASP PRACTICAL 6: MORE ADVANCED GROUP COMPARISONS (CHAPTER 7)
In this practical you will carry out some special versions of the t-test and
the ANOVA.
no instruction lectures GW
34 65 68
58 54 87
56 43 94
47 57 69
35 65 81
31 49 75
55 74 94
65 79 78
61 54 63
27 65 78
Questions:
1. Put the data in Excel, save it as a CSV file, and open it in JASP. Remem-
ber that we advise you to add one column for each variable, which means
that we cannot import the data in the same format as it is presented
above. We want to compare the scores people received and then conclude
which type of instruction worked best. Therefore, your data should look
something like this:
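For instance, with one (hypothetically named) column Instruction for the type of instruction and one column Score for the scores, only the first rows of each group shown:

Instruction      Score
no instruction   34
no instruction   58
...              ...
lectures         65
...              ...
GW               68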
2. List the variables in the study – if relevant, say which variables are
dependent and which are independent.
3. What kind of measures (nominal, ordinal, interval) are used for the
variables?
4. In the case of independent variables, how many levels does each independent variable have?
5. Formulate the statistical hypotheses.
6. Which statistical test could be used?
7. Create a boxplot for your data. What do you see?
8. Provide the following descriptive statistics for each group: mean, mini-
mum, maximum, standard deviation.
9. Check the normality with skewness and kurtosis divided by their
standard errors. A Shapiro-Wilk test of Normality is not really pos-
sible to carry out in JASP with nominal variables with more than
two levels. An ANOVA can handle slight deviations from normality
quite well, especially if the design is balanced, so we can assume
here that we can continue with the ANOVA. The ANOVA does also
have an option to check the Q-Q plot of the residuals. When the
residuals in this plot are more or less on a straight line, normality
can be assumed. If there is no normal distribution, you can always
decide to run a non-parametric version of the ANOVA, which is the
Kruskal-Wallis.
10. Next, we are going to assess the statistical significance of this experi-
ment. Go to ANOVA > ANOVA, and add the independent and depend-
ent variables to the specific boxes on the right. Do not forget to check for
And the following format is an example of how you could report the
actual group differences according to the post hoc comparisons:
Note that the report always contains descriptives per group, the important statistical values (F, df, p, and η²), as well as an explicit interpre-
tation of the direction and size of the effect. Additionally, we always
advise including a (reference to) a plot.
In the above example, we are reporting SD instead of SE. Which
one to choose often also depends on the criteria of the journal in which
you will publish your work.
in the Descriptive Statistics menu, but we would like to get the means and the
standard deviations for every combination of the levels.
a. To make a boxplot with all four groups, you need to make a new
column with a combination of the two independent variables. The
other option would be to filter one independent variable (e.g. Pro-
ficiency), using the filter option that is visible to the left of the
Proficiency column. You then enter the other independent variable
(e.g. Subtitles) in the Split box. This option should work but does
not let you save the results in JASP version 0.10.2. In that case,
you would have to create a column in Excel with a combination of
the two independent variables (e.g. LowL1, LowL2, and so on).
b. You can also go to ANOVA and click Descriptive statistics under
Additional Options. You can also make a plot for the interaction
under Descriptives Plots, by choosing, for example, Proficiency on the
horizontal axis and separate lines for the Subtitles.
6. Now it is almost time to apply the statistical test you chose, but do
not forget to check assumptions first! When checking assumptions for a
factorial ANOVA, remember that you want to compare all the combina-
tions of groups. The distribution, for example, should be approximately
normal in each group or combination of groups. The newly created col-
umn should be of help here.
7. Before actually performing the statistical test, we have to add a small
explanation on the different F tests available that we did not address in
Chapter 7: Type I, II, III, and IV. These different types of tests have to
do with the order in which variables are being added to the model, and
whether this order is important or not. For our balanced design with
only variables with no more than two levels, the difference is not crucial,
but it might be good to be aware of the differences. Although there is no
consensus on when to use which, we can give you the following, overly
simplified, rules to work with:
• Type I: Is sequential and the order in which the variables are added
can affect the results. Because of this, this type is often not used in
cases where you have multiple main effects and interactions.
• Type II: Evaluates main effects while taking into account other main
effects, but not interactions. Therefore, this type is only used to assess
main effects.
• Type III: Is used when the effect of a variable or interaction needs to
be evaluated by taking all other effects in the model into account,
including interactions. The order is not important in the Type III
test.
• Type IV: Is the same as Type III, but can be used when there is miss-
ing data.
There was a significant main effect of {fill in IV1} on {fill in DV}, F({fill in df1}, {fill in df2}) = {fill in F-value}, p = {fill in exact p-value or < .001}. This effect was {small/medium/large}, {ω²/ηp²} = {fill in value of omega-squared or partial eta-squared}.
JASP PRACTICAL 7: EXAM PRACTICE
In this practical you will practise for an exam. Below, you will find a list with 7 problems (the same as those in Activity 8.1). Choose at least 2 of the
following problems (you are welcome to do them all), and work these out
in detail.
Include the following points in your answers to each of the problems
below:
• List the variables in the study – if relevant, say which variables are
dependent and which are independent.
• For each of the variables, determine its scale (nominal, ordinal, interval).
• In the case of independent variables, how many levels does each inde-
pendent variable have?
• Identify the appropriate perspective: assessing relationships, comparing
means, predicting an outcome, or a combination of these; then choose
the most appropriate statistical test.
• Formulate the relevant research hypotheses (H0 and H1/H2).
• Report on the results of this study in the way that it is conventionally
done in research papers. Your report of the outcome must include:
° Descriptive statistics;
° Value of the test-statistic;
° Value of df;
° Significance (the exact p-value or < .001) and the 95% CI if possible;
° Direction of the effect including, if applicable, descriptive statistics;
° Effect sizes;
° If applicable, also report on the assumptions, for example linearity, homogeneity of variance, and normality of the distribution;
° Do not forget to illustrate your answer with tables and figures.
° Reflect on the meaningfulness of the outcome.
a) A researcher wants to investigate if motivation affects the pronuncia-
tion of English by Dutch learners. To investigate the possible effect of
Sports
YES NO
stress1 20 19
stress2 24 17
stress3 28 14
REFERENCES
Grosjean, P. and F. Ibanez (2018) pastecs: Package for Analysis of Space-Time Ecologi-
cal Series (version 1.3.21)[Computer software]. Available at https://CRAN.R-project.
org/package=pastecs
Hansen, L., E. S. Kim and Y. Taura (2010) ‘L2 vocabulary loss and relearning: The dif-
ference a decade makes’. Paper presented at the AAAL Annual Conference, Atlanta,
6 March 2010.
Hendriks, B. C. (2002) More on Dutch English ... please?: A study of request performance by
Dutch native speakers, English native speakers and Dutch learners of English (Nijmegen:
Nijmegen University Press).
Ioannidis, J. P. A. (2005) ‘Why most published research findings are false’, PLoS Medicine,
2(8): e124. Retrieved from http://www.plosmedicine.org/article/info:doi/10.1371/
journal.pmed.0020124.
JASP Team (2018) JASP (Version 0.9)[Computer software]. Available at https://
jasp-stats.org/
Kirk, R. (1996) ‘Practical significance: A concept whose time has come’, Educational and
Psychological Measurement, 56: 746–59. 10.1177/0013164496056005002.
Klein, W. (1989) ‘Introspection into what? Review of C. Faerch and G. Kasper (eds.)
in “Introspection in second language research 1987”’, Contemporary Psychology:
A Journal of Reviews, 34: 1119–20.
Klein, W. and C. Perdue (1992) Utterance structure: Developing grammars again (Amster-
dam: John Benjamins).
Kross, S., N. Carchedi, B. Bauer and G. Grdina (2017) swirl: Learn R, in R (version
2.4.3) [Computer software]. Available at https://CRAN.R-project.org/package=swirl
Levshina, N. (2015) How to do linguistics with R: Data exploration and statistical analysis
(Amsterdam: John Benjamins).
Lowie, W. and B. Seton (2013) Essential statistics for applied linguistics (Basingstoke:
Palgrave Macmillan).
Lüdecke, D. (25 July 2017) Effect Size Statistics for Anova Tables #rstats. (Last accessed 25 July.) Retrieved from: https://strengejacke.wordpress.com/2017/07/25/effect-size-statistics-for-anova-tables-rstats/
Lüdecke, D. (2018) sjstats: Statistical Functions for Regression Models (version 0.17.1)
[Computer software]. Available at http://doi.org/10.5281/zenodo.1284472
Mackey, A. and S. M. Gass (2005) Second language research: Methodology and design
(New York/London: Routledge).
Mackey, A. and S. M. Gass (2016) Second language research: Methodology and design (2nd edition) (London/New York: Routledge).
Meyer, D., A. Zeileis and K. Hornik (2017) vcd: Visualizing Categorical Data. (version
1.4-4) [Computer software].
Mohamed, A. (2018) ‘Exposure frequency in L2 reading: An eye-movement perspec-
tive of incidental vocabulary learning’, Studies in Second Language Acquisition, 40(2):
269–93.
Open Science Collaboration (2015) ‘Estimating the reproducibility of psychological
science’, Science, 349. 10.1126/science.aac4716.
R Core Team (2018) R: A language and environment for statistical computing (version 3.5.1) [Computer software]. Available at https://www.R-project.org/
R Markdown Cheat Sheet (August 2014) Retrieved from https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
INDEX
B
Bar graph see Barplot
Barplot 71, 167, 219
Bayesian statistics 60, 197
Beta error (β) 47–50
Between-subjects 108–109
Boxplot 29, 32, 150, 187, 205–206, 233

C
Case studies 7–8, 11–12
Categorical variable see Variable
Causal modelling 66
Central tendency 30, 151, 204
CDST see Complex Dynamic Systems Theory
Chi-square (χ²) 69–73, 113, 115, 166–167, 218–219
CI see Confidence Interval
Cohen's d 78, 170, 222
Cohen's f 185
Cohort effect 14–15
Collinearity see Multicollinearity
Complex Dynamic Systems Theory 7, 11–13, 18, 35–36, 122

D
Degrees of freedom (df) 55
Dependent samples t-test see t-test
Dependent variable (DV) see Variable
Descriptive research see Research
Descriptive statistics 27, 147–148, 198, 203–204
Dispersion 31–33, 151
Dunn test 108

E
Ecological validity see Validity
Editor window 130–131
Effect size 57–59, 67–69, 78, 109–110
Equality of variances see Homogeneity of variance
Error (ε) 83
Error bars 105, 221
Eta-squared (η²) 109–110, 185, 231
Expected frequency (FE) 72, 166, 218
Experimental research see Research

F
Factor analysis 118–119
Factorial ANOVA see ANOVA
Factors 24, 44
Falsification (principle of) 6, 48
Frequency
  analysis see Chi-square
  distribution 37–39
  polygon 39–40
  tally 37
Frequentist statistics 60, 197
F-value 88, 98

G
G*Power 225
Gaussian 39, 122
Generalization 6, 13, 36, 55, 124
Generalized additive mixed models (GAMMs) 121–122
Goodness of fit 87–88
Grouping variable see Variable

H
Histogram 37–38, 153, 157–158, 207
Homogeneity of variance 56, 76–77
Homoscedasticity (of variance) 56, 92–94, 175–176, 226
Hypothesis 4–6, 48–49, 59, 61
  alternative (H0/H1/H2) 48–49, 52, 61
  null (H0) 48–49, 52, 59, 61
  testing 4, 6, 156

I
Independent samples t-test see t-test
Independent variable (IV) see Variable
Inductive statistics 27, 43, 162, 215
Inferential statistics 63
In-situ research see Research
Integer variable see Variable
Interval variable see Variable
Interaction 102–106, 177–178, 186, 232
Intercept (b0) 82–83, 93
Interquartile range (IQR) 31–32, 45
Introspection 16

K
Kendall's Tau (τ) 67, 79, 111, 113–114, 163–164, 216–217
Kruskal-Wallis H test 107–108, 184, 230
Kurtosis 41–42, 153, 156–158, 207, 210, 212–213

L
Laboratory research see Research
Least squares approach 86
Level of significance see Significance
Levels 23, 96
Levene's test 76–77, 159, 211
Line graph 37–38, 221
Linearity 56, 92, 122
Logistic regression see Regression
Long format 160, 162, 165, 218
Longitudinal research see Research
Lower quartile 31–32

M
Main effect 102–104
Mann-Whitney U-test 77, 110, 159, 171, 212, 221
Mean 29–30
Meaningful(ness) 58, 62, 123
Means analyses 27, 95–96, 116
Median 30–32, 147, 204
Min-Max graph 36
Mode 30–31, 147–148, 204
Monotonic 57
Multicollinearity 92–93, 119, 179
Multiple regression see Regression

N
Naturalistic research see Research
NHST see Null Hypothesis Significance Testing
Nominal variable see Variable
Non-Constant Error Variance test 176
Non-parametric statistics 43, 55–57
Normal distribution 38–41, 153, 156–158, 176, 207, 210–213
Null hypothesis (H0) see Hypothesis
Null Hypothesis Significance Testing (NHST) 59
Numeric variable see Variable

O
Observed value (FO) 72
Observer's paradox 62
Omega-squared (ω²) 110, 185, 189, 231, 234
One sample t-test see t-test
One-tailed testing 52
One-way ANOVA see ANOVA
Operationalization 21–25, 43, 61
Ordinal variable see Variable
PACKAGES USED IN PART 2-R
FUNCTIONS USED IN PART 2-R