Gibson & Fedorenko 2010

The document discusses the need for quantitative methods in linguistics research to address weaknesses in commonly used methodology. It argues that linguistics research often relies on the judgments of single researchers rather than quantitative data from multiple naive participants. This risks influencing results through cognitive biases. The document calls for applying quantitative standards from cognitive science using corpus analysis and controlled experiments with large, naive participant samples.

Uploaded by

Emily Fedele
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
81 views2 pages

Gibson & Fedorenko 2010

The document discusses the need for quantitative methods in linguistics research to address weaknesses in commonly used methodology. It argues that linguistics research often relies on the judgments of single researchers rather than quantitative data from multiple naive participants. This risks influencing results through cognitive biases. The document calls for applying quantitative standards from cognitive science using corpus analysis and controlled experiments with large, naive participant samples.

Uploaded by

Emily Fedele
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Update

Letters

Weak quantitative standards in linguistics research


Edward Gibson¹ and Evelina Fedorenko²
¹ Department of Brain and Cognitive Sciences, Department of Linguistics and Philosophy, Massachusetts Institute of Technology, 46-3035, MIT, Cambridge, MA 02139, USA
² Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, 46-4141C, MIT, Cambridge, MA 02139, USA

A serious methodological weakness affecting much research in syntax and semantics within the field of linguistics is that the data presented as evidence are often not quantitative in nature. In particular, the prevalent method in these fields involves evaluating a single sentence/meaning pair, typically an acceptability judgment performed by just the author of the paper, possibly supplemented by an informal poll of colleagues. Although acceptability judgments are a good dependent measure of linguistic complexity (results from acceptability-judgment experiments are highly systematic across speakers and correlate with other dependent measures, but see Ref. [1]), using the researcher's own judgment on a single item or pair of items as the data source does not support effective testing of scientific hypotheses, for two critical reasons. First, as several researchers have noted [2–4], a difference observed between two sentences could arise from lexical properties of the materials rather than syntactic or semantic properties [5,6]. Multiple instances of the relevant construction are needed to evaluate whether an observed effect generalizes across different sets of lexical items [7]. The focus of this letter, however, is on a second problem with standard linguistic methodology: because of cognitive biases on the part of the researcher, the judgments of the researcher and his/her colleagues cannot be trusted (Box 1) [8,9]. As a consequence of these problems, multiple items and multiple naïve experimental participants should be evaluated in testing research questions in syntax/semantics, which therefore requires the use of quantitative analysis methods.

The lack of validity of the standard linguistic methodology has led to many cases in the literature where questionable judgments have led to incorrect generalizations and unsound theorizing, especially in examples involving multiple clauses, where the judgments can be more subtle and possibly more susceptible to cognitive biases. As one example, a well-cited observation in the syntax literature is that an object–subject–verb question asking about three elements is more natural than one asking about only two (e.g. What did who buy where? is claimed to sound better than What did who buy?). Several theories explain and build on this purported phenomenon [10,11]. However, it turns out that the empirical claim is not supported by quantitative measurements [12,13]. There are many other such examples of questionable judgments leading to unsound theorizing {[2–4]; including an example from the first author's PhD thesis (Gibson, E., 1991, PhD Thesis, Carnegie Mellon University), which is discussed along with other examples elsewhere (Gibson, E. and Fedorenko, E., The Need for Quantitative Methods in Syntax, unpublished)}. Without quantitative data from naïve participants, cognitive biases affect all researchers.

Furthermore, even if such cases were rare, the fact that this methodology is not valid has the unwelcome consequence that researchers with higher methodological standards will often ignore the current theories from the field of linguistics. This has the undesired effect that researchers in closely related fields are unaware of interesting hypotheses in syntax and semantics research.

To address this methodological weakness, future syntax/semantics research should apply quantitative standards from cognitive science, whenever feasible. Of course, gathering quantitative data does not guarantee the absence of confounding influences or the correct interpretation of data. However, adopting existing standards for data gathering and analysis will minimize reliance on data that are spurious, driven by cognitive biases on the part of the researchers.

Corpus-based methods provide one way to quantitatively test hypotheses about syntactic and semantic tendencies in language production. A second approach involves controlled experiments. Experimental evaluations of syntactic and semantic hypotheses should be conducted with participants who are naïve to the hypotheses and with samples large enough to make use of inferential statistics. A variety of experimental materials should be used to rule out effects due to irrelevant properties of the experimental items (e.g. particular lexical items). It is our hope that strengthening methodological standards in the fields of syntax and semantics will bring these fields closer to related fields, such as cognitive science, cognitive neuroscience and computational linguistics.

Box 1. Cognitive biases and linguistic judgments.

There are at least three types of unconscious cognitive biases [8,9] that can adversely affect the results of intuitive judgments, given the way that they are currently typically gathered in the syntax/semantics literature:
1. Confirmation bias on the part of the researcher: researchers will often have a bias favoring the success of the predicted result, with the consequence that they will tend to treat data that do not support the hypothesis as flawed in some way (e.g. from a not quite native speaker, or from a speaker of a different dialect).
2. Confirmation bias on the part of the participants: individuals that the researcher asks to provide a judgment on a linguistic example – including the researcher him/herself – might be biased because they understand the hypotheses. When faced with complex materials, they could then use these hypotheses to arrive at the judgment.
3. Observer–expectancy effects (the "clever Hans" effect): individuals that the researcher asks to provide a judgment could be biased because they subconsciously want to please the researcher and are consequently affected by the researcher's subtle positive/negative reactions.

Corresponding author: Gibson, E. ([email protected]).

References
1 Edelman, S. and Christiansen, M. (2003) How seriously should we take Minimalist syntax? Trends Cogn. Sci. 7, 60–61
2 Schütze, C. (1996) The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology, University of Chicago Press
3 Cowart, W. (1997) Experimental Syntax: Applying Objective Methods to Sentence Judgments, Sage Publications
4 Wasow, T. and Arnold, J. (2005) Intuitions in linguistic argumentation. Lingua 115, 1481–1496
5 MacDonald, M.C. et al. (1994) The lexical nature of syntactic ambiguity resolution. Psychol. Rev. 101, 676–703
6 Gibson, E. and Pearlmutter, N. (1998) Constraints on sentence comprehension. Trends Cogn. Sci. 2, 262–268
7 Clark, H.H. (1973) The language-as-fixed-effect fallacy: a critique of language statistics in psychological research. J. Verbal Learn. Verbal Behav. 12, 335–359
8 Evans, J.S. et al. (1983) On the conflict between logic and belief in syllogistic reasoning. Mem. Cogn. 11, 295–306
9 Nickerson, R.S. (1998) Confirmation bias: a ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 2, 175–220
10 Kayne, R. (1983) Connectedness. Linguist. Inq. 14, 223–249
11 Pesetsky, D. (2000) Phrasal Movement and its Kin, MIT Press
12 Clifton, C., Jr et al. (2006) Amnestying superiority violations: processing multiple questions. Linguist. Inq. 37, 51–68
13 Fedorenko, E. and Gibson, E. Adding a third wh-phrase does not increase the acceptability of object-initial multiple-wh-questions. Syntax (in press), doi:10.1111/j.1467-9612.2010.00138.x

1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.03.005 Trends in Cognitive Sciences 14 (2010) 233–234
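The quantitative workflow the letter advocates (multiple items, multiple naïve participants, inferential statistics) can be sketched in a few lines. The sketch below is illustrative only: the participants, ratings, and item counts are invented, and the by-participant paired t-test shown is just one of several analyses a real acceptability study would run.

```python
import math
from statistics import mean, stdev

# Hypothetical acceptability ratings (1-7 scale) from 8 naive participants,
# each rating 4 items per condition. Conditions follow the letter's example:
#   "two_wh":   What did who buy?
#   "three_wh": What did who buy where?
ratings = {
    "P1": {"two_wh": [2, 3, 2, 3], "three_wh": [2, 2, 3, 2]},
    "P2": {"two_wh": [3, 3, 4, 3], "three_wh": [3, 2, 3, 3]},
    "P3": {"two_wh": [2, 2, 3, 2], "three_wh": [2, 3, 2, 2]},
    "P4": {"two_wh": [4, 3, 3, 4], "three_wh": [3, 3, 4, 3]},
    "P5": {"two_wh": [2, 3, 3, 2], "three_wh": [3, 2, 2, 3]},
    "P6": {"two_wh": [3, 4, 3, 3], "three_wh": [3, 3, 3, 2]},
    "P7": {"two_wh": [2, 2, 2, 3], "three_wh": [2, 2, 3, 2]},
    "P8": {"two_wh": [3, 3, 4, 4], "three_wh": [3, 4, 3, 3]},
}

# Average over items within each participant first: this guards against a
# single lexical choice driving the effect (the letter's first objection).
diffs = [mean(p["three_wh"]) - mean(p["two_wh"]) for p in ratings.values()]

# Paired t statistic over the per-participant condition differences.
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
print(f"mean difference = {mean(diffs):.3f}, t({n - 1}) = {t:.3f}")
```

A real study would additionally test whether the effect generalizes across items (a by-item analysis, cf. the language-as-fixed-effect fallacy [7]) and would use far more participants and materials.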

Letters: Response

Quantitative methods alone are not enough: Response to Gibson and Fedorenko

Peter W. Culicover¹ and Ray Jackendoff²
¹ Department of Linguistics, The Ohio State University, 222 Oxley Hall, 1712 Neil Ave., Columbus, OH 43210-1298, USA
² Center for Cognitive Studies, Tufts University, Medford, MA 02155, USA

Gibson and Fedorenko [1] (see also [2,3]) correctly point out that subjective judgments of grammaticality are vulnerable to investigator bias, and that – where feasible – other types of data should be sought that shed light on a linguistic analysis. Major theoretical points often rest on assertions of delicate judgments that prove not to be uniform among speakers or that are biased by the writer's theoretical predispositions or overexposure to too many examples.

Another problem with grammaticality judgments is that linguists frequently do not construct enough control examples to sort out the factors involved in ambiguity or ungrammaticality. But this problem cannot be ameliorated by quantitative methods: experimental and corpus research can also suffer from lack of appropriate controls (see Box 1).

Nevertheless, theoreticians' subjective judgments are essential in formulating linguistic theories. It would cripple linguistic investigation if it were required that all judgments of ambiguity and grammaticality be subject to statistically rigorous experiments on naive subjects, especially when investigating languages whose speakers are hard to access. And corpus and experimental data are not inherently superior to subjective judgments.

In fact, subjective judgments are often sufficient for theory development. The great psychologist William James offered few experimental results [4]. Well-known visual demonstrations such as the Necker cube, the duck-rabbit, the Kanizsa triangle, Escher's anomalous drawings, and Julesz's random-dot stereograms are quick and dirty experiments that produce robust intuitions [5]. These phenomena do not occur in nature, so corpus searches shed no light on them.

Box 1. The need for proper controls in Gibson and Fedorenko's experiment

Fedorenko and Gibson's argument turns on the claim that superiority violations with two wh-phrases are supposedly worse than with three. Their experiment [9] disputes this judgment. The relevant sentence types are illustrated in (i).

(i) Peter was trying to remember . . .
a. who carried what.
b. who carried what when.
c. what who carried.
d. what who carried when.

They find that, in contrast to longstanding judgments in the literature, (ic) is not worse than (id); the two are judged to have about equal (un)acceptability.

They do not control by replacing the third wh-phrase with a full phrase as in (ii).

(ii) Peter was trying to remember . . .
a. who carried what last week.
b. what who carried last week.

We find (iia) as good as (ia,b), but (iib) worse than (ic,d). If so, some violations with two wh-phrases are worse than counterparts with three. The difference calls for a reexamination of the examples in the literature, controlling for this factor. Ratings studies might be helpful in establishing the reliability of these judgments. We doubt relevant examples will be found in corpora of natural speech and writing. And we also doubt that Bolinger's original observation in [10] resulted from investigator bias.

Corresponding author: Jackendoff, R. ([email protected]).
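The control Culicover and Jackendoff call for amounts to crossing two factors when constructing materials: the order of the first two wh-phrases, and the type of the third element (absent, wh-phrase, or full phrase). A minimal sketch of such a factorial item template, using the sentence frames from the box's examples (the factor names are our own labels):

```python
from itertools import product

# Factor 1: order of the first two wh-phrases -- the superiority contrast
# between (ia/ib) and (ic/id).
orders = {"subject_first": "who carried what", "object_first": "what who carried"}

# Factor 2: the third element -- absent, a wh-phrase ("when"), or the full
# phrase control the response proposes ("last week").
third = {"none": "", "wh": " when", "full_phrase": " last week"}

frame = "Peter was trying to remember {body}{tail}."

# Crossing the two factors yields the full controlled item set:
# conditions (i a-d) plus the missing controls (ii a-b).
items = {
    (o, t): frame.format(body=body, tail=tail)
    for (o, body), (t, tail) in product(orders.items(), third.items())
}

for condition, sentence in sorted(items.items()):
    print(condition, "->", sentence)
```

Generating materials programmatically makes it easy to multiply each frame across lexical variants, which also addresses the multiple-items concern raised in the original letter.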
