Goldberg, AE (to appear) in Constructions and Frames

A Chat about constructionist approaches and LLMs


Adele E Goldberg

The constructionist framework is more relevant than ever, due to efforts by a broad
range of researchers across the globe, a steady increase in the use of corpus and
experimental methods among linguists, consistent findings from laboratory
phonology and sociolinguistics, and striking advances in transformer-based large
language models. These advances promise exciting developments and a great deal
more clarity over the next decade. The constructionist approach rests on two
interrelated but distinguishable tenets: a recognition that constructions pair form
with function at varying levels of specificity and abstraction and the recognition that
our knowledge and use of language are dynamic and usage-based.

1. Introduction
I use the term constructionist approach to emphasize two claims (Goldberg 2006).1 First,
language comprises a dynamic network of CONSTRUCTIONS, at varying levels of
complexity and abstraction, which pair each form with a conventional range of functions.
Equally important, languages are learned or CONSTRUCTED on the basis of the linguistic input
witnessed, together with general cognitive, pragmatic and processing factors. These and several
other basic tenets of the constructionist approach are stated below:

1) All levels of description are understood to involve form-function pairings, including
filled and partially filled words (aka morphemes); filled and partially filled idioms; and
partially lexically filled and fully abstract schematic grammatical patterns.
2) Constructions are understood to be learned on the basis of the input and general
cognitive mechanisms and are expected to vary cross-linguistically.
3) Cross-linguistic generalizations are explained by the functions of the constructions and
general cognitive constraints.
4) An emphasis is placed on subtle aspects of the way we construe events and states of
affairs.
5) A “what you see is what you get” approach to syntactic form is adopted.
6) Language-specific generalizations across constructions are captured via dynamic
networks.
7) The totality of our knowledge of language is captured by a dynamic network of
constructions: a constructicon (or ConstructionNet).

1
I do not attempt to organize the full range of research that falls under the heading
“Construction Grammar” or “constructionist.” But see eloquent and thoughtful descriptions of
the landscape available elsewhere (Gonzálvez-García and Butler 2006; Hoffman & Trousdale,
2013; Ungerer and Hartmann 2023).


1.1. Constructions all the way down


While some constructionists distinguish words from complex constructions, or fully specified
collocations and idioms from partially schematic constructions, there is value to using a single
term, constructions (or signs), because of the profound parallels between all types of linguistic
units (see also Diessel, Dabrowska, and Divjak 2019; Fillmore, Kay, and O’Connor 1988; Goldberg
1995). All can be lexically filled, partially filled, or fully abstract; all pair form with function; and
all can be compositional to variable degrees.
Each construction of a language serves a function, or more typically, a range of related
functions. That is, both words and grammatical constructions tend to convey a conventional range
of polysemous and occasionally homonymous meanings (e.g., Lakoff 1987; Goldberg 1995). There
should be little controversy about this point since there would be no reason to access and articulate
a construction that had no impact on the comprehender. When constructionists occasionally argue
for functionless constructions, evidence typically rests on the existence of a form that is associated
with a range of arguably unrelated functions. For instance, Jackendoff (2002) suggests the English
verb-particle construction is a syntactic pattern that serves no generalizable function. Yet he does
not argue that any particular verb-particle combination is functionless. The observation is instead
that forms tend to be reused for multiple purposes (Croft 2001: 132-34; Culicover and Jackendoff
2005:42; MacDonald 2013). This observation holds of traditional lexical items as well as
grammatical constructions, likely due to efficiency considerations (MacDonald 2013; Piantadosi,
Tily, and Gibson 2012). However, just as few would suggest that a word is meaningless just
because it can be used to convey multiple functions, neither is there reason to posit meaningless
grammatical patterns. In this way, the recognition of parallels between traditional lexical items
and grammatical constructions (recall tenet [1]) reveals the question of whether constructions
exist that have no functions to be a non-issue. Example constructions at varying levels of complexity
and abstraction are offered in Table 1.

Table 1. English constructions at varying levels of complexity and abstraction. Lexically-filled
aspects are in italics.

CONSTRUCTIONS: ENGLISH EXAMPLES (can be lexically filled, partially filled, fully abstract)

Words: predate, going, saunter, afraid, walk, walked

Words with open slots (MORPHEMES): pre-N, V-ing, V-ed

Constructions that convey states, events, or relationships between multiple states or events:
o Verb + clausal complement: believe [clause]; know [clause]
o Double object construction: gimme Obj2; tell Obj1rec Obj2
o Gossip construction: It’s nice of you to be here; It was stupid of me.
o the Xer, the Yer construction: The bigger they come, the harder they fall

Constructions that structure discourse (lexically specified instances in italics):
o Information questions: What does that mean?
o Polarity questions: Does it matter? Is that a thing?
o Relative clauses: things you can do
o Passives: Mistakes were made

Idioms, collocations: happily ever after; raise the roof; hazard a guess

Idioms/collocations with open slots: hazard <a guess>; I hope this <message> finds you well

1.2. Definition
My understanding of constructions has evolved as I’ve gained a better appreciation of human
memory, learning, and the brain. Rather than abstract constructions being reified entities that
exist independently of their instantiations, the description in (8) is more accurate:

(8) A construction is an emergent cluster of lossy (imperfect) memory traces that are aligned
within our high-dimensional conceptual space on the basis of shared form, function, and
contextual dimensions (Goldberg 2019:7).
The definition of construction in (8) is based on evidence of the usage-based nature of our
knowledge of language, very briefly reviewed in the following section.
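
Definition (8) has a geometric flavor that a toy sketch can make concrete. The Python fragment below is purely illustrative; every vector, dimension, and threshold in it is invented, and it is not a claim about actual neural representations. It simply treats each lossy memory trace as a point in a shared space, so that a construction falls out as the cluster of traces that lie close together on shared form, function, and contextual dimensions.

# Toy illustration of definition (8), not a cognitive model: each "memory
# trace" is a noisy vector over form/function/context dimensions; traces
# that lie close together form an emergent cluster, i.e., a construction.

import numpy as np

rng = np.random.default_rng(0)
center = np.array([1.0, 0.0, 2.0, 0.5])                # shared form/function profile
traces = center + rng.normal(scale=0.1, size=(50, 4))  # lossy (noisy) encodings
outlier = np.array([[-2.0, 3.0, 0.0, -1.0]])           # a trace of something unrelated
memory = np.vstack([traces, outlier])

def cluster_around(trace, memory, radius=1.0):
    """Return the traces within `radius` of a given trace: its emergent cluster."""
    dists = np.linalg.norm(memory - trace, axis=1)
    return memory[dists < radius]

print(len(cluster_around(traces[0], memory)))  # 50: the aligned traces, excluding the outlier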

2. The usage-based nature of language


Recognizing that our knowledge and use of language is incredibly rich and complex, the usage-
based constructionist perspective aims to understand how language users are able to learn,
represent, combine and employ constructions appropriately in ever-changing contexts and while
interacting with a combination of familiar and new individuals. The approach emphasizes that
language is used to communicate, along with the importance of the linguistic input, including the
role of distributions, token frequencies, and formulaic language (Beckner et al. 2009; Arnon and
Christiansen 2017; Diessel, Dabrowska, and Divjak 2019; Herbst 2011; Hunston and Francis 2000;
Langacker 1988; Wray 2013).
The usage-based approach thus demands we recognize our vast associative memory and the
importance of social and communicative factors that give rise to the distributional patterns evident
within each community of language users (see Croft, this volume; van Trijp, this volume). Not
every constructionist emphasizes the usage-based nature of language. Indeed, this aspect was only
in my own peripheral vision early on (e.g., Goldberg 1995:135). A far deeper appreciation of
statistical information and discourse factors came into focus by the time I wrote Constructions at
Work (Goldberg 2006), as I interacted with and read more work by colleagues such as Ron
Langacker, Joan Bybee, Liz Bates, Wallace Chafe, Jeff Elman, Knud Lambrecht, Mike Tomasello,
the other authors and editors of this volume and others.
But the usage-based nature of language is tacitly endorsed by nearly every psychologist and
machine learning expert, as well as those of us who explicitly describe our perspective as usage-
based (Abbot-Smith and Tomasello 2010; Ambridge and Lieven 2011; Arnon and Snider 2010;
Boas 2008; Diessel and Hilpert 2016; Dunn 2019; Kidd, Lieven, and Tomasello 2010; Hilpert 2015;
Ibbotson 2022). The frequencies of constructions and the frequencies of their subparts
simultaneously influence language processing and language change (Baayen and Prado Martin
2005; Bybee, 2010; Gries 2010; Goldberg and Lee 2021; Gries and Hilpert 2010; Traugott &
Trousdale, 2013). And relationships among constructions and the forms of constructions are
shaped by users’ goals and conversational demands over diachronic time (e.g., Francis & Michaelis
2017; DuBois, 2014; Givón, 2014).
Since new information is related to old information, constructional generalizations emerge as
clusters of related instances within the high-dimensional network embedded in each brain, with
its nearly 100 billion neurons and roughly 60 trillion connections. As discussed at some length in
Goldberg (2019), memory traces that cluster together to form constructions involve partially
overlapping patterns of connections. Our brain’s incredibly rich network is dynamic: each person’s
constructicon (or ConstructionNet) is shaped by millions of exposures to language (Beckner et al.
2009; Bybee 2010; McClelland et al. 2010; Gries and Hilpert 2008; Perek 2015; Traugott 2014).
ConstructionNets continue to change as speakers are exposed to new contexts, new semi-idiomatic
expressions (Ok, boomer; living one’s best life; I did a thing), new individuals, different dialects
and/or new languages. Several foundational aspects of memory and learning are relevant to the
dynamic nature of ConstructionNets (Goldberg 2019:6):

• Speakers balance the need to be expressive and efficient while conforming to the
conventions of their speech communities.
• Our memory is vast but imperfect: memory traces are partially abstract (“lossy”)2
• Lossy memories are aligned when they share relevant aspects of form and function,
resulting in overlapping, emergent clusters of representations: Constructions
• New information is related to old information, resulting in a rich network of
constructions
• During production, multiple constructions are activated. If they cannot combine, they
compete with one another to express our intended messages
• During comprehension, mismatches between what is expected and what is witnessed fine-
tune our network of learned constructions via error-driven learning

2
Representations are “lossy,” a term from computer science, in the sense that they are not fully
specified in all detail. Models are also lossy representations, a point we return to in section 4.1.

The usage-based perspective allows constructional knowledge to be both remarkably specific
and flexible. A cluster of lossy overlapping memory traces that comprise a construction will be
more specific the narrower the range of contexts in which it is witnessed. That is, when witnessed
utterances share similar contexts of use, the learned cluster will be correspondingly narrow or
specific. When witnessed instances are more variable, the construction will be applied to new cases
more broadly (Suttle & Goldberg 2011). Yet even highly specific constructions are extended
flexibly on occasion, because speakers need to use constructions in ever-changing contexts to
convey an open-ended range of messages (Casasanto and Lupyan 2011; Christiansen and Chater
2022; Christianson 2016; Christianson and Ferreira 2005; Cuneo, Floyd & Goldberg forthcoming;
Ferreira, Bailey, and Ferraro 2002; Goldberg and Ferreira 2022; Rambelli et al. 2022). The
specificity and flexibility of language
can most easily be illustrated by example.

2.1. Hazard a guess


When a small group of monolingual English speakers was recruited on the Prolific crowd-sourcing
platform to provide the next word in the following sentence, roughly half of them supplied the
indefinite phrase, a guess:3

(9) I can’t try to hazard __________

The phrase, hazard a guess, occurs only 187 times in the billion-word Corpus of Contemporary
American English (COCA; Davies 2008), but the transitional probability of a guess following
hazard used as a verb is very high: p(a guess | hazard) = .49, in COCA. The implicit awareness
of the lexically filled construction, plainly evident even in a free recall task, for at least half of
adult English speakers, coexists with the recognition of the construction’s individual lexemes,
guess and hazard.4 The word guess is particularly transparent in the construction and is
interpreted compositionally. It is therefore expected to occasionally appear in the plural, with
modification, and/or in the passive, and it does, as in the attested examples (10)-(13),
respectively (also from COCA).

(10) “If I had to hazard guesses”
(11) “[they] can hazard a pretty sophisticated guess”
(12) “I am going to hazard a wild guess”
(13) “one can only now hazard an educated guess.”

In this way, hazard a guess has a highly specific form and function, yet it can also be extended
on occasion. Notably, because the statistics are so skewed toward guess, the meaning of “guess”
tends to be implied even when other terms are used, as in the attested example in (14), which
makes the fact that the prediction is a guess explicit in the second clause:

(14) “I would hesitate to hazard a particular percentage, but I would guess that…”

3
This figure is based on a fill-in-the-blank task included in a Prolific survey taken by 20
English-speaking adults.

4
The verb, to hazard, is semantically related to its more common use as a noun. When one hazards
a guess, there is typically little evidence on which to base the guess, which makes the guessing
somewhat fraught or hazardous. The verb to hazard is sometimes used to imply some
communicative signal other than a guess. In this case, it retains the implication that doing so
makes the speaker vulnerable, as in the attested examples from COCA:
(a) I hazarded that maybe it was glamorous living in exile with a tennis legend.
(b) She hazarded a backward glance
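
For readers curious how a transitional probability such as p(a guess | hazard) is computed, here is a minimal sketch. It is not the procedure actually applied to COCA; the part-of-speech tags, the function name, and the toy corpus are all placeholders invented for illustration.

# Hedged sketch: estimating p("a guess" | hazard-as-verb) from a
# part-of-speech-tagged corpus. `tagged_corpus` is a hypothetical list of
# (word, POS) pairs standing in for a real corpus such as COCA.

def transitional_probability(tagged_corpus, cue="hazard", continuation=("a", "guess")):
    cue_count = 0           # occurrences of the cue word tagged as a verb
    continuation_count = 0  # of those, how many are followed by "a guess"
    for i, (word, pos) in enumerate(tagged_corpus):
        if word.lower() == cue and pos.startswith("V"):
            cue_count += 1
            following = tuple(
                w.lower() for w, _ in tagged_corpus[i + 1 : i + 1 + len(continuation)]
            )
            if following == continuation:
                continuation_count += 1
    return continuation_count / cue_count if cue_count else 0.0

# On COCA this ratio comes out to roughly .49 (see text); on this
# one-sentence toy corpus it is trivially 1.0.
toy = [("I", "PRP"), ("hazard", "VBP"), ("a", "DT"), ("guess", "NN")]
print(transitional_probability(toy))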

2.2. Skewed input


Constructions commonly display skewed distributions with one or more central tendencies and
conventional extensions which express a family of constructional interpretations (Barðdal,
Kristoffersen, and Sveen 2011; Goldberg and van der Auwera 2012; Goldberg and Herbst 2021;
Goldberg and Jackendoff 2004; Goldberg and Michaelis 2017; Gonzálvez-García 2009;
Kapatsinski and Vakareliyska 2013; Kim and Sells 2013; Lakoff 1987; Ungerer 2022). This
is not to say that all of the content we infer from an utterance is specified in any particular
construction or by their combination directly. As Remi van Trijp (this volume) emphasizes,
each utterance provides only cues that enable people to create an enriched situation model (or
mental model, see Johnson-Laird 1983; Christiansen & Chater, 2022).
But the meaning of the most frequent instance used in a construction often becomes
implicitly associated with the meaning of the construction (Goldberg 2006; Goldberg,
Casenhiser, and Sethuraman 2004; but cf. Perek 2016 for a different case). For example, the
verb give is the grammatical “head” of roughly half of all tokens of the English double object
construction, and its meaning (transfer from one animate being to another) is associated with
the construction even when other verbs are used. This offers an explanation for why She baked
him something entails that she intended to give him whatever it was she baked.5 Similarly, the
verb make accounts for roughly 40% of the instances of the way construction (e.g., She made her
way into the room), and the construction implies that a real or metaphorical path is created
(i.e., made). The implication that a path is created is what imbues the construction with its
interpretation of self-propelled motion despite difficulty or obstacles (Goldberg 1995).

5
The paraphrase, She baked something for him, on the other hand, can alternatively be
interpreted to mean that she baked something on his behalf (to be given to someone else), or
even that she baked something intending to throw it at him.

3. The usage-based nature of language is a challenge for symbolic formalisms


While all constructionists recognize Chuck Fillmore as an inspirational figure, my own thinking
has been at least as influenced by George Lakoff, my PhD advisor at Berkeley. In the first course
I took in the department, Lakoff shared chapters of his then-new book, Women, Fire and
Dangerous Things. The title was intended to trick readers into assuming that he was claiming
that women, fire, and dangerous things all share something in common. The book dispels that
misconception by explaining that linguistic elements in languages are rarely tamed by any shared
features or simple definitions. Instead, most grammatical elements, word roots, and constructions
are polysemous in that they can convey distinguishable but related interpretations. This provides
another parallel between words and abstract grammatical constructions. I’ve found the
observation that strict definitions regularly fail to account for familiar concepts to be one of the
most profound I’ve encountered, and I do my best to keep it at the forefront of my mind. It is the root
cause of my skepticism regarding symbolic formalisms.

3.1. Symbolic, feature-based formalisms


No simplification of a natural language into interpretable symbols will ever equal natural
languages themselves as means of expressively and efficiently conveying an open-ended range of
messages. That is, each language provides an exquisitely useful formalism for communication.
What better way to express the rich meaning involved in words like extradite, renege, fireworks
than with the lexical items themselves? An early valiant attempt to define “drink” by one eminent
linguist led to the cockeyed “CAUSE LIQUID to MOVE into one’s MOUTH.” This representation
incorrectly predicts that gargling is a type of drinking, and fails to distinguish sipping, gulping,
and chugging. The meaning of LIQUID also begs for further scrutiny, as glass and sand are
sometimes classified as liquids by physicists, yet swallowing a mouthful of sand or glass is not
considered drinking. And if we dare venture beyond LIQUID, we face the need to distinguish GIN,
VODKA, and ORANGE CRUSH in some way other than simply promoting English terms into
capital letters. I confess to occasionally being guilty of the capital-letter boondoggle myself. But
interpretable features are especially impotent when used to represent lexical semantics (Fillmore
1975). It is futile to decompose the meaning of words such as Bible-belt, tailgate, gaslight, TikTok,
contention, golf or idioms such as Ok Boomer; I feel seen; cast the first stone. True, new features
can be created for each new function or distinct formal property, but in practice this results in
opaque features such as “frame A” or elided representations that simply include “….”
Grammatical constructions likewise offer effective and succinct means of capturing
compositional meanings: they convey who did what to whom, and they distinguish questions,
statements, and commands. Constructions can convey how parts of an utterance relate to other
utterances that came before or will come afterwards, a speaker’s attitude toward the comprehender
or toward an event, the certainty of the information being conveyed, or the relative status of
speaker and hearer, to name just a few of the many functions signaled by constructions.
Although I sometimes use simple decompositional representations to capture the type of
general and abstract meanings associated with common argument structure meanings in English
(e.g., CAUSE-MOVE), I do this only in an effort to provide a representation that may be
comprehended at a glance. This is possible for highly frequent constructions because they
generalize across many, varied instances, so their meanings are necessarily very general. I also
sometimes employ grammatical terms such as N (noun) or V (verb), but this again is only intended
to provide information via a shorthand that I assume is familiar to readers. Each time I employ
a formal notation, I feel unsatisfied and humbled. Syntactic terms such as noun, subject, passive
do not refer to consistent categories across different languages (e.g., Croft 2001; Fried 1994;
LaPolla 1993), nor even within a single language (e.g., Croft 2001; Culicover 1999; Goldberg 2006;
Ross, 1973).
That said, I value the fact that other contributors to this volume have developed
formalisms that suit their intended goals. Proponents of Sign-Based Construction Grammar
provide a unification-based symbolic formalism for the sake of explicitness and to offer a common
descriptive language, and this is clearly valuable (Michaelis, this volume; Trousdale and Michaelis
2010; Boas and Sag 2012; Bergen and Chang 2005). As Michaelis (this volume) describes it, the
use of a “rigid” formalism can provide a “way of seeing.” Likewise, Fluid Construction Grammar
offers a fully implemented computational tool that can be used to test the compatibility of
representations in a way that captures the interactive online nature of language processing as a
means of communication (Steels and DeBeule, 2006; van Trijp 2014; van Trijp,
this volume). To its enduring credit, Fluid Construction Grammar has made enormous efforts and
taken great strides in grounding meaning in the goals of agentive actors in real and computational
situations.

3.2. Combining constructions: an example


The editors asked each of us to analyze the sentence in (15). I confess to finding it quite challenging
to interpret, undoubtedly due to my lack of familiarity with golf and the use of rather in this
context. I therefore requested and was granted permission to analyze the sentence in (16) instead.

(15) Wasn’t it rather McIlroy who seemed never to be outdriven when playing in contention?
(16) Wasn't it actually Everett who consistently demonstrated remarkable linguistic skills,
effortlessly speaking multiple languages?

What is most interesting about the utterance in (16) (and presumably the one in [15]) is its
complex interpretation. Due to the combination of constructions it employs (see i-ix below),
example (16) conveys a hedged assertion, namely that the speaker believes the polyglot at issue
is Everett; it also presupposes that someone had incorrectly suggested (or thought) that a
different person was a remarkable polyglot. Example (16) combines the following constructions:


(i) Polar interrogative construction (yes/no question), which can be used as a rhetorical question
to imply the associated assertion is false because it literally questions the veracity of the
assertion. Since the associated assertion in this case is negated, example (16) implies the
negation is false: i.e., Everett was the polyglot. The polar interrogative construction
includes:
a. Subj-aux inversion construction (wasn’t it)
(ii) It-cleft construction (It BE ___ <relative clause>), which presupposes the content of the
relative clause and puts the head noun in focus. Here, Everett is in focus, and that someone
else has been suggested as the polyglot is presupposed. The it-cleft construction includes:
a. Relative clause construction: here, a subject-oriented non-restrictive relative
clause (since Everett is interpreted as the subject argument of the relative clause)
(iii) A negative clitic (n’t), which presupposes the relevance of the positive counterpart (e.g.,
Horn, 2010; Lakoff 2014)
(iv) The focus element (actually) emphasizes the function of the it-cleft. It treats Everett as
the focus and implicitly corrects a mistaken belief (whether previously asserted or
presupposed), in this case that someone other than Everett was the polyglot.
(v) Several instances of the noun phrase construction (it, Everett, remarkable linguistic
skills, multiple languages)
(vi) A verb phrase adjunct which is discontinuous from what it modifies (effortlessly
speaking multiple languages modifies Everett, not skills) [aka a “dangling participle”]
(vii) Two lexical adverbs modifying different verb phrases (consistently, effortlessly)
(viii) Morphological inflection constructions (V-ing, N-s, V-past)
(ix) Lexical items, each associated with related words as well as its own range of functions:
was, not, it, actually, who, consistently, demonstrated, remarkable, linguistic, skills, effortlessly,
speaking, multiple, languages

The list in (i)-(ix) clarifies that the utterance in (16) is a combination of many constructions,
but few readers are likely to find the list itself particularly illuminating. And I fully empathize
with them. Not only do (i)-(ix) fail to do justice to any of the constructions involved (dozens of
papers have been written on each), but listing constructions obscures the fact that each exists as
part of a network of related items: the polar interrogative construction in (i) is related to the
tag question construction (e.g., wasn’t it?) and to information (wh-) questions. The subject-
auxiliary inversion construction is in reality a family of constructions in English (e.g., Goldberg, 2006).
The it-cleft construction is related to the presentational relative clause construction (e.g., there
was a guy who) and to wh-clefts (Kim & Michaelis, 2020). It can also be used as a scene-setting
device, with information structure quite distinct from that described in (ii): e.g., It was 1967
when young people from around the world were drawn to San Francisco by the promise of
peace, love, and understanding [COCA, 1997, SPOK]. Subject-oriented relative clauses are
related to non-subject oriented relative clauses. And obviously, the lexical item was is related to
were, be and is; the adverb effortlessly is related to the lexemes effortful, effort, and to the
morphological constructions N-less, Adj-ly.
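
The network relations just described can be pictured with a toy data structure. The fragment below is purely illustrative; the nodes and links are hand-picked from the discussion above and are not a claim about the actual shape or contents of anyone’s ConstructionNet.

# Illustrative fragment of a construction network: constructions as nodes,
# with links to related constructions (the entries are invented for this example).

construction_net = {
    "polar interrogative": ["tag question", "wh-question", "subj-aux inversion"],
    "it-cleft": ["presentational relative clause", "wh-cleft", "relative clause"],
    "subject relative clause": ["non-subject relative clause"],
    "was": ["were", "be", "is"],
    "effortlessly": ["effortful", "effort", "N-less", "Adj-ly"],
}

def related(construction, depth=1):
    """Collect the constructions reachable within `depth` links."""
    frontier, seen = {construction}, set()
    for _ in range(depth):
        frontier = {n for c in frontier for n in construction_net.get(c, [])} - seen
        seen |= frontier
    return seen

print(related("polar interrogative"))
# {'tag question', 'wh-question', 'subj-aux inversion'}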


Moreover, none of the labels or features used in (i)-(ix) captures their usage-based nature.
There are no uniform tests that hold of all words we generally call adjectives in English, let alone
any tests that apply to all adjectives in all languages (Croft, 2001). While there are constructions
with comparable functions across languages (Croft 2022) and while each construction tends to be
motivated rather than random, the specifics of each construction are not strictly predictable (e.g.,
Lambrecht, 1994). The need to explain the usage-based motivation and complexity of every
feature and construction leaves me wary of symbolic formalisms.

4. A Game Changer: Large Language Models


Since the functions of constructions are essential for communication, I had been privately
skeptical that Large Language Models (LLMs), which are trained only on text, would ever
produce or comprehend language in a way that might be mistaken for human (the famous
Turing Test, [French 2000]). But then a new generation of models burst onto the scene at the
end of November in 2022, beginning with ChatGPT, which led me to reverse my perspective.
And I am far from alone: The New York Times swiftly published half a
dozen essays including “This changes everything” (Klein, 2023) and “Our New Promethean
Moment” (Friedman, 2023). As shown in Figure 1, during the four months following the
release of ChatGPT, and then of GPT4, which followed quickly on its heels, searches for GPT in
Google quickly shot up to meet the average daily searches for Excel, software used daily all over
the world by millions.

Figure 1: Google Trends data from February 2020 to May 2023.

ChatGPT and GPT4 are “generative pre-trained transformer” models, which are a type
of Large Language Model (LLM), and they are able to produce and comprehend language to a
degree that has stunned me. This is as good a place as any to note that generative in “generative pre-
trained transformer” simply means that the models generate novel outputs; it is unrelated to
generative linguistics, and generative linguists are generally skeptical, arguably naively so (e.g.,
Chomsky et al., 2023). In fact, as described below, LLMs share far more with the usage-based
constructionist approaches to language than with traditional generative approaches (Weissweiler et
al. 2023). Table 2 provides six striking parallels, with the final one, new to ChatGPT and
GPT4, being quite profound: the new models are specifically trained to be helpful to human
users (section 4.6). There are, to be sure, profound differences in how the parallels arise. Each is
discussed briefly in turn below.


Table 2: Parallels between the usage-based constructionist perspective and LLMs

LOSSY COMPRESSION AND INTERPOLATION
Usage-based constructionist approach: Human brains represent the world imperfectly (lossy) with limited resources (compressed); we generalize from familiar to related cases (via interpolation/coverage/induction).
ChatGPT, GPT4 & similar recent LLMs: Every model involves lossy compression and interpolation; all neural net models interpolate to generalize within the range of the training data.

CONFORM TO CONVENTIONS
Usage-based approach: Humans display a natural inclination to conform to the conventions of their communities.
LLMs: Pre-training to predict the next word requires that outputs conform to conventions in the input.

COMPLEX DYNAMIC NETWORK
Usage-based approach: Structured distributed representations at varying levels of complexity and abstraction are learned from input plus an understanding of others’ intentions and real-world grounding; can be flexibly extended.
LLMs: Structured distributed representations at varying levels of complexity and abstraction are learned from massive amounts of input text; can be flexibly extended.

CONTEXT-DEPENDENT INTERPRETATIONS
Usage-based approach: Humans use linguistic and non-linguistic context for interpretation.
LLMs: Only linguistic context is available, via thousands of words of preceding text.

RELATIONSHIPS AMONG DISCONTINUOUS ELEMENTS
Usage-based approach: Made possible via working memory and attention.
LLMs: Made possible by attention heads (transformer models).

GOAL IS TO BE HELPFUL
Usage-based approach: Humans display a natural tendency to be helpful to others in their communities.
LLMs: Special training provided by InstructGPT taught GPTs to provide helpful responses.


4.1. Lossy compression and interpolation


An essay in the New Yorker magazine argued that the new LLMs are (simply) analogous to a
blurry image of the web (Chiang 2023), in that they involve standard mechanisms of lossy
compression and interpolation. It is true that every model involves lossy compression, in the sense
that no model is a veridical replication of reality. That is, every model omits some information (is
lossy) in order to compress its information, since models don’t have infinite resources.
Interpolation is what allows models to “fill in” missing information by averaging neighboring
vectors when missing information falls within the training space. The usage-based constructionist
approach also recognizes that the human brain involves lossy compression and interpolation
insofar as memories are imperfect (lossy), and brain resources are vast but finite at any given time
(requiring compression). Humans also readily generalize from familiar cases to similar new cases
(interpolation or coverage, Suttle & Goldberg, 2011; Goldberg 2019). Lossy compression and
interpolation cannot on their own explain the sudden and impressive leap the new generation of
models has taken, since the past five decades of neural net (connectionist) models have employed
lossy compression and interpolation (e.g., McClelland & Rumelhart 1986a, 1986b). While the
earlier models enjoyed successes within limited domains, they never approached the stunning
performance of current GPT models.
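
To make the notion of interpolation concrete, the following sketch fills in a value for an unseen point by similarity-weighted averaging over its nearest stored neighbors. Every array and parameter here is invented for illustration; real models interpolate implicitly through learned weights rather than by explicit nearest-neighbor lookup.

# Toy interpolation: estimate a value for a novel query from the k most
# similar stored (lossy, compressed) traces. Purely illustrative.

import numpy as np

def interpolate(query, memory_keys, memory_values, k=3):
    sims = memory_keys @ query / (
        np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(query) + 1e-9
    )                                          # cosine similarity to each stored key
    nearest = np.argsort(sims)[-k:]            # indices of the k closest traces
    weights = sims[nearest] / sims[nearest].sum()
    return weights @ memory_values[nearest]    # similarity-weighted average

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 16))    # stored (compressed) representations
values = rng.normal(size=(100, 4))   # what each representation maps to
print(interpolate(rng.normal(size=16), keys, values))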

4.2. Conform to conventions


The “pre-training” involved in current LLMs (i.e., the P in GPT) has also been used in
some form or other for decades: models are trained to predict the next word in coherent texts of
natural language. Initially, their predictions are random, but they learn to iteratively self-correct
based on receiving the next word that actually appears in the text. As in certain earlier neural
network models, each word is divided into substrings, which allows the models to learn morphology
from regularities in written text. Importantly, this pre-training regime forces the models to
conform to human input, which ensures that they produce the conventions reflected in that input
as best they can. We have also known at least since Elman (1990) that models trained in this way
learn hierarchical structure, which essentially groups strings of text into coherent units insofar as
single words can substitute for the string in other texts (Langacker, 1997). Finally, the next-word
prediction task allows for the emergence of constructions with open but constrained slots: an
increase in entropy in which word will appear next indicates an open slot, and the distribution of
potential next-words serves to constrain the type of filler that may appear in that slot (Dunn,
2022).
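
The entropy-based view of open slots can be illustrated with a toy computation. This is a sketch of the idea rather than Dunn’s (2022) actual method, and the probability distributions below are invented.

import math

def entropy(dist):
    """Shannon entropy (in bits) of a next-word probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-word distributions after two different prefixes:
after_hazard = {"a": 0.55, "an": 0.15, "to": 0.10, "that": 0.10, "guesses": 0.10}
after_hazard_a = {"guess": 0.90, "wild": 0.05, "look": 0.05}

print(entropy(after_hazard))    # ~1.88 bits: an open (but constrained) slot
print(entropy(after_hazard_a))  # ~0.57 bits: nearly lexically filled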
The same key attributes are recognized to be critical in the usage-based constructionist
approach: the inclination to conform to others is required for a community to converge on a shared
system; the tendency to construe meaningful units from continuous pieces; and, of course,
the possibility for constructions to include open but constrained slots. Humans spontaneously
display an inclination to conform to other members of their community, at least in comparison to
other apes. For instance, both preschool-aged children and chimps are able to learn to perform
multi-step processes in order to receive a reward, but only children persist in conforming to the
same process when an easier solution is evident. In these contexts, children recognize that there
is a conventional or “correct” way to perform the activity and they conform their behavior
accordingly (Gergely, Bekkering, & Király 2002; Horner & Whiten, 2005). Humans also naturally
segment the natural world into meaningful units in vision, memory, and in language (Chater
2018). Humans naturally construe parts of a scene that move together as parts of the same entity
(Ostrovsky, 2009); we come to recognize parts with relatively high transitional probabilities as units
(Saffran et al. 1996); and we understand contiguous words that combine to form a coherent unit
to be a semantic constituent.
Finally, there is evidence that humans spontaneously predict the next word that will be
uttered while comprehending language (Kutas and Federmeier 2011). For instance, the N400
ERP component detectable from EEG recordings on the scalp, while people listen to text,
correlates quite well with how predictable each word is in context (e.g., Nieuwland & Van
Berkum 2006). Less predictable words result in a higher amplitude N400 and highly predictable
words results in a negligible N400. Yet, despite the efficacy of predict-the-next-word training, it
is unlikely to be sufficient to explain the dramatic improvement in ChatGPT, because it has
been used for decades.

4.3. Complex dynamic network of constructions at varying levels of abstraction and complexity
LLMs include far more layers, with exponentially more nodes and connections than earlier
connectionist models. This accounts for their characterization as “deep learning” models
(Graves, Mohamed, and Hinton 2013). And ChatGPT was trained on 300 billion words of text
scraped from the internet in all languages found online. The massive amount of input allows it
to learn the thousands of collocations, idioms, and semi-idiosyncratic constructions within the
vast training data, a key hallmark of usage-based constructionist approaches. The compression
involved requires a rich network of conventional constructions to partially share representational
structure with related constructions, in the same spirit as the clustering described in section 2.
To be clear, ChatGPT and GPT4 receive orders of magnitude more input than any
human receives in their entire lifetime. And no human can learn any new language only by
scanning text or by listening to the radio, even for a billion years. We would not be able to
glean any meaning whatsoever. Humans, however, have access to real or imagined grounding in
various contexts, and importantly, humans are good at understanding the intentions of others
(Tomasello, 2003; 2010). I had been skeptical that models trained only on text could converse
coherently with humans, but ChatGPT has proven my intuition wrong. How is this possible?

4.4. Context-dependent interpretation


In order for natural language to be an efficient and flexible form of communication, it provides
cues to listeners about the intended message, rather than specifying the message in its entirety
(Christiansen and Chater 2022). For instance, we are far more likely to say, Hey in greeting than I,
the speaker appearing before you, hereby acknowledge and greet you informally with this utterance
that I expect you can hear and interpret as intended. It is also not possible to predict the next
word with any accuracy in the absence of context. If a young child asks for a drink, they are likely
to want water, juice, or milk, while if an adult at a bar asks for a drink, it is far more likely to be
a Manhattan or a Mojito. Today’s GPT models have no access to non-linguistic context, but
they make use of a large amount of linguistic context: the input window includes thousands
of words of the preceding text.

4.5. Semantic relationships among discontinuous elements


The T in GPT stands for transformer. The essential innovation in transformer models is that
they include “attention heads,” which allow each part of the input to be weighted differently in
different layers (Vig 2019). Attention heads enable discontinuous relationships of all kinds to be
captured, therefore serving a role similar to working memory and attention in humans, albeit a
more powerful one. The attention heads allow GPT models to produce and respond coherently
to all sorts of long-distance dependencies including not only those within sentences (if/then, wh-
questions, etc.), but those that hold across sentences (to respond appropriately to a
prompt), and those that exist across multiple prompts (enabling pronouns to be interpreted as
referring to entities mentioned earlier in the discourse).
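
For concreteness, here is a minimal single-head version of the scaled dot-product attention that transformer models use. It is a bare-bones NumPy sketch with random weights, intended only to show how every position in a sequence can be related, with its own weight, to every other position.

import numpy as np

def attention(X, Wq, Wk, Wv):
    """X: (sequence_length, d_model). Returns the output of one attention head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise position similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V   # each position: a weighted mix of all positions' values

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))                # 5 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)      # (5, 8)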

4.6. NEW: the goal is to be helpful


While early generative pre-trained transformer models (GPT-1, GPT-2) were impressive in many
ways (e.g., Hawkins et al. 2020; Dasgupta et al. 2022; Grand et al. 2022; Mahowald 2023), they
were prone to grammatical errors and regularly produced totally incoherent sequences of text.
This was perhaps not surprising, since humans do not generate language with the goal of producing
the most likely next word, and as noted, human learners are able to glean others’ intentions in
context (e.g., Tomasello, 2003). Our goal in producing and comprehending language is to convey
messages and respond appropriately to others.
I find it remarkable that what appears to have led the new generation of GPT models
to engage in far more human-like conversations than their predecessors is that they received
additional training aimed at aligning them with human conversational goals. That is, ChatGPT and
GPT4 were trained to be helpful, or more specifically: helpful, true, non-toxic, respectful, humble
about their own certainty, and overall preferred. This special training came from a separate
“InstructGPT” model, created on the basis of human rankings of sets of machine-generated
responses according to how “helpful, honest and harmless” the responses were (Ouyang et al.
2022). The InstructGPT model then used reinforcement learning based on the human feedback
collected (RLHF) (Christiano et al. 2017). This helpfulness-training resulted in ChatGPT
performing better, with a mere 1.3 billion parameters, than the same model that had 175 billion
parameters but no helpfulness training (Ouyang et al. 2022).
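
The two steps just described can be caricatured in a few lines. The sketch below is a deliberately toy rendering of the RLHF recipe (fit a reward signal to human pairwise preferences, then favor high-reward responses); real systems use neural reward models and reinforcement learning algorithms such as PPO, and every string and name here is invented.

# Toy caricature of RLHF: (1) derive a reward from human pairwise rankings,
# (2) prefer the candidate response with the highest reward. Illustration only.

def fit_reward(pairwise_prefs):
    """pairwise_prefs: list of (preferred_response, dispreferred_response) pairs."""
    scores = {}
    for better, worse in pairwise_prefs:
        scores[better] = scores.get(better, 0) + 1
        scores[worse] = scores.get(worse, 0) - 1
    return scores

def pick_helpful(candidates, scores):
    """Stand-in for the reinforcement-learning step: choose the highest-reward response."""
    return max(candidates, key=lambda r: scores.get(r, 0))

prefs = [("Here is a clear, accurate answer.", "I refuse to answer."),
         ("Here is a clear, accurate answer.", "Irrelevant rambling.")]
reward = fit_reward(prefs)
print(pick_helpful(["Irrelevant rambling.", "Here is a clear, accurate answer.",
                    "I refuse to answer."], reward))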
A great deal of experimental and observational data has found that non-human primates
rarely if ever interpret communicative signals as intended to be helpful (Tomasello 2009, 2016).
Chimps fail to understand an experimenter’s pointing gesture to where food is hidden, even while
they understand grabbing gestures, in which an experimenter reaches for food as though competing
for the resource (Call, Agnetta, and Tomasello 2000; Herrmann and Tomasello 2006). On the
other hand, Grice’s (1975) essential point was that human language users are cooperative: we are,
and assume others are: relevant, truthful, brief, and mannerful. The goal of being helpful is mighty
close to being cooperative: one would seem to need to be relevant, generally honest, allow for turn-
taking (one way to interpret the idea of being “brief”), and be appropriate in context (or
“mannerful”). In fact, the assumption that others are being helpful or cooperative is a well-
recognized prerequisite for natural language, present in young children (Tomasello 2010).
Indeed, what I find most intriguing about the latest LLMs is the way they succeed as well
as they do. The assumption that others intend to communicate cooperatively and the inclination
to conform to relatively arbitrary social norms are prerequisites for languages (and complex
cultures) to emerge in humans. This combination of prerequisites explains why none of our primate
cousins are able to learn a language anywhere near as complex as the natural language of humans
(Tomasello, 2010; 2019).

4.7. With great power comes great responsibility


Many have rightly observed that there is a dark side to successful language models that ought
not be ignored (e.g., Bender et al. 2021; Klein 2023; van Dis et al. 2023). Their statistical nature
makes them prone to exaggerating biases in their input and incorporating falsehoods in their
responses (recall the lossy compression). The first of the new generation of models, ChatGPT, was
accurate perhaps 80% of the time, potentially enough to lull users into accepting its
“botsplanations” at face value. The models can easily be used to generate propaganda and
disinformation. There will surely be anticipated and unanticipated consequences of the new
technology that individuals and societies need to think through carefully. Yet legitimate concerns
about current and future societal impacts do not imply we should bury our heads in the sand and
pretend that the latest models do not produce and comprehend languages that they have had
sufficient exposure to. They evidently do.
The speed of innovation itself is remarkable. I am keenly aware that anything I write will likely be
outdated before this paper is published. I encourage any readers who have not yet had a chance
to play with the latest models to find the time to do so.

5. GPTs at work
In what follows, I include a series of representative responses to prompts I provided to GPT-4 in
March of 2023. Like humans, GPT models are not deterministic. Each response shown is from
the first (and only) time I provided the prompt. Your results will vary. While it is
remarkably impressive overall, an illustrative instance in which it fails in an illuminating way is
included as well (Figure 9).

5.1. Intention-reading and social inferences


Inspired by the idea that pointing is unique to humans among all apes (Tomasello 2010), I
prompted GPT-4 with the following: “If I’m walking with a friend and I point to a bicycle
parked by a house and wink, what do you think I might mean?” Its response was coherent and
quite human-like (Figure 2).


Figure 2: GPT-4 displaying an ability to interpret the description of a pointing gesture.

In another test of GPT-4’s ability to make appropriate social inferences, I asked, “When
Tim’s husband said he was at the gym all morning, Tim turned red. What might have
happened?” GPT4’s response was again remarkably appropriate (Figure 3).

Figure 3: A simple probe that elicited a range of plausible inferences.

I tested the model on whether it could supply reasonable and distinct inferences when given
single-word utterances, fire! vs. coffee! GPT-4’s unedited and appropriate responses are
provided in Figure 4 (graciously overlooking my typo [the want]):


Figure 4: GPT-4’s markedly distinct and highly appropriate interpretations of the single word
utterances, “fire!” and “coffee!”

5.2. GPT correctly interprets unusual examples


Curious how GPT-4 would respond to instances in which constructional meaning should coerce
the interpretation of the utterance, I asked it “what does ‘she sneezed the foam off the
cappuccino’ mean?” and then “what does ‘3 computers ago’ mean?” While these “novel” inputs
may be contained in GPT4’s vast training data, it is doubtful that the intended interpretation
of either was detailed. GPT-4 nonetheless responded with appropriate and detailed descriptions
of both interpretations (Figure 5):


Figure 5: GPT-4’s interpretation of “she sneezed the foam off the cappuccino” and “three
computers ago” which require constructional meaning.

Another example of appropriately interpreting novel input comes from GPT-4’s
interpretation of a novel instance of the Phrase-as-Lemma (PAL) construction (Shirtz and Goldberg,
submitted). Examples of the PAL construction in (17)-(20) come from the COCA corpus
(Davies 2008):

(17) a don’t-mess-with-me driver
(18) It’s not a “call Ronan Farrow” scenario
(19) We’re at the people-are-moving-to-Jersey stage of nationwide collapse
(20) This is my “can you believe this bull***t?” face.

Using corpus analysis, survey data, and cross-linguistic comparison, we provide motivation for
the form and function of phrases that are treated syntactically as if they were words. In
particular, we argue that novel uses of the PAL construction are ideally suited for conveying
what comedians call “observational humor.” The reasoning for this is sketched in Table 3.

Table 3: Why the Phrase-As-Lemma construction has the interpretation it has

(a) The construction treats a phrase formally as if it were a word root. [e.g., Trips and Kornfilt 2017]
(b) The concept associated with a lexeme is what psycholinguists refer to as a lemma. [Definition]
(c) Lemmas evoke familiar, recurrent semantic frames. [Fillmore 1985; Geeraerts 2006]
(d) PALs are therefore understood to convey a type of event or situation that the speaker expects the listener to find familiar. [From (a)-(c)]
(e) Observational humor involves talking about familiar events that are not usually talked about. [Definition]
(f) Novel PALs, which express events presumed to be familiar but not often talked about, convey observational humor. [From (d)-(e)]

We confirmed the hypothesized function of the PAL construction with survey data that asked
participants to compare pairs of sentences that either included a PAL or a non-PAL paraphrase
(Shirtz & Goldberg submitted). Results showed that the sentences with PALs implied more
shared background between speaker and listener and were judged to be more witty and more
sarcastic than non-PALs. With this as background, I asked GPT-4 what the following means:


“I’m officially ‘slows down at all of the yellow traffic lights’ years old.” (Shirtz & Goldberg,
submitted). Remarkably, GPT-4 recognized the “humorous” flourish of the PAL construction
(Figure 6) and interpreted the phrase accurately:

Figure 6: Probe of the PAL construction (see Shirtz & Goldberg, submitted)

5.3. GPT-4 on a simple math problem


GPT-4’s ability to solve simple arithmetic word problems can be impressive, as illustrated in
Figure 7. The response combines arithmetic and world knowledge when asked, “If 40 people can
fit on a bus, how many buses are needed to drive 84 people 6 blocks?” (I included 6 blocks in
the prompt in an unsuccessful attempt to misdirect the model).

Figure 7: GPT-4 response to a mathematical word problem.

5.4. GPT-4 appropriately characterizes conceptual metaphors


Can GPT-4 make sense of conceptual metaphors? My initial query provided the response in the
left panel of Figure 8. When I then asked it to “tell me like I’m in first grade,” it provided the
simplified description at the top right of Figure 8, which includes a novel yet sensible
metaphorical extension, “going up, up, up into the sky.” The final response comes from a
prompt to generate a novel metaphor and once again, GPT-4 is impressively competent (Figure
8).


Figure 8: GPT-4 on conceptual metaphors.

5.5. Over-reliance on associations can lead GPT models (and humans) astray
Insight into how ChatGPT works comes from a series of examples posted on Twitter by
@PaulMainwood (2/22/23).6 In one, Mainwood cleverly provides ChatGPT with a twist on a
widely shared riddle, written by undergraduate students at Boston University in 2008, and
intended to highlight implicit sexism. A version of the familiar riddle follows in (21):

(21) Familiar riddle (included in training data): “A boy was rushed to the hospital after
a terrible car crash in which his father was killed. The surgeon looks at the boy and says
‘I can’t operate: he’s my son.’ How is this possible?”

People who hear the riddle for the first time are sometimes flummoxed until it is revealed that
the surgeon is the boy’s mother. Mainwood explains a different situation to ChatGPT: it is not
a riddle at all, but it is strongly though vaguely reminiscent of the original riddle. He stated that the
man at the wheel was the child’s biological father and that the surgeon is the child’s adoptive
father. Strikingly undeterred, ChatGPT blindly forged ahead and provided the standard answer
to the standard riddle, incongruously responding, “The surgeon is the boy’s mother.” I tried
the same prompt on GPT-4 and it performed similarly, assuredly but incorrectly stating that
the surgeon was the boy’s adopted mother (Figure 9).

6
[Link] The examples recall Searle’s
famous Chinese Room argument (Searle 1980).


Figure 9. GPT-4’s response to a prompt inspired by @PaulMainwood, illustrating its over-
reliance on context.

GPT-4’s response is obviously wrong, but it associates the given prompt with a specific
context (the standard riddle) undoubtedly encountered in its training. This type of error is
potentially interesting. We humans are also prone to context-based errors, which have been
described as a result of “good-enough” processing (Christianson 2016; Ferreira, Bailey, and
Ferraro 2002; Goldberg and Ferreira 2022). For example, when asked, How many pairs of
animals did Moses take on the ark, people commonly fail to notice that Moses, rather than
Noah, is referred to in the question.7 Similarly, students are often misled by math and physics
word problems that vary from the specific types of content they had been previously exposed to
(Bassok 1990). In fact, when I shared Figure 9 with two quite brilliant colleagues, each made
the same error that ChatGPT and GPT-4 did, by failing to notice that the prompt was not the
standard riddle.
In the narrow domain of human natural language production and comprehension, GPT
models available now, in 2023, make every phrase structure grammar and every syntactic parser
that came before look like line drawings next to Gaudi’s Sagrada Familia. These models have
limitations in terms of novel spatial reasoning, complex math problems, and descriptions of
world events not included in their training data. Like la Sagrada Familia, the models are works in
progress. Advancements will continue. It is up to humans to put the models to work in ways
that benefit humanity. And it will be left to cognitive scientists and linguists to explore how
they work.

7
Unsurprisingly, since the example is a classic example of good-enough processing, GPT-4 was
not fooled by this particular question, responding “Moses did not bring animals onto an ark; it
was Noah who brought animals onto the ark according to the biblical story found in the Book of
Genesis...”


6. Looking ahead
I fully agree with Martin Hilpert that the future of construction grammar is in excellent hands.
Researchers ought to follow their own curiosity, wherever it takes them. But before closing, I
offer what I personally take to be the most promising directions for new work over the coming
decade.

o GPT models put on full display the power of usage-based constructionist models. Systematic
investigation of such models will likely help us better understand parallels and differences
with human language and cognition (Hawkins et al. 2020; Mahowald forthcoming; McCoy et
al. 2021; Piantadosi, 2023). At the very least, since we know that context always matters, we
need to move away from static representations of the ConstructionNet and embrace dynamic
models to the extent possible (see also Barak and Goldberg 2017; Dasgupta et al. 2022; van
Trijp 2014, 2015; Steels and DeBeule, 2006).

o Field work will always be highly valuable and large-scale cross-linguistic comparisons will
help us better understand shared aspects of our semantic and pragmatic construal of the
world, as well as the processing pressures that result in languages patterning as they do
(Bohnemeyer et al. 2007; Croft 2001; Croft 2022; Haspelmath 2010; Kemmerer 2011; Majid et
al. 2004).

o A fuller, deeper appreciation of information structure and lexical semantics can unlock puzzles that have long been assumed to require syntactic stipulations, including island constraints, scope, anaphora, and binding (Ackerman and Nikolaeva 2014; Cole, Hermon, and Yanti 2014; Culicover and Jackendoff 2005; Cuneo and Goldberg 2022; Francis and Michaelis 2017; Goldberg and Michaelis 2017; Israel 2001; Namboodiripad et al. 2022).

o Laboratory phonology and sociolinguistics are thriving subfields of linguistics, and each has long provided compelling evidence for the usage-based approach to language. Researchers equipped to hypothesize and test potential parallels between phonological and grammatical phenomena will be in a position to offer a coherent and insightful perspective across subareas (e.g., Bybee 2010; Docherty and Foulkes 2014; Harmon and Kapatsinski 2016).

o We ought not feel constrained to focus only on traditional questions. For instance, emotion drives almost everything we do, so it is worthwhile to better understand its role in communication and language (Citron and Goldberg 2014; Foolen 2012). We also need to incorporate the constraints and implications of communicative gestures (Congdon et al. 2018; Khasbage et al. 2022; Steen and Turner 2012; Willems and Hagoort 2007) and of conversational dynamics (Du Bois, Kumpf, and Ashby 2003; Hopper and Thompson 1980; Stephens, Silbert, and Hasson 2010) in order to understand natural language more accurately.


o Applications of the constructionist approach to education, atypical language development (e.g., Goldberg and Abbot-Smith 2021), and language documentation (e.g., Bast et al. 2023) are among the most exciting new directions for constructionists to develop.

10. Conclusion
Let’s allow ChatGPT to have the final word, offered in the style of Ovid (left, Figure 10) and Dr. Seuss (right, Figure 10).

Figure 10: Final remarks, generated by GPT-4 (March 2023).

Acknowledgements
I thank the other contributors to this volume for helpful feedback, particularly Bill Croft and
the editors for reviewing an earlier version of this paper. I am also grateful to Arielle Belluck for
her expert editing.

References

Abbot-Smith, K., & Tomasello, M. (2010). The influence of frequency and semantic similarity
on how children learn grammar. First Language, 30(1), 79–101.
[Link]
Ackerman, F., & Nikolaeva, I. (2014). Descriptive typology and linguistic theory: A study in the
morphosyntax of relative clauses. Stanford: CSLI Publication.
Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical
approaches. Cambridge: Cambridge University Press.
Arnon, I., & Christiansen, M. H. (2017). The role of multiword building blocks in explaining L1–L2 differences. Topics in Cognitive Science, 9(3), 621–636.
Arnon, I., & Snider, N. (2010). More than words: Frequency effects for multi-word phrases.
Journal of Memory and Language, 62(1), 67–82.
Baayen, R. H., & del Prado Martin, F. M. (2005). Semantic density and past-tense formation in three Germanic languages. Language, 81(3), 666–698.
[Link]
Barak, L., & Goldberg, A. E. (2017). Modeling the partial productivity of constructions. In
AAAI Spring Symposium - Technical Report, Vol. SS-17-01.
Barðdal, J., Kristoffersen, K. E., & Sveen, A. (2011). West Scandinavian ditransitives as a family of constructions: With special attention to the Norwegian ‘V-REFL-NP’ construction. Linguistics, 49(1), 53–104. [Link]
Bassok, M. (1990). Transfer of domain-specific problem-solving procedures. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 16, 522–533.
[Link]
Beckner, C., Ellis, N. C., Blythe, R., Holland, J., Bybee, J., Christiansen, M. H., Larsen-
Freeman, D., Croft, W., & Schoenemann, T. (2009). Language is a complex adaptive
system. Language Learning, December, 1–26.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of
stochastic parrots: Can language models be too big? 🦜. In Proceedings of the 2021 ACM
Conference on Fairness, Accountability, and Transparency, 610–623. Virtual Event
Canada: ACM. [Link]
Bergen, B., & Chang, N. (2005). Embodied construction grammar in simulation-based language
understanding. In J-O. Östman & M. Fried (Eds.), Construction grammar(s), cognitive
and cross-language dimensions. Philadelphia: John Benjamins.
Boas, H. C. (2008). Determining the structure of lexical entries and grammatical constructions
in construction grammar. Annual Review of Cognitive Linguistics 6(November), 113–
144. [Link]
Boas, H. C., & Sag, I. A. (Eds.). (2012). Sign-based construction grammar. Stanford: CSLI Publications.
Bohnemeyer, J., Enfield, N. J., Essegbey, J., Ibarretxe-Antuñano, I., Kita, S., Lüpke, F., & Ameka, F. (2007). Principles of event segmentation in language. Language, 83(3), 495–532.
Bybee, J. (2010). Language, usage and cognition. Cambridge: Cambridge University Press.
Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic
theory based on domain general principles of human cognition. The Linguistic Review,
22(2–4), 381–410.
Call, J., Agnetta, B., & Tomasello, M. (2000). Cues that chimpanzees do and do not use to find
hidden objects. Animal Cognition, 3 (1), 23–34
Casasanto, D., & Lupyan, G. (2011). Ad hoc cognition. In L. Carlson, C. Hölscher, & T. F.
Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science
Society (pp. 826). Austin, TX: Cognitive Science Society.
Chater, N. (2018). Mind is flat: The remarkable shallowness of the improvising brain. Yale
University Press.
Chiang, T. (2023). ChatGPT Is a Blurry JPEG of the Web. New Yorker Magazine, February 9,
2023.
Chomsky, N., Roberts, I., & Watumull, J. (2023). Noam Chomsky: The false promise of ChatGPT. The New York Times, March 8.


Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep
reinforcement learning from human preferences. In Advances in Neural Information
Processing Systems. Vol. 30. Curran Associates, Inc.
Christiansen, M. H., & Chater, N. (2022). The language game: How improvisation created
language and changed the world. Hachette UK.
Christianson, K. (2016). When language comprehension goes wrong for the right reasons: Good
enough, underspecified, or shallow language processing. Quarterly Journal of
Experimental Psychology, 69(5), 817–828.
[Link]
Christianson, K., & Ferreira, F. (2005). Conceptual accessibility and sentence production in a
free word order language (Odawa). Cognition 98(2), 105–135.
[Link]
Citron, F. M. M., & Goldberg, A. E. (2014). Metaphorical sentences are more emotionally
engaging than their literal counterparts. Journal of Cognitive Neuroscience, 26(11),
2585–2595. [Link]
Cole, P., Hermon, G., & Yanti (2014). The grammar of binding in the languages of the world:
Innate or learned? Cognition, 141, 138–60.
[Link]
Congdon, E. L., Novack, M. A., Brooks, N., Hemani-Lopez, N., O’Keefe, L., & Goldin-Meadow, S. (2018). Better together: Simultaneous presentation of speech and gesture in math instruction supports generalization and retention. Learning and Instruction, 50, 65–74. [Link]
Croft, W. (2001). Radical construction grammar. Oxford University Press.
Croft, W. (2022). Morphosyntax: Constructions of the world’s languages. Cambridge University
Press.
Culicover, P. W. (1999). Syntactic nuts: Hard cases, syntactic theory and language acquisition.
Cognitive Linguistics, 10(3), 251–261.
Culicover, P. W., & Jackendoff, R. (2005). Simpler syntax. Oxford: Oxford University Press.
Cuneo, N., & Goldberg, A. E. (2022). Islands effects without extraction: The discourse function
of constructions predicts island status. Proceedings of the Cognitive Science Society.
Dasgupta, I., Lampinen, A. K., Chan, S. C. Y., Creswell, A., Kumaran, D., McClelland, J. L., &
Hill, F. (2022). Language models show human-like content effects on reasoning. arXiv.
[Link]
Davies, M. (2008). The Corpus of Contemporary American English (COCA): One billion words, 1990–2019.
Diessel, H., Dąbrowska, E., & Divjak, D. (2019). Usage-based construction grammar. Cognitive Linguistics, 2, 50–80.
Diessel, H., & Hilpert, M. (2016). Frequency effects in grammar. In Oxford Research
Encyclopedia of Linguistics. [Link]
Docherty, G. J., & Foulkes, P. (2014). An evaluation of usage-based approaches to the modelling of sociophonetic variability. Lingua, 142 (SI: Usage-Based and Rule-Based Approaches to Phonological Variation), 42–56.
[Link]
Domanchin, M., & Guo, Y. (2017). New frontiers in interactive multimodal communication. In A. Georgakopoulou & T. Spilioti (Eds.), The Routledge handbook of language and digital communication. Routledge.
Du Bois, J. W. (2014). Towards a dialogic syntax. Cognitive Linguistics, 25(3), 359–410.
Du Bois, J. W., Kumpf, L. E., & Ashby, W. J. (2003). Preferred argument structure: Grammar
as architecture for function. Philadelphia: John Benjamins Publishing.
Dunn, J. (2019). Frequency vs. association for constraint selection in usage-based construction
grammar. In Proceedings of the Workshop on Cognitive Modeling and Computational
Linguistics, 117–128. Minneapolis, Minnesota: Association for Computational Linguistics.
[Link]
Dunn, J. (2022). Natural language processing for corpus linguistics. Cambridge University Press.
Ferreira, F., Bailey, K. G. D., & Ferraro, V. (2002). Good-enough representations in language
comprehension. Current Directions in Psychological Science, 11(1), 11–15.
[Link]
Fillmore, C. J. (1975). Against checklist theories of meaning. Proceedings of the First Annual
Meeting of the Berkeley Linguistics Society.
Fillmore, C. J. (1985). Frames and the semantics of understanding. Quaderni Di Semantica
6(2), 222-253.
Fillmore, C. J., Kay, P., & O’Connor, M. C. (1988). Regularity and idiomaticity in grammatical
constructions: The case of let alone. Language, 64, 501–538.
Foolen, A. (2012). The relevance of emotion for language and linguistics. Moving Ourselves,
Moving Others: Motion and Emotion in Intersubjectivity, Consciousness and Language,
349–369.
Francis, E., & Michaelis, L. (2017). When relative clause extraposition is the right choice, it’s
easier. Language and Cognition, 9(June), 332–70.
[Link]
French, R. M. (2000). The Turing test: The first 50 years. Trends in Cognitive Sciences, 4(3),
115–122. [Link]
Fried, M. (1994). Grammatical functions in case languages: Subjecthood in Czech. In Annual
Meeting of the Berkeley Linguistics Society, 20, 184–193.
Geeraerts, D. (2006). Words and other wonders: Papers on lexical and semantic topics. Berlin – New York: Mouton de Gruyter.
Gergely, G., Bekkering, H., & Király, I. (2002). Rational imitation in preverbal infants. Nature, 415(6873), 755.
Givón, T. (1979). From discourse to syntax: Grammar as a processing strategy. In Discourse
and syntax (pp. 81-112). Brill.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford University Press. [Link]
Goldberg, A. E. (2016). Subtle implicit language facts emerge from the functions of
constructions. Frontiers in Psychology, 6(January), 1–11.
[Link]
Goldberg, A. E. (2019). Explain me this: Creativity, competition, and the partial productivity of
constructions. Princeton University Press.
Goldberg, A. E., & Abbot-Smith, K. (2021). The constructionist approach offers a useful lens on
language learning in autistic individuals: Response to Kissine. Language, 97(3), e169–
183. [Link]
Goldberg, A. & van der Auwera, J. (2012). This is to count as a construction. Folia Linguistica,
46(1), 109-132. [Link]
Goldberg, A. E., Casenhiser, D. M., & Sethuraman, N. (2004). Learning argument structure
generalizations. Cognitive Linguistics, 14(3), 289–316.
Goldberg, A. E., & Ferreira, F. (2022). Good-enough language production. Trends in Cognitive
Sciences, 26(4), 300–311. [Link]
Goldberg, A. E., & Herbst, T. (2021). The nice-of-you construction and its fragments.
Linguistics, 59(1), 285–318. [Link]
Goldberg, A. E. & Jackendoff, R. (2004). The English resultative as a family of constructions.
Language, 80(3), 532-568.
Goldberg, A. E., & Lee, C. (2021). Accessibility and historical change: An emergent cluster led
uncles and aunts to become aunts and uncles. Frontiers in Psychology, 12.
Goldberg, A. E., & Michaelis, L. A. (2017). One among many: Anaphoric one and its
relationship with numeral one. Cognitive Science, 41(March), 233–258.
[Link]
Gonzálvez-García, F. (2009). The family of object-related depictives in English and Spanish:
Towards a usage-based constructionist analysis. Language Sciences, 31(5), 663–723.
[Link]
Gonzálvez-García, F., & Butler, C. S. (2006). Mapping functional-cognitive space. Annual
Review of Cognitive Linguistics, 4(October), 39–96.[Link]
Grand, G., Blank, I. A., Pereira, F., & Fedorenko, E. (2022). Semantic projection recovers rich
human knowledge of multiple object features from word embeddings. Nature Human
Behaviour, 6(7), 975–987. [Link]
Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural
networks. arXiv. [Link]
Gries, S. T. (2011). Phonological similarity in multi-word units. Cognitive Linguistics, 22(3), 491–510.
Gries, S., & Hilpert, M. (2008). The identification of stages in diachronic data: Variability-based
neighbour clustering. Corpora, 3, 59-81. [Link]
Gries, S. T., & Hilpert, M. (2010). Modeling diachronic change in the third person singular: A multifactorial, verb- and author-specific exploratory approach. English Language and Linguistics, 14(3), 293–320. [Link]
Harmon, Z., & Kapatsinski, V. (2016). Accessibility differences during production drive
semantic (over)-extension. BUCLD.
Haspelmath, M. (2010). Comparative concepts and descriptive categories in crosslinguistic
studies. Language, 86(3), 663–687. [Link]
Hawkins, R. D., Yamakoshi, T., Griffiths, T. L., & Goldberg, A. E. (2020). Investigating
representations of verb bias in neural language models. arXiv.
[Link]
Herbst, T. (2011). The status of generalizations: Valency and argument structure constructions.
ZAA, 4(4), 347–368.
Herrmann, E., & Tomasello, M. (2006). Apes' and children's understanding of cooperative and
competitive motives in a communicative situation. Developmental Science, 9(5), 518-529.
Hilpert, M. (2015). From hand-carved to computer-based: Noun-participle compounding and the
upward strengthening hypothesis. Cognitive Linguistics, 26(1), 113–147.
[Link]
Hopper, P. J., & Thompson, S. A. (1980). Transitivity in grammar and discourse. Language,
56(2), 251-299.
Horn, L. R. (Ed.). (2010). The expression of negation (pp. 73-109). Berlin: Mouton de Gruyter.
Horner, V., & Whiten, A. (2005). Causal knowledge and imitation/emulation switching in
chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal cognition, 8, 164-
181.
Hunston, S., & Francis, G. (2000). Pattern grammar. A corpus-driven approach to the lexical
grammar of English. Benjamins.
Ibbotson, P. (2022). Language acquisition: The basics. Taylor & Francis.
Israel, M. (2001). Minimizers, maximizers and the rhetoric of scalar reasoning. Journal of
Semantics, 18(4), 297–331. [Link]
Jackendoff, R. (2002). English particle constructions, the lexicon, and the autonomy of syntax.
Verb-Particle Explorations, 67–94.
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language,
inference, and consciousness. Harvard University Press.
Kapatsinski, V., & Vakareliyska, C. (2013). [N[N]] compounds in Russian: A growing family of
constructions. Constructions and Frames, 5(1), 69–87.
[Link]
Kemmerer, D. (2011). The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca’s area. Language and Linguistics Compass, 1–17.
Khasbage, Y., Carrión, D. A., Hinnell, J., Robertson, F., Singla, K., Uhrig, P., & Turner, M.
(2022). The red hen anonymizer and the red hen protocol for de-identifying audiovisual
recordings. Linguistics Vanguard, December. [Link]
Kidd, E., Lieven, E. V. M., & Tomasello, M. (2010). Lexical frequency and exemplar-based learning effects in language acquisition: Evidence from sentential complements. Language Sciences, 32(1). [Link]
Kim, J.-B., & Michaelis, L. A. (2020). Syntactic constructions in English. Cambridge University Press.
Kim, J., & Sells, P. (2013). The Korean sluicing: A family of constructions. Studies in
Generative Grammar, 23(1), 103–130.
Klein, E. (2023). This changes everything. New York Times. March 12.
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: finding meaning in the N400
component of the event-related brain potential (ERP). Annual review of psychology, 62,
621-647.
Lakoff, G. (1987). Women, fire, and dangerous things : What categories reveal about the mind.
Chicago: University of Chicago Press.
Lakoff, G. (2014). The all new don't think of an elephant!: Know your values and frame the
debate. Chelsea Green Publishing.
Lambrecht, K. (1994). Information structure and sentence form. Cambridge: Cambridge
University Press.
Langacker, R. W. (1988). A usage-based model. In B. Rudzka-Ostyn (Ed.), Topics in cognitive
linguistics. Philadelphia: John Benjamins.
Langacker, R. W. (1997). Constituency, dependency, and conceptual grouping. Cognitive Linguistics, 8, 1–32.
LaPolla, R. J. (1993). Arguments against ‘subject’ and ‘direct object’ as viable concepts in
Chinese. Bulletin of the Institute of History and Philology.
MacDonald, M. C. (2013). How language production shapes language form and comprehension.
Frontiers in Psychology, 4(April), 1–16. [Link]
Mahowald, K. (2023). A discerning several thousand judgments: GPT-3 rates the article
+ adjective + numeral + noun construction. arXiv preprint arXiv:2301.12564.
Majid, A., Evans, N., Gaby, A., & Levinson, S. (2011). The semantics of reciprocal constructions across languages. In Reciprocals and semantic typology (pp. 29–60).
McClelland, J. L., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M.
S., & Smith, L. B. (2010). Letting structure emerge: Connectionist and dynamical
systems approaches to cognition. Trends in Cognitive Sciences, 14(8), 348–356.
[Link]
McClelland, J. L., Rumelhart, D. E., & PDP Research Group (1986). Parallel distributed
processing. Vol. 2. MIT press Cambridge, MA.
McCoy, R. T., Smolensky, P., Linzen, T., Gao, J., & Celikyilmaz, A. (2021). How much do
language models copy from their training data? Evaluating linguistic novelty in text
generation using RAVEN. arXiv. [Link]
Michaelis, L. A. (2001). Exclamative constructions. In M. Haspelmath, E. König, W.
Österreicher, & W. Raible (Eds.), Language universals and language typology: An
international handbook. Berlin: Walter de Gruyter.
Michaelis, L. A. (2010). Sign-based construction grammar. In T. Hoffmann & G. Trousdale (Eds.), The Oxford handbook of construction grammar.
[Link]
Michaelis, L., & Francis, H. (2007). Lexical subjects and the conflation strategy. The grammar
pragmatics interface: Essays in honor of Jeanette K. Gundel, 155(19).
[Link]
Namboodiripad, S., Cuneo, N., Kramer, M. A., Sedarous, Y., Sugimoto, Y., Bisnath, F., &
Goldberg, A. E. (2022). Backgroundedness predicts island status of non-finite adjuncts in
English. Proceedings of the Annual Meeting of the Cognitive Science Society, 28: 347-
355.
Nieuwland, M. S., & Van Berkum, J. J. (2006). When peanuts fall in love: N400 evidence for
the power of discourse. Journal of cognitive neuroscience, 18(7), 1098-1111.
Ostrovsky, Y., Meyers, E., Ganesh, S., Mathur, U., & Sinha, P. (2009). Visual parsing after recovery from blindness. Psychological Science, 20(12), 1484–1491.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C.,
Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens,
M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training
language models to follow instructions with human feedback. arXiv.
[Link]
Perek, F. (2016). Using distributional semantics to study syntactic productivity in diachrony: A
case study. Linguistics, 54(1), 149-188. [Link]
Piantadosi, S. T., Tily, H., & Gibson, E. (2012). The communicative function of ambiguity in
language. Cognition, 122(3), 280–291. [Link]
Piantadosi, S. T. (2023). Modern language models refute Chomsky’s approach to language.
Rambelli, G., Chersoni, E., Blache, P., & Lenci, A. (2022). Compositionality as an analogical
process: Introducing ANNE. In Proceedings of the Workshop on Cognitive Aspects of the
Lexicon, 78–96.
Roose, K. (2023) A Conversation With Bing’s Chatbot Left Me Deeply Unsettled. New York
Times. February 17.
Ross, J. R. (1973). A fake NP squish. New ways of analyzing variation in English, 96-140.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old
infants. Science, 274(5294), 1926-1928.
Schrimpf, M., Blank, I., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J.,
Fedorenko, E. (2021). Artificial neural networks accurately predict language processing
in the brain. Proceedings of the National Academy of Sciences, 118(45), e2105646118.
[Link]
Searle, J. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-424.
doi:10.1017/S0140525X00005756
Shirtz, S., & Goldberg, A. E. (forthcoming). The English phrase as lemma construction.
Silbert, L. J. (2013). The underlying neural correlates of real-world communication. Princeton University dissertation.
[Link]


Steels, L., & de Beule, J. (2006). A (very) brief introduction to fluid construction grammar.
Proceedings of the Third Workshop on Scalable Natural Language Understanding, 73–80.
New York City, New York.
Steen, F. F., & Turner, M. (2012). Multimodal construction grammar. SSRN Electronic
Journal, no. 2010, 255–274. [Link]
Stephens, G. J., Silbert, L. J., & Hasson, U. (2010). Speaker–listener neural coupling underlies
successful communication. Proceedings of the National Academy of Sciences, 107(32),
14425–14430. [Link]
Suttle, L., & Goldberg, A. E. (2011). The partial productivity of constructions as induction. Linguistics, 49(6), 1237–1269.
Tomasello, M. (2010). Origins of human communication. Cambridge: MIT Press.
Tomasello, M. (2019). Becoming human: A theory of ontogeny. Harvard University Press.
Traugott, E. C. (2014). Toward a constructional framework for research on language change. Cognitive Linguistic Studies, 1(1), 3–21. [Link]
Traugott, E. C., & Trousdale, G. (2013). Constructionalization and constructional changes (Vol. 6). Oxford University Press.
Trips, C., & Kornfilt, J. (2017). Further investigations into the nature of phrasal compounding.
Zenodo. [Link]
Ungerer, T. (2022). Extending structural priming to test constructional relations: Some
comments and suggestions. Yearbook of the German Cognitive Linguistics Association,
10(1), 159–182.
Ungerer, T., & Hartmann, S. (2023). Constructionist approaches: Past, present, future.
PsyArXiv. [Link]
van Dis, E. A. M., Bollen, J., Zuidema, W., van Rooij, R., & Bockting, C. L. (2023). ChatGPT: Five priorities for research. Nature, 614(7947), 224–226. [Link]
van Trijp, R. (2014). Long-distance dependencies without filler−gaps: A cognitive-functional
alternative in fluid construction grammar. Language and Cognition, April, 1–29.
[Link]
van Trijp, R. (2015). Towards bidirectional processing models of sign language: A constructional approach in fluid construction grammar. Proceedings of the EuroAsianPacific Joint Conference on Cognitive Science, 1, 668–673.
Vig, J. (2019). A multiscale visualization of attention in the transformer model. arXiv.
[Link]
Warneken, F., & Tomasello, M. (2007). Helping and cooperation at 14 months of
age. Infancy, 11(3), 271-294.
Weissweiler, L., He, T., Otani, N., Mortensen, D. R., Levin, L., & Schütze, H. (2023). Construction grammar provides unique insight into neural language models. arXiv preprint arXiv:2302.02178.
Willems, R. M., & Hagoort, P. (2007). Neural evidence for the interplay between language, gesture, and action: A review. Brain and Language, 101(3), 278–289.
[Link]
Wray, A. (2013). Formulaic language. Language Teaching, 46(3), 316–334.
[Link]
