Table 16.1. Column headings: Author; Model; All 4 stages; Torque-difference effect.
a. Sage only learned stage 3, not stages 1, 2, and 4.
b. Soar learned stages 1–3, but not stage 4.
c. C4.5 learned all four stages, but to get the correct ordering of the first two stages, it was necessary to list the weight attributes before the distance attributes because C4.5 breaks ties in information gain by picking the first-listed attribute with the highest information gain.
d. To capture the torque-difference effect, C4.5 required a redundant coding of weight and distance differences between one side of the scale and the other. In addition to doing a lot of the work that a learning algorithm should be able to do on its own, this produced strange rules that children never show.
e. The ordering of stages 1 and 2 and the late appearance of addition and torque rules in ACT-R were engineered by programmer settings; they were not a natural result of learning or development. The relatively modern ACT-R model is the only balance-scale simulation to clearly distinguish between an addition rule (comparing the weight + distance sums on each side) and the torque rule.
f. ACT-R showed a torque-difference effect only with respect to differences in distance but not weight, and only in the vicinity of stage transitions, not throughout development as children apparently do.
g. BP oscillated between stages 3 and 4, never settling in stage 4.
that the scale will balance when the two sides have equal weights (Siegler, 1976). In stage 2, children start to use distance information when the weights are equal on each side, predicting that in such cases the side with greater distance will descend. In stage 3, weight and distance information are emphasized equally, and the child guesses when weight and distance information conflict on complex problems. In stage 4, children respond correctly on all problem types. The torque-difference effect is that problems with large torque differences are easier for children to solve than problems with small torque differences (Ferretti & Butterfield, 1986). Torque is the product of weight × distance on a given side; torque difference is the absolute difference between the torque on one side and the torque on the other side.

The ability of several different computational models to capture these phenomena is summarized in Table 16.1. The first four rows in Table 16.1 describe symbolic, rule-based models, and the last two rows describe connectionist models.

In one of the first developmental connectionist simulations, McClelland (1989) found that a static BP network with two groups of hidden units segregated for either weight or distance information developed through the first three of these stages and into the fourth stage. However, these networks did not settle in stage 4, instead continuing to cycle between stages 3 and 4. The first CC model of cognitive development naturally captured all four balance-scale stages, without requiring segregation of hidden units (Shultz, Mareschal et al., 1994).
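The stage rules and the torque-difference effect just described lend themselves to a compact illustration. The sketch below is a hypothetical hand-written implementation of Siegler-style rules, not the code of any model listed in Table 16.1; it only shows how each stage consults different information and how torque difference is computed.

```python
import random

# Hedged sketch of Siegler-style balance-scale rules (illustrative only; not the code
# of any model reviewed here). Weights and distances describe one peg per side.
def compare(left, right):
    return "left" if left > right else "right" if right > left else "balance"

def torque(weight, distance):
    return weight * distance

def predict(stage, wl, dl, wr, dr):
    if stage == 1:                                   # weight information only
        return compare(wl, wr)
    if stage == 2:                                   # distance used only when weights tie
        return compare(wl, wr) if wl != wr else compare(dl, dr)
    if stage == 3:                                   # guess when weight and distance conflict
        w, d = compare(wl, wr), compare(dl, dr)
        if w == d or d == "balance":
            return w
        if w == "balance":
            return d
        return random.choice(["left", "right", "balance"])
    return compare(torque(wl, dl), torque(wr, dr))   # stage 4: the torque rule

# The torque-difference effect concerns |left torque - right torque|:
# children find problems with a large difference easier than those with a small one.
print(abs(torque(3, 2) - torque(2, 4)))              # small torque difference: 2
print(abs(torque(5, 4) - torque(1, 1)))              # large torque difference: 19
```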
The ability of these network models to capture stages 1 and 2 is due to a bias toward equal-distance problems in the training set, that is, problems with weights placed equally distant from the fulcrum on each side. This bias, justified by noting that children rarely place objects at differing distances from a fulcrum but have considerable experience lifting differing numbers of objects, forces a network to emphasize weight information first because weight information is more relevant to reducing network error. When weight-induced error has been reduced, then the network can turn its attention to distance information. A learning algorithm needs to find a region of connection-weight space that allows it to emphasize the numbers of weights on the scale before moving to another region of weight space that allows correct performance on most balance-scale problems. A static network such as BP, once committed to using weight information in stage 1, cannot easily find its way to a stage-4 region by continuing to reduce error. In contrast, a constructive algorithm such as CC has an easier time with this move because each newly recruited hidden unit changes the shape of connection-weight space by adding a new dimension.

Both of the connectionist models readily captured the torque-difference effect. Such perceptual effects are natural for neural models that compute a weighted sum of inputs when updating downstream units. This ensures that larger differences on the inputs create clearer activation patterns downstream at the hidden and output units. In contrast, crisp symbolic rules care more about direction of input differences than about input amounts, so the torque-difference effect is more awkward to capture in rule-based systems.

7. Past Tense

The morphology of the English past tense has generated considerable psychological and modeling attention. Most English verbs form the past tense by adding the suffix -ed to the stem, but about 180 have irregular past-tense forms. Seven psychological regularities have been identified:

1. Children begin to overregularize irregular verbs after having correctly produced them.

2. Frequent irregulars are more likely to be correct than infrequent ones.

3. Irregular and regular verbs that are similar to frequent verbs are more likely to be correct. (Two regularities are combined here into one sentence.)

4. Past-tense formation is quicker with consistent regulars (e.g., like) than with inconsistent regulars (e.g., bake), which are in turn quicker than irregulars (e.g., make).

5. Migrations occurred over the centuries from Old English such that some irregulars became regular and some regulars became irregular.

6. Double-dissociations exist between regulars and irregulars in neurological disorders, such as specific language impairment and Williams syndrome.

The classical rule-rote hypothesis holds that the irregulars are memorized, and the add -ed rule applied when no irregular memory is retrieved (Pinker, 1999), but this has not resulted in a successful published computational model. The ability of several different computational models to capture past-tense phenomena is summarized in Table 16.2. All of the models were trained to take a present-tense verb stem as input and provide the correct past-tense form.

One symbolic model used ID3, a predecessor of the C4.5 algorithm that was discussed in Section 5, to learn past-tense forms from labeled examples (Ling & Marinov, 1993). Like C4.5, ID3 constructs a decision tree in which the branch nodes are attributes, such as a particular phoneme in a particular position, and the leaves are suffixes, such as the phoneme -t. Each example describes a verb stem, for example, talk, in terms of its phonemes and their positions, and is labeled with a particular past-tense ending, for example, talk-t.
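The sketch below makes this encoding concrete. It is purely illustrative (hypothetical letter-based features rather than the phonological coding of Ling and Marinov, 1993), and the hand-written branch simply mirrors the regular English allomorphy (-t after voiceless consonants, -id after t or d, -d otherwise) that a tree induced over such attributes could express.

```python
# Toy sketch of the attribute/label encoding behind decision-tree past-tense learning.
# Hypothetical letter-based features; not the Ling & Marinov (1993) representation.
VOICELESS_FINALS = set("pkfsc")          # crude stand-in for voiceless final segments

def encode(stem):
    """Describe a stem by position-indexed attributes, as branch nodes would test them."""
    return {f"pos{i}": letter for i, letter in enumerate(stem)}

def regular_ending(stem):
    """A hand-written branch of the kind an induced tree could express for regular verbs."""
    final = stem[-1]
    if final in ("t", "d"):
        return "-id"                     # want -> wanted
    if final in VOICELESS_FINALS:
        return "-t"                      # talk -> talk-t
    return "-d"                          # play -> play-d

training_examples = [(encode(s), regular_ending(s)) for s in ["talk", "want", "play"]]
print(training_examples[0])              # ({'pos0': 't', 'pos1': 'a', ...}, '-t')
```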
Table 16.2. Column headings (model authors): Ling & Marinov (1993); Taatgen & Anderson (2002); Plunkett & Marchman (1996); Daugherty & Seidenberg (1992); Hare & Elman (1995); Westermann (1998).
a. CNN is a Constructivist Neural Network model with Gaussian hidden units.
Actually, because there are several such endings, the model used a small grove of trees. Instead of a single rule for the past tense as in the rule-rote hypothesis, this grove implemented several rules for regular verbs and many rules for irregular verbs. Coverage of overregularization in the ID3 model was due to inconsistent and arbitrary use of the m parameter that was originally designed to control the depth of decision trees (Quinlan, 1993). Although m was claimed by the authors to implement some unspecified mental capacity, it was decreased here to capture development, but increased to capture development in other simulations, such as the balance scale (Schmidt & Ling, 1996).

The ACT-R models started with three handwritten rules: a zero rule, which does not change the verb stem; an analogy rule, which looks for analogies to labeled examples and thus discovers the -ed rule; and a retrieval rule, which retrieves the past-tense form from memory (Taatgen & Anderson, 2002). Curiously, this model rarely applied the -ed rule because it mostly used retrieval; the -ed rule was reserved for rare words, novel words, and nonsense words.

As shown in Table 16.2, none of the computational models cover many past-tense phenomena, but collectively, a series of neural-network models do fairly well. Several of these phenomena naturally emerge from neural models, where different past-tense patterns are represented in common hidden units. In these neural models, less frequent and highly idiosyncratic verbs cannot easily resist the pull of regularization and other sources of error. Being similar to other verbs can substitute for high frequency. These effects occur because weight change is proportional to network error, and frequency and similarity effects create more initial error. In symbolic models, memory for irregular forms is searched before the regular rule is applied, thus slowing responses to regular verbs and creating reaction times opposite to those found in people.

8. Object Permanence

A cornerstone acquisition in the first two years is belief in the continued existence of hidden objects. Piaget (1954) found that object permanence was acquired through six stages, that the ability to find hidden objects emerged in the fourth stage between the ages of eight and twelve months, and that a full-blown concept of permanent objects independent of perception did not occur until about two years. Although data collected using Piaget's object-search methods were robust, recent work using different methodologies suggested that he might have underestimated the cognitive abilities of very young infants.
[Figure 16.1. Labels: Object recognition network; Inputs; Hiddens; Visual memory hiddens; Shared hiddens; Reaching and Tracking outputs.]
An influential series of experiments suggested that infants as young as 3.5 months understand the continued existence of hidden objects if tested by where they look rather than where they reach (Baillargeon, 1987). Infants were familiarized to a simple perceptual sequence and then shown two different test events: one that was perceptually more novel but consistent with the continued existence of objects and one that was perceptually more familiar but that violated the notion that hidden objects continue to exist. Infants looked longer at the impossible event, which was interpreted as evidence that they understand the continued existence of occluded objects.

Computational modeling has clarified how infants could reveal an object concept with looking but not by reaching. In one model, perceptual input about occluded objects fed a hidden layer with recurrent connections, which in turn fed two distinct output systems: a looking system and a reaching system (Munakata et al., 1997). Both systems learned to predict the location of an input object, but the reaching system lagged developmentally behind the looking system because of differential learning rates. The same underlying competence (understanding where an object should be) thus led to different patterns of performance, depending on which system was used to assess that competence.

A different model of the lag between looking and reaching (Mareschal, Plunkett, & Harris, 1999) used a modular neural network system implementing the dual-route hypothesis of visual processing (i.e., that visual information is segregated into a what ventral stream and a where dorsal stream). Like the previous model, this one had a shared bank of hidden units receiving input from a recurrent bank of visual-memory inputs. As shown in Figure 16.1, these hidden units fed two output modules: a trajectory-prediction module and a response-integration module. The former was trained to predict the position of an object on a subsequent time step. The latter was trained to combine hidden-unit activations in the trajectory module with object-recognition inputs. Here, the time lag between looking and reaching was explained by the what and where streams needing to be integrated in reaching tasks, but not in looking tasks, which can rely solely on the where stream. The model uniquely predicted developmental lags for any task requiring integration of the two visual routes.
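The differential-learning-rate idea behind the Munakata et al. account can be sketched in a few lines. The code below is a deliberately simplified stand-in (an identity code for the shared representation and a plain delta rule), not the published architecture or parameters; it only shows how one shared representation can support a fast-learning looking system and a slow-learning reaching system.

```python
import numpy as np

# Schematic sketch of one shared representation feeding a fast "looking" head and a
# slow "reaching" head (simplified; not the Munakata et al., 1997, model itself).
n_loc = 4
H = np.eye(n_loc)                  # stand-in for the shared hidden code of each hiding place
targets = np.eye(n_loc)            # both systems should predict the object's location

W_look = np.zeros((n_loc, n_loc))
W_reach = np.zeros((n_loc, n_loc))
lr_look, lr_reach = 0.3, 0.05      # the reaching system learns more slowly

for epoch in range(20):
    for i in range(n_loc):
        for W, lr in ((W_look, lr_look), (W_reach, lr_reach)):
            error = targets[i] - H[i] @ W
            W += lr * np.outer(H[i], error)          # delta rule on the shared code

print("looking error: ", np.mean((targets - H @ W_look) ** 2))
print("reaching error:", np.mean((targets - H @ W_reach) ** 2))
# Same underlying competence (location is recoverable from H), different performance:
# early in training the reaching weights lag behind the looking weights.
```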
Although both models simulated a lag between looking and reaching, some embodied simulations integrated the what and where
equation, the details of which can be found elsewhere (Thelen et al., 2001). Somewhat informally,

    activation = −decay + cooperativity + h + noise + task + cue + reach memory     (16.3)

where decay was a linear decrease in activation, cooperativity included both local excitation and distant inhibition integrated over field positions, h was the resting activation level of the field, Gaussian noise ensured that activations were probabilistic rather than deterministic, task reflected persisting features of the task environment, cue reflected the location of the object or attention to a specific cover, and reach memory reflected the frequency and recency of reaching in a particular direction. Development relied on the resting activation level of the field. When h was low, strong inputs predominated and activation was driven largely by inputs and less by local interactions, a condition known as noncooperation. When h was large, many field sites interacted, and local excitation was amplified by neighboring excitation and distant inhibition, allowing cooperation and self-sustained excitation, even without continual input. Parameter h was set to −6 for cooperation and −12 for noncooperation. All parameters but h were differential functions of field position and time. Variation of the cue parameter implemented different experimental conditions. Other parameters were held constant, but estimated to fit psychological data.

The differential equation simulated up to 10 sec of delay in steps of 50 msec. An above-threshold activation peak indicated a perseverative reach when centered on the A location or a correct reach when centered on the B location. This idea was supported by findings that activity in populations of neurons in monkey motor and premotor cortex became active in the 150 msec between cue and reach, and predicted the direction of ordinary reaching (Amirikian & Georgopoulos, 2003). See Figure 4.3 in this volume, which shows that neighboring excitation sustains local activation peaks whereas global inhibition prevents diffusion of peaks and stabilizes against competing inputs.

In simulation of younger infants, implemented by noncooperation (h = −12), a cue to location B initially elicited activation, which then decayed rapidly, allowing memory of previous reaching to A to predominate. But in simulations of young infants allowed to reach without delay, the initial B activation tended to override memory of previous A reaches. In simulation of older infants, implemented by cooperation (h = −6), the ability to sustain initial B activation across delays produced correct B reaches despite memory of reaching to A. This model suggested that the A-not-B error has more to do with the dynamics of reaching for objects than with the emergence of a concept of permanence.
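A minimal numerical sketch of this kind of field dynamics is given below. It is an illustrative, highly simplified Amari-style field with made-up parameters, not the Thelen et al. (2001) implementation: activation over reach directions is integrated in 50-msec steps with decay toward a resting level h, local excitation with broad inhibition, noise, a weak memory input at A, and a brief cue at B. With a noncooperative resting level the B activation dies away after the cue, whereas with a cooperative resting level it can sustain itself.

```python
import numpy as np

# Simplified dynamic-field sketch in the spirit of equation (16.3). Illustrative
# parameters only; not the Thelen et al. (2001) model.
def simulate(h, steps=200, dt=0.05, tau=1.0):                # 200 steps of 50 msec = 10 sec
    rng = np.random.default_rng(1)
    x = np.arange(100)                                       # field positions (reach directions)
    u = np.full(100, float(h))                               # start at the resting level h
    excite = 5.0 * np.exp(-0.5 * ((x - 50) / 3.0) ** 2)      # local excitation kernel
    inhibit = 0.6                                            # broad (global) inhibition strength
    memory_A = 0.5 * np.exp(-0.5 * ((x - 30) / 3.0) ** 2)    # memory of prior reaches to A
    cue_B = 8.0 * np.exp(-0.5 * ((x - 70) / 3.0) ** 2)       # transient cue at location B
    for t in range(steps):
        f = 1.0 / (1.0 + np.exp(-4.0 * u))                   # soft threshold of field output
        coop = np.convolve(f, excite, mode="same") - inhibit * f.sum()
        inputs = memory_A + (cue_B if t < 60 else 0.0)       # cue present for the first 3 sec
        noise = 0.1 * rng.normal(size=u.size)
        u += (dt / tau) * (-u + h + coop + inputs + noise)   # Euler step of the field equation
    return u

noncoop = simulate(h=-12.0)     # "younger infant" regime
coop = simulate(h=-6.0)         # "older infant" regime
print("final activation near B, h=-12:", round(noncoop[65:76].max(), 1))
print("final activation near B, h=-6: ", round(coop[65:76].max(), 1))
# With these illustrative settings, only the cooperative field keeps an above-zero
# peak at B after the cue is gone; the noncooperative field decays back toward rest.
```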
Table 16.3: Model coverage of the A-not-B error in object permanence

Regularity              Munakata (1998)    Thelen et al. (2001)
Age                     yes                yes
Delay                   yes                yes
Décalage                yes                no
Distinctiveness         yes                yes
Multiple locations      yes                no
Reaching to A           yes                no
Interestingness         no                 no
Objectless              yes                yes
Object helpful          yes                no
Adult error             no                 no

Comparative coverage of the psychological regularities by these two models is indicated in Table 16.3. The neural-network model covered almost all of these regularities, and it is possible that the dynamic-system model could also achieve this by manipulation of its existing parameters. It would be interesting to see if the dynamic-system model could be implemented in a neurally plausible way. Both models were highly designed by fixing weights in the neural model and by writing equations and fitting parameters in the dynamic-system model. Modeler-designed parameter changes were used to implement age-related development in both models.

9. Artificial Syntax

An issue that attracted considerable simulation activity concerns whether cognition should be interpreted in terms of symbolic rules or subsymbolic neural networks. For example, it was argued that infants' ability to distinguish one syntactic pattern from another could only be explained by a symbolic rule-based account (Marcus et al., 1999). After being familiarized to sentences in an artificial language with a particular syntactic pattern (such as ABA), infants preferred to listen to sentences with an inconsistent syntactic form (such as ABB). The claim about the necessity of rule-based processing was contradicted by a number of neural-network models showing more interest in novel than familiar syntactic patterns (Altmann & Dienes, 1999; Elman, 1999; Negishi, 1999; Shultz, 1999; Shultz & Bale, 2001; Sirois, Buckingham, & Shultz, 2000). This principal effect from one simple experiment is rather easy for a variety of connectionist learning algorithms to cover, probably due to their basic ability to learn and generalize. In addition to this novelty preference, there were a few infants who exhibited a slight familiarity preference, as evidenced by slightly more recovery to consistent novel sentences than to familiar sentences.

One of the connectionist simulations (Shultz & Bale, 2001) was replicated (Vilcu & Hadley, 2005) using batches of CC encoder networks, but it was claimed that this model did not generalize well and merely learned sound contours rather than syntax. Like other encoder networks, these networks learned to reproduce their inputs on their output units. Discrepancy between inputs and outputs is considered as error, which networks learn to reduce. Infants are thought to construct an internal model of stimuli to which they are being exposed and then differentially attend to novel stimuli that deviate from their representations (Cohen & Arthur, 1983). Because neural learning is directed at reducing the largest sources of error, network error can be considered as an index of future attention and learning.

The CC simulations captured the essentials of the infant data: more interest in sentences inconsistent with the familiar pattern than in sentences consistent with that pattern and occasional familiarity preferences (Shultz & Bale, 2001). In addition, CC networks showed the usual exponential decreases in attention to a repeated stimulus pattern that are customary in habituation experiments and generalized both inside and outside of the range of training patterns. Follow-up simulations clarified that CC networks were sensitive to both phonemic content and syntactic structure, as infants probably are (Shultz & Bale, 2006).

A simple AA network model contained a single layer of interconnected units, allowing internal circulation of unit activations over multiple time cycles (Sirois et al., 2000). After learning the habituation sentences with a delta rule, these networks needed more processing cycles to learn inconsistent than consistent test sentences. The mapping of processing cycles to recovery from habituation seems particularly natural in this model.

A series of C4.5 models failed to capture any essential features of the infant data (Shultz, 2003). C4.5 could not simulate familiarization because it trivially learned to expect the only category to which it was exposed. When trained instead to discriminate the syntactic patterns, it did not learn the desired rules except when these rules were virtually encoded in the inputs.

Three different SRN models covered the principal finding of a novelty preference, but two of these models showed such a strong novelty preference that they would not likely show any familiarity preference. Two of these SRN models also were not replicated by other researchers (Vilcu & Hadley, 2001, 2005). Failure to replicate seems surprising with computational models and probably deserves further study.
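The shared logic of these encoder-network accounts (reconstruction error as an index of attention) can be sketched compactly. The code below is illustrative only, not any of the published simulations: a converged linear encoder is computed in closed form as a projection onto the principal subspace of ABA familiarization sentences, and reconstruction error is then compared for consistent (ABA) and inconsistent (ABB) test sentences built from novel words.

```python
import numpy as np

# Illustrative sketch of the encoder-network logic (not any published simulation):
# reconstruction error on a test sentence serves as an index of attention/interest.
rng = np.random.default_rng(0)
d = 5                                                    # size of each word's sound vector
train_words = rng.normal(size=(6, d))                    # familiarization vocabulary
novel_a, novel_b = rng.normal(size=(2, d))               # novel words for the test phase

def sentence(w1, w2, w3):
    return np.concatenate([w1, w2, w3])

# Familiarization sentences all follow the ABA pattern.
train = np.array([sentence(train_words[i], train_words[j], train_words[i])
                  for i in range(6) for j in range(6) if i != j])
mean = train.mean(axis=0)

# A converged linear autoencoder reconstructs inputs by projecting them onto the
# principal subspace of its training data; compute that projection directly.
components = np.linalg.svd(train - mean, full_matrices=False)[2][:10]
P = components.T @ components                            # rank-10 encoder/decoder projection

def interest(s):
    centered = s - mean
    return float(np.sum((centered - P @ centered) ** 2)) # reconstruction error

print("consistent ABA test error:  ", round(interest(sentence(novel_a, novel_b, novel_a)), 3))
print("inconsistent ABB test error:", round(interest(sentence(novel_a, novel_b, novel_b)), 3))
# The larger error for the inconsistent form corresponds to the infants' novelty preference.
```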
Table 16.4. Column headings: Author; Model; Novelty preference; Familiarity preference; Simulation replicated.
Table 16.5: Coverage of the shift from features to correlations in category learning (column heading: Effect).
target values for all training patterns. Thus, a lower score-threshold parameter produces deeper learning.

In contrast to these successful models, a wide range of topologies of ordinary BP networks failed to capture any of these effects (Shultz & Cohen, 2004). Comparative patterns of data coverage are summarized in Table 16.5.

The three successful models shared several commonalities. They all employed unsupervised (or self-supervised in the case of encoder networks) connectionist learning, and they explained apparent qualitative shifts in learning by quantitative variation in learning parameters. Also, the Shrink-BP and Sustain (in the randomized-input version) models both emphasized increased visual acuity as an underlying cause of learning change. Finally, both the Sustain and CC models grew in computational power.

When CC networks with a low score threshold were repeatedly tested over the familiarization phase, they predicted an early similarity effect followed by a correlation effect. Tests of this habituation prediction found that ten-month-olds who habituated to training stimuli looked longer at uncorrelated than correlated test stimuli, but those who did not habituate did the opposite, looking longer at correlated than uncorrelated test stimuli (Cohen & Arthur, 2003).

CC might be preferred over Sustain because: (a) the effects in Sustain are smaller than in infants, necessitating 10,000 networks to reach statistical significance, whereas the number of CC networks matched the nine infants run in each condition, (b) parameter values had to be optimally fit in Sustain, but not in CC, and (c) like the well-known ALCOVE concept-learning algorithm, Sustain employs an attention mechanism that CC does not require. One could say that Sustain attends to learn, whereas CC learns to attend. Likewise, CC seems preferable over Shrink-BP because CC learns much faster (tens vs. thousands of epochs), thus making a better match to the few minutes of familiarization in the infant experiments. CC is so far the only model to capture the habituation effect across an experimental session at a single age level. Shrink-BP and Sustain would not capture this effect because their mechanisms operate over ages, not over trials; CC mechanisms operate over both trials and ages.

11. Discrimination-Shift Learning

Discrimination-shift learning tasks stretch back to early behaviorism (Spence, 1952) and have a substantial human literature with robust, age-related effects well suited to learning models. In a typical discrimination-shift task, a learner is shown pairs of stimuli with mutually exclusive attributes along two perceptual dimensions (e.g., a black square and a white circle or a white square and a black circle, creating four stimulus pairs when left-right position is counterbalanced). The task involves learning to pick the consistently rewarded stimulus in each pair, where reward is linked to an attribute (e.g., black). When the learner consistently picks the target stimulus (usually eight times or more in ten consecutive trials), various shifts in reward contingencies
when three examples were provided but only one of them was a randomly selected instance of the word meaning (the other two examples having been chosen by the learner), subjects generalized more broadly, to the basic level, just as in previous studies that had provided only one example of the word's extension. The authors argued that other theories and models that do not explicitly address example sampling could not naturally account for these results.

Other, related phenomena that are attracting modeling concern shape and material biases in generalizing new word meanings. Six regularities from the psychological literature deserve model coverage:

1. Shape and material bias. When shown a single novel solid object and told its novel name, 2.5-year-olds generalized the name to objects with the same shape. In contrast, when shown a single novel nonsolid substance and told its novel name, children of the same age typically generalized the name to instances of the same material (Colunga & Smith, 2003; Imai & Gentner, 1997; Soja, Carey, & Spelke, 1992). These biases are termed overhypotheses because they help to structure a hypothesis space at a more specific level (Goodman, 1955/1983).

2. Development of shape bias and material bias. The foregoing biases emerge only after children have learned some names for solid and nonsolid things (Samuelson & Smith, 1999). One-year-old infants applied a novel name to objects identical to the trained object but not to merely similar objects. Furthermore, the training of infants on object naming typically requires familiar categories and multiple examples. This is in contrast to 2.5-year-olds' attentional shifts being evoked by naming a single novel example.

3. Shape bias before material bias. At two years, children exhibit shape bias on these tasks, but not material bias (Imai & Gentner, 1997; Kobayashi, 1997; Landau, Smith, & Jones, 1988; Samuelson & Smith, 1999; Soja, Carey, & Spelke, 1991; Subrahmanyam, Landau, & Gelman, 1999).

4. Syntax. Name generalization in these tasks is influenced by syntactic cues marking the noun as a count noun or mass noun (Dickinson, 1988; Imai & Gentner, 1997; Soja, 1992). If an English noun is preceded by the article a or the, it yields a shape bias, but if preceded by some or much it shows a material bias.

5. Ontology bias. Names for things tend to not refer to categories that span the boundary between solids and nonsolids, for example, water versus ice (Colunga & Smith, 2005). This underscores greater complexity than a mere shape bias for solids and material bias for nonsolids. Solid things do not typically receive the same name as nonsolid stuff does.

6. Material-nonsolid bias. In young children, there is an initial material bias for nonsolids (Colunga & Smith, 2005).

All six of these phenomena were covered by a constraint-satisfaction neural network trained with contrastive Hebbian learning (see Figure 2.2 in this volume) that adjusts weights on the basis of correlations between unit activations (Colunga & Smith, 2005). Regularities 5 and 6 were actually predicted by the network simulations before being documented in children. Each word and the solidity and syntax of each example were represented locally by turning on a particular unit. Distributed activation patterns represented the shape and material of each individual object or substance. Hidden units learned to represent the correlations between shape, material, solidity, syntax, and words. After networks learned a vocabulary via examples that paired names with perceptual instances, they were tested on how they would categorize novel things. Statistical distributions of the training patterns matched adult judgments (Samuelson & Smith, 1999). The recurrent connection scheme is illustrated by the arrows in Figure 16.2.

[Figure 16.2. Topology of the network used by Colunga and Smith (2005). (Adapted with permission.) Units: Word, Syntax, Shape, Material, Solidity, and hidden units.]
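The weight update at the heart of contrastive Hebbian learning can be stated in a few lines. The sketch below is schematic (it omits the settling dynamics and is not the Colunga and Smith implementation): weights are nudged toward the unit-activation correlations observed when inputs and outputs are clamped to a training example, and away from the correlations the freely settling network produces on its own.

```python
import numpy as np

# Schematic core of a contrastive-Hebbian-style update (settling dynamics omitted;
# not the Colunga & Smith, 2005, implementation).
def chl_update(weights, clamped_acts, free_acts, lr=0.1):
    """clamped_acts and free_acts are (n_examples x n_units) activation matrices
    recorded in the clamped (plus) and free (minus) phases, respectively."""
    clamped_corr = clamped_acts.T @ clamped_acts / len(clamped_acts)  # <a_i a_j> plus phase
    free_corr = free_acts.T @ free_acts / len(free_acts)              # <a_i a_j> minus phase
    return weights + lr * (clamped_corr - free_corr)                  # Hebbian minus anti-Hebbian
```

Repeated over many examples, an update of this form strengthens connections between units whose activations reliably co-occur in the data, which is how correlated cues such as solidity and shape-based naming can come to be linked through the hidden units.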
A hierarchical Bayesian model covered the mature shape bias and material bias described in regularity 1 and probably could be extended to cover regularities 4 and 5 (Kemp, Perfors, & Tenenbaum, 2007). Such models include representations at several levels of abstraction and show how knowledge can be acquired at levels remote from experiential data, thus providing for both top-down and bottom-up learning. Hypotheses at some intermediate level are conditional on both data at a lower level and overhypotheses at a higher level (see Chapter 3 in this volume).

This model also generated several predictions that may prove to be somewhat unique. For example, the optimal number of examples per category is two, assuming a fixed number of total examples. Also, learning is sometimes faster at higher than lower levels of abstraction, thus explaining why abstract knowledge might appear to be innate even when it is learnable. This is likely to happen in situations when a child encounters sparse or noisy observations such that any individual observation is difficult to interpret, although the observations taken together might support some hypothesis.

As is typical, this Bayesian model is pitched at a computational level of analysis, whereas connectionist models operate at more of an implementation level. As such, a computational-level Bayesian model may apply to a variety of implementations. The other side of this coin is that Bayes' rule does not generate representations – it instead computes statistics over structures designed by the modelers. In contrast, connectionist approaches sometimes are able to show how structures emerge.

Kemp et al. (2007) note that a common objection is that the success of Bayesian models depends on the modeler's skill in choosing prior probabilities. Interestingly, hierarchical Bayesian models can solve this problem because abstract knowledge can be learned rather than specified in advance.
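A toy version of overhypothesis learning makes the point concrete. The sketch below is far simpler than the Kemp et al. models (two made-up hypotheses about categories in general, with invented numbers): after observing several familiar categories whose members all share a shape, the learner comes to favor the abstract hypothesis that categories are shape-homogeneous, so a single exemplar of a brand-new category already supports generalizing its name by shape.

```python
# Minimal hierarchical-Bayes sketch of learning an overhypothesis (illustrative only;
# far simpler than the models of Kemp, Perfors, & Tenenbaum, 2007).
S = 10             # number of possible shapes
n_cats, m = 5, 3   # training data: 5 familiar categories, 3 exemplars each, all same-shaped

prior = {"homogeneous": 0.5, "mixed": 0.5}
likelihood = {
    # homogeneous: a category picks one shape (1/S) and all its exemplars copy it
    "homogeneous": (1.0 / S) ** n_cats,
    # mixed: every exemplar's shape is drawn independently and uniformly
    "mixed": (1.0 / S) ** (n_cats * m),
}
evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

# With the overhypothesis learned, one exemplar of a new category already predicts
# that the next member will share its shape.
p_share = posterior["homogeneous"] * 1.0 + posterior["mixed"] * (1.0 / S)
print(posterior)
print("P(next member of a new category shares its shape) =", round(p_share, 4))
```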
A final point is that the correlations between syntax, solidity, shape, and material that underlie learning and generalization in this domain are far from perfect (Samuelson & Smith, 1999). For example, according to adult raters, bubble names a nonsolid but shape-based category; soap names a solid but material-based category; crayon, key, and nail name categories that are based on both shape and material. The many exceptions to these statistical regularities suggest that symbolic rule-based models would be nonstarters in this domain.

Particularly challenging to learn are so-called deictic words, such as personal pronouns, whose meaning shifts with point of view. Although most children acquire personal pronouns such as me and you without notable errors (Charney, 1980; Chiat, 1981; Clark, 1978), a small minority of children show persistent pronoun errors before getting them right (Clark, 1978; Oshima-Takane, 1992; Schiff-Meyers, 1983). The correct semantics are such that me refers to the person using the pronoun and you refers to the person who is addressed by the pronoun (Barwise & Perry, 1983). Unlike most words, the referent of these pronouns is not fixed, but instead shifts with conversational role. Although a mother calls herself me and calls her child you, these pronouns must be reversed when the child utters them. Because the referent of a personal pronoun shifts with conversational role, an imitative model for correct usage can be difficult to find. If children simply imitated what they heard in speech that was directly addressed to them, they would incorrectly refer to themselves as you and to the mother as me. These are indeed the typical errors made by a few children before sorting out the shifting references. A challenge for computational modelers is to explain both this rare sequence and the
each pronoun-property pair. The highest significant chi-square value indicated the meaning of the utterance. Results revealed that you as addressee was acquired first and more strongly than I as speaker. Although the robot's distinction between I and you captures the correct semantics, it is not generally true that children acquire second- before first-person pronouns. If anything, there is a tendency for children to show the reverse order: first-person pronouns before second-person pronouns (Oshima-Takane, 1992; Oshima-Takane et al., 1996).

In a simulation of blind children, the robot in another condition could not see which of the humans had the ball, but could sense whether it (the robot) had the ball. This blind robot fell into the reversal error of interpreting you as the robot, as do young blind children (Andersen, Dunlea, & Kekelis, 1984).

The game of catch seems like an interesting and natural method to facilitate personal pronoun acquisition. Although tabulating word-property counts in 2 × 2 tables and then analyzing these tables with chi-square tests is a common technique in the study of computational semantics, it is unclear whether this could be implemented with neural realism.

A major difference between the psychology experiment and neural-network model on the one hand and the robotic model on the other hand concerns the use of gestures. Both the psychology experiment and the neural-network model liberally used pointing gestures to convey information about the referent, as well as eye-gaze information, to convey information about the addressee. In contrast, the robotic model eschewed gestures on the grounds that pointing is rude, unnecessary, and difficult for robots to understand. Paradoxically then, even though developmental robotics holds the promise of understanding embodied cognition, this robotic model ignored both gestural and eye-gaze information. The game of catch, accompanied by appropriate verbal commentary, nicely compensated for the absence of such information in the robot. However, humans are well known to both use and interpret gestures to complement verbal communication (Goldin-Meadow, 1999), and deictic (or pointing) gestures (McNeill, 1992) are among the first to appear in young children, as early as ten months of age (Bates, 1979). Hence, future humanoid robotic modelers might want to incorporate gesture production and interpretation in an effort to more closely follow human strategies.

It would seem interesting to explore computational systems that could learn all the functions relating speaker, addressee, referent, and pronoun to each other as well as extended functions that included third-person pronouns. Could learning some of these functions afford inferences about other functional relations? Or would every function have to be separately and explicitly learned? Bayesian methods and neural networks with recurrent connections might be good candidates for systems that could make inferences in various directions.

13. Abnormal Development

One of the promising ideas supported by connectionist modeling of development is that developmental disorders might emerge from early differences that lead to abnormal developmental trajectories (Thomas & Karmiloff-Smith, 2002). One such simulation was inspired by evidence that larger brains favor local connectivity and smaller brains favor long-distance connections (Zhang & Sejnowski, 2000) and that children destined to become autistic show abnormally rapid brain growth in the months preceding the appearance of autistic symptoms (Courchesne, Carper, & Akshoomoff, 2003). Neural networks modeled the computational effects of such changes in brain size (Lewis & Elman, 2008). The networks were feed-forward pattern associators, trained with backpropagation of error.
[Figure 16.3: left and right hemispheres, each with 5 inputs, 10 hidden units, and 5 outputs.]
As pictured in Figure 16.3, each of two hemispheres of ten units was fed by a bank of five input units. Units within a hemisphere were recurrently connected, and two units in each hemisphere were fully connected across hemispheres. Each hemisphere, in turn, fed a bank of five output units. Both inter- and intra-hemispheric connections exhibited conduction delays, implemented by periodically adding or subtracting copy units forming a transmission chain – the more links in the chain, the longer the conduction delay. The networks simulated inter-hemispheric interaction by growing in spatial extent, with consequent transmission delays, at the rate of either typically developing children or those in the process of becoming autistic.

Those networks that simulated autistic growth (marked by rapid increases in the space taken up by the network) were less affected by removal of inter-hemispheric connections than those networks that grew at a normal rate, indicating a reduced reliance on long-distance connections in the autistic networks. As these differences accelerated, they were reflected in declining connectivity and deteriorating performance. The simulation offers a computational demonstration of how brain overgrowth could produce neural reorganization and behavioral deficits.
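The copy-unit implementation of conduction delay can be sketched in a few lines. The code below is schematic (not the Lewis and Elman implementation): each link in the chain simply passes its activation on one time step later, so a network that has grown to span more space, and therefore needs more links, delivers the same signal later.

```python
from collections import deque

# Sketch of conduction delay as a chain of copy units (schematic; not the Lewis &
# Elman, 2008, implementation). Each link re-emits its input one time step later.
def transmit(signal, n_links):
    chain = deque([0.0] * n_links)           # activations currently held by the copy units
    received = []
    for value in signal:
        chain.append(value)                  # the signal enters the first copy unit
        received.append(chain.popleft())     # the value leaving the last copy unit this step
    return received

pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(transmit(pulse, n_links=2))            # normal spacing: the pulse arrives after 2 steps
print(transmit(pulse, n_links=5))            # overgrowth: the same pulse arrives after 5 steps
```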
In a similar vein, researchers have examined the role of initial conditions in developmental dyslexia (Harm & Seidenberg, 1999), specific language impairments (Hoeffner & McClelland, 1994), and Williams syndrome (Thomas & Karmiloff-Smith, 2002).

14. Conclusions

14.1. Computational Diversity

Computational modeling of development is now blessed with several different techniques. It started in the 1970s with production systems, which were joined in the late 1980s by neural networks. But by the early twenty-first century, there were also dynamic system, robotic, and Bayesian approaches to development. This diversity is welcome because each approach has already made valuable contributions to the study of development, just as they have in other areas of psychology. All of these approaches have contributed to the notion that an understanding of development can be facilitated by making theoretical ideas precise and systematic, covering various phenomena of interest, linking several different findings together, explaining them, and predicting new phenomena. Such activities significantly accelerate the scientific process.

Production systems are to be admired for their precision and clarity in specifying both knowledge representations and processes that operate on these representations to produce new knowledge.
Connectionist systems have the advantage of graded knowledge representations and relative closeness to biological neural systems in terms of activation functions, connectivity, and learning processes. Dynamic systems illustrate how the many different features of a complex computational system may interact to yield emergent behavior and ability. Developmental robotics forces modelers to deal with the complexities of real environments and the constraints of operating in real time. Bayesian methods contribute tools for making inferences despite incomplete and uncertain knowledge.

14.2. Complementary Computation

These different modeling approaches tend to complement each other, partly by being pitched at various levels. Marr's (1982) levels of computational analysis, imperfect as they are, can be used to explore this. Marr argued that explanations of a complex system can be found in at least three levels: analysis of the system's competence, design of an algorithm and representational format, and implementation. Analyzing a system's competence has been addressed by task analysis in symbolic approaches, differential equations in a dynamic system approach, and Bayesian optimization.

Every computational model must cope with the algorithmic level. Symbolic rule-based models do this with the mechanics of production systems: the matching and firing of rules, and the consequent updating of working memory. In neural networks, patterns of unit activations represent active memory, whereas weight updates represent the learning of long-term memory. Activation fields geometrically represent the changing positions of a dynamic system. Bayesian approaches borrow structures from symbolic approaches and compute statistics over these structures to identify the most probable hypothesis or structure given current evidence.

The implementation level can be taken to refer to the details of how a particular model is instantiated. In this context, higher-level approaches, such as production systems (Lebiere & Anderson, 1993) and dynamic systems (van Gelder, 1998), have sometimes been implemented as neural networks. One can also be concerned with how a system is implemented in neural tissue. Of the approaches considered in this chapter, neural networks come closest to simulating how this might be done because these networks were largely inspired by principles of how the brain and its neurons work. Growth of brain structure within a network and integration of brain structures across networks have both been stressed in this and other reviews (Westermann et al., 2006). As noted, dynamic systems can also be inspired by neuroscience discoveries. There is, of course, a continuum of neural realism in such implementations.

If the different modeling approaches do exist at different levels, wouldn't it make sense to use the lowest level to obtain the finest grain of detail, perhaps to a biologically realistic model of actual neural circuits? Remembering the reductionist cruncher argument (Block, 1995), the answer would be negative because different levels may be better for different purposes. It is preferable to work at the most appropriate level for one's goals and questions, rather than always trying to reduce to some lower level. Nonetheless, one of the convincing rationales for cognitive science was that different levels of analysis can constrain each other, as when psychologists try to build computational models that are biologically realistic.

14.3. Computational Bakeoffs

Even if computational algorithms exist at somewhat different levels, so-called bakeoff competitions are still possible and interesting, both between and within various approaches. This is because different approaches and models sometimes make different, and even conflicting, predictions about psychological phenomena. Focusing on phenomena that have attracted a lot of modeling attention, as in this chapter, provides some ready-made bakeoff scenarios.
Symbolic and connectionist models were sharply contrasted here in the cases of the balance scale, past tense, artificial syntax, and pronouns. In balance-scale simulations, rule-based models, but not connectionist models, had difficulty with the torque-difference effect. This is a graded, perceptual effect that is awkward for crisp symbolic rules but natural for neural systems with graded representations and update processes that propagate these gradations. Past-tense formation was likewise natural for neural approaches, which can implement regularities and exceptions in a homogeneous system and thus capture various phenomena by letting them play off against each other, but awkward for symbolic systems that isolate rule processing from other processes. Several connectionist models captured the novelty preference in learning an artificial syntax, but the one rule-based approach that was tried could not do so. Although no rule-based models have yet been applied to pronoun acquisition, the graded effects of variation in amount of experience with overheard versus directly addressed speech would pose a challenge to rule-based models.

Attractive modeling targets, such as the balance scale, artificial syntax, similarity-to-correlation shift, and discrimination shift also afforded some bakeoff competitions within the neural approach in terms of static (BP) versus constructive (CC) network models. On the balance scale, CC networks uniquely captured final, stage-4 performance and did so without having to segregate inputs by weight and distance. CC also captured more phenomena than did static BP models in simulations of the similarity-to-correlation shift. This was probably because CC naturally focused first on identifying stimulus features while underpowered and only later with additional computational power abstracted correlations among these features. In discrimination-shift learning, the advantage of CC over static BP was a bit different. Here, BP modelers were led to incorrect conclusions about the inability of neural networks to learn a mediated approach to this problem by virtue of trying BP networks with designed hidden units. Because CC networks only recruit hidden units as needed, they were able to verify that these simple learning problems were actually linearly separable, suggesting that hidden units were making learning more difficult than it needed to be. Other constructive versus static network competitions have also favored constructive networks on developmental problems (Shultz, 2006). To simulate stages and transitions between stages, there is an advantage in starting small and increasing in computational power as needed.

The notion of underlying qualitative changes causing qualitative changes in psychological functioning differs from the idea of underlying small quantitative changes causing qualitative shifts in behavior, as in mere weight adjustment in static neural networks or quantitative changes in dynamic-system parameters. There are analogous qualitative structural changes at the neurological level in terms of synaptogenesis and neurogenesis, both of which have been demonstrated to be under the control of pressures to learn in mature as well as developing animals (Shultz, Mysore, & Quartz, 2007). The CC algorithm is neutral with respect to whether hidden-unit recruitment implements synaptogenesis or neurogenesis, depending on whether the recruit already exists in the system or is freshly created. But it is clear that brains do grow in structure and there seem to be computational advantages in such growth, particularly for simulating qualitative changes in behavioral development (Shultz, 2006).

This is not to imply that static connectionist models do not occupy a prominent place in developmental modeling. On the contrary, this review highlights several cases in which static networks offered compelling and informative models of developmental phenomena. Static networks may be particularly appropriate in cases for which evolution has prepared organisms with either network topologies or a combination of connection weights and topologies (Shultz & Mareschal, 1997). When relevant biological constraints are known, as in a model of object permanence (Mareschal et al., 1999), they can guide design of static network topologies.
of network evolution itself has been mod- problem was not so easy for C4.5, though,
eled (Schlesinger et al., 2000). Ultimately, which could not capture any phenomena
models showing how networks evolve, from the infant experiment. Moving to re-
develop, and learn would be a worthy alistically complex syntactic patterns will
target. likely prove challenging for all sorts of
The more recently applied modeling models.
techniques (dynamic systems, developmen- Simulation of abnormal development has
tal robotics, Bayesian) do not yet have a number of promising connectionist mod-
enough bakeoff experience to draw firm els, but it is too early to tell which particular
conclusions about their relative effective- approaches will best capture which particu-
ness in modeling development. For example, lar developmental disorders.
in the A-not-B error, the dynamic system ap-
proach seemed promising but did not cover
14.4. Development via Parameter
as many phenomena as BP networks did.
Settings?
However, as noted, this dynamic system has
several parameters whose variation could be Some of the models reviewed in this chapter
explored to cover additional phenomena. simulated development with programmer-
Likewise, although Bayesian approaches designed parameter changes. Variations in
are only just starting to be applied to de- such parameter settings were used to im-
velopment, they have already made some plement age-related changes in both con-
apparently unique predictions in the do- nectionist and dynamic-systems models of
main of word learning: inferences allowed the A-not-B error, the CC model of dis-
by random sampling of examples and esti- crimination-shift learning, all three models
mates of the optimal number of examples of the similarity-to-correlation shift, and the
for Bayesian inference. Also in the word- autism model. Granted that this technique
learning domain, the Bayesian approach captured developmental effects and ar-
covered only a portion of the shape-and- guably could be justified on various grounds,
material-bias phenomena covered by the but does it really constitute a good explana-
neural-network model. Nonetheless, the hi- tion of developmental change? Or is this a
erarchical Bayesian model employed there case of divine intervention, manually imple-
seems to have the potential to integrate phe- menting changes that should occur naturally
nomena across different explanatory levels. and spontaneously? ACT-R simulations of
Before leaving these biases, it is perhaps development also have this character as pro-
worth remembering, in a bakeoff sense, that grammers change activation settings to allow
rule-based methods would likely be both- different rules to come to the fore. Perhaps
ered by the many exceptions that exist in such parameter settings could be viewed as a
this domain. preliminary step in identifying those changes
In the domain of pronoun acquisition, the robotics model did not address the same psychology experiment as did the CC model, so the robot could not realistically cover the findings of that experiment. A blind catch-playing robot did simulate the reversal errors made by blind children, but a sighted robot developed you before I, something that is not true of children.

The domain of syntax learning proved to be too easy for a variety of connectionist models, so it was difficult to discriminate among them: they all captured the main infant finding of a novelty preference. This problem was not so easy for C4.5, though, which could not capture any phenomena from the infant experiment. Moving to realistically complex syntactic patterns will likely prove challenging for all sorts of models.

The simulation of abnormal development has attracted a number of promising connectionist models, but it is too early to tell which particular approaches will best capture which particular developmental disorders.
14.4. Development via Parameter Settings?

Some of the models reviewed in this chapter simulated development with programmer-designed parameter changes. Variations in such parameter settings were used to implement age-related changes in both connectionist and dynamic-systems models of the A-not-B error, the CC model of discrimination-shift learning, all three models of the similarity-to-correlation shift, and the autism model. Granted, this technique captured developmental effects and arguably could be justified on various grounds, but does it really constitute a good explanation of developmental change? Or is this a case of divine intervention, manually implementing changes that should occur naturally and spontaneously? ACT-R simulations of development also have this character, as programmers change activation settings to allow different rules to come to the fore. Perhaps such parameter settings could be viewed as a preliminary step in identifying the changes a system needs in order to advance. One hopes that this could be followed by model improvements that would allow for more natural and spontaneous development.
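To make the worry concrete, here is a deliberately minimal caricature of development via parameter settings. It is not drawn from any of the models reviewed above; the parameter name, the age schedule, and the numbers are invented for this sketch.

# Illustrative caricature only: "development" implemented by a programmer-set
# parameter schedule, in the style criticized above. All names and numbers
# are invented; they come from no model discussed in this chapter.

def memory_persistence(age_months):
    # The modeler, not the model, decides how strongly the hidden toy's
    # location is maintained at each age.
    return min(1.0, 0.05 + 0.05 * age_months)

def reach(age_months, a_habit_strength=0.6):
    """Toy A-not-B decision: after the toy is hidden at location B, reach to B
    only if the maintained support for B exceeds the habitual pull toward A;
    otherwise perseverate (the A-not-B error)."""
    b_support = memory_persistence(age_months)
    return "B (correct)" if b_support > a_habit_strength else "A (perseverative error)"

for age in (7, 9, 12):
    print(age, "months:", reach(age))

Run as is, the sketch grows out of the error at about 12 months, but only because the schedule was written to make it do so; a more satisfying model would let such a parameter change as a consequence of learning or growth rather than by programmer fiat.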
Acknowledgments

This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to the first author. Frédéric Dandurand, J-P Thivierge, and Yuriko Oshima-Takane provided helpful comments.
References

Goodman, N. (1955/1983). Fact, fiction, and forecast. New York: Bobbs-Merrill.

Gureckis, T. M., & Love, B. C. (2004). Common mechanisms in infant and adult category learning. Infancy, 5, 173–198.

Hare, M., & Elman, J. L. (1995). Learning and morphological change. Cognition, 56, 61–98.

Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.

Hoeffner, J. H., & McClelland, J. L. (1994). Can a perceptual processing deficit explain the impairment of inflectional morphology in developmental dysphasia – a computational investigation? In Proceedings of the Twenty-Fifth Annual Child Language Research Forum (pp. 38–49). Stanford, CA: Center for the Study of Language and Information, Stanford University.

Imai, M., & Gentner, D. (1997). A cross-linguistic study of early word meaning: Universal ontology and linguistic influence. Cognition, 62, 169–200.

Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10, 307–321.

Kendler, T. S. (1979). The development of discrimination learning: A levels-of-functioning explanation. In H. Reese (Ed.), Advances in child development and behavior (Vol. 13, pp. 83–117). New York: Academic Press.

Klahr, D., & Wallace, J. G. (1976). Cognitive development: An information processing view. Hillsdale, NJ: Lawrence Erlbaum.

Kobayashi, H. (1997). The role of actions in making inferences about the shape and material of solid objects among 2-year-old children. Cognition, 63, 251–269.

Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3, 299–321.

Langley, P. (1987). A general theory of discrimination learning. In D. Klahr, P. Langley, & R. Neches (Eds.), Production system models of learning and development (pp. 99–161). Cambridge, MA: MIT Press.

Lebiere, C., & Anderson, J. R. (1993). A connectionist implementation of the ACT-R production system. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 635–640). Hillsdale, NJ: Lawrence Erlbaum.

Lewis, J. D., & Elman, J. L. (2008). Growth-related neural reorganization and the autism phenotype: A test of the hypothesis that altered brain growth leads to altered connectivity. Developmental Science, 11, 135–155.

Ling, C. X., & Marinov, M. (1993). Answering the connectionist challenge: A symbolic model of learning the past tenses of English verbs. Cognition, 49, 235–290.

Lovett, A., & Scassellati, B. (August 2004). Using a robot to reexamine looking time experiments. Paper presented at the Fourth International Conference on Development and Learning, San Diego, CA.

Marcus, G. F., Vijayan, S., Bandi Rao, S., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.

Mareschal, D., Plunkett, K., & Harris, P. (1999). A computational and neuropsychological account of object-oriented behaviours in infancy. Developmental Science, 2, 306–317.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

McClelland, J. L. (1989). Parallel distributed processing: Implications for cognition and development. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 8–45). Oxford, UK: Oxford University Press.

McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.

Munakata, Y. (1998). Infant perseveration and implications for object permanence theories: A PDP model of the AB task. Developmental Science, 1, 161–184.

Munakata, Y., McClelland, J. L., Johnson, M. H., & Siegler, R. S. (1997). Rethinking infant knowledge: Toward an adaptive process account of successes and failures in object permanence tasks. Psychological Review, 104, 686–713.

Negishi, M. (1999). Do infants learn grammar with algebra or statistics? Science, 284, 433.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Oshima-Takane, Y. (1988). Children learn from speech not addressed to them: The case of personal pronouns. Journal of Child Language, 15, 95–108.

Oshima-Takane, Y. (1992). Analysis of pronominal errors: A case study. Journal of Child Language, 19, 111–131.

Oshima-Takane, Y., Goodz, E., & Derevensky, J. L. (1996). Birth order effects on early language development: Do secondborn children learn from overheard speech? Child Development, 67, 621–634.
Oshima-Takane, Y., Takane, Y., & Shultz, T. R. (1999). The learning of first and second person pronouns in English: Network models and analysis. Journal of Child Language, 26, 545–575.

Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.

Pinker, S. (1999). Words and rules: The ingredients of language. New York: Basic Books.

Plunkett, K., & Marchman, V. (1996). Learning from a connectionist model of the acquisition of the English past tense. Cognition, 61, 299–308.

Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

Raijmakers, M. E. J., van Koten, S., & Molenaar, P. C. M. (1996). On the validity of simulating stagewise development by means of PDP networks: Application of catastrophe analysis and an experimental test of rule-like network performance. Cognitive Science, 20, 101–136.

Samuelson, L., & Smith, L. B. (1999). Early noun vocabularies: Do ontology, category structure and syntax correspond? Cognition, 73, 1–33.

Schiff-Meyers, N. (1983). From pronoun reversals to correct pronoun usage: A case study of a normally developing child. Journal of Speech and Hearing Disorders, 48, 385–394.

Schlesinger, M. (2004). Evolving agents as a metaphor for the developing child. Developmental Science, 7, 158–164.

Schlesinger, M., Parisi, D., & Langer, J. (2000). Learning to reach by constraining the movement search space. Developmental Science, 3, 67–80.

Schmidt, W. C., & Ling, C. X. (1996). A decision-tree model of balance scale development. Machine Learning, 24, 203–229.

Shultz, T. R. (1999). Rule learning by habituation can be simulated in neural networks. In M. Hahn & S. C. Stoness (Eds.), Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society (pp. 665–670). Mahwah, NJ: Lawrence Erlbaum.

Shultz, T. R. (2003). Computational developmental psychology. Cambridge, MA: MIT Press.

Shultz, T. R. (2006). Constructive learning in the modeling of psychological development. In Y. Munakata & M. H. Johnson (Eds.), Processes of change in brain and cognitive development: Attention and performance XXI (pp. 61–86). Oxford, UK: Oxford University Press.

Shultz, T. R., & Bale, A. C. (2001). Neural network simulation of infant familiarization to artificial sentences: Rule-like behavior without explicit rules and variables. Infancy, 2, 501–536.

Shultz, T. R., & Bale, A. C. (2006). Neural networks discover a near-identity relation to distinguish simple syntactic forms. Minds and Machines, 16, 107–139.

Shultz, T. R., Buckingham, D., & Oshima-Takane, Y. (1994). A connectionist model of the learning of personal pronouns in English. In S. J. Hanson, T. Petsche, M. Kearns, & R. L. Rivest (Eds.), Computational learning theory and natural learning systems, Vol. 2: Intersection between theory and experiment (pp. 347–362). Cambridge, MA: MIT Press.

Shultz, T. R., & Cohen, L. B. (2004). Modeling age differences in infant category learning. Infancy, 5, 153–171.

Shultz, T. R., & Mareschal, D. (1997). Rethinking innateness, learning, and constructivism: Connectionist perspectives on development. Cognitive Development, 12, 563–586.

Shultz, T. R., Mareschal, D., & Schmidt, W. C. (1994). Modeling cognitive development on balance scale phenomena. Machine Learning, 16, 57–86.

Shultz, T. R., Mysore, S. P., & Quartz, S. R. (2007). Why let networks grow? In D. Mareschal, S. Sirois, G. Westermann, & M. H. Johnson (Eds.), Neuroconstructivism: Perspectives and prospects (Vol. 2). Oxford, UK: Oxford University Press.

Siegler, R. S. (1976). Three aspects of cognitive development. Cognitive Psychology, 8, 481–520.

Sirois, S. (September 2002). Rethinking object compounds in preschoolers: The case of pairwise learning. Paper presented at the British Psychological Society Developmental Section Conference, University of Sussex, UK.

Sirois, S., Buckingham, D., & Shultz, T. R. (2000). Artificial grammar learning by infants: An auto-associator perspective. Developmental Science, 4, 442–456.

Sirois, S., & Shultz, T. R. (1998). Neural network modeling of developmental effects in discrimination shifts. Journal of Experimental Child Psychology, 71, 235–274.

Sirois, S., & Shultz, T. R. (2006). Preschoolers out of adults: Discriminative learning with a cognitive load. Quarterly Journal of Experimental Psychology, 59, 1357–1377.
Soja, N. N. (1992). Inferences about the meaning of nouns: The relationship between perception and syntax. Cognitive Development, 7, 29–45.

Soja, N. N., Carey, S., & Spelke, E. S. (1991). Ontological categories guide young children's inductions of word meaning: Object terms and substance terms. Cognition, 38, 179–211.

Soja, N. N., Carey, S., & Spelke, E. S. (1992). Perception, ontology, and word meaning. Cognition, 45, 101–107.

Spence, K. W. (1952). The nature of the response in discrimination learning. Psychological Review, 59, 89–93.

Subrahmanyam, K., Landau, B., & Gelman, R. (1999). Shape, material, and syntax: Interacting forces in children's learning in novel words for objects and substances. Language & Cognitive Processes, 14, 249–281.

Sun, R., Slusarz, P., & Terry, C. (2005). The interaction of the explicit and the implicit in skill learning: A dual-process approach. Psychological Review, 112, 159–192.

Taatgen, N. A., & Anderson, J. R. (2002). Why do children learn to say "broke"? A model of learning the past tense without feedback. Cognition, 86, 123–155.

Thelen, E., Schoener, G., Scheier, C., & Smith, L. (2001). The dynamics of embodiment: A field theory of infant perseverative reaching. Behavioral and Brain Sciences, 24, 1–33.

Thomas, M. S. C., & Karmiloff-Smith, A. (2002). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioral and Brain Sciences, 25, 727–787.

van Gelder, T. J. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21, 1–14.

van Rijn, H., van Someren, M., & van der Maas, H. (2003). Modeling developmental transitions on the balance scale task. Cognitive Science, 27, 227–257.

Vilcu, M., & Hadley, R. F. (2001). Generalization in simple recurrent networks. In J. B. Moore & K. Stenning (Eds.), Proceedings of the Twenty-third Annual Conference of the Cognitive Science Society (pp. 1072–1077). Mahwah, NJ: Lawrence Erlbaum.

Vilcu, M., & Hadley, R. F. (2005). Two apparent "counterexamples" to Marcus: A closer look. Minds and Machines, 15, 359–382.

Westermann, G. (1998). Emergent modularity and U-shaped learning in a constructivist neural network learning the English past tense. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 1130–1135). Mahwah, NJ: Lawrence Erlbaum.

Westermann, G., & Mareschal, D. (2004). From parts to wholes: Mechanisms of development in infant visual object processing. Infancy, 5, 131–151.

Westermann, G., Sirois, S., Shultz, T. R., & Mareschal, D. (2006). Modeling developmental cognitive neuroscience. Trends in Cognitive Sciences, 10, 227–232.

Xu, F., & Tenenbaum, J. B. (2007). Sensitivity to sampling in Bayesian word learning. Developmental Science, 10, 288–297.

Younger, B. A., & Cohen, L. B. (1986). Developmental change in infants' perception of correlations among attributes. Child Development, 57, 803–815.

Zhang, K., & Sejnowski, T. J. (2000). A universal scaling law between gray matter and white matter of cerebral cortex. Proceedings of the National Academy of Sciences USA, 97, 5621–5626.