
nature neuroscience

Perspective https://doi.org/10.1038/s41593-024-01673-9

A sensory–motor theory of the neocortex

Received: 21 June 2023

Accepted: 26 April 2024

Published online: 27 June 2024

Rajesh P. N. Rao1,2

Recent neurophysiological and neuroanatomical studies suggest a close interaction between sensory and motor processes across the neocortex. Here, I propose that the neocortex implements active predictive coding (APC): each cortical area estimates both latent sensory states and actions (including potentially abstract actions internal to the cortex), and the cortex as a whole predicts the consequences of actions at multiple hierarchical levels. Feedback from higher areas modulates the dynamics of state and action networks in lower areas. I show how the same APC architecture can explain (1) how we recognize an object and its parts using eye movements, (2) why perception seems stable despite eye movements, (3) how we learn compositional representations, for example, part–whole hierarchies, (4) how complex actions can be planned using simpler actions, and (5) how we form episodic memories of sensory–motor experiences and learn abstract concepts such as a family tree. I postulate a mapping of the APC model to the laminar architecture of the cortex and suggest possible roles for cortico–cortical and cortico–subcortical pathways.

The predictive coding theory of cortical function, proposed in this journal in 1999 by the author and Ballard1, has been the subject of increasing attention in recent years2–4. However, as originally proposed, the theory ignored a fundamental aspect of perception, namely, that perception is action based: we move our eyes about three times a second to recognize objects in a scene, orient our heads to localize sounds, use our fingers to identify objects by touch and navigate ourselves in our environment to solve tasks that satisfy our needs. Away from experimentally imposed constraints in the laboratory, perception, in its natural state, can best be viewed as an action-based hypothesis-testing process (also called active sensing, active perception and active inference)5–8.

Recent studies have highlighted the important influence of impending actions across almost all cortical areas (Fig. 1). For example, in mice solving a visual discrimination task using forepaws to rotate a wheel, Zatka-Haas et al.9 observed, using widefield calcium imaging, extensive bilateral activity across cortical areas preceding movements on choice trials (left or right action selected) but not on ‘NoGo’ trials (no action is selected) (Fig. 1a, left). They further showed that impending movement could be decoded from cortical activity in most imaged regions by 25 ms before movement (Fig. 1a, right). In the same task, Steinmetz et al.10 used Neuropixels probes to record spiking activity from thousands of neurons and showed that, not only does activity in the visual cortex get updated after movement (Fig. 1b, left), but almost all recorded cortical areas had neurons with activities that were predictive of upcoming movements (Fig. 1b, right). Similarly, Stringer et al.11 found that about a third of the population activity of ~10,000 neurons in the visual cortex of awake mice could be predicted from motor actions derived from a video of the mouse’s facial movements (Fig. 1c), suggesting that sensory–motor integration occurs even in the primary sensory cortex (the lack of a similar result in monkeys12 could be due to the increased functional specialization of primate visual cortical areas compared to that of the rodent cortex).

Actions may be integrated differently across the different layers of a cortical area. For example, Jordan and Keller13 showed that both layer 2/3 and layer 5/6 neurons in the mouse primary visual cortex (V1) undergo depolarization before locomotion onset (Fig. 1d, left and middle). While layer 2/3 neurons appear to be computing a difference between motor-related input and bottom–up visual flow input, layer 5/6 responses were consistent with positive integration of visuomotor inputs (Fig. 1d, right). These results complement well-known earlier results on predictive activity in a range of cortical areas such as the visual cortex14, the parietal cortex15 and frontal eye fields16 that anticipate the visual consequences of impending eye movements.

1Center for Neurotechnology, University of Washington, Seattle, WA, USA. 2Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. e-mail: [email protected]

Nature Neuroscience | Volume 27 | July 2024 | 1221–1235 1221



Fig. 1 | Widespread influence of actions across the cortex. a, Left, widefield calcium imaging reveals bilateral activity (cortical fluorescence dF/F) from 0 to 200 ms after stimulus (stim) onset across cortical areas preceding movements on left or right (L/R) action-selection trials. ‘NoGo’ trials do not show such activity. Right, average action execution decoder accuracy (acc.) 25 ms before movement onset, showing that an impending movement can be decoded from cortical activity in most imaged regions (adapted from ref. 9, CC BY 4.0). MOs, secondary motor cortical area; MOp, primary motor cortex; SSp, primary somatosensory cortex; VISp, primary visual cortex; VISal, secondary visual cortical area. b, Left, increased spiking after a correct choice movement in a visual cortex neuron (posteromedial visual area, VISpm) for both contralateral (contra) and ipsilateral (ipsi) stimulus presentations (orange and blue dots, spikes; black dots, movement onset). Right, fraction of neurons in each brain region with pre-movement activity that could be accurately predicted from the animal’s movement (in either the left or right direction) (adapted from ref. 10, Springer Nature Limited). c, Motor action information extracted using principal-component (PC) analysis of a video of a mouse’s facial movements (left, example frames t, t + 1; middle, top three principal components) accurately predicted (using reduced-rank regression) about a third of the population activity of ~10,000 neurons (raster representations on the right) measured using two-photon calcium imaging of the visual cortex of awake mice (adapted with permission from ref. 11, AAAS). 1D, one dimensional. d, Intracellular recordings in V1 of mice on a spherical treadmill with locomotion coupled to visual flow feedback. Visual flow was halted at random times to generate visuomotor mismatch events. Left, heatmap of average responses before and after locomotion onset across all layer 5/6 neurons. Baseline activity was subtracted from responses by using the average membrane potential in the 2.5 s before locomotion onset before averaging. Middle, average response before and after locomotion onset across all layer 5/6 (L5/6) neurons (black, 14 neurons) compared with all layer 2/3 (L2/3) neurons (gray, 32 neurons). Right, average mismatch responses of layer 2/3 and layer 5/6 neurons (adapted with permission from ref. 13, Elsevier).

The emerging view, as suggested by the studies above, is that almost all cortical areas update their representations on the basis of ‘efference copies’ of upcoming actions (‘corollary discharges’ (ref. 17)) as well as the results of executed actions. Such a view harmonizes well with the anatomical observation that all areas of the neocortex (henceforth, the ‘cortex’), including areas traditionally labeled as sensory cortices, send outputs to subcortical motor regions (refs. 18,19 and references therein) and receive input from these regions. Indeed, Vernon Mountcastle, in his prescient article in 1978 (ref. 20), put forth the hypothesis that a single unifying computational principle might be operating across the entire cortex, proposing the ‘cortical column’ as a modular information-processing unit of the cortex (see also refs. 21–23 for related ideas). This hypothesis is supported by the remarkable anatomical similarities in laminar connectivity patterns across cortical areas24,25, even though the density of cells within laminae may vary across areas. Additional evidence for this hypothesis comes from experiments in which inputs from the optic nerve were diverted via the auditory thalamus to the auditory cortex, causing the auditory cortex to develop visual receptive field properties26.

If there is indeed a common computational principle operating across the cortex, it must be versatile enough to explain capabilities as diverse as (1) learning to recognize an object from multiple visual glimpses through eye and head movements or from multiple tactile sensations through finger movements, (2) solving a complex spatial navigation task using simpler movement sequences and (3) understanding abstract concepts (such as a family tree).

In this Perspective, I suggest APC as a unifying sensory–motor theory of the cortex. APC hypothesizes a canonical cortical module as consisting of a state-prediction network and an action-prediction network, both implemented within each cortical area. ‘State’ here denotes hidden (or ‘latent’) aspects of the world inferred from sensory inputs, for example, parts of an object to be recognized or one’s location in a building. ‘Action’ refers to not just motor commands but also abstract actions, for example, ‘go to the maternal grandmother node’ in a family tree or ‘perform multiplication’ on two given numbers. APC postulates that feedback from higher cortical areas modulates the dynamics of both state and action networks in lower areas, changing the functions they compute on the fly to suit the needs of the current task. This leads to representations that operate at multiple levels of sensory and motor abstraction, as observed in cortical hierarchies implicated in perception and action27–30. In the following sections, I present computational and neurobiological aspects of APC, comparing emerging studies and experimental results with the model’s predictions. I present simulations chosen to showcase the explanatory breadth of APC. While we have previously investigated APC in the context of machine learning (ML) and artificial intelligence (AI)31–33, here I explore APC as a model of cortical function.

APC
Neuroanatomical and physiological motivation
The axonal outputs of layer 5 neurons in almost all cortical areas target subcortical motor centers19. Even in V1, outputs from layer 5 neurons target the superior colliculus34, which is involved in eye movements (aside from other motor behaviors). Similarly, layer 5 neurons in the primary auditory cortex (A1) send outputs to the inferior colliculus35, which is involved in orienting and defensive motor behaviors36, while


Box 1

Canonical APC module


The canonical APC module is motivated by the laminar structure of the cortex (figure in Box 1, panel a) and consists of the following components:

State-transition function. The state-transition function fs models the physics of the environment and the agent, and predicts the next state st, given the previous state st−1 and action at−1 (figure in Box 1, panel b). In general, states and actions are vectors. By learning an approximation f̂s of fs from interactions with the world, the agent can learn an ‘internal model’ of the world81,87,88,94 (also called a ‘world model’, forward model or generative model) and use it to run simulations of the world, imagine new scenarios, explore what happens when particular actions are executed and plan actions that lead to desirable states. Biologically, the function f̂s can be implemented by a recurrently connected network of neurons, with the network activity at time t denoting an estimate ŝt of the state st (figure in Box 1, panel c). Neural population activity in visual, auditory and somatosensory cortices, for instance, after processing a sensory stimulus (or more generally, after sequential sampling of the stimulus), can be regarded as the estimated ŝt for that stimulus computed by the cortical region.

Policy function. An internal model (such as f̂s above) can be used to plan actions by unrolling the model into the future to explore the consequences of various action choices, but this mode of selecting actions requires considerable effort and deliberation (‘system 2’ thinking68). A more efficient way to select actions (‘system 1’ thinking68) is to have a state-to-action ‘policy’ (ref. 44) f̂a, which maps the current estimated state ŝt directly to an action ât to achieve the current goal (figure in Box 1, panel c). Biologically, the policy f̂a can be implemented by a recurrent network of neurons with activity at time t denoting ât.

Coupling perception and action. Given a policy f̂a for a particular task or goal, the agent can execute an action ât from the policy while simultaneously sending ât as an ‘efference copy’ (or ‘corollary discharge’ (ref. 17)) to the learned model f̂s to predict the sensory consequences of each action. The agent can then correct its prediction of the new state of the world based on the new sensory observation that resulted from taking the action, using prediction errors as prescribed by predictive coding theory1. The corrected state estimate ŝt can in turn be fed as input to the policy f̂a to generate the next action for the task, continuing until the goal is achieved or the task times out. A biological implementation of this idea involves coupling the state and policy recurrent networks within the laminar structure of a cortical area, as depicted in the figure in Box 1 (panel c).
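The perception–action cycle described above can be sketched as a short simulation loop. Everything below is a hypothetical stand-in chosen for illustration (scalar states, a linear ‘world’, a hand-written policy), not the article’s implementation:

```python
# Minimal sketch of the APC cycle: the policy f_a selects an action, the
# learned model f_s predicts its consequence from the efference copy, and
# the sensory prediction error corrects the state estimate. All functions
# are toy stand-ins for the networks described in Box 1.

def f_s(s, a):
    # Internal model: predict the next latent state from the previous
    # estimate and the efference copy of the action.
    return [si + a for si in s]

def f_a(s):
    # Policy: map the current state estimate directly to an action
    # ('system 1' selection; here, move the state toward 0).
    return -0.5 * s[0]

def decode(s):
    # Decoder D: predict the sensory input expected for a latent state.
    return s[0]

def apc_step(s_hat, observe, lr=0.5):
    a_hat = f_a(s_hat)              # select action from policy
    s_pred = f_s(s_hat, a_hat)      # predict consequence via efference copy
    obs = observe(a_hat)            # actual sensory input after acting
    err = obs - decode(s_pred)      # sensory prediction error
    return [s_pred[0] + lr * err], a_hat

# Toy 'world' whose true state follows the same dynamics as the model.
true_state = [1.0]

def observe(a):
    true_state[0] += a
    return true_state[0]

s_hat = [0.2]                       # initial (wrong) state estimate
for _ in range(10):
    s_hat, _ = apc_step(s_hat, observe)
print(abs(s_hat[0] - true_state[0]))  # small residual estimation error
```

Because the prediction error (obs − decode(s_pred)) nudges the estimate toward the observed consequence of each action, the estimation error shrinks on every cycle, which is exactly the behavior the predictive coding correction step is meant to produce.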


Canonical APC module. a, Depiction of the six-layered laminar structure of a cortical column, showing some of the major connections
between layers and with the thalamus (not all connections are shown) (based on refs. 18,24,43). b, Canonical APC generative model. The
dashed arrow denotes a delay of one time step. D denotes a decoder converting state to input. c, One possible implementation of inference in
a canonical APC module for the generative model in b.

layer 5 neurons in the primary somatosensory cortex send outputs to the spinal cord37, which controls body movements. On the other hand, layer 5 neurons in the primary motor cortex (or M1) (such as Betz cells) have long been implicated in motor function via their axonal projections to the spinal cord38, but the middle and upper layers of traditional ‘motor’ cortical areas such as M1 are involved in sensory processing of feedback from subcortical and cortical sources39,40: for example, layer 2/3 neurons in mouse M1 respond to unexpected sensory perturbations in a visually guided motor task41, while neurons in mouse M2 encode auditory sensation and expectation42. Thus, despite well-known differences in the laminar densities of neurons in different cortical areas (for example, V1 versus M1), the sensory–motor nature of the cortex is retained. This motivates the idea of a ‘sensory–motor cortical module’ as a canonical feature of the cortex (described further in the section APC module and neuroanatomical implementation).

The figure in Box 1 (panel a) depicts the laminar structure of a typical cortical column and its connectivity (based on refs. 18,24,43). Inputs from a sensory region or a lower cortical area target layer 4 neurons, the outputs of which are then conveyed to the superficial layer 2/3 neurons. These neurons in turn send their axons to the deeper layers,


Box 2

State and action networks in the cortex


State network f̂s and sensory–motor prediction in the cortex. The APC model predicts that cortical networks implementing the state-prediction function f̂s should learn to anticipate the sensory consequences of actions (by predicting latent sensory states) and exhibit anticipatory activity before movement. A study by Audette et al.93 found such anticipatory activity in the primary auditory cortex (figure in Box 2, panel a): mice were trained to push a lever with their forelimb, which produced a pure tone at a fixed position early in each movement; after training, omission of this learned movement-associated sound (figure in Box 2, panel a, top left) revealed a large population of auditory cortex neurons firing roughly 200 ms before movement onset, with this activity peaking around the time of the expected tone (figure in Box 2, panel a, top middle and right). The study also found prediction error-like suppression of neural activity for anticipated sounds, consistent with the use of errors in predictive coding for state inference (figure in Box 2, panel a, bottom). Predictive activity anticipating the visual consequences of an upcoming eye movement has been observed across the cortex including the visual cortex14, the parietal cortex15 and the frontal cortex16 (figure in Box 2, panel b, left, middle and right, respectively). Other types of movement, such as locomotion, can also predictively activate cortical neurons, for example, in V1 (ref. 13 and Fig. 1d). These results are consistent with the APC model’s use of a state network f̂s to learn the sensory consequences of actions.

Action network f̂a and cortical motor dynamics. The APC model assumes that the network within a cortical area implementing the action-prediction function f̂a is a recurrent network (figure in Box 1, panel c) for which the outputs, in the case of the motor cortex, encode the dynamics of movement. There is now considerable neurophysiological evidence that, when an animal is performing a movement (for example, reaching toward a goal), the activities of neurons in the motor cortex are well described by a dynamical system of the form ȧ = f(a, u), where u is an external input95.
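A deliberately minimal instance of such a dynamical system ȧ = f(a, u) can be simulated with Euler integration. The skew-symmetric (rotational) form below echoes the rotations that jPCA extracts, but the rotation rate, inputs and step size are arbitrary choices for illustration, not values fitted to data:

```python
import math

# Toy dynamical system of the form da/dt = f(a, u): a linear rotational
# system (the kind of rotation jPCA extracts from motor cortex activity),
# integrated with Euler steps. Purely illustrative.

def f(a, u):
    # Skew-symmetric dynamics produce rotation in the (a1, a2) plane.
    w = 2.0  # rotation rate (rad/s), arbitrary
    return [-w * a[1] + u[0], w * a[0] + u[1]]

def simulate(a0, u, dt=0.001, steps=1000):
    a = list(a0)
    for _ in range(steps):
        da = f(a, u)
        a = [a[0] + dt * da[0], a[1] + dt * da[1]]
    return a

# With no external input, the population state rotates at w rad/s,
# so after 1 s it has swept out an angle of about 2 rad.
a_end = simulate([1.0, 0.0], [0.0, 0.0])
angle = math.atan2(a_end[1], a_end[0])
print(round(angle, 2))
```

With u = 0, the two-dimensional population state simply rotates at w rad/s; a nonzero input u would translate and deform these rotations, which is one way external (for example, thalamic) drive can reshape the trajectory.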


State and action networks in the cortex. a, Top left, schematic of the omission of an expected sound during a lever-press movement (black
curve) in a self-generated sound task in mice93. a, Top middle and right, neural activity in the auditory cortex aligned to six positions (colored
dots) along the lever-press movement: peak firing always corresponded to the time of the expected sound (occurring on average 36 ms
after movement onset), with the activity starting roughly 200 ms before movement onset (as reported in ref. 93). a, Bottom, heatmaps (left)
of responses for all recorded neurons and the average response (right) for 377 of those neurons responding above a threshold to the sound
when it was self-generated by a lever press (‘movement’) and when it was not self-generated (‘passive’). Note the greater suppression (green)
for self-generated sounds (adapted from ref. 93, Elsevier). b, Predictive activity anticipating the visual consequences of an upcoming eye
movement in the visual cortex (left) (adapted with permission from ref. 14, National Academy of Sciences, USA), the parietal cortex (middle)
(adapted with permission from ref. 15, AAAS) and the frontal cortex (right) (adapted with permission from ref. 16, American Physiological
Society). FP, fixation point; H, horizontal eye movement; V, vertical eye movement; FEF, frontal eye fields. c, Two-dimensional projection (using
jPCA95) of population activity of motor cortex neurons in three monkeys performing straight and curved reaching movements, showing rotation
of the neural state. Traces are colored from green to red based on the preparatory neural state (circles) for each reach condition (adapted
from ref. 95, Springer Nature Limited). jPC1 and jPC2, first and second dimensions, respectively, of jPCA space; AU, arbitrary units. d, Results
from simultaneous population recordings of the motor thalamus and the motor cortex in mice performing reach-to-grasp movements97. Left,
neural population trajectories for the thalamus (left, green) and the cortex (right, magenta) obtained using trial-averaged PCA. Right, single-trial
population activity in the thalamus (top) and the cortex (bottom) (adapted from ref. 97, Springer Nature Limited).


The figure in Box 2 (panel c) illustrates these motor cortical dynamics in monkeys performing reach movements. Such dynamics can be learned and implemented by a recurrent neural network96, which in the APC model is the action network f̂a. The APC action network receives as input not only local recurrent activity but also the current state estimate ŝ (figure in Box 1, panel c), which is updated using sensory feedback from other brain regions such as the thalamus. The APC model therefore predicts a close interaction between thalamic inputs and motor cortical outputs of the action network f̂a. A recent study by Sauerbrei et al.97 confirmed the importance of this tight sensory–motor loop in cortical pattern generation in mice performing dextrous movements. They showed that time-varying thalamic inputs are required for cortical pattern generation. The neural population activity in both the thalamus and the cortex exhibited strong co-modulation in trial-averaged (figure in Box 2, panel d, left) and single-trial (figure in Box 2, panel d, right) activity. Inactivating the thalamus perturbed cortical activity and disrupted limb kinematics97, implying that both local dynamics and sensory-derived state contribute to generating cortical motor patterns (see inputs to f̂a in the figure in Box 1, panel c).

predominantly targeting layer 5 neurons. One class of layer 5 neurons (with thick tufted apical dendrites and firing in bursts) send their axons to subcortical motor centers such as the superior colliculus and other parts of the brainstem19,34,43. Other layer 5 neurons, which do not fire in bursts and have slender apical dendrites, project to the striatum and other cortical regions34,43. There is also a substantial axonal projection from layer 5 back to layer 2/3, signifying recurrent feedback within a cortical column. There are additional projections from layer 5 to layer 6, and layer 6 in turn sends outputs to the parts of the thalamus that send inputs to layer 4.

Computational motivation
Computational considerations point to maintaining a close link between actions and their sensory consequences. In model-based reinforcement learning44 and, more generally, in the framework of partially observable Markov decision processes45,46, an intelligent ‘agent’ interacts with the world by executing an action at−1 at time t − 1, and this causes the agent’s ‘state’ to change from st−1 to st; this change is governed by the state-transition function fs(st−1, at−1), which generates a new state st according to a probability distribution P(st|st−1, at−1). When the agent executes an action based on a policy fa, for example, moving its body by walking or making an eye movement, the hidden state changes to the next state st (a new location in a building being navigated or a new part of a scene being recognized) according to fs(st−1, at−1). Inferring these hidden states as the agent makes movements is the essence of perception47,48.

APC module and neuroanatomical implementation
The above computational considerations motivate the canonical APC generative model shown in the figure in Box 1 (panel b). The corresponding model for inference and learning, referred to as the canonical APC module, is described in Box 1 and shown in the figure in Box 1 (panel c). This figure also suggests one possible functional mapping of APC’s computational elements onto the cortical laminar structure in the figure in Box 1 (panel a), which builds on previous proposals mapping predictive coding to cortical laminae49,50.

As shown in the figure in Box 1 (panel c), the superficial layer cortical neurons, which receive the filtered sensory inputs from layer 4 and are recurrently connected to each other, are well suited to implementing the state-transition function f̂s. The motor output layer 5 neurons, which are also recurrently connected to each other, fit the role of neurons computing the action–policy function f̂a. The other class of layer 5 neurons, which convey information to other cortical areas and the striatum, could maintain the current state estimate ŝt by integrating the state prediction from layer 2/3 and correcting it with prediction errors from the feedforward thalamic inputs to layers 4 and 5/6 (ref. 1). Layer 6 neurons receiving inputs from these state-estimating layer 5 neurons are well placed to compute the prediction for a lower-level area: at the lowest level, layer 6 neurons predict sensory input Īt for the input It (layer 6 neurons at a higher level would predict the cortical state at a lower level; for more details, see the section Hierarchical APC and cortical feedback).

Layer 5 motor output neurons, for example, those sending outputs to the superior colliculus, send axon collaterals to higher-order thalamic nuclei18,19 and receive motor information from subcortical motor centers such as the superior colliculus regarding actions executed. These thalamic nuclei are therefore in an ideal position to compare the actual executed action at (from the superior colliculus or other motor center) and the cortical prediction ât. The resulting action feedback (for example, in the form of action-prediction errors), in addition to sensory feedback (in the form of sensory-prediction errors), can be conveyed by the thalamus back to the cortex to enable the state-transition network f̂s to correct its state prediction and the action network f̂a to correct its action prediction. Indeed, it is known that higher-order nuclei such as the pulvinar receive cortical layer 5 inputs and information from the superior colliculus and send axons to superficial layers of area V1, explaining the response of V1 neurons to saccadic eye movements51.

The implementation suggested above is consistent with growing experimental data on predictive coding in the cortex2,4, with prediction error-like activity reported in superficial layers13,52 and predictive activity observed in deeper layers53 and in cortico–cortical interactions54,55. The implementation above also shares similarities with previous canonical circuits for predictive coding49 in specifying laminar roles for state estimates and prediction errors but differs in the use of both actions and states to generate predictions, rather than being limited to hidden causes49. Evidence for state and action networks in the cortex is summarized in Box 2.

Hierarchical APC and cortical feedback
A characteristic feature of the cortex is the reciprocal nature of connections between cortical areas27: ‘feedforward’ connections from a cortical area A (originating in the superficial layers) to a cortical area B (terminating in layer 4) are invariably reciprocated by anatomically defined ‘feedback’ (or descending) connections from area B to area A (originating in the deeper (and sometimes superficial) layers of B to superficial and deep layers of A). Why are cortical areas reciprocally connected and organized in an approximate hierarchy27,56?

Computational motivation
Consider the problem of going to the grocery store from one’s house. As shown in Fig. 2a, the complexity of the problem can be substantially reduced by dividing the task into subtasks (or ‘subgoals’), dividing each subtask into ‘sub-subtasks’, and so on. Reducing a complex problem to a sequence of easier-to-solve components and reusing these components to solve new problems gets to the heart of compositionality, which is thought to form the basis for cognitive flexibility and fast generalization in humans57,58.

A complex problem can be characterized by its (typically high-dimensional) state-transition function, which governs how the


[Fig. 2a schematic: ‘Go to grocery store’ decomposes into ‘Walk to door’, ‘Walk to garage’, ‘Get into car + drive to store’, …, which decompose further into ‘Open car door’, ‘Sit inside’, ‘Close car door’, ‘Put on seatbelt’, …; panels b and c are graphical (see caption below).]

Fig. 2 | Using hierarchies and compositionality to simplify complex tasks. a, Decomposition of the ‘go to the grocery store’ problem into subgoals or subtasks, each of which can be further divided into sub-subgoals or sub-subtasks. Note that the rate of change is faster at the lower levels than at the higher levels, leading naturally to a temporal hierarchy. b, A navigation problem in a maze-like building environment with corridors (black) and walls (gray). Blue dot, current location; green square, desired goal location. The structure of the environment (pathways and walls) can be understood in terms of the building’s state-transition dynamics, which in turn can be divided into the simpler transition dynamics of its compositional elements, namely, the two rooms outlined in yellow and red that appear at several different locations within the reference frame of the environment. These simpler elements can be further decomposed into horizontal and vertical corridors shown on the right that appear at different locations within the local reference frame of each room. c, An object (such as a handwritten digit ‘8’) can be divided into parts (loops and curves at the middle level), each of which can be divided into subparts (strokes, lines, smaller curves at the lower level). Each part and subpart is associated with its coordinates (location and transformation) within a local reference frame.
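The nested part–whole representation sketched in panel c can be written as a simple tree in which every part stores its children together with their coordinates in the part's local reference frame; global positions are recovered by composing local offsets down the tree. This is an illustrative sketch only (the class and part names below are inventions, not structures from the paper):

```python
from dataclasses import dataclass, field

# Toy sketch of a nested part-whole hierarchy with local reference frames.
@dataclass
class Part:
    name: str
    offset: tuple                      # (x, y) within the parent's frame
    children: list = field(default_factory=list)

    def global_positions(self, origin=(0.0, 0.0)):
        # Compose this part's local offset with the parent's global position.
        x, y = origin[0] + self.offset[0], origin[1] + self.offset[1]
        out = {self.name: (x, y)}
        for child in self.children:    # the same rule is reused at every level
            out.update(child.global_positions((x, y)))
        return out

# A digit '8' as two loops, each made of strokes placed in the loop's frame;
# note that the same local stroke offsets are reused in both loops.
digit8 = Part("8", (0, 0), [
    Part("top_loop", (0, 1), [Part("stroke_l", (-1, 0)), Part("stroke_r", (1, 0))]),
    Part("bottom_loop", (0, -1), [Part("stroke_l2", (-1, 0)), Part("stroke_r2", (1, 0))]),
])
positions = digit8.global_positions()
print(positions["stroke_l"])           # local (-1, 0) composed with loop offset (0, 1)
```

Because each stroke is placed in its loop's local frame, the same stroke offsets appear at different global locations in the two loops, mirroring the reuse of corridors across rooms in panel b.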

state of the environment changes when one applies an action. Fortunately, in the natural world, the consequences of most actions are local, and these local dynamics tend to be shared across many environments and objects, allowing complex problems to be modeled in terms of simpler lower-dimensional state-transition functions. This is illustrated in the example in Fig. 2b, a simplification of the ‘go to the grocery store’ problem: here, a maze-like environment is modeled using simpler components (yellow- and red-outlined rooms), each composed of even simpler components (corridors).

Interestingly, the same concept can also be applied to visual perception. As illustrated in Fig. 2c, a visual object can be compositionally defined in terms of parts and their locations within the object’s reference frame59; the parts can in turn be decomposed into simpler parts within their respective reference frames. Nested compositional representations of objects and environments offer substantial combinatorial flexibility for solving complex problems in terms of simpler, reusable components. The fact that the world we live in and the problems we seek to solve are amenable to compositional solutions makes such an approach attractive, from both a computational and an evolutionary perspective.

The hierarchical APC model
Box 3 describes the APC model’s hierarchical architecture and its neural implementation. The figure in Box 3 (panel e) shows two levels of the model, implemented using top–down ‘contextual inputs’ to connect the higher level to the lower level (see Box 3 for an alternate implementation based on gain modulation).

State inference using predictive coding and compositional learning. As shown in the figure in Box 3 (panel e, left), the higher-level state neurons maintain an estimate for the state s_t^(i+1) at time t and modulate the lower-level state network via top–down feedback given by H_s^(i)(s_t^(i+1)). The lower-level state neurons maintain an estimate for s_{t,τ}^(i), where τ denotes a time step at the lower level within the higher-level time interval given by t. The lowest-level state makes a prediction of the input via a ‘decoder’ network D (figure in Box 1, panel b). If D is a linear matrix U, this lowest level of APC is equivalent to the generative model in sparse coding (I = Us, where s is sparse60). At each time step, the network predicts the next input as a function of previous state and action. Feedforward pathways convey prediction errors to update state estimates1, while descending pathways convey top–down modulation as described above (see ref. 61 for an example). Prediction errors are also used to learn the weights of the state networks at all levels using predictive coding-based self-supervised learning1,32,61. Such learning approximates error backpropagation, the workhorse of contemporary deep learning, in a biologically plausible manner62.

Action inference through planning and reinforcement learning. As shown in the figure in Box 3 (panel e, right), the higher-level action neurons represent an abstract action (such as ‘open the door’) via an action vector a_t^(i+1) at time t. Given the abstract action a_t^(i+1), top–down feedback given by the embedding input H_a^(i)(a_t^(i+1)) modulates the lower-level action network and instantiates the goal-specific policy f_a^(i), which produces lower-level actions a_{t,τ}^(i). The hierarchical action networks in the APC model can be trained in multiple ways: (1) planning: the state-transition networks can be used to search for sequences of actions, starting from the highest abstraction level, that are likely to result in states with the highest cumulative reward or closest to the goal (see also active inference8, planning by inference63–65 and model predictive control66; see the section Illustrative examples of diverse computational capabilities). Successful actions can be used as ‘labels’ in supervised learning to train the policy networks f̂_a^(i); (2) reinforcement learning: hierarchical reinforcement learning67 can be used to train action networks at each level to maximize the total expected reward according to a reward function that may be specific to that level (details in the section Illustrative examples of diverse computational capabilities); (3) policies providing priors for planning: action networks f̂_a^(i) at each level predict a distribution over actions, which can serve as a prior, in a Bayesian sense, for guiding the search for actions in planning. Thus, predicted actions for new tasks will have high uncertainty, requiring effort and deliberation in planning (‘system 2 thinking’ (ref. 68)), while, for frequently encountered tasks, action networks are well trained and will predict actions with high confidence (‘system 1 thinking’ (ref. 68)).

An advantage of continuous-valued state s^(i+1) and action a^(i+1) vectors is that interpolating or sampling in the neighborhood of learned s^(i+1) and a^(i+1) generates, on the fly, new state-transition functions f̃_s^(i) and new policy functions f̃_a^(i), opening the door to fast generalization and transfer of knowledge to new tasks. The alternative to continuous states is to use discrete states, as in previous models of predictive coding based on belief propagation or variational message passing (for example, Fig. 10 in ref. 69). Aside from the possibility of fast generalization and transfer, APC’s use of continuous states also allows explicit representation of prediction errors, which in turn allow local optimization of dynamics and learning (under Gaussian assumptions)1.

Proposed neuroanatomical implementation
I propose that neural populations in a higher cortical area A representing current state and action vectors ŝ^(i+1) and â^(i+1) modulate the state and action networks f̂_s^(i) and f̂_a^(i) in a lower cortical area B via descend-




Box 3

Hierarchical APC model and its neural implementation


In the hierarchical APC model (figure in Box 3, panel a), a state s^(i+1) and an action a^(i+1) at abstraction level i + 1 generate, respectively, a state-transition function f_s^(i) and a policy function f_a^(i) (‘option’ (ref. 44)) at the lower level i. These functions interact with each other to generate lower-level states and actions (figure in Box 3, panel a). Each such state and action in turn generates transition and policy functions at an even lower level of abstraction. A lower-level sequence executes for a period of time until a condition is met (for example, a subgoal is reached, a task is completed or times out or there is an irreconcilable error at that level). Control then returns to the higher level, which transitions to a new higher-level state (via f_s^(i+1)) and action (via f_a^(i+1)). Such a model captures both the dynamics of states (the ‘physics’ of the world) and actions (‘policies’) at different time scales, allowing hierarchical problem solving (Fig. 2). Biologically, the recurrent networks implementing f_s and f_a are governed by specific decay time constants, but differences in recurrent excitation can allow a hierarchy of time scales, as observed across the cortex98. Hierarchical dynamics in predictive coding was previously explored in ref. 3 for perception and production of birdsong, with lower-level dynamics contextualized by slowly varying control parameters supplied by a higher level.

Can the hierarchical model in the figure in Box 3 (panel a) be implemented in networks of neurons? More specifically, how can a population of neurons, representing, for example, the higher-level state vector s^(i+1) (or action vector a^(i+1)), generate a whole function f_s^(i) (or f_a^(i)) at the lower level?

Gain modulation in the cortex. There is considerable evidence for ‘gain modulation’ in cortical networks99–102, implemented computationally by multiplying the synaptic weights or outputs of neurons by a gain factor. Evidence for gain modulation ranges from multiplicative modulation of tuning curves of visual cortical neurons during attention103 to changes in the input–output function of neurons in deep layers of the cortex due to ‘top–down’ modulatory inputs to their apical dendrites in layers 1 and 2/3 (figure in Box 3, panel b)100. We can view gain modulation as a biologically plausible way of implementing a ‘hypernetwork’ (ref. 104), which in ML and AI is a neural network that produces the synaptic weights (or more


Hierarchical APC model. a, Two levels (levels i + 1 and i ) of a hierarchical APC generative model. Each level has a state-transition function fs
capturing the dynamics of the world at a particular level of abstraction and a policy function fa specifying that level’s actions, goals and
coordinates (conditioned on the current highest-level goal or task). Higher-level state and action vectors at time t generate, via top–down
networks Hs and Ha, lower-level state-transition and policy functions, allowing the higher level to compose a sequence of states and actions at
the lower level to accomplish a goal. b, As depicted here for a single pyramidal neuron, I hypothesize that top–down inputs H_s^(i)(s_t^(i+1)) from a
higher cortical area to the apical dendrites of lower-area neurons modulate the dynamics of a network of such neurons (for example, via gain
modulation100,101), allowing the higher area to change the functions fs and fa at the lower level. c, Top, multiplicative gain modulation (for
example, due to top–down inputs) in the input–output function of neurons in a recurrent network allows the network to generate a rich set of
motor cortical dynamics matching experimental data102. EMG, electromyogram of muscle activity. Bottom, changing the gain from 1 (black) to 2
(blue) (bottom left plot) dramatically alters neuronal firing rates (three example neurons are shown on the right), mimicking quasi-oscillatory
motor cortical activity (see figure in Box 2, panels c,d) (adapted from ref. 102, Springer Nature Limited). d, The function computed by a recurrent
network (center) can be modulated using a nonchanging top–down contextual input or ‘rule input’ (one-hot vector, bottom left) in addition to
recurrent and stimulus inputs (top left), allowing the same network to solve different tasks (output for a specific task is shown on the right)
(adapted from ref. 72, Springer Nature Limited). Mod, modality. e, Implementation of the APC model in a using contextual inputs: higher-level
state and action neurons maintaining estimates of s_t^(i+1) and a_t^(i+1) modulate lower-level state and action networks via top–down contextual inputs H_s^(i)(s_t^(i+1)) and H_a^(i)(a_t^(i+1)), respectively.
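As a concrete sketch of the two-level rollout in panels a and e, the following toy simulation uses the contextual-input mechanism: the higher-level state is mapped by a top–down network H_s to an embedding that is fed, as an extra input, to the lower-level state-transition network f_s. All weights here are random placeholders with arbitrary dimensions, not trained parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
tanh = np.tanh

# Toy two-level generative rollout with contextual inputs (untrained weights).
d_hi, d_lo, d_ctx = 4, 6, 3
W_hi = rng.normal(scale=0.5, size=(d_hi, d_hi))          # higher-level dynamics
H_s = rng.normal(scale=0.5, size=(d_ctx, d_hi))          # top-down embedding net
W_lo = rng.normal(scale=0.5, size=(d_lo, d_lo + d_ctx))  # lower-level f_s

s_hi = rng.normal(size=d_hi)
trajectory = []
for t in range(3):                       # slow, higher-level time steps t
    ctx = tanh(H_s @ s_hi)               # contextual input from level i + 1
    s_lo = np.zeros(d_lo)
    for tau in range(5):                 # fast, lower-level time steps tau
        s_lo = tanh(W_lo @ np.concatenate([s_lo, ctx]))
        trajectory.append(s_lo)
    s_hi = tanh(W_hi @ s_hi)             # the higher level transitions only after
                                         # the lower-level sequence completes
print(len(trajectory))                   # 3 macro-steps x 5 micro-steps = 15
```

Changing the embedding `ctx` changes the effective lower-level dynamics without touching `W_lo`, which is the sense in which the higher level "generates a whole function" at the lower level.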





plausibly, the gain parameters) for another neural network (called the ‘primary network’). In the APC model, I propose that the higher-level state vector s^(i+1) is fed as input to a top–down feedback network H_s^(i), which produces the gain values to modulate the lower-level state network f_s^(i) (and similarly for the action network). The ability of such a neural mechanism to modulate the function being computed by a cortical network was demonstrated by Stroud et al. (figure in Box 3, panel c), who showed that multiplicative gain modulation of a recurrent network can generate a rich set of motor cortical dynamics matching experimental data102.

Contextual inputs in the cortex. Aside from gain modulation, higher cortical areas can also change the function being computed by lower cortical networks using top–down contextual inputs. For example, Yang et al.72 showed that, by feeding a top–down contextual input (‘rule input’) as a nonchanging input to a recurrent network (in addition to its usual recurrent and external inputs), one can change the input–output function that the network computes, allowing the same network to solve different tasks (figure in Box 3, panel d; see also the model of Eliasmith and colleagues105). This is known in ML as the ‘embedding approach’ and can be shown to be equivalent in computational function to hypernetworks106. In the case of APC, the higher-level state vector s^(i+1) (and action vector a^(i+1)) can be fed as input to a top–down feedback network H_s^(i) (H_a^(i)) that produces an embedding vector, which acts as a contextual input to a lower-level cortical network that computes f_s^(i) (f_a^(i)) (figure in Box 3, panel e). The higher level can therefore control the function being computed at the lower level by changing the embedding vector (contextual input).

The APC model acknowledges the existence of both gain modulation and contextual inputs in the cortex and postulates that either or both of these mechanisms are used for changing the functions f_s^(i) and f_a^(i) at the lower level according to the current higher-level state vector s^(i+1) and the action vector a^(i+1). The examples described in the section Illustrative examples of diverse computational capabilities were implemented using the method of contextual inputs. The reader is referred to ref. 61 for examples based on gain modulation and to refs. 32,33 for hypernetwork-based examples.
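A minimal numerical illustration of the two mechanisms just described, with arbitrary toy weights (not fitted parameters from refs. 72 or 102): a single fixed network step is altered either multiplicatively by a gain or additively by a constant contextual input, and in both cases the same network computes a different input–output mapping.

```python
import numpy as np

W = np.array([[0.8, -0.4], [0.3, 0.5]])   # fixed network weights (toy values)
x = np.array([1.0, -0.5])                 # stimulus input

def step_gain(x, gain):
    # Top-down gain multiplicatively scales the synaptic drive.
    return np.tanh(gain * (W @ x))

def step_context(x, ctx):
    # Top-down embedding enters additively as a constant contextual input.
    return np.tanh(W @ x + ctx)

y_low = step_gain(x, gain=1.0)
y_high = step_gain(x, gain=2.0)           # same input, same weights, new function
y_ctx = step_context(x, ctx=np.array([0.5, -0.5]))
print(np.allclose(y_low, y_high), np.allclose(y_low, y_ctx))  # both False
```

In both cases the weights `W` are untouched; only the top–down signal (gain or context) changes, which is what lets a higher area re-purpose a fixed lower-level circuit.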

ing connections that target the superficial and deep layers of area B27. Feedforward connections that target layer 4 of area A arise from the lower area B and from the higher-order thalamic region receiving ‘driver input’ from area B18,27. I propose that these feedforward connections carry the state and action feedback (for example, prediction errors) that enable the higher area A to correct its abstract state and action estimates. Such a neural implementation is consistent with studies reporting prediction error-like responses in superficial layers and state-estimation-like responses in deeper layers of the cortex2,13.

A key difference from previous formulations of hierarchical predictive coding1,2,49,50 is that APC places subcortical (thalamic) populations center stage in the evaluation of state- and action-prediction errors and their broadcasting to superficial pyramidal cells in the cortex70. Additionally, in APC, descending connections from higher to lower cortical areas change the function being computed in the lower area (through top–down modulation), rather than only conveying lower-level state predictions as in traditional predictive coding1. Higher-area neurons representing action a_t^(i+1) modulate the action network f̂_a^(i) in a lower area, changing the policy function that this lower-level network is computing. Neurophysiological evidence for such compositional representations has recently emerged in the premotor cortex71. Computational models have demonstrated compositionality for task transfer using the embedding space of a_t^(i+1) (ref. 72). Hierarchical representations found in the visual27,29 and motor systems30 are also consistent with the hierarchical compositional approach espoused by the APC model. Finally, the compositional representations in the cortex postulated by the APC model align well with recent hypotheses regarding compositional hippocampal replay73.

After the state and action networks have been learned for a set of tasks (as described in the section The hierarchical APC model), given a particular task, the topmost state vector in the hierarchy is first inferred from sensory inputs. This vector produces (via that level’s action network f̂_a) the topmost action vector specifying a ‘goal’ or option for the task; I hypothesize that this vector is maintained in the prefrontal cortex. Some of the pre-movement anticipatory activity in Fig. 1a may well reflect such a ‘cognitive’ decision. This abstract action vector is decomposed hierarchically all the way down to elemental actions (for example, muscle control signals in M1). When a sequence of elemental actions is executed and a subgoal is reached, control is returned back to the level above to generate a new subgoal (see the sections Active visual perception and part–whole learning and Planning and navigation using hierarchical world models for examples) and similarly for all levels. I postulate that such coordination between hierarchical levels for action selection occurs via cortex–basal ganglia–thalamus–cortex loops; I leave the important problem of working out the implementation details in these loops to future research.

Illustrative examples of diverse computational capabilities
The architecture of the APC model was inspired by the hypothesis that evolution may have replicated a common computational principle across the cortex20–23. If that is the case, one would expect the same architecture to be able to solve a diverse set of problems. Inspired by this observation, I provide here examples illustrating the APC model’s diverse capabilities.

Active visual perception and part–whole learning
Human vision can be viewed as an active sensory–motor process that employs eye movements to move the high-resolution fovea to appropriate locations in a scene, gathering evidence for or against competing visual hypotheses7,48. The APC architecture is well suited to modeling such a sensory–motor process, given its integrated state and action networks. To illustrate this capability, we simulated31,32 a two-level APC model (figure in Box 3, panel e) in which the lower-level actions emulated eye movements by moving a fovea (‘glimpse sensor’ (ref. 74)) to extract high-resolution information about a small part of the input image within a larger reference frame selected by the higher-level action.

The lower-level action also predicts a new state vector s_{t,τ+1}, which generates, via a trained decoder, a prediction for the glimpse image expected after the ‘eye movement’. The resulting prediction error was used for state inference and learning. The state networks at both levels were trained to minimize image-prediction errors, while the action networks were trained using reinforcement learning for the task of image reconstruction (for image classification as the task, see ref. 31).

Fig. 3a shows an example of a learned parsing strategy by the two-level APC model. The higher level learned to select actions that cover the input image sufficiently, avoiding blank regions, while the lower level learned to parse subparts inside the reference frame computed by the higher level. Fig. 3a also suggests a potential explanation for why human perception can appear stable despite dramatic changes





Fig. 3 | Active vision, part–whole learning and transfer of knowledge. a, First row, initial glimpse (purple box) and higher-level reference frames selected (red, green and blue boxes) at higher-level time steps (‘macro-steps’); second row, regions fixated at lower-level time steps (‘micro-steps’) within each higher-level reference frame; third and fourth rows, predicted versus actual glimpses; fifth row, the model’s ‘perception’ over time (object reconstructed by a decoder network from the current network state). Note the model’s ‘perceptual’ stability despite jumps in actual glimpses, enabled by predictions of the glimpses similar to visual cortical predictions before eye movements (figure in Box 2, panel b). b, The digit ‘8’ is parsed by a trained APC model as a parse tree of parts and subparts (left) and their corresponding coordinates (locations) within their respective reference frames (right). The representation is compositional: the same set of parts and subparts can potentially be reused at other locations and with other transformations to compose new digits. c, Higher-level part locations selected by a trained APC model for a particular class of clothing items in the Fashion-MNIST dataset (red, green and blue dots show the average sampled locations fixated in the following order: first, red; second, green; third, blue). Note the differences in the model’s fixation strategies between vertically symmetric items (shirts, trousers, bags) and footwear (sandals, sneakers, boots). d, An APC model trained on the Omniglot handwritten characters dataset (from 50 different alphabets) can transfer its learned knowledge to predict parts of previously unseen character classes. First column, input image from a new character class. Middle column, APC model’s reconstruction of the input. Last column, parts predicted by the model (d, adapted with permission from ref. 32, MIT Press).
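The glimpse-prediction loop summarized in this figure can be sketched as follows. The image, decoder weights and scan path below are untrained toy stand-ins (chosen arbitrarily), so the sketch only illustrates how a glimpse prediction error updates the state estimate, not the trained model of Fig. 3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy active-vision loop: predict a glimpse, observe it, correct the state.
image = rng.normal(size=12)                  # flattened toy image
D = rng.normal(scale=0.3, size=(4, 6))       # decoder: state -> 4-pixel glimpse

def glimpse(img, loc):                       # 'glimpse sensor': 4 pixels at loc
    return img[loc:loc + 4]

state = np.zeros(6)
for loc in (0, 4, 8):                        # a fixed scan path of 'fixations'
    actual = glimpse(image, loc)
    error = actual - D @ state               # glimpse prediction error
    state = state + 0.5 * D.T @ error        # error-driven state update
    # (in the full model, an action network would choose `loc`, and the
    # state would also evolve through a learned transition function)

post_error = np.linalg.norm(glimpse(image, 8) - D @ state)
print(post_error < np.linalg.norm(error))    # the update reduced the last error
```

Each fixation refines a single persistent state estimate rather than replacing it, which is the mechanism behind the model's 'perceptual' stability across glimpses.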

in our retinal images due to eye movements: the model maintains a stable visual hypothesis that is gradually refined without exhibiting the rapid changes seen in the sampled image regions (Fig. 3a, actual glimpses). This ‘perceptual’ stability is enabled by the model’s ability to predict the expected glimpses for each planned ‘eye movement’ (Fig. 3a, predicted glimpses), similar to predictive activity observed in the visual cortex before eye movements (figure in Box 2, panel b)14–16.

Fig. 3b shows a learned part–whole hierarchy for a digit in terms of strokes and mini-strokes along with their locations within nested reference frames. The model learns different parsing strategies for different classes of objects (Fig. 3c). Setting the image-prediction error input to the network to zero forces the model to predict the next sequence of parts and ‘complete’ an object32. Finally, compositional learning in the APC model facilitates transfer of learned knowledge to new objects (Fig. 3d).

Planning and navigation using hierarchical world models
Interestingly, the same APC framework used above for active vision can also be used for planning hierarchical actions for tasks such as navigation. Consider the problem of navigating from any starting location to any goal location in a large ‘multi-room’ building environment such as the one in Fig. 4a (gray, walls; blue circle, current




[Fig. 4 panels: a–c, maze environment with rooms, sampled high-level state–action sequences and the executed plan (‘Reward!’ at the goal); d, number of planning steps versus distance from goal (low-level planning versus APC planning); e, episodic rewards over episodes (RL agent versus APC agent, with goal changes marked); f, percentage correct across sessions for mice on a composite task (pretraining versus learning from scratch).]

Fig. 4 | Hierarchical planning. a, The problem of navigating in a large environment (left) can be reduced to planning using high-level states (red- and yellow-outlined ‘rooms’) and high-level abstract actions (panels on the right show two abstract actions, A1 and A3). Blue, current location; gray, walls; green, current goal location. b, To navigate to the goal, the APC model uses its learned high-level state network to sample K high-level state–action sequences (K = 2 here, shown bifurcating from the initial state). In each sequence, the high-level state is depicted by a predicted room image (red- or yellow-outlined image) and its location (marked by an ‘X’ in the rectangular global frame below the image). High-level actions are depicted as square local frames (next to arrows) with goal locations (purple). c, Given the sampled sequences, the model picks the sequence with the highest total reward, executes this sequence’s first (high-level) action to reach the blue location (top) and repeats to reach the goal location with only three high-level actions (bottom). Small red dot, intermediate location; small blue dot, intermediate goal. d, High-level planning by the APC model versus low-level heuristic planning using primitive actions (see text for details). e, The APC model can reuse learned high-level actions in new combinations to quickly solve new tasks (green circles, times at which the navigation goal changed); a reinforcement learning (RL) agent needs to relearn a new policy from scratch. Blue- or red-shaded regions in d,e are 1 s.d. from the mean. f, Mice pretrained on two subtasks quickly learned to combine them to solve a new composite task75 (compare with the APC model in e after a goal change). Blue, performance of mice learning the task from scratch (compare with the reinforcement learning agent in e after a goal change) (a–e, adapted with permission from ref. 32, MIT Press; f, adapted from ref. 75, CC BY 4.0).

location; green square, current goal location). Here, the lower-level states of the APC model are locations in the grid, and lower-level actions are going north, east, south or west, with a large reward at the goal location and smaller negative rewards for each action to encourage shorter paths.

Just as an object consists of parts at different locations, the building environment in Fig. 4a is composed of smaller elements (two 3 × 3 ‘room types’, S1 (red) and S2 (yellow)) at different locations in the global reference frame of the building. The higher-level states of the APC model are defined by state-embedding vectors S1 and S2, trained to generate, via the top–down network H_s (figure in Box 3, panel a), the lower-level transition functions f̂_s for rooms S1 and S2, respectively.

Similar to how the APC vision model reconstructed an image in the section Active visual perception and part–whole learning by composing parts from subparts, the APC model for planning computes higher-level action embedding vectors Ai (option vectors) that generate, via the top–down network H_a (figure in Box 3, panel a), lower-level policies f̂_a that produce primitive actions (north, east, south or west) from any location in the local reference frame (S1 or S2) to reach a local goal i within that frame. Fig. 4a (right) illustrates two of the eight Ai, each trained using reinforcement learning to reach one of the four corners of S1 or S2 (see ref. 32 for details). Defining these policies to operate within the local reference frame of the higher-level state S1 or S2 (regardless of global location in the building) allows the same policy to be reused at multiple locations.

The higher-level state network was trained to predict the next higher-level state. This trained higher-level network was used for planning (using model predictive control66): random state–action trajectories of length 4 were generated using the higher-level state network by starting from the current higher-level state and picking at random one of the four higher-level actions Ai for each next higher-level state. The action sequence with the highest total reward was selected, and its first action was executed. This process was repeated. Fig. 4b,c illustrates this high-level planning process using the trained APC model.

Fig. 4d illustrates the efficacy of the APC model’s high-level planning compared to lower-level planning using primitive actions (see ref. 32 for details): the APC model takes significantly fewer planning steps and can reuse its learned higher-level actions in new combinations to quickly solve new tasks (for example, when the goal is changed; Fig. 4e), similar to a recent study in mice75 (Fig. 4f).
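A deterministic toy version of this replanning loop can be sketched as follows. The 'world' is a 1-D chain of states with hand-coded transition and reward functions standing in for the learned high-level state network, and all length-4 action sequences are enumerated rather than sampled at random, to keep the sketch reproducible; only the first action of the best rollout is executed before replanning, as in the procedure above:

```python
import itertools

# Toy model-predictive-control loop: roll out short action sequences with a
# model, execute the best sequence's first action, then replan.
n_states, goal = 10, 9
actions = (-1, +1)                       # stand-ins for high-level actions Ai

def transition(s, a):                    # stand-in for the learned
    return min(max(s + a, 0), n_states - 1)  # higher-level state network

def reward(s):
    return -abs(goal - s)                # closer to the goal is better

def plan_first_action(s, horizon=4):
    best_return, best_first = float("-inf"), actions[0]
    for seq in itertools.product(actions, repeat=horizon):
        s_sim, total = s, 0
        for a in seq:                    # simulate the rollout with the model
            s_sim = transition(s_sim, a)
            total += reward(s_sim)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first                    # execute only the first action

s, steps = 0, 0
while s != goal and steps < 50:          # replan after each executed action
    s = transition(s, plan_first_action(s))
    steps += 1
print(s, steps)                          # reaches state 9 in 9 steps
```

Replanning after every executed action is what lets such a controller recover immediately when the goal changes, mirroring the fast task switching in Fig. 4e.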




Box 4

Model flexibility and predictions


The APC model appears to be flexible enough to perform a diverse set of functions such as: (1) parsing images and learning part–whole hierarchies: the model uses eye movements to parse images and learn hierarchical representations of parts and subparts of objects (details in the section Active visual perception and part–whole learning); (2) invariant perception: learned representations of objects and sequences are transformed by the APC generative model to match current inputs and remain invariant to different types of transformations (translations were considered in the section Active visual perception and part–whole learning, other transformations such as rotations and scaling are included in ref. 33); (3) perceptual stability: inference in the APC model naturally leads to integration of information across actions such as eye movements, leading to perceptual stability (examples in the section Active visual perception and part–whole learning; see also refs. 23,59,74); (4) compositionality and fast transfer of knowledge: by learning compositional representations, the model can compose and generate new objects and action sequences, leading to fast generalization to new inputs and goals (examples in the sections Active visual perception and part–whole learning and Planning and navigation using hierarchical world models); (5) efficient planning: hierarchical state networks in the APC model can be used to solve tasks efficiently (for example, navigating in a large environment) by planning using hierarchical actions (details in the section Planning and navigation using hierarchical world models; see also ref. 107); (6) habit formation: successful plans can be used to learn new policies (‘habits’; see the section Planning and navigation using hierarchical world models); alternately, the APC model also allows policies to be learned using hierarchical reinforcement learning (see the section Active visual perception and part–whole learning); (7) reference frames and temporal hierarchies: the APC model provides a neural implementation of nested reference frames23 and offers an explanation for object-centered parts-based representations in the cortex108 as well as cortical temporal hierarchies28,29; (8) prediction and postdiction: because the model maintains a temporally stable higher-level state (a ‘timeline’ (ref. 76)) encoding an entire sequence (past, present and future), the update of this representation during prediction error minimization explains both predictive and postdictive phenomena in perception (for example, flash lag and color phi effects; see ref. 61 for details); (9) generating ‘schemas’ or ‘programs’ for solving new tasks: the APC model suggests a neural mechanism (via top–down inputs and/or gain modulation) for generating new sensory–motor ‘programs’ or ‘schemas’ on the fly to solve new tasks (Box 3 and ref. 109); (10) binding and episodic

model to bind sensory and symbolic representations for language processing and cognitive tasks such as arithmetic (see the section Learning abstract concepts and ref. 58); (12) learning abstract concepts: the same sensory–motor architecture used for perception and planning can also be used to model abstract concepts such as family trees (details in the section Learning abstract concepts and ref. 77).

The APC model makes the following predictions:
• The laminar implementation proposed in the figure in Box 1 (panel c) predicts that, in each cortical area, neurons representing the sensory-derived latent state will exhibit predictive activity, representing the output of the state-transition function f̂_s, as a function of both sensory inputs and deeper-layer ‘action’ inputs; experimental manipulation of deeper-layer activity should allow control of this predictive activity and change this activity similar to how the predicted glimpses change in Fig. 3a as a function of action inputs to f̂_s.
• In each cortical area (including traditional ‘sensory’ areas), the model predicts a population of neurons in layer 5 representing ‘actions’, either motor outputs (for example, in M1 or the primary somatosensory cortex) or abstract actions or goals (for example, in the parietal or prefrontal cortex); this action-related population activity in layer 5 should change in a coordinated manner across cortical areas whenever the goal or task is changed, similar to how the lower- and higher-level actions change in the APC model whenever the goal location is changed in Fig. 4a.
• As depicted in the figure in Box 3 (panels a,e), feedback from a higher cortical area should originate from two separate populations (higher-level state- and action-representing neurons) and specifically target two separate populations in the lower area (lower-level state- and action-representing neurons, respectively); this feedback should be modulatory and capable of changing the functional connectivity of their target populations, emulating the effects of the ‘hypernetworks’ H_s^(i) and H_a^(i) in the figure in Box 3 (panels a,e). Experimental manipulation of this feedback should selectively change the lower-level functions f̂_s^(i) and f̂_a^(i) being computed, with the effects becoming apparent in the state predictions being generated by f̂_s (see predicted glimpses in Fig. 3a) and the actions being output by f̂_a (see primitive actions in Fig. 4c (bottom)), respectively.
• APC’s hierarchical arrangement predicts the existence of both state and action representations in higher-order cortical areas that encode information over longer time scales than
memories of perception–action sequences: when coupled with a lower areas28,29,61,80,98,110; furthermore, the model predicts that a
hippocampus-like associative memory, the model binds multimodal substantial population-level activity change in the superficial
cortical activations at the highest level into an episodic memory, layers of a higher area, denoting a higher-level state change
allowing activity recall, prediction based on episodic context triggered by a subgoal being achieved by the lower area, would
and cortical consolidation for fast generalization and learning, cause a large population-level activity change in layer 5, signifying
as discussed in the section Episodic memories and cortical– a new high-level action or subgoal being generated (for example,
hippocampal binding; (11) language and symbolic representations: the high-level action leading to macro-step 2 or macro-step 3
making state and action representations categorical (for example, in Fig. 3a or a subgoal that reaches the lower yellow square in
ref. 78) and using cortical–hippocampal binding may allow the APC Fig. 4c (top)).

Episodic memories and cortical–hippocampal binding network when the inputs are natural videos comprise oriented spati-
Each level of the APC hierarchy learns generic ‘basis functions’ for rep- otemporal Gabor filters coding for edges and bars moving at different
resenting states and actions based on interactions with the environ- orientations61. The neural activity vector is a specific activation pattern
ment. For example, the basis functions learned by the lowest-level state coding for the current video segment in terms of the learned

Nature Neuroscience | Volume 27 | July 2024 | 1221–1235 1231


Perspective https://doi.org/10.1038/s41593-024-01673-9

spatiotemporal filters. At the highest level N of the APC model, the specific activation patterns ŝ(N) and â(N) together represent (figure in Box 3, panel e, with i + 1 = N) an entire sequence (or timeline76) corresponding to the current episode of interaction with the environment, for example, the sequence of glimpses of an object as in Fig. 3a or the sequence of locations visited during navigation as in Fig. 4c (bottom). By ‘binding’ these highest-level neural activation patterns ŝ(N) and â(N) (assumed to correspond to entorhinal cortex activity in the APC model) in a hippocampus-like associative memory, one can store the current sequence of experienced sensations (vision, touch, sound, smell, rewards, etc.), locations (or coordinates within a reference frame) and actions as an episodic memory vector m (ref. 61).

The projection from the hippocampus back to the entorhinal cortex implies that the fused multimodal information in the current episodic memory vector m is fed back to enable better prediction (m plays a role similar to context windows in transformers in AI) and to influence state and action estimation in cortical areas down the hierarchy to the lowest levels. Particularly salient episodic memory vectors may be stored and later compositionally recombined73 or recalled for replay in the cortex when given an internal or external cue, for example, a location where the episode occurred, a sound or smell associated with the episode or a partial visual input marking the beginning of the episode (see ref. 61 for an example).

In summary, the APC model suggests that the cortex encodes generic semantic knowledge about the world within state and action networks that implement nested reference frames. Any particular instantiation of this knowledge invoked by, for example, an interaction with a person or an object, is stored temporarily as an episodic memory vector m in the hippocampus. This instantiation could be used for reasoning about the current situation or for planning and, if deemed important, could be consolidated within the cortex by updating cortical networks via replay during inactivity or sleep. The idea of fast binding of specific instances (‘fillers’) with generic semantic ‘roles’ is gaining currency in both AI58 and hippocampal modeling (for example, the Tolman–Eichenbaum machine77; see also ref. 73). The benefits of such a representation, including fast transfer of knowledge and zero-shot learning, can be expected to also accrue to the memory-augmented APC model.

Learning abstract concepts
I briefly sketch here how the same sensory–motor architecture used for perception and planning above could also be potentially used to model abstract concepts. Take, for example, modeling the concept of a family tree. The state–action representations in the APC model can be made categorical (for example, as in ref. 78), allowing states and actions to represent symbols. The states can then represent abstract categories such as father, mother, daughter, uncle, etc., while abstract actions (up, down, etc.) can be used to traverse and define a family tree sequentially. The notion of ‘fast binding’ of cortical representations in hippocampal memory discussed above could be used to bind specific persons to their roles (father, mother, etc.).

Results along these lines were obtained using the Tolman–Eichenbaum machine model77, in which a recurrent neural network (similar to the state-transition network in the APC model but for a single level) was used in conjunction with an associative memory to learn the structure of family trees from examples. Extending these ideas to abstract state–action networks for symbolic reasoning in a hierarchical APC model may offer new insights into understanding how cortical–hippocampal networks represent language and solve abstract cognitive tasks such as arithmetic.

Discussion
Inspired by recent results highlighting the influence of actions across most areas of the cortex, I suggested APC as a sensory–motor theory of cortical function. APC proposes that (1) each cortical area implements both a state-transition network for state prediction and an action network for action (or goal) prediction, and (2) higher-area neurons representing more abstract states and actions modulate lower-area state and action networks via top–down modulatory control to change the functions they are computing, leading to nested reference frames and hierarchical representations of objects, states and actions. A possible neuroanatomical mapping of the APC model to cortical laminar structure was suggested in the section APC module and neuroanatomical implementation.

The APC model lends support to the hypothesis20–23 that there may be a unifying computational principle operating across the cortex by showing how the same basic APC architecture can perform a diverse range of computations (see Box 4 for a summary). The APC model shares broad similarities with a number of other models advocating prediction and hierarchy as core aspects of brain function1,3,22,23,79–83, going back to the seminal early work of MacKay84 and Albus85. The goal of putting action on an equal footing with perception in terms of Bayesian inference and prediction error minimization is in keeping with the theories of free energy minimization proposed by Friston and others3,8,69. In its current formulation, APC addresses action selection via reinforcement learning (see the section Active visual perception and part–whole learning) and planning via model predictive control (as described in the section Planning and navigation using hierarchical world models). The latter is related to planning as inference methods63–65 and active inference schemes that optimize expected information gain plus expected value8,69.

Compositionality and the representation of sensory–motor information in cortical columns are also central tenets of the ‘thousand brains’ theory23,59. The close interaction between state-estimation networks and action-computing networks in the APC model is consistent with theories of optimal motor control86, especially theories highlighting the importance of internal models in solving the inverse problem of computing optimal motor commands to solve a task81,87,88. However, based on recent evidence pointing to outputs from layer 5 in essentially all cortical areas to subcortical motor centers19,34,35,37, the APC model proposes that all cortical areas include both state-estimation and policy components. M1 is often cited as a uniquely ‘motor’ cortical area missing the sensory input layer 4, with damage to M1 in primates causing permanent loss of distal (although not proximal) movements89. However, even M1 receives sensory information from other cortical and subcortical areas90, especially in its superficial layers39,40, and could therefore, as suggested by the APC model, predict and estimate state (for example, proprioceptive state) and compute actions based on these state estimates91.

The APC generative model in the figure in Box 3 (panel a) focuses on hierarchical structure and does not account for cross-modal (sensory to sensory) or hierarchically ‘horizontal’ connections in the neocortex (for example, ref. 92). However, it is possible to extend APC’s generative model to allow cross-modal influences and horizontal interactions to enable more accurate state prediction and estimation. For example, consider a generative model evolved for use by an animal foraging in the forest: the hidden state denoting, for example, a tiger, can generate both a visual cue (stripes) and an auditory cue (rustling sound). In the extended APC model employing such a generative model, the state network in a sensory area (for example, V1) would leverage information from other sensory modalities (for example, from the auditory cortex) via horizontal cortical connections to derive an accurate estimate of the current state of the world. Extending the APC model to account for such cross-modal and horizontal cortico–cortical connections is an important direction for future work.

A large number of unknowns remain, such as the exact physiological mechanisms underlying the modulatory interactions between higher-order and lower-order cortical areas across multiple time scales, the role of alpha, beta, theta and gamma oscillations in such interactions and the representation of uncertainty in the cortex.
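The two core proposals above can be made concrete in a toy simulation: a lower-level state is predicted by a transition network whose weights are multiplicatively modulated by a higher-level state (standing in for hypernetwork-style top–down control), and the state estimate is then corrected by sensory prediction errors. This is only an illustrative sketch; the dimensions, weight scales and multiplicative-gain form below are assumptions for illustration, not the implementation used in the APC papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_hi, n_o = 8, 4, 8

W_trans = 0.1 * rng.standard_normal((n_s, n_s))   # lower-level state-transition weights
W_out = 0.5 * rng.standard_normal((n_o, n_s))     # state -> predicted sensory input
H = 0.1 * rng.standard_normal((n_s * n_s, n_hi))  # 'hypernetwork': higher state -> weight modulation

def apc_step(s, s_higher, obs, n_iter=30, lr=0.1):
    # Top-down modulation: the higher-level state reshapes the lower-level
    # transition function via a multiplicative gain on its weights.
    gain = 1.0 + (H @ s_higher).reshape(n_s, n_s)
    s_est = np.tanh((W_trans * gain) @ s)          # predicted next state (prior)
    errs = []
    for _ in range(n_iter):                        # error-driven state correction
        err = obs - W_out @ s_est                  # bottom-up sensory prediction error
        s_est = s_est + lr * (W_out.T @ err)       # gradient step reducing the error
        errs.append(float(np.linalg.norm(err)))
    return s_est, errs

s_higher = rng.standard_normal(n_hi)               # abstract higher-level state
obs = rng.standard_normal(n_o)                     # current sensory input
s_est, errs = apc_step(np.zeros(n_s), s_higher, obs)
print(errs[0] > errs[-1])                          # True: prediction error shrinks
```

Changing s_higher changes the gain pattern and hence the transition function itself, which is the sense in which top–down feedback in the model changes the function being computed rather than merely adding another input.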
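The family-tree example from the section Learning abstract concepts can likewise be sketched with categorical states and actions; the transition table, role names and bound persons below are hypothetical illustrations, not learned representations from the Tolman–Eichenbaum machine or the APC model.

```python
# Categorical states (roles) and abstract actions traversing a family tree.
# The hand-written table T stands in for a learned state-transition network.
T = {
    ("daughter", "up"): "mother",    # 'up' moves toward a parent
    ("mother", "partner"): "father",
    ("father", "sibling"): "uncle",
    ("mother", "down"): "daughter",  # 'down' moves toward a child
}

def traverse(state, actions):
    """Apply a sequence of abstract actions to a starting role."""
    for a in actions:
        state = T[(state, a)]
    return state

# 'Fast binding' of specific persons (fillers) to generic roles, as a
# hippocampus-like episode; the names are invented for illustration.
episode = {"daughter": "Abby", "mother": "Beth", "father": "Carl", "uncle": "Dan"}

role = traverse("daughter", ["up", "partner", "sibling"])
print(role, episode[role])  # -> uncle Dan
```

The same traversal rules apply to any family, while the episode dictionary swaps in the specific people, separating generic semantic ‘roles’ from episodic ‘fillers’.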


While there is emerging neurophysiological and neuroanatomical evidence2,9–11,13–16,18,41,42,51,55,93 that lends some support to the APC model’s predictions (Box 4), there is much that remains to be tested. I hope that the theoretical framework offered by the APC model is helpful in the design of new experiments aimed at uncovering the cortical and subcortical basis of sensory–motor processing and cognition.

References
1. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
2. Keller, G. B. & Mrsic-Flogel, T. D. Predictive processing: a canonical cortical computation. Neuron 100, 424–435 (2018).
3. Friston, K. & Kiebel, S. Predictive coding under the free-energy principle. Philos. Trans. R. Soc. London B Biol. Sci. 364, 1211–1221 (2009).
4. Jiang, L. P. & Rao, R. P. N. Predictive coding theories of cortical function. Oxford Research Encyclopedia of Neuroscience https://doi.org/10.1093/acrefore/9780190264086.013.328 (Oxford Univ. Press, 2022).
5. Halpern, B. P. Tasting and smelling as active, exploratory sensory processes. Am. J. Otolaryngol. 4, 246–249 (1983).
6. Lederman, S. J. & Klatzky, R. L. Hand movements: a window into haptic object recognition. Cogn. Psychol. 19, 342–368 (1987).
7. Ahissar, E. & Assa, E. Perception as a closed-loop convergence process. eLife 5, e12830 (2016).
8. Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
9. Zatka-Haas, P., Steinmetz, N. A., Carandini, M. & Harris, K. D. Sensory coding and the causal impact of mouse cortex in a visual decision. eLife 10, e63163 (2021).
10. Steinmetz, N. A., Zatka-Haas, P., Carandini, M. & Harris, K. D. Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273 (2019).
11. Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364, eaav7893 (2019).
12. Talluri, B. C. et al. Activity in primate visual cortex is minimally driven by spontaneous movements. Nat. Neurosci. 26, 1953–1959 (2023).
13. Jordan, R. & Keller, G. B. Opposing influence of top–down and bottom–up input on excitatory layer 2/3 neurons in mouse primary visual cortex. Neuron 108, 1194–1206 (2020).
14. Nakamura, K. & Colby, C. L. Updating of the visual representation in monkey striate and extrastriate cortex during saccades. Proc. Natl Acad. Sci. USA 99, 4026–4031 (2002).
15. Duhamel, J. R., Colby, C. L. & Goldberg, M. E. The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255, 90–92 (1992).
16. Umeno, M. M. & Goldberg, M. E. Spatial processing in the monkey frontal eye field. I. Predictive visual responses. J. Neurophysiol. 78, 1373–1383 (1997).
17. Wurtz, R. H., McAlonan, K., Cavanaugh, J. & Berman, R. A. Thalamic pathways for active vision. Trends Cogn. Sci. 15, 177–184 (2011).
18. Sherman, S. M. & Guillery, R. W. Functional Connections of Cortical Areas: A New View from the Thalamus (MIT, 2013).
19. Prasad, J., Carroll, B. & Sherman, S. Layer 5 corticofugal projections from diverse cortical areas: variations on a pattern of thalamic and extrathalamic targets. J. Neurosci. 40, 5785–5796 (2020).
20. Mountcastle, V. in The Mindful Brain (eds Edelman, G. & Mountcastle, V.) 7–50 (MIT, 1978).
21. Creutzfeldt, O. D. Generality of the functional structure of the neocortex. Naturwissenschaften 64, 507–517 (1977).
22. Mumford, D. On the computational architecture of the neocortex. II. The role of cortico–cortical loops. Biol. Cybern. 66, 241–251 (1992).
23. Hawkins, J. A Thousand Brains: A New Theory of Intelligence (Basic Books, 2021).
24. Douglas, R. J. & Martin, K. A. Neuronal circuits of the neocortex. Annu. Rev. Neurosci. 27, 419–451 (2004).
25. Harris, K. D. & Shepherd, G. M. The neocortical circuit: themes and variations. Nat. Neurosci. 18, 170–181 (2015).
26. Roe, A. W., Pallas, S. L., Kwon, Y. H. & Sur, M. Visual projections routed to the auditory pathway in ferrets: receptive fields of visual neurons in primary auditory cortex. J. Neurosci. 12, 3651–3664 (1992).
27. Felleman, D. & Essen, D. V. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
28. Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).
29. Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
30. Grafton, S. T. & de C. Hamilton, A. F. Evidence for a distributed hierarchy of action representation in the brain. Hum. Mov. Sci. 26, 590–616 (2007).
31. Gklezakos, D. C. & Rao, R. P. N. Active predictive coding networks: a neural solution to the problem of learning reference frames and part–whole hierarchies. Preprint at arxiv.org/abs/2201.08813 (2022).
32. Rao, R. P. N., Gklezakos, D. C. & Sathish, V. Active predictive coding: a unifying neural model for active perception, compositional learning, and hierarchical planning. Neural Comput. 36, 1–32 (2024).
33. Fisher, A. & Rao, R. P. N. Recursive neural programs: a differentiable framework for learning compositional part–whole hierarchies and image grammars. PNAS Nexus 2, pgad337 (2023).
34. Kasper, E., Larkman, A., Lübke, J. & Blakemore, C. Pyramidal neurons in layer 5 of the rat visual cortex. I. Correlation among cell morphology, intrinsic electrophysiological properties, and axon targets. J. Comp. Neurol. 339, 459–474 (1994).
35. Stebbings, K., Lesicko, A. & Llano, D. The auditory corticocollicular system: molecular and circuit-level considerations. Hear. Res. 314, 51–59 (2014).
36. Xiong, X. et al. Auditory cortex controls sound-driven innate defense behaviour through corticofugal projections to inferior colliculus. Nat. Commun. 6, 7224 (2015).
37. Frezel, N. et al. In-depth characterization of layer 5 output neurons of the primary somatosensory cortex innervating the mouse dorsal spinal cord. Cereb. Cortex Commun. 1, tgaa052 (2020).
38. Rathelot, J. A. & Strick, P. L. Subdivisions of primary motor cortex based on cortico–motoneuronal cells. Proc. Natl Acad. Sci. USA 106, 918–923 (2009).
39. Mao, T. et al. Long-range neuronal circuits underlying the interaction between sensory and motor cortex. Neuron 72, 111–123 (2011).
40. Hooks, B. M. et al. Organization of cortical and thalamic input to pyramidal neurons in mouse motor cortex. J. Neurosci. 33, 748–760 (2013).
41. Heindorf, M., Arber, S. & Keller, G. B. Mouse motor cortex coordinates the behavioral response to unpredicted sensory feedback. Neuron 99, 1040–1054 (2018).
42. Holey, B. E. & Schneider, D. M. Sensation and expectation are embedded in mouse motor cortical activity. Preprint at bioRxiv https://doi.org/10.1101/2023.09.13.557633 (2023).
43. Kim, E., Juavinett, A., Kyubwa, E., Jacobs, M. & Callaway, E. Three types of cortical layer 5 neurons that differ in brain-wide connectivity and function. Neuron 88, 1253–1267 (2015).
44. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn http://incompleteideas.net/book/the-book-2nd.html (MIT Press, 2018).


45. Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artif. Intell. 101, 99–134 (1998).
46. Rao, R. P. N. Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Front. Comput. Neurosci. 4, 146 (2010).
47. von Helmholtz, H. Handbuch der Physiologischen Optik Vol. 3 (Voss, 1867).
48. Friston, K., Adams, R. A., Perrinet, L. & Breakspear, M. Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3, 151 (2012).
49. Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).
50. Shipp, S. Neural elements for predictive coding. Front. Psychol. 7, 1792 (2016).
51. Miura, S. & Scanziani, M. Distinguishing externally from saccade-induced motion in visual cortex. Nature 610, 135–142 (2022).
52. Keller, G. B., Bonhoeffer, T. & Hübener, M. Sensorimotor mismatch signals in primary visual cortex of the behaving mouse. Neuron 74, 809–815 (2012).
53. Bastos, A. M., Lundqvist, M., Waite, A. S., Kopell, N. & Miller, E. K. Layer and rhythm specificity for predictive routing. Proc. Natl Acad. Sci. USA 117, 31459–31469 (2020).
54. Leinweber, M., Ward, D. R., Sobczak, J. M., Attinger, A. & Keller, G. B. Sensorimotor circuit in mouse cortex for visual flow predictions. Neuron 95, 1420–1432 (2017).
55. Schneider, D. M., Sundararajan, J. & Mooney, R. A cortical filter that learns to suppress the acoustic consequences of movement. Nature 561, 391–395 (2018).
56. Markov, N. T. & Kennedy, H. The importance of being hierarchical. Curr. Opin. Neurobiol. 23, 187–194 (2013).
57. Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
58. Smolensky, P., McCoy, R. T., Fernandez, R., Goldrick, M. & Gao, J. Neurocompositional computing: from the central paradox of cognition to a new generation of AI systems. AI Mag. 43, 308–322 (2022).
59. Lewis, M., Purdy, S., Ahmad, S. & Hawkins, J. Locations in the neocortex: a theory of sensorimotor object recognition using cortical grid cells. Front. Neural Circuits 13, 22 (2019).
60. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
61. Jiang, L. P. & Rao, R. P. N. Dynamic predictive coding: a model of hierarchical sequence learning and prediction in the neocortex. PLoS Comput. Biol. 20, e1011801 (2024).
62. Whittington, J. C. R. & Bogacz, R. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Comput. 29, 1229–1262 (2017).
63. Attias, H. Planning by probabilistic inference. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003) (eds Bishop, C. M. & Frey, B. J.) 9–16 (PMLR, 2003).
64. Verma, D. & Rao, R. P. N. Planning and acting in uncertain environments using probabilistic inference. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems 2382–2387 (IEEE, 2006).
65. Botvinick, M. & Toussaint, M. Planning as inference. Trends Cogn. Sci. 16, 485–488 (2012).
66. Richards, A. Robust Constrained Model Predictive Control. PhD thesis, MIT (2004).
67. Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).
68. Kahneman, D. Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011).
69. Friston, K., Parr, T. & de Vries, B. The graphical brain: belief propagation and active inference. Netw. Neurosci. 1, 381–414 (2017).
70. O’Reilly, R. C., Russin, J. L., Zolfaghar, M. & Rohrlich, J. Deep predictive learning in neocortex and pulvinar. J. Cogn. Neurosci. 33, 1158–1196 (2021).
71. Willett, F. R. et al. Hand knob area of premotor cortex represents the whole body in a compositional way. Cell 181, 396–409 (2020).
72. Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T. & Wang, X.-J. Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297–306 (2019).
73. Kurth-Nelson, Z. et al. Replay and compositional computation. Neuron 111, 454–469 (2023).
74. Mnih, V., Heess, N., Graves, A. & Kavukcuoglu, K. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z. et al.) 2204–2212 (Curran Associates, 2014).
75. Makino, H. Arithmetic value representation for hierarchical behavior composition. Nat. Neurosci. 26, 140–149 (2023).
76. Hogendoorn, H. Perception in real-time: predicting the present, reconstructing the past. Trends Cogn. Sci. 26, 128–141 (2022).
77. Whittington, J. C. R. et al. The Tolman–Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell 183, 1249–1263 (2020).
78. Hafner, D., Lee, K.-H., Fischer, I. & Abbeel, P. Deep hierarchical planning from pixels. In Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) 26091–26104 (Curran Associates, 2022).
79. Lee, T. S. & Mumford, D. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 20, 1434–1448 (2003).
80. George, D. & Hawkins, J. Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. 5, e1000532 (2009).
81. Wolpert, D. M. & Miall, R. C. Forward models for physiological motor control. Neural Netw. 9, 1265–1279 (1996).
82. Mehta, M. R. Neuronal dynamics of predictive coding. Neuroscientist 7, 490–495 (2001).
83. Heeger, D. J. Theory of cortical function. Proc. Natl Acad. Sci. USA 114, 1773–1782 (2017).
84. MacKay, D. in Automata Studies (eds Shannon, C. E. & McCarthy, J.) 235–251 (Princeton Univ., 1956).
85. Albus, J. S. Brains, Behavior and Robotics (BYTE, 1981).
86. Scott, S. Optimal feedback control and the neural basis of volitional motor control. Nat. Rev. Neurosci. 5, 532–546 (2004).
87. Jordan, M. I. & Rumelhart, D. E. Forward models: supervised learning with a distal teacher. Cogn. Sci. 16, 307–354 (1992).
88. Kawato, M. Internal models for motor control and trajectory planning. Curr. Opin. Neurobiol. 9, 718–727 (1999).
89. Fetz, E. E. in Textbook of Physiology (eds Patton, H. D. et al.) 608–631 (Saunders, 1989).
90. Jones, E. G., Coulter, J. D. & Hendry, S. H. C. Intracortical connectivity of architectonic fields in the somatic sensory, motor and parietal cortex of monkeys. J. Comp. Neurol. 181, 291–347 (1978).
91. Adams, R., Shipp, S. & Friston, K. Predictions not commands: active inference in the motor system. Brain Struct. Funct. 218, 611–643 (2013).
92. Falchier, A., Clavagnier, S., Barone, P. & Kennedy, H. Anatomical evidence of multimodal integration in primate striate cortex. J. Neurosci. 22, 5749–5759 (2002).
93. Audette, N. J., Zhou, W., La Chioma, A. & Schneider, D. M. Precise movement-based predictions in the mouse auditory cortex. Curr. Biol. 32, 4925–4940 (2022).


94. Craik, K. J. W. The Nature of Explanation (Macmillan, 1943).
95. Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51–56 (2012).
96. Sussillo, D., Churchland, M. M., Kaufman, M. T. & Shenoy, K. V. A neural network that finds a naturalistic solution for the production of muscle activity. Nat. Neurosci. 18, 1025–1033 (2015).
97. Sauerbrei, B. A. et al. Cortical pattern generation during dexterous movement is input-driven. Nature 577, 386–391 (2020).
98. Chaudhuri, R., Knoblauch, K., Gariel, M.-A., Kennedy, H. & Wang, X.-J. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron 88, 419–431 (2015).
99. Salinas, E. & Sejnowski, T. J. Gain modulation in the central nervous system: where behavior, neurophysiology, and computation meet. Neuroscientist 7, 430–440 (2001).
100. Larkum, M. E., Senn, W. & Lüscher, H.-R. Top–down dendritic input increases the gain of layer 5 pyramidal neurons. Cereb. Cortex 14, 1059–1070 (2004).
101. Ferguson, K. A. & Cardin, J. A. Mechanisms underlying gain modulation in the cortex. Nat. Rev. Neurosci. 21, 80–92 (2020).
102. Stroud, J. P., Porter, M. A., Hennequin, G. & Vogels, T. P. Motor primitives in space and time via targeted gain modulation in cortical networks. Nat. Neurosci. 21, 1774–1783 (2018).
103. McAdams, C. J. & Maunsell, J. H. R. Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. J. Neurosci. 19, 431–441 (1999).
104. Ha, D., Dai, A. M. & Le, Q. V. Hypernetworks. In 5th International Conference on Learning Representations (ICLR 2017) openreview.net/forum?id=rkpACe1lx (OpenReview.net, 2017).
105. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).
106. Galanti, T. & Wolf, L. On the modularity of hypernetworks. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 10409–10419 (Curran Associates, 2020).
107. Tomov, M. S., Yagati, S., Kumar, A., Yang, W. & Gershman, S. J. Discovery of hierarchical representations for efficient planning. PLoS Comput. Biol. 16, e1007594 (2020).
108. Olson, C. R. Brain representation of object-centered space in monkeys and humans. Annu. Rev. Neurosci. 26, 331–354 (2003).
109. George, D. et al. Clone-structured graph representations enable flexible learning and vicarious evaluation of cognitive maps. Nat. Commun. 12, 2392 (2021).
110. Friston, K. J., Rosch, R., Parr, T., Price, C. & Bowman, H. Deep temporal models and active inference. Neurosci. Biobehav. Rev. 77, 388–402 (2017).

Acknowledgements
I thank A. Fisher, D. Gklezakos, P. Jiang, P. Rangarajan and V. Sathish for many discussions and the collaborative work cited in the text. I also thank K. Friston, C. Eliasmith and members of his laboratory, researchers at Numenta, S. Mirbagheri, N. Steinmetz and G. Burachas for discussions and feedback. This work was supported by National Science Foundation EFRI grant 2223495, National Institutes of Health grant 1UF1NS126485-01, the Defense Advanced Research Projects Agency under contract HR001120C0021, a UW + Amazon Science Hub grant, a Weill Neurohub Investigator grant, a Frameworks grant from the Templeton World Charity Foundation and a Cherng Jia and Elizabeth Yun Hwang Professorship. The opinions expressed in this publication are those of the author and do not necessarily reflect the views of the funders.

Competing interests
The author declares no competing interests.

Additional information
Correspondence should be addressed to Rajesh P. N. Rao.

Peer review information Nature Neuroscience thanks Karl Friston, Aleena Garner, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© Springer Nature America, Inc. 2024

