A Sensory-Motor Theory of The Neocortex: Nature Neuroscience
Perspective https://doi.org/10.1038/s41593-024-01673-9
Published online: 27 June 2024

Recent neurophysiological and neuroanatomical studies suggest a close
interaction between sensory and motor processes across the neocortex.
Here, I propose that the neocortex implements active predictive coding
(APC): each cortical area estimates both latent sensory states and actions
(including potentially abstract actions internal to the cortex), and the cortex
as a whole predicts the consequences of actions at multiple hierarchical
levels. Feedback from higher areas modulates the dynamics of state and
action networks in lower areas. I show how the same APC architecture can
explain (1) how we recognize an object and its parts using eye movements,
(2) why perception seems stable despite eye movements, (3) how we learn
compositional representations, for example, part–whole hierarchies, (4)
how complex actions can be planned using simpler actions, and (5) how we
form episodic memories of sensory–motor experiences and learn abstract
concepts such as a family tree. I postulate a mapping of the APC model to the
laminar architecture of the cortex and suggest possible roles for cortico–
cortical and cortico–subcortical pathways.
The predictive coding theory of cortical function, proposed in this journal in 1999 by the author and Ballard1, has been the subject of increasing attention in recent years2–4. However, as originally proposed, the theory ignored a fundamental aspect of perception, namely, that perception is action based: we move our eyes about three times a second to recognize objects in a scene, orient our heads to localize sounds, use our fingers to identify objects by touch and navigate our environment to solve tasks that satisfy our needs. Away from experimentally imposed constraints in the laboratory, perception, in its natural state, can best be viewed as an action-based hypothesis-testing process (also called active sensing, active perception and active inference)5–8.

Recent studies have highlighted the important influence of impending actions across almost all cortical areas (Fig. 1). For example, in mice solving a visual discrimination task using forepaws to rotate a wheel, Zatka-Haas et al.9 observed, using widefield calcium imaging, extensive bilateral activity across cortical areas preceding movements on choice trials (left or right action selected) but not on 'NoGo' trials (no action is selected) (Fig. 1a, left). They further showed that impending movement could be decoded from cortical activity in most imaged regions by 25 ms before movement (Fig. 1a, right). In the same task, Steinmetz et al.10 used Neuropixels probes to record spiking activity from thousands of neurons and showed that not only does activity in the visual cortex get updated after movement (Fig. 1b, left), but almost all recorded cortical areas had neurons with activities that were predictive of upcoming movements (Fig. 1b, right). Similarly, Stringer et al.11 found that about a third of the population activity of ~10,000 neurons in the visual cortex of awake mice could be predicted from motor actions derived from a video of the mouse's facial movements (Fig. 1c), suggesting that sensory–motor integration occurs even in the primary sensory cortex (the lack of a similar result in monkeys12 could be due to the increased functional specialization of primate visual cortical areas compared to that of the rodent cortex).

Actions may be integrated differently across the different layers of a cortical area. For example, Jordan and Keller13 showed that both layer 2/3 and layer 5/6 neurons in the mouse primary visual cortex (V1) undergo depolarization before locomotion onset (Fig. 1d, left and middle). While layer 2/3 neurons appear to be computing a difference between motor-related input and bottom–up visual flow input, layer 5/6 responses were consistent with positive integration of visuomotor inputs (Fig. 1d, right). These results complement well-known earlier results on predictive activity in a range of cortical areas such as the visual cortex14, the parietal cortex15 and frontal eye fields16 that anticipate the visual consequences of impending eye movements.
1Center for Neurotechnology, University of Washington, Seattle, WA, USA. 2Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. e-mail: [email protected]
Fig. 1 | Widespread influence of actions across the cortex. a, Left, widefield calcium imaging reveals bilateral activity (cortical fluorescence dF/F) from 0 to 200 ms after stimulus (stim) onset across cortical areas preceding movements on left or right (L/R) action-selection trials. 'NoGo' trials do not show such activity. Right, average action execution decoder accuracy (acc.) 25 ms before movement onset, showing that an impending movement can be decoded from cortical activity in most imaged regions (adapted from ref. 9, CC BY 4.0). MOs, secondary motor cortical area; MOp, primary motor cortex; SSp, primary somatosensory cortex; VISp, primary visual cortex; VISal, secondary visual cortical area. b, Left, increased spiking after a correct choice movement in a visual cortex neuron (posteromedial visual area, VISpm) for both contralateral (contra) and ipsilateral (ipsi) stimulus presentations (orange and blue dots, spikes; black dots, movement onset). Right, fraction of neurons in each brain region with pre-movement activity that could be accurately predicted from the animal's movement (in either the left or right direction) (adapted from ref. 10, Springer Nature Limited). c, Motor action information extracted using principal-component (PC) analysis of a video of a mouse's facial movements (left, example frames t, t + 1; middle, top three principal components) accurately predicted (using reduced-rank regression) about a third of the population activity of ~10,000 neurons (raster representations on the right) measured using two-photon calcium imaging of the visual cortex of awake mice (adapted with permission from ref. 11, AAAS). 1D, one dimensional. d, Intracellular recordings in V1 of mice on a spherical treadmill with locomotion coupled to visual flow feedback. Visual flow was halted at random times to generate visuomotor mismatch events. Left, heatmap of average responses before and after locomotion onset across all layer 5/6 neurons. Baseline activity was subtracted from responses by using the average membrane potential in the 2.5 s before locomotion onset before averaging. Middle, average response before and after locomotion onset across all layer 5/6 (L5/6) neurons (black, 14 neurons) compared with all layer 2/3 (L2/3) neurons (gray, 32 neurons). Right, average mismatch responses of layer 2/3 and layer 5/6 neurons (adapted with permission from ref. 13, Elsevier).
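The reduced-rank regression used in the analysis of Fig. 1c, which predicts population activity from facial-motion principal components, can be sketched in a few lines. The data below are synthetic, and all dimensions (time points, behavior PCs, neuron count, rank) are hypothetical stand-ins, not the values used in ref. 11:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (all hypothetical): 500 time points, 30 face-motion PCs,
# 200 neurons whose activity depends on behavior through a rank-3 map.
T, P, N, rank = 500, 30, 200, 3
X = rng.standard_normal((T, P))                      # behavior PCs over time
B_true = rng.standard_normal((P, rank)) @ rng.standard_normal((rank, N))
Y = X @ B_true + 0.5 * rng.standard_normal((T, N))   # neural activity

def reduced_rank_regression(X, Y, r):
    """Fit Y ~ X @ B with rank(B) <= r (least-squares solution)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # full-rank OLS fit
    # Project the OLS prediction onto its top-r principal subspace.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:r].T                                     # top-r output directions
    return B_ols @ V @ V.T

B_rrr = reduced_rank_regression(X, Y, rank)
Y_hat = X @ B_rrr
r2 = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean(0)) ** 2)
```

The rank constraint forces the prediction through a small number of shared dimensions, which is what makes the claim "a third of population variance is behavior-predictable" interpretable at the population level.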
The emerging view, as suggested by the studies above, is that almost all cortical areas update their representations on the basis of 'efference copies' of upcoming actions ('corollary discharges' (ref. 17)) as well as the results of executed actions. Such a view harmonizes well with the anatomical observation that all areas of the neocortex (henceforth, the 'cortex'), including areas traditionally labeled as sensory cortices, send outputs to subcortical motor regions (refs. 18,19 and references therein) and receive input from these regions. Indeed, Vernon Mountcastle, in his prescient article in 1978 (ref. 20), put forth the hypothesis that a single unifying computational principle might be operating across the entire cortex, proposing the 'cortical column' as a modular information-processing unit of the cortex (see also refs. 21–23 for related ideas). This hypothesis is supported by the remarkable anatomical similarities in laminar connectivity patterns across cortical areas24,25, even though the density of cells within laminae may vary across areas. Additional evidence for this hypothesis comes from experiments in which inputs from the optic nerve were diverted via the auditory thalamus to the auditory cortex, causing the auditory cortex to develop visual receptive field properties26.

If there is indeed a common computational principle operating across the cortex, it must be versatile enough to explain capabilities as diverse as (1) learning to recognize an object from multiple visual glimpses through eye and head movements or from multiple tactile sensations through finger movements, (2) solving a complex spatial navigation task using simpler movement sequences and (3) understanding abstract concepts (such as a family tree).

In this Perspective, I suggest APC as a unifying sensory–motor theory of the cortex. APC hypothesizes a canonical cortical module as consisting of a state-prediction network and an action-prediction network, both implemented within each cortical area. 'State' here denotes hidden (or 'latent') aspects of the world inferred from sensory inputs, for example, parts of an object to be recognized or one's location in a building. 'Action' refers to not just motor commands but also abstract actions, for example, 'go to the maternal grandmother node' in a family tree or 'perform multiplication' on two given numbers. APC postulates that feedback from higher cortical areas modulates the dynamics of both state and action networks in lower areas, changing the functions they compute on the fly to suit the needs of the current task. This leads to representations that operate at multiple levels of sensory and motor abstraction, as observed in cortical hierarchies implicated in perception and action27–30. In the following sections, I present computational and neurobiological aspects of APC, comparing emerging studies and experimental results with the model's predictions. I present simulations chosen to showcase the explanatory breadth of APC. While we have previously investigated APC in the context of machine learning (ML) and artificial intelligence (AI)31–33, here I explore APC as a model of cortical function.

APC

Neuroanatomical and physiological motivation

The axonal outputs of layer 5 neurons in almost all cortical areas target subcortical motor centers19. Even in V1, outputs from layer 5 neurons target the superior colliculus34, which is involved in eye movements (aside from other motor behaviors).
Box 1

Canonical APC module. a, Depiction of the six-layered laminar structure of a cortical column, showing some of the major connections between layers and with the thalamus (not all connections are shown) (based on refs. 18,24,43). b, Canonical APC generative model. The dashed arrow denotes a delay of one time step. D denotes a decoder converting state to input. c, One possible implementation of inference in a canonical APC module for the generative model in b.
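To make the generative model in Box 1 (panel b) concrete, here is a minimal toy sketch in which the state-transition function fs, the policy fa and the decoder D are arbitrary fixed linear-nonlinear maps; all dimensions and weights are hypothetical illustrations, not fits to data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: latent state, action and sensory input.
ds, da, di = 8, 3, 16
Ws = rng.standard_normal((ds, ds)) / np.sqrt(ds)   # state recurrence
Wa = rng.standard_normal((ds, da))                 # action influence on state
Wp = rng.standard_normal((da, ds)) / np.sqrt(ds)   # policy readout
D = rng.standard_normal((di, ds))                  # decoder: state -> input

def f_s(s_prev, a_prev):
    """State-transition function: next latent state from previous state and action."""
    return np.tanh(Ws @ s_prev + Wa @ a_prev)

def f_a(s):
    """Policy function: action generated from the current state."""
    return np.tanh(Wp @ s)

# Roll the generative model forward: at each step the module transitions the
# state using the previous action, emits a new action, and decodes a predicted input.
s, a = np.zeros(ds), np.zeros(da)
inputs = []
for t in range(5):
    s = f_s(s, a)         # s_t = f_s(s_{t-1}, a_{t-1}), with a one-step delay on a
    a = f_a(s)            # a_t = f_a(s_t)
    inputs.append(D @ s)  # predicted input I_t = D(s_t)
```

Inference in the APC module (panel c) would run this model in the opposite direction, correcting s and a with prediction errors between the decoded and actual inputs.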
Similarly, layer 5 neurons in the primary auditory cortex (A1) send outputs to the inferior colliculus35, which is involved in orienting and defensive motor behaviors36, while layer 5 neurons in the primary somatosensory cortex send outputs to the spinal cord37, which controls body movements. On the other hand, layer 5 neurons in the primary motor cortex (or M1) (such as Betz cells) have long been implicated in motor function via their axonal projections to the spinal cord38, but the middle and upper layers of traditional 'motor' cortical areas such as M1 are involved in sensory processing of feedback from subcortical and cortical sources39,40: for example, layer 2/3 neurons in mouse M1 respond to unexpected sensory perturbations in a visually guided motor task41, while neurons in mouse M2 encode auditory sensation and expectation42. Thus, despite well-known differences in the laminar densities of neurons in different cortical areas (for example, V1 versus M1), the sensory–motor nature of the cortex is retained. This motivates the idea of a 'sensory–motor cortical module' as a canonical feature of the cortex (described further in the section APC module and neuroanatomical implementation).

The figure in Box 1 (panel a) depicts the laminar structure of a typical cortical column and its connectivity (based on refs. 18,24,43). Inputs from a sensory region or a lower cortical area target layer 4 neurons, the outputs of which are then conveyed to the superficial layer 2/3 neurons.
Box 2
State and action networks in the cortex. a, Top left, schematic of the omission of an expected sound during a lever-press movement (black
curve) in a self-generated sound task in mice93. a, Top middle and right, neural activity in the auditory cortex aligned to six positions (colored
dots) along the lever-press movement: peak firing always corresponded to the time of the expected sound (occurring on average 36 ms
after movement onset), with the activity starting roughly 200 ms before movement onset (as reported in ref. 93). a, Bottom, heatmaps (left)
of responses for all recorded neurons and the average response (right) for 377 of those neurons responding above a threshold to the sound
when it was self-generated by a lever press (‘movement’) and when it was not self-generated (‘passive’). Note the greater suppression (green)
for self-generated sounds (adapted from ref. 93, Elsevier). b, Predictive activity anticipating the visual consequences of an upcoming eye
movement in the visual cortex (left) (adapted with permission from ref. 14, National Academy of Sciences, USA), the parietal cortex (middle)
(adapted with permission from ref. 15, AAAS) and the frontal cortex (right) (adapted with permission from ref. 16, American Physiological
Society). FP, fixation point; H, horizontal eye movement; V, vertical eye movement; FEF, frontal eye fields. c, Two-dimensional projection (using
jPCA95) of population activity of motor cortex neurons in three monkeys performing straight and curved reaching movements, showing rotation
of the neural state. Traces are colored from green to red based on the preparatory neural state (circles) for each reach condition (adapted
from ref. 95, Springer Nature Limited). jPC1 and jPC2, first and second dimensions, respectively, of jPCA space; AU, arbitrary units. d, Results
from simultaneous population recordings of the motor thalamus and the motor cortex in mice performing reach-to-grasp movements97. Left,
neural population trajectories for the thalamus (left, green) and the cortex (right, magenta) obtained using trial-averaged PCA. Right, single-trial
population activity in the thalamus (top) and the cortex (bottom) (adapted from ref. 97, Springer Nature Limited).
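The sensory attenuation in Box 2 (panel a) is what predictive coding expects if an efference copy of the motor command generates a sensory prediction that is subtracted from the bottom-up input: self-generated sounds leave a small residual, unpredicted sounds a large one. A toy sketch, with a hypothetical linear forward model standing in for the learned prediction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear forward model: an efference copy of the motor command
# predicts its sensory consequence (e.g., the sound of one's own lever press).
dm, ds = 4, 12
F = rng.standard_normal((ds, dm))          # learned forward model (assumed)
motor_cmd = rng.standard_normal(dm)        # efference copy of the action

predicted = F @ motor_cmd                  # top-down sensory prediction
self_generated = predicted + 0.1 * rng.standard_normal(ds)  # sound caused by the action
passive = rng.standard_normal(ds)          # externally generated sound

# Prediction-error ('mismatch') responses: bottom-up input minus prediction.
err_self = np.linalg.norm(self_generated - predicted)
err_passive = np.linalg.norm(passive - predicted)
# err_self << err_passive: responses to self-generated input are suppressed.
```

The suppressed 'movement' responses relative to 'passive' responses in Box 2 (panel a, bottom) follow the same logic, with the prediction timed to the expected sound.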
These neurons in turn send their axons to the deeper layers, predominantly targeting layer 5 neurons. One class of layer 5 neurons (with thick tufted apical dendrites and firing in bursts) sends its axons to subcortical motor centers such as the superior colliculus and other parts of the brainstem19,34,43. Other layer 5 neurons, which do not fire in bursts and have slender apical dendrites, project to the striatum and other cortical regions34,43. There is also a substantial axonal projection from layer 5 back to layer 2/3, signifying recurrent feedback within a cortical column. There are additional projections from layer 5 to layer 6, and layer 6 in turn sends outputs to the parts of the thalamus that send inputs to layer 4.

Computational motivation

Computational considerations point to maintaining a close link between actions and their sensory consequences. In model-based reinforcement learning44 and, more generally, in the framework of partially observable Markov decision processes45,46, an intelligent 'agent' interacts with the world by executing an action at−1 at time t − 1, and this causes the agent's 'state' to change from st−1 to st; this change is governed by the state-transition function fs(st−1, at−1), which generates a new state st according to a probability distribution P(st | st−1, at−1). When the agent executes an action based on a policy fa, for example, moving its body by walking or making an eye movement, the hidden state changes to the next state st (a new location in a building being navigated or a new part of a scene being recognized) according to fs(st−1, at−1). Inferring these hidden states as the agent makes movements is the essence of perception47,48.

APC module and neuroanatomical implementation

The above computational considerations motivate the canonical APC generative model shown in the figure in Box 1 (panel b). The corresponding model for inference and learning, referred to as the canonical APC module, is described in Box 1 and shown in the figure in Box 1 (panel c). This figure also suggests one possible functional mapping of APC's computational elements onto the cortical laminar structure in the figure in Box 1 (panel a), which builds on previous proposals mapping predictive coding to cortical laminae49,50.

As shown in the figure in Box 1 (panel c), the superficial layer cortical neurons, which receive the filtered sensory inputs from layer 4 and are recurrently connected to each other, are well suited to implementing the state-transition function f̂s. The motor output layer 5 neurons, which are also recurrently connected to each other, fit the role of neurons computing the action–policy function f̂a. The other class of layer 5 neurons, which convey information to other cortical areas and the striatum, could maintain the current state estimate ŝt by integrating the state prediction from layer 2/3 and correcting it with prediction errors from the feedforward thalamic inputs to layers 4 and 5/6 (ref. 1). Layer 6 neurons receiving inputs from these state-estimating layer 5 neurons are well placed to compute the prediction for a lower-level area: at the lowest level, layer 6 neurons predict sensory input Īt for the input It (layer 6 neurons at a higher level would predict the cortical state at a lower level; for more details, see the section Hierarchical APC and cortical feedback).

Layer 5 motor output neurons, for example, those sending outputs to the superior colliculus, send axon collaterals to higher-order thalamic nuclei18,19 and receive motor information from subcortical motor centers such as the superior colliculus regarding actions executed. These thalamic nuclei are therefore in an ideal position to compare the actual executed action at (from the superior colliculus or other motor center) and the cortical prediction ât. The resulting action feedback (for example, in the form of action-prediction errors), in addition to sensory feedback (in the form of sensory-prediction errors), can be conveyed by the thalamus back to the cortex to enable the state-transition network f̂s to correct its state prediction and the action network f̂a to correct its action prediction. Indeed, it is known that higher-order nuclei such as the pulvinar receive cortical layer 5 inputs and information from the superior colliculus and send axons to superficial layers of area V1, explaining the response of V1 neurons to saccadic eye movements51.

The implementation suggested above is consistent with growing experimental data on predictive coding in the cortex2,4, with prediction error-like activity reported in superficial layers13,52 and predictive activity observed in deeper layers53 and in cortico–cortical interactions54,55. The implementation above also shares similarities with previous canonical circuits for predictive coding49 in specifying laminar roles for state estimates and prediction errors but differs in the use of both actions and states to generate predictions, rather than being limited to hidden causes49. Evidence for state and action networks in the cortex is summarized in Box 2.

Hierarchical APC and cortical feedback

A characteristic feature of the cortex is the reciprocal nature of connections between cortical areas27: 'feedforward' connections from a cortical area A (originating in the superficial layers) to a cortical area B (terminating in layer 4) are invariably reciprocated by anatomically defined 'feedback' (or descending) connections from area B to area A (originating in the deeper (and sometimes superficial) layers of B to superficial and deep layers of A). Why are cortical areas reciprocally connected and organized in an approximate hierarchy27,56?

Computational motivation

Consider the problem of going to the grocery store from one's house. As shown in Fig. 2a, the complexity of the problem can be substantially reduced by dividing the task into subtasks (or 'subgoals'), dividing each subtask into 'sub-subtasks', and so on. Reducing a complex problem to a sequence of easier-to-solve components and reusing these components to solve new problems gets to the heart of compositionality, which is thought to form the basis for cognitive flexibility and fast generalization in humans57,58.
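The decomposition of a task into subgoals can be sketched as a recursive expansion of goals into subgoal sequences until only primitive actions remain; the specific task hierarchy below is a hypothetical illustration:

```python
# A minimal sketch of hierarchical task decomposition: each abstract
# goal expands into a sequence of subgoals (the hierarchy is hypothetical).
SUBGOALS = {
    "go to grocery store": ["exit house", "walk to store"],
    "exit house":          ["go to door", "open door", "step outside"],
    "walk to store":       ["follow sidewalk", "cross street", "enter store"],
}

def expand(goal):
    """Recursively expand a goal into its primitive action sequence."""
    if goal not in SUBGOALS:      # primitive action: no further decomposition
        return [goal]
    plan = []
    for sub in SUBGOALS[goal]:
        plan.extend(expand(sub))
    return plan

plan = expand("go to grocery store")
# plan: ['go to door', 'open door', 'step outside',
#        'follow sidewalk', 'cross street', 'enter store']
```

Note that the subgoal dictionary entries are reusable: "exit house" can serve any task that starts at home, which is the compositional reuse emphasized in the text.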
Fig. 2 | Using hierarchies and compositionality to simplify complex tasks. a, Decomposition of the 'go to the grocery store' problem into subgoals or subtasks, each of which can be further divided into sub-subgoals or sub-subtasks. Note that the rate of change is faster at the lower levels than at the higher levels, leading naturally to a temporal hierarchy. b, A navigation problem in a maze-like building environment with corridors (black) and walls (gray). Blue dot, current location; green square, desired goal location. The structure of the environment (pathways and walls) can be understood in terms of the building's state-transition dynamics, which in turn can be divided into the simpler transition dynamics of its compositional elements, namely, the two rooms outlined in yellow and red that appear at several different locations within the reference frame of the environment. These simpler elements can be further decomposed into horizontal and vertical corridors shown on the right that appear at different locations within the local reference frame of each room. c, An object (such as a handwritten digit '8') can be divided into parts (loops and curves at the middle level), each of which can be divided into subparts (strokes, lines, smaller curves at the lower level). Each part and subpart is associated with its coordinates (location and transformation) within a local reference frame.
A complex problem can be characterized by its (typically high-dimensional) state-transition function, which governs how the state of the environment changes when one applies an action. Fortunately, in the natural world, the consequences of most actions are local, and these local dynamics tend to be shared across many environments and objects, allowing complex problems to be modeled in terms of simpler lower-dimensional state-transition functions. This is illustrated in the example in Fig. 2b, a simplification of the 'go to the grocery store' problem: here, a maze-like environment is modeled using simpler components (yellow- and red-outlined rooms), each composed of even simpler components (corridors).

Interestingly, the same concept can also be applied to visual perception. As illustrated in Fig. 2c, a visual object can be compositionally defined in terms of parts and their locations within the object's reference frame59; the parts can in turn be decomposed into simpler parts within their respective reference frames. Nested compositional representations of objects and environments offer substantial combinatorial flexibility for solving complex problems in terms of simpler, reusable components. The fact that the world we live in and the problems we seek to solve are amenable to compositional solutions makes such an approach attractive, from both a computational and an evolutionary perspective.

The hierarchical APC model

Box 3 describes the APC model's hierarchical architecture and its neural implementation. The figure in Box 3 (panel e) shows two levels of the model, implemented using top–down 'contextual inputs' to connect the higher level to the lower level (see Box 3 for an alternate implementation based on gain modulation).

State inference using predictive coding and compositional learning. As shown in the figure in Box 3 (panel e, left), the higher-level state neurons maintain an estimate for the state st(i+1) at time t and modulate the lower-level state network via top–down feedback given by Hs(i)(st(i+1)). The lower-level state neurons maintain an estimate for st,τ(i), where τ denotes a time step at the lower level within the higher-level time interval given by t. The lowest-level state makes a prediction of the input via a 'decoder' network D (figure in Box 1, panel b). If D is a linear matrix U, this lowest level of APC is equivalent to the generative model in sparse coding (I = Us, where s is sparse60). At each time step, the network predicts the next input as a function of previous state and action. Feedforward pathways convey prediction errors to update state estimates1, while descending pathways convey top–down modulation as described above (see ref. 61 for an example). Prediction errors are also used to learn the weights of the state networks at all levels using predictive coding-based self-supervised learning1,32,61. Such learning approximates error backpropagation, the workhorse of contemporary deep learning, in a biologically plausible manner62.

Action inference through planning and reinforcement learning. As shown in the figure in Box 3 (panel e, right), the higher-level action neurons represent an abstract action (such as 'open the door') via an action vector at(i+1) at time t. Given the abstract action at(i+1), top–down feedback given by the embedding input Ha(i)(at(i+1)) modulates the lower-level action network and instantiates the goal-specific policy fa(i), which produces lower-level actions at,τ(i). The hierarchical action networks in the APC model can be trained in multiple ways: (1) planning: the state-transition networks can be used to search for sequences of actions, starting from the highest abstraction level, that are likely to result in states with the highest cumulative reward or closest to the goal (see also active inference8, planning by inference63–65 and model predictive control66; see the section Illustrative examples of diverse computational capabilities). Successful actions can be used as 'labels' in supervised learning to train the policy networks f̂a(i); (2) reinforcement learning: hierarchical reinforcement learning67 can be used to train action networks at each level to maximize the total expected reward according to a reward function that may be specific to that level (details in the section Illustrative examples of diverse computational capabilities); (3) policies providing priors for planning: action networks f̂a(i) at each level predict a distribution over actions, which can serve as a prior, in a Bayesian sense, for guiding the search for actions in planning. Thus, predicted actions for new tasks will have high uncertainty, requiring effort and deliberation in planning ('system 2 thinking' (ref. 68)), while, for frequently encountered tasks, action networks are well trained and will predict actions with high confidence ('system 1 thinking' (ref. 68)).

An advantage of continuous-valued state s(i+1) and action a(i+1) vectors is that interpolating or sampling in the neighborhood of learned s(i+1) and a(i+1) generates, on the fly, new state-transition functions f̃s(i) and new policy functions f̃a(i), opening the door to fast generalization and transfer of knowledge to new tasks. The alternative to continuous states is to use discrete states, as in previous models of predictive coding based on belief propagation or variational message passing (for example, Fig. 10 in ref. 69). Aside from the possibility of fast generalization and transfer, APC's use of continuous states also allows explicit representation of prediction errors, which in turn allow local optimization of dynamics and learning (under Gaussian assumptions)1.
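Planning by searching with the learned state-transition networks (option (1) above) can be illustrated with a minimal random-shooting planner in the spirit of model predictive control; the 2D grid world, perfect transition function and goal below are hypothetical simplifications:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 2D world: state is a position, actions are unit steps; the learned
# state-transition function is assumed perfect here for simplicity.
ACTIONS = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
GOAL = np.array([3.0, 2.0])

def f_s(s, a):
    return s + a   # stand-in for a learned state-transition network

def plan(s0, horizon=5, n_samples=3000):
    """Random-shooting search: sample action sequences, roll them out with
    f_s, and keep the sequence whose final state is closest to the goal."""
    best_seq, best_dist = None, np.inf
    for _ in range(n_samples):
        idx = rng.integers(0, len(ACTIONS), size=horizon)
        s = s0
        for i in idx:
            s = f_s(s, ACTIONS[i])
        dist = np.linalg.norm(s - GOAL)
        if dist < best_dist:
            best_seq, best_dist = idx, dist
            if best_dist == 0.0:
                break
    return best_seq, best_dist

seq, dist = plan(np.zeros(2))
```

Successful sequences found this way could then serve as the 'labels' mentioned above for supervised training of the policy networks, and a trained policy could in turn bias the sampling as a prior (option (3)).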
Box 3
Hierarchical APC model. a, Two levels (levels i + 1 and i) of a hierarchical APC generative model. Each level has a state-transition function fs capturing the dynamics of the world at a particular level of abstraction and a policy function fa specifying that level's actions, goals and coordinates (conditioned on the current highest-level goal or task). Higher-level state and action vectors at time t generate, via top–down networks Hs and Ha, lower-level state-transition and policy functions, allowing the higher level to compose a sequence of states and actions at the lower level to accomplish a goal. b, As depicted here for a single pyramidal neuron, I hypothesize that top–down inputs Hs(i)(st(i+1)) from a higher cortical area to the apical dendrites of lower-area neurons modulate the dynamics of a network of such neurons (for example, via gain modulation100,101), allowing the higher area to change the functions fs and fa at the lower level. c, Top, multiplicative gain modulation (for example, due to top–down inputs) in the input–output function of neurons in a recurrent network allows the network to generate a rich set of motor cortical dynamics matching experimental data102. EMG, electromyogram of muscle activity. Bottom, changing the gain from 1 (black) to 2 (blue) (bottom left plot) dramatically alters neuronal firing rates (three example neurons are shown on the right), mimicking quasi-oscillatory motor cortical activity (see figure in Box 2, panels c,d) (adapted from ref. 102, Springer Nature Limited). d, The function computed by a recurrent network (center) can be modulated using a nonchanging top–down contextual input or 'rule input' (one-hot vector, bottom left) in addition to recurrent and stimulus inputs (top left), allowing the same network to solve different tasks (output for a specific task is shown on the right) (adapted from ref. 72, Springer Nature Limited). Mod, modality. e, Implementation of the APC model in a using contextual inputs: higher-level state and action neurons maintaining estimates of st(i+1) and at(i+1) modulate lower-level state and action networks via top–down contextual inputs Hs(i)(st(i+1)) and Ha(i)(at(i+1)), respectively.
plausibly, the gain parameters) for another neural network (called the 'primary network'). In the APC model, I propose that the higher-level state vector s^(i+1) is fed as input to a top–down feedback network H_s^(i), which produces the gain values to modulate the lower-level state network f_s^(i) (and similarly for the action network). The ability of such a neural mechanism to modulate the function being computed by a cortical network was demonstrated by Stroud et al. (figure in Box 3, panel c), who showed that multiplicative gain modulation of a recurrent network can generate a rich set of motor cortical dynamics matching experimental data102.

Contextual inputs in the cortex. Aside from gain modulation, higher cortical areas can also change the function being computed by lower cortical networks using top–down contextual inputs. For example, Yang et al.72 showed that, by feeding a top–down contextual input ('rule input') as a nonchanging input to a recurrent network (in addition to its usual recurrent and external inputs), one can change the input–output function that the network computes, allowing the same network to solve different tasks (figure in Box 3, panel d; see also the model of Eliasmith and colleagues105). This is known in ML as the 'embedding approach' and can be shown to be equivalent in computational function to hypernetworks106. In the case of APC, the higher-level state vector s^(i+1) (and action vector a^(i+1)) can be fed as input to a top–down feedback network H_s^(i) (H_a^(i)) that produces an embedding vector, which acts as a contextual input to a lower-level cortical network that computes f_s^(i) (f_a^(i)) (figure in Box 3, panel e). The higher level can therefore control the function being computed at the lower level by changing the embedding vector (contextual input).

The APC model acknowledges the existence of both gain modulation and contextual inputs in the cortex and postulates that either or both of these mechanisms are used for changing the functions f_s^(i) and f_a^(i) at the lower level according to the current higher-level state vector s^(i+1) and the action vector a^(i+1). The examples described in the section Illustrative examples of diverse computational capabilities were implemented using the method of contextual inputs. The reader is referred to ref. 61 for examples based on gain modulation and to refs. 32,33 for hypernetwork-based examples.
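The two mechanisms can be illustrated with a toy recurrent network in NumPy. This is a hedged sketch, not the trained models of refs. 61,72,102: the weights, stimulus input and the particular gain and context values below are arbitrary illustrative choices. Gain acts multiplicatively on the net input (gain modulation), while context enters as an additive, nonchanging embedding input (contextual input):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                   # neurons in the lower-level recurrent network
W = rng.normal(scale=0.3, size=(n, n))  # fixed recurrent weights (never changed from above)
u = rng.normal(size=n)                  # fixed external (stimulus) input

def run(gain, context, steps=20):
    """Run the lower-level network under a given top-down modulation."""
    x = np.zeros(n)
    for _ in range(steps):
        # gain: multiplicative modulation; context: additive embedding input
        x = np.tanh(gain * (W @ x + u) + context)
    return x

# Two hypothetical higher-level states, mapped (by a stand-in for H_s) to two modulations
x_default = run(gain=1.0, context=np.zeros(n))
x_modulated = run(gain=2.0, context=rng.normal(size=n))

# Same W, same u: the top-down modulation alone changes the function being computed
print(np.allclose(x_default, x_modulated))  # False
```

Because W and u are identical in both runs, any difference in the resulting dynamics is attributable entirely to the top–down gain and context, which is the sense in which a higher level can change the function computed by a lower-level network without changing its weights.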
ing connections that target the superficial and deep layers of area B27. Feedforward connections that target layer 4 of area A arise from the lower area B and from the higher-order thalamic region receiving 'driver input' from area B18,27. I propose that these feedforward connections carry the state and action feedback (for example, prediction errors) that enable the higher area A to correct its abstract state and action estimates. Such a neural implementation is consistent with studies reporting prediction error-like responses in superficial layers and state-estimation-like responses in deeper layers of the cortex2,13.

A key difference from previous formulations of hierarchical predictive coding1,2,49,50 is that APC places subcortical (thalamic) populations center stage in the evaluation of state- and action-prediction errors and their broadcasting to superficial pyramidal cells in the cortex70. Additionally, in APC, descending connections from higher to lower cortical areas change the function being computed in the lower area (through top–down modulation), rather than only conveying lower-level state predictions as in traditional predictive coding1. Higher-area neurons representing action a_t^(i+1) modulate the action network f̂_a^(i) in a lower area, changing the policy function that this lower-level network is computing. Neurophysiological evidence for such compositional representations has recently emerged in the premotor cortex71. Computational models have demonstrated compositionality for task transfer using the embedding space of a_t^(i+1) (ref. 72). Hierarchical representations found in the visual27,29 and motor systems30 are also consistent with the hierarchical compositional approach espoused by the APC model. Finally, the compositional representations in the cortex postulated by the APC model align well with recent hypotheses regarding compositional hippocampal replay73.

After the state and action networks have been learned for a set of tasks (as described in the section The hierarchical APC model), given a particular task, the topmost state vector in the hierarchy is first inferred from sensory inputs. This vector produces (via that level's action network f̂_a) the topmost action vector specifying a 'goal' or option for the task; I hypothesize that this vector is maintained in the prefrontal cortex. Some of the pre-movement anticipatory activity in Fig. 1a may well reflect such a 'cognitive' decision. This abstract action vector is decomposed hierarchically all the way down to elemental actions (for example, muscle control signals in M1). When a sequence of elemental actions is executed and a subgoal is reached, control is returned back to the level above to generate a new subgoal (see the sections Active visual perception and part–whole learning and Planning and navigation using hierarchical world models for examples), and similarly for all levels. I postulate that such coordination between hierarchical levels for action selection occurs via cortex–basal ganglia–thalamus–cortex loops; I leave the important problem of working out the implementation details in these loops to future research.

Illustrative examples of diverse computational capabilities
The architecture of the APC model was inspired by the hypothesis that evolution may have replicated a common computational principle across the cortex20–23. If that is the case, one would expect the same architecture to be able to solve a diverse set of problems. Inspired by this observation, I provide here examples illustrating the APC model's diverse capabilities.

Active visual perception and part–whole learning
Human vision can be viewed as an active sensory–motor process that employs eye movements to move the high-resolution fovea to appropriate locations in a scene, gathering evidence for or against competing visual hypotheses7,48. The APC architecture is well suited to modeling such a sensory–motor process, given its integrated state and action networks. To illustrate this capability, we simulated31,32 a two-level APC model (figure in Box 3, panel e) in which the lower-level actions emulated eye movements by moving a fovea ('glimpse sensor' (ref. 74)) to extract high-resolution information about a small part of the input image within a larger reference frame selected by the higher-level action.

The lower-level action also predicts a new state vector s_{t,τ+1}, which generates, via a trained decoder, a prediction for the glimpse image expected after the 'eye movement'. The resulting prediction error was used for state inference and learning. The state networks at both levels were trained to minimize image-prediction errors, while the action networks were trained using reinforcement learning for the task of image reconstruction (for image classification as the task, see ref. 31).

Fig. 3a shows an example of a learned parsing strategy by the two-level APC model. The higher level learned to select actions that cover the input image sufficiently, avoiding blank regions, while the lower level learned to parse subparts inside the reference frame computed by the higher level. Fig. 3a also suggests a potential explanation for why human perception can appear stable despite dramatic changes
[Fig. 3 image: rows show, from top to bottom, macro-steps, micro-steps, predicted glimpses, actual glimpses and the object perception/decoder output.]
Fig. 3 | Active vision, part–whole learning and transfer of knowledge. a, First row, initial glimpse (purple box) and higher-level reference frames selected (red, green and blue boxes) at higher-level time steps ('macro-steps'); second row, regions fixated at lower-level time steps ('micro-steps') within each higher-level reference frame; third and fourth rows, predicted versus actual glimpses; fifth row, the model's 'perception' over time (object reconstructed by a decoder network from the current network state). Note the model's 'perceptual' stability despite jumps in actual glimpses, enabled by predictions of the glimpses similar to visual cortical predictions before eye movements (figure in Box 2, panel b). b, The digit '8' is parsed by a trained APC model as a parse tree of parts and subparts (left) and their corresponding coordinates (locations) within their respective reference frames (right). The representation is compositional: the same set of parts and subparts can potentially be reused at other locations and with other transformations to compose new digits. c, Higher-level part locations selected by a trained APC model for a particular class of clothing items in the Fashion-MNIST dataset (red, green and blue dots show the average sampled locations fixated in the following order: first, red; second, green; third, blue). Note the differences in the model's fixation strategies between vertically symmetric items (shirts, trousers, bags) and footwear (sandals, sneakers, boots). d, An APC model trained on the Omniglot handwritten characters dataset (from 50 different alphabets) can transfer its learned knowledge to predict parts of previously unseen character classes. First column, input image from a new character class. Middle column, APC model's reconstruction of the input. Last column, parts predicted by the model (d, adapted with permission from ref. 32, MIT Press).
in our retinal images due to eye movements: the model maintains a stable visual hypothesis that is gradually refined without exhibiting the rapid changes seen in the sampled image regions (Fig. 3a, actual glimpses). This 'perceptual' stability is enabled by the model's ability to predict the expected glimpses for each planned 'eye movement' (Fig. 3a, predicted glimpses), similar to predictive activity observed in the visual cortex before eye movements (figure in Box 2, panel b)14–16.

Fig. 3b shows a learned part–whole hierarchy for a digit in terms of strokes and mini-strokes along with their locations within nested reference frames. The model learns different parsing strategies for different classes of objects (Fig. 3c). Setting the image-prediction error input to the network to zero forces the model to predict the next sequence of parts and 'complete' an object32. Finally, compositional learning in the APC model facilitates transfer of learned knowledge to new objects (Fig. 3d).

Planning and navigation using hierarchical world models
Interestingly, the same APC framework used above for active vision can also be used for planning hierarchical actions for tasks such as navigation. Consider the problem of navigating from any starting location to any goal location in a large 'multi-room' building environment such as the one in Fig. 4a (gray, walls; blue circle, current
[Fig. 4 image: panels a–f; plot axes include 'Number of planning steps' versus 'Distance from goal' (low-level planning versus APC planning), 'Correct (%)' versus 'Episodes' (RL agent versus APC agent) and performance over 'Session' (pretraining versus scratch).]
Fig. 4 | Hierarchical planning. a, The problem of navigating in a large environment (left) can be reduced to planning using high-level states (red- and yellow-outlined 'rooms') and high-level abstract actions (panels on the right show two abstract actions, A1 and A3). Blue, current location; gray, walls; green, current goal location. b, To navigate to the goal, the APC model uses its learned high-level state network to sample K high-level state–action sequences (K = 2 here, shown bifurcating from the initial state). In each sequence, the high-level state is depicted by a predicted room image (red- or yellow-outlined image) and its location (marked by an 'X' in the rectangular global frame below the image). High-level actions are depicted as square local frames (next to arrows) with goal locations (purple). c, Given the sampled sequences, the model picks the sequence with the highest total reward, executes this sequence's first (high-level) action to reach the blue location (top) and repeats to reach the goal location with only three high-level actions (bottom). Small red dot, intermediate location; small blue dot, intermediate goal. d, High-level planning by the APC model versus low-level heuristic planning using primitive actions (see text for details). e, The APC model can reuse learned high-level actions in new combinations to quickly solve new tasks (green circles, times at which the navigation goal changed); a reinforcement learning (RL) agent needs to relearn a new policy from scratch. Blue- or red-shaded regions in d,e are 1 s.d. from the mean. f, Mice pretrained on two subtasks quickly learned to combine them to solve a new composite task75 (compare with the APC model in e after a goal change). Blue, performance of mice learning the task from scratch (compare with the reinforcement learning agent in e after a goal change) (a–e, adapted with permission from ref. 32, MIT Press; f, adapted from ref. 75, CC BY 4.0).
location; green square, current goal location). Here, the lower-level states of the APC model are locations in the grid, and lower-level actions are going north, east, south or west, with a large reward at the goal location and smaller negative rewards for each action to encourage shorter paths.

Just as an object consists of parts at different locations, the building environment in Fig. 4a is composed of smaller elements (two 3 × 3 'room types', S1 (red) and S2 (yellow)) at different locations in the global reference frame of the building. The higher-level states of the APC model are defined by state-embedding vectors S1 and S2, trained to generate, via the top–down network H_s (figure in Box 3, panel a), the lower-level transition functions f̂_s for rooms S1 and S2, respectively.

Similar to how the APC vision model reconstructed an image in the section Active visual perception and part–whole learning by composing parts from subparts, the APC model for planning computes higher-level action embedding vectors A_i (option vectors) that generate, via the top–down network H_a (figure in Box 3, panel a), lower-level policies f̂_a that produce primitive actions (north, east, south or west) from any location in the local reference frame (S1 or S2) to reach a local goal i within that frame. Fig. 4a (right) illustrates two of the eight A_i, each trained using reinforcement learning to reach one of the four corners of S1 or S2 (see ref. 32 for details). Defining these policies to operate within the local reference frame of the higher-level state S1 or S2 (regardless of global location in the building) allows the same policy to be reused at multiple locations.

The higher-level state network was trained to predict the next higher-level state. This trained higher-level network was used for planning (using model predictive control66): random state–action trajectories of length 4 were generated using the higher-level state network by starting from the current higher-level state and picking at random one of the four higher-level actions A_i for each next higher-level state. The action sequence with the highest total reward was selected, and its first action was executed. This process was repeated. Fig. 4b,c illustrates this high-level planning process using the trained APC model.

Fig. 4d illustrates the efficacy of the APC model's high-level planning compared to lower-level planning using primitive actions (see ref. 32 for details): the APC model takes significantly fewer planning steps and can reuse its learned higher-level actions in new combinations to quickly solve new tasks (for example, when the goal is changed; Fig. 4e), similar to a recent study in mice75 (Fig. 4f).
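The high-level planning loop just described (generate random action sequences with the learned transition model, score each rollout, execute the first action of the best sequence and replan) can be sketched in miniature. This is an illustrative sketch only: the four-room ring world, its two abstract actions and the distance-based reward below are hypothetical stand-ins for the learned high-level state network, the options A_i and the reward structure of ref. 32.

```python
import random

# Toy high-level world: four abstract 'room' states on a ring, two abstract actions
ACTIONS = ("clockwise", "counterclockwise")

def transition(state, action):
    """Stand-in for the learned high-level state-transition network."""
    return (state + 1) % 4 if action == "clockwise" else (state - 1) % 4

def ring_distance(a, b):
    d = abs(a - b) % 4
    return min(d, 4 - d)

def plan_first_action(state, goal, horizon=4, n_samples=100, rng=random.Random(0)):
    """Random-shooting model predictive control: sample random action sequences,
    roll each out with the transition model, score it and return the first action
    of the best-scoring sequence."""
    best_action, best_score = ACTIONS[0], float("-inf")
    for _ in range(n_samples):
        seq = [rng.choice(ACTIONS) for _ in range(horizon)]
        s, score = state, 0.0
        for a in seq:
            s = transition(s, a)
            score -= ring_distance(s, goal)  # reward shaping: stay near the goal room
        if score > best_score:
            best_action, best_score = seq[0], score
    return best_action

# Plan-execute-replan loop: start in room 0, navigate to goal room 3
state, goal, steps = 0, 3, 0
while state != goal and steps < 8:
    state = transition(state, plan_first_action(state, goal))
    steps += 1

print(state)  # 3: the goal room is reached
```

Executing only the first action of the best rollout and then replanning is what makes this model predictive control rather than open-loop planning; the APC model does the same with length-4 rollouts of its learned higher-level state network.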
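The glimpse-based state inference described in the section Active visual perception and part–whole learning can likewise be reduced to a toy sketch. Assumptions to note: a random stand-in image, a fixed random linear 'decoder' and plain gradient updates replace the trained recurrent networks and reinforcement-learned eye movements of refs. 31,32; only the error-driven inference loop is illustrated.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((28, 28))              # stand-in input image
D = rng.normal(scale=0.1, size=(25, 16))  # stand-in decoder: 16-dim state -> 5x5 glimpse
state = np.zeros(16)                      # lower-level state vector to be inferred

def glimpse(img, r, c, k=5):
    """'Glimpse sensor': extract a k x k high-resolution patch at (r, c)."""
    return img[r:r + k, c:c + k].ravel()

# One 'eye movement' to location (10, 12), then iterative state inference:
# the state is repeatedly updated to reduce the glimpse prediction error.
actual = glimpse(image, 10, 12)
lr, errors = 0.1, []
for _ in range(30):
    predicted = D @ state                 # predicted glimpse from the current state
    err = actual - predicted              # glimpse prediction error
    errors.append(float(np.linalg.norm(err)))
    state += lr * D.T @ err               # gradient step on the squared error

print(errors[0] > errors[-1])  # True: the prediction error shrinks during inference
```

In the full model, the same error signal also drives learning of the decoder and state networks, and predicted glimpses for planned eye movements underlie the perceptual stability illustrated in Fig. 3a.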
Box 4
Episodic memories and cortical–hippocampal binding
Each level of the APC hierarchy learns generic 'basis functions' for representing states and actions based on interactions with the environment. For example, the basis functions learned by the lowest-level state network when the inputs are natural videos comprise oriented spatiotemporal Gabor filters coding for edges and bars moving at different orientations61. The neural activity vector is a specific activation pattern coding for the current video segment in terms of the learned spatiotemporal filters. At the highest level N of the APC model, the specific activation patterns ŝ^(N) and â^(N) together represent (figure in Box 3, panel e, with i + 1 = N) an entire sequence (or timeline76) corresponding to the current episode of interaction with the environment, for example, the sequence of glimpses of an object as in Fig. 3a or the sequence of locations visited during navigation as in Fig. 4c (bottom). By 'binding' these highest-level neural activation patterns ŝ^(N) and â^(N) (assumed to correspond to entorhinal cortex activity in the APC model) in a hippocampus-like associative memory, one can store the current sequence of experienced sensations (vision, touch, sound, smell, rewards, etc.), locations (or coordinates within a reference frame) and actions as an episodic memory vector m (ref. 61).

The projection from the hippocampus back to the entorhinal cortex implies that the fused multimodal information in the current episodic memory vector m is fed back to enable better prediction (m plays a role similar to context windows in transformers in AI) and to influence state and action estimation in cortical areas down the hierarchy to the lowest levels. Particularly salient episodic memory vectors may be stored and later compositionally recombined73 or recalled for replay in the cortex when given an internal or external cue, for example, a location where the episode occurred, a sound or smell associated with the episode or a partial visual input marking the beginning of the episode (see ref. 61 for an example).

In summary, the APC model suggests that the cortex encodes generic semantic knowledge about the world within state and action networks that implement nested reference frames. Any particular instantiation of this knowledge, invoked by, for example, an interaction with a person or an object, is stored temporarily as an episodic memory vector m in the hippocampus. This instantiation could be used for reasoning about the current situation or for planning and, if deemed important, could be consolidated within the cortex by updating cortical networks via replay during inactivity or sleep. The idea of fast binding of specific instances ('fillers') with generic semantic 'roles' is gaining currency in both AI58 and hippocampal modeling (for example, the Tolman–Eichenbaum machine77; see also ref. 73). The benefits of such a representation, including fast transfer of knowledge and zero-shot learning, can be expected to also accrue to the memory-augmented APC model.

Learning abstract concepts
I briefly sketch here how the same sensory–motor architecture used for perception and planning above could also potentially be used to model abstract concepts. Take, for example, modeling the concept of a family tree. The state–action representations in the APC model can be made categorical (for example, as in ref. 78), allowing states and actions to represent symbols. The states can then represent abstract categories such as father, mother, daughter, uncle, etc., while abstract actions (up, down, etc.) can be used to traverse and define a family tree sequentially. The notion of 'fast binding' of cortical representations in hippocampal memory discussed above could be used to bind specific persons to their roles (father, mother, etc.).

Results along these lines were obtained using the Tolman–Eichenbaum machine model77, in which a recurrent neural network (similar to the state-transition network in the APC model but for a single level) was used in conjunction with an associative memory to learn the structure of family trees from examples. Extending these ideas to abstract state–action networks for symbolic reasoning in a hierarchical APC model may offer new insights into how cortical–hippocampal networks represent language and solve abstract cognitive tasks such as arithmetic.

Discussion
Inspired by recent results highlighting the influence of actions across most areas of the cortex, I suggested APC as a sensory–motor theory of cortical function. APC proposes that (1) each cortical area implements both a state-transition network for state prediction and an action network for action (or goal) prediction, and (2) higher-area neurons representing more abstract states and actions modulate lower-area state and action networks via top–down modulatory control to change the functions they are computing, leading to nested reference frames and hierarchical representations of objects, states and actions. A possible neuroanatomical mapping of the APC model to cortical laminar structure was suggested in the section APC module and neuroanatomical implementation.

The APC model lends support to the hypothesis20–23 that there may be a unifying computational principle operating across the cortex by showing how the same basic APC architecture can perform a diverse range of computations (see Box 4 for a summary). The APC model shares broad similarities with a number of other models advocating prediction and hierarchy as core aspects of brain function1,3,22,23,79–83, going back to the seminal early work of MacKay84 and Albus85. The goal of putting action on an equal footing with perception in terms of Bayesian inference and prediction error minimization is in keeping with the theories of free energy minimization proposed by Friston and others3,8,69. In its current formulation, APC addresses action selection via reinforcement learning (see the section Active visual perception and part–whole learning) and planning via model predictive control (as described in the section Planning and navigation using hierarchical world models). The latter is related to planning-as-inference methods63–65 and active inference schemes that optimize expected information gain plus expected value8,69.

Compositionality and the representation of sensory–motor information in cortical columns are also central tenets of the 'thousand brains' theory23,59. The close interaction between state-estimation networks and action-computing networks in the APC model is consistent with theories of optimal motor control86, especially theories highlighting the importance of internal models in solving the inverse problem of computing optimal motor commands to solve a task81,87,88. However, based on recent evidence pointing to outputs from layer 5 in essentially all cortical areas to subcortical motor centers19,34,35,37, the APC model proposes that all cortical areas include both state-estimation and policy components. M1 is often cited as a uniquely 'motor' cortical area missing the sensory input layer 4, with damage to M1 in primates causing permanent loss of distal (although not proximal) movements89. However, even M1 receives sensory information from other cortical and subcortical areas90, especially in its superficial layers39,40, and could therefore, as suggested by the APC model, predict and estimate state (for example, proprioceptive state) and compute actions based on these state estimates91.

The APC generative model in the figure in Box 3 (panel a) focuses on hierarchical structure and does not account for cross-modal (sensory to sensory) or hierarchically 'horizontal' connections in the neocortex (for example, ref. 92). However, it is possible to extend APC's generative model to allow cross-modal influences and horizontal interactions to enable more accurate state prediction and estimation. For example, consider a generative model evolved for use by an animal foraging in the forest: the hidden state denoting, for example, a tiger can generate both a visual cue (stripes) and an auditory cue (rustling sound). In the extended APC model employing such a generative model, the state network in a sensory area (for example, V1) would leverage information from other sensory modalities (for example, from the auditory cortex) via horizontal cortical connections to derive an accurate estimate of the current state of the world. Extending the APC model to account for such cross-modal and horizontal cortico–cortical connections is an important direction for future work.

A large number of unknowns remain, such as the exact physiological mechanisms underlying the modulatory interactions between higher-order and lower-order cortical areas across multiple time scales, the role of alpha, beta, theta and gamma oscillations in such interactions and the representation of uncertainty in the cortex.
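The categorical state–action scheme from the section Learning abstract concepts can be made concrete with a toy sketch. The names, relations and dictionary encoding below are hypothetical placeholders, not the learned representations of the Tolman–Eichenbaum machine77 or ref. 78; the point is only that symbolic states plus composable abstract actions define a traversable family tree:

```python
# States are persons (symbols); abstract actions are kinship relations;
# the state-transition function is the family tree itself.
RELATIONS = {
    ("Ann", "mother"): "Carol",
    ("Bob", "mother"): "Carol",
    ("Carol", "daughter"): "Ann",
    ("Carol", "son"): "Bob",
}

def transition(state, action):
    """One step of tree traversal: apply a kinship 'action' to a person 'state'."""
    return RELATIONS[(state, action)]

def traverse(state, actions):
    """Compose abstract actions sequentially, as in the APC model's action sequences."""
    for a in actions:
        state = transition(state, a)
    return state

# 'Ann's brother' is answered by the composed action sequence [mother, son]
print(traverse("Ann", ["mother", "son"]))  # Bob
```

Binding specific persons to generic roles, as discussed above, would correspond to filling in this transition table from experience via a hippocampus-like associative memory.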
While there is emerging neurophysiological and neuroanatomical 23. Hawkins, J. A Thousand Brains: A New Theory of Intelligence
evidence2,9–11,13–16,18,41,42,51,55,93 that lends some support to the APC model’s (Basic Books, 2021).
predictions (Box 4), there is much that remains to be tested. I hope 24. Douglas, R. J. & Martin, K. A. Neuronal circuits of the neocortex.
that the theoretical framework offered by the APC model is helpful in Annu. Rev. Neurosci. 27, 419–451 (2004).
the design of new experiments aimed at uncovering the cortical and 25. Harris, K. D. & Shepherd, G. M. The neocortical circuit: themes and
subcortical basis of sensory–motor processing and cognition. variations. Nat. Neurosci. 18, 170–181 (2015).
26. Roe, A. W., Pallas, S. L., Kwon, Y. H. & Sur, M. Visual projections
References routed to the auditory pathway in ferrets: receptive fields of visual
1. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: neurons in primary auditory cortex. J. Neurosci. 12, 3651–3664
a functional interpretation of some extra-classical receptive-field (1992).
effects. Nat. Neurosci. 2, 79–87 (1999). 27. Felleman, D. & Essen, D. V. Distributed hierarchical processing in
2. Keller, G. B. & Mrsic-Flogel, T. D. Predictive processing: a the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
canonical cortical computation. Neuron 100, 424–435 (2018). 28. Murray, J. D. et al. A hierarchy of intrinsic timescales across
3. Friston, K. & Kiebel, S. Predictive coding under the free-energy primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).
principle. Philos. Trans. R. Soc. London B Biol. Sci. 364, 1211–1221 29. Siegle, J. H. et al. Survey of spiking in the mouse visual system
(2009). reveals functional hierarchy. Nature 592, 86–92 (2021).
4. Jiang, L. P. & Rao, R. P. N. Predictive coding theories of cortical 30. Grafton, S. T. & de C. Hamilton, A. F. Evidence for a distributed
function. Oxford Research Encyclopedia of Neuroscience https:// hierarchy of action representation in the brain. Hum. Mov. Sci. 26,
doi.org/10.1093/acrefore/9780190264086.013.328 (Oxford Univ. 590–616 (2007).
Press, 2022). 31. Gklezakos, D. C. & Rao, R. P. N. Active predictive coding networks:
5. Halpern, B. P. Tasting and smelling as active, exploratory sensory a neural solution to the problem of learning reference frames
processes. Am. J. Otolaryngol. 4, 246–249 (1983). and part–whole hierarchies. Preprint at arxiv.org/abs/2201.08813
6. Lederman, S. J. & Klatzky, R. L. Hand movements: a window into (2022).
haptic object recognition. Cogn. Psychol. 19, 342–368 (1987). 32. Rao, R. P. N., Gklezakos, D. C. & Sathish, V. Active predictive
7. Ahissar, E. & Assa, E. Perception as a closed-loop convergence coding: a unifying neural model for active perception,
process. eLife 5, e12830 (2016). compositional learning, and hierarchical planning. Neural
8. Friston, K. The free-energy principle: a unified brain theory? Nat. Comput. 36, 1–32 (2024).
Rev. Neurosci. 11, 127–138 (2010). 33. Fisher, A. & Rao, R. P. N. Recursive neural programs: a
9. Zatka-Haas, P., Steinmetz, N. A., Carandini, M. & Harris, K. D. differentiable framework for learning compositional part–whole
Sensory coding and the causal impact of mouse cortex in a visual hierarchies and image grammars. PNAS Nexus 2, pgad337 (2023).
decision. eLife 10, e63163 (2021). 34. Kasper, E., Larkman, A., Lübke, J. & Blakemore, C. Pyramidal
10. Steinmetz, N. A., Zatka-Haas, P., Carandini, M. & Harris, K. D. neurons in layer 5 of the rat visual cortex. I. Correlation among
Distributed coding of choice, action and engagement across the cell morphology, intrinsic electrophysiological properties, and
mouse brain. Nature 576, 266–273 (2019). axon targets. J. Comp. Neurol. 339, 459–474 (1994).
11. Stringer, C. et al. Spontaneous behaviors drive multidimensional, 35. Stebbings, K., Lesicko, A. & Llano, D. The auditory
brainwide activity. Science 364, eaav7893 (2019). corticocollicular system: molecular and circuit-level
12. Talluri, B. C. et al. Activity in primate visual cortex is minimally considerations. Hear. Res. 314, 51–59 (2014).
driven by spontaneous movements. Nat. Neurosci. 26, 1953–1959 36. Xiong, X. et al. Auditory cortex controls sound-driven innate
(2023). defense behaviour through corticofugal projections to inferior
13. Jordan, R. & Keller, G. B. Opposing influence of top–down and colliculus. Nat. Commun. 6, 7224 (2015).
bottom–up input on excitatory layer 2/3 neurons in mouse 37. Frezel, N. et al. In-depth characterization of layer 5 output
primary visual cortex. Neuron 108, 1194–1206 (2020). neurons of the primary somatosensory cortex innervating the
14. Nakamura, K. & Colby, C. L. Updating of the visual representation mouse dorsal spinal cord. Cereb. Cortex Commun. 1, tgaa052
in monkey striate and extrastriate cortex during saccades. Proc. Natl Acad. Sci. USA 99, 4026–4031 (2002).
15. Duhamel, J. R., Colby, C. L. & Goldberg, M. E. The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255, 90–92 (1992).
16. Umeno, M. M. & Goldberg, M. E. Spatial processing in the monkey frontal eye field. I. Predictive visual responses. J. Neurophysiol. 78, 1373–1383 (1997).
17. Wurtz, R. H., McAlonan, K., Cavanaugh, J. & Berman, R. A. Thalamic pathways for active vision. Trends Cogn. Sci. 15, 177–184 (2011).
18. Sherman, S. M. & Guillery, R. W. Functional Connections of Cortical Areas: A New View from the Thalamus (MIT, 2013).
19. Prasad, J., Carroll, B. & Sherman, S. Layer 5 corticofugal projections from diverse cortical areas: variations on a pattern of thalamic and extrathalamic targets. J. Neurosci. 40, 5785–5796 (2020).
20. Mountcastle, V. in The Mindful Brain (eds Edelman, G. & Mountcastle, V.) 7–50 (MIT, 1978).
21. Creutzfeldt, O. D. Generality of the functional structure of the neocortex. Naturwissenschaften 64, 507–517 (1977).
22. Mumford, D. On the computational architecture of the neocortex. II. The role of cortico–cortical loops. Biol. Cybern. 66, 241–251 (1992).
(2020).
38. Rathelot, J. A. & Strick, P. L. Subdivisions of primary motor cortex based on cortico–motoneuronal cells. Proc. Natl Acad. Sci. USA 106, 918–923 (2009).
39. Mao, T. et al. Long-range neuronal circuits underlying the interaction between sensory and motor cortex. Neuron 72, 111–123 (2011).
40. Hooks, B. M. et al. Organization of cortical and thalamic input to pyramidal neurons in mouse motor cortex. J. Neurosci. 33, 748–760 (2013).
41. Heindorf, M., Arber, S. & Keller, G. B. Mouse motor cortex coordinates the behavioral response to unpredicted sensory feedback. Neuron 99, 1040–1054 (2018).
42. Holey, B. E. & Schneider, D. M. Sensation and expectation are embedded in mouse motor cortical activity. Preprint at bioRxiv https://doi.org/10.1101/2023.09.13.557633 (2023).
43. Kim, E., Juavinett, A., Kyubwa, E., Jacobs, M. & Callaway, E. Three types of cortical layer 5 neurons that differ in brain-wide connectivity and function. Neuron 88, 1253–1267 (2015).
44. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn http://incompleteideas.net/book/the-book-2nd.html (MIT Press, 2018).
45. Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artif. Intell. 101, 99–134 (1998).
46. Rao, R. P. N. Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Front. Comput. Neurosci. 4, 146 (2010).
47. von Helmholtz, H. Handbuch der Physiologischen Optik Vol. 3 (Voss, 1867).
48. Friston, K., Adams, R. A., Perrinet, L. & Breakspear, M. Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3, 151 (2012).
49. Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).
50. Shipp, S. Neural elements for predictive coding. Front. Psychol. 7, 1792 (2016).
51. Miura, S. & Scanziani, M. Distinguishing externally from saccade-induced motion in visual cortex. Nature 610, 135–142 (2022).
52. Keller, G. B., Bonhoeffer, T. & Hübener, M. Sensorimotor mismatch signals in primary visual cortex of the behaving mouse. Neuron 74, 809–815 (2012).
53. Bastos, A. M., Lundqvist, M., Waite, A. S., Kopell, N. & Miller, E. K. Layer and rhythm specificity for predictive routing. Proc. Natl Acad. Sci. USA 117, 31459–31469 (2020).
54. Leinweber, M., Ward, D. R., Sobczak, J. M., Attinger, A. & Keller, G. B. Sensorimotor circuit in mouse cortex for visual flow predictions. Neuron 95, 1420–1432 (2017).
55. Schneider, D. M., Sundararajan, J. & Mooney, R. A cortical filter that learns to suppress the acoustic consequences of movement. Nature 561, 391–395 (2018).
56. Markov, N. T. & Kennedy, H. The importance of being hierarchical. Curr. Opin. Neurobiol. 23, 187–194 (2013).
57. Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
58. Smolensky, P., McCoy, R. T., Fernandez, R., Goldrick, M. & Gao, J. Neurocompositional computing: from the central paradox of cognition to a new generation of AI systems. AI Mag. 43, 308–322 (2022).
59. Lewis, M., Purdy, S., Ahmad, S. & Hawkins, J. Locations in the neocortex: a theory of sensorimotor object recognition using cortical grid cells. Front. Neural Circuits 13, 22 (2019).
60. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
61. Jiang, L. P. & Rao, R. P. N. Dynamic predictive coding: a model of hierarchical sequence learning and prediction in the neocortex. PLoS Comput. Biol. 20, e1011801 (2024).
62. Whittington, J. C. R. & Bogacz, R. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Comput. 29, 1229–1262 (2017).
63. Attias, H. Planning by probabilistic inference. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003) (eds Bishop, C. M. & Frey, B. J.) 9–16 (PMLR, 2003).
64. Verma, D. & Rao, R. P. N. Planning and acting in uncertain environments using probabilistic inference. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems 2382–2387 (IEEE, 2006).
65. Botvinick, M. & Toussaint, M. Planning as inference. Trends Cogn. Sci. 16, 485–488 (2012).
66. Richards, A. Robust Constrained Model Predictive Control. PhD thesis, MIT (2004).
67. Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).
68. Kahneman, D. Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011).
69. Friston, K., Parr, T. & de Vries, B. The graphical brain: belief propagation and active inference. Netw. Neurosci. 1, 381–414 (2017).
70. O’Reilly, R. C., Russin, J. L., Zolfaghar, M. & Rohrlich, J. Deep predictive learning in neocortex and pulvinar. J. Cogn. Neurosci. 33, 1158–1196 (2021).
71. Willett, F. R. et al. Hand knob area of premotor cortex represents the whole body in a compositional way. Cell 181, 396–409 (2020).
72. Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T. & Wang, X.-J. Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297–306 (2019).
73. Kurth-Nelson, Z. et al. Replay and compositional computation. Neuron 111, 454–469 (2023).
74. Mnih, V., Heess, N., Graves, A. & Kavukcuoglu, K. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z. et al.) 2204–2212 (Curran Associates, 2014).
75. Makino, H. Arithmetic value representation for hierarchical behavior composition. Nat. Neurosci. 26, 140–149 (2023).
76. Hogendoorn, H. Perception in real-time: predicting the present, reconstructing the past. Trends Cogn. Sci. 26, 128–141 (2022).
77. Whittington, J. C. R. et al. The Tolman–Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell 183, 1249–1263 (2020).
78. Hafner, D., Lee, K.-H., Fischer, I. & Abbeel, P. Deep hierarchical planning from pixels. In Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) 26091–26104 (Curran Associates, 2022).
79. Lee, T. S. & Mumford, D. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 20, 1434–1448 (2003).
80. George, D. & Hawkins, J. Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. 5, e1000532 (2009).
81. Wolpert, D. M. & Miall, R. C. Forward models for physiological motor control. Neural Netw. 9, 1265–1279 (1996).
82. Mehta, M. R. Neuronal dynamics of predictive coding. Neuroscientist 7, 490–495 (2001).
83. Heeger, D. J. Theory of cortical function. Proc. Natl Acad. Sci. USA 114, 1773–1782 (2017).
84. Mackay, D. in Automata Studies (eds Shannon, C. E. & McCarthy, J.) 235–251 (Princeton Univ., 1956).
85. Albus, J. S. Brains, Behavior and Robotics (BYTE, 1981).
86. Scott, S. Optimal feedback control and the neural basis of volitional motor control. Nat. Rev. Neurosci. 5, 532–546 (2004).
87. Jordan, M. I. & Rumelhart, D. E. Forward models: supervised learning with a distal teacher. Cogn. Sci. 16, 307–354 (1992).
88. Kawato, M. Internal models for motor control and trajectory planning. Curr. Opin. Neurobiol. 9, 718–727 (1999).
89. Fetz, E. E. in Textbook of Physiology (eds Patton, H. D. et al.) 608–631 (Saunders, 1989).
90. Jones, E. G., Coulter, J. D. & Hendry, S. H. C. Intracortical connectivity of architectonic fields in the somatic sensory, motor and parietal cortex of monkeys. J. Comp. Neurol. 181, 291–347 (1978).
91. Adams, R., Shipp, S. & Friston, K. Predictions not commands: active inference in the motor system. Brain Struct. Funct. 218, 611–643 (2013).
92. Falchier, A., Clavagnier, S., Barone, P. & Kennedy, H. Anatomical evidence of multimodal integration in primate striate cortex. J. Neurosci. 22, 5749–5759 (2002).
93. Audette, N. J., Zhou, W., La Chioma, A. & Schneider, D. M. Precise movement-based predictions in the mouse auditory cortex. Curr. Biol. 32, 4925–4940 (2022).
94. Craik, K. J. W. The Nature of Explanation (Macmillan, 1943).
95. Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51–56 (2012).
96. Sussillo, D., Churchland, M. M., Kaufman, M. T. & Shenoy, K. V. A neural network that finds a naturalistic solution for the production of muscle activity. Nat. Neurosci. 18, 1025–1033 (2015).
97. Sauerbrei, B. A. et al. Cortical pattern generation during dexterous movement is input-driven. Nature 577, 386–391 (2020).
98. Chaudhuri, R., Knoblauch, K., Gariel, M.-A., Kennedy, H. & Wang, X.-J. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron 88, 419–431 (2015).
99. Salinas, E. & Sejnowski, T. J. Gain modulation in the central nervous system: where behavior, neurophysiology, and computation meet. Neuroscientist 7, 430–440 (2001).
100. Larkum, M. E., Senn, W. & Lüscher, H.-R. Top–down dendritic input increases the gain of layer 5 pyramidal neurons. Cereb. Cortex 14, 1059–1070 (2004).
101. Ferguson, K. A. & Cardin, J. A. Mechanisms underlying gain modulation in the cortex. Nat. Rev. Neurosci. 21, 80–92 (2020).
102. Stroud, J. P., Porter, M. A., Hennequin, G. & Vogels, T. P. Motor primitives in space and time via targeted gain modulation in cortical networks. Nat. Neurosci. 21, 1774–1783 (2018).
103. McAdams, C. J. & Maunsell, J. H. R. Effects of attention on orientation–tuning functions of single neurons in macaque cortical area V4. J. Neurosci. 19, 431–441 (1999).
104. Ha, D., Dai, A. M. & Le, Q. V. Hypernetworks. In 5th International Conference on Learning Representations (ICLR 2017) openreview.net/forum?id=rkpACe1lx (OpenReview.net, 2017).
105. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).
106. Galanti, T. & Wolf, L. On the modularity of hypernetworks. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 10409–10419 (Curran Associates, 2020).
107. Tomov, M. S., Yagati, S., Kumar, A., Yang, W. & Gershman, S. J. Discovery of hierarchical representations for efficient planning. PLoS Comput. Biol. 16, e1007594 (2020).
108. Olson, C. R. Brain representation of object-centered space in monkeys and humans. Annu. Rev. Neurosci. 26, 331–354 (2003).
109. George, D. et al. Clone-structured graph representations enable flexible learning and vicarious evaluation of cognitive maps. Nat. Commun. 12, 2392 (2021).
110. Friston, K. J., Rosch, R., Parr, T., Price, C. & Bowman, H. Deep temporal models and active inference. Neurosci. Biobehav. Rev. 77, 388–402 (2017).

Acknowledgements
I thank A. Fisher, D. Gklezakos, P. Jiang, P. Rangarajan and V. Sathish for many discussions and the collaborative work cited in the text. I also thank K. Friston, C. Eliasmith and members of his laboratory, researchers at Numenta, S. Mirbagheri, N. Steinmetz and G. Burachas for discussions and feedback. This work was supported by National Science Foundation EFRI grant 2223495, National Institutes of Health grant 1UF1NS126485-01, the Defense Advanced Research Projects Agency under contract HR001120C0021, a UW + Amazon Science Hub grant, a Weill Neurohub Investigator grant, a Frameworks grant from the Templeton World Charity Foundation and a Cherng Jia and Elizabeth Yun Hwang Professorship. The opinions expressed in this publication are those of the author and do not necessarily reflect the views of the funders.

Competing interests
The author declares no competing interests.

Additional information
Correspondence should be addressed to Rajesh P. N. Rao.

Peer review information Nature Neuroscience thanks Karl Friston, Aleena Garner, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© Springer Nature America, Inc. 2024