
Neural reservoir control of a soft bio-hybrid arm

Noel Naughton1,5†, Arman Tekinalp2,5‡, Keshav Shivam3,5§, Seung Hung Kim2,5, Volodymyr Kindratenko3,5, and Mattia Gazzola2,3,4,5*

1 Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign; 2 Mechanical Science and Engineering, University of Illinois Urbana-Champaign; 3 National Center for Supercomputing Applications, University of Illinois Urbana-Champaign; 4 Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign; 5 The Grainger College of Engineering, University of Illinois Urbana-Champaign

A long-standing engineering problem, the control of soft robots is difficult because of their highly non-linear, heterogeneous, anisotropic, and distributed nature. Here, bridging engineering and biology, a neural reservoir is employed for the dynamic control of a bio-hybrid model arm made of multiple muscle-tendon groups enveloping an elastic spine. We show how the use of reservoirs facilitates simultaneous control and self-modeling across a set of challenging tasks, outperforming classic neural network approaches. Further, by implementing a spiking reservoir on neuromorphic hardware, energy efficiency is achieved, with nearly two orders of magnitude improvement relative to standard CPUs, with implications for the on-board control of untethered, small-scale soft robots.
arXiv:2503.09477v1 [cs.RO] 12 Mar 2025

Hyper-redundancy, underactuation, distributedness, and continuum mechanics are defining features of soft robots (artificial or biological (1–8)), intrinsic to their compliant, elastic constitutive materials. These traits are attractive in the pursuit of extreme reconfigurability, morphological adaptivity, delicacy and dexterity, for applications in medicine, defense, or agriculture (9–12). However, the same advantageous traits pose fundamental challenges in control, due to the associated vast space of degrees of freedom and the highly non-linear dynamics involved.

Recognizing the importance of the problem, significant effort has been devoted to the development of control approaches suitable for continuum and elastic structures (13–15). Model-based controllers have proven effective in quasi-static settings, but lack accuracy when inertial effects become significant and typically rely on simplifying assumptions that may overlook environmental interactions, anisotropy, or material nonlinearities (14). Data-driven approaches bypass such modeling difficulties by directly learning the associated dynamics via artificial neural networks (16, 17). These techniques are generally resource-intensive and may employ neural architectures poorly suited to capture the long-range spatial and temporal dependencies that define soft systems (9, 18), hindering robustness.

An alternative, bioinspired route to soft robotic control hinges on the concept of 'mechanical intelligence' (19–21), whereby non-linear structural and environmental effects are purposefully exploited to passively achieve adaptivity, thus simplifying and robustifying control. Complementarily, rhythm-generation techniques inspired by the nervous system (e.g., central pattern generators) have also been shown effective, particularly for locomotion (22, 23), leading to bio-hybrid attempts where neural tissue is directly integrated with soft scaffolds, muscles, and electronics, to drive simple actuation dynamics (7, 8, 24, 25).

Here, with the goal of intimately meshing compliant mechanics and neural dynamics for enhanced adaptivity and control, we numerically explore the integration of a soft bio-hybrid arm with Reservoir Computing (RC). Partially inspired by the mammalian neocortex (26), RC was developed (27) for processing real-time, analog, spatiotemporally correlated inputs, by means of a dynamically rich substrate (28–30), the reservoir. The reservoir, which in principle can be any dynamical system (31), integrates and projects input data streams into a separable, high-dimensional latent space that decomposes non-linear correlations. Reservoir dynamics are then sampled and recombined via linear maps into desired computations. While a variety of algorithmic and physical implementations have been proposed (29, 31), in general the RC paradigm offers a number of appealing features (27, 32): it does not require expensive backpropagation (only output linear maps are learned), it intrinsically accommodates nonlinear, spatiotemporally correlated dynamics, it is natively parallel, as multiple maps can be learned separately to achieve different tasks while running on the same reservoir, and it can be matched with specialized hardware (e.g., neuromorphic systems for energy efficiency (33, 34)) or 'wetware' (neural tissue used as a bio-hybrid reservoir (35)).

In this work, we consider a neural reservoir (modeled as a recurrently connected or spiking network) able to sense the shape (proprioception) of a simulated bio-hybrid arm made of muscles and tendons enveloping an elastic spine. Neural dynamics are coupled to muscle actuations through Reinforcement Learning (RL), to learn to control the system's musculature in an unsupervised manner. We demonstrate that our approach produces control policies that outperform traditional deep-learning methodologies, with advantages seen to widen as the arm's compliance increases. RC's high-dimensional latent space is also leveraged, through parallel output maps, for concurrent self-modeling to improve control robustness during conditions of disturbance and sensing failure. Further, motivated by their potential relevance in bio-hybrid contexts, spiking neural reservoirs are considered, and subsequently mapped onto neuromorphic hardware, attaining a seventy-five-fold reduction in energy usage. Neuromorphic RC is finally employed to drive the arm through a set of unstructured obstacles, learning to exploit solid objects to reshape and facilitate the reaching of a target, a hallmark of mechanical intelligence that is automatically recovered here.

Overall, this work not only advances soft robotic control, but also furthers neuro-mechano integration, a characteristic trait of organic systems. This, in turn, may lead to novel bio-hybrid designs bridging engineering and biology (7, 8, 25, 35–37), as well as new insights into biological processing.
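The RC paradigm described above (a fixed, dynamically rich recurrent substrate plus learned linear readouts) can be illustrated with a minimal echo-state-style sketch. All sizes, the leak rate, the spectral radius, the ridge solver, and the toy recall task below are our own illustrative choices, not the paper's setup:

```python
import numpy as np

# Minimal echo-state sketch of the RC paradigm: a fixed random recurrent
# reservoir projects the input stream into a high-dimensional latent space;
# only the linear readout W_out is learned.
rng = np.random.default_rng(0)

N_IN, N_RES = 3, 200                            # illustrative dimensions
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))    # fixed input weights
W_res = rng.normal(0.0, 1.0, (N_RES, N_RES))    # fixed recurrent weights
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius < 1

def run_reservoir(inputs, leak=0.3):
    """Integrate the leaky-tanh reservoir over an input sequence."""
    x = np.zeros(N_RES)
    states = []
    for u in inputs:
        x = (1.0 - leak) * x + leak * np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.array(states)

# Learn only the readout, here by ridge regression on a toy memory task
# (recall the previous input); nothing inside the reservoir is trained.
U = rng.standard_normal((500, N_IN))
X = run_reservoir(U)
Y = np.roll(U[:, :1], 1, axis=0)
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N_RES), X.T @ Y)
pred = X @ W_out
```

The same recorded states `X` can feed several independent readouts at once, which is the "natively parallel" property exploited later for self-modeling.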
† Current address: Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA. ‡ Current address: Department of Mechanical Engineering, University of Maryland, College Park, MD. § Current address: Google, Mountain View, CA. * email: [email protected]

Modeling of a soft bio-hybrid arm. While no soft, synthetic actuator has yet reached biologically comparable levels of performance, improved artificial (38, 39) or tissue-bioengineered
[Figure 1 graphics omitted: panel (a) schematic of the soft arm, its overlapping muscle groups, and the neural reservoir with learned output map Wo; panels (b, c) learning curves and violin plots for reservoirs of 64–4096 neurons versus LSTM and feedforward networks; panels (d, e) side and top views of the tracking arm. Caption follows.]
Figure 1. Neural reservoir control of a soft bio-hybrid arm. (a) Schematic depicting the coupling of a neural reservoir with a soft bio-hybrid arm made of sixteen muscle-tendon units enveloping an elastic spine to achieve autonomous control. (b) Learning performance for the control of a soft bio-hybrid arm (backbone stiffness: 125 kPa) tracking a 3D moving target: neural reservoir and traditional feedforward and LSTM network architectures. The solid lines depict the average learning performance (average episode return), each obtained by considering five neural architecture instances initialized using different random number generator seeds. The shaded regions depict the spread in learning performance of the five instances. (c) Violin plots of average performance of trained policies evaluated over 250 episodes, for different reservoir sizes and reference architectures (FF and LSTM). Inset shows learning performance as reservoir size increases, illustrating how both more rapid and better overall learning is achieved as the reservoir grows. For FF and LSTM, different numbers of neurons and layers/stacks were considered (FF: [64×64], [128×128], [64×64×64], [64×64×64×64]; LSTM: [64 × 1 stack], [128 × 1 stack], [256 × 1 stack], [64 × 2 stacks], [128 × 2 stacks], [256 × 2 stacks]), with no significant differences observed. Here we report data for the best performing networks (SI for full details). Overlaid snapshots of (d) side view and (e) top view of the bio-hybrid arm successfully tracking a 3D target (see also SI Video 1) using a trained neural reservoir control policy. Intensity of muscle color (pale to dark red) denotes a muscle's activation level.

(8) prototypes are continually being developed. Cognizant of this rapid development, we consider the automated, dynamic control of a computational neuromuscular arm, characterized by multiple, antagonistic and overlapping muscle-tendon pairs arranged around and along an elastic supporting spine (Fig. 1a). Such a design captures key organizational motifs commonly encountered in biology and, increasingly, in robotics (21, 23). Antagonistic muscles are, for example, ubiquitous in both limbed (40) and limbless animals (41, 42), and popular in bio-hybrid robots (2, 4, 8, 43). Their asymmetric contractions enable rotations (i.e., joints (40)) or continuum bending and twisting (e.g., octopus arms (41, 44)), while graded co-contractions can improve structural stability (45, 46) by actively increasing body stiffness. Similarly, overlapping muscle groups (of which the many staggered spinal muscles of snakes provide an extreme example (42)) enable precise, distributed, and efficient actuation (47). Our prototypical model arm then allows us to explore compliant actuation and control in a biologically and engineering relevant setting, while offering a generalizable blueprint for experimental implementations. For example, the elastic spine
may be realized at various scales via molding or 3D printing, and decorated with arrays of compliant actuators, from advanced supercoiled artificial muscles (48) to cultured muscles (8, 49).

To computationally instantiate our arm, we employ a modeling approach based on assemblies of Cosserat rods (Methods, (50, 51)). These are slender, one-dimensional elements able to undergo all modes of deformation (bend, stretch, twist, and shear) to dynamically reconfigure in three-dimensional space. Cosserat rods are a convenient representation since they naturally map to elastic beams (spine), tendons, and muscles, can actively contract and stretch along their length according to prescribed force-length relations, and can be connected together (via appropriate boundary conditions) into arbitrary architectures. They are thus well-suited to capture the heterogeneous, anisotropic, and distributed nature of our bio-hybrid arm. Assemblies of rods and their governing equations are numerically discretized and solved via the open-source software Elastica (52), whose quantitative utility has been demonstrated in a range of biophysical settings, including animal locomotion (47, 53, 54), invertebrate manipulation (44), plant dynamics (55), fibrous metamaterials (56, 57), and soft robotic design and control (8, 47, 49, 58–63).

We begin with the soft elastic spine/backbone, represented as a single passive rod, onto which 16 muscle-tendon units are patterned (Fig. 1a, Fig. S1). Three-dimensional arm deformations are achieved by organizing muscle-tendon units into four layers of orthogonal agonist-antagonistic pairs, with adjoining layers overlapping by 50% of their length and offset 45° to avoid intersection. This architecture enables bending in all directions, via the coordinated activation of orthogonal muscles combined with the omni-directional bending/twisting capability of the elastic backbone. Further, muscle overlap allows for continuously graded contractions along the arm despite the limited number of individual muscles (47). Each muscle-tendon unit is modeled as an individual rod with an actively-contractible muscle belly and tapered tendinous ends that insert into the backbone. The entire unit is glued lengthwise to the spine, to conform to it while bending, mimicking the presence of surrounding fascia tissue or encapsulating materials.

The active force-length and force-velocity relationships of the muscle belly and its passive hyperelastic material properties, as well as the tendon's material properties, are based on reported biomechanical data (Methods). Noteworthy is the non-linear passive stress-strain behavior of muscle and tendon tissues (64, 65), which are characterized by a 'J-curve' response that allows for compliant deformations at small strains, before stiffening at larger strains for structural stability (66). This is a key mechanical feature of biological musculoskeletal tissues, and significant engineering effort has recently been devoted to recapitulate similar properties (67–69). Finally, each muscle is independently controllable via a continuous tetanic activation (ai ∈ [0, 1], i ∈ {1, ..., 16}). Full details are provided in Methods.

We challenge the arm to learn to coordinate its muscle activations (action space at(t)) to dynamically track a target moving along a smoothly-varying, random 3D trajectory (Fig. 1a). First, we outfit the arm with a compact set of sensory capabilities. Analogous to strain-sensing muscle spindles in vertebrates (70), proprioception is provided here through the arm's curvature κ̄i(t), estimated at four equally spaced locations (Methods). Environmental information is limited to the distance vector xt(t) between the (fixed) base of the arm and the moving target, representing visual or acoustic tracking. Notably, this sensory arrangement is laboratory-frame invariant. Indeed, local curvature is sufficient to recover the arm shape, while the target is sensed relative to the arm base. Thus, overall, the arm state is constituted by κ̄i(t) and xt(t), with all higher-order information (velocities or accelerations) implicitly captured through the memory of the reservoir (29), described next.

Neural reservoir control. Controlling this structure is challenging because of the arm's continuum elastic nature, characterized by highly non-linear and long-range stress propagation effects. Indeed, distributed and localized loads (musculature/environment) are communicated throughout the entire system, potentially leading to mechanical instabilities and global morphological reconfigurations. Compounding this complexity are the non-linear contractile properties of the muscle-tendon units, the anisotropic and heterogeneous quality of the arm, as well as its inertial dynamics.

To establish control (Fig. 1a), we seek to learn, in an unsupervised, model-free fashion, muscle activations (action: a = {aj(t)}, j ∈ {1, ..., 16}) that minimize at all times the distance d(t) between the arm tip and the target (reward: r = −||d(t)||²), based on available sensory information (state: s = {xt(t), κ̄i(t)}, i ∈ {1, ..., 4}). The mapping (control policy) between state and action space may be represented by any neural network topology (71, 72), and it is here that we insert our neural reservoir (Fig. 1a). This is initially realized as a recurrent artificial neural network (Methods) paired with a single, external linear recombination layer (map) whose function is to transform the reservoir's internal dynamics (sampled at each neuron) into muscle contractions a = {aj(t)}. To learn this linear map, we employ the reinforcement learning (on-policy) proximal policy optimization (PPO) strategy (73). We emphasize that, since RC requires learning the output map weights only, all connections from the input state to the reservoir, and within the reservoir itself, remain fixed, removing the need for backpropagation.

To contextualize the reservoir performance, we additionally consider feedforward (FF) and long short-term memory (LSTM) networks, reference architectures commonly employed in learning-based control (74, 75). Training is again performed using PPO, where this time the network weights are all learned (as opposed to RC) through backpropagation. To facilitate a meaningful comparison, all network topologies are evaluated against the same control problem, and set up/trained according to the same protocol (Methods).

As illustrated in Fig. 1b-e and SI Video 1, upon training, the neural reservoir successfully learns to track the target, substantially outperforming the FF and LSTM networks. Indeed, the FF network is found to quickly reach a (poor) performance limit beyond which it is unable to improve (Fig. 1b). The LSTM network instead converges to a better performance, although the ceiling remains significantly below the reservoir's (which is 2.2x better). The neural reservoir's performance is also found to correlate
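The sensing and control interface just described (curvature proprioception plus a base-to-target vector in; sixteen muscle activations out; reward equal to the negative squared tip-to-target distance) can be sketched compactly. The arm dynamics are omitted, and the random readout below merely stands in for the RL-trained linear map; dimensions follow the text, everything else is illustrative:

```python
import numpy as np

# Hedged sketch of the state/action/reward interface; the dynamics stub
# stands in for the Elastica-based Cosserat rod simulation (NOT included).
rng = np.random.default_rng(1)

N_MUSCLES, N_CURV = 16, 4          # 16 muscle-tendon units, 4 curvature probes

def get_state(target_vec, curvatures):
    """State s = {x_t(t), kappa_i(t)}: base-to-target vector plus arm
    curvature sampled at four locations (laboratory-frame invariant)."""
    return np.concatenate([target_vec, curvatures])

def reward(tip_pos, target_pos):
    """r = -||d(t)||^2, the negative squared tip-to-target distance."""
    d = tip_pos - target_pos
    return -float(d @ d)

# One interaction step with a placeholder linear policy, squashed so that
# each activation lies in [0, 1] as required for tetanic muscle control.
state = get_state(rng.standard_normal(3), rng.standard_normal(N_CURV))
W_policy = rng.standard_normal((N_MUSCLES, state.size))
action = 1.0 / (1.0 + np.exp(-W_policy @ state))
```

In the paper's setting, `W_policy` corresponds to the single linear output map learned via PPO, while the reservoir supplies the features it reads from.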
[Figure 2 graphics omitted: panel (a) learning curves for backbone stiffnesses from 1 MPa down to 62.5 kPa, contrasting the neural reservoir's compact performance envelope with the degraded envelopes and failure of LSTM and feedforward networks; panels (b, c) violin plots of average kinetic and bending energy versus backbone stiffness; panel (d) top/side snapshots of failure modes. Caption follows.]
Figure 2. Control of increasingly compliant bio-hybrid arms. (a) Learning performance of a neural reservoir (4096 neurons) compared to LSTM and FF networks for a soft bio-hybrid arm with decreasing backbone elastic moduli. The neural reservoir exhibits a compact performance envelope, while the LSTM and FF networks exhibit much wider envelopes, with control performance drastically decreasing as the backbone softens. Solid lines and shaded regions denote the performance average and spread relative to five randomly initialized network instances (as in Fig. 1b). Violin plots of the average (b) kinetic and (c) bending energy of the arm's spine, when RC, LSTM, and FF network architectures are used for controlling systems of decreasing backbone stiffness. Values reported are total kinetic/bending energy integrated over the episode and normalized by episode length. The RC network exhibits lower energies in all cases. Bending energy is observed to decrease linearly in the case of RC, in keeping with the linear softening of the spine and in contrast with FF and LSTM, which exhibit sublinear scalings. (d) Snapshots of trained policy performance for different backbone stiffness levels illustrating control failure modes. While all three network architectures successfully control a soft arm with a backbone stiffness of 1 MPa, at 250 kPa the feedforward network fails to coordinate muscle contractions, demonstrating an excessive-bending failure mode (with an associated increase in bending energy compared to RC, see panel (b)). The LSTM network exhibits better performance, but it also produces excessive bending and fails as the backbone stiffness continues to decrease (62.5 kPa). Video comparison of tracking performance for all backbone stiffness cases is available in SI Video 2. As in Fig. 1, for FF and LSTM, different numbers of neurons and layers/stacks were considered, with no significant differences observed across arm stiffnesses. Here we report data for the best performing networks (SI for full details).
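The episode-averaged kinetic and bending energies reported above can be computed for a discretized rod with textbook formulas; the sketch below (node masses, curvatures, and second moment of area are all illustrative, and this is not the paper's Elastica bookkeeping) also makes explicit why bending energy is linear in the elastic modulus:

```python
import numpy as np

# Textbook energy bookkeeping for a discretized rod (assumed formulas).
def kinetic_energy(masses, velocities):
    """E_k = 1/2 * sum_i m_i |v_i|^2 over rod nodes."""
    return 0.5 * np.sum(masses * np.sum(velocities**2, axis=1))

def bending_energy(E_mod, I, curvatures, ds):
    """E_b = 1/2 * E * I * sum_i kappa_i^2 * ds; note E_b is linear in the
    elastic modulus E, which underlies the scaling argument in the text."""
    return 0.5 * E_mod * I * np.sum(curvatures**2) * ds

# Illustrative numbers: 10 nodes of 10 g moving at 0.2 m/s, uniform
# curvature 2 /m, segment length 2 cm, I = 1e-10 m^4, E = 125 kPa.
masses = np.full(10, 0.01)
velocities = np.zeros((10, 3)); velocities[:, 0] = 0.2
curvs = np.full(9, 2.0)
ek = kinetic_energy(masses, velocities)
eb = bending_energy(125e3, 1e-10, curvs, 0.02)
```

Halving the modulus halves `eb` for identical curvatures, so a controller whose deformations scale congruently with stiffness shows the linear trend reported for RC.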

with reservoir size, whereby learning speed and tracking accuracy increase as the number of neurons grows (Fig. 1c). This is in keeping with the known ability of larger reservoirs to capture complex dynamics (29, 76), and supports the hypothesis that the control of elastic systems may benefit from a specific focus on highly non-linear, spatiotemporally correlated dynamics.

To further test this hypothesis, we investigate the effect of the arm's elastic spine stiffness by progressively decreasing its elastic modulus (while holding mass and geometry constant). This softening enables larger modes of deformation, further exacerbating the system's non-linear response, rendering control harder and allowing us to tease out differences between RC, FF, and LSTM approaches. We begin with a relatively stiff backbone (1 MPa, equivalent to rubber (77) and ∼10x stiffer than Fig. 1b) for which all three network topologies successfully learn to track the moving target (Fig. 2a, SI Video 2). However, as soon as the spine softens, differences start to manifest. The learned FF controllers are indeed observed to rapidly degrade, practically failing to coordinate the arm already at 500 kPa (SI Video 2). The LSTM network instead initially maintains control, although below 250 kPa its performance too rapidly degrades (SI Video 2). In contrast, the neural reservoir exhibits a compact performance envelope and maintains sustained control down to a stiffness of 62.5 kPa (comparable to mammalian (78) or octopus (79) muscle tissue, for reference).

An energetic analysis reveals that the neural reservoir consistently minimizes (relative to FF and LSTM, Fig. 2b,c) both the kinetic and bending energy of the spine. For each network architecture, kinetic energy (Fig. 2b) is found to remain approximately consistent across Young's moduli, albeit settling at dif-
[Figure 3 graphics omitted: panel (a) schematic of the reservoir with parallel readout maps trained by supervised (ridge regression) versus reinforcement learning; panels (b, c) relative error of future target and tip-position forecasts versus forecast interval; panel (d) pose-estimation error along the arm length at time instances I–IV, with reservoir startup transients; panels (e, f) tracking reward versus target inference ('blind') period and inferred versus true target trajectories. Caption follows.]
Figure 3. Parallel maps for self-modeling and robust control. (a) Schematic of neural reservoir control equipped with additional, parallel maps to infer/predict state information. All reported results are for an arm with a backbone stiffness of 250 kPa controlled by an already trained (via RL) neural reservoir with 4096 neurons, as described in Fig. 1. (b) Accuracy of parallel map estimates of future target positions, for increasing time-windows into the future. (c) Accuracy of parallel map estimates of the future arm tip position for increasing time-windows into the future. For both (b) and (c), the relative error is defined as ||ŷ − y||/ℓ, where || · || is the L2 vector norm, ŷ is the predicted position, y is the true position, and ℓ is the length of the soft arm. (d) Performance of reservoir self-modeling for estimation of the current arm pose (top row). Heat map of accuracy of pose estimation along the length of the arm. Color denotes the relative error of the estimated position of a point s ∈ [0, ℓ] along the soft arm of length ℓ. Relative error is defined as ||ŷ(s) − y(s)||/s, where ŷ(s) is the predicted location of point s and y(s) is the true location of that point. Accuracy is initially lower due to transient startup effects in the reservoir before reaching a consistent level of high accuracy, as confirmed (mid/bottom rows) by visualization of the estimated (gold) and true (full color) arm pose at selected time instances. (e) Tracking performance of the neural reservoir when the target position becomes unavailable for increasing lengths of time. The target oscillates between being measured ('seen') and inferred ('blind') for equal time intervals that range in length from 0.25 to 3 seconds. Violin plots show performance over 50 trials for increasing periods of time during which the arm is blinded. The blue shaded region shows the baseline performance of the reservoir when the target location is always known. (f) Comparison of the inferred target's 3D position for a three-second inference period. After approximately 1.5 seconds, the estimate of the target trajectory sometimes drifts from the true trajectory, leading to the drop in tracking performance seen in panel (e).
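The parallel readout maps of panel (a) are fit by ridge regression on recorded reservoir states. A minimal sketch, with random matrices standing in for reservoir trajectories and ground-truth labels (sizes and regularization are our assumptions, not the paper's values):

```python
import numpy as np

# Parallel self-modeling readouts as independent ridge regressions sharing
# one reservoir trajectory; the three targets mirror the maps in the text.
rng = np.random.default_rng(2)

N_RES, T = 128, 400
X = rng.standard_normal((T, N_RES))             # stand-in reservoir states
targets = {                                     # stand-in supervised labels
    "target_pos": X @ rng.standard_normal((N_RES, 3)),
    "tip_pos": X @ rng.standard_normal((N_RES, 3)),
    "arm_pose": X @ rng.standard_normal((N_RES, 12)),
}

def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge readout: W = (X^T X + lam*I)^-1 X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

maps = {name: fit_ridge(X, Y) for name, Y in targets.items()}
# All maps evaluate concurrently off the same reservoir dynamics:
preds = {name: X @ W for name, W in maps.items()}
```

Because each map is a separate linear readout, adding a new self-modeling task costs one more regression, with no retraining of the reservoir or the control policy.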

ferent levels, with LSTM and FF entailing ∼3x and ∼10x larger energies than RC, respectively. Phenomenologically, higher kinetic energies manifest as faster swings or pronounced structural vibrations (SI Video 2); the RC-based controller, which produces reduced kinetic energies, thus exhibits more precise and smoother actuation, without extensively relying on corrective muscle contractions. The neural reservoir also demonstrates clear superiority in minimizing the spine's bending energy (Fig. 2c). It is worth noting that bending stiffness scales linearly with the elastic modulus. We would thus expect that a controller able to effectively coordinate the arm would produce bending deformations (and therefore energies) congruent with this linear scaling as the spine softens. This trend is indeed met by RC, while the LSTM and FF networks are found to exhibit a sub-linear decrease. This indicates that LSTM and FF networks resort to deformations that are excessive in relation to the spine's stiffness, eventually leading to failure (SI Video 2). Recorded muscle activation patterns (SI Fig. S2) suggest that RC's greater control may be enabled by the use of antagonistic muscle co-contractions, to locally stiffen and stabilize the arm as it softens. LSTM and FF controllers instead do not discover this strategy, and forgo co-contractions in favor of one-sided activations. Therefore, RC-based controllers are shown to achieve higher tracking accuracies and, concurrently, avoid unnecessary bending, twisting, or corrective accelerations, reflecting their ability to capture (and predict) the arm's dynamics, as quantified next.

Self-modeling for robust control. Reservoir dynamics can also be used for self-modeling (80, 81), that is, for the explicit estimation or prediction of the system's configuration, which in turn can be used to robustify the controller. Self-modeling is achieved here through three dedicated maps (Fig. 3a), to extract the arm's full pose, future target location, and future tip location. Maps are obtained in a supervised fashion via ridge regression based on data from the 250 kPa case of Fig. 2a (Methods). The maps are then directly appended to the reservoir, exploiting RC's intrinsic parallelism, whereby several computations can be carried out concurrently using the same underlying dynamics.

We show how the reservoir, using target location and curvature proprioception history, accurately synthesizes future target locations (Fig. 3b), current arm pose (Fig. 3d, SI Video 3), and future tip positions (by anticipating forthcoming muscle activations selected by the control policy based on the predicted target trajectory, Fig. 3c). We note that in all cases target trajectories are randomly generated, so that the reservoir must constantly
[Figure 4 graphics omitted: panel (a) schematic of spike encoding/decoding around the spiking neural reservoir; panels (b, c) training and evaluation performance of spiking versus non-spiking reservoirs and LSTM across backbone stiffnesses; panel (d) energy per inference versus reservoir size on traditional silicon versus Intel Loihi (∼75x reduction); panels (e, f) learning curves and front/side views for reaching through unstructured obstacles. Caption follows.]
Figure 4. Spiking neural reservoir control on neuromorphic hardware (a) Schematic showing how the state is encoded into a spike train (SI) that is sent to a
spiking neural reservoir running on an Intel Loihi chip. Spike train outputs are then decoded (SI) into continuous actions (contractions of the arm muscles). (b)
Comparison between the performance over 250 episodes of a trained spiking neural reservoir on Loihi (backbone stiffness: 125 kPa; reservoir size: 2048 neurons,
which is the largest implementable on Loihi), a neural reservoir running on traditional silicon using non-spiking artificial neurons, and an LSTM network (same as
Fig. 1 and Fig. 2). Inset shows the training performance for the spiking reservoir, the non-spiking reservoir, and the LSTM. (c) The spiking reservoir exhibits a
compact performance envelope as the backbone stiffness decreases. All cases are trained using five random initialization seeds as in Fig. 1b. (d) Energy use of a
non-spiking reservoir running on traditional silicon (Intel Xeon W-2665) and the spiking reservoir running on the Intel Loihi chip. The non-spiking reservoir on
traditional silicon exhibits quadratic energy scaling as the reservoir size increases compared to the linear energy scaling of the spiking reservoir running on the Intel
Loihi chip. (e) Initial 30k episodes of learning performance of neural reservoirs (n=1024) controlling arms of decreasing backbone stiffness tasked with reaching
through a cluttered nest of obstacles to a fixed target (SI for expanded results). Video of the performance of final trained policies for all stiffness levels is available in
SI Video 6. (f) Timelapse front/side views of a spiking reservoir guiding a soft arm through unstructured obstacles to reach a target (backbone stiffness: 125 kPa).
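The encode/decode pipeline of panel (a) can be illustrated with a minimal rate-coding sketch. This is not the paper's implementation (the actual scheme is detailed in the SI and runs on Loihi via Nengo); the neuron parameters, the gain, and the two-neuron positive/negative channel layout below are illustrative assumptions.

```python
import numpy as np

def lif_spikes(current, T=0.1, dt=1e-3, tau_m=0.02, v_reset=0.0, v_th=1.0, R=1.0):
    """Euler-integrate one leaky integrate-and-fire neuron driven by a
    constant input current for T seconds; return the emitted spike count."""
    v, n_spikes = v_reset, 0
    for _ in range(int(T / dt)):
        v += dt / tau_m * ((v_reset - v) + R * current)
        if v > v_th:
            n_spikes += 1
            v = v_reset
    return n_spikes

def encode_state(state, gain=5.0):
    """Rate-encode each continuous state variable into two spike counts:
    one neuron for the positive part, one for the negative part."""
    counts = []
    for x in state:
        counts.append(lif_spikes(gain * max(x, 0.0)))   # 'positive' channel
        counts.append(lif_spikes(gain * max(-x, 0.0)))  # 'negative' channel
    return np.array(counts, dtype=float)

def decode_action(spike_counts, W_out, T=0.1):
    """Decode continuous muscle activations from mean firing rates via a
    linear readout, then squash to [0, 1] as in the paper's activation map."""
    rates = spike_counts / T
    return (np.tanh(W_out @ rates) + 1.0) / 2.0

state = np.array([0.3, -0.8, 0.1])                        # toy 3D state
W_out = np.random.default_rng(0).normal(size=(16, 6)) * 0.01  # untrained readout
action = decode_action(encode_state(state), W_out)
assert action.shape == (16,) and np.all((action >= 0) & (action <= 1))
```

Splitting each variable across a positive and a negative channel is one common workaround for the fact that firing rates cannot be negative; the sign information is carried by which of the two neurons fires.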

produce new predictions. These capabilities are then directly used to improve control in a scenario where the arm intermittently loses sight of the target, mimicking a faulty sensor or environmental disturbances. For the arm to keep 'seeing', the target's next location is predicted and fed back into the reservoir in lieu of the missing data. This in turn enables the forecasting of muscle actuations and future tip positions, allowing the blinded arm to continue tracking (Fig. 3e,f). Using this strategy, inference periods of less than one second (the interval during which the arm is 'blind', approximately equivalent to the target traveling a distance of 50% of the arm's length) result in no performance degradation relative to perfect knowledge of the target location. If the disturbance lasts beyond one second, then tracking quality begins to decrease, although modestly. Indeed, average performance remains comparable to an LSTM network with complete knowledge of the target location (SI). However, the spread of the arm performance (violin plot bars in Fig. 3e) increases for large disturbance periods, implying that the arm does occasionally lose track of the target while blinded.

Thus, a mixed strategy, where unsupervised learning is used for control and supervised learning is employed for self-modeling, with the second supporting the first, is found to be effective in robustifying performance (SI Video 4). It is worth noting that the inclusion of self-modeling entails minimal computational overhead since all linear maps use the same reservoir.

Spiking reservoirs on neuromorphic hardware. Here, we illustrate how our approach is also well-positioned to enhance computational energy efficiency, a concern in many robotic applications (82, 83). To this end, we consider a spiking neural reservoir on neuromorphic hardware, demonstrating almost two orders of magnitude improvement relative to standard CPUs, while retaining control performance.

The use of spiking networks, inspired by cortical dynamics (29, 84), aligns well with both RC (27) and specialized neuromorphic chips (33, 34) that physically implement energy-efficient artificial spiking neurons (85, 86). It is thus only natural to explore the use of spiking networks within our framework, also envisioning potential bio-hybrid applications where spiking neural tissue may replace synthetic reservoirs.

To match algorithmic and hardware infrastructures, we consider a reservoir of leaky integrate-and-fire (LIF) spiking neurons on Intel's neuromorphic Loihi chip (33) (Fig. 4a, Methods). This spiking reservoir is challenged to control our soft arm and is compared with the LSTM and the non-spiking reservoir approaches of Fig. 1, which were run on traditional CPUs (Intel Xeon).

As can be seen in Fig. 4b, both reservoir-based approaches outperform the LSTM network in the target-tracking test. While the non-spiking reservoir exhibits slightly better overall performance, inspection of the trained policies (SI Videos 1 & 5) shows that both approaches are in fact successful in controlling the soft

arm, which is instead found to frequently collapse (as captured by the larger score spread) in the LSTM case. Further, consistent with the non-spiking reservoir, the spiking reservoir exhibits a compact performance envelope as the arm softens (Fig. 4c), and larger reservoirs improve performance (SI Fig. S4).

However, the spiking reservoir on the neuromorphic Loihi chip provides a clear advantage when energy use is considered (Fig. 4d). Indeed, power consumption is observed to reduce by up to 75x. Furthermore, and critically, energy use on Loihi exhibits linear scaling with reservoir size compared to quadratic scaling for reservoirs deployed on traditional silicon hardware. This implies that larger (thus more capable) spiking reservoirs may be used in energy-constrained applications.

Environmental interactions. A key advantage of soft robots is their ability to comply with the environment without damaging themselves or their surroundings. We explore these boundary interactions, and the possibility of taking advantage of them, by considering the reaching of a target through a set of unstructured obstacles (Fig. 4e,f). We underscore the steep control challenge associated with this test, exacerbated by a complex environment, geometric constraints, and contact dynamics.

In Fig. 4e,f, we show results obtained using a spiking neural reservoir for arms characterized by spines of decreasing stiffness. As can be seen, all arms successfully learn to reach through the obstacles, with softer ones exhibiting faster initial learning, ostensibly due to their ability to better conform to obstacles and push through gaps without precise actuation. However, the more precise control associated with stiff arms is found to eventually achieve better performance (Fig. 4e). In all cases though, throughout training, arms are observed to extensively exploit (instead of avoiding) solid boundaries, leaning against them to passively reconfigure and favorably redirect the tip (SI Video 6). This facilitates learning and control, as supported by the fact that the target can be reached even by the softest arm (see Video), despite being the most difficult to coordinate (Fig. 2). Thus, our RC approach is found to naturally recover a hallmark of mechanical intelligence, whereby passive mechanics and the environment are leveraged to simplify the task at hand.

Discussion. In this work, we consider the challenging problem of learning to control a heterogeneous, soft bio-hybrid arm. The arm's musculoskeletal architecture is designed to encompass a range of biomechanically relevant elements and is modeled via assemblies of Cosserat rods. The physical infrastructure is matched with a neural infrastructure (the reservoir) to encode the arm's sensing and spatiotemporal dynamics. This representation is coupled with reinforcement learning to achieve muscle coordination and allow the arm to track and reach targets.

Our approach is demonstrated to be effective, outperforming classic neural network methods. In particular, the synergy between neural and physical dynamics is observed to produce control policies that minimize (relative to FF and LSTM) the arm's bending and kinetic energies across a range of material stiffnesses. In contrast, FF and LSTM approaches are found unable to cope with material property changes. Parallel self-modeling on the same reservoir is also demonstrated for robust control in the case of unavailable or corrupted sensory information. Further, spiking neural reservoirs on neuromorphic hardware are shown to deliver over 75-fold energy savings relative to non-spiking reservoirs, while conserving control performance. Spiking reservoirs are finally employed to drive the arm through a set of unstructured obstacles, taking advantage of solid boundaries and compliance, a hallmark of mechanical intelligence.

The capacity of neural reservoirs to synthesize compliant dynamics presents much potential for soft robotic control, combining learning, versatility, adaptivity, and robustness to mechanical changes or environmental disturbances. Higher-order situational information (velocities or accelerations) can be extracted with minimal additional computational costs (parallel maps running on the same reservoir), reducing onboard sensing and energy requirements. This aspect is compounded by the use of spiking neural reservoirs on neuromorphic hardware, to deliver significant energy savings, a necessity in many robotic applications.

These results not only advance soft robotic control, but may also further the use of biological constructs in bio-hybrid robotics and engineering, with both spike-activated cultured muscle tissue (6–8, 49) and living reservoirs of cultured neural tissue (35) amenable to integration within this computing paradigm.

References
1. L. Ricotti et al., Science Robotics 2, eaaq0495 (2017).
2. Y. Morimoto, H. Onoe, S. Takeuchi, Science Robotics 3, eaat4440 (2018).
3. T. Li, S. Takeuchi, Biophysics Reviews 6 (2025).
4. X. Ren, Y. Morimoto, S. Takeuchi, Science Robotics 10, eadr5512 (2025).
5. S. Park et al., Science 353, 158–162 (2016).
6. G. J. Pagan-Diaz et al., Advanced Functional Materials 28, 1801145 (2018).
7. O. Aydin et al., Proceedings of the National Academy of Sciences 116, 19841–19847 (2019).
8. Y. Kim et al., Science Robotics 8, eadd1053 (2023).
9. O. Yasa et al., Annual Review of Control, Robotics, and Autonomous Systems 6, 1–29 (2023).
10. P. Polygerinos et al., Advanced Engineering Materials 19, 1700016 (2017).
11. M. Cianchetti, C. Laschi, A. Menciassi, P. Dario, Nature Reviews Materials 3, 143–153 (2018).
12. G. Chowdhary, M. Gazzola, G. Krishnan, C. Soman, S. Lovell, Sustainability 11, 6751 (2019).
13. M. Russo et al., Advanced Intelligent Systems 5, 2200367 (2023).
14. C. Della Santina, C. Duriez, D. Rus, IEEE Control Systems Magazine 43, 30–65 (2023).
15. T. George Thuruthel, Y. Ansari, E. Falotico, C. Laschi, Soft Robotics 5, 149–163 (2018).
16. Z. Chen et al., IEEE Transactions on Automation Science and Engineering (2024).
17. E. Falotico et al., Advanced Intelligent Systems, 2400344 (2024).
18. G. Mengaldo et al., Nature Reviews Physics 4, 595–610 (2022).
19. R. Pfeifer, How the Body Shapes the Way We Think: A New View of Intelligence (2006).
20. N. T. Ulrich, "Grasping with mechanical intelligence", tech. rep.
21. T. Wang et al., Science Robotics 8, eadi2243 (2023).
22. A. J. Ijspeert, Neural Networks 21, 642–653 (2008).
23. P. Ramdya, A. J. Ijspeert, Science Robotics 8, eadg0279 (2023).

24. O. Aydin et al., APL Bioengineering 4, 016107 (2020).
25. H. Tetsuka, S. Gobbi, T. Hatanaka, L. Pirrami, S. Shin, Science Robotics 9, eado0051 (2024).
26. L. F. Seoane, Philosophical Transactions of the Royal Society B 374, 20180377 (2019).
27. W. Maass, T. Natschläger, H. Markram, Neural Computation 14, 2531–2560 (2002).
28. L. Gonon, J.-P. Ortega, IEEE Transactions on Neural Networks and Learning Systems 31, 100–112 (2019).
29. M. Lukoševičius, H. Jaeger, Computer Science Review 3, 127–149 (2009).
30. W. Maass, P. Joshi, E. Sontag, Advances in Neural Information Processing Systems 18 (2005).
31. G. Tanaka et al., Neural Networks 115, 100–123 (2019).
32. H. Jaeger, GMD Technical Report 148, German National Research Center for Information Technology, Bonn, Germany (2001).
33. M. Davies et al., IEEE Micro 38, 82–99 (2018).
34. P. A. Merolla et al., Science 345, 668–673 (2014).
35. X. Zhang et al., Advanced Science 11, 2306826 (2024).
36. H. Cai et al., Nature Electronics 6, 1032–1039 (2023).
37. B. J. Kagan et al., Neuron 110, 3952–3969 (2022).
38. M. Duduta, E. Hajiesmaili, H. Zhao, R. J. Wood, D. R. Clarke, Proceedings of the National Academy of Sciences 116, 2476–2481 (2019).
39. J. Wang, D. Gao, P. S. Lee, Advanced Materials 33, 2003088 (2021).
40. J. Gray, Journal of Experimental Biology 20, 88–116 (1944).
41. W. M. Kier, K. K. Smith, Zoological Journal of the Linnean Society 83, 307–324 (1985).
42. B. C. Jayne, Journal of Morphology 197, 159–181 (1988).
43. K. Y. Lee et al., Science 375, 639–647 (2022).
44. A. Tekinalp et al., Proceedings of the National Academy of Sciences 121, e2318769121 (2024).
45. P. L. Gribble, L. I. Mullin, N. Cothros, A. Mattar, Journal of Neurophysiology 89, 2396–2405 (2003).
46. A. D. Koelewijn, A. J. Van Den Bogert, PeerJ 10, e13085 (2022).
47. X. Zhang, F. K. Chan, T. Parthasarathy, M. Gazzola, Nature Communications 10, 1–12 (2019).
48. G. M. Spinks, N. D. Martino, S. Naficy, D. J. Shepherd, J. Foroughi, Science Robotics 6, eabf4788 (2021).
49. J. Wang et al., Advanced Intelligent Systems 3, 2000237 (2021).
50. S. S. Antman, Nonlinear Problems of Elasticity, 513–584 (2005).
51. M. Gazzola, L. Dudte, A. McCormick, L. Mahadevan, Royal Society Open Science 5, 171628 (2018).
52. A. Tekinalp et al., PyElastica, Feb. 2023 (https://doi.org/10.5281/zenodo.7658872).
53. X. Zhang, N. Naughton, T. Parthasarathy, M. Gazzola, Nature Communications 12, 6076 (2021).
54. A. Tekinalp, Y. Bhosale, S. Cui, F. K. Chan, M. Gazzola, arXiv preprint arXiv:2401.09506 (2024).
55. A. Porat, A. Tekinalp, Y. Bhosale, M. Gazzola, Y. Meroz, Proceedings of the National Academy of Sciences 121, e2312761121 (2024).
56. Y. Bhosale et al., Physical Review Letters 128, 198003 (2022).
57. N. Weiner, Y. Bhosale, M. Gazzola, H. King, Journal of Applied Physics 127 (2020).
58. N. Naughton et al., IEEE Robotics and Automation Letters 6, 3389–3396 (2021).
59. H.-S. Chang et al., presented at the 2020 59th IEEE Conference on Decision and Control (CDC), pp. 3913–3920.
60. H.-S. Chang et al., Proceedings of the Royal Society A 479, 20220593 (2023).
61. T. Wang et al., presented at the 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 1059–1066.
62. C.-H. Shih et al., Advanced Intelligent Systems, 2300088.
63. N. Charles, R. Chelakkot, M. Gazzola, B. Young, L. Mahadevan, arXiv preprint arXiv:2303.15482 (2023).
64. B. Calvo et al., Journal of Biomechanics 43, 318–325 (2010).
65. C. N. Maganaris, J. P. Paul, In vivo human tendon mechanical properties (1999).
66. R. F. Ker, International Journal of Fatigue 29, 1001–1009 (2007).
67. M. Vatankhah-Varnosfaderani et al., Nature 549, 497–501 (2017).
68. Z. Tu et al., Nature Communications 12, 2916 (2021).
69. W. L. Lim, L. L. Liau, M. H. Ng, S. R. Chowdhury, J. X. Law, Tissue Engineering and Regenerative Medicine 16, 549–571 (2019).
70. P. Matthews, Physiological Reviews 44, 219–288 (1964).
71. M. T. Hagan, H. B. Demuth, presented at the Proceedings of the 1999 American Control Conference, vol. 3, pp. 1642–1656.
72. A. Perrusquia, W. Yu, Neurocomputing 438, 145–154 (2021).
73. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, arXiv:1707.06347 (2017).
74. Y. Yu, X. Si, C. Hu, J. Zhang, Neural Computation 31, 1235–1270 (2019).
75. J. Zou, Y. Han, S.-S. So, Artificial Neural Networks: Methods and Applications, 14–22 (2009).
76. J. Pathak, B. Hunt, M. Girvan, Z. Lu, E. Ott, Physical Review Letters 120, 024102 (2018).
77. K. Nagdi, Rubber as an Engineering Material: Guideline for Users (Hanser Verlag, 1993).
78. D. Fish, J. Orenstein, S. Bloom, Circulation Research 54, 267–276 (1984).
79. A. Di Clemente, F. Maiole, I. Bornia, L. Zullo, Journal of Experimental Biology 224, jeb242644 (2021).
80. B. Chen, R. Kwiatkowski, C. Vondrick, H. Lipson, Science Robotics 7, eabn1944 (2022).
81. Y. Hu, J. Lin, H. Lipson, Nature Machine Intelligence, 1–11 (2025).
82. M. Zhu, O. G. Schmidt, MRS Bulletin 49, 115–124 (2024).
83. Y. Sandamirskaya, M. Kaboli, J. Conradt, T. Celikel, Science Robotics 7, eabl8419 (2022).
84. P. F. Dominey, Biological Cybernetics 73, 265–274 (1995).
85. B. Rajendran, A. Sebastian, M. Schmuker, N. Srinivasa, E. Eleftheriou, IEEE Signal Processing Magazine 36, 97–110 (2019).
86. A. Javanshir, T. T. Nguyen, M. P. Mahmud, A. Z. Kouzani, Neural Computation 34, 1289–1328 (2022).
87. H. Herr, R. G. Dennis, Journal of NeuroEngineering and Rehabilitation 1, 1–9 (2004).
88. J. Van Leeuwen, Journal of Theoretical Biology 149, 229–256 (1991).
89. J. Van Leeuwen, W. M. Kier, Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 352, 551–571 (1997).
90. R. L. Lieber, C. G. Brown, C. L. Trestik, Journal of Biomechanics 25, 421–428 (1992).
91. L. F. Abbott, Brain Research Bulletin 50, 303–304 (1999).
92. N. Hansen, S. D. Müller, P. Koumoutsakos, Evolutionary Computation 11, 1–18 (2003).
93. T. Bekolay et al., Frontiers in Neuroinformatics 7, 1–13 (2014).
94. A. Raffin et al., Journal of Machine Learning Research 22, 1–8 (http://jmlr.org/papers/v22/20-1364.html) (2021).

Acknowledgements
Funding: This study was jointly funded by ONR MURI N00014-19-1-2373 (M.G.), ONR N00014-22-1-2569 (M.G.), NSF EFRI C3 SoRo #1830881 (M.G.), NSF CAREER #1846752 (M.G.), NSF Expeditions in Computing 'Mind in Vitro' #2123781 (M.G.), and NSF EFRI BEGIN OI #2422340 (N.N.). Computational support was provided by the Bridges2 supercomputer at the Pittsburgh Supercomputing Center through allocation TG-MCB190004 from the Extreme Science and Engineering Discovery Environment (XSEDE; NSF grant ACI-1548562).
Author contributions statement: N.N., A.T., K.S., and M.G. conceptualized and designed the research. N.N., A.T., K.S., and S.H.K. performed the research. All analyzed the data. All wrote the paper.
Competing interests: The authors have no competing interests to declare.

Methods

Modeling a neuromuscular arm
We model the muscular arm as an assembly of Cosserat rods, which are slender, one-dimensional elastic structures that can undergo all modes of deformation: bending, twisting, stretching, and shearing. Representing the backbone and muscle groups as Cosserat rods entails a number of advantages. Assembled together, they naturally capture the heterogeneity of muscular architectures, while the Cosserat formulation can be extended to account for non-linear material properties, connectivity, active stresses, internal pressure, and environmental loads. The numerical implementation of Cosserat rods is computationally efficient, as they accurately capture large 3D deformations through a one-dimensional representation, alleviating the time-consuming remeshing difficulties and compute costs of 3D elasticity.

Dynamics of a single Cosserat rod.
An individual Cosserat rod is described by its center line position x̄(s,t) ∈ R³ and an oriented reference frame of (row-wise) orthonormal directors Q(s,t) = {d̄_1, d̄_2, d̄_3}⁻¹ ∈ R³ˣ³ along its length s ∈ [0, L(t)], where L(t) is the current length, for all time t ≥ 0. Any vector defined in the global lab frame (v̄) can be transformed into the local frame (v) via v = Qv̄, and from the local to the lab frame via v̄ = Qᵀv. For an unshearable and inextensible rod, d̄_3 is parallel to the rod's local tangent (∂_s x̄ = x̄_s), and d̄_1 (normal) and d̄_2 (binormal) span the rod's cross-section. However, under shear or extension, the rod's tangent direction x̄_s and d̄_3 are no longer the same, with the difference represented by the shear strain vector σ = Q(x̄_s − d̄_3). The gradient of the directors d̄_j with respect to the rod's length is defined by the curvature vector κ̄(s,t) ∈ R³ through the relation ∂_s d̄_j = κ̄ × d̄_j. In the local frame (κ = Qκ̄), the components of the curvature vector κ = [κ_1, κ_2, κ_3] relate to bending (κ_1 and κ_2) and twisting (κ_3) of the rod. Similarly, the gradient of the directors with respect to time is defined by the angular velocity vector ω̄(s,t) ∈ R³ through the relation ∂_t d̄_j = ω̄ × d̄_j. The linear velocity of the center-line is v̄(s,t) = ∂_t x̄ ∈ R³, while the second area moment of inertia I(s,t) ∈ R³ˣ³, cross-sectional area A(s,t) ∈ R, and density ρ(s) ∈ R are defined based on the rod's material properties. The dynamics of a Cosserat rod are then described as (51)

    ∂_t²(ρA x̄) = ∂_s(Qᵀn) + f̄    (1)

    ∂_t(ρI ω) = ∂_s τ + κ × τ + (Q x̄_s × n) + (ρI ω) × ω + Q c̄    (2)

where Eq. 1 (lab frame) and Eq. 2 (local frame) represent the change of linear and angular momentum at every cross-section, respectively; n(s,t) ∈ R³ and τ(s,t) ∈ R³ are internal forces and couples, respectively, developed due to elastic deformations and muscle contractions, while f̄(s,t) ∈ R³ and c̄(s,t) ∈ R³ capture external forces and couples applied to the arm, respectively.

The above continuous representation is discretized into (n_elem + 1) nodes of position x̄_i that are connected by n_elem cylindrical elements. Linear displacements are determined by the internal and external forces acting at the nodes, while rotations are accounted for via couples applied to the cylindrical elements. Energy losses due to internal friction and viscoelastic effects are captured through the use of Rayleigh potentials. The dynamic behavior of a rod is then computed by integrating the discretized set of equations, along with appropriate boundary conditions, in time via a second-order position Verlet scheme (51). In this study, we used PyElastica (52), an open-source, Python-based implementation of this numerical scheme.

Assembling a muscular arm.
We assemble our muscular arm by arranging active and passive rods into a representative muscular architecture. The arm consists of an elastic backbone (modeled as a single rod) onto which sixteen muscle-tendon units (each modeled as a single rod) are attached around the outside, for a total of seventeen rods. This mechanical connectivity enables the arm to translate the one-dimensional internal contraction forces generated by individual muscles into the three-dimensional dynamic motions of the arm as a whole. A description of the arm's geometry and material parameters is provided in the SI.

Modeling the elastic backbone. The backbone is modeled as a constant cross-section, elastic rod. For a linear elastic material such as the backbone, the internal forces n = [n_1, n_2, n_3] are proportional to the shear strain of the rod, n = Sσ, where S = diag(α_c GA, α_c GA, EA) is the rod's shear/stretch stiffness matrix, E is the rod's Young's modulus, G is the rod's shear modulus, and α_c is the Timoshenko shear correction factor (51). We additionally model internal elastic torques τ using the linear relationship τ = Bκ, where B = diag(EI_1, EI_2, GI_3) ∈ R³ˣ³ is the rod's bend/twist stiffness matrix, and I_1, I_2, and I_3 are the rod's second moments of inertia about d̄_1, d̄_2, and d̄_3, respectively. Finally, zero-displacement and zero-rotation boundary conditions are defined on the node and element of the backbone nearest to the base, respectively, to anchor the arm.

Modeling the muscle-tendon unit. We model muscle-tendon units based on frog semitendinosus muscle, in line with previous bio-hybrid demonstrations (87). Biological muscles actively generate internal forces that cause them to axially contract, while both muscle and tendon tissues exhibit hyperelastic passive behavior when stretched. Both effects render a linear stress-strain treatment of the Cosserat rod's axial stretch inaccurate. Instead, we model the axial component n_3 of the internal force vector n by considering the active σ_a(s,t) ∈ R and passive σ_p(s,t) ∈ R axial stresses developed along the muscle, n_3 = A(σ_a^m + σ_p^m), and the passive response of the tendon, n_3 = A σ_p^t, while modeling its shear components (n_1, n_2) using the above presented linear elastic formulation. Internal elastic torques are represented using a similar approach as the elastic backbone, though we note that the non-linear effects of the muscle's axial stretch extend to encompass angular momentum via the third term in Eq. 2. Material properties (provided in the SI) of the muscle-tendon unit are based on the sliding-filament model of muscle (88, 89) and experimental measurements (90).

Patterning muscle-tendon units onto backbone. We pattern the sixteen muscle-tendon units onto the elastic backbone. Units are arranged in four layers. Each layer consists of four units, organized in two agonist-antagonist pairs oriented orthogonal to each other. Neighboring pairs overlap lengthwise by 50% while being rotated 45° relative to each other to avoid intersecting. Muscle-tendon units are arranged on the elastic backbone's surface such that the muscle-tendon unit's center-line lies one (backbone) radius away from the backbone's center-line. The arm in its rest configuration presents no muscle activation or residual stresses.

To join the muscle-tendon units to the backbone, we define zero-displacement boundary conditions at the ends of the muscle-tendon unit relative to a point on the outer surface of the backbone, to model to first order the enthesis point where tendon and bone join together. The location of these connections, a radial distance from the backbone's center-line, generates a bending torque on the backbone when the muscle-tendon unit linearly shortens upon activation. Beyond these primary connections, we also implement distributed displacement-force boundary conditions to lightly adhere the muscle-tendon unit to the backbone during bending motions (44, 47), analogous to the effect of connective fascia tissue that helps maintain muscle organization.

3D tracking environment
The muscular arm defined above is challenged to track a target moving in 3D space along a smoothly varying, random trajectory. The target trajectory x̄_t(t) = [x̄_1, x̄_2, x̄_3] is defined at the beginning of each episode, with each spatial component of the trajectory generated according to

    x̄_j(t) = b_j sin(f_{j,1} 2πt) sin(f_{j,2} 2πt) sin(f_{j,3} 2πt)   ∀ j ∈ [1..3]    (3)

where f_{j,i} ∈ R is a randomly selected constant drawn from the interval [0.5, 1.0] at the beginning of each episode and b_j is a velocity scaling constant, whose sign is also randomly selected each episode. The structure of Eq. 3 results in a smoothly varying trajectory of the target, while the random selection of f_{j,i} each episode renders it impossible for the controller to simply memorize a target trajectory. The state and muscle activations of the arm are updated at a 4 Hz control frequency (Δt = 250 ms), leading to a mapping between the physical time t of the simulation environment and the control time step number n of n = [4t]. At each timestep, muscle activations are directly mapped from the action space a_n = [a_i] ∈ R¹⁶, a_i ∈ (−∞, ∞), to an individual muscle activation level α_i(t) = (tanh(a_i(t)) + 1)/2 ∈ [0, 1], which is applied for 0.25 seconds (until the next control time step). The arm's state consists of the current 3D target location x̄_t(t) and the current averaged normalized curvature of the arm K̄_i(t) over i = 4 equidistant intervals in the rod's local normal and binormal directions

    K̄_i(t) = (L / (2π (d_{i+1} − d_i))) ∫_{d_i}^{d_{i+1}} κ(s,t) ds   ∀ i ∈ [0..3]    (4)

where κ(s,t) ∈ R² is the local normal and binormal curvature of the arm, d = {kL/4 | k ∈ [0..4]}, and L is the length of the arm. This results in a state s_n = [x̄_t(t), K̄(t)] ∈ R³⁺⁸. The reward is the square of the Euclidean distance between the tip of the arm x̄(L,t) and the 3D moving target x̄_t(t), integrated over the control time step

    r_n = ∫_t^{t+Δt} ||x̄(L,t′) − x̄_t(t′)||² dt′    (5)

The integration of the reward over the time step ensures the tip of the arm continuously follows the target trajectory over the entire time step. Each episode is terminated after 100 seconds.

Unstructured nest environment
In Fig. 4, the arm is challenged to reach through an unstructured nest of rigid cylindrical obstacles towards a stationary target. Here, the target and nest are stationary and fixed in the same location for all episodes. Between the target and the arm's initial configuration, an unstructured nest of cylindrical obstacles is placed, which the arm must navigate through to reach the target on the other side. The location of the obstacles is the same for all episodes, and each episode is terminated after 5 seconds. External contact forces between the arm and the nest are implemented (described in SI), as well as contact between the arm and the floor, allowing the arm to interact with its external environment. The state, action, and reward of this environment are the same as with the 3D tracking environment, but here successful control of the neuromuscular arm requires addressing a more complex physical environment by learning to interact with and account for the presence of solid obstacles that restrict reaching.

Neural reservoir formulation
We consider two formulations of a neural reservoir: artificial neuron reservoirs and spiking neuron reservoirs. Artificial neuron reservoirs consist of a discrete-time, recurrent network of artificial neurons typically used in machine learning applications, while spiking neuron reservoirs utilize a continuous-time, spike-based approach.

Neural reservoir with artificial neurons
Neural reservoirs using artificial neurons are implemented using an Echo State Network (ESN) approach (32). The reservoir consists of a large pool of n sparsely and recurrently connected artificial neurons. Recurrent connections within the reservoir are represented using the reservoir connection matrix W_r ∈ Rⁿˣⁿ, while connections into the reservoir from the input state (of size s) are represented using the input connection matrix W_i ∈ Rⁿˣˢ. The state of the reservoir u_n^r at time step n is then

    u_n^r = f(W_i s_n + W_r u_{n−1}^r)    (6)

where f(·) is a non-linear activation function; here, a hyperbolic tangent (tanh) is used. Notably, the connection

weight matrices are fixed at initialization. No adjustment to their weights is made during the learning process. Thus, their proper initialization is critical to the information processing capability of the reservoir. Following the ESN approach, both the reservoir connection matrix W_r and the input connection matrix W_i are randomly initialized from a normal distribution with zero mean, a standard deviation of 0.50, and a sparse connection density of 0.10. The reservoir connection matrix is then normalized based on its spectral radius ρ (the magnitude of its largest eigenvalue) to ensure the echo state property, namely that information from initial conditions asymptotically disappears over time. To achieve the echo state property it is generally sufficient (though not necessary) to enforce ρ < 1 (32).

The mapping from the reservoir state u_n^r to the output (in this case the action, of size j) is determined by the linear output map W_o ∈ Rʲˣⁿ according to

    a_n = W_o u_n^r    (7)

The linear map W_o can be learned using a variety of different algorithms. For the control policy, the unsupervised reinforcement learning Proximal Policy Optimization (PPO) algorithm was used, while for the parallel output maps of Fig. 3, supervised learning algorithms such as ridge regression were used.

Neural reservoir with spiking neurons
To implement a neural reservoir using spiking neurons, we use leaky integrate-and-fire (LIF) neurons (91) whose dynamics are governed by the differential equation

    τ_m dv/dt = (v_r − v(t)) + R I(t)    (8)

where τ_m is the membrane time constant of the RC circuit simulating voltage leakage, R is the resistance, I(t) is the input current, v(t) is the output voltage, and v_r is the reset membrane potential. When v(t) exceeds a threshold voltage level (v(t) > v_t), a voltage spike occurs at time t, after which the neuron returns to v_r, where it is held for a refractory time τ_r.

We begin by converting the discrete-time, continuous state input of the soft muscular arm to a continuous-time, spike-based input compatible with a spiking neural reservoir. Each state variable is presented as current into two LIF input neurons for a time of τ (which we decouple from the control time Δt). These two

count is then treated as the output of the reservoir for that control time step. This approach then enables the same RL algorithms as used to train the artificial neural reservoir to be employed to learn the output weights W_o to control the arm.

This neural reservoir setup was deployed on Intel's Kapoho Bay device (33), which contains 2 Loihi chips. The reservoir was defined using the Nengo Python package (93) and its NengoLoihi extension. The input spike conversion layer and output spike counting layer are implemented off-chip, with the communication between the CPU host and the Loihi chip occurring only via spike communication.

Energy efficiency
The energy usage of neural reservoirs consisting of either artificial neurons or spiking neurons was analyzed by measuring the average energy consumption for each inference, or control time step. For a neural reservoir with artificial neurons, this refers to a single evaluation of the system defined in Eq. 6, while for spiking neurons it refers to the presentation of the input state for a time τ. The energy consumption of the artificial neural reservoir was measured using the pyRAPL Python package on an Intel Xeon W-2265 CPU to identify the running power with idle power subtracted. Energy usage on Loihi was measured using utilities provided by Intel's NxSDK API for Loihi to similarly measure running power.

Reinforcement learning integration
To learn the mapping from the neuromuscular arm's state to the action required to control it, reinforcement learning (RL) is used. We employ the state-of-the-art, model-free, on-policy Proximal Policy Optimization (PPO) algorithm to optimize W_o (73), based on the open-source Stable Baselines 3 implementation (94). When used with feed-forward and LSTM network architectures, the environment state s_n was used as the input state into the artificial neural network in the usual manner. When integrated with the neural reservoir, the current reservoir state u_n^r is used as the input state to the PPO algorithm, with the PPO algorithm then tasked with learning the single-layer, linear mapping W_o from the reservoir state to the optimal muscle action. When a neural reservoir consisting of artificial neurons is used, the reservoir state is directly used as the input into the PPO algorithm, while for the spiking neural reservoir, output spikes are accumulated and counted over each control time step to de-
neurons have opposite tuning curves such that one will positively termine an analogous reservoir state that can be then directly
rate encode increasing values while the other will rate encode used by the PPO algorithm.
decreasing values. The resulting spike trains are multiplied by
input weights Wi and fed into the spiking reservoir. The input Training and evaluation
weights Wi are randomly drawn from a uniform distribution be- All network architectures were trained using the same ap-
tween -1 and 1. The spiking neural reservoir consists of a set of proach. For the 3D tracking case (Figs. 1, 2, 4b,c), the policy
N LIF neurons recurrently connected by a sparse set of weights was trained using 120 parallel environments. The policy was
Wr . The recurrent connection weights have a density of 0.1 and updated every 4800 time steps (1200 seconds at 4 Hz), resulting
are drawn from the normal distribution N (0.0, 0.52 ). LIF neu- in each environment collecting ten seconds of data between up-
ron properties such as refractory period and membrane voltage dates. Each environment was reset every 100 seconds with a new
decay as well as reservoir connection properties were optimized trajectory generated and the arm initiated in a straight configu-
using a CMA evolutionary search algorithm (92) as detailed in ration. Policies were trained for either two or four million time
the SI. For each control time step, the number of spikes over the steps and five separate policies with random initial seeds were
time interval ! is counted for each reservoir neuron. This spike trained for each case. Trained policies were evaluated on new,

11/12
unseen trajectories for up to 100 seconds (400 time steps). Hy-
perparameter tuning for all algorithms was performed as detailed
in the SI. All training and hyperparameter tuning was performed
on the Pittsburgh Supercomputing Center’s Bridges 2 cluster.
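The reservoir construction and linear readout described above can be sketched in a few lines of NumPy. The sizes, the 0.9 target spectral radius, and the tanh update are illustrative assumptions (the paper's actual update rule, Eq. 6, is defined earlier in the text and may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, j = 512, 14, 6  # reservoir, state, and action sizes (illustrative)

# Sparse recurrent weights: N(0, 0.5^2) entries kept with density 0.10.
Wr = rng.normal(0.0, 0.5, (n, n)) * (rng.random((n, n)) < 0.10)

# Rescale by the spectral radius (magnitude of the largest eigenvalue)
# so the result is < 1, which is generally sufficient for the echo state property.
rho = np.max(np.abs(np.linalg.eigvals(Wr)))
Wr *= 0.9 / rho

Wi = rng.uniform(-1.0, 1.0, (n, k))  # input weights
Wo = np.zeros((j, n))                # linear readout, learned by PPO or ridge regression

def reservoir_step(u, s):
    """One reservoir update (a common leaky-tanh form; a stand-in for Eq. 6)."""
    return np.tanh(Wr @ u + Wi @ s)

u = reservoir_step(np.zeros(n), rng.standard_normal(k))
a = Wo @ u  # Eq. 7: action from the linear readout
```

Because eigenvalues scale linearly with the matrix, dividing by the spectral radius and multiplying by 0.9 pins the rescaled reservoir's spectral radius at exactly 0.9.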
For the spiking neural reservoir, only one Kapoho Bay device was available, so NengoLoihi's emulation environment, which emulates Loihi hardware on CPUs, was used for training with the same 120-parallel-environment approach for the 3D tracking case (Fig. 4b,c). For the unstructured obstacle case (Fig. 4e,f), 120 parallel environments were also used (again via the NengoLoihi emulator), with each environment running a five-second episode and being reset after each episode. All training using the Loihi emulation environment was performed on the Pittsburgh Supercomputing Center's Bridges 2 cluster. Trained policies were then evaluated
on Loihi hardware to determine performance and energy usage.
All evaluation and visualization results reported in Fig. 4 and in
the SI videos were run on Loihi hardware.
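The LIF dynamics of Eq. 8 and the spike-count readout used by the spiking reservoir can be sketched with simple Euler integration. The time step and neuron parameters below are illustrative placeholders, not the values optimized by the CMA search:

```python
import numpy as np

def simulate_lif(I, dt=1e-3, tau_m=0.02, R=1.0, v_r=0.0, v_t=1.0, tau_ref=0.002):
    """Euler integration of tau_m * dv/dt = (v_r - v) + R*I, with reset and refractory hold.

    Returns the number of spikes over the presentation window, i.e. the
    spike count that serves as this neuron's contribution to the reservoir state.
    """
    v, refractory, spikes = v_r, 0.0, 0
    for I_t in I:
        if refractory > 0.0:      # membrane held at v_r during the refractory period
            refractory -= dt
            continue
        v += dt / tau_m * ((v_r - v) + R * I_t)
        if v > v_t:               # threshold crossing emits a spike, then reset
            spikes += 1
            v = v_r
            refractory = tau_ref
    return spikes

# Rate coding: a constant suprathreshold input current produces a steady
# spike count over the window tau (here 0.5 s at dt = 1 ms).
count = simulate_lif(np.full(500, 2.0))
```

A suprathreshold current spikes repeatedly, while a subthreshold one (here any constant current below 1.0) never crosses v_t and yields a count of zero, which is what makes the count a rate code of the input magnitude.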

Kinetic and bending energy measurements


To report the kinetic and bending energy measurements in
Figure 2b,c, the average kinetic and bending energy of the arm’s
backbone over an episode of length T was measured as
\bar{e}_i = \frac{\int_0^T e_i(t)\, dt}{\int_0^T dt} \qquad (9)
where e_i ∈ {e_k, e_b}. Here e_k is the kinetic energy of the backbone and e_b is the elastic bending energy of the rod. The kinetic energy
of the rod at each time is
e_k(t) = \frac{1}{2} \int_0^L \rho A\, (\mathbf{v} \cdot \mathbf{v})\, ds \qquad (10)

where ρ is the local density, A is the local cross-sectional area, and v is the local velocity. The bending energy of the rod at each time is

e_b(t) = \frac{1}{2} \int_0^L \boldsymbol{\kappa} \cdot \boldsymbol{\tau}\, ds \qquad (11)
where κ is the local curvature of the rod, τ = Bκ is the local internal elastic torque vector, and B is the rod's bend/twist stiffness
matrix. Arm dynamics from trained policies were recorded for
100 unique episodes (100 seconds each) and the average energy
of each episode was calculated to determine the distribution of
values reported in Fig. 2b.
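On a discretized rod, the time averages of Eqs. 9–11 reduce to simple quadrature. The sketch below uses random arrays as stand-ins for the recorded velocity and curvature fields, and an illustrative diagonal bend/twist stiffness matrix; only the quadrature pattern reflects the text:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_t, n_s, L = 100.0, 400, 50, 0.2  # episode length, time samples, elements, rod length (illustrative)
t = np.linspace(0.0, T, n_t)
ds = L / n_s

rho_A = 1.0                            # rho * A, assumed constant along the rod here
B = np.diag([1e-3, 1e-3, 2e-3])        # illustrative bend/twist stiffness matrix

v = rng.standard_normal((n_t, n_s, 3)) * 1e-2      # velocity at each element over time
kappa = rng.standard_normal((n_t, n_s, 3)) * 1e-1  # curvature at each element over time

# Eq. 10: e_k(t) = 1/2 * integral of rho*A*(v . v) ds
e_k = 0.5 * np.sum(rho_A * np.sum(v * v, axis=-1), axis=-1) * ds
# Eq. 11: e_b(t) = 1/2 * integral of kappa . tau ds, with tau = B*kappa
tau_int = kappa @ B
e_b = 0.5 * np.sum(np.sum(kappa * tau_int, axis=-1), axis=-1) * ds

# Eq. 9: time average over the episode via trapezoidal integration
dt = np.diff(t)
e_k_bar = np.sum(0.5 * (e_k[1:] + e_k[:-1]) * dt) / T
e_b_bar = np.sum(0.5 * (e_b[1:] + e_b[:-1]) * dt) / T
```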

Training of parallel maps


Figure 3 shows the use of parallel maps to increase the infor-
mation extracted from the neural reservoir. To generate these
parallel maps, an RL policy trained to track a moving target was run for 100 seconds on each of 400 different target trajectories, during which the state, action, reservoir state, and environment were recorded. These data were then used as training data (with an 80/20 training/testing split) for parallel maps learned by supervised ridge regression. For target correction (Fig. 3d), the parallel map was evaluated at every time step to predict the target's 3D position one time step in the future, and this prediction was used in place of the target's true position in the state vector as required.
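Fitting a parallel map amounts to ridge regression from recorded reservoir states to a target quantity. A minimal sketch with synthetic stand-in data (the recorded datasets themselves, and the regularization strength, are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_res, n_out = 2000, 128, 3  # recorded steps, reservoir size, target dimension (illustrative)

U = rng.standard_normal((n_samples, n_res))  # recorded reservoir states
W_true = rng.standard_normal((n_res, n_out))
Y = U @ W_true + 0.01 * rng.standard_normal((n_samples, n_out))  # quantity to predict

# 80/20 training/testing split
split = int(0.8 * n_samples)
U_tr, U_te, Y_tr, Y_te = U[:split], U[split:], Y[:split], Y[split:]

# Ridge regression closed form: W = (U^T U + lambda*I)^-1 U^T Y
lam = 1e-2
W_map = np.linalg.solve(U_tr.T @ U_tr + lam * np.eye(n_res), U_tr.T @ Y_tr)

# Evaluate on the held-out 20%
mse = np.mean((U_te @ W_map - Y_te) ** 2)
```

Because the map is linear, many such readouts can be fit independently from the same recorded reservoir states, which is what allows the parallel maps of Fig. 3 to extract additional information without retraining the reservoir or the policy.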
