

WCCI 2010 IEEE World Congress on Computational Intelligence
July 18-23, 2010 - CCIB, Barcelona, Spain (IJCNN)

Training a Spiking Neural Network to Control a 4-DoF Robotic Arm based on Spike Timing-Dependent Plasticity

Alexandros Bouganis and Murray Shanahan

Abstract— In this paper, we present a spiking neural network architecture that autonomously learns to control a 4 degree-of-freedom robotic arm after an initial period of motor babbling. Its aim is to provide the joint commands that will move the end-effector in a desired spatial direction, given the joint configuration of the arm. The spiking neurons have been simulated according to Izhikevich's model, which exhibits biologically realistic behaviour and yet is computationally efficient. The architecture is a feed-forward network where the input layers encode the intended movement direction of the end-effector in spatial coordinates, as well as the information that is given by proprioception about the current joint angles of the arm. The motor commands are determined by decoding the firing patterns in the output layers. Both excitatory and inhibitory synapses connect the input and output layers, and their initial weights are set to random values. The network learns to map input stimuli to motor commands during a phase of repetitive action-perception cycles, in which Spike Timing-Dependent Plasticity (STDP) strengthens synapses between neurons that are correlated and weakens synapses between uncorrelated ones. The trained spiking neural network has been successfully tested on a kinematic model of the arm of an iCub humanoid robot.

Fig. 1. The iCub.


I. INTRODUCTION

In this work, we present a neural network architecture that autonomously learns to control a four degree-of-freedom robotic arm in the three dimensional space. The problem of controlling a robotic arm has attracted the attention of many researchers in the past, with a common assumption being that the kinematic model of the arm is a priori known. This assumption enabled researchers to either introduce analytical methods, which offer exact solutions for simple kinematic chains, or propose solutions based on numerical methods. Recently, there has been increasing interest in developing methods that do not assume any a priori knowledge for the arm's kinematic model, but its kinematic properties are derived through a learning procedure.

Bullock et al. [1] presented the DIRECT (Direction-to-Rotation Effector Control Transform) model, which is a self-organizing neural network that learns, during a motor babbling period, the mapping between joint commands and the resulting spatial displacements of the end-effector. (Motor babbling can be observed in babies, where a repetitive action-perception cycle generates associative information between the various representations.) Action is generated through the Endogenous Random Generator (ERG) [2], which sends random motor commands, and the results of these actions in the spatial domain are perceived and associated. Outstar learning [3] is used in the DIRECT model for training the network and modifying the synaptic weights. More recently, Asuni et al. [4] were inspired by this approach and proposed a neural network, also based on outstar learning, that aims to control a robotic head for gazing at points in the 3D space.

The present work is also influenced by the DIRECT model. An important feature of the proposed network is that it consists of individual spiking neurons that exhibit realistic behaviour and uses a biologically plausible learning mechanism for modifying the synaptic weights, namely Spike Timing-Dependent Plasticity (STDP). Spiking neural networks are considered to be biologically realistic, and many researchers have lately used them for coordinate transformations ([5], [6]), object segmentation [7], visual pattern recognition [8], etc.

The present paper is organized as follows. Section II presents background information on controlling a robotic arm. Section III introduces the spiking neural network, the model of the spiking neuron, and the STDP learning mechanism. Experiments are presented in Section IV, while the conclusions are given in Section V.

Alexandros Bouganis and Murray Shanahan are with the Department of Computing, Imperial College London, UK (email: {alexandros.bouganis, m.shanahan}@imperial.ac.uk).



II. REACHING WITH A ROBOTIC ARM

Many tasks in robotics require the robot's end-effector to move between two points in space. The issue that arises, however, is that while the task is naturally represented in cartesian coordinates (world coordinates), the robot is only able to control its arm through the motors and perform the task in the joint space. The computation of the joint coordinates that result in a desired spatial position of the end-effector is called inverse kinematics, and the problem is ill-posed when considering arms with redundant degrees of freedom since there could be multiple solutions. A simple way to move the end-effector to its target position can be achieved by considering each joint independently, and incrementing its angle accordingly towards its final value at the target configuration of the arm. However, this approach is not competent for most applications as it cannot force the end-effector to follow a desired path between the initial and the target position. A synergy of angle changes at joints is required. This issue can be resolved, though, by taking many intermediate points along the desired path of the end-effector, and using these points as "mid-term targets". The sequence of the joint angles that fix the end-effector at the intermediate points can then be followed, forcing the end-effector to trace the desired path. As discussed in [1], this approach has two main shortcomings. Since there could be multiple joint configurations that result in the same spatial position of the end-effector, it is possible to have discontinuity in the joint space, and encounter, for example, two non-adjacent angles between consecutive steps along the path for the same joint. Moreover, due to the non-linearity of the mapping function between joint angles and spatial position of the end-effector, the linear combination of solutions is probably not a solution itself.

To overcome these issues, we can follow an alternative approach, and instead of computing the joint angles that result in the desired spatial position of the end-effector, we can use small steps towards the target position and compute at each step the joint velocities that move the end-effector in the desired spatial direction. More specifically, let us assume that the manipulator has M degrees of freedom and the joint angles are denoted by ϑ = [ϑ1, ϑ2, ..., ϑM]. Let us also assume that the spatial position of the end-effector is represented by the N-dimensional vector e = [e1, e2, ..., eN]. If we assume that the current joint configuration is given by ϑ, and the joint velocities ϑ̇ result in the end-effector's spatial velocity ė, then:

$$\dot{e} = J(\vartheta) \cdot \dot{\vartheta} \quad (1)$$

$$J(\vartheta) = \begin{bmatrix} \frac{\partial e_1}{\partial \vartheta_1} & \frac{\partial e_1}{\partial \vartheta_2} & \ldots & \frac{\partial e_1}{\partial \vartheta_M} \\ \frac{\partial e_2}{\partial \vartheta_1} & \frac{\partial e_2}{\partial \vartheta_2} & \ldots & \frac{\partial e_2}{\partial \vartheta_M} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial e_N}{\partial \vartheta_1} & \frac{\partial e_N}{\partial \vartheta_2} & \ldots & \frac{\partial e_N}{\partial \vartheta_M} \end{bmatrix} \quad (2)$$

where J denotes the Jacobian and its value depends on the joint configuration of the arm. As can be seen, each column of the Jacobian corresponds to a single joint of the robot and indicates the effect that a small change in its angle has on the spatial position of the end-effector. The change in the spatial position of the end-effector can be computed by the sum of the individual changes in its position by each one degree of freedom. Given that our aim is to compute the joint commands ϑ̇ that will result in moving the end-effector in the desired spatial direction ė, we can use eq.(1) and solve with respect to ϑ̇:

$$\dot{\vartheta} = J^{\dagger}(\vartheta) \cdot \dot{e} \quad (3)$$

$$J^{\dagger}(\vartheta) = \left( J^T J \right)^{-1} J^T \quad (4)$$

where J† and J^T denote respectively the pseudo-inverse and the transpose of the Jacobian at the joint configuration ϑ. Thus, the approach is to (i) decompose the path that the end-effector should follow into small steps, (ii) compute at each step the spatial direction for the next movement, (iii) compute the Jacobian at the current joint configuration (since its value is only valid locally in the joint space), (iv) compute the pseudo-inverse of the Jacobian, and finally, (v) compute the joint commands ϑ̇ according to eq.(3). The linearity of eq.(1) ensures that the linear combination of known solutions will also give a valid solution. Also, the fact that the solution found emerges by computing the small increments in the joint angles ensures the continuity in the joint space along the path.

In this paper, we present a spiking neural network that exhibits the same functionality as the approach described above, and consists of spiking neurons that have been modeled by Izhikevich's equations [10]. This neuron model is chosen as it facilitates efficient simulation of real neurons with biologically realistic behaviours. The network autonomously learns to control a robotic arm with four degrees-of-freedom in 3D space (we are interested only in the position, and not the orientation, of the end-effector) during an initial period of motor babbling, using STDP ([11], [17]) to strengthen or weaken the synaptic weights between sensory and motor neurons, which are originally randomly set. During the period of the action-perception cycle, proprioception stimulates sensory neurons and encodes into the network the current joint configuration. The Endogenous Random Generator (ERG) randomly stimulates motor neurons, and the resulting joint commands move the end-effector in a certain spatial direction which is observed and encoded into the network. The temporal correlation between neuronal firings is extracted by the Spike Timing-Dependent Plasticity mechanism, which modifies the synaptic weights so that if the robotic arm is at the pose just learned and the end-effector should move in the same spatial direction, then the motor neurons that were originally activated by the ERG should now be stimulated by the current they receive from the input neurons. Currently, the spatial position of the end-effector in the training stage is computed by solving forward kinematic equations. Future work will substitute this by including a visual pathway in the network. Further details about the network and the learning mechanism are given in the following section.

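To make the five-step procedure above concrete, the following is a minimal sketch (not taken from the paper) of Jacobian-based velocity control, eqs. (1)-(4): the Jacobian is estimated by finite differences around the current configuration, and its pseudo-inverse maps a desired spatial direction into small joint increments. The `forward_kinematics` argument and the toy two-link planar arm are illustrative assumptions, not the iCub model used by the authors.

```python
import numpy as np

def numerical_jacobian(forward_kinematics, theta, eps=1e-6):
    """Finite-difference estimate of J(theta); one column per joint, as in eq. (2)."""
    e0 = forward_kinematics(theta)
    J = np.zeros((e0.size, theta.size))
    for m in range(theta.size):
        d = np.zeros_like(theta)
        d[m] = eps
        J[:, m] = (forward_kinematics(theta + d) - e0) / eps
    return J

def joint_step(forward_kinematics, theta, e_dot, step=0.01):
    """One small resolved-rate step: theta_dot = J^+(theta) . e_dot, eqs. (3)-(4)."""
    J = numerical_jacobian(forward_kinematics, theta)
    # SVD-based pseudo-inverse; equals (J^T J)^-1 J^T when J has full column rank.
    J_pinv = np.linalg.pinv(J)
    return theta + step * (J_pinv @ e_dot)

# Toy 2-link planar arm with unit link lengths (for illustration only).
def toy_fk(theta):
    x = np.cos(theta[0]) + np.cos(theta[0] + theta[1])
    y = np.sin(theta[0]) + np.sin(theta[0] + theta[1])
    return np.array([x, y])

theta = np.array([0.3, 0.5])
for _ in range(10):   # nudge the end-effector roughly along +y
    theta = joint_step(toy_fk, theta, e_dot=np.array([0.0, 1.0]))
```

The spiking network described in the next section learns this configuration-dependent mapping from experience rather than computing J and its pseudo-inverse explicitly.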
III. THE SPIKING NEURAL NETWORK

The proposed neural network consists of spiking neurons which are organized into seven input layers and four output layers, as shown in Figure 2.¹ We will denote for the rest of the paper the i-th input layer and the j-th output layer by L^input_i and L^output_j, respectively. We use a population of 1200 neurons for each input layer, and a population of 800 neurons for each output layer. Four of the input layers L^input_{i=1:4} encode the information that is given by proprioception, and the firing pattern at each one of them indicates the angle at the respective joint. The four joints of interest are located at the shoulder (roll, pitch and yaw) and the elbow of the arm, with their ranges being [−75°, −15°], [15°, 75°], [−10°, 50°] and [15°, 75°]. The network encodes these angles after discretizing them into bins with 5° resolution. The remaining three input layers L^input_{i=5:7} represent the spatial direction that the end-effector should move at the next time step, with each layer encoding the projection of the 3D directional vector to one of the world axes. The directions encoded are also discretized using a 45° resolution, resulting in 26 possible movements of the end-effector from its original position. The input layers are connected all-to-all with the output layers, with the firing pattern of each output layer representing the motor command that is provided to a single joint. Each neuron in the input layers is connected with an excitatory and an inhibitory synapse to each output neuron. This is equivalent to representing a single input neuron by a pair of two highly correlated neurons, one excitatory and one inhibitory. All synapses are plastic, which means that their original random weights are modified during the motor babbling period under STDP.

Fig. 2. Architecture of the feed-forward network. The network includes seven input layers, where four of them encode information given by proprioception and the remaining three encode the spatial direction of the end-effector. The input layers are connected all-to-all with plastic synapses to four output layers which represent the motor commands to the joints. A bell-shaped distribution models the mean firing rate in the encoding scheme.

A common assumption made is that neuronal activity patterns represent a single value per variable at any given time [12]. Biological evidence also supports that the activity level of a population of neurons is characterized by tuning curves, typically bell-shaped, which describe the mean firing rates of neurons based on the value of the represented variable. The peak of that curve indicates the "central neuron" which exhibits the highest sensitivity for a given value of the variable [13]. In this work, a Gaussian distribution is used to model the tuning curves in the neuronal layers. Let us assume that a population of n neurons represents a variable θ, whose domain is [0, 1] after normalization. If f : [0, 1] → [1, n] indicates the neuron that exhibits the highest activity for a specific value of the variable, then the distribution of the firing rate is given by:

$$F(x) = F_{max} \cdot e^{-\frac{(x - f(\theta_0))^2}{2\sigma^2}} \quad (5)$$

where σ denotes the standard deviation, F_max is the maximum firing rate, and F(x) expresses the firing rate of neuron x when the normalized value θ_0 is encoded. While the process of encoding the value of a variable into spike trains is important for the input layers, as well as for the output layers in the training phase, the reverse process of decoding firing patterns from the output layers to motor commands is important in the "performance period". A voting scheme is adopted [12], according to which the "central neuron" is given by:

$$r = \frac{\sum_{x=1}^{n} F(x)\,x}{\sum_{x=1}^{n} F(x)} \quad (6)$$

while the normalized motor command is given by f^{-1}(r).

¹ In this paper, we use the term "layer" when referring to a group of input/output neurons.
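As a concrete illustration of this encoding and decoding, the sketch below implements the Gaussian tuning curve of eq. (5) and the population-vector average of eq. (6). The linear choice of f and the values of F_max and σ are assumptions made for illustration (the paper does not fix them here), and the multi-bin variant with several "central neurons" introduced later in this section is not shown.

```python
import numpy as np

N = 1200          # neurons per input layer (as stated in the paper)
F_MAX = 100.0     # assumed peak firing rate (Hz)
SIGMA = 20.0      # assumed tuning-curve width (in neuron indices)

def f(value, n=N):
    """Assumed linear mapping f: [0, 1] -> [1, n] giving the 'central neuron'."""
    return 1.0 + value * (n - 1)

def f_inv(r, n=N):
    """Inverse mapping from a (fractional) neuron index back to a normalized value."""
    return (r - 1.0) / (n - 1)

def encode(value, n=N):
    """Eq. (5): Gaussian bump of mean firing rates centred on f(value)."""
    x = np.arange(1, n + 1, dtype=float)
    return F_MAX * np.exp(-((x - f(value, n)) ** 2) / (2.0 * SIGMA ** 2))

def decode(rates):
    """Eq. (6): population-vector (centre-of-mass) estimate, mapped back via f^-1."""
    x = np.arange(1, rates.size + 1, dtype=float)
    r = np.sum(rates * x) / np.sum(rates)
    return f_inv(r, rates.size)

value = 0.37
recovered = decode(encode(value))   # close to 0.37, up to edge and discretization effects
```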
We should now discuss two issues that arise due to the representation scheme adopted and the nature of the task. As has been described, we have seven independent layers of neurons which represent the joint configuration of the arm and the intended spatial direction of movement of the end-effector, as well as four layers of motor neurons that control the four joints. In the task of controlling the robotic arm, it is highly likely to encounter two joint configurations ϑ1 and ϑ2 that differ only in the angle of a single joint, and the movement of the end-effector in the same spatial direction at the two configurations requires two different sets of motor commands. In terms of the spiking neural network, this entails that the stimulation in the input layers of two neuronal populations, which largely overlap and differ only by a small subset of firing neurons, should be able to result in the activation of two different sets of motor neurons. This is autonomously accomplished by the proposed network, which includes all-to-all connections between the input and the output neurons, and modifies the initial synaptic weights through the STDP learning mechanism. STDP manages to strengthen (weaken) the inhibitory (respectively, excitatory) synapses between uncorrelated sensory and motor neurons, and have the opposite effect on synapses between correlated neurons. A balance is also important between the maximum allowable inhibitory and excitatory synaptic weights with respect to the minimum number of firing neurons that can differ between two potential input patterns.

To see the second issue that arises, let us assume that during the motor babbling period, the network has to learn that when the arm lies on the joint configuration ϑ1, the motor command vectors ϑ̇1 and ϑ̇2 move the end-effector in the spatial directions ė1 and ė2, respectively. This means that the synaptic weights should adapt so that the simultaneous stimulation of L^input_{i=1:4}(ϑ1) and L^input_{i=5:7}(ė1) will result in the activation of L^output_{i=1:4}(ϑ̇1).² Similarly, the simultaneous stimulation of L^input_{i=1:4}(ϑ1) and L^input_{i=5:7}(ė2) should result in the activation of L^output_{i=1:4}(ϑ̇2). While this can be learned through STDP, a conflict is encountered if the network is subsequently called to learn that, when the arm rests in the new joint configuration ϑ2, the motor command vector ϑ̇1 moves the end-effector in the spatial direction ė2. That is, the simultaneous stimulation of L^input_{i=1:4}(ϑ2) and L^input_{i=5:7}(ė2) should activate the neurons in L^output_{i=1:4}(ϑ̇1). If the synaptic weights in the network are modified to incorporate the last pattern, it is clear that the stimulation of L^input_{i=1:4}(ϑ1) and L^input_{i=5:7}(ė2) would erroneously result in the activation of both neuron sets L^output_{i=1:4}(ϑ̇1) and L^output_{i=1:4}(ϑ̇2), while it should only activate L^output_{i=1:4}(ϑ̇2), as given by the second training pattern. The output firing pattern L^output_{i=1:4}(ϑ̇1) is erroneously activated because the sets of firing neurons L^input_{i=1:4}(ϑ1) and L^input_{i=5:7}(ė2) are individually shown to be good predictors for L^output_{i=1:4}(ϑ̇1), according to the first and third pattern under learning. To address this issue, we modify the population vector scheme and use many "bins" of neurons to represent a single value of a variable, which means many possible "central neurons". In this way, even when firing patterns have the above characteristic, the erroneous firing in the output layers can be avoided when at least one of the four central neurons representing L^input_{i=1:4}(ϑ1) is different in the first and second pattern, or L^input_{i=5:7}(ė2) in the second and third pattern.

The aforementioned issues would not have been encountered if we were following an alternative approach to representing the input patterns. As has been discussed, we use N independent neuronal layers to represent N variables, with the population of neurons in each layer encoding the value of a single variable. An alternative representation scheme that would not cause the issues discussed above would be to use a single N-dimensional array of neurons, where each instance of input pattern (i.e., N-tuple) would be represented by stimulating a unique set of neurons. This representation however has the important drawback of poor scalability, since the population of neurons required increases exponentially with the number of variables represented. In particular, even if we had just 10 neurons representing a single variable, then this scheme would necessitate the use of 100000 neurons for 5 variables, and ten times this number if we were adding just a single variable. It is thus evident that such a representation can only be considered when the number of variables is small, and is not suitable for our task.

Fig. 3. A diagram of the system during the training and the performance period.

² L_i(θ_0) denotes the set of neurons in the i-th layer which represent the value θ_0.

A. Neuron Model

Many models have been proposed in the literature in an attempt to simulate the behaviour of real neurons. An influential model was proposed by Hodgkin and Huxley [9], who translated their experimental observations on the giant axon of the squid into a set of nonlinear ordinary differential equations. While their model is considered to be biophysically accurate, its simulation is computationally expensive. An alternative model is based on integrate-and-fire neurons, which carry much less computational burden. The shortcoming of this model however is its inability to reproduce the rich dynamics exhibited by cortical neurons. In this work, we simulate the individual neurons according to Izhikevich's "simple model" [10]. This model preserves the biologically realistic behaviour exhibited by the Hodgkin-Huxley model, and at the same time is as computationally efficient as the integrate-and-fire model. The low computational cost is especially important when it comes to simulating large networks. The efficiency of the model relies on the fact that it uses only two equations and has only one non-linear term. In particular, the equations describing the model are given by:

$$\dot{v} = 0.04v^2 + 5v + 140 - u + I \quad (7)$$

$$\dot{u} = a(bv - u) \quad (8)$$
with after-spike resetting:

$$\text{if } v \geq 30 \text{ mV, then} \quad \begin{cases} v \leftarrow c \\ u \leftarrow u + d \end{cases} \quad (9)$$

where v denotes the neuron's membrane potential, I is the input current, and u is the variable that determines the recovery period of the neuron after spiking. The membrane recovery variable u provides negative feedback to the membrane potential v, and emulates the activation of K⁺ ionic currents and the inactivation of Na⁺ ionic currents. There are four parameters that alter the behaviour of a neuron, namely a, b, c and d, that can be set to accurately simulate a large variety of types of neurons. In particular, a determines the time scale of the recovery period, b represents the coupling between u and v, while c and d denote the after-spike reset values for v and u, respectively. Characteristic values of these parameters are a = 0.02, b = 0.2, c = −65 mV and d = 2. According to this model, a spike is produced when the membrane potential of a neuron reaches the threshold of 30 mV.

The simulation is run in discrete time, setting the time step to 1 msec. At each simulation step t, the incoming current I(t) for a neuron i is updated based on the activity of a set of presynaptic neurons Q which are connected to neuron i with a conductance delay δ and fired at the time step t − δ. In particular,

$$I(t) = I_b + \sum_{j \in Q} S_{i,j}\, F \quad (10)$$

where I_b is the base current, S_{i,j} is the synaptic weight from neuron j to neuron i, and F is a scaling factor. In our simulation, F = 0.2 and δ = 1 msec.

B. Learning Mechanism: Spike Timing-Dependent Plasticity

Spike Timing-Dependent Plasticity is regarded as a biologically plausible learning mechanism that modifies the synaptic strength between real neurons. Its exact form varies between different types of synapses, and many models have been proposed ([11], [14]–[16]). Common differences between the various versions of STDP are the amount of change in weights, the dependence or not of the weight update on the current synaptic weight, and the time windows that are examined before and after the spike of the postsynaptic neuron. In this work, we use the symmetric version of STDP [17], which has been found in [6] to be robust to the temporal structure of the input patterns. In this version of STDP, the decision of whether a synapse should be potentiated or depressed does not depend on the temporal order of the events (arrival of the presynaptic spike at the postsynaptic neuron before/after the firing of the postsynaptic neuron), but instead on their absolute time difference |t_post − t_pre|. That is, synapses are potentiated when they deliver spikes slightly before or after the firing of the postsynaptic neuron, while they get depressed when the time lag is greater (see Figure 4). This rule is described by:

$$\Delta S_{i,j} = A_{sym}\left(1 - \left(\frac{\Delta t}{\tau_a}\right)^2\right) e^{-\frac{|\Delta t|}{\tau_b}} \quad (11)$$

where ΔS_{i,j} denotes the weight update between the presynaptic neuron j and postsynaptic neuron i, A_sym is a coefficient that controls the magnitude of the synaptic change, τ_a determines the time window in which the incremental change of weights is positive, the ratio τ_a/τ_b controls the balance between potentiation and depression, and Δt = t_post − t_pre. In the present work, we set A_sym = 0.05, τ_a = 10 msec, and τ_b = 10 msec. The time window between presynaptic and postsynaptic neurons that are included in STDP is set to 50 msec.

Fig. 4. Symmetrical STDP. The dotted line illustrates the symmetrical STDP rule for τa = 10 msec and τb = 12 msec, the dashed line for τa = 15 msec and τb = 12 msec, and the solid line for τa = 10 msec and τb = 10 msec. As can be observed, the solid line and the dotted line have the same positive time window (i.e., the same τa), while the dashed line has the smallest depression area.
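The following sketch (an illustration, not the authors' implementation) puts the pieces of this section together: Euler integration of the Izhikevich equations (7)-(9) with a 1 msec time step, the summed input current of eq. (10), and the symmetric STDP rule of eq. (11), using the parameter values quoted above. The tiny all-to-all wiring, the base current value, the Poisson-like input spikes, and the weight clipping are simplifying assumptions.

```python
import numpy as np

A, B, C, D = 0.02, 0.2, -65.0, 2.0        # Izhikevich parameters from the text
F_SCALE, DELAY = 0.2, 1                   # eq. (10): scaling factor and delay (msec)
A_SYM, TAU_A, TAU_B = 0.05, 10.0, 10.0    # eq. (11) parameters
STDP_WINDOW = 50                          # msec

def symmetric_stdp(dt):
    """Eq. (11): potentiation for small |dt|, depression for larger lags."""
    return A_SYM * (1.0 - (dt / TAU_A) ** 2) * np.exp(-np.abs(dt) / TAU_B)

n_pre, n_post = 8, 4                      # toy sizes, not the paper's 1200/800 layers
S = np.random.uniform(0.0, 1.0, (n_post, n_pre))   # random initial weights
v = np.full(n_post, C)                              # membrane potentials
u = B * v                                           # recovery variables
last_pre = np.full(n_pre, -np.inf)                  # last presynaptic spike times
I_BASE = 4.0                                        # assumed base current I_b

for t in range(1, 1001):                            # 1 msec steps
    pre_spiked = np.random.rand(n_pre) < 0.05       # stand-in input spike trains
    last_pre[pre_spiked] = t

    # Eq. (10): current from presynaptic spikes arriving after the conductance delay.
    arrived = last_pre == (t - DELAY)
    I = I_BASE + F_SCALE * S[:, arrived].sum(axis=1)

    # Eqs. (7)-(8): two 0.5 msec half-steps for v (for numerical stability), one for u.
    for _ in range(2):
        v += 0.5 * (0.04 * v ** 2 + 5 * v + 140 - u + I)
    u += A * (B * v - u)

    fired = v >= 30.0                                # eq. (9): spike and reset
    v[fired] = C
    u[fired] += D

    # Symmetric STDP on synapses onto neurons that just fired, within the 50 msec window.
    for i in np.where(fired)[0]:
        dts = t - last_pre                           # t_post - t_pre per synapse
        in_window = np.abs(dts) <= STDP_WINDOW
        S[i, in_window] += symmetric_stdp(dts[in_window])
    S = np.clip(S, 0.0, 1.0)                         # keep weights bounded (assumption)
```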
IV. EXPERIMENTS

The proposed spiking neural network has been tested on controlling the kinematic model of the arm of a humanoid robot, called iCub. This robot has been developed as part of the RobotCub project³ and has been designed to resemble the size of a 3.5 year old child. It has 53 degrees of freedom, weighs around 22 kg, and is approximately 1 m tall. Each arm has seven DoFs, with three of them located on the shoulder, one DoF located on the elbow and three DoFs on the wrist. In the present work, we control the four upper DoFs of the iCub's arm, but the extension of the proposed spiking neural network to control all of the seven DoFs is straightforward.

During the training stage, a set of configurations of the arm are selected as "home" positions. The Endogenous Random Generator sends random motor commands, in the range of [−5°, 5°] at each joint, and their effect on the spatial position of the end-effector is computed based on forward kinematic equations, according to the Denavit-Hartenberg parameters of the iCub's arm [18]. Each iteration includes four movements of the arm, and for each movement, the neurosimulation encodes the joint positions, the spatial direction of movement and the motor commands with a firing stimulus of 20 msec in the respective neuronal layers. An interval of 50 msec is set between the firing patterns that represent two consecutive movements. In the simulation, a background noise of 3 Hz is used in both input and output layers in order to weaken the synaptic weights between uncorrelated neurons.

³ http://www.robotcub.org

Fig. 5. The connectivity matrix between a set of input and motor neurons (see text). The matrix is shown for the iterations 0, 40, 100, 200, 400 and 700. Darker color indicates lower synaptic weights. As can be seen, the synaptic weights start from random values before training, and their values change to accommodate the correlation between input and output patterns. In the specific example, two different central neurons have been assigned to represent the value 0 in the fifth input layer that represents the spatial direction x, as has been explained in the text.

The connectivity matrix for a subset of the synaptic weights between input and motor neurons during the training stage is shown in Figure 5, where darker colors represent weaker synapses. The set of input neurons included in the connectivity matrix shown represent the arm configurations ϑ1 = [−50°, 23°, 0°, 42°] and ϑ2 = [−30°, 53°, 30°, 22°], as well as the spatial directions ė1 = [0, 0.7, 0.7] and ė2 = [0, −0.7, −0.7]. Similarly, the set of motor neurons included represent the motor commands ϑ̇1 = [−1°, 5°, −4°, 5°] and ϑ̇2 = [3°, −4°, −3°, 2°]. We have selected this set of input and motor neurons for illustration, as the motor commands ϑ̇1 move the arm from the configuration ϑ1 in the direction ė1, and the motor commands ϑ̇2 move the arm from the pose ϑ2 in the direction ė2. Recall that we have 7 input layers and 4 output layers, and these layers are separated in the connectivity matrix with thick yellow lines. As can be seen in the upper-left corner of Figure 5, the synaptic weights are initially set to random values. As the training takes place, we can observe that progressively the synaptic weights between the correlated neurons (ϑ1, ė1) and ϑ̇1 increase, while the synaptic weights between the uncorrelated (ϑ1, ė1) and ϑ̇2 decrease. At the 700th iteration, we can see that the correlation between (ϑ2, ė2) and ϑ̇2 has also been established.
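A high-level sketch of one motor-babbling iteration, as described above and in the Fig. 3 training diagram, might look as follows. The `network` object with its `stimulate` and `run_with_stdp` methods, and the `forward_kinematics` function, are hypothetical placeholders for the neurosimulation of Section III and for the iCub forward kinematics [18]; only the timings and command ranges are taken from the text.

```python
import numpy as np

def babbling_iteration(network, theta_home, forward_kinematics,
                       n_movements=4, stim_ms=20, gap_ms=50):
    """One training iteration: four ERG-driven movements, each encoded for 20 msec."""
    theta = np.array(theta_home, dtype=float)
    for _ in range(n_movements):
        # ERG: random motor command in [-5, 5] degrees for each of the 4 joints.
        theta_dot = np.random.uniform(-5.0, 5.0, size=4)

        # Observe the resulting spatial direction via forward kinematics.
        e_before = forward_kinematics(theta)
        e_after = forward_kinematics(theta + theta_dot)
        direction = e_after - e_before
        direction = direction / (np.linalg.norm(direction) + 1e-12)

        # Encode joint angles, spatial direction and motor command as firing stimuli,
        # let STDP associate them, then leave a 50 msec gap before the next movement.
        network.stimulate(joints=theta, direction=direction,
                          motor=theta_dot, duration_ms=stim_ms)
        network.run_with_stdp(duration_ms=stim_ms + gap_ms)

        theta = theta + theta_dot
    return theta
```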
After training, the network is able to generate those motor commands which, for a given arm configuration, move the end-effector in the desired spatial direction. In particular, we consider three arm poses, which have also been used as "home" positions during the training stage, and set, for each pose, five desired spatial directions. The resulting raster plot of the output layers is shown in Figure 6, which is decoded into motor commands in order to move the end-effector in the desired spatial directions. It is worth noting that the same spatial direction in the three arm poses requires different motor commands, and thus necessitates the activation of different output neurons. This is achieved in the proposed network, even though there is a partial overlapping in the input stimuli due to the fact that the input layers L^input_{i=5:7} encode the same spatial direction. The desired directions and the actual directions of movement produced by the decoded motor commands can be seen in Figure 7, where the mean difference is 7°. The error can be decreased by using smaller discretization bins for the spatial direction during the training stage (currently, the bin size is 45°). Finally, Figure 8 illustrates the joint angles of the arm that the spiking neural network provides when aiming to move the end-effector in a certain spatial direction (i.e., [0 0.7 0.7]) for consecutive steps.

Fig. 6. Raster plot of the output layers. We consider three arm configurations, and for each configuration, we set five desired spatial directions. The time windows [0, 350], [350, 670], and [670, 1000] refer to the three arm poses. Note that the first desired direction in each pose is the same, but different motor commands are required. The arm configurations considered are: [−50°, 23°, 0°, 42°], [−30°, 53°, 30°, 22°], [−20°, 65°, 40°, 60°], while the spatial directions given are: [0 0.7 0.7; 0 1 0; 0 0 1; 0 -0.7 -0.7; 0 -1 0] (for the first arm pose), [0 0.7 0.7; 0.7 0 -0.7; 0.7 0.7 0; -0.7 -0.7 0; 0 -0.7 -0.7] (for the second arm pose), and [0 0.7 0.7; 0 1 0; 0.5 0.5 -0.7; 0 -0.7 -0.7; 0 -1 0] (for the third arm pose).

V. CONCLUSION

In this paper, we have presented a spiking neural network that autonomously learns to control a four degree-of-freedom robotic arm in three dimensional space. The neural network consists of approximately 12000 Izhikevich neurons, and has a feed-forward architecture. The input layers of the network encode the joint positions of the arm and the desired spatial direction of the end-effector, and the output layers represent the corresponding motor commands. We have chosen this architecture due to the fact that the set of motor commands that drive the end-effector in a certain spatial direction is only valid in a local region of the joint space, and thus, the input firing pattern should encode the desired spatial direction in conjunction with the current joint configuration of the arm. The training takes place during a motor babbling
period, and Spike Timing-Dependent Plasticity has been used as the learning mechanism. We have shown that this mechanism is able to temporally associate the input and output patterns, modify the synaptic weights accordingly, and train the network to perform the mapping from spatial commands to joint commands. An important feature of the proposed network is its scalability with respect to the number of degrees of freedom, as the population of neurons required increases linearly with the number of joints. The current implementation of the network is computationally costly (several seconds of processing time are required per arm movement). In future work, we aim to implement the proposed network using our GPU architecture [19] in order to achieve real-time performance.

Fig. 7. The desired directions and the actual directions of movement produced by the decoded motor commands.

Fig. 8. The arm initially rests on the joint configuration [−50°, 23°, 0°, 42°]. The figure depicts the joint angles as they have been updated by the neural network when moving the end-effector towards the spatial direction [0 0.7 0.7]. Symbols '*', 'o', and two further markers correspond to the first, second, third and fourth joint, respectively.

ACKNOWLEDGMENT

This work was supported by an EPSRC Grant under Project Code EP/F033516/1.

REFERENCES

[1] D. Bullock, S. Grossberg and P.H. Guenther, "A Self-Organizing Neural Model of Motor Equivalent Reaching and Tool Use by a Multijoint Arm", Journal of Cognitive Neuroscience, vol. 5, no. 4, pp. 408-435, 1993
[2] P. Gaudiano and S. Grossberg, "Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories", Neural Networks, vol. 4, no. 2, pp. 147-183, 1991
[3] S. Grossberg, "Some nonlinear networks capable of learning a spatial pattern of arbitrary complexity", Proceedings of the National Academy of Sciences of the United States of America, vol. 59, no. 2, pp. 368-372, 1968
[4] G. Asuni, G. Teti, C. Laschi, E. Guglielmelli and P. Dario, "A Robotic Head Neuro-controller Based on Biologically-Inspired Neural Models", Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2362-2367, 2005
[5] Q. Wu, T.M. McGinnity, L. Maguire, A. Belatreche and B. Glackin, "2D co-ordinate transformation based on a spike timing-dependent plasticity learning mechanism", Neural Networks, vol. 21, no. 9, pp. 1318-1327, 2008
[6] A.P. Davison and Y. Frégnac, "Learning Cross-Modal Spatial Transformations through Spike Timing-Dependent Plasticity", Journal of Neuroscience, vol. 26, no. 21, pp. 5604-5615, 2006
[7] R. Borisyuk, Y. Kazanovich, D. Chika, V. Tikhanoff and A. Cangelosi, "A neural model of selective attention and object segmentation in the visual scene: An approach based on partial synchronization and star-like architecture of connections", Neural Networks, vol. 22, no. 5-6, pp. 707-719, 2009
[8] S.G. Wysoski, L. Benuskova and N. Kasabov, "Fast and adaptive network of spiking neurons for multi-view visual pattern recognition", Neurocomputing, vol. 71, pp. 2563-2575, 2008
[9] A.L. Hodgkin and A.F. Huxley, "A quantitative description of membrane current and its application to conduction and excitation in nerve", Journal of Physiology, vol. 117, no. 4, pp. 500-544, 1952
[10] E.M. Izhikevich, "Simple model of spiking neurons", IEEE Transactions on Neural Networks, vol. 14, pp. 1569-1572, 2003
[11] S. Song and L.F. Abbott, "Cortical development and remapping through spike timing-dependent plasticity", Neuron, vol. 32, pp. 339-350, 2001
[12] A. Pouget and P.E. Latham, "Population codes", The Handbook of Brain Theory and Neural Networks, 2nd edition, M. Arbib (ed.), MIT Press, 2003
[13] A.P. Georgopoulos, A.B. Schwartz and R.E. Kettner, "Neuronal population coding of movement direction", Science, vol. 233, no. 4771, pp. 1416-1419, 1986
[14] S. Song, K.D. Miller and L.F. Abbott, "Competitive Hebbian learning through spike-timing-dependent synaptic plasticity", Nature Neuroscience, vol. 3, pp. 919-926, 2000
[15] G. Bi and M. Poo, "Synaptic Modifications in Cultured Hippocampal Neurons: Dependence on Spike Timing, Synaptic Strength, and Postsynaptic Cell Type", Journal of Neuroscience, vol. 18, no. 24, pp. 10464-10472, 1998
[16] D.E. Feldman, "Timing-based LTP and LTD at vertical inputs to layer II/III pyramidal cells in rat barrel cortex", Neuron, vol. 27, pp. 45-56, 2000
[17] M.A. Woodin, K. Ganguly and M. Poo, "Coincident Pre- and Postsynaptic Activity Modifies GABAergic Synapses by Postsynaptic Changes in Cl− Transporter Activity", Neuron, vol. 39, pp. 807-820, 2003
[18] B. Siciliano and O. Khatib (eds.), "Springer Handbook of Robotics", Springer-Verlag, 2008
[19] A. Fidjeland, E.B. Roesch, M.P. Shanahan and W. Luk, "NeMo: A platform for neural modelling of spiking neurons using GPUs", 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009
