
Adaptive Behavior

Robot Learning Driven by Emotions

Sandra Clara Gadanho, John Hallam


University of Edinburgh, Department of Artificial Intelligence

The adaptive value of emotions in nature indicates that they might also be useful in artificial creatures. Experiments were carried out to investigate this hypothesis in a simulated learning robot. For this purpose, a non-symbolic emotion model was developed that takes the form of a recurrent artificial neural network where emotions both depend on and influence the perception of the state of the world. This emotion model was integrated in a reinforcement-learning architecture with three different roles: influencing perception, providing reinforcement value, and determining when to reevaluate decisions. Experiments to test and compare this emotion-dependent architecture with a more conventional architecture were done in the context of a solitary learning robot performing a survival task. This research led to the conclusion that artificial emotions are a useful construct to have in the domain of behavior-based autonomous agents with multiple goals and faced with an unstructured environment, because they provide a unifying way to tackle different issues of control, analogous to natural systems' emotions.

Keywords: robotics, emotions, reinforcement learning

1 Introduction

Recent research provides considerable support for the claim that emotions are essential to human reasoning. This suggests that something analogous to emotions might play an important role in artificial intelligence creatures. This is the topic of the present research, which employed artificial emotions in the control of a solitary agent that adapts to its environment using reinforcement learning (RL) techniques.

A view shared by many researchers in the emergent field of emotional agents is that it is the functional aspect of emotions in cognition that should be taken into account when modeling emotional agents (Cañamero, 1998; Frijda & Swagerman, 1987) and not the replication of the experience of human emotions as reported by the individuals' subjective cognitive observations. This view was adopted for the current work, which consisted of the test of several functional roles of emotions under an animat philosophy (Wilson, 1991), that is, by building a complete agent where emotions form an integral part of the whole. Furthermore, it was considered important not only to develop a fully functional agent that successfully performs the task that it is devised for, but also to demonstrate that the introduced mechanisms are advantageous when compared with more traditional mechanisms. This implies an engineering approach to modeling emotions (Wehrle, 1998): The primary objective pursued was to enhance the performance of the robot and not to improve knowledge about the nature of emotions themselves. However, the effective use of emotions might, we hoped, contribute some clues to their understanding.

The experiments reported here compared the performance of emotional and nonemotional robots in a survival task that consists of maintaining adequate energy levels in a simulated environment with obstacles

Correspondence to: S.C. Gadanho, Instituto de Sistemas e Robotica, Instituto Superior Tecnico, Torre Norte 6.10, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal. E-mail: [email protected]
Copyright 2002 International Society for Adaptive Behavior (2002), Vol 9(1): 42-64. [1059-7123 (200101) 9:1; 42-64; 020582]


Downloaded from adb.sagepub.com at CMU Libraries - library.cmich.edu on December 19, 2015



and energy sources. To master this task, the agents used reinforcement learning techniques (see, e.g., Kaelbling, Littman, & Moore, 1996; Sutton & Barto, 1998, for surveys on RL) to learn the coordination of simple hardwired behaviors, which is quite a difficult problem and extensively hardwired in other robot applications.

To test the utility of emotions in the adaptation of the robot to its environment, an emotion model was designed and integrated into a reinforcement learning framework for robot control. This is a simple model based on a recurrent network, where perception and emotions influence each other. Through this mutual influence some persistence of emotional state is achieved, while maintaining reactiveness to new perceptual states. The model endows the agent with emotional states that are coherent with its contextual interaction. In the design of the model, it was considered more important to have simplified emotions that could be afforded by the robot-environment interaction than to equip the robot with humanlike emotions (Cañamero, 1998). Nevertheless, to make the text more concise and easier to follow, language is used that might implicitly attribute human emotions to the robot.

Apart from the influence on perception imposed by the model used, emotions were used in the reinforcement learning framework to fulfill the following roles: reinforcement specification and detection of significant events. So, three possible forms of emotional influence were examined: perception, reinforcement, and control triggering.

1.1 Perception

Many emotion theorists agree that emotions are most helpful for focusing attention on the relevant features of the problem at hand and, in particular, for determining the salience of the perceptual information (Cytowic, 1993). Emotions not only help discriminate the perceptual information congruent with the emotional state (Niedenthal, Setterlund, & Jones, 1994) but also direct attention toward the perceptual information that might be useful to cope with the emotional state (Derryberry & Tucker, 1994).

In the experiments, the perception of state within the learning algorithm is subject to the influence of emotions. The perceived state is changed to be more compatible with the current emotions, thus making the relevant features of the environment more salient because those are usually the ones associated with the current emotional state. In nature, emotionally charged objects are also made more salient in perception, but this was not modeled in the experiments.

1.2 Reinforcement

Emotions are usually associated with either pleasant or unpleasant feelings that can act as reinforcement. This allows emotions to motivate the agent to approach or avoid certain emotional scenarios. It is often assumed that human decision making consists of the maximization of positive emotions and minimization of negative emotions (e.g., Tomkins, 1984). Mowrer (1960) proposed that during learning, stimuli are primarily associated with emotions that then drive the behavior associations. Providing a source of context evaluation, or reinforcement, is the most usual role attributed to emotions in the functionality of an artificial learning agent (e.g., Albus, 1990; Bozinovski, 1982; Kitano, 1995; Wright, 1996). This role of providing reinforcement was also tested in the experiments reported here.

1.3 Control Triggering

One of the most difficult problems faced when employing reinforcement learning techniques in robotics applications is to determine when a discrete state transition occurs. This is not an issue in traditional reinforcement learning problems, because these usually consist of grid worlds where a state transition consists of a single discrete action that moves the agent to one of the cells in the neighborhood of the cell where the agent is located. In a continuous world the determination of this transition is not clear and it is usually implementation dependent. This transition can be triggered by some internal or external event and must be identified by the designer, because it determines when the controller needs to reevaluate its previous decision and make a new one. Experiments have been done to test whether emotions can successfully fulfill the role of determining state transitions by triggering the adaptive controller whenever there is a significant change in the emotional state.

Emotions are frequently pointed to as a source of interruption of behavior (Simon, 1967; Sloman & Croucher, 1981) in the domain of more traditional symbolic artificial intelligence architectures. In general, it is considered that behavior should be


interrupted and eventually replaced whenever a strong emotion is felt. In the current work, it is considered that if the emotional intensity falls, then behavior should also be changed, because the crisis that gave rise to the emotion has probably been solved. So state transition is triggered not only by sudden rises of emotional intensity but also by abrupt drops. Implicit in this approach is the fact that the emotion model being used is continuous and so does not provide a clear-cut onset or termination of emotions, requiring that abrupt changes be detected instead.

In essence, the reported research work shows how emotions can influence control in multiple ways. Although the emotions used are much simplified and quite different from the human counterparts, they try to capture several functional aspects of emotions and were an integral part of a complete agent. The next section presents reasons why emotions can be advantageous in artificial creatures and describes the model of artificial emotions developed. Section 3 gives a detailed description of the experiments made to test the different emotion roles presented above. Finally, the conclusions reached are presented in Section 4.

2 Emotions in Robots

2.1 Emotions' Interaction with Cognition

In their quest for true intelligence, people usually adopt a Cartesian approach that regards emotions as a hindrance carried over from their early evolutionary development, at odds with their aspiration to high rationality. Psychologists, too, have tended to concur with this popular view of emotions as useless or even disruptive to rationality (Toda, 1993). However, the view that emotions are an integral part of rational behavior is receiving increasing support from brain research studies (Cytowic, 1993; Damásio, 1994; LeDoux, 1998).

Studies show that human decisions are not always logical (Grossberg & Gutowski, 1987). Pure logic is not enough and shows serious faults when used to model human intelligence in artificial intelligence systems (Dreyfus, 1992). Furthermore, emotions have been suggested in the field of artificial intelligence as the ultimate source of intelligence that might provide robots with the autonomy they need (Toda, 1994). Doubts have even been posed on whether machines can exhibit intelligent behavior without emotions (Charland, 1995; Minsky, 1986).

Emotions influence cognition in terms of different elementary mechanisms like perception (Cytowic, 1993; Derryberry & Tucker, 1994; Niedenthal, Setterlund, & Jones, 1994), memory (Blaney, 1986; Bower, 1981; LeDoux, 1998), attention (De Sousa, 1987; Frijda & Swagerman, 1987; LeDoux, 1998; Ortony, Clore, & Collins, 1988; Simon, 1967; Sloman & Croucher, 1981) and reasoning (Bechara, Damásio, Tranel, & Damásio, 1997; Damásio, 1994; LeDoux, 1998).

Some of the properties of emotions that have an adaptive value that might be transposed to artificial creatures are:

- control of attention. Emotions influence perception and orient reasoning by focusing the agent's attention on the most relevant features to solve its immediate problem (e.g., Beck, 1983; Morignot & Hayes-Roth, 1995). In particular, they have been attributed the role of interrupting the agent from what it is doing when new problems arise that need to be attended to (Beaudoin & Sloman, 1993; Sloman, Beaudoin, & Wright, 1994);

- source of reinforcement for adaptive agents (e.g., Albus, 1990; McCauley & Franklin, 1998; Wright, 1996) or the role of monitoring the agent's performance so that the agent's plans or actions can be changed if necessary (e.g., Michaud, Lachiver, & Dinh, 1996; Shibata, Ohkawa, & Tanie, 1996);

- memory filters that allow better recall of events that are congruent with the current emotional state (mood congruence, e.g., Araujo, 1994) or events that were learned while the agent was in a congruent emotional state (mood dependency, e.g., Bower & Cohen, 1982). This preferential recall of some events over others can affect the decision-making process by making the agent more or less optimistic depending on whether it is happy or sad, respectively (Bower & Cohen, 1982; Seif El-Nasr, Ioerger, & Yen, 1998);

- assistance in reasoning. The agent's affective system quickly obtains perceptual cues that can be used to direct the access of the cognitive


information relevant for the cognitive system's deliberation (Damásio, 1994; Ventura, Custodio, & Pinto-Ferreira, 1998);

- behavior tendencies or even stereotyped responses, which are usually associated with particular emotional scenarios. These built-in responses allow for appropriate behavior to be automatically triggered in emergency situations, avoiding spending unavailable time on elaborate reasoning;

- physiological arousal of the body. A strong emotion is usually associated with a general release of energy in anticipation of a demanding action response. The translation of this feature to an artificial system can consist in the modulation of system parameters (Bates, Loyall, & Reilly, 1992a; Cañamero, 1997), such as the level of behavioral activity or speed, which are directly relevant to the overall performance of the system;

- support for social interaction (e.g., Aubé, 1998; Bates, 1994; Breazeal, 1998; Klein, 1996; Picard, 1997). The expression of emotions allows individuals to transmit to others messages that are often crucial to their survival and therefore have great adaptive value (Darwin, 1872/1965).

2.2 Emotion Model

Like many other psychological terms (e.g., intelligence, consciousness), emotion is difficult to describe and existing emotion models employ mostly working definitions that tend to conflict with each other. On the one hand, although there are behavioral and physiological aspects of emotions that can be monitored, an essential part of emotions consists of a private internal experience not subject to direct observation by anyone other than the individual experiencing them, making proper scientific analysis difficult. On the other hand, emotions are intrinsically related to other psychological processes (e.g., cognition) and the artificial separation from them created by the traditional scientific approach, together with the artificiality of the experimental setups, often hides away the true nature of real emotional experiences (Kaiser & Wehrle, 1996).

A large subset of theories of emotions are based on elaborate cognitive appraisal theories (e.g., Lazarus, 1982; Ortony, Clore, & Collins, 1988; Power & Dalgleish, 1997) that stress the role of conscious reasoning in the generation and definition of emotions, in spite of emotions also being aroused by crude subconscious experiences involving simple information processing without the need for high-level reasoning processes (Izard, 1993; Zajonc, 1984). Moreover, these theories usually rely strongly on verbal reports that can be very deceptive for several reasons (Gadanho, 1999), the most pertinent being that the emotion-generation mechanisms are not necessarily knowable to the conscious mind (LeDoux, 1998).

Following the psychologists' mainstream, most artificial intelligence models of emotions are based on an analytic and symbolic approach (Bates, Loyall, & Reilly, 1992b; Dyer, 1987; Frijda & Swagerman, 1987; Pfeifer, 1982; Pfeifer & Nicholas, 1985; Sloman, Beaudoin, & Wright, 1994) that tries to endow the model with the full complexity of human emotions as perceived from an observer's point of view.

In opposition to the traditional approach, a synthetic bottom-up approach based on the animat approach (Wilson, 1991) was preferred for the current work. For this reason, it was considered more appropriate to investigate the generation of emotions by simple associations of stimuli instead. This led to the development of a nonsymbolic emotion model. Recently, models have been suggested that also follow a bottom-up approach (Cañamero, 1997; Foliot & Michel, 1998; Velásquez, 1998; Wright, 1996). The problem with reproducing most of these models is that they usually provide so little architectural specification that they allow almost total freedom of implementation. Furthermore, the evaluation of their practical implementations is often difficult, because in general they are presented as an end result, that is, the adaptiveness value of the presence of emotions is not evaluated but only presented as fact. In these conditions, unless an objective and accurate description of the end product is given, only its direct observation can make any kind of evaluation possible.

The most significant emotion features that the designed model tries to capture are:

- emotions have valence, that is, they provide a positive or negative value;

- emotions have some persistence in time, that is, sudden unrealistic swings between different emotions should not be allowed, particularly when the


Figure 1 The emotion model. In this model, emotions do not depend directly on the agent's immediate perception of the world, that is, its sensations. They depend on the feelings that are a combination of the sensations and the hormones produced recently by active emotions. This adds to the emotion state some memory of the recent past.

emotions in question differ greatly. The occurrence of a certain emotion depends not only on direct sensory input, but also on the agent's recent emotional history;

- emotions color perception in that what is perceived is biased by the current emotional state.

The model that was developed (Figure 1) is based on four emotions (E): happiness, sadness, fear, and anger. These emotions were selected because they are believed to be the most universally expressed emotions along with disgust (Ekman, 1992) and are usually included in the definitions of basic or primary emotions (see, for example, Goleman, 1995; Power & Dalgleish, 1997; Shaver, Schwartz, Kirson, & O'Connor, 1987). Furthermore, they are adequate and useful for the robot-environment interaction afforded by the experiments. Others might prove too sophisticated or out of place. For instance, there seems to be no situation where it is appropriate for the robot to feel disgust. However, if, for instance, toxic food were


added to the environment, disgust would become useful to keep the robot away from it.

E = {Happiness, Sadness, Fear, Anger}    (1)

The model determines the intensity of each emotion based on the robot's current internal feelings (F). In the current work, the robot feelings are defined as the internal perception of its raw sensations, or its subjective sensations. The intensity of each emotion is calculated through linear weighted dependencies from feelings. The set of feelings used in the experiments reported here is given in Equation 2.

F = {Hunger, Pain, Restlessness, Temperature, Eating, Smell, Warmth, Proximity}    (2)

Furthermore, the emotional state also influences the robot's feelings. The feelings that give rise to an emotion are also the ones aroused by the emotion. This way, each emotion tries to influence the feelings in such a way that the resulting feelings match the state that gives rise to that particular emotion. An emotion only influences the feelings if its intensity value is significantly large, that is, its value is above an activation threshold.

The emotions influence the feelings through a hormone system, by producing appropriate hormones. The hormone system in the model is a very simplified one. It consists of having one hormone associated with each feeling. A feeling intensity is not a value directly obtained from the value of the sensation that gives rise to it, but from the sum of the sensation and hormone value. The hormone quantities produced by each emotion are directly related to its intensity and its dependencies on the associated feelings. The stronger the dependency on a certain feeling, the greater the quantity of the associated hormone produced by an emotion.

On the one hand, the hormone mechanism introduces a competition between the emotions to gain control over the feelings, which is ultimately what selects which emotion will be dominant. On the other hand, the robot feelings are dependent not only on its sensations but also on its emotional state, that is, the intensity of its emotions.

A formal description of the model's functions is given by Equations 3-8. The function Th_[b-,b+](x) is needed to confine values within an interval [b-, b+]:

\[
\mathrm{Th}_{[b^-,\,b^+]}(x) =
\begin{cases}
b^- & \text{if } x < b^- \\
b^+ & \text{if } x > b^+ \\
x & \text{otherwise}
\end{cases}
\tag{3}
\]

Equation 4 shows how the intensity value of emotion e at step n (I_{e_n}) is calculated from the intensity of the feelings (I_{f_n}) at that step. This calculation involves an emotion bias (B_e) and coupling coefficients (C_{ef}) between the emotion e and the feelings f in F.

\[
\forall e \in E,\ \forall n \in \mathbb{N}:\quad
I_{e_n} = \mathrm{Th}_{[0,1]}\!\left(B_e + \sum_{f \in F} C_{ef}\, I_{f_n}\right)
\tag{4}
\]

The calculation of the feelings' intensity has to take into account both the influences provided by the hormone system (H_{f_n}), which are dependent on a coefficient parameter (C_h), and the value of the respective sensation (S_{f_n}). The sensations' values are directly derived from the sensory data. The hormone values are responsible for the memory of the emotion system, and depend both on their previous values and the emotion influences (A_{f_n}). Emotions only influence the hormone values if their intensity is above the activation threshold (I_{th_a}). To calculate the value of the hormones (H_{f_n}), two different system parameters are used, the attack gain (gamma_up) and the decay gain (gamma_dn). The first one is used when the emotions and their influences are increasing and the other when the emotion intensities are fading away. In general, the attack gain is much higher than the decay gain. This way the decay of emotions is slow while the emergence of new emotions is much faster. The values of the parameters used in the emotion model are given in Table 1, except for the emotions' dependencies (C_{ef}) and biases (B_e), which will be given later in the context of the agent's task.

\[
\forall f \in F,\ \forall n \in \mathbb{N}:
\]
\[
I_{f_n} = \mathrm{Th}_{[0,1]}\left(C_h H_{f_n} + S_{f_n}\right)
\tag{5}
\]
\[
H_{f_n} =
\begin{cases}
0 & \text{if } n = 1 \\
\gamma_n H_{f_{n-1}} + (1 - \gamma_n) A_{f_n} & \text{if } n > 1
\end{cases}
\tag{6}
\]
\[
A_{f_n} = \sum_{e\,:\, I_{e_n} > I_{th_a}} C_{ef}\, I_{e_n}
\tag{7}
\]
\[
\gamma_n =
\begin{cases}
\gamma_{up} & \text{if } |A_{f_n}| > |H_{f_n}| \\
\gamma_{dn} & \text{otherwise}
\end{cases}
\tag{8}
\]


[Figure 2 plot, "The emotion response": sensation value (S_f), feeling intensity (I_f), emotion intensity (I_e), hormone value (H_f), and the activation threshold (I_th_a), plotted on a 0-1 scale against the number of steps (0-2000).]

Figure 2 Emotional response to a sensation. The emotion e depends indirectly on this sensation through the respective feeling f. The emotion intensity reaches a maximum value as soon as the sensation value is set to one and it decreases drastically once the sensation value is set back to zero, but the emotion is still active long after the sensation has been set to zero.

Table 1 Parameter values used in the experiments

    Parameter   Definition                      Value
    I_th_a      Emotion activation threshold    0.2
    I_th_s      Emotion selection threshold     0.2
    C_h         Hormone coefficient             0.9
    gamma_up    Hormone attack gain             0.98
    gamma_dn    Hormone decay gain              0.996

Figure 2 shows the response of an emotion e to a sensation on which it has a dependency (C_ef) of 0.8 weight. This dependency is actually indirect, through the respective feeling f. When the sensation value (S_f) is 1.0, the emotion intensity (I_e) is 0.8, which is the highest value possible in this example. The influence of the hormone (H_f) is only noticeable after the sensation returns to value zero. Before that, the feeling intensity (I_f) is saturated by the stimulus itself. When the stimulus disappears, the emotion intensity has a sudden drop in value because it becomes dependent solely on the total value of hormone (H_f) that accumulated while the sensation was on. The values of hormone and emotion gradually decay to zero without the presence of the sensation.

As a concrete example of the dynamics of the model in terms of robot-environment interactions, consider the situation of the robot colliding with an obstacle. The collision itself produces a pain sensation that will be captured by the pain feeling. Assuming that fear has a strong dependency on pain,1 then the fear intensity will rise. If this intensity is high enough then fear will produce hormones. In particular, the hormone associated with pain will quickly build up during the collision. This will make the fear emotion grow stronger and possibly overtake other existing emotions. When the robot finally manages to cease the collision, it will still have pain, not because the pain sensation is still there, but because the hormone associated with pain has a high value. So the fear emotion will persist while the hormone gradually decreases in value. This means that while the robot is gaining distance from the obstacle, the fear will still be there. Nevertheless, it will usually fade away as soon as a short distance is gained and the risk of further collisions has diminished.

The hormones' values can increase quite rapidly, allowing for the quick buildup of a new emotional state, and decrease slowly, allowing for the persistence of an emotional state even when the cause that gave rise to it is gone (another of the characteristic features of emotions). It should be noted that the time scales involved in the persistence of an emotion after the stimulus is gone, particularly when in the presence of a new stimulus that favors another emotion, are quite small. This allows for what is perceived as quick changes of emotions, in opposition to the much slower process of changes in mood. One can only talk of moods when talking of the residual hormone values that might exist in the system and are not strong


Figure 3 The simulated robot and its environment. The simulation window shows the robot's environment on the left and its internal state on the right.

enough to stimulate the existence of a dominant emotion. That would be consistent with the theory that moods are differentiated from emotions in terms of level of arousal (Panksepp, 1995). These residual hormone values can act as moods in the sense that they might favor the appearance of certain emotions.

Emotions were divided into two categories: positive and negative. The ones that are considered pleasant are positive (only happiness, in the set of emotions used); the others are considered negative. This way a value judgment can easily be obtained from the emotion model by considering the intensity of the current dominant emotion and whether it is positive or negative. The dominant emotion is the one with the highest intensity, unless no emotion intensity exceeds a selection threshold (I_th_s). In this case, there will not be a dominant emotion and the value judgment is neutral.

In summary, the model of emotions described not only provides an emotional state, based on simple sensations, that is coherent with the current situation, but also influences the perception. Side issues associated with emotions, such as moods and temperaments, were not directly built into the architecture and are only exhibited as a by-product. Different temperaments, for instance, can be achieved by having different emotion dependencies on feelings or changing other parameters of the system.

The model of emotions behaves appropriately when tested on the robot, in the sense that the robot consistently displays plausible contextual emotional states during the process of interacting with the environment. Furthermore, because its emotions are grounded in subjective feelings, and not direct sensory input or sensations, it manages to avoid sudden changes of emotional state, from one extreme emotion to a completely different one. The more different the two emotions are, the more difficult it is to change from one to the other. Emotion persistence could have been modeled at the level of the emotions themselves, but modeling it at the level of feelings allows for a richer interaction between emotions in their competition for dominance.

3 Experiments

This section presents a detailed account of the experiments made. First, a description of the robot, its task, its feelings, and its emotions is given. Second, the robot learning controller used is presented and its interactions with the emotion system are fully explained. Finally, the experimental procedure is described and the results obtained are reported and discussed.
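The valence judgment and the control-triggering role described above can be summarized in a short sketch. This is our own illustrative code (the function names and the trigger threshold `delta` are assumptions, not values from the paper): the dominant emotion is the most intense one above the selection threshold, its valence gives the sign of a reinforcement value, and a reevaluation is triggered on abrupt rises or drops of emotional intensity.

```python
POSITIVE_EMOTIONS = {"happiness"}   # the only positive emotion in the set used
I_TH_S = 0.2                        # emotion selection threshold (Table 1)

def dominant_emotion(intensities):
    """Return (name, intensity) of the dominant emotion, or None when no
    intensity exceeds the selection threshold (a neutral judgment)."""
    name = max(intensities, key=intensities.get)
    return (name, intensities[name]) if intensities[name] > I_TH_S else None

def value_judgment(intensities):
    """Reinforcement value: +intensity for a positive dominant emotion,
    -intensity for a negative one, 0.0 when there is no dominant emotion."""
    dom = dominant_emotion(intensities)
    if dom is None:
        return 0.0
    name, intensity = dom
    return intensity if name in POSITIVE_EMOTIONS else -intensity

def should_reevaluate(previous, current, delta=0.1):
    """Trigger the adaptive controller on abrupt rises OR drops of emotional
    intensity (`delta` is an assumed threshold, not a value from the paper)."""
    return abs(current - previous) > delta

# Example: a frightened robot receives negative reinforcement.
r = value_judgment({"happiness": 0.1, "sadness": 0.2, "fear": 0.6, "anger": 0.0})
```

Note that `should_reevaluate` is symmetric by design, mirroring the argument in Section 1.3 that a sudden fall of intensity also signals that the decision should be revisited, because the crisis that gave rise to the emotion has probably been solved.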


3.1 The Robot and Its Task The sensations available to the robot were:
All the experiments were carried out in a realistic simulator developed by Michel (1996) of a Khepera robot, a small robot with a left and a right wheel motor, and eight infrared sensors that allow it to detect object proximity and ambient light. Six of the sensors are located at the front of the robot and two at the rear. The robot environment (Figure 3) consisted of a closed environment with some walls and three lights surrounded by bricks.

Simulated feeding needs were added to the robot. The robot is always losing energy: The more it uses its motors, the more energy is used up. It can recover its energy from lights in its environment. The main reason for having lights as food sources is to allow the robot to distinguish its food sources with its poor perception capabilities.

To gain energy from a food source, the robot has to bump into it. This will make energy available for a short period of time. At the same time an odor will be released that can be sensed by the robot. It is important that the agent is able to discriminate this state through its sensors, because the agent can only get energy during this period. This energy is obtained by receiving high values of light in its rear light sensors, which means that the robot must quickly turn its back to the food source as soon as it senses that energy is available. To receive further energy the robot has to restart the whole process by hitting the light again so that a new time window of released energy is started.

A food source can only release energy a few times before it is exhausted. In time, the food source will recover its ability to provide energy again, but meanwhile the robot is forced to look for other sources of energy to survive. The robot cannot be successful by relying on a single food source for energy, that is, the time it takes for new energy to become available in a single food source is longer than the time it takes for the robot to use it. When a food source has no energy, the light associated with it is turned off and it becomes a simple obstacle for the robot.

The task can be translated into multiple goals: moving around the environment to find different food sources and, if a food source is found, extracting energy from it. The extraction of energy was complicated to make the learning task harder by requiring the agent to learn sequences of behaviors. Furthermore, the robot should not keep still in the same place for long durations of time or collide with obstacles.

The robot's feelings are defined as follows:

hunger  The robot's energy deficit;

pain  High if the robot is bumping into obstacles;

restlessness  Increases if the robot does not move and decreases otherwise; it is reset whenever a behavior is selected;

temperature  Rises with high motor usage and returns to zero with low motor usage;

eating  High when the robot is acquiring energy, that is, when the hunger sensation is decreasing;

smell  Active when there is energy available (note that the robot does not have real smell sensors: Odor perception is simulated for the experiments);

warmth  Directly dependent on the intensity of light perceived by the robot's light sensors;

proximity  Reflects the proximity of the nearest obstacle perceived by the distance sensors.

3.2 The Emotion System

To have the robot's emotional state compatible with its task, the emotions' dependencies on feelings are such that

- the robot is happy if there is nothing wrong with the present situation. It will be particularly happy if it has been using its motors a lot or is in the process of getting new energy at the moment. Even just the smell of food can make it happy;

- if the robot has very low energy and it is not acquiring energy, then its state will be sad. It will be more sad if it cannot sense any light;

- if the robot bumps into obstacles then the pain will make it fearful. It will be less fearful if it is hungry or restless;

- if the robot stays in the same place too long it will start to get restless. This will make it angry. The anger will persist for as long as the robot does not move away or change its current action. A hungry robot will tend to be more angry.
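These qualitative dependencies can be illustrated in code. The sketch below assumes each emotion's value is a bias plus a weighted sum of feeling values, the linear form suggested by Table 2 and Equation 14; the weight numbers and the dominance threshold are hypothetical examples, not the paper's actual coefficients.

```python
# Illustrative sketch: emotion values as a bias plus a weighted sum of
# feeling values. The weights below are hypothetical, chosen only to
# mimic the qualitative dependencies described in the text.

# weights[emotion] = (bias, {feeling: coefficient})
WEIGHTS = {
    "happiness": (0.1, {"pain": -0.3, "hunger": -0.2, "eating": 0.4, "smell": 0.3}),
    "sadness":   (-0.1, {"hunger": 0.7, "eating": -0.4, "warmth": -0.2}),
    "fear":      (0.0, {"pain": 0.7, "hunger": -0.2, "restlessness": -0.2}),
    "anger":     (0.0, {"restlessness": 0.7, "hunger": 0.2, "pain": -0.1}),
}

def emotion_values(feelings):
    """Compute each emotion's value from the current feeling values."""
    values = {}
    for emotion, (bias, coeffs) in WEIGHTS.items():
        values[emotion] = bias + sum(c * feelings.get(f, 0.0)
                                     for f, c in coeffs.items())
    return values

def dominant_emotion(values, threshold=0.1):
    """Return the strongest emotion, or None if none reaches the
    (illustrative) dominance threshold."""
    best = max(values, key=values.get)
    return best if values[best] >= threshold else None
```

For instance, a strong pain feeling pushes fear above the other emotions, while eating and smell together favor happiness.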

Downloaded from adb.sagepub.com at CMU Libraries - library.cmich.edu on December 19, 2015


Gadanho & Hallam Robot Learning Driven by Emotions 51

Table 2  The emotions' dependencies on feelings

            Hunger  Pain  Restlessness  Temperature  Eating  Smelling  Warmth  Bias
Happiness    0.2    0.3      0.2           0.2        0.4      0.3      0.0    0.1
Sadness      0.7    0.0      0.1           0.2        0.4      0.0      0.2    0.1
Fear         0.2    0.7      0.2           0.1        0.2      0.2      0.0    0.0
Anger        0.2    0.1      0.7           0.2        0.2      0.0      0.0    0.0
Table 2 presents the actual values for each of the emotion biases and dependencies on feelings. The appropriate values were carefully chosen by hand, to reflect the qualitative dependencies described above, using a simple process of trial and error that needed few iterations. No emotion dependencies were created for the feeling of proximity; this feeling is used only to determine state within the adaptive controller.

3.3 The Adaptive Controller

The adaptive controller selected was a well-known reinforcement-learning algorithm that has given good results in the field of robotics: Q-learning (Watkins, 1989).

One of the major problems responsible for the slowness of reinforcement learning is the slow iterative process of spreading the rewards and punishments through the input space, which can be greatly minimized if the algorithm has added mechanisms to spread the reinforcements to similar input states. The system implemented profits from generalization over the input space by using neural networks to learn the utility values of each action (Lin, 1993). Apart from accelerating the learning process, this also minimizes the memory space needed to store the policy, which is often stored in the form of a look-up table with one value for each action and sensor-state combination.

For more complex tasks skill decomposition is usually advisable, as it can significantly reduce the learning time, or even make the task feasible. Researchers report that a monolithic approach can fail to solve the long-term temporal credit assignment problem (Mahadevan & Connell, 1992). One of the reasons pointed out is the loss in accuracy of the propagation of credit assignment with long action sequences (Lin, 1992). Skill decomposition usually consists of learning some predefined behaviors in a first phase and then finding the high-level coordination of these behaviors. Although the behaviors themselves are often learned successfully (Lin, 1993; Mahadevan & Connell, 1992), behavior coordination is much more difficult and is usually hardwired to some extent (Lin, 1993; Mahadevan & Connell, 1992; Mataric, 1994).

In the current work, the primitive behaviors were hand designed and only the more difficult task of behavior coordination was learned, in the hope that emotions might be helpful in solving some of the problems found at this level. The developed controller tries to maximize the evaluation received by selecting between one of three possible hand-designed behaviors:

avoid obstacles  Turn away from the nearest obstacle and move away from it. If the sensors cannot detect any obstacle nearby, then remain still;

seek light  Go in the direction of the nearest light. If no light can be seen, remain still;

wall following  If there is no wall in sight, move forward at full speed. Once a wall is found, follow it. This behavior by itself is not very reliable in that the robot can crash, that is, become immobilized against a wall. The avoid-obstacles behavior can easily help in these situations.

The controller (Figure 4) implements a Q-learning algorithm, using neural networks, very similar to the one reported by Lin (1992). It will be defined next in terms of two separate modules: the associative memory module and the behavior selection module.

3.3.1 Associative Memory Module  This plastic module uses three neural networks to associate the robot's feelings with the current expected utility of each of the three robot behaviors. These are three-layer feed-forward neural networks with 9 input units, one for each feeling and a bias; 10 hidden units; and 1 output unit that represents the expected outcome of the associated behavior.
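A minimal sketch of one such network follows. The 9-10-1 shape, the zero-initialized input-to-hidden weights (giving an initial neutral evaluation), and the scaled hyperbolic-tangent activation with lambda = 0.25 follow the paper's description; the ±0.5 range for the random hidden-to-output weights is an assumption for illustration.

```python
import math
import random

class UtilityNet:
    """Sketch of one associative-memory network: 9 inputs (8 feelings
    plus a constant bias input), 10 hidden units, and 1 output
    estimating the expected utility of its behavior."""

    LAM = 0.25  # lambda of the scaled tanh activation

    def __init__(self, n_in=9, n_hidden=10, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden weights start at zero, so hidden activations
        # are zero and every network outputs a neutral 0.0 initially.
        self.w_in = [[0.0] * n_in for _ in range(n_hidden)]
        # Hidden-to-output weights are random (range assumed here).
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        """Forward pass: tanh(lambda * x) in hidden and output units."""
        hidden = [math.tanh(self.LAM * sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w_in]
        return math.tanh(self.LAM * sum(w * h for w, h in zip(self.w_out, hidden)))
```

With the zeroed first layer, any input vector yields a neutral utility estimate until back-propagation starts adjusting the weights.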

52 Adaptive Behavior 9(1)

Figure 4  The adaptive controller. The associative memory estimates an evaluation for each behavior that is used in the stochastic behavior selection.

The activation function used in both the hidden and output units was the hyperbolic tangent, tanh(λx) = (1 − e^(−2λx)) / (1 + e^(−2λx)), with λ = 0.25. The weights between the hidden layer and the output layer are initialized with random values, and the weights between the input layer and the hidden layer are set to zero. This way all the networks provide an initial neutral evaluation. The learning algorithm used to train the networks was back-propagation (see, for example, Hertz, Krogh, & Palmer, 1991) with the learning rate set to 0.3.

The input of the neural networks consists of the perceived world state, and the single output of each neural network models the following function for one of the behaviors b:

    Q(s_n, b) = R_(n+1) + γ V(s_(n+1))    (9)

This function represents the expected discounted cumulative reinforcement that an agent will receive after executing behavior b in response to the world state s_n. The immediate reinforcement received in the next state (s_(n+1)) is R_(n+1). The utility of the state s_(n+1), or V(s_(n+1)), is its expected discounted cumulative reinforcement if the optimal policy is followed by the agent, that is, if it always selects the action with the highest Q function value. The value γ is the discount factor, which is set to 0.9.

In each learning step of the algorithm (see Figure 5), the neural network associated with the last behavior b selected is updated for the previous state (s_(n−1)). An iteration of the back-propagation algorithm is made using as target value T_n(s_(n−1), b). The calculation of this target value depends on the current estimate Q_n(s_n, k) for each behavior k provided by the outputs of the networks when using the current state (s_n) as input.

Figure 5  Learning iteration of the reinforcement-learning algorithm.

    T_n(s_(n−1), b) = R_n + γ max{Q_n(s_n, k) | k ∈ behaviors}    (10)

3.3.2 Behavior Selection Module  The utility values provided by the associative memory are used for the stochastic selection of the next behavior. The higher the value provided by the associated net, the higher the probability of that behavior being selected. The function used to calculate the probability P_n(s_n, b) of each behavior b being selected is based on the Boltzmann–Gibbs distribution:

    P_n(s_n, b) = e^(Q_n(s_n, b)/T) / Σ_(k ∈ behaviors) e^(Q_n(s_n, k)/T),    T = 0.1    (11)

3.4 The Emotional Controller

The emotional controller (Figure 6) was achieved by the integration of the emotion system with the learning controller. It implements emotional perception, reinforcement, and control triggering. These emotional mechanisms and the alternative mechanisms used for experimental comparison are presented next.
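The learning target of Equation 10 and the Boltzmann–Gibbs selection of Equation 11 can be sketched as follows. The behavior names in the usage are hypothetical, and the max-subtraction in the selection function is a standard numerical-stability detail not in the paper.

```python
import math
import random

def q_target(reward, q_next, gamma=0.9):
    """Back-propagation target for the last selected behavior's network:
    T_n(s_{n-1}, b) = R_n + gamma * max_k Q_n(s_n, k)   (Equation 10)."""
    return reward + gamma * max(q_next.values())

def boltzmann_select(q_values, temperature=0.1, rng=random):
    """Stochastic behavior selection from the Boltzmann-Gibbs
    distribution of Equation 11."""
    # Subtracting the maximum before exponentiating avoids overflow;
    # the resulting distribution is unchanged.
    q_max = max(q_values.values())
    weights = {b: math.exp((q - q_max) / temperature)
               for b, q in q_values.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for b, w in weights.items():
        r -= w
        if r <= 0.0:
            return b
    return b  # fallback for floating-point rounding
```

For example, with current estimates for the three behaviors, `q_target(r, {"avoid": 0.5, "seek": -0.2, "wall": 0.1})` gives the value the selected behavior's network is trained toward, and `boltzmann_select` draws the next behavior with higher-utility behaviors being more probable.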


Figure 6  The emotional controller. The emotion system (Figure 1) uses the sensations returned by the perception system to calculate the current feelings and the dominant emotion. These values are used by the adaptive controller (Figure 4) to select the behavior executed by the behavior system: The feelings define the current state, and the dominant emotion determines when a state transition occurs and the reinforcement received.

Figure 7  Emotions' influence on perception. In the emotion controller, the feelings define the agent's perception of the world state within the adaptive controller. An alternative non-emotional perception is to use sensations instead of feelings.

3.4.1 Perception  In the adaptive controller selected, the agent's perception is considered to be the neural network inputs of the associative memory. These define the input state that is perceived by the agent and is associated with rewards and punishments during the learning process. Selecting a correct input space is an important step in the design of a learning agent because it implicitly informs the agent which elements of its environment might be important for achieving its task. The input state selected was the robot's feelings.

In the emotion model developed, feelings are influenced by emotions through the hormone system, which raises the value of those feelings that are related to the current emotional state. So the represented robot state is emotion dependent: The state that the robot learns to associate with rewards is being biased by emotions. To test whether perception being influenced by the hormone system has an impact on the performance of the robot (Figure 7), the experimental results of using feelings as network inputs were compared


Figure 8  Emotions determining reinforcement value. The reinforcement used by the emotion controller is derived from the dominant emotion. The agent is rewarded if the emotion is positive and punished if the emotion is negative. The value of the reward or punishment is the emotion intensity. Alternatively, a more conventional nonemotional reinforcement can be used.

with those obtained by replacing them by sensations that are emotion independent.

3.4.2 Reinforcement  The reinforcement function must specify the correct behavior of the agent by giving it rewards when it is performing well and punishments when its behavior is inadequate. For this reason it implicitly specifies the agent's goals or motivations, or its task. In the emotional controller developed, emotions were considered the fundamental source of motivation and were therefore used as reinforcement. The learning task consisted of the maximization of positive emotions and the minimization of negative emotions.

The emotional controller uses the emotion-dependent reinforcement described in Equation 13, that is, R_n = R_(e,n). The reinforcement magnitude was set to be the intensity of the current dominant emotion, or zero if there was no dominant emotion. If the dominant emotion was negative then its positive intensity value would be negated.

    ∀ e ∈ E,  sign(e) = +1 if e is positive, −1 if e is negative    (12)

    R_(e,n) = 0 if ∀ e ∈ E, I_(e,n) < I_th;
    otherwise R_(e,n) = I_(e,n) sign(e), where e = arg max_(e ∈ E) (I_(e,n))    (13)

The experiments made with the behavior-based controller to test whether the emotion-dependent reinforcement was adequate (Figure 8) compared the results obtained using emotion-dependent reinforcement with a more traditional reinforcement based on direct sensations. The sensation-dependent reinforcement was based on the emotion-dependent one but without any of the temporal or lateral side effects introduced by the hormone system. The sensations were used instead of the feelings to calculate the value of each emotion, and the highest of these values was selected to be the reinforcement value. As before, the values associated with negative emotions were negated. Figure 9 illustrates this modification and Equation 14 shows the resulting reinforcement function.

    R_(s,n) = (B_e + Σ_(f ∈ F) C_(e,f) S_(f,n)) sign(e),
    where e = arg max_(e ∈ E) (B_e + Σ_(f ∈ F) C_(e,f) S_(f,n))    (14)

3.4.3 Control Triggering  One problem of behavior coordination that is quite difficult and task dependent is deciding when to change behavior. This is not a problem in traditional reinforcement learning, where agents live in grid worlds and state transition is
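A sketch of the emotion-dependent reinforcement rule of Equations 12 and 13 follows; the dominance threshold value is not given here, so the default below is illustrative.

```python
def emotional_reinforcement(intensities, signs, threshold=0.1):
    """Emotion-dependent reinforcement (Equations 12-13): zero if no
    emotion reaches the dominance threshold; otherwise the dominant
    emotion's intensity, negated when that emotion is negative.
    `signs[e]` is +1 for positive emotions, -1 for negative ones.
    The threshold default is illustrative, not the paper's value."""
    if all(i < threshold for i in intensities.values()):
        return 0.0
    dominant = max(intensities, key=intensities.get)
    return intensities[dominant] * signs[dominant]
```

So a robot whose strongest emotion is fear receives a punishment equal to the fear intensity, while a happily dominant state yields an equal-sized reward.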


Figure 9  Truncated emotion model used to obtain the sensation-dependent reinforcement.

perfectly determined. In robotics, however, agent states change asynchronously in response to internal and external events, and actions take variable amounts of time to execute (Mataric, 1994). As a solution to this problem, some researchers extend the duration of the current action according to some domain-specific conditions of goal achievement or applicability of the action. Others will interrupt the action when there is a change in the input state (Asada, 1996; Rodriguez & Muller, 1995). Rodriguez and Muller (1995) argue that new decisions should only be taken when there is a change in the input state, on the basis that otherwise the choice is uniquely determined by the current state of knowledge. However, this may not be a very straightforward solution when the robot is equipped with multiple continuous sensors that are vulnerable to noise.

In the specific case of the adaptive controller selected, reported experiments involving learning of behavior coordination (Lin, 1993) required the introduction of several simplifications. To start with, each behavior was associated with predefined conditions of activation. For example, the behavior of door passing could only be selected if a door was nearby. Furthermore, the introduction of a persistence rule proved essential for good results. This rule ensured that the same behavior was kept until the goal of the behavior had been achieved or a previously inapplicable behavior became applicable.

In practice, the duration of behaviors must be long enough to allow them to manifest themselves, and short enough so that they do not become inappropriate (due to changing circumstances) long before being interrupted. The ideal would be to know when a significant change has occurred in the environment that makes a reevaluation necessary. Using emotions to trigger state transitions seems reasonable, because emotions can provide a global summarized vision of the environment. Any important change in the environment is liable to be captured by changes in the emotional state.

The selection of this asynchronous triggering mechanism has the added advantage of helping the learning at the level of the neural networks. When learning on-line by back-propagation, the neural networks have a tendency to be overwhelmed by the large quantity of consecutive similar training data and to forget the rare relevant experiences. Detecting and using only a few relevant examples for training can help with this problem.


Figure 10  Emotions triggering control. The emotional controller triggers the adaptive controller whenever there are significant changes in the dominant emotion. An alternative nonemotional triggering mechanism that is traditionally used in reinforcement learning is to trigger the adaptive controller at regular intervals.

Table 3  Different mechanisms used by emotional and nonemotional controllers

Controller                   Reinforcement        Perception  Control triggering
Emotional                    Emotion-dependent    Feelings    Event-triggered
Nonemotional perception      Emotion-dependent    Sensations  Event-triggered
Nonemotional reinforcement   Sensation-dependent  Feelings    Event-triggered
Nonemotional triggering      Emotion-dependent    Feelings    Interval-triggered
Nonemotional                 Sensation-dependent  Sensations  Interval-triggered

To test whether emotions can successfully be used to trigger state transitions, that is, determine when behavior selection should be re-evaluated (Figure 10), two triggering mechanisms were designed:

Event-triggered  This consisted in having state transitions triggered by the detection of significant changes in the emotional state. From the robot's point of view, an event occurs whenever there is a significant change in emotional state, as this should reflect a relevant event in the robot–environment interaction. An event is detected whenever

- there is a change of dominant emotion, including changes between emotional states and neutral emotional states (i.e., states with no dominant emotion);

- the current dominant emotion value is statistically different from the values recorded since a state transition was last made, that is, the difference between the new value and the mean of the previous values exceeds both a small tolerance threshold and α times the standard deviation of those previous values, where α is a constant set to 2 in the experiments (details in Gadanho, 1999);

- a maximum limit of 10,000 steps is reached.

Interval-triggered  This mechanism was a simple alternative to emotion-dependent event detection, used for comparison. It consists in triggering the adaptive controller at regular intervals. The success of the learning agent depends on selecting an adequate interval duration. Experiments (Gadanho, 1999) showed that it is important to synchronize the duration of behavior execution with the dynamics of the robot–environment interaction and thus allow compatible time scales between them. The interval of 35 steps was considered the best suited for the task at hand, given the outcome of pilot experimentation.

Table 4  Summary of the comparison between emotional and nonemotional controllers

Controller                   Emotion      Events (%)  Energy       Collisions (%)  Distance
Emotional                    0.24 ± 0.01  0.5 ± 0.0   0.63 ± 0.01  0.6 ± 0.3       1.0 ± 0.2
Non-emotional perception     0.22 ± 0.03  0.5 ± 0.0   0.61 ± 0.03  1.2 ± 0.7       1.0 ± 0.2
Non-emotional reinforcement  0.22 ± 0.02  0.5 ± 0.0   0.70 ± 0.02  1.6 ± 1.1       1.4 ± 0.1
Non-emotional triggering     0.21 ± 0.02  2.9 ± 0.0   0.62 ± 0.04  1.7 ± 0.1       0.9 ± 0.0
Non-emotional                0.21 ± 0.02  2.9 ± 0.0   0.64 ± 0.04  1.4 ± 0.1       0.9 ± 0.0

3.5 The Experimental Procedure

Five different experiments were carried out to assess the competence of the emotional system as an adaptive controller: one experiment for the emotional controller; one experiment for each of the emotional mechanisms, which consisted in replacing the mechanism with its nonemotional counterpart; and one for a non-emotional controller where all the emotional mechanisms were replaced by the nonemotional counterparts. Table 3 presents the differences between the experiments.

Each experiment consisted in having 30 different robot trials of 3 million learning steps. In each trial, a new fully recharged robot with all state values reset was placed at a randomly selected starting position. For evaluation purposes, the trial period was divided into 60 smaller periods of 50,000 steps. For each of these periods the following statistics were taken:

emotion  mean of the emotion-dependent reinforcement value, which is a measure of how positive the robot's emotional state is and therefore a good measure of its overall performance;

energy  mean energy level of the robot;

distance  mean value of the Euclidean distance d, taken at 100-step intervals, between the opposing points of the rectangular extent that contains all the points the robot visited during the last interval:

    d = (1/100) √((x_max − x_min)² + (y_max − y_min)²)

It is a measure of how much distance was covered by the robot;

collisions  number of collisions;

events  number of times the adaptive controller was triggered.

The results presented are an average of the different statistics over the several trials, with errors representing the 95% confidence interval. The last two statistics were presented as a percentage over the total number of steps in the evaluation period to make the experimental results independent of the duration of this period.

3.6 Results

A summary of the experimental results obtained for all controllers tested is presented in Table 4. The means of the values and their 95% confidence intervals obtained in the last half of the trials are presented. Other results showed clearly that all controllers' learning algorithms had converged before the table values were taken.

Experiments showed a small nonsignificant advantage in performance of the emotional perception, both in terms of emotion and collisions, which means that the robot does well with a biased view of reality but does not demonstrate that emotions can be useful in this domain.

There was also no significant difference between the performance obtained with the two different reinforcements. Again the mean values of emotion and collisions are slightly improved, but the results do not clearly show that emotion reinforcement is better. The results do show that the distance and energy values are higher for the sensation-dependent reinforcement. Nevertheless, this is not considered an improvement in performance over the emotional reinforcement, since the robot is only required to maintain these values at a high level and not to maximize them.

Experiments showed that the event-triggered mechanism required many fewer control iterations to carry out the task successfully. Furthermore, the


Figure 11  Comparing the emotional with the non-emotional controller (plots of emotion, number of events (%), energy, collisions (%), and distance against the number of learning steps). The non-emotional controller does not use emotions in the agent control, but emotions are calculated while the robot is running as a measure of its global performance. Other measures of comparison included the number of times the controller was triggered, its average energy value, the number of collisions with obstacles, and the average distance it covered. The error bars represent the 95% confidence interval.

controller with nonemotional triggering performed worse in terms of obstacle avoidance.

The results in Figure 11 compare the learning performance of the emotional controller and that of the nonemotional controller. Both controllers are successful in learning the task, managing to keep a high level of energy while minimizing the number of bumps. The most important differences between the emotional and nonemotional controllers are the number of events and the number of collisions. Results indicate that the main factor responsible for the advantage of the emotional controller over the nonemotional controller is the control triggering mechanism.

The emotional controller also has a slight advantage in terms of obstacle avoidance when compared with all the other controllers, suggesting that the temporal synchrony between the different mechanisms might be a beneficial factor. Although the emotional controller suffers from intrinsic delays with respect to the robot–environment interaction due to the emotions' persistence, its perception, reinforcement, and triggering mechanisms are in perfect synchronism.

3.7 Discussion

Experiments showed that emotions can be used as an attention mechanism at different levels of a reinforcement-learning task:

- making more evident the particular aspects of the environment that are relevant for the current emotional state;

- providing a straightforward reinforcement function that attributes value to the different environmental


situations and therefore specifies their relative importance;

- determining the occurrence of the significant changes in the environment that ought to trigger state transition, by looking at sudden changes in the emotional system state.

These were three different mechanisms that worked well experimentally. Each one of them had different levels of performance when compared with alternative methods.

No significant differences were found in using emotion-dependent perception, that is, making the emotionally relevant aspects of the environment more salient. This result might be task dependent, but it is certainly controller dependent, because the adaptive controller used can easily ignore the differences in magnitude of the input values introduced by emotions by compensating for them with changes in the neural networks' weights. A proper assessment of this emotion role would benefit from the employment of a controller equipped with proper mechanisms of attention for input processing, that is, a controller where different weights could be given to the analysis of the different inputs.

The emotion-dependent reinforcement has been tested previously in a similar controller that chooses between primitive actions instead of behaviors (Gadanho & Hallam, 1998). The empirical results obtained for this action-based controller showed that emotion-dependent reinforcement was inadequate and performed much worse than the more traditional sensation-dependent reinforcement. The reason found for this result was that the sensation-dependent reinforcement has the advantage of providing a reinforcement value that reflects the situation as it stands at the moment, and not the mixed evaluation of present and recent past situations that the emotion-dependent reinforcement provides. However, experiments made with the behavior-based controller showed no difference between the performances obtained with the two different reinforcements.

This led to the conclusion that the previous results (Gadanho & Hallam, 1998) were controller dependent and that emotions are more appropriate for behavior-level control than action-based control. Behavior-based control provides more appropriate time scales than action-based control for the use of emotion-dependent reinforcement. If the robot has to select and evaluate a primitive action at each time step, the emotions' persistence in time becomes a severe hindrance to their successful use as reinforcement. Moreover, the difference between behavior-based and action-based control is not restricted to temporal duration. The behaviors themselves are by nature distinct from simple actions. They are not defined by constant motor values, but rather by a simple reactive goal that determines the motor values at each step as a function of the agent's current perception. This allows them enough versatility to run for longer durations of time, which is their main advantage for use with emotions. The persistence of emotions over time in natural systems also suggests that they should be related to a higher level of decision making that does not rely on simple primitive actions but on complex action patterns more suitably expressed at a behavioral level.

The emotion-dependent reinforcement has the characteristic of depending on only one emotion at a time, if any. The reinforcement information that might be provided by emotions other than the dominant emotion is ignored. For example, if the robot is sad and bumps into an obstacle then fear will overcome sadness and only fear will be taken into consideration for reinforcement. This means that the reinforcement information will mostly ignore the hunger feeling and will be dominated by the pain feeling. To test whether this is an advantage for the adaptive controller or not, experiments were made that used the reinforcement values provided by all the emotions in the calculation of the reinforcement function at each iteration. The poor performance of this reinforcement function (Gadanho, 1999) supports the view that the nonlinearities of the emotion-dependent reinforcement have an active role in its success.

The emotion-dependent event detector allowed drastic cuts in the frequency of triggering of the adaptive controller while maintaining overall performance. This can be particularly advantageous in the case of very time-consuming learning controllers, where each triggering of the controller can result in a significant loss of precious real time. An analysis of the distribution of the emotional states of the emotional controller, in general and in the particular situations where events were detected (Gadanho, 1999), showed that the robot control is triggered more often in adverse situations. This has adaptive value, because it focuses the agent's attention on the need to change behavior when the current behavior becomes inappropriate. Furthermore,


behaviors that have some immediate goal, such as avoid obstacles and seek light, tend to have shorter durations than the wall-following behavior, because they are terminated by the event-triggering mechanism the moment their goal is reached.

Furthermore, a more comprehensive comparison between the two (Gadanho & Hallam, 2001) reveals that an event-triggered controller is also a better learner than the interval-triggered controller: It has the dual advantage of being a more time-efficient learner and being able to master more difficult tasks.

Adding the right amount of behavioral persistence is a difficult problem that has been found before in the design of artificial creatures (Blumberg, 1996; Lin, 1993; Mahadevan & Connell, 1992; Mataric, 1994). In fact, when using a learning architecture very similar to the one used by the nonemotional controller presented here, Lin (1992) had to resort to severe simplifications of the behavior-coordination learning task. These simplifications consisted in having behaviors associated with predefined conditions of activation and in only interrupting a behavior once it had reached its goal or an inapplicable behavior had become applicable. The emotion controller proposed here offers a less task-dependent solution to the behavior-persistence problem.

4 Conclusion

The emotional controller proved to be more successful than an equivalent nonemotional controller in which the emotional mechanisms were replaced by nonemotional counterparts. The triggering mechanism was strongly responsible for the difference in performance between the two, but the joint use of all three mechanisms also seems to provide an advantage.

The use of emotions as an abstraction has the advantage of allowing different components of reinforcement learning to be brought together under the same construct. This was found helpful for two reasons:

- The synchrony and coherence between the different components achieved by this unified solution represented a slight enhancement of the agent's performance.

- The design of the different components was simplified to the design of a single construct, the robot's emotions.

Furthermore, the use of emotions provided a new perspective on these different task-dependent components of a reinforcement-learning framework. This resulted in the introduction of innovative mechanisms that were tested in the robot experiments. The most important innovations are in terms of the reinforcement function and the specification of state transition:

- a multidimensional reinforcement function that takes into consideration the different problems faced by the robot, with variable degrees of attention dependent on the robot's current priorities;

- a simplified definition of state transition based on the detection of significant events, captured by variation in the reinforcement function value.

The emotional system selects between different reinforcement functions according to the context of the world; that is, it might choose to ignore other problems that exist when faced with a more important one. For example, the reinforcement function might not punish the robot for its collision with an obstacle and instead reinforce it for successfully extracting energy from a light. The attribution by the reinforcement function of variable degrees of attention to each of the different problems might be taken as a source of confusion for the learning process. However, experiments showed that, instead of confusing the learning process, this was actually advantageous and that the learning algorithm was exploiting the nonlinearities of the reinforcement function.

The presented event-detection mechanism also profits from the novel structure of the reinforcement function. Apart from providing an absolute reinforcement value that varies with the robot's situation, the developed reinforcement function based on emotion also differentiates and prioritizes the different problems faced by the robot. This added information allows the detection of events when there is a difference in the type of dominant problem and not just in problem degree.

The work reported here was a first study in the interaction of emotions and control in a robot agent that could benefit from several extensions. The emotion model could be refined to have more complex emotions and the capability to learn new emotion associations, so that the learning agent could master more difficult tasks. Furthermore, the interaction between emotions and control could be extended to capture other


functionalities found in natural emotions, like influences on memory, reasoning, behavioral tendencies, and modulation of behavioral responses. The proper evaluation of these influences and a better understanding of the influences already studied would probably benefit from a more elaborate control architecture.

Note

1 Although this dependency is used in the experiments, aversive stimulation such as pain is more usually connected to anger in humans (e.g., Izard, 1993).

Acknowledgments

The first author was supported by a scholarship from the Portuguese program PRAXIS XXI for doing the work reported. We thank Gillian Hayes, Rosalind Picard, and the reviewers for helpful comments on earlier versions of this material. Facilities for this work were provided by the University of Edinburgh.

References

Albus, J. S. (1990). The role of world modeling and value judgment in perception. In A. Meystel, J. Herath, & S. Gray (Eds.), Proceedings of the 5th IEEE International Symposium on Intelligent Control. Los Alamitos, CA: IEEE Computer Society Press.
Araujo, A. F. R. (1994). Memory, emotions & neural networks. Unpublished doctoral dissertation, Sussex University.
Asada, M. (1996). An agent and an environment: A view on body scheme. In J. Tani & M. Asada (Eds.), Proceedings of the 1996 IROS Workshop on Towards Real Autonomy (pp. 19–24). Osaka: Senri Life Science Center.
Aubé, M. (1998). A commitment theory of emotions. In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 13–18). Menlo Park, CA: AAAI Press.
Bates, J. (1994). The role of emotions in believable agents (Tech. Rep. No. CMU-CS-94-136). Pittsburgh, PA: Carnegie Mellon University, School of Computer Science.
Bates, J., Loyall, A. B., & Reilly, W. S. (1992a). An architecture for action, emotion, and social behavior. In Artificial social systems: Fourth European Workshop on Modeling Autonomous Agents in a Multi-Agent World. Berlin: Springer. (Also available as Tech. Rep. No. CMU-CS-92-144. Pittsburgh, PA: Carnegie Mellon University, School of Computer Science.)
Bates, J., Loyall, A. B., & Reilly, W. S. (1992b). Integrating reactivity, goals, and emotion in a broad agent (Tech. Rep. No. CMU-CS-92-142). Pittsburgh, PA: Carnegie Mellon University, School of Computer Science.
Beaudoin, L., & Sloman, A. (1993). A study of motive processing and attention. In A. Sloman, D. Hogg, G. Humphreys, A. Ramsay, & D. Partridge (Eds.), Proceedings of AISB93 (pp. 229–238). Oxford: IOS Press.
Bechara, A., Damasio, H., Tranel, D., & Damásio, A. R. (1997). Deciding advantageously before knowing the advantageous strategy. Science, 275, 1293–1295.
Beck, R. C. (1983). Motivation: Theories and principles (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Blaney, P. (1986). Affect and memory. Psychological Bulletin, 99, 229–246.
Blumberg, B. (1996). Old tricks, new dogs: Ethology and interactive creatures. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
Bower, G. H. (1981). Mood and memory. American Psychologist, 36, 129–148.
Bower, G. H., & Cohen, P. R. (1982). Emotional influences in memory and thinking: Data and theory. In M. S. Clark & S. T. Fisk (Eds.), Affect and cognition: The Seventeenth Annual Carnegie Symposium on Cognition (pp. 291–331). Hillsdale, NJ: Erlbaum.
Bozinovski, S. (1982). A self-learning system using secondary reinforcement. In R. Trappl (Ed.), Cybernetics and systems (pp. 397–402). North-Holland: Elsevier.
Breazeal, C. (1998). Early experiments using motivations to regulate human-robot interaction. In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 31–36). Menlo Park, CA: AAAI Press.
Cañamero, D. (1997). Modeling motivations and emotions as a basis for intelligent behavior. In D. Cañamero (Ed.), Proceedings of the First International Symposium on Autonomous Agents, AA'97. Menlo Park, CA: ACM Press.
Cañamero, D. (1998). Issues in the design of emotional agents. In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 49–54). Menlo Park, CA: AAAI Press.
Charland, L. C. (1995). Emotion as a natural kind: Towards a computational foundation for emotion theory. Philosophical Psychology, 8, 59–84.
Cytowic, R. E. (1993). The man who tasted shapes. London: Abacus.
Damásio, A. R. (1994). Descartes' error: Emotion, reason and the human brain. London: Picador.
Darwin, C. (1965). The expression of the emotions in man and animals. Chicago: University of Chicago Press. (Original work published in 1872.)




Derryberry, D., & Tucker, D. M. (1994). Motivating the focus of attention. In P. M. Niedenthal & S. Kitayama (Eds.), The heart's eye (pp. 167–196). New York: Academic Press.
De Sousa, R. (1987). The rationality of emotion. Cambridge, MA: MIT Press.
Dreyfus, H. L. (1992). What computers still can't do: A critique of artificial reason. Cambridge, MA: MIT Press.
Dyer, M. G. (1987). Emotions and their computations: Three computer models. Cognition and Emotion, 1, 323–347.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.
Foliot, G., & Michel, O. (1998). Learning object significance with an emotion based process. In Workshop W5: Grounding Emotions in Adaptive Systems (pp. 25–30). Workshop conducted at SAB'98: Fifth International Conference on Simulation of Adaptive Behavior.
Frijda, N. H., & Swagerman, J. (1987). Can computers feel? Theory and design of an emotional system. Cognition and Emotion, 1(3), 235–257.
Gadanho, S. C. (1999). Reinforcement learning in autonomous robots: An empirical investigation of the role of emotions. Unpublished doctoral dissertation, University of Edinburgh.
Gadanho, S. C., & Hallam, J. (1998). Emotion-driven learning for animat control. In R. Pfeifer, B. Blumberg, J. A. Meyer, & S. W. Wilson (Eds.), From animals to animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (pp. 354–359). Cambridge, MA: MIT Press.
Gadanho, S. C., & Hallam, J. (2001). Emotion-triggered learning in autonomous robot control. In D. Cañamero, C. Numaoka, & P. Petta (Eds.), Grounding emotions in adaptive systems: A special issue of Cybernetics and Systems, 32(5), 531–559.
Goleman, D. (1995). Emotional intelligence. London: Bloomsbury Publishing.
Grossberg, S., & Gutowski, W. (1987). Neural dynamics of decision making under risk: Affective balance and cognitive-emotional interactions. Psychological Review, 94, 300–318.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Reading, MA: Addison-Wesley.
Izard, C. E. (1993). Four systems for emotion activation: Cognitive and noncognitive processes. Psychological Review, 100, 68–90.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Kaiser, S., & Wehrle, T. (1996). Situated emotional problem solving in interactive computer games. In N. H. Frijda (Ed.), Proceedings of the 8th Conference of the International Society for Research on Emotions, ISRE'96 (pp. 276–280). Storrs, CT: ISRE Publications.
Kitano, H. (1995). A model for hormonal modulation of learning. In C. Mellish (Ed.), IJCAI-95: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (Vol. 1, pp. 532–538). San Francisco: Morgan Kaufmann.
Klein, J. T. (1996). Computer response to frustration. Master's thesis, Massachusetts Institute of Technology, School of Architecture and Planning.
Lazarus, R. S. (1982). Thoughts on the relations between emotion and cognition. American Psychologist, 37, 1019–1024.
LeDoux, J. E. (1998). The emotional brain. London: Phoenix.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293–321.
Lin, L.-J. (1993). Reinforcement learning for robots using neural networks. Unpublished doctoral dissertation, Carnegie Mellon University. (Tech. Rep. No. CMU-CS-93-103.)
Mahadevan, S., & Connell, J. (1992). Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55, 311–365.
Mataric, M. J. (1994). Reward functions for accelerated learning. In W. W. Cohen & H. Hirsh (Eds.), Machine learning: Proceedings of the Eleventh International Conference (pp. 181–189). San Francisco, CA: Morgan Kaufmann.
McCauley, L., & Franklin, S. (1998). An architecture for emotion. In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 122–127). Menlo Park, CA: AAAI Press.
Michaud, F., Lachiver, G., & Dinh, C. T. L. (1996). A new control architecture combining reactivity, planning, deliberation and motivation for a situated autonomous agent. In P. Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, & S. W. Wilson (Eds.), From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (pp. 245–254). Cambridge, MA: MIT Press.
Michel, O. (1996). Khepera Simulator package version 2.0: Freeware mobile robot simulator written at the University of Nice Sophia-Antipolis. Downloadable from the World Wide Web at http://wwwi3s.unice.fr/~om/khep-sim.html.
Minsky, M. (1986). The society of mind. New York: Simon and Schuster.
Morignot, P., & Hayes-Roth, B. (1995). Why does an agent act? In M. Cox & M. Freed (Eds.), AAAI Spring Symposium on Representing Mental States and Mechanisms (pp. 97–101). Menlo Park, CA: AAAI Press. (Also available as report KSL 95-76 of Stanford University.)
Mowrer, O. H. (1960). Learning theory and behavior. New York: Wiley.




Niedenthal, P. M., Setterlund, M. B., & Jones, D. E. (1994). Emotional organization of perceptual memory. In P. M. Niedenthal & S. Kitayama (Eds.), The heart's eye (pp. 87–113). New York: Academic Press.
Ortony, A., Clore, G. L., & Collins, A. (1988). The cognitive structure of emotions. Cambridge: Cambridge University Press.
Panksepp, J. (1995). The emotional brain and biological psychiatry. Advances in Biological Psychiatry, 1, 263–288.
Pfeifer, R. (1982). Cognition and emotion: An information process approach (CIP Working Paper 436). Pittsburgh, PA: Carnegie-Mellon University, Department of Psychology.
Pfeifer, R., & Nicholas, D. W. (1985). Toward computational models of emotion. In L. Steels & J. A. Campbell (Eds.), Progress in artificial intelligence (pp. 184–192). Chichester, UK: Ellis Horwood.
Picard, R. (1997). Affective computing. Cambridge, MA: MIT Press.
Power, M., & Dalgleish, T. (1997). Cognition and emotion. Philadelphia: Psychology Press.
Rodriguez, M., & Muller, J.-P. (1995). Towards autonomous cognitive animats. In F. Morán, A. Moreno, J. Merelo, & P. Chacón (Eds.), Advances in artificial life: Proceedings of the Third European Conference on Artificial Life. Lecture Notes in Artificial Intelligence Vol. 929. Berlin: Springer-Verlag.
Seif El-Nasr, M., Ioerger, T. R., & Yen, J. (1998). Learning emotional intelligence in agents. In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 150–155). Menlo Park, CA: AAAI Press.
Shaver, P., Schwartz, J., Kirson, D., & O'Connor, C. (1987). Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52, 1061–1086.
Shibata, T., Ohkawa, K., & Tanie, K. (1996). Spontaneous behavior of robots for cooperation: Emotionally intelligent robot system. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 2426–2431). Tokyo, Japan.
Simon, H. A. (1967). Motivational and emotional controls of cognition. Psychological Review, 74, 29–39.
Sloman, A., Beaudoin, L., & Wright, I. (1994). Computational modeling of motive-management processes. In N. Frijda (Ed.), Proceedings of the Conference of the International Society for Research in Emotions (pp. 344–348). Cambridge, UK: ISRE Publications.
Sloman, A., & Croucher, M. (1981). Why robots will have emotions. In IJCAI-81: Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp. 2369–2371).
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.
Toda, M. (1993). The urge theory of emotion and cognition: Chapter 1, Emotions and urges (SCCS Tech. Rep. No. 93-1-01). Chukyo University.
Toda, M. (1994). Emotion, society and the versatile architecture (SCCS Tech. Rep. No. 94-1-02). Chukyo University.
Tomkins, S. S. (1984). Affect theory. In K. R. Scherer & P. Ekman (Eds.), Approaches to emotion. London: Erlbaum.
Velásquez, J. D. (1998). A computational framework for emotion-based control. In Workshop W5: Grounding Emotions in Adaptive Systems (pp. 62–67). Workshop conducted at SAB'98: Fifth International Conference on Simulation of Adaptive Behavior.
Ventura, R., Custódio, L., & Pinto-Ferreira, C. (1998). Emotions: The missing link? In D. Cañamero (Ed.), AAAI Fall Symposium on Emotional and Intelligent: The tangled knot of cognition (Tech. Rep. No. FS-98-03, pp. 170–175). Menlo Park, CA: AAAI Press.
Watkins, C. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, King's College, Cambridge.
Wehrle, T. (1998). Motivations behind modeling emotions and agents: Whose emotions does your robot have? In Workshop W5: Grounding Emotions in Adaptive Systems (pp. 71–76). Workshop conducted at SAB'98: Fifth International Conference on Simulation of Adaptive Behavior.
Wilson, S. W. (1991). The animat path to AI. In J.-A. Meyer & S. W. Wilson (Eds.), From animals to animats: Proceedings of the First Conference on Simulation of Adaptive Behavior (pp. 15–21). Cambridge, MA: MIT Press.
Wright, I. (1996). Reinforcement learning and animat emotions. In P. Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, & S. W. Wilson (Eds.), From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (pp. 273–281). Cambridge, MA: MIT Press.
Zajonc, R. B. (1984). On primacy of affect. In K. R. Scherer & P. Ekman (Eds.), Approaches to emotion. London: Erlbaum.




About the Authors

Sandra Clara Gadanho received a B.Sc. in informatics engineering in 1994 from the
Universidade Nova de Lisboa, Portugal, and a Ph.D. in artificial intelligence in 1999 from
the University of Edinburgh. She is currently a post-doctoral fellow at the Institute of
Systems and Robotics in Lisbon. The focus of her research is the use of emotions as a
way to increase the adaptiveness and autonomy of artificial agents in their interaction
with unstructured and dynamic environments. Other research interests include
reinforcement learning, neural networks, and biologically inspired robotics.

John Hallam graduated with first class honors in mathematics from the University of
Oxford in 1979, completed a Ph.D. in the department of artificial intelligence at the
University of Edinburgh in 1984, and joined the teaching faculty in that department in
1985. He is the co-director of the Mobile Robotics Research Group, having been active
in mobile robotics research for almost 20 years. The current focus of his catholic
research interest in robotics is in biological modeling using robotic techniques and evo-
lutionary robotics. He is the current president of the International Society for Adaptive
Behavior, a member of the IEE, of AISB, and of the London Mathematical Society.
Address: Department of Artificial Intelligence, University of Edinburgh, 5 Forrest Hill, Edinburgh EH1 2QL, Scotland. E-mail: [email protected]
