Asada 94 B
A Reinforcement Learning
1 Introduction
Due to its globally perceptive capability, vision seems indispensable for autonomous agents to acquire reactive and purposive behaviors through interactions with the environment. The existing deliberative and incremental approaches in computer vision, however, do not seem to have made great advances in this context: these methods often need a huge amount of computation time, which is fatal for real-time execution of robot tasks, and they offer general descriptions of the scene that may need still more time to be transformed into the specific descriptions required to accomplish the task at hand. Moreover, such general descriptions are hard to evaluate properly unless the task or purpose of the agent is specified. From this viewpoint, a purposive, task-oriented, or so-called behavior-based approach seems promising for evaluating the role of vision (when, where, and what kind of information is necessary, and how accurate it should be) and, finally, for realizing autonomous agents.

Figure 1: The basic model of robot-environment interaction (the agent and the environment exchange state s, action a, and reward r).
Reinforcement learning has recently been receiving increased attention as a method for robot learning that requires little or no a priori knowledge and provides a high capability for reactive and adaptive behaviors [6]. Fig.1 shows the basic model of robot-environment interaction, in which the robot and the environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process. The robot senses the current state of the environment and selects an action. Based on the state and the action, the environment makes a transition to a new state and generates a reward that is passed back to the robot. Through these interactions, the robot learns a purposive behavior to achieve a given goal.
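To make the interaction model of Fig.1 concrete, the following is a minimal sketch of the sense-act-reward loop in Python; the Environment class and all names here are illustrative assumptions, not part of the original system.

```python
import random

class Environment:
    """Illustrative stand-in for the world: maps (state, action) to (next state, reward)."""
    def __init__(self, states):
        self.states = states
        self.state = random.choice(states)

    def step(self, action):
        # A task-dependent stochastic transition and reward would go here.
        self.state = random.choice(self.states)
        reward = 1.0 if self.state == "goal" else 0.0
        return self.state, reward

def run_episode(env, policy, max_steps=100):
    """One discrete-time cycle per step: sense the state, select an action, receive a reward."""
    s = env.state
    for _ in range(max_steps):
        a = policy(s)              # the robot selects an action from the sensed state
        s_next, r = env.step(a)    # the environment transitions and passes back a reward
        # a learning rule would update the behavior from (s, a, r, s_next) here
        s = s_next
```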
Although the role of reinforcement learning is very important for realizing autonomous systems, the prominence of that role is largely determined by the extent to which it can be scaled to larger and more complex robot learning tasks. Many theoretical works have addressed the convergence time of the learning, how to speed it up with heuristics, and how to extend these techniques from single-goal tasks to multiple-goal tasks [7]. However, almost all of them have shown only computer simulations, and the few real robot applications that have been reported are simple and not very dynamic [8, 9]. In particular, the use of vision in reinforcement learning is very rare.
To the best of our knowledge, only Whitehead and Ballard [10] have argued this problem. Their task is a simple manipulation of blocks on a conveyer belt. Although each block is colored so that it can be easily discriminated, the state space is still large. To cope with this problem, they assumed that the observer could direct its gaze to an attended object so as to reduce the size of the state space. However, this causes the so-called “perceptual aliasing” problem: both the observer’s own motion and actual changes in the environment produce changes in the image captured by the observer, so it is difficult to discriminate between the two from the image alone. They therefore proposed a method to cope with this problem by adopting internal states and separating action commands into “Action frame” and “Attention frame” commands. Thus, they encoded a priori world knowledge into the state and action spaces.
In order to make the role of reinforcement learning evident in realizing autonomous agents, we need more applications in more dynamic and complex environments. In this paper, we propose a method by which a mobile robot acquires a purposive behavior, shooting a ball into a goal, through vision-based reinforcement learning. We apply the Q-learning method [11], one of the most widely used reinforcement learning schemes, to our problem. The robot is expected to learn a shooting behavior without world knowledge such as the 3-D locations and sizes of the goal and the ball or the kinematics and dynamics of the robot itself. All the information the robot can capture is the image positions of the ball and the goal, from which it can infer the changes in the world caused by its own actions.

The remainder of this article is structured as follows: In the next section, we give a brief overview of Q-learning. We then explain the task and the learning scheme of our method. Next, we show experiments with computer simulations and a real robot system. Finally, we give concluding discussions.

2 Q-learning

Before getting into the details of our system, we briefly review the basics of Q-learning. For a more thorough treatment, see [12]. We follow the explanation of Q-learning by Kaelbling [13].

We assume that the robot can discriminate the set S of distinct world states and can take the set A of actions on the world. The world is modeled as a Markov process, making stochastic transitions based on its current state and the action taken by the robot. Let T(s, a, s') be the probability that the world will make a transition to the next state s' from the current state-action pair (s, a). For each state-action pair (s, a), the reward r(s, a) is defined.

The general reinforcement learning problem is typically stated as finding a policy that maximizes the discounted sum of the rewards received over time. A policy f is a mapping from S to A. This sum is called the return and is defined as:

\[
\sum_{n=0}^{\infty} \gamma^{n} r_{t+n}, \tag{1}
\]

where r_t is the reward received at step t, given that the agent started in state s and executed policy f. γ is the discounting factor; it controls the degree to which rewards in the distant future affect the total value of a policy, and it is usually just slightly less than 1.
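As a concrete illustration of the return in Eq. (1) and of the role of γ, the short sketch below evaluates the discounted sum for a finite reward sequence; the example reward sequences are ours, chosen only to show the effect of the discount factor.

```python
def discounted_return(rewards, gamma):
    """Return of Eq. (1), truncated to a finite reward sequence r_t, r_{t+1}, ..."""
    return sum((gamma ** n) * r for n, r in enumerate(rewards))

# A single reward of 1 received after 5 steps versus after 10 steps:
for gamma in (0.6, 0.8, 0.999):
    short_path = discounted_return([0, 0, 0, 0, 1], gamma)   # goal reached on the 5th step
    long_path = discounted_return([0] * 9 + [1], gamma)      # goal reached on the 10th step
    print(gamma, round(short_path, 3), round(long_path, 3))
# With gamma near 1 the two returns are almost equal; with a smaller gamma
# the shorter path is clearly preferred.
```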
Given definitions of the transition probabilities and the reward distribution, we can solve for the optimal policy using methods from dynamic programming [14]. A more interesting case occurs when we wish to simultaneously learn the dynamics of the world and construct the policy. Watkins' Q-learning algorithm gives us an elegant method for doing this.

Let Q*(s, a) be the expected return, or action-value function, for taking action a in situation s and continuing thereafter with the optimal policy. It can be recursively defined as:

\[
Q^{*}(s, a) = r(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a' \in A} Q^{*}(s', a'). \tag{2}
\]
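When T and r are known, Eq. (2) can be solved by iterating it as a fixed-point update over all state-action pairs (value iteration on the action-value function), which is one of the dynamic-programming methods alluded to above. The sketch below is a minimal illustration under that assumption; the dictionary-based representation and the parameter values are ours.

```python
def q_value_iteration(states, actions, T, r, gamma=0.9, sweeps=1000):
    """Solve Eq. (2) by repeated sweeps of the recursion, given known T and r.

    T: dict mapping (s, a) to a dict {s_next: probability}
    r: dict mapping (s, a) to the immediate reward
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        Q = {
            (s, a): r[(s, a)] + gamma * sum(
                p * max(Q[(s_next, a_next)] for a_next in actions)
                for s_next, p in T[(s, a)].items()
            )
            for s in states for a in actions
        }
    return Q  # the optimal policy is then f(s) = argmax_a Q[(s, a)]
```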
Because we do not know T and r initially, we construct incremental estimates of the Q values online. Starting with Q(s, a) at an arbitrary value (usually 0), every time an action is taken the Q value is updated incrementally toward the received reward plus the discounted value of the resulting state.
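A minimal sketch of this incremental update in its standard (Watkins) form follows; the learning rate α and the ε-greedy action selection are assumptions introduced here for illustration.

```python
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.8):
    """One incremental Q-learning step after observing (s, a, r, s_next)."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def select_action(Q, s, actions, epsilon=0.1):
    """Epsilon-greedy exploration around the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```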
Figure 4: Number of goals in terms of γ (accumulated number of goals plotted against time step ×1000, for γ = 0.600, 0.800, and 0.999).
Fig.4 shows the accumulated number of shooting goals in terms of the temporal discount factor γ; the number of goals with the larger γ (0.999) is lower than that with the smaller ones (0.6 and 0.8). The reason is as follows. When the temporal discount factor γ is very close to 1 (almost no discount), the reward received after the goal is almost the same whichever path is selected, whereas if γ is small, the robot tries to take a shorter path, which yields a larger return. However, for too small a γ, the robot loses its way to the goal. Fig.5 shows some of the behaviors observed during the learning process. (a) and (b) show the difference between the shooting behaviors obtained with different γs. In (a),

Figure 6: A configuration of the real system (robot-mounted TV camera and transmitter; UHF receiver; Datacube MaxVideo 200 and DigiColor; MC68040 CPUs with parallel I/O, A/D, and D/A boards in a VME box; host Sun workstations on an Ethernet; radio controller and receiver for the vehicle).

Fig.6 shows a configuration of the real mobile robot system. The image taken by a TV camera mounted on the robot is transmitted to a UHF receiver and processed by a Datacube MaxVideo 200, a real-time pipeline video image processor. In order to simplify and speed up the image processing, we painted the ball red and the goal blue. We constructed the radio control system of the vehicle following the remote-brain project by Profs. Inaba and Inoue at the University of Tokyo [16]. The image processing and the vehicle control systems run under the VxWorks OS on MC68040 CPUs, which are connected with the host Sun workstations via Ethernet. A picture of the real robot with a TV camera (Sony Handycam TR-3) and a video transmitter is shown in Fig.3.
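Since the ball and the goal are color-coded (red and blue), their image positions and apparent sizes can be extracted by simple color thresholding. The following is an illustrative sketch of that idea in Python with NumPy; it is not the MaxVideo 200 pipeline used in the real system, and the threshold values are placeholder assumptions.

```python
import numpy as np

def detect_colored_region(rgb_image, channel, threshold=150, margin=30):
    """Centroid (x, y) and pixel area of the region where one color channel dominates.

    channel: 0 for the red ball, 2 for the blue goal (illustrative convention).
    """
    img = rgb_image.astype(np.int16)           # avoid uint8 overflow in the comparisons
    others = [c for c in range(3) if c != channel]
    mask = (img[:, :, channel] > threshold) & \
           (img[:, :, channel] > img[:, :, others[0]] + margin) & \
           (img[:, :, channel] > img[:, :, others[1]] + margin)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                            # the object is not visible in this frame
    return (xs.mean(), ys.mean()), xs.size     # image position and apparent size
```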
Figure 7: Detection of the ball and the goal. (a) input image; (b) detected image.

Table 1: State-Action data

time step   state step   ball state   goal state     action (L R)   error
    1            1         (C,F)       (C,F,Fo)          F F
    2            2         (R*,F)      (C,F,Fo)          F F           1
    3            3         (D*,D*)     (C,F,Ro*)         B B           3
    4            4         (C,F)       (C,F,Lo*)         B S           1
    5            5         (C,F)       (C,F,Fo)          F F
    6                      (C,F)       (C,F,Fo)          F F
    7                      (C,F)       (C,F,Fo)          F F
    8                      (C,F)       (C,F,Fo)          F F
    9            6         (C,F)       (C,F,Ro*)         B S           1
   10            7         (C,F)       (C,F,Fo)          F F
   11            8         (C,F)       (R,M,Fo)          F F
   12            9         (R,F)       (R,M,Fo)          F F
   13           10         (R,M*)      (R,F*,Lo*)        F B           3
   14           11         (L*,F)      (R,M,Ro*)         F S           2
   15           12         (L*,F)      (R,M,Fo)          F S           1
   16           13         (R,M)       (R,M,Fo)          S B
   17           14         (C,M)       (C,M,Fo)          F F
   18           15         (L,M)       (L,M,Fo)          S F
   19           16         (L,N)       (L,M,Fo)          B S
   20                      (L,N)       (L,M,Fo)          B S
   21           17         (L,M*)      (L,M,Fo)          S F           1
   22           18         (L,N)       (L,M,Fo)          B S
   23                      (L,N)       (L,M,Fo)          B S
   24           19         (C,N)       (C,M,Fo)          F B
   25           20         (C,M)       (C,M,Fo)          F F
   26                      (C,M)       (C,M,Fo)          F F
   27           21         (C,M)       (C,N,Fo)          F S
   28           22         (C,M)       (C,M*,Lo*)        F S           2
   29           23         (C,M)       (C,M*,Ro*)        S B           2
   30           24         (C,F)       (D,D,D)           F S