0% found this document useful (0 votes)

7 views24 pages

1705-Article Text-1701-1-10-20080129

Uploaded by

karimhafez37

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views24 pages

1705-Article Text-1701-1-10-20080129

Uploaded by

karimhafez37

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

AI Magazine Volume 24 Number 2 (2003) (© AAAI)

Articles

Learning-Assisted
Automated Planning
Looking Back, Taking Stock,
Going Forward

Terry Zimmerman and Subbarao Kambhampati

■ This article reports on an extensive survey and spanning some 30 years attests that it is an in-
analysis of research work related to machine learn- teresting, broad, and fertile field in which
ing as it applies to automated planning over the learning techniques can be applied to advan-
past 30 years. Major research contributions are tage. We focus here on this learning-in-plan-
broadly characterized by learning method and
ning research and utilize both tables and graph-
then descriptive subcategories. Survey results re-
veal learning techniques that have extensively
ic maps of existing studies to spotlight the
been applied and a number that have received combinations of planning-learning methods
scant attention. We extend the survey analysis to that have received the most attention as well as
suggest promising avenues for future research in those that have scarcely been explored. We do
learning based on both previous experience and not attempt to provide, in this limited space, a
current needs in the planning community. tutorial of the broad range of planning and
learning methodologies, assuming instead that
the interested reader has at least passing famil-
iarity with these fields.

I
n this article, we consider the symbiosis of A cursory review of the state of the art in
two of the most broadly recognized hall- learning in planning during the early to mid-
marks of intelligence: (1) planning—solving 1990s reveals that the primary impetus for
problems in which one uses beliefs about ac- learning was to make up for often debilitating
tions and their consequences to construct a se- weaknesses in the planners themselves. The
quence of actions that achieve one’s goals— general-purpose planning systems of even a
and (2) learning—using past experience and pre- decade ago struggled to solve simple problems
cepts to improve one’s ability to act in the fu- in the classical benchmark domains; blocks
ture. Within the AI research community, ma- world problems of 10 blocks lay beyond their
chine learning is viewed as a potentially capabilities as did most logistics problems
powerful means of endowing an agent with (Kodtratoff and Michalski 1990; Minton 1993).
greater autonomy and flexibility, often com- The planners of the period used only weak
pensating for the designer’s incomplete knowl- guidance in traversing their search spaces, so it
edge of the world that the agent will face and is not surprising that augmenting the systems
incurring low overhead in terms of human to learn some such guidance was often a win-
oversight and control. If we view a computer ning strategy. Relative to the largely naïve base
program with learning capabilities as an agent, planner, the learning-enhanced systems dem-
then we can say that learning takes place as a onstrated improvements in both the size of
result of the interaction of the agent and the problems that could be addressed and the
world and observation by the agent of its own speed with which they could be solved (Kamb-
decision-making processes. Planning is one hampati, Katukam, and Qu 1996; Leckie and
such decision-making process that such an Zukerman 1998; Minton et. al. 1989; Veloso
agent might undertake, and a corpus of work and Carbonell 1993).

Copyright © 2003, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2003 / $2.00 SUMMER 2003 73
Articles

With the advent of several new genres of Where Learning

planning systems in the past five to six years,
the entire base-performance level against Might Assist Planning
which any learning-augmented system must In a number of ways, automated planning pre-
compare has shifted dramatically. It is argu- sents a fertile field for the application of ma-
ably a more difficult proposition to accelerate chine learning. The simple (STRIPS) planning
a planner in this generation by outfitting it problem itself has been shown to be PSPACE
with some form of online learning because complete (Bylander 1992); thus, for planning
the overhead cost incurred by the learning systems to handle problems large enough to be
system can overwhelm the gains in search ef- of interest, they must greatly reduce the size of
ficiency. This, in part, might explain why the the search space they traverse. Indeed, the
planning community appears to have paid great preponderance of planning research,
less attention to learning in recent years. from alternate formulations of the planning
From the machine learning–community per- problem to the design of effective search
spective, Langley (1997, p. 18) remarked on heuristics, can be seen as addressing this prob-
the swell of research in learning for problem lem of pruning the search space. It is therefore
solving and planning that took place in the not surprising that the earliest and most wide-
1980s as well as to note the subsequent tail- spread application of learning to automated
off: “One source is the absence of robust algo- planning has focused on the aspect of expedit-
rithms for learning in natural language, plan- ing solution search.
ning, scheduling, and configuration, but As automated planning advanced beyond
these will come only if basic researchers re- solving trivial problems, the issue of plan qual-
gain their interest in these problems.” ity received increased attention. Although
Of course, interest in learning within the there are often many valid plans for a given
planning community should not be limited to problem, generating one judged acceptable by
anticipated speedup benefits. As automated the user or optimizing over several quality met-
planning has advanced its reach to the point rics can increase the complexity of the plan-
where it can cross the threshold from toy prob- ning task immensely. A learning-augmented
lems to some interesting real-world applica- planning system that can perceive a user’s pref-
tions, a variety of issues come into focus. They erences and bias its subsequent search accord-
range from dealing with incomplete and un- ingly offers a means of reducing this complex-
certain environments to developing an effec- ity. Learning seems to have an obvious role in
tive interface with human users. mixed-initiative planning, where it might be
Our purpose in this article is to develop, us- imperative to perceive and accommodate the
ing an extensive survey of published work, a expertise, preferences, and idiosyncrasies of
broad perspective of the diverse research that humans. Finally, expanding our view to a real-
has been conducted to date in learning in world situation in which a planning system
planning and to conjecture about profitable might operate, we are likely to confront uncer-
directions for future work in this area. The re- tainty as a fact of life, and complete and robust
mainder of the article is organized into three domain theories are rare. As we show, the study
parts: (1) what learning is likely to be of assis- of machine learning methods in planning ap-
tance in automated planning, (2) what roles proaches that address uncertainty is in its in-
has learning actually played in the relevant fancy.
planning research conducted to date, and (3) Machine learning offers the promise of ad-
where might the research community gainful- dressing such issues by endowing the planning
ly direct its attentions in the near future. In system with the ability to profit from observa-
the section entitled Where Learning Might As- tion of its problem space and its decision-mak-
sist Planning, we describe a set of five dimen- ing experience, whether or not its currently
sions for classifying learning-in-planning sys- preferred decision leads to success. However, to
tems with respect to properties of both the actually realize this promise within a given ap-
underlying planning engine and the learning plication challenges the planning system de-
component. By mapping the breadth of the signer on many fronts. Success is generally
surveyed work along these dimensions, we re- heavily dependent on complex relationships
veal some underlying research trends, pat- and interconnections between planning and
terns, and possible oversights. This mapping learning. In figure 1, we suggest five dimen-
motivates our speculation in the final section sions that capture perhaps the most important
on some promising directions for such re- of these system design issues: (1) type of plan-
search in the near future, given our current ning problem, (2) approach to planning, (3)
generation of planning systems. goal for the learning component, (4) planning-

74 AI MAGAZINE
Articles

Planning Aspects Learning Aspects

Problem Type Planning Planning- Learning Type of Learning

Approach Learning Phase
Goal Analytic
Classical Planning State Space Search
Explanation-Based
Static world Learning
Deterministic Conjunctive SS
[PRODIGY, HSPR, Speed Static Analysis
Fully observable
FF…1] Up Before Planning Starts and Abstractions
Instantaneous
Actions Planning Analogic
Disjunctive SS
Propositional
[GRAPHPLAN, Derivational Analogy
STAN, IPP…] / Case Based

Beyond Classical Inductive

Modes Plan Space Search
Decision Tree
[SNLP, TWEAK,
UCPOP…] Improve
Plan During Planning Process Inductive Logic
Quality Programming

Neural Network
Compilation
Approaches Bayesian Learning

Full-Scope Planning Reinforcement

CSP Learn Learning
Dynamic world [CP-CSP …]
or Improve
Stochastic During Plan Execution
Domain Multistrategy
Partially observable
SAT Theory
Asynchronous Goals
[SATPLAN, Analytic and Induction
Metric/Continuous BLACKBOX ...]

EBL and Inductive

Integer Logic Programming
Programming
EBL and Reinforce-
ment Learning

Figure 1. Five Dimensions Characterizing Automated Planning Systems Augmented with a Learning Component.
CSP = constraint-satisfaction programming. EBL = explanation-based learning. SAT = satisfiability.

execution phase in which learning is conduct- corpus of work to date and the difficulty of vi-
ed, and (5) type of learning method. sualizing and presenting patterns and relation-
We hope to show that this set of dimensions ships in high-dimensional data, we settled on
is useful in both gaining useful perspective on the five dimensions of figure 1 as the most re-
the work that has been done in learning-aug- vealing. Before reporting on the literature sur-
mented planning and speculating about prof- vey, we briefly discuss each of these dimen-
itable directions for future research. Admitted- sions.
ly, these are not independent or orthogonal
dimensions; they also do not make up an ex- Planning Problem Type
haustive list of relevant factors in the design of The nature of the environment in which the
an effective learning component for a given planner must conduct its reasoning defines
planner. Among other candidate dimensions where a given problem lies in the continuum
that could have been included are type of plan of classes from classical to full-scope planning.
(for example, conditional, conformant, serial, Here, classical planning refers to a world model
or parallel actions), type of knowledge learned in which fluents are propositional, and they
(domain or search control), learning impetus don’t change unless the planning agent acts to
(data driven or knowledge driven), and type of change them, all relevant attributes can be ob-
organization (hierarchical or flat). Given the served at any time, the impact of executing an

SUMMER 2003 75
Articles

action on the environment is known and de- state-space search (Kambhampati 2000). BLACK-
terministic, and the effects of taking an action BOX (Kautz and Selman 1999) uses GRAPHPLAN’s
occur instantly. If we relax all these constraints disjunctive representation of states and itera-
such that fluents can take on a continuous tively converts the search into a SAT problem.
range of values (for example, metric), a fluent
might change its value spontaneously or for Goal of Planner’s
reasons other than agent actions—for example, Learning Component
the world has hidden variables, the exact im- There is a wide variety of targets that the learn-
pact of acting cannot be predicted, and actions ing component of a planning system might
have durations—then we are in the class of aim toward, such as learning search control
full-scope planning problems. In between these rules, learning to avoid dead-end or unpromis-
extremes lies a wide variety of interesting and ing states, or improving an incomplete domain
practical planning problem types, such as clas- theory. As indicated in figure 1, they can be
sical planning with a partially observable categorized broadly into one of three groups:
world (for example, playing poker) and classi- (1) learning to speed up planning, (2) learning
cal planning where actions realistically require to elicit or improve the planning domain the-
significant periods of time to execute (for ex- ory, or (3) learning to improve the quality of
ample, logistics domains). The difficulty with the plans produced (where quality can have a
even the classical planning problem is that it wide range of definitions).
largely occupied the full attention of the re-
Learning and Improving Domain Theo-
search community until the past few years. The
ry Automated planning implies the presence
current extension into various neoclassical,
of a domain theory—the descriptions of the ac-
temporal, and metric planning modes has been
tions available to the planner. When an exact
spurred in part by impressive advances in auto-
model of how an agent’s actions affect its
mated planning technology over the past six
world is unavailable (a nonclassical planning
years or so.
problem), there are obvious advantages to a
Planning Approach planner that can evolve its domain theory by
learning. Few interesting environments are
Planning as a subfield of AI has roots in Newell
simple and certain enough to admit a complete
and Simon’s 1960-era problem-solving system,
model of their physics, so it’s likely that even
GPS , and theorem proving. At a high level,
“the best laid plans” based on a static domain
planning can be viewed as either a problem
theory will occasionally (that is, too often) go
solver or theorem prover. Planning methods
astray. Each such instance, appropriately fed
can further be seen as either search processes or
back to the planner, provides a learning oppor-
model checking. Among planners most com-
tunity for evolving the domain theory toward
monly characterized by search mode, there are
a version more consistent with the actual envi-
two broad categories: (1) search in state space
ronment in which its plans must succeed.
and (2) search in a space of plans. It is possible
to further partition current state-space plan- Even in classical planning, the designer of a
ners into those that maintain a conjunctive problem domain generally has many valid al-
state representation and those that search in a ternative ways of specifying the actions, and it
disjunctive representation of possible states. is well known that the exact form of the action
Planners most generally characterized as descriptions can have a large impact on the ef-
model checkers (although they also conduct ficiency of a given planner on a given problem.
search) involve recompiling the planning prob- Even if the human designer can identify some
lem into a representation that can be tackled by of the complex manner in which the actions in
a particular problem solution engine. These sys- a domain description will interact, he/she will
tems can be partitioned into three categories: likely be faced with trade-offs between efficien-
(1) satisfiability (SAT), constraint-satisfaction cy and factors such as compactness, compre-
problems (CSPs), and integer linear program- hensibility, and expressiveness.
ming (IP). Figure 1 lists these three different Planning Speedup In all but the most
methods along with representative planning trivial of problems, a planner will have to con-
systems for each. These categories are not en- duct considerable search to construct a solu-
tirely disjoint for purposes of classifying plan- tion, in the course of which it will be forced to
ners because some systems use a hybrid ap- backtrack numerous times. The primary goals
proach or can be viewed as examples of more of speedup learning are to avoid unpromising
than one method. GRAPHPLAN (Blum and Furst portions of the search space and bias the
1997), for example, can be seen as either a dy- search in directions most likely to lead to
namic CSP or as a conductor for disjunctive high-quality plans.

76 AI MAGAZINE
Articles

Improving Plan Quality This category drawn based on the following formulations of
ranges from learning to bias the planner to- the learning problem:
ward plans with a specified attribute or metric Inductive learning: The learner is confront-
value to learning a user’s preferences in plans ed with a hypothesis space H and a set of train-
and variations of mixed-initiative planning. ing examples D. The desired output is a hy-
pothesis h from H that is consistent with these
Planning Phase in Which training examples.
Learning Is Conducted Analytic learning: The learner is confront-
At least three opportunities for learning pre- ed with the same hypothesis space and train-
sent themselves over the course of a planning ing examples as for inductive learning. Howev-
and execution cycle: (1) before planning starts, er, the learner has an additional input: a
(2) during the process of finding a valid plan, domain theory B composed of background
and (3) during the execution of a plan. knowledge that can be used to help explain ob-
served training examples. The desired output is
Learning before Planning Starts Before
a hypothesis h from H that is consistent with
the solution search even begins, the specifica-
both the training examples D and the domain
tion of the planning problem itself presents
theory B.
learning opportunities. This phase is closely
Understanding the advantages and disad-
connected to the aspect of learning and im-
vantages of applying a given machine learning
proving the domain theory but encompasses
technique to a given planning system can help
only preprocessing of a given domain theory. It
to make sense of any research bias that be-
is done offline and produces a modified do-
comes apparent in the survey tables. The pri-
main that is useful for all future domain prob-
mary types of analytic learning systems devel-
lems.
oped to date, along with their relative
Learning during the Process of Finding strengths and weaknesses and an indication of
a Valid Plan Planners capable of learning their inductive biases, are listed in table 1. The
in this mode have been augmented with some major types of pure inductive learning systems
means of observing their own decision-making are similarly described in table 2. Admittedly,
process. They then take advantage of their ex- the various subcategories within these tables
perience during planning to expedite the fur- are not disjoint, and they don’t nicely partition
ther planning or improve the quality of plans the entire class (inductive or analytic).
generated. The learning process itself can ei- The research literature itself conflicts at
ther be online or offline. times about what constitutes learning in a giv-
Learning during the Execution of a Plan en implementation, so tables 1 and 2 reflect
A planner has yet another opportunity to im- the decisions made in this regard for this study.
prove its performance when it is an embedded The classification scheme we propose for
component of a system that can execute a plan learning-augmented planning systems is per-
and provide sensory feedback. A system that haps most inadequate when it comes to rein-
seeks to improve an incomplete domain theory forcement learning. We discuss this special case,
would conduct learning in this phase, as might in which planning and learning are inextricably
a planner seeking to improve plan quality intertwined, in the sidebar “Reinforcement
based on actual execution experience. The Learning: The Special Case.”
learning process itself can either be online or Analogical learning is only represented in
offline. table 1 by a specialized and constrained form
known as derivational analogy and the closely
Type of Learning related case-based reasoning formulism. More
The machine learning techniques themselves flexible and powerful forms of analogy can be
can be classified in a variety of ways, irrespec- envisioned (compare Hofstadter and Marshall
tive of the learning goal or the planning phase [1996, 1993]), but the lack of active research in
they might be used in. Two of the broadest tra- this area within the machine learning commu-
ditional class distinctions that can be drawn nity effectively eliminates more general analo-
are between so-called inductive (or empirical) gy as a useful category in our learning-in-plan-
methods and deductive (or analytic) methods. ning survey.
In figure 1, we broadly partition the machine The three columns for each technique given
learning–techniques dimension into these two in tables 1 and 2 give a sense of the degree to
categories along with a multistrategy ap- which the method can be effective when ap-
proach. We then consider additional properties plied to a given learning problem, in our case,
that can be used to characterize a given meth- automated planning. Two columns summarize
od. The inductive-deductive classification is the relative strengths and weaknesses of each

SUMMER 2003 77
Articles

Analytic Technique Models Strengths Weaknesses

Nogood Learning Inconsistent states Simple, fast learning Low strength learning—each
(Memoization, and sets of fluents Generally low computational nogood typically prunes small
Caching) overhead sections of search space
Practical, widely used Difficult to generalize across
problems
Memory requirements can be
high

Search control rules Uses a domain theory—the available Requires a domain theory—
Explanation-Based Domain refinement background knowledge incorrect domain theory can
Learning (EBL) Can learn from a single training lead to incorrect deductions
example
If-then rules are generally intuitive Rule utility problem
(readable)
Widely used

Static Analysis and Existing problem / Performed “offline”, benefits Benefits vary greatly
Abstractions Learning domain invariants or generally available for all subsequent depending on domain
structure problems in domain. and problem

Derivational Analogy / Similarity between Holds potential for shortcutting Large space required as case
Case-Based Reasoning current state and much planning effort where similar library builds
(CBR) previously cataloged problem states arise frequently. Case-matching overhead
states Extendable to full analogy? Revising old plan can be
costly

Table 1. Characterization of the Most Common Analytic Learning Techniques.

technique. The column headed Models refers ly justified hypotheses. The logical justifica-
to the type of function or structure that the tions fall short when the prior knowledge is
method was designed to represent or process. A flawed, and the statistical justifications are sus-
method chosen to learn a particular function is pect when data are scarce, or assumptions
not well suited if it is either incapable of ex- about distributions are questionable.
pressing the function or is inherently much We next consider the learning-in-planning
more expressive than required. This choice of work that has been done in light of the charac-
representation involves a crucial trade-off. A terization structure given in figure 1.
very expressive representation that allows the
target function to be represented as close as What Role Has Learning
possible will also require more training data to
choose among the alternative hypotheses it Played in Planning?
can represent. We report here the results of an extensive sur-
The heart of the learning problem is how to vey of AI research literature focused on applica-
successfully generalize from examples. Analyti- tions of machine learning techniques to plan-
c learning leans on the learner’s background ning. Research in the area of machine learning
knowledge to analyze a given training instance goes back at least as far back as 1959, with
to discern the relevant features. In many do- Arthur Samuel’s (1959) checkers-playing pro-
mains, such as the stock market, complete and gram that improved its performance through
correct background knowledge is not available. learning. It is noteworthy that perhaps the first
In these cases, inductive techniques that can work in what was to become the AI field of
discern regularities over many examples in the planning (STRIPS [Fikes and Nilsson 1971]) was
absence of a domain model can prove useful. quickly followed by a learning-augmented ver-
One possible motivation for adopting a multi- sion that could improve its performance by an-
strategy approach is that analytic learning alyzing its search experience (Fikes, Hart, and
methods generate logically justified hypothe- Nilsson 1972). Space considerations preclude
ses, but inductive methods generate statistical- an all-inclusive survey for this 30-year span,

78 AI MAGAZINE
Articles

Inductive Technique Models Strengths Weaknesses

Discrete-valued Robust to noisy data, Approximating real-valued

Decision Tree Learning functions, classification missing values or vector-valued, functions
problems Learns disjunctive clauses (essentially propositional)
If-then rules are easily Incapable of learning
understandable relational predicates
Practical, widely used

Artificial Discrete-, real-, and Robust to noisy and complex Long training times are
Neural Networks vector-valued functions data, errors in data common; learned target
function is largely
inscrutable

Inductive Logic First-order logic, theories Robust to noisy data, Large training sample size
Programming as logic programs missing values. might be needed to acquire
More expressive than effective set of predicates
propositional-based learners Rule utility problem
Able to generate new
predicates.
If-then rules (Horn clauses)
are easily understandable

Bayesian Learning Probabilistic inference Readily combine prior Require large initial
Hypotheses that make knowledge with observed probability sets
probabilistic predictions data High computational cost to
Modifies hypothesis obtain Bayes's optimal
probability incrementally hypothesis
based on each training
example.

Reinforcement Learning Control policy to Domain theory not required Depends on a real-valued
maximize rewards. Handling actions with non- reward signal for each
Fits the MDP setting deterministic outcomes transition
Optimal policy from Difficulty handling large
nonoptimal training sets, state spaces. Convergence
facilitates life-long learning can be slow, space
requirements can be huge

Table 2. Characterization of the Most Common Inductive Learning Techniques.

but we wanted to list either seminal studies in visual mapping of the studies’ demographics
each category or a typical representative study along the five dimensions.
if the category has many. We discuss each of these representations in
It is difficult to present the survey results in the following subsections.
2-dimensional (2D) format such that the five
dimensions represented in figure 1 are usefully Survey Tables according to Learning
reflected. We used three different formats, em- Type and Planning Type
phasizing different combinations and order- Table 3A deals with studies focused primarily
ings of the figure 1 dimensions: on analytic (deductive) learning in its various
First is a set of three tables organized around forms, and table 3B is concerned with induc-
just two dimensions: (1) type of learning and tive learning. Table 3C addresses studies and
(2) type of planning. multistrategy systems that aim at some combi-
Second is a set of tables reflecting all five di- nation of analytic and inductive techniques.
mensions for each relevant study in the survey. All studies and publications appearing in these
Third is a graphic representation providing a tables are listed in full in the reference section.

SUMMER 2003 79
Articles

Analytic General Planning Applications

Learning Applications
State Space Plan Space Compilation
(Conjunctive / Disjunctive) (CSP / SAT / IP)

Static / Domain Learning Abstractions Smith and Peot

Analysis and Sacerdoti (1974) ABSTRIPS (1993) [SNLP]
Abstractions Knoblock (1990) ALPINE Gerevini and
Schubert (1996)
Static Analysis, Domain Invars [UCPOP]
Dawson and Siklossy (1977)
REFLECT
Etzioni (1993) STATIC
[PRODIGY]
Perez and Etzioni (1992)
DYNAMIC (with EBL)[PRODIGY]
Nebel, Koehler, and Dimo-
poulos (1997) RIFO
Gerevini and Schubert (1998)
DISCOPLAN
Fox and Long (1998, 1999)
STAN/ TIM [GRAPHPlan]
(Rintanen 2000)

Explanation-Based General Problem Fikes and Nilsson (1972) Chien (1989) Wolfman and Weld
Learning (EBL) Solving (Chunking) STRIPS Kambhampati, (1999) LPSAT
Laird et al. (1987) Minton et al. (1989) PRODIGY Katukam, and Qu [RELSAT]
SOAR (1996) UCPOP-EBL
Gratch and DeJong (1992) Nogood Learning
Horn Clause Rules COMPOSER [PRODIGY] Kautz and Selman
Kedar-Cabelli (1987) Bhatnagar and Mostow (1994) (1999) BLACKBOX
Prolog-EBG FAILSAFE (using RELSAT)
Borrajo and Veloso (1997) Do and Kambham-
Symbolic Integration HAMLET (See also multistrategy) pati (2001) GP-CSP
Mitchell et al. (1986) [GRAPHPlan]
LEX-2 (See also multi- Kambhampati (2000)
strategy) GRAPHPlan-EBL

Permissive Real-World Plans

Bennett and DeJong (1996)
GRASPER

Analogical Jones and Langley

(1995) EUREKA Learning Various
Abstraction-Level Cases
Microdomain
Bergmann and Wilke (1996)
Analogy Maker PARIS
Hofstadter and
Marshall (1993, 1996) User-Assisted Planning
COPYCAT Avesani, Perini, and Ricci
(2000) CHARADE
Conceptual Design
Sycara et al. (1992) CBR Derivational
Case-Based CBR Derivational
CADET Ihrig and Kamb-
Reasoning Veloso and Carbonell (1993)
PRODIGY / ANALOGY hampati (1996)
Legal Reasoning
[UCPOP]
by Analogy CBR Transformational
Ashley and McLaren Hammond (1989) CHEF With EBL
(1995) TRUTH- Ihrig and Kamb-
TELLER PRIAR (Kambhampati and
Hendler 1992) hampati (1997)
Ashley and Aleven [UCPOP]
(1997) CATO SPA (Hanks and Weld 1995)
Leake, Kinley, and Wilson
Kakuta et al. (1997)
(1996) (see also multistrategy)

Table 3A. Analytic Learning Applications and Studies.

Studies in heavily shaded blocks concern planners applied to problems beyond classical planning. Implemented system and pro-
gram names appear in all caps, and underlying planners and learning subsystems appear in small caps but enclosed in brackets.

80 AI MAGAZINE
Articles

Planning Applications
Inductive Learning General Applications
State Space Plan Space Compilation
(Conjunctive / Disjunctive) (CSP / SAT / IP)

Propositional Concept Learning Learning Operators for Real-

Decision Trees Hunt, Marin, and Stone (1966) World Robotics, Clustering
CLS Schmill, Oates, and Cohen
(2000) [TBA for inducing
General DT Learning decision tree]
Quinlan (1986) ID3
Quinlan (1993) C4.5
Khardon (1999) L2ACT
Cohen and Singer (1999)
SLIPPER

Real-Valued Hinton (1989)

Neural Network
Symbolic Rules from NN
Craven and Shavlik (1993)

Reflex/Reactive
Pomerleau (1993) ALVINN

First-Order Logic Hornlike Clauses Leckie and Zukerman (1998) Estlin and Mooney Huang, Selman, and
Inductive Logic Quinlan (1990) FOIL GRASSHOPPER [PRODIGY] (1996) (See also Kautz (2000) (See also
multistrategy) multistrategy)
Programming (ILP) Muggleton and Feng (1990) Zelle and Mooney (1993) (See
GOLEM also multistrategy)
Lavrac, Dzeroski, and Grobel- Reddy and Tadepalli (1999)
nik (1991) LINUS ExEL

Bayesian Learning Train Bayesian Belief Networks,

Unobserved Variables
Dempster, Laird, and Rubin
(1977) EM

Text Classification
Lang (1995) NEWSWEEDER

Predict Run Time of Problem

Solvers for Decision-Theoretic
Control
Horvitz et al. (2001)

Other Inductive Action Strategies and Rivest’s

Learning Decision List Learning
Khardon (1999)
Martin and Geffner (2000)

Plan Rewriting
Ambite, Knoblock, and Minton
(2000) PBR

Reinforcement Sutton (1988) TD / TDLAMBDA (Dietterich and Flann 1995)

Learning (RL) Watkins (1989) Q Learning (See also multistrategy)

Real-Time Dynamic Incremental Dynamic

Programming Programming
Barto, Bradtke, and Singh Sutton (1991) DYNA
(1995) Planning with learned operators:
Dearden, Friedman, and Garcia-Martinez and Borrajo
Russel (1998) Bayesian Q (2000) LOPE
Learning

Table 3B. Inductive Learning Applications and Studies.

CSP = constraint-satisfaction programming. DT = decision tree. IP = integer linear programming. NN = neural network. SAT = satis-
fiability. Studies in heavily shaded blocks feature planners applied to problems beyond classical planning. Implemented system and
program names appear in all caps, and underlying planners and learning subsystems appear in small caps but enclosed in brackets.

SUMMER 2003 81
Articles

Planning Applications
Multistrategy General Applications
Learning State Space Plan Space Compilation
Conjunctive/Disjunctive [ CSP / SAT/ IP ]

Analytic and Symbolic Integration Learn / Refine Operators

Inductive Mitchell, Keller, and Kedar-Cabelli Carbonell and Gil (1990),
(1986) LEX-2 Gil (1994) EXPO
[PRODIGY]
Learn CSP Variable Ordering
Wang (1996a, (1996b)
Zweban et al. (1992) GERRY
OBSERVER [PRODIGY]
Incorporate Symbolic Knowledge McCluskey, Richardson,
in Neural Networks and Simpson (2002)
Shavlik and Towell (1989) KBANN OPMAKER
Fu (1989)
EBL and Induction:
Learn Horn Clause Sets Calistri-Yeh, Segre, Sturgill
Focused by Domain Theory (1996) ALPS
Pazzani, Brunk, and Silverstein
CBR and Induction
(1991) FOCL
Leake, Kinley, and Wilson
Refining Domain Theories (1996) DIAL
Using Empirical Data Borrajo and Veloso (1997)
Ourston and Mooney (1994) HAMLET [PRODIGY]
EITHER Zimmerman and
Kambhampati (1999, 2002)
Neural Networks and Fuzzy EGBG, PEGG [GRAPHPLAN]
Logic to Implement Analogy
Hollatz (1999) Deduction, Induction,
and Genetic
Genetic, Lazy RL, Aler, Borrajo, and Isasi
k-Nearest Neighbor (1998) HAMLET-EvoCK
Sheppard and Salzberg (1995) [PRODIGY]
Aler and Borrajo (2002)
HAMLET-EvoCK
[PRODIGY]

Explanation-Based Domain Theory Cast in Neural

Learning and Network Form
Neural Networks Mitchell and Thrun (1995) EBNN

Explanation-Based Search Control for Logic Programs Zelle and Mooney (1993) Estlin and Mooney EBL, ILP, and Some
Learning and Cohen (1990) AxA-EBL DOLPHIN [PRODIGY/FOIL] (1996) SCOPE Static Analysis
Inductive Logic Zelle and Mooney (1993) [FOIL] Huang, Selman, and
Programming DOLPHIN [FOIL/PRODIGY] Kautz (2000)
[BLACKBOX-FOIL]

Explanation-Based
Dietterich and Flann (1997)
Learning and
EBRL Policies
Reinforcement
Learning

Table 3C. Multistrategy Learning Applications and Studies.

CSP = constraint-satisfaction programming. DT = decision tree. EBL = explanation-based learning. IP = integer linear programming. NN =
neural network. RL = reinforcement learning. SAT: satisfiability. Studies in the heavily shaded blocks feature planners applied to problems
beyond classical planning. Implemented system and program names appear in all caps, and underlying planners and learning subsystems
appear in small caps but enclosed in brackets.

The table rows feature the major learning studies and implementations of the learning
types outlined in tables 1 and 2, occasionally technique in the first column. These General
further subdivided as indicated in the leftmost Applications were deemed particularly rele-
column. The second column contains a listing vant to planning, and of course, the list is
of some of the more important nonplanning highly abridged. Comparing the General Ap-

82 AI MAGAZINE
Articles

plications column with the Planning columns main theory. Also obvious is the extent to
for each table provides a sense of which ma- which research has focused on learning prior
chine learning methods have been applied to or during planning, with scant attention
within the planning community. The three paid to learning during plan execution.
columns making up the Planning Applications
partition subdivide the applications into state Graphic Analysis of Survey
space; plan space; and CSP, SAT, and IP plan- There are obvious limitations to what can read-
ning. Studies dealing with planning problems ily be gleaned from any tabular presentation of
beyond classical planning (as defined in Plan- a data set across more than two or three dimen-
ning Problem Type earlier) appear in shaded sions. To more easily visualize patterns and re-
blocks in these tables. lationships in learning-in-planning work, we
Table 3C, covering multistrategy learning, have devised a graphic method of depicting
reflects the fact that the particular combina- the corpus of work in this survey with respect
tion of techniques used in some studies could to the five dimensions given in figure 1. Figure
not always be easily subcategorized relative to 2 illustrates this method of depiction by map-
the analytic and inductive approaches of tables ping two studies from the survey onto a ver-
3A and 3B. This is often the case, for example, sion of figure 1.
with an inductive learning implementation In this manner, every study or project cov-
that exploits the design of a particular plan- ered in the survey has been mapped onto at
ning system. Examples include HAMLET (Borrajo least one 5-node, directed subgraph of figure 3
and Veloso 1997), which exploits the search (classical planning systems) or figure 4 (sys-
tree produced by the PRODIGY 4.0 planning systems designed to handle problems beyond the
tem to lazily learn search control heuristics, classical paradigm). The edges express which
and EGBG and PEGG (Zimmerman and Kamb- combinations of the figure 1 dimensional at-
hampati 2002, 1999), which exploit GRAPH- tributes were actually realized in a system cov-
PLAN’s use of the planning graph structure to ered by the survey.
learn to shortcut the iterative search episodes. Besides providing a visual characterization
Studies such as these appear in table 3c under of the corpus of research in learning in plan-
the broader category, analytic and inductive. ning, this graphic presentation mode permits
In addition to classifying the studies sur- quick identification of all planner-learning sys-
veyed along the learning-type and planning- tem configurations that embody any of the as-
type dimensions, these tables illustrate several pects of the five dimensions (nodes). For exam-
foci of this corpus of work. For example, the ple, because the survey tables don’t show all
preponderance of research in analytic learning possible values in each dimension’s range, as-
as it applies to planning rather than inductive pects of learning in planning that have re-
learning styles is apparent, as is the heavy ceived scant attention are not obvious until
weighting in the area of state-space planning. one glances at the graphs, which entails simply
We return to such issues when discussing im- observing the edges incident on any given
plications for future research in the final sec- node. Admittedly, a disadvantage of this pre-
tion. sentation mode is that the specific planning
system associated with a given subgraph can-
Survey Tables Based on not be extracted from the figure alone. Howev-
All Five Dimensions er, the tables can assist in this regard.
The same studies appearing in tables 3A, 3B, Learning within the Classical Planning
and 3C are tabulated in tables 4A and 4B ac- Framework Figure 3 indicates with dashed
cording to all five dimensions in figure 1. We lines and fading those aspects (nodes) of the
have used a block structure within the tables to five dimensions of learning in planning that
emphasize shared attribute values wherever are not relevant to classical planning. Specifi-
possible, given the left-to-right ordering of the cally, Learning or Improving the Domain The-
dimensions. Here, the two dimensions not rep- ory is inconsistent with the classical planning
resented in the previous set of tables, “Plan- assumption of a complete and correct domain
ning-Learning Goal” and “Learning Phase,” are theory. Similarly, the strength of reinforcement
ordered first, so this block structure reveals the learning lies in its ability to handle stochastic
most about the distribution of work across at- environments in which the domain theory is
tributes in these dimensions. It’s apparent that either unknown or incomplete. (Dynamic pro-
the major focus of learning-in-planning work gramming, a close cousin to reinforcement
has been on speedup, with much less attention learning methods, requires a complete and per-
given to the aspects of learning to improve fect domain theory, but because of efficiency
plan quality or building and improving the do- considerations, it has remained primarily of

SUMMER 2003 83
Articles

Dimensions
Planning/ Learning Phase Planning Planning Systems / Studies
Type of Learning
Learning Goal Approach
. Analytic Plan space Smith and Peot (1993) [SNLP]
. . Static analysis Gerevini and Schubert (1996) [UCPOP]
. Before planning . Etzioni (1993) STATIC [PRODIGY]
. starts
. Dawson and Siklossy (1977) REFLECT
. .
Nebel, Koehler, and Dimopoulos (1997) RIFO
. .
State space Fox and Long (1998, 1999), Rintanen (2000)
. . STAN / TIM [GrRAPHPLAN]
. .
Static analysis: . Sacerdoti (1974) ABSTRIPS
. Learn abstractions . Knoblock (1990) ALPINE [PRODIGY]
.
Before and during Static analysis and . Perez and Etzioni (1992) DYNAMIC
.
planning EBL [PRODIGY]
.
Analytic . Fikes and Nilsson (1972) STRIPS
. . . Minton (1989) PRODIGY/EBL
Speedup
. . State space Gratch and DeJong (1992) COMPOSER
.
. . . [PRODIGY]
.
. EBL . Bhatnagar (1994) FAILSAFE
.
. . Kambhampati (2000) GRAPHPLAN-EBL
.
. . Plan space Chein (1989)
.
. . Kambhampati, Katukam, and Qu (1996)
.
. . UCPOP-EBL
.
During planning .
. (Compilation) Nogood Learning
. Kautz and Selman (1999) BLACKBOX
. SAT
.
.
. LP & SAT Wolfman and Weld (1999) LPSAT [RELSAT]
.
.
.
. CSP Nogood Learning
.
. Do and Kambhampati (2001) GP-CSP
.
. [GRAPHPLAN]
.
. Analytic Learning Various Abstraction-Level Cases
.
. Analogical . Bergmann and Wilke (1996) PARIS
.
. .
. User Assist Planning
. Case-Based . Avesani, Perini, and Ricci (2000) CHARADE
.
. Reasoning State space
Transformational Analogy / Adaptation
. Hammond (1989) CHEF
. Kambhampati and Hendler (1992) PRIAR
. Hanks and Weld (1995) SPA
. Leake, Kinley, and Wilson (1996) DIAL
.
Derivational Analogy / Adaptation
Veloso and Carbonell (1993) PRODIGY /
ANALOGY

Plan space Ihrig and Kambhampati (1996) [UCPOP]

With EBL
Ihrig and Kambhampati (1997) [UCPOP]

Table 4A. Survey Studies Mapped across All Five Dimensions, Part 1.
CSP = constraint-satisfaction programming. EBL = explanation-based learning. LP = linear programming. SAT = satisfiability. Stud-
ies in heavily shaded blocks feature planners applied to problems beyond classical planning. Implemented system and program
names appear in all caps, and underlying planners and learning subsystems appear in small caps but enclosed in brackets.

84 AI MAGAZINE
Articles

Dimensions
Planning Systems / Studies
Planning / Planning
Learning Phase Type of Learning
Learning Goal Approach
Inductive: . Leckie and Zuckerman (1998)
. . Inductive logic State space GRASSHOPPER [PRODIGY]

. . programming (ILP) Reddy and Tadepalli (1999) ExEL

. . . Action Strategies and Rivest’s
. . Other induction . Decision List Learning of Policies
Khardon (1999)
. . . Martin and Geffner (2000)
. .
. . Calistri-Yeh, Segre, and Sturgill (1996)
Multistrategy . ALPS
. During planning
. State space CBR and Induction
Speedup . Analytic and . Leake, Kinley, and Wilson (1996) DIAL
inductive .
. .
Zimmerman and Kambhampati (1999)
. . EGBG [Graphplan]
.
. Before and EBL, ILP, (Compilation) Huang, Selman, and Kautz (2000)
. during planning and static analysis SAT [BLACKBOX/FOIL]

.
. Zelle and Mooney (1993) DOLPHIN
.
. State space [PRODIGY/FOIL]
.
. EBL and ILP
Plan space Estlin and Mooney (1996) SCOPE [FOIL]
.

. . Borrajo and Veloso (1997) HAMLET

. EBL and . [PRODIGY]

Speedup During planning inductive . Deductive-Inductive and Genetic

and . State space HAMLET-EvoCK (PRODIGY) (Aler and
improve plan . . Borrajo 1998, 2002)
quality . Zimmerman and Kambhampati (2002)
PEGG [GRAPHPLAN]
.
Inductive . Plan Rewriting
(Analysis of plan . Ambite, Knoblock, and Minton (2000) PbR
differences)

. Learning Operators for Real World Robotics,

Before planning (Propositional) State space Clustering
starts Schmill, Oates, and Cohen (2000) TBA for
decision trees
inducing decision tree
Learn or improve
domain theory . Permissive Real-World Plans
During plan Analytic: . Bennett and DeJong (1996) GRASPER
execution EBL

. Learning / Refining Operators

Multistrategy: . Wang (1996a, 1996b) OBSERVER
[PRODIGY]
Analytic and .
inductive McClusky, Richardson, and Simpson
(2002) OPMAKER
Carbonell and Gil (1990); Gil (1994) EXPO
[PRODIGY]

Learn or improve . EBL and RL State space EBRL Dietterich and Flann (1997)
domain theory .
and During planning . Incremental Dynamic Programming
Inductive:
improve plan . Sutton (1991) DYNA
. Reinforcement
quality .
. learning Planning with Learned Operators
Garcia-Martinez and Borrajo (2000) LOPE

Table 4B. Survey Studies Mapped across All Five Dimensions, Part 2.
EBL = explanation-based learning. ILP = inductive logic programming. RL = reinforcement learning. SAT = satisfiability. Studies
in heavily shaded blocks feature planners applied to problems beyond classical planning. Implemented system and program
names appear in small caps, and underlying planners and learning subsystems appear in small caps but enclosed in brackets.

SUMMER 2003 85
Articles

Planning Aspects Learning Aspects

Type of Learning
Problem Planning Planning- Learning Phase
Type Approach Learning Analytic
Goal Explanation-Based
State Space Search Learning
Classical Planning [Conjunctive /
Static Analysis
Static world Disjunctive] Before Planning
and Abstractions
Deterministic Starts
Speed
Fully observable Up Analogic
Instantaneous Planning Derivational Analogy
Actions Plan Space Search / Case-Based
Propositional

Inductive
Beyond Classical Decision Tree
Modes
Compilation During Planning Inductive Logic
Approaches Process Programming

Improve Neural Network

CSP Plan
Quality Bayesian Learning

SAT Other Induction

Full-Scope Planning
Dynamic world Reinforcement
Stochastic Learning
Learn
Partially observable LP or Improve
Durative actions Domain During Plan
Asynchronous Goals Theory Execution Multistrategy
Metric/Continuous
Analytic and Induction

EBL and Inductive

Logic Programming

EBL and Reinforce-

ment Learning

Figure 2. Example Graphic Mapping of Two Learning-in-Planning Systems.

EBL = explanation-based learning. ILP = inductive logic programming. The layout of the five dimensions in figure 1 and their range of values
can be used to map the research work covered in the survey tables. By way of example, the PRODIGY-EBL system (Minton et al. 1989) is rep-
resented by the top connected series of gray lines; it’s a classical planning system that conducts state-space search, and it aims to speed up
planning using explanation-based learning (EBL) during the planning process. Tracing the subgraph from the left, the edge picks up the
line thickness at the State-Space node and the gray shade of the Speed-Up Planning node. The SCOPE system (Estlin and Mooney 1996) is
then represented as the branched series of thicker (2 pt) lines. SCOPE is a classical planning system that conducts plan-space search, and the
goal of its learning subsystem is to both speed up planning and improve plan quality. Thus, the Plan-Space node branches to both Planning-
Learning Goal nodes. All SCOPE’s learning occurs during the planning process, using both EBL and inductive logic programming (ILP). As
such, the edges converge at the during planning process node, but both edges persist to connect with the EBL and ILP node.

theoretical interest with respect to classical ning. Not surprisingly, learning in the third
planning.) phase, during plan execution, is not a focus for
Broadly, the figure indicates that some form classical planning scenarios because this mode
of learning has been implemented with all has clear affinity with improving a faulty do-
planning approaches. If we consider the Learn- main theory—a nonclassical problem.
ing Phase dimension of figure 3, it is obvious It is apparent, based on the figure 3 graph in
that the vast majority of the work to date has combination with the survey tables, that ex-
focused on learning conducted during the planation-based learning (EBL) has been exten-
planning process. Work in automatic extrac- sively studied and applied to every planning
tion of domain-specific knowledge through approach and both relevant planning-learning
analysis of the domain theory (Fox and Long goals. This is perhaps not surprising given that
1999, 1998; Gerevini and Schubert 1998) con- planning presumes the sort of domain theory
stitutes the learning conducted before plan- that EBL can readily exploit. Perhaps more no-

86 AI MAGAZINE
Articles

Planning Aspects Learning Aspects

Type of Learning
Problem Planning Planning- Learning Phase
Type Approach Learning Analytic
Goal Static Analysis
State Space Search and Abstractions
Classical Planning [Conjunctive /
Explanation-Based
Static world Disjunctive] Before Planning
Learning
Deterministic Starts
Speed
Fully observable Up Analogic
Instantaneous Planning Case-Based Reasoning
Actions Plan Space Search (Derivational/Trans-
Propositional formational Analogy)

Inductive
Beyond Classical Decision Tree
Modes
Compilation During Planning Inductive Logic
Approaches Process Programming

Improve Neural Network

CSP Plan
Quality Bayesian Learning

SAT Other Induction

EBL and Inductive

Logic Programming

EBL and Reinforce-

ment Learning

Figure 3. Mapping of the Survey Planning-Learning Systems for Classical

Planning Problems on the Figure 1 Characterization Structure.

table is the scant attention paid to inductive of figure 4 for each combination. Learning in a
learning techniques for classical planners. Al- dynamic, stochastic world is the natural do-
though ILP has extensively been applied as a main of reinforcement learning systems, and
learning tool for planners, other inductive as discussed earlier, this popular machine
techniques such as decision tree learning, learning field does not so readily fit our five-di-
neural networks, and Bayesian learning, have mensional learning-in-planning perspective.
seen few planning applications. Figure 4 therefore represents reinforcement
Learning within a Nonclassical Planning learning in a different manner than the other
Framework Figure 4 covers planning systems approaches; a single shade, brick crosshatch set
designed to learn in the wide range of problem of edges is used to span the five dimensions.
classes beyond the classical formulation The great majority of reinforcement learning
(shown in shaded blocks in tables 3A, 3B, and systems to date adopt a state-space perspective,
3C and 4A and 4B). There are, as yet, far fewer so there is an edge skirting this node. With re-
such learning-augmented systems, although spect to the planning-learning goal dimension,
this area of planning community interest is reinforcement learning can be viewed as both
growing. Those “beyond classical planning” “improving plan quality” (the process moves
systems that exist extend the classical planning toward the optimal policy) and “learning the
problem in a variety of different ways, but be- domain theory” (it begins without a model of
cause of space considerations, we have not re- transition probability between states). This
flected these variations with separate versions view is reflected in figure 4 as the vertical rein-

SUMMER 2003 87
Articles

Planning Aspects Learning Aspects

Problem Type Planning Planning- Learning Type of Learning

Approach Learning Phase
Goal
Analytic
State Space Search
Classical Planning [Conjunctive / Static Analysis
Static world Disjunctive] and Abstractions
Deterministic Speed Before Planning
Fully observable up Starts Explanation-Based
… planning Learning

Plan Space Search Case Based Reasoning

Beyond Classical
Planning Inductive
Dynamic world
Stochastic Improve Decision Tree
Compilation
… Plan
Approaches Inductive Logic
Quality
Programming

EBL and Inductive

Logic Programming

EBL and Reinforce-

ment Learning

Figure 4. Mapping of the Survey Planning-Learning Systems for

Beyond-Classical Planning Problems on the Figure 1 Characterization Structure.

forcement learning edge spanning these nodes Where to for Learning in

under the planning-learning goal dimension.
Finally, because reinforcement learning is both
Automated Planning?
rooted in interacting with its environment and We organize this discussion of promising direc-
takes place during the process of building the tions for future work in this field along two
plan, there is a vertical edge spanning these broad partitions: (1) apparent gaps in the cor-
nodes under the learning-phase dimension. pus of learning-in-planning research as sug-
Beyond reinforcement learning systems, fig- gested by the survey tables and figures of this
ure 4 suggests at least three aspects to the learn- report and (2) recent advances in planning that
suggest a role for learning notably beyond the
ing-in-planning work done to date for nonclas-
modes investigated by existing studies.
sical planning problems, all fielded systems
plan using state-space search, most systems Research Gaps Suggested
conduct learning during the plan execution by the Survey
phase, and EBL is again the learning method of
There are significant biases apparent in the fo-
choice. It is also notable that the only decision cus and distribution of the survey studies rela-
tree learning conducted in any planner is based tive to the five dimensions we have defined. To
in a nonclassical planning system. an extent, these biases are to be expected be-
With this overview of where we have been cause some configurations of planning-learn-
with learning in planning, we next turn our ating methods are intrinsically infeasible or
tention to open issues and research directions poorly matched (for example, learning domain
that beckon. theory in a classical planning context or com-

88 AI MAGAZINE
Articles

bining reinforcement learning with SAT, which faster processing of grounded versions involv-
does not capture the concept of a state). In as- ing only propositions. The cost of rule check-
sessing the survey tables here, however, we ing and matching in more recent systems that
seek learning-in-planning configurations that use grounded operators is much lower than for
are feasible, have been largely ignored, and ap- planning systems that handle uninstantiated
pear to hold promise. variables.
Nonanalytic Learning Techniques The Not conceding these hurdles to be insur-
survey tables suggest a considerable bias to- mountable, we suggest the following research
ward analytic learning in planning, which de- approaches:
serves to be questioned. Why is analytic learn- One trade-off associated with a move to
ing so favored? In a sense, a planner using EBL planning with grounded operators is the loss of
is learning guaranteed knowledge, control infor- generality in the basic precepts that are most
mation that is provably correct. However, it is readily learned. For example, GRAPHPLAN can
well known within the machine learning com- learn a great number of “no goods” during
munity that approximately correct knowledge search on a given problem, but in their basic
can be at least as useful, particularly if we’re form, they are only relevant to the given prob-
careful not to sacrifice completeness. Given the lem. GRAPHPLAN retains no interproblem mem-
presence of a high-level domain theory, it is ory. It is worth considering what might consti-
reasonable to exploit it to learn. However, large tute effective interproblem learning for such a
constraints are placed on just what can be system.
learned if the planner doesn’t also take advan- The rule utility issue faced by analytic learn-
tage of the full planning search experience. ing systems (and possibly all systems that learn
The tables and figures of this study indicate the search control rules) can be viewed as the prob-
extent to which ILP has been used in this spirit lem of incurring the cost of a large set of sound,
together with EBL. This is a logical marriage of exact, and probably overspecific rules. Learn-
two mature methodologies; ILP in particular ing systems that can reasonably relax the
has powerful engines for inducing logical ex- soundness criterion for learned rules can move
pressions, such as FOIL (Quinlan 1990), that can broadly toward a problem goal using generally
readily be employed. It is curious to note, how- correct search control. Some of the multistrat-
ever, that decision tree learning has been used egy studies reflected in table 3C are relevant to
in only one study in this entire survey, yet this this view to the extent that they attempt to
inductive technique is at least as mature and leverage the strengths of both analytic and in-
features its own very effective engines such as ductive learning techniques to acquire more
ID 3 and C 4.5 (Quinlan 1993, 1986). In the useful rules. Initial work with an approach that
1980s, decision tree algorithms were generally does not directly depend on a large set of train-
not considered expressive enough to capture ing examples was reported in Kambhampati
complex target concepts (such as under what (1999). Here, a system is described that seeks to
conditions to apply an operator). However, giv- learn approximately correct rules by relaxing
en subsequent evolutions in both decision tree the constraint of the UCPOP-EBL system that re-
methods and the current opportunities for quires regressed failure explanations from all
learning to assist the latest generation of plan- branches of a search subtree before a search
ners, the potential of decision tree learning in control rule is constructed.
planning merits reconsideration. Perhaps the most ambitious approach to
Learning across Problems A learning as- learning across problems would be to extend
pect that has largely fallen out of favor in re- some of the work being done in analogical rea-
cent years is the compilation and retention of soning elsewhere in AI to the planning field.
search guidance that can be used across differ- The goal is to exploit any similarity between
ent problems and perhaps even different do- problems to speed up solution finding. Current
mains. One of the earliest implementations of case-based reasoning implementations in plan-
this took the form of learning search control ning are capable of recognizing a narrow range
rules (for example, using EBL). There might be of similarities between an archived partial plan
two culprits that led to disenchantment with and the current state the planner is working
learning this interproblem search control: from. Such systems cannot apply knowledge
First is the utility problem that can surface learned in one logistics domain, for example,
when too many, or relatively ineffective rules to another system—even though a human
are learned. would find it natural to use what he/she has
Second is the propositionalization of the plan- learned in solving an AIPS planning competi-
ning problem, wherein lifted representations tion driver log problem to a depot problem. We
of the domain theory were forsaken for the note that transproblem learning has been ap-

SUMMER 2003 89
Articles

proached from a somewhat different direction

in Fox and Long (1999) using a process of iden-
tifying abstract types during domain prepro-
Reinforcement cessing.
Extending Learning to Nonclassical

Learning Planning Problems The preponderance of

planning research has been based in classical
planning, as is borne out by the survey tables

The Special Case and figures. Historically, this weighting arose

because of the need to study a less daunting
problem than full-scope planning, and much
of the progress realized in classical planning
has indeed provided the foundation for ad-
In the context of the figure 1 dimensions for a learning-in- vances now being made in nonclassical formu-
planning system, reinforcement learning must be seen as a lations. It is a reasonable expectation that the
special case. Unlike the other learning types, this widely body of work in learning methods adapted to
studied machine learning field is not readily characterized as classical planning will similarly be modified
a learning technique for augmenting a planning system. Es- and extended to nonclassical planning sys-
sentially, it’s a toss up whether to view reinforcement learn- tems. With the notable exception of reinforce-
ing as a learning system that contains a planning subsystem ment learning, the surface has scarcely been
with a learning component. Reinforcement learning is de- scratched in this regard.
fined more clearly by characterizing a learning problem in- If, as we suggest in the introduction, the re-
stead of a learning technique. cent striking advances in speed for state-of-the-
A general reinforcement learning problem can be seen as art planning systems lies behind the relative
composed of just three elements: (1) goals an agent must paucity of current research in speedup learn-
achieve, (2) an observable environment, and (3) actions an ing, the focus might soon shift back in this di-
agent can take to affect the environment (Sutton and Barto rection. These systems, impressive though they
1998). Through trial-and-error online visitation of states in are, demonstrated their speedup abilities in
its environment, such a reinforcement learning system seeks classical planning domains. As the research at-
to find an optimal policy for achieving the problem goals. tention shifts to problems beyond the classical
When reinforcement learning is applied to a planning prob- paradigm, the greatly increased difficulty of
lem, a fourth element, the presence of a domain theory, the problems themselves seems likely to renew
comes into play. The explicit model of the valid operators is planning community interest in speed-up
used to direct the exploration of the state space, and this learning approaches.
space is used (together with the reward associated with each
state), in turn, to refine the domain theory. Because, in prin- New Avenues for Learning in
ciple, the “exact domain theory” is never acquired, reinforce- Planning Motivated by Recent
ment learning has been termed a “lifelong learning process.” Developments in Planning
This aspect stands in sharp contrast to the assumption in Recent advances in planning research suggest
classical planning that the planner is provided a complete several aspects of the new generation of plan-
and perfect domain theory. ners for which machine learning methods
Because of the tightly integrated nature of the planning might provide important enhancements. We
and learning aspects of reinforcement learning, the five-di- discuss here three such avenues for learning in
mensional view of figure 1 is not as useful for characterizing planning: (1) offline learning of domain
implemented reinforcement learning-planning systems as it knowledge, (2) learning to improve heuristics,
is for other learning-augmented planners. Nonetheless, and (3) learning to improve plan quality.
when we analyze the survey results in the next section, we
will map planning-oriented reinforcement learning work on- Offline Learning of Domain Knowledge
to this dimensional structure for purposes of comparison We have previously noted the high overhead
with the other nine learning techniques that have been (or cost of conducting learning online during the
could be) used to augment planning systems. course of solving a single problem, relative to
often-short solution times for the current gen-
eration of fast and efficient planners. This
handicap might help explain more recent in-
terest in offline learning, such as domain
analysis, which can be reused to advantage
over a series of problems within a given do-
main. The survey results and figure 3 also sug-
gest an area of investigation that has so far
been neglected in studies focused on nonclas-

90 AI MAGAZINE
Articles

sical planning—the learning of domain invari- planner at key decision points in its search. As
ants before planning starts. This static analysis such, considerable research effort is focusing
has been shown to be an effective speedup ap- on finding more effective domain-indepen-
proach for many classical planning domains, dent heuristics and tuning heuristics to partic-
and there is no reason to believe it cannot sim- ular problems and domains. The role that
ilarly boost nonclassical planning. learning might play in acquiring or refining
On another front, there has been much en- such heuristics has largely been unexplored. In
thusiasm in parts of the planning community particular, learning such heuristics inductively
for applying domain-specific knowledge to during the planning process would seem to
speed up a given planner (for example, TL PLAN hold promise. Generally, the heuristic values
[Bacchus and Kabanza 2000] and BLACKBOX are calculated by a linear combination of
[Kautz and Selman 1998]). This advantage has weighted terms where the designer chooses
also been realized in hierarchical task network both the terms and their weights in hopes of
(HTN) planning systems by supplying domain- obtaining an equation that will be robust
specific task-reduction schemas to the planner across a variety of problems and domains. The
(SHOP [Nau et al. 1999]). Such leveraging of search trace (states visited) resulting from a
user-supplied domain knowledge has been problem-solving episode could provide the
shown to greatly decrease planning time for a negative and positive examples needed to train
variety of domains and problems. One draw- a neural network or learn a decision tree. Pos-
back of this approach is the burden it places on sible target functions for inductively learning
the user to correctly hand code the domain or improving heuristics include term weights
knowledge ahead of time and in a form usable that are most likely to lead to higher-quality
by the particular planner. Offline learning solutions for a given domain, term weights
techniques might be exploited here. If the user that will be most robust across many domains,
provides very high-level domain knowledge in attributes that are most useful for classifying
a format readily understandable by humans, states, exceptions to an existing heuristic such
the system could learn in supervised fashion to as used in LRTA* (Korf 1990), and a metalevel
operationalize this background knowledge to function that selects or modifies a search
the particular formal representation usable by heuristic based on the problem or domain.
a given target planning system. If the user is Multistrategy learning might also play a role
not to be burdened with learning the planner’s in that the user might provide background
low-level language for knowledge representa- knowledge in the form of the base heuristic.
tion, this approach might entail solving sam- The ever-growing cadre of planning ap-
ple problems iteratively with combinations of proaches and learning tools, each with their
these domain rules to determine both correct- own strengths and weaknesses, suggests an-
ness and efficacy. other inviting direction for speedup learning.
An interesting related issue is the question of Learning a rule set or heuristic that will direct
which types of knowledge are easiest and hard- the application of the most effective approach
est to learn, which has a direct impact on the (or multiple approaches) for a given problem
types of knowledge that might actually be could lead to a metaplanning system with ca-
worth learning. The closely related machine pabilities well beyond any individual planner.
learning aspect of sample complexity addresses Interesting steps in this direction have been
the number and type of examples that are taken by Horvitz et al. (2001) using the con-
needed to induce a given concept or target struction and use of Bayesian models to predict
function. To date, the relative difficulty of the run time of various problem solvers.
learning tasks has received little attention with Learning to Improve Plan Quality The
respect to the domain-specific knowledge used survey tables and figures suggest that the issue
by some planners. What are the differences in of improving plan quality using learning has
terms of the sample complexity of learning dif- received much less attention in the planning
ferent types of domain-specific control knowl- community than speedup learning. However,
edge? For example, it would be worth catego- because planning systems are ported into real-
rizing the TL PLAN control rules versus the world applications, this concern is likely to be
SHOP/HTN–style schemas in terms of their sam- a primary one. Many planning systems that
ple complexity. successfully advance into the marketplace will
Learning to Improve Heuristics The need to interact frequently with human users
credit for both the revival of plan-space plan- in ways that have received scant attention in
ning and the impressive performance of most the lab. Such users are likely to have individual
state-space planners in recent years goes largely biases with respect to plan quality that they
to the development of heuristics that guide the can be hard pressed to quantify. These plan-

SUMMER 2003 91
Articles

might be that the best way to tailor an interac-

tive planner will be in the manner of the pro-
gramming-by-demonstration systems that
have recently received attention in the ma-
chine learning community (Lau, Domingos,
Most
and Weld 2000). Such a system implemented
IDEAL
Current on top of a planner might entail having the
Degree of Automation

Planning user create plans for several problems that the

Systems learning system would then parse to learn plan
aspects peculiar to the particular user.

Summary and Conclusion

User‘s
Priority
We have presented the results of an extensive
survey of research conducted and published
since the first application of learning to auto-
mated planning was implemented some 30
years ago. In addition to compiling categorized
Coverage of Real-World Aspects tables of the corpus of work, we have presented
a five-dimensional characterization of learning
in planning and mapped the studies onto it.
This process has clarified the foci of the work in
Figure 5. The Coverage versus Automation Trade-Off in Planning Systems.
this area and suggested a number of avenues
along which the community might reasonably
proceed in the future. It is apparent that auto-
ning systems could be charted in a 2D space
mated planning and machine learning are
with the following two axes: (1) degree of cov-
well-matched methodologies in a variety of
erage of the issues confronted in a real-world
configurations, and we suggest there are a
problem, that is, the capability of the system to
number of these approaches that merit more
deal with all aspects of a problem without ab-
research attention than they have received to
stracting them away and (2) degree of automa-
date. We have expanded on several of these
tion, that is, the extent to which the system au-
possibilities and offered our conjectures about
tomatically reasons about the various problem
where the more interesting work might lie.
aspects and makes decisions without guidance
by the human user. References
Figure 5 shows such a chart for current-day
Aler, R., and Borrajo, D. 2002. On Control Knowl-
planning systems. The ideal planner plotted on edge Acquisition by Exploiting Human-Computer
this chart would obviously lie in the top-right Interaction. Paper presented at the Sixth Internation-
corner. It is interesting to note that most al Conference on Artificial Intelligence Planning and
users—aware that they can’t have it all—prefer Scheduling. 23–27 April, Toulouse, France.
a system that can, in some sense, handle most Aler, R.; Borrajo, D.; and Isasi, P. 1998. Genetic Pro-
aspects of the real-world problem at the ex- gramming and Deductive-Inductive Learning: A
pense of full automation. However, most cur- Multistrategy Approach. Paper presented at the Fif-
rent-day planning systems abstract away large teenth International Conference on Machine Learn-
portions of the real-world problem in favor of ing, ICML ’98, 24–27 July, Madison, Wisconsin.
fully automating what the planner can actually Ambite, J. L.; Knoblock, C. A.; and Minton, S. 2000.
accomplish. In large practical planning envi- Learning Plan Rewriting Rules. Paper presented at
ronments, fully automated planning is neither the Fifth International Conference on Artificial Intel-
ligence Planning and Scheduling, 14–17 April,
feasible nor desirable because users want to ob-
Breckenridge, Colorado.
serve and control plan generation.
Ashley, K., and Aleven, V. 1997. Reasoning Symboli-
Some planning systems such as HICAP
cally about Partially Matched Cases. In Proceedings
(Munoz-Avila et al. 1999) and ALPS (Calistri- of the Fifteenth International Joint Conference on
Yeh, Segre, and Sturgill 1996) have made in- Artificial Intelligence, IJCAI-97, 335–341. Menlo
roads toward building an effective interface Park, Calif.: International Joint Conferences on Arti-
with their human users. No significant role for ficial Intelligence.
learning has been established yet for such sys- Avesani, P.; Perini, A.; and Ricci, F. 2000. Interactive
tems, but possibilities include learning user Case-Based Planning for Forest Fire Management.
preferences with respect to plan actions, inter- Applied Intelligence 13(1): 41–57.
mediate states, and pathways. Given the hu- Barto, A.; Bradtke, S.; and Singh, S. 1995. Learning to
man inclination to “have it their way,” it Act Using Real-Time Dynamic Programming. Artifi-

92 AI MAGAZINE
Articles

cial Intelligence 72(1): 81–138. ings of the Fifth International Joint Conference on
Ashley, K. D., and McLaren, B. 1995. Reasoning with Artificial Intelligence, 465–471. Menlo Park, Calif.:
Reasons in Case-Based Comparisons. In Proceedings International Joint Conferences on Artificial Intelli-
of the First International Conference on Cased-Based gence.
Reasoning (ICCBR-95), 133–144. Berlin: Springer. Dearden, R.; Friedman, N.; and Russell, S. 1998.
Bacchus, F., and Kabanza, F. 2000. Using Temporal Bayesian Q-Learning. In Proceedings of the Fifteenth
Logics to Express Search Control Knowledge for Plan- National Conference on Artificial Intelligence (AAAI-
ning. Artificial Intelligence 116(1–2): 123–191. 98), 761–768. Menlo Park, Calif.: American Asso-
ciation for Artificial Intelligence.
Bennett, S. W., and DeJong, G. F. 1996. Real-World
Robotics: Learning to Plan for Robust Execution. Ma- Dempster, A. P.; Laird, N. M.; and Rubin, D. B. 1977.
chine Learning 23(2–3): 121–161. Maximum Likelihood from Incomplete Data via the
EM Algorithm. Journal of the Royal Statistical Society
Bergmann, R., and Wilke, W. 1996. On the Role of
B39(1): 1–38.
Abstractions in Case-Based Reasoning. In Proceedings
of EWCBR-96, the European Conference on Case-Based Dietterich, T. G., and Flann, N. S. 1997. Explanation-
Reasoning, 28–43. New York: Springer. Based Learning and Reinforcement Learning: A Uni-
fied View. Machine Learning 28:169–210.
Bhatnagar, N., and Mostow, J. 1994. Online Learning
from Search Failures. Machine Learning 15(1): 69–117. Do, B., and Kambhampati, S. 2003. Planning as Con-
straint Satisfaction: Solving the Planning Graph by
Blum, A., and Furst, M. L. 1997. Fast Planning
Compiling It into a CSP. Artificial Intelligence 132:
through Planning Graph Analysis. Artificial Intelli-
151–182.
gence 90(1–2): 281–300.
Estlin, T. A., and Mooney, R. J. 1996. Multi-Strategy
Borrajo D., and Veloso, M. 1997. Lazy Incremental
Learning of Search Control for Partial-Order Plan-
Learning of Control Knowledge for Efficiently Ob-
ning. In Proceedings of the Thirteenth National Con-
taining Quality Plans. Artificial Intelligence Review
ference on Artificial Intelligence, 843–848. Menlo
11(1–5): 371–405.
Park, Calif.: American Association for Artificial Intel-
Bylander, T. 1992. Complexity Results for Serial De-
ligence.
composability. In Proceedings of the Tenth National
Etzioni, O. 1993. Acquiring Search-Control Knowl-
Conference on Artificial Intelligence (AAAI-92),
edge via Static Analysis. Artificial Intelligence 62(2):
729–734. Menlo Park, Calif.: American Association
265–301.
for Artificial Intelligence.
Fikes, R. E., and Nilsson, N .J. 1971. STRIPS: A New Ap-
Calistri-Yeh, R.; Segre, A.; and Sturgill, D. 1996. The
proach to the Application of Theorem Proving to
Peaks and Valleys of ALPS: An Adaptive Learning and
Problem Solving. Artificial Intelligence 2(3–4):
Planning System for Transportation Scheduling. Pa-
per presented at the Third International Conference 189–208.
on Artificial Intelligence Planning Systems (AIPS-96), Fikes, R. E.; Hart, P.; and Nilsson, N. J. 1972. Learning
29–31 May, Edinburgh, United Kingdom. and Executing Generalized Robot Plans. Artificial In-
Carbonell, Y. G., and Gil, Y. 1990. Learning by Exper- telligence 3:251–288.
imentation: The Operator Refinement Method. In Fox, M., and Long, D. 1999. The Detection and Ex-
Machine Learning: An Artificial Intelligence Approach, ploitation of Symmetry in Planning Problems. Paper
Volume 3, eds. Y. Kodtratoff and R. S. Michalski, presented at the Sixteenth International Joint Con-
191–213. San Francisco, Calif.: Morgan Kaufmann. ference on Artificial Intelligence, 31 July–6 August,
Stockholm, Sweden.
Chien, S. A. 1989. Using and Refining Simplifi-
cations: Explanation-Based Learning of Plans in In- Fox, M., and Long, D. 1998. The Automatic Inference
tractable Domains. In Proceedings of the Eleventh of State Invariants in TIM. Journal of Artificial Intelli-
International Joint Conference on Artificial Intelli- gence Research 9: 317–371.
gence, 590–595. Menlo Park, Calif.: International Fu, L.-M. 1989. Integration of Neural Heuristics into
Joint Conferences on Artificial Intelligence. Knowledge-Based Inference. Connection Science 1(3):
Cohen, W. W. 1990. Learning Approximate Control 325–340.
Rules of High Utility. Paper presented at the Seventh García-Martínez, R., and Borrajo, D. 2000. An Inte-
International Conference on Machine Learning, grated Approach of Learning, Planning, and Execu-
21–23 June, Austin, Texas. tion. Journal of Intelligent and Robotic Systems 29(1):
Cohen, W. W., and Singer, Y. 1999. A Simple, Fast, 47-78.
and Effective Rule Learner. In Proceedings of the Six- Gerevini, A., and Schubert, L. 1998. Inferring State
teenth National Conference on Artificial Intelligence Constraints for Domain-Independent Planning. In
(AAAI-99), 335–342. Menlo Park, Calif.: American Proceedings of the Fifteenth National Conference on
Association for Artificial Intelligence. Artificial Intelligence, 905–912. Menlo Park, Calif.:
Craven, M., and Shavlik, J. 1993. Learning Symbolic American Association for Artificial Intelligence.
Rules Using Artificial Neural Networks. Paper pre- Gerevini, A., and Schubert, L. 1996. Accelerating Par-
sented at the Tenth International Conference on Ma- tial-Order Planners: Some Techniques for Effective
chine Learning, 24–27 July, London, United King- Search Control and Pruning. Journal of Artificial Intel-
dom. ligence Research 5:95–137.
Dawson, C., and Siklossy, L. 1977. The Role of Pre- Gil, Y. 1994. Learning by Experimentation: Incre-
processing in Problem-Solving Systems. In Proceed- mental Refinement of Incomplete Planning Do-

SUMMER 2003 93
Articles

mains. Paper presented at the Eleventh International Kambhampati, S. 2000. Planning Graph as (Dynam-
Conference on Machine Learning, 10–13 July, New ic) CSP: Exploiting EBL, DDB, and Other CSP Tech-
Brunswick, New Jersey. niques in GRAPHPLAN. Journal of Artificial Intelligence
Gratch, J., and Dejong, G. 1992. COMPOSER: A Proba- Research 12:1–34.
bilistic Solution to the Utility Problem in Speed-Up Kambhampati, S. 1998. On the Relations between In-
Learning, In Proceedings of the Tenth National Con- telligent Backtracking and Failure-Driven Explana-
ference on Artificial Intelligence (AAAI-92), 235–240. tion-Based Learning in Planning. Constraint Satisfac-
Menlo Park, Calif.: American Association for Artifi- tion and Artificial Intelligence 105(1–2): 161–208.
cial Intelligence. Kambhampati, S., and Hendler, J. 1992. A Validation
Hammond, K. 1989. Case-Based Planning: Viewing Structure–Based Theory of Plan Modification and
Planning as a Memory Task. San Diego, Calif.: Acade- Reuse. Artificial Intelligence 55(23): 193–258.
mic Press. Kambhampati, S., and Katukam, Y. Q. 1996. Failure-
Hanks, S., and Weld, D. 1995. A Domain-Indepen- Driven Dynamic Search Control for Partial Order
dent Algorithm for Plan Adaptation. Journal of Artifi- Planners: An Explanation-Based Approach. Artificial
cial Intelligence Research 2:319–360. Intelligence 88(1–2): 253–315.
Hinton, G. E. 1989. Connectionist Learning Proce- Kautz, H., and Selman, B. 1999. BLACKBOX: Unifying
dures. Artificial Intelligence 40(1–3): 185–234. SAT-Based and Graph-Based Planning. In Proceedings
Hofstadter, D. R., and Marshall, J. B. D. 1993. A Self- of the Sixteenth International Joint Conference on
Watching Cognitive Architecture of High-Level Per- Artificial Intelligence (IJCAI-99), 318–325. Menlo
ception and Analogy-Making. Technical Report, Park, Calif.: International Joint Conferences on Arti-
TR100, Center for Research on Concepts and Cogni- ficial Intelligence.
tion, Indiana University. Kautz, H., and Selman, B. 1998. The Role of Domain-
Hofstadter, D. R., and Marshall, J. B. D. 1996. Beyond Specific Knowledge in the Planning as Satisfiability
Copycat: Incorporating Self-Watching into a Com- Framework. Paper presented at the Fifth Internation-
puter Model of High-Level Perception and Analogy al Conference on Planning and Scheduling (AIPS-
Making. Paper presented at the 1996 Midwest Artifi- 98), 7–10 June, Pittsburgh, Pennsylvania.
cial Intelligence and Cognitive Science Conference, Kedar-Cabelli, S., and McCarty, T. 1987. Explanation-
26–28 April, Bloomington, Indiana. Based Generalization as Resolution Theorem Prov-
Hollatz, J. 1999. Analogy Making in Legal Reasoning ing. In Proceedings of the Fourth International Workshop
with Neural Networks and Fuzzy Logic. Artificial In- on Machine Learning, 383–389. San Francisco, Calif.:
telligence and Law 7(2–3): 289–301. Morgan Kaufmann.
Horvitz, E.; Ruan, Y.; Gomes, C.; Kautz, H.; Selman, Khardon, R. 1999. Learning Action Strategies for
B.; and Chickering, D. M. 2001. A Bayesian Approach Planning Domains. Artificial Intelligence 113(1–2):
to Tackling Hard Computational Problems. Paper 125–148.
presented at the Seventeenth Conference on Uncer- Knoblock, C. 1990. Learning Abstraction Hierarchies
tainty in Artificial Intelligence, 2–5 August, Seattle, for Problem Solving. In Proceedings of the Eighth
Washington. National Conference on Artificial Intelligence,
Huang, Y.; Kautz, H.; and Selman, B. 2000. Learning 923–928. Menlo Park, Calif.: American Association
Declarative Control Rules for Constraint-Based Plan- for Artificial Intelligence.
ning. Paper presented at the Seventeenth Interna- Kodtratoff, Y., and Michalski, R. S., eds. 1990. Ma-
tional Conference on Machine Learning, 29 June–2 chine Learning: An Artificial Intelligence Approach, Vol-
July, Stanford, California. ume 3. San Francisco, Calif.: Morgan Kaufmann.
Hunt, E. B.; Marin, J.; and Stone, P. J. 1966. Experi- Korf, R. 1990. Real-Time Heuristic Search. Artificial
ments in Induction. San Diego, Calif.: Academic Press. Intelligence 42(2–3): 189–211.
Ihrig, L., and Kambhampati, S. 1997. Storing and In- Laird, J.; Newell, A.; and Rosenbloom, P. 1987. SOAR:
dexing Plan Derivations through Explanation-Based An Architecture for General Intelligence. Artificial In-
Analysis of Retrieval Failures. Journal of Artificial In- telligence 33(1): 1–64.
telligence 7:161–198. Lang, K. 1995. NEWSWEEDER: Learning to Filter Net-
Ihrig, L., and Kambhampati, S. 1996. Design and Im- news. In Proceedings of the Twelfth International Con-
plementation of a Replay Framework Based on a Par- ference on Machine Learning, 331–339. San Francisco,
tial-Order Planner. In Proceedings of the Thirteenth Calif.: Morgan Kaufmann.
National Conference on Artificial Intelligence (AAAI- Langley, P. 1997. Challenges for the Application of
96). Menlo Park, Calif.: American Association for Ar- Machine Learning. In Proceedings of the ICML ‘97
tificial Intelligence. Workshop on Machine Learning Application in the Real
Jones, R., and Langley, P. 1995. Retrieval and Learn- World: Methodological Aspects and Implications, 15–18.
ing in Analogical Problem Solving. In Proceedings of San Francisco, Calif.: Morgan Kaufmann.
the Seventeenth Conference of the Cognitive Science Lau, T.; Domingos, P.; and Weld, D. 2000. Version
Society, 466–471. Pittsburgh, Pa.: Lawrence Erlbaum. Space Algebra and Its Application to Programming
Kakuta, T.; Haraguchi, M.; Midori-ku, N.; and Okubo, by Demonstration. Paper presented at the Seven-
Y. 1997. A Goal-Dependent Abstraction for Legal teenth International Conference on Machine Learn-
Reasoning by Analogy. Artificial Intelligence and Law ing, 29 June–2 July, Stanford, California.
5(1–2): 97–118. Lavrac, N.; Dzeroski, S.; and Grobelnik, M. 1991.

94 AI MAGAZINE
Articles

Learning Nonrecursive Definitions of Relations with Ninth International Conference on Machine Learn-
LINUS. In Proceedings of the Fifth European Working Ses- ing, 27–30 June, Bled, Slovenia.
sion on Learning, 265–281. Berlin: Springer. Pomerleau, D. A. 1993. Knowledge-Based Training of
Leake, D.; Kinley, A.; and Wilson, D. 1996. Acquiring Artificial Neural Networks for Autonomous Robot
Case Adaptation Knowledge: A Hybrid Approach. In Driving. In Robot Learning, eds. J. Connell and S. Ma-
Proceedings of the Thirteenth National Conference hadevan, 19–43. Boston: Kluwer Academic.
on Artificial Intelligence, 684–689. Menlo Park, Quinlan, J. R. 1993. C4.5: Programs for Machine Learn-
Calif.: American Association for Artificial Intelli- ing. San Francisco, Calif.:. Morgan Kaufmann.
gence.
Quinlan, J. R. 1990. Learning Logical Definitions
Leckie, C., and Zuckerman, I. 1998. Inductive Learn- from Relations. Machine Learning 5:239–266.
ing of Search Control Rules for Planning. Artificial In-
Quinlan, J. R. 1986. Induction of Decision Trees. Ma-
telligence 101(1–2): 63–98.
chine Learning 1(1): 81–106.
Martin, M., and Geffner, H. 2000. Learning General-
Reddy, C., and Tadepalli, P. 1999. Learning Horn De-
ized Policies in Planning Using Concept Languages.
finitions: Theory and an Application to Planning.
In Proceedings of the Seventh International Conference
New Generation Computing 17(1): 77–98.
on Knowledge Representation and Reasoning (KR 2000),
667–677. San Francisco, Calif.: Morgan Kaufmann. Rintanen, J. 2000. An Iterative Algorithm for Synthe-
sizing Invariants. In Proceedings of the Seventeenth
Minton, S., ed. 1993. Machine Learning Methods for
National Conference on Artificial Intelligence and
Planning. San Francisco, Calif.: Morgan Kaufmann.
the Twelfth Innovative Applications of AI Confer-
Minton, S.; Carbonell, J.; Knoblock, C.; Kuokka, D. ence, 806-811. Menlo Park, Calif.: American Associ-
R.; Etzioni, O.; and Gil, Y. 1989. Explanation-Based ation for Artificial Intelligence.
Learning: A Problem-Solving Perspective. Artificial
Sacerdoti, E. 1974. Planning in a Hierarchy of Ab-
Intelligence 40:63–118.
straction Spaces. Artificial Intelligence 5(2): 115–135.
Mitchell, T. M., and Thrun, S. B. 1995. Learning An-
Samuel, A. L. 1959. Some Studies in Machine Learn-
alytically and Inductively. In Mind Matters: A Tribute
ing Using the Game of Checkers. IBM Journal of Re-
to Allen Newell (Carnegie Symposia on Cognition), eds.
search and Development 3(3): 210–229.
J. D. Steier, T. Mitchell, and A. Newell. New York:
Lawrence Erlbaum. Schmill, M.; Oates, T.; and Cohen, P. 2000. Learning
Planning Operators in Real-World, Partially Observ-
Mitchell, T.; Keller, R.; and Kedar-Cabelli, S. 1986. Ex-
able Environments. Paper presented at the Fifth Con-
planation-Based Generalization: A Unifying View.
ference on Artificial Intelligence Planning Systems
Machine Learning 1(1): 47–80.
(AIPS-2000), 14–17 April, Breckenridge, Colorado.
Muggleton, S., and Feng, C. 1990. Efficient Induction
Shavlik, J. W., and Towell, G. G. 1989. An Approach
of Logic Programs. Paper presented at the First Con-
to Combining Explanation-Based and Neural Learn-
ference on Algorithmic Learning Theory, 8–10 Octo-
ing Algorithms. Connection Science 1(3): 231–253.
ber, Ohmsma, Tokyo, Japan.
Sheppard, J., and Salzberg, S. 1995. Combining Ge-
Munoz-Avila, H.; Aha, D. W.; Breslow, L.; and Nau, D.
netic Algorithms with Memory-Based Reasoning. Pa-
1999. HICAP: An Interactive Case-Based Planning Ar-
per presented at the Sixth International Conference
chitecture and Its Application to Noncombatant
on Genetic Algorithms, 15–19 July, Pittsburgh, Penn-
Evacuation Operations. In Proceedings of the Ninth
sylvania.
Conference on Innovative Applications of Artificial
Intelligence, 879–885. Menlo Park, Calif.: American Smith, D., and Peot, M. 1993. Postponing Threats in
Association for Artificial Intelligence. Partial-Order Planning. In Proceedings of the
Eleventh National Conference on Artificial Intelli-
Nau, D.; Cao, Y.; Lotem, A.; and Munoz-Avila, H.
gence (AAAI-93), 500–506. Menlo Park, Calif.: Amer-
1999. SHOP: Simple Hierarchical Order Planner. In
ican Association for Artificial Intelligence.
Proceedings of the Sixteenth International Joint
Conference on Artificial Intelligence (IJCAI-99), Sutton, R. 1991. Planning by Incremental Dynamic
968–975. Menlo Park, Calif.: International Joint Programming. In Proceedings of the Eighth Internation-
Conference on Artificial Intelligence. al Conference on Machine Learning, 353–357. San Fran-
cisco, Calif.: Morgan Kaufmann.
Nebel, B.; Dimopoulos, Y.; and Koehler, J. 1997. Ig-
noring Irrelevant Facts and Operators in Plan Gener- Sutton, R. 1988. Learning to Predict by the Methods
ation. Paper presented at the Fourth European Con- of Temporal Differences. Machine Learning 3(1): 9–44.
ference on Planning (ECP-97), 24–26 September, Sutton, R., and Barto, G. 1998. Reinforcement Learn-
Toulouse, France. ing—-An Introduction. Cambridge, Mass.: MIT Press.
Ourston, D., and Mooney, R. 1994. Theory Refine- Sycara, K.; Guttal, R.; Koning, J.; Narasimhan, S.; and
ment Combining Analytical and Empirical Methods. Navinchandra, D. 1992. CADET: A Case-Based Synthe-
Artificial Intelligence 66(2): 273–309. sis Tool for Engineering Design. International Journal
Pazzani, M. J.; Brunk, C. A.; and Silverstein, G. 1991. of Expert Systems 4(2).
A Knowledge-Intensive Approach to Learning Rela- Veloso, M., and Carbonell, J. 1993. Derivational
tional Concepts. Paper presented at the Eighth Inter- Analogy in PRODIGY: Automating Case Acquisition,
national Workshop on Machine Learning, 27–29 Storage, and Utilization. Machine Learning 10(3):
June, Evanston, Illinois. 249–278.
Perez, M., and Etzioni, O. 1992. DYNAMIC: A New Role Wang, X. 1996a. A Multistrategy Learning System for
for Training Problems in EBL. Paper presented at the Planning Operator Acquisition. Paper presented at

SUMMER 2003 95
Articles

the Third International Workshop on Multistrategy

Learning, 23–25 May, Harpers Ferry, West Virginia.
Wang, X. 1996b. Planning While Learning Opera-
tors. Paper presented at the Third International Con-
ference on Artificial Intelligence Planning Systems
(AIPS’96), 29–31 May, Edinburgh, Scotland.
Watkins, C. 1989. Learning from Delayed Rewards.
Ph.D. dissertation, Psychology Department, Univer-
sity of Cambridge, Cambridge, United Kingdom.
Wolfman, S., and Weld, D. 1999. The LPSAT Engine
and Its Application to Resource Planning. In Proceed-
ings of the Sixteenth International Joint Conference
on Artificial Intelligence, 310–317. Menlo Park,
Calif.: International Joint Conferences on Artificial
Intelligence.
Zelle, J., and Mooney, R. 1993. Combining FOIL and
EBG to Speed Up Logic Programs. In Proceedings of
the Thirteenth International Joint Conference on Ar-
tificial Intelligence, 1106–1111. Menlo Park, Calif.:
International Joint Conferences on Artificial Intelli-
gence.
Zimmerman, T., and Kambhampati, S. 2002. Gener-
ating Parallel Plans Satisfying Multiple Criteria in
Anytime Fashion. Paper presented at the Sixth Inter-
national Conference on Artificial Intelligence Plan-
ning and Scheduling Workshop on Planning and
Scheduling with Multiple Criteria, 23–27 April,
Toulouse, France.
Zimmerman, T., and Kambhampati, S. 1999. Exploit-
ing Symmetry in the Planning Graph via Explana-
tion-Guided Search. Paper presented at the Sixteenth
National Conference on Artificial Intelligence, 18–22
July, Orlando, Florida.

A classic Zweben, M.; Davis, E.; Daun, B.; Drascher, E.; Deale,
M.; and Eskey, M. 1992. Learning to Improve Con-

… straint-Based Scheduling. Artificial Intelligence

58(1–3): 271–296.

Terry Zimmerman has an M.S. in nuclear science

and engineering and is in the throes of completing

still available his PhD. in computer science and engineering at Ari-

zona State University. His dissertation study in the
area of automated planning combines aspects of

from AAAI
heuristic search control, learning, and optimization
over multiple-quality criteria. He previously con-
ducted probabilistic risk assessment and reliability
analysis for energy facilities and developed software
for statistical analysis of experimental nuclear fuel
(members receive a 20% discount!) assemblies. His e-mail address is [email protected].

Subbarao Kambhampati is a professor of computer

science and engineering at Arizona State University,
650-328-3123 where he directs the Yochan research group. His cur-
rent research interests are in automated planning,
scheduling, learning, and information integration.
445 Burgess Drive He received his formative education in Peddapuram,
B.Tech from the Indian University of Technology,
Menlo Park, CA 94025 Madras (Chennai), and his M.S. and Ph.D. from the
University of Maryland at College Park. His e-mail
www.aaai.org/Press/ address is [email protected].

96 AI MAGAZINE

Learning-Based Planning: Sergio Jiménez Celorrio
No ratings yet
Learning-Based Planning: Sergio Jiménez Celorrio
5 pages
A Concise Introduction To Models and Methods For Automated Planning
No ratings yet
A Concise Introduction To Models and Methods For Automated Planning
143 pages
Notes 5
No ratings yet
Notes 5
2 pages
Mod 6 Ai
No ratings yet
Mod 6 Ai
82 pages
Integrating Planning and Learning: The PRODIGY Architecture
No ratings yet
Integrating Planning and Learning: The PRODIGY Architecture
39 pages
Pai 5
No ratings yet
Pai 5
22 pages
Ai Planning IV Unit
No ratings yet
Ai Planning IV Unit
30 pages
Hector Ge Ner, Blai Bonet - A Concise Introduction To Models and Methods For Automated Planning-Morgan & Claypool (2013)
No ratings yet
Hector Ge Ner, Blai Bonet - A Concise Introduction To Models and Methods For Automated Planning-Morgan & Claypool (2013)
133 pages
Oup Publishing Intelligent Techniques For Planning
No ratings yet
Oup Publishing Intelligent Techniques For Planning
382 pages
1990 AI Magazine - Planning Systems
No ratings yet
1990 AI Magazine - Planning Systems
17 pages
Unit 6 Ai
No ratings yet
Unit 6 Ai
28 pages
Machine Learning: Dr. R. B. Chadge YCCE, Nagpur
No ratings yet
Machine Learning: Dr. R. B. Chadge YCCE, Nagpur
24 pages
Module-5 Ai
No ratings yet
Module-5 Ai
20 pages
AI Notes Unit-5
No ratings yet
AI Notes Unit-5
37 pages
AI Planning: Concepts and Algorithms
No ratings yet
AI Planning: Concepts and Algorithms
16 pages
Artificial Intelligence 5
No ratings yet
Artificial Intelligence 5
11 pages
AI Planning: Systems and Techniques: James Hendler, Austin Tate, and Mark Drummond
No ratings yet
AI Planning: Systems and Techniques: James Hendler, Austin Tate, and Mark Drummond
17 pages
Ai Unit 05
No ratings yet
Ai Unit 05
50 pages
AI Planning: Types, Benefits, and Challenges
No ratings yet
AI Planning: Types, Benefits, and Challenges
11 pages
An Introduction To The Planning Domain Definition Language 1nbsped 9781627058759 9781627057370 9781681735122 1681735121 - Compress
No ratings yet
An Introduction To The Planning Domain Definition Language 1nbsped 9781627058759 9781627057370 9781681735122 1681735121 - Compress
189 pages
Module - 4
No ratings yet
Module - 4
38 pages
Planning in AI
No ratings yet
Planning in AI
58 pages
Ai (Un 06)
No ratings yet
Ai (Un 06)
35 pages
Artificial Intelligence Unit Vi: Planning
No ratings yet
Artificial Intelligence Unit Vi: Planning
70 pages
Ai Unit 6 Techknow
No ratings yet
Ai Unit 6 Techknow
31 pages
Unit V Ai
No ratings yet
Unit V Ai
16 pages
AI Planning Techniques Explained
No ratings yet
AI Planning Techniques Explained
19 pages
Planning
No ratings yet
Planning
352 pages
6
No ratings yet
6
24 pages
Understanding The Planning of LLM Agents: A Survey
No ratings yet
Understanding The Planning of LLM Agents: A Survey
9 pages
Unit-5 - Part 2
No ratings yet
Unit-5 - Part 2
11 pages
Unit 4 Ai
No ratings yet
Unit 4 Ai
29 pages
Behavior Planning of Intelligent Agent W 2017 Biologically Inspired Cognitiv
No ratings yet
Behavior Planning of Intelligent Agent W 2017 Biologically Inspired Cognitiv
11 pages
Unit 4 Part 2 - Classical Planning
No ratings yet
Unit 4 Part 2 - Classical Planning
7 pages
AI Lab Manual
No ratings yet
AI Lab Manual
8 pages
ARTIFICIAL INTELLIGENCE Building Intelligent Systems by PARAG KULKARNI, PRACHI JOSHI - Textbook
100% (1)
ARTIFICIAL INTELLIGENCE Building Intelligent Systems by PARAG KULKARNI, PRACHI JOSHI - Textbook
531 pages
Text Book 2 Notes Eh
No ratings yet
Text Book 2 Notes Eh
531 pages
AI E-Book
No ratings yet
AI E-Book
531 pages
Automated Planning Theory Practice
100% (1)
Automated Planning Theory Practice
638 pages
Analysis of Planning Problems in AI and Their Impact On Real-World Applications
No ratings yet
Analysis of Planning Problems in AI and Their Impact On Real-World Applications
22 pages
Sanchez 2022 The Prospects of Artificial Intelli
No ratings yet
Sanchez 2022 The Prospects of Artificial Intelli
17 pages
AIML Syllabus
No ratings yet
AIML Syllabus
3 pages
Documento Senza Titolo
No ratings yet
Documento Senza Titolo
3 pages
Automated Planning and Scheduling
No ratings yet
Automated Planning and Scheduling
1 page
Patrik PDDL CH2 FHS PDF
No ratings yet
Patrik PDDL CH2 FHS PDF
51 pages
21 Reinforcement Learning A Comprehensive Overview
No ratings yet
21 Reinforcement Learning A Comprehensive Overview
7 pages
Automated Planning and Scheduling
No ratings yet
Automated Planning and Scheduling
1 page
Unit 6
No ratings yet
Unit 6
48 pages
User-Tailored Plan Generation: Abstract. The Output of Advice-Giving Systems Can Be Regarded As Plans To Be Executed by
No ratings yet
User-Tailored Plan Generation: Abstract. The Output of Advice-Giving Systems Can Be Regarded As Plans To Be Executed by
10 pages
Sdarticle
No ratings yet
Sdarticle
10 pages
Pddlfuse: A Tool For Generating Diverse Planning Domains: Vedant Khandelwal, Amit Sheth, Forest Agostinelli
No ratings yet
Pddlfuse: A Tool For Generating Diverse Planning Domains: Vedant Khandelwal, Amit Sheth, Forest Agostinelli
30 pages
Model Lite
No ratings yet
Model Lite
4 pages
AI Unit 4 NEW
No ratings yet
AI Unit 4 NEW
60 pages
Ai 2
No ratings yet
Ai 2
344 pages
AI Chapter4
No ratings yet
AI Chapter4
29 pages
Global Contents
No ratings yet
Global Contents
7 pages
Unit Ii
No ratings yet
Unit Ii
22 pages
BS-1868 2010
100% (1)
BS-1868 2010
28 pages
Agilent Technologies E7475A GSM Drive-Test System: Product Overview
No ratings yet
Agilent Technologies E7475A GSM Drive-Test System: Product Overview
16 pages
Nuclear Physics Foundations
No ratings yet
Nuclear Physics Foundations
21 pages
RHT PCM Mains 2023
No ratings yet
RHT PCM Mains 2023
78 pages
3 - Ball Mill Grinding
92% (12)
3 - Ball Mill Grinding
78 pages
Computational Physics Lab: Writing Up: Laboratory Class Attendance
No ratings yet
Computational Physics Lab: Writing Up: Laboratory Class Attendance
3 pages
Hospital
No ratings yet
Hospital
32 pages
Advances in DEFORM-3D Extrusion Simulation
No ratings yet
Advances in DEFORM-3D Extrusion Simulation
5 pages
Real-Life Applications of Linear Algebra
No ratings yet
Real-Life Applications of Linear Algebra
3 pages
Advanced Experimental and Numerical Techniques For Cavitation-Erosion (Chahine 2014)
No ratings yet
Advanced Experimental and Numerical Techniques For Cavitation-Erosion (Chahine 2014)
407 pages
Business Process Modeling Training
100% (4)
Business Process Modeling Training
37 pages
Special Purpose Diodes Overview
No ratings yet
Special Purpose Diodes Overview
131 pages
Video
No ratings yet
Video
5 pages
Smart Grid Innovations for Utilities
No ratings yet
Smart Grid Innovations for Utilities
13 pages
Concrete Durability Enhancer
No ratings yet
Concrete Durability Enhancer
2 pages
Lecture 1 and 3 - Brief Introduction and Casting Fundamental
No ratings yet
Lecture 1 and 3 - Brief Introduction and Casting Fundamental
69 pages
SM SSM DB Uk 003
No ratings yet
SM SSM DB Uk 003
4 pages
Cariology: Presented By-Dr. Neha Sultana Post Graduate Student Department of Conservative Dentistry and Endodontics
No ratings yet
Cariology: Presented By-Dr. Neha Sultana Post Graduate Student Department of Conservative Dentistry and Endodontics
93 pages
CNC Machine Control Guide
No ratings yet
CNC Machine Control Guide
2 pages
RCM Program of Study 24-25
No ratings yet
RCM Program of Study 24-25
37 pages
The X3: Dealer Specification Guide From August 2019 Production
No ratings yet
The X3: Dealer Specification Guide From August 2019 Production
12 pages
Complete The Square Worksheet
No ratings yet
Complete The Square Worksheet
4 pages
Q9T4-FP91G-engineering Specification PDF
No ratings yet
Q9T4-FP91G-engineering Specification PDF
14 pages
BPS-300W To 30KVA Solar Power System - BESTSUN Solar 2017
No ratings yet
BPS-300W To 30KVA Solar Power System - BESTSUN Solar 2017
5 pages
2013 Sagar J Abichandani
No ratings yet
2013 Sagar J Abichandani
8 pages
Grade 8 Cbse Math 2nd Term Sample Paper 1
100% (1)
Grade 8 Cbse Math 2nd Term Sample Paper 1
2 pages
Foxboro Evo™ Process Automation System: Product Specifications
No ratings yet
Foxboro Evo™ Process Automation System: Product Specifications
20 pages
Cu CR 1 ZR
No ratings yet
Cu CR 1 ZR
38 pages
Automatic Light Reflector
67% (3)
Automatic Light Reflector
6 pages
Worksheet 8 Answers
No ratings yet
Worksheet 8 Answers
1 page