Artificial Intelligence Applications in Chemistry
Thomas H. Pierce, EDITOR
Rohm and Haas Company
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.fw001
2. Artificial intelligence—Congresses.
I. Pierce, Thomas H., 1952- . II. Hohne, Bruce
A., 1954- . III. American Chemical Society.
Division of Computers in Chemistry. IV. American
Chemical Society. Meeting (190th: 1985: Chicago, Ill.)
V. Series.
QD39.3.E46A78 1986 542'.8 86-3315
ISBN 0-8412-0966-9
Copyright © 1986
American Chemical Society
All Rights Reserved. The appearance of the code at the bottom of the first page of each
chapter in this volume indicates the copyright owner's consent that reprographic copies of the
chapter may be made for personal or internal use or for the personal or internal use of specific
clients. This consent is given on the condition, however, that the copier pay the stated per
copy fee through the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970,
for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This
consent does not extend to copying or transmission by any means—graphic or electronic—for
any other purpose, such as for general distribution, for advertising or promotional purposes,
for creating a new collective work, for resale, or for information storage and retrieval systems.
The copying fee for each chapter is indicated in the code at the bottom of the first page of the
chapter.
The citation of trade names and/or names of manufacturers in this publication is not to be
construed as an endorsement or as approval by ACS of the commercial products or services
referenced herein; nor should the mere reference herein to any drawing, specification, chemical
process, or other data be regarded as a license or as a conveyance of any right or permission,
to the holder, reader, or any other person or corporation, to manufacture, reproduce, use, or
sell any patented invention or copyrighted work that may in any way be related thereto.
Registered names, trademarks, etc., used in this publication, even without specific indication
thereof, are not to be considered unprotected by law.
PRINTED IN THE UNITED STATES OF AMERICA
Advisory Board
Harvey W. Blanch Donald E. Moreland
University of California—Berkeley USDA, Agricultural Research Service
We decided that now would be a good time for an AI book for several
reasons: (1) enough applications can now be presented to expose newcomers
to many of the possibilities that AI has to offer, (2) showing what everyone
else is doing with AI should generate new interest in the field, and (3) we felt
an overview was needed to collect the different areas of AI applications to
help people who are starting to apply AI techniques to their disciplines. The
final and possibly most important reason is our personal interest in the field.
Chemistry is an ideal field for applications in AI. Chemists have been
using computers for years in their day-to-day work and are quite willing to
accept the aid of a computer. In addition, the DENDRAL project,
throughout its long history, has graduated many chemists already trained in
AI. It is not surprising that chemistry is one of the leading areas for AI
applications. Scientists have been developing the theories of chemistry for
centuries, but the standard approach taken by a chemist to solve a problem
is heuristic; past experience and rules of thumb are used. AI offers a method
to combine theory with these rules. These systems will not replace chemists,
as is commonly thought; but rather, these programs will assist chemists in
performing their daily work.
Computer applications developed from theoretical chemistry tend to be
algorithmic and numerical by nature. AI applications tend to be heuristic
and symbolic by nature. Multilevel expert systems combine these techniques
to use the heuristic power of expert systems to direct numerical calculations.
They can also use the results of numerical calculations in their symbolic
processing. The problems faced by chemists today are so complex that most
require the added power of the multilevel approach to solve them.
Defining exactly which applications constitute AI is difficult in any
field. The problem in chemistry is even worse because chemical applications
that use AI methods often use numerical calculations. Some applications
that are strictly numerical accomplish tasks similar to AI programs. The key
feature used to limit the scope of this book was symbolic processing. The
work presented includes expert systems, natural language applications, and
manipulation of chemical structures.
THOMAS H. PIERCE
Rohm and Haas Company
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.pr001
BRUCE A. HOHNE
Rohm and Haas Company
Spring House, PA 19477
Dennis H. Smith
Everything I will describe could be built from the ground up using assembly language,
BASIC or any other computer language. In the future, some expert systems will certainly
be built using languages such as Fortran, C or PASCAL as opposed to LISP and PROLOG,
which are currently in vogue. So there is no mystery here. What is different, but is still
not mysterious, is the approach taken by AI techniques toward solving symbolic, as opposed
to numeric, problems. I discuss this difference in more detail, below. Most readers of this
collection of papers will be scientists and engineers, engaged in research, business or both.
They expect new technologies to have some substantial practical value to them in their
work, or they will not buy and use them. So I will stress the practicality of the technology.
Where is the technology currently? Several descriptions of the marketplace have
appeared over the last year. Annual growth rates for companies involved in marketing
products based on AI exceed 300%, far outstripping other new computer-based applications,
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch001
I am going to begin my discussion of the technology of expert systems with two provocative
statements. The first is:
Knowledge engineering is the technology base of the "Second Computer Age"
It is possible to use knowledge, for example, objects, facts, data, rules, to manipulate
knowledge, and to cast it in a form in which it can be used easily in computer programs,
thereby creating systems that solve important problems.
The second statement is:
What's on the horizon is not just the Second Computer Age, it's the
important one!
We are facing a second computer revolution while still in the midst of the first one!
And it's probably the important revolution.
Characteristics and Values of Expert Systems. What leads me to make such bold and risky
statements? The answer can be summarized as follows. First, knowledge is power. You
can't solve problems using any technology unless you have some detailed knowledge about
the problem and how to solve it. This fact seems so obvious that it is unnecessary to state
it. Many systems will fail, however, because the builders will attempt to build such systems
to solve ill-defined problems.
Second, processing of this knowledge will become a major, perhaps dominant part of
the computer industry. Why? Simply because most of the world's problem solving
activities involve symbolic reasoning, not calculation and data processing. We have
constructed enormously powerful computers for performing calculations, our number
crunchers. We devote huge machines with dozens of disk drives to database management
systems. Our need for such methods of computing will not disappear in the future.
However, when we have to fix our car, or determine why a processing plant has shut down,
or plan an organic synthesis, we don't normally solve sets of differential equations or pose
queries to a large database. We might use such numerical solutions or the results of such
queries to help solve the problem, but we are mainly reasoning, not calculating.
How do we construct programs that aid us in reasoning as opposed to calculating? AI
is the underlying science. It has several sub-disciplines, including, for example, robotics,
machine vision, natural language understanding and expert systems, each of which will
make a contribution to the second computer age. My focus is on expert systems.
Knowledge engineering is the technology behind construction of expert systems, or
knowledge systems, or expert support systems. Such systems are designed to advise, inform
and solve problems. They can perform at the level of experts, and in some cases exceed
expert performance. They do so not because they are "smarter" but because they represent
the collective expertise of the builders of the systems. They are more systematic and
thorough. And they can be replicated and used throughout a laboratory, company or
industry at low cost.
There are three major components to an expert system:
driven operations, a "mouse" as a pointing device, familiar icons to represent objects such
as schematics, valves, tanks, and so forth.
The Knowledge Base. The knowledge base holds symbolic knowledge. To be sure, the
knowledge base can also contain tables of numbers, ranges of numerical values, and some
numerical procedures where appropriate. But the major content consists of facts and
heuristics.
The facts in a knowledge base include descriptions of objects, their attributes and
corresponding data values, in the area to which the expert system is to be applied. In a
process control application, for example, the factual knowledge might include a description
of a physical plant or a portion thereof, characteristics of individual components, values
from sensor data, composition of feedstocks and so forth.
The heuristics, or rules, consist of the judgemental knowledge used to reason about
the facts in order to solve a particular problem. Such knowledge is often based on
experience, is used effectively by experts in solving problems and is often privately held.
Knowledge engineering has been characterized as the process by which this knowledge is
"mined and refined" by builders of expert systems. Again, using the motif of process
control, such knowledge might include rules on how to decide when to schedule a plant or
subsystem for routine maintenance, rules on how to adjust feedstocks based on current
pricing, or rules on how to diagnose process failures and provide advice on corrective
action.
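The split between facts and heuristics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the representation used by any particular expert system tool: the object names, attribute names, thresholds, and the `Rule` and `advise` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Facts: objects, their attributes, and current data values.
# All names and numbers are hypothetical, chosen only for illustration.
facts: Dict[str, Dict[str, float]] = {
    "pump_1": {"hours_since_service": 5200.0, "vibration_mm_s": 7.5},
    "pump_2": {"hours_since_service": 800.0, "vibration_mm_s": 2.1},
}


@dataclass
class Rule:
    """A heuristic: a condition on an object's attributes plus advice."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    advice: str


# Heuristics: judgemental knowledge kept separate from the facts.
rules: List[Rule] = [
    Rule("routine_maintenance",
         lambda attrs: attrs["hours_since_service"] > 5000,
         "schedule routine maintenance"),
    Rule("bearing_wear",
         lambda attrs: attrs["vibration_mm_s"] > 7.0,
         "inspect bearings"),
]


def advise(facts: Dict[str, Dict[str, float]],
           rules: List[Rule]) -> Dict[str, List[str]]:
    """Apply every rule to every object and collect the advice per object."""
    return {
        obj: [rule.advice for rule in rules if rule.condition(attrs)]
        for obj, attrs in facts.items()
    }


advice = advise(facts, rules)
```

Keeping the rules as data, apart from the facts they reason about, is what lets either part be revised without touching the other.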
Expert systems create value for groups of people, ranging from laboratory units to
entire companies, in several ways, by:
• capturing, refining, packaging, distributing expertise; an "expert at your
fingertips";
• solving problems that require the knowledge and expertise of several fields
(fusion);
companies are turning to expert systems in order to capture the problem-solving expertise
of their most valuable people. This preserves the knowledge and makes it available in
easily accessible ways to those who must assume the responsibilities of the departing
experts.
Considering commercial applications of the technology, expert systems can create
value through giving a company a competitive edge. This consideration means that the
first companies to exploit this technology to build useful products will obviously be some
steps ahead of those that do not.
Some Areas of Application. I next summarize some areas of application where expert
systems exist or are being developed, usually by several laboratories. Some of these areas
are covered in detail in other presentations as part of this symposium. I want to emphasize
that this is a partial list primarily of scientific and engineering applications. A similar list
could easily be generated for operations research, economics, law, and so forth. Some of the
areas are outside strict definitions of the fields of chemistry and chemical engineering, but I
have included them to illustrate the breadth of potential applications in related disciplines.
• Mineral exploration
• Intelligent CAD
There are many diagnosis and/or advisory systems under development, applied to
geology, nuclear reactors, software debugging and use, manufacturing and related financial
services.
There are several applications to scientific and engineering instrumentation which are
especially relevant to chemistry and chemical engineering. These include building into
instruments expertise in instrument control and data interpretation, to attempt to minimize
the amount of staff time required to perform routine analyses and to optimize the
performance of a system. There are several efforts underway in process control, focused
currently in the electrical power and chemical industries.
Before looking at some applications in more detail, let me briefly describe why the
number and scope of applications is increasing so dramatically.
The Technology is Maturing Rapidly. The work that computers are being required to do is
increasingly knowledge intensive. For example, instrument manufacturers are producing
more powerful computer systems that are integral to their product lines. These systems are
expected to perform more complex tasks all the time, i.e., to be in some sense "smarter".
Two developments are proceeding in parallel with this requirement for "smarter" systems.
The software technology for building expert systems is maturing rapidly. At the same
time, workstations that support AI system development are making a strong entry into the
computer market. For the first time, the hardware and software technology are at a point
where development of systems can take place rapidly.
Beginning in 1970, programming languages such as LISP became available. Such
languages made representation and manipulation of symbolic knowledge much simpler than
use of conventional languages. Around 1975, programming environments became
available. In the case of LISP, its interactive environment, INTERLISP, made system
construction, organization and debugging much more efficient. In 1980, research work led
to systems built on top of LISP that removed many of the requirements for programming,
allowing system developers to focus on problem solving rather than writing code. Some of
these research systems have now evolved to become commercial products that dramatically
simplify development of expert systems. Such products, often referred to as tools, are
specifically designed to aid in the construction of expert systems and are engineered to be
usable by experts who may not be programmers.
Supporting evidence for the effects of these developments is found by examining the
approximate system development time for some well known expert systems. Systems begun
in the mid-1960's, DENDRAL and MACSYMA, required on the order of 40-80 man-years to
develop. Later systems of similar scope required less and less development time, on the
order of several man-years, as programming languages and system building tools matured.
With current, commercially available tools, developers can expect to build a prototype of a
system, with some assistance, on the order of one month. The prototype that results
already performs at a significant level of expertise and may represent the core of a
subsequent, much larger system (examples are shown below). Such development times were
simply impossible to achieve with the limited tools that existed before mid-1984.
Developing Expert Systems. How has such rapid progress been achieved? The
improvement in hardware and software technologies is obviously important. Another
important factor is that people are becoming more experienced in actually building systems.
There has emerged, from the construction of many systems designed for diverse
applications, a strong model for the basic steps required in constructing an expert system.
The four major steps are as follows:
First, one must select an appropriate application. There are applications that are so
simple, that require so little expertise, that it is not worth the time and money to emulate
human performance in a machine. At the other end of the spectrum, there are many
problems whose methods of solution are poorly understood. For several reasons, these are
not good candidates either. In between, there are many good candidates, and in the next
section I summarize some of the rules for choosing them.
Second, a prototype of a final system is built. This prototype is specifically designed
to have limited, but representative, functionality. During development of the prototype,
many important issues are resolved, for example, the details of the knowledge
representation, the man-machine interface, and the complexity of the rules required for high
performance. Rapid prototyping is already creeping into the jargon of the community.
The latest expert system building tools are sufficiently powerful that one can sit down and
try various ideas on how to approach the problem, find out what seems logical and what
doesn't, reconstruct the knowledge base into an entirely different form, step through
execution of each rule and correct the rules interactively. This approach differs
substantially from traditional methods of software engineering.
The third step, however, reminds us that we do have to pay attention to good
software development practices if a generally used, and useful system is to result from the
prototype. Development of a full system, based at least in part on the prototype, proceeds
with detailed specifications as the system architects define and construct its final form.
The last step is just as crucial as its predecessors. The system must be tested in the
field, and the usual requirements in the software industry for maintenance and updates
pertain.
The primary differences, then, between development of expert systems and more
traditional software engineering are found in steps one and two, above. First, the problems
chosen will involve symbolic reasoning, and will require the transfer of expertise from
experts to a knowledge base. Second, rapid prototyping, the "try it and see how it works,
then fix it or throw it away" approach will play an important role in system development.
The only phase of development of expert systems that I will say any more about is
the first, and in many ways the most crucial, step for those who are contemplating building
expert systems for the first time. How do you go about selecting an appropriate
application? Here are the basic criteria:
• Symbolic reasoning
• Importance of problem
• Scope of problem
Selected Applications
Biological Reactors. In this section I discuss some applications that are at least indirectly
related to chemical science and engineering. The first example, illustrated in Figure 1, is
derived from a simulation and diagnosis of a biological reactor that we put together for a
demonstration.
Because the expert system was not connected to a real reactor, we built a small table-
driven simulation to model the growth of cells in suspension. The graphical interface
includes images representing the reactor itself, several feed bins and associated valves. Also
shown in Figure 1 are several types of gauges, including a strip chart, monitors of various
states and alarm conditions, temperature, and the on/off state of heaters and coolers.
The simulation runs through a startup phase, then through an exponential growth
phase which is inhibited by one of several conditions. The expertise captured in the rules in
the knowledge base is designed to diagnose one of several possible faults in the system and
to take action to correct the condition. Growth inhibition may be caused by incorrect
temperature, depletion of nutrients, incorrect pH or contamination. The system is able to
diagnose the fault and to take action to adjust temperatures, the pH, add nutrients or
recommend the batch be discarded due to contamination. A simple example, but one that
illustrates several points mentioned earlier. The graphical interface is essential for non-
experts. The system was developed rapidly as a prototype. As such, it does useful things,
it can be examined, criticized, refined, and can represent the beginnings of a larger system.
Combinations of relatively simple rules can diagnose problems and take specific actions.
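As a rough illustration of how a handful of simple rules can both diagnose a fault and select a corrective action, the following Python sketch mirrors the four causes of growth inhibition named above. It is not the demonstration system's actual rule base: the operating ranges, the attribute names, and the `diagnose` function are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical acceptable operating ranges; the text gives no numbers.
LIMITS = {
    "temperature_c": (35.0, 39.0),
    "ph": (6.8, 7.4),
}
MIN_NUTRIENT_G_L = 1.0  # hypothetical depletion threshold


def diagnose(state: dict) -> List[Tuple[str, str]]:
    """Return (fault, corrective action) pairs for one reactor state."""
    if state.get("contaminated"):
        # Contamination cannot be corrected, so it overrides other advice.
        return [("contamination", "discard batch")]

    findings: List[Tuple[str, str]] = []

    low, high = LIMITS["temperature_c"]
    if state["temperature_c"] < low:
        findings.append(("temperature low", "turn heater on"))
    elif state["temperature_c"] > high:
        findings.append(("temperature high", "turn cooler on"))

    low, high = LIMITS["ph"]
    if not low <= state["ph"] <= high:
        findings.append(("incorrect pH", "adjust pH"))

    if state["nutrient_g_l"] < MIN_NUTRIENT_G_L:
        findings.append(("nutrients depleted", "add nutrients"))

    return findings


# A cold reactor that has also run short of nutrients.
result = diagnose({"temperature_c": 33.0, "ph": 7.0,
                   "nutrient_g_l": 0.5, "contaminated": False})
```

In this sketch each rule is independent of the others, so two faults present at once simply yield two findings.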
Communication Satellites. The next example illustrates an expert system similar to those
under development in process control and instrumentation companies. These systems are
designed to diagnose faults and suggest corrective actions.
An aerospace company in California monitors telecommunication satellites in
geosynchronous orbit, 23,000 miles away in space. When something goes wrong on that
satellite, $50 to $100 million are dependent on taking the right corrective action. This
company is using expert systems to capture the knowledge of the developers of the satellites
in diagnosing and correcting problems, and to make this knowledge available to all
operators responsible for monitoring the condition of on-board systems.
Like many modern instruments, their instrument, the satellite, is connected to their
computer systems through an interface, in this case an antenna dish that transfers data
from the satellite to computers at a ground control center.
Figure 1. Graphics screen for the prototype expert system for
diagnosing faults in a biological reactor. The screen shows a
schematic of the reactor, together with gauges, strip charts, and
"traffic lights" indicating the state of the reactor obtained from
sensor readings. (Reproduced with permission. Copyright 1983
IntelliCorp.)
What is especially interesting about their problem of diagnosis of failures and advice
on corrective measures is their treatment of the alarm conditions that trigger the execution
of the expert system. The first goal of their rules is to focus on the single, or small set of,
alarm(s) that are of highest priority, thereby ignoring what may be many lower priority
alarms for a single problem. This usually allows isolation of the problem to a specific
subsystem, such as the energy storage and heating system shown schematically in Figure 2.
When the problem is localized, the system provides advice on what actions to take, then
examines the other alarms to determine if they are of secondary importance or represent
concurrent, major problems. Here, the graphical presentations, for example, Figure 2,
provide information to the operator on which systems are being examined and where the
faults may occur.
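The alarm-handling strategy just described (focus on the highest-priority alarm, localize the subsystem, then decide whether the remaining alarms are secondary or concurrent major problems) might be sketched as follows. The alarm names, the numeric priorities, the `major_threshold` parameter, and the classification test are all hypothetical; the text does not say how the company's rules make these decisions.

```python
from typing import List, Tuple

# Hypothetical alarms: (name, subsystem, priority); larger is more urgent.
Alarm = Tuple[str, str, int]

alarms: List[Alarm] = [
    ("battery_temp_high", "energy_storage", 9),
    ("heater_current_low", "energy_storage", 4),
    ("bus_voltage_sag", "power_bus", 3),
    ("transponder_gain_low", "communications", 8),
]


def triage(alarms: List[Alarm], major_threshold: int = 7):
    """Pick the top-priority alarm, then classify the remaining alarms.

    An alarm counts as a concurrent major problem (an assumption of this
    sketch) when it is urgent and belongs to a different subsystem than
    the primary alarm; everything else is treated as secondary.
    """
    ranked = sorted(alarms, key=lambda alarm: alarm[2], reverse=True)
    primary_name, primary_subsystem, _ = ranked[0]

    secondary: List[str] = []
    concurrent: List[str] = []
    for name, subsystem, priority in ranked[1:]:
        if subsystem != primary_subsystem and priority >= major_threshold:
            concurrent.append(name)
        else:
            secondary.append(name)
    return primary_name, primary_subsystem, secondary, concurrent


primary, subsystem, secondary, concurrent = triage(alarms)
```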
Space Stations. The final example I have selected results from work done by the National
Aeronautics and Space Administration (NASA) in preparation for flying the space station.
NASA's general problem is that many space station systems must be repairable in orbit by
astronauts who will not be familiar with the details of all the systems. Therefore, NASA is
looking to the technology of expert systems to diagnose problems and provide advice to the
astronauts on how to repair the problems.
The problem they chose for their prototype is part of the life support system,
specifically the portion that removes CO2 from the cabin atmosphere. This system already
has been constructed, and NASA engineers are already familiar with its operation and how
it can fail. Using this information they were able to build as part of their knowledge base a
simple simulation for the modes of failure of each of the components in the system. The
life support system is modular, in that portions of it can be replaced, once a problem has
been isolated. The graphical representation chosen for the instrument schematic and panel
is shown in Figure 3.
On the left of Figure 3 is a schematic of the system, with hydrogen gas (the
consumable resource) flowing through a valve to the six-stage fuel cell. Cabin atmosphere
enters from the right, excess hydrogen plus CO2 exits at the H2 Sink, and atmosphere
depleted in CO2 exits at the Air Sink. There is a variety of pressure, flow, temperature and
humidity sensors on the system. The lower subsystem is a coolant loop that maintains
temperature and humidity in the fuel cell. On the right of Figure 3 is a schematic of an
instrument panel that contains many of the instruments the astronauts will actually see.
Each component in the schematic is active. Pointing to any component with a mouse
yields a menu of possible modes of failure for that component. Selection of a failure results
in setting parameters in the underlying knowledge base, which are of course reflected in the
settings of the meters and gauges on the instrument panel.
Simply pointing to the IDENTIFY button runs the rule system, which diagnoses the
problem and provides advice on action to take to fix the life support system. The
remainder of the screen is devoted to various switches and output windows that are used to
build and debug the knowledge base.
As an indication of how rapidly the technology of expert systems has matured, this
prototype was built in our offices by two people from NASA, one a programmer who knew
nothing about LISP, the other an expert on the life support system who knew nothing
about programming. Neither had seen KEE™, our system building tool, before receiving
training and beginning work on the prototype. The system, including all the graphics, the
simulation and the rules, was built in four weeks. It is capable of diagnosing many of the
important modes of failure of this portion of the life support system. Much work remains
to be done before a final version of the expert system is completed, but this prototype
provides an important starting point.
Concluding Remarks
I have used this paper as an introduction to what amounts to a revolutionary change in the
Figure 3. Graphics screen for the prototype expert system developed by NASA for
diagnosis and repair of the life-support system. This portion of the system
strips cabin atmosphere of CO2. (Reproduced with permission. Copyright 1983
IntelliCorp.)
available, but these jobs will require substantially more education and skills.
For jobs that already require substantial skills, expert systems will serve to make the
people holding these jobs more productive. An analogy has been made to engineers who
used to calculate trajectories by hand, but now use computers to perform these routine
tasks, thereby freeing their time for more intellectual pursuits. Chemists and chemical
engineers will see similar improvements to their own productivity.
0097-6156/86/0306-0018$06.00/0
© 1986 American Chemical Society
Because of the current high demand for expert system
applications, software packages which are optimized for application
building, rather than for AI technique research, have been
developed. One of these is RuleMaster (1), which is designed to
extract expert reasoning and to incorporate it into a wide range of
scientific and engineering applications. In contrast with many
other AI approaches, RuleMaster is based on contemporary structured
programming principles. Conventional micro- and mini-computers may
be used by any computer professional to build expert systems
integrated with existing computer programs. A knowledge acquisition
system based on inductive learning speeds up the rule generation and
testing process. A procedural representation of the rule base is
automatically generated, providing consistency and completeness
checking and efficient run-time behavior. Embedding expert system
reasoning into existing systems is supported by two features:
access to external user programs from the RuleMaster rule language,
and the automatic generation of a C code representation of the
expert system.
RuleMaster Description
History. Radian Corporation is a technical consulting company,
employing about 1000 people. About half of Radian's business is in
the chemistry and chemical engineering fields. In 1981, Radian
management realized that expert systems capability could enhance and
complement existing consulting activities. Radian entered into an
agreement with Donald Michie, of Edinburgh University and
Intelligent Terminals Limited (ITL). For a number of years, he had
done research in inductive learning and in other expert system
techniques, and often used conventional structured programming
languages like Pascal. He noted that the special AI environments
were primarily useful for research into AI techniques, and were not
necessary for an expert systems package oriented toward building
applications. RuleMaster was designed and developed by ITL and
Radian during 1982 and 1983. Since then, both companies have
continued enhancing RuleMaster, and several dozen expert system
applications are under construction or completed.
RuleMaster expert systems are represented as Radial programs. To
build an expert system, domain knowledge is normally entered in two
parts: a module structure and the bodies of the modules. The
structure defines the hierarchical organization of decisions used to
solve the problem. The code within each module defines the details
of one of these decisions.
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch002
RuleMaker is a knowledge extraction utility for building and
testing the decision logic contained within Radial modules. The
logic is specified as a table of examples of correct expert
decisions for each module. RuleMaker transforms each example set
into an equivalent decision tree, and automatically generates the
body of the module in the form of Radial code. System builders may
also choose to enter Radial code directly, although they usually
prefer to work with example tables.
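The step from example table to decision tree can be illustrated with a minimal Python sketch. It assumes an ID3-style procedure (split on the attribute that leaves the least entropy, then recurse); the text does not say which induction algorithm RuleMaker uses, so the functions `induce` and `classify`, the attribute names, and the small example table are illustrative assumptions only.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy of the decision column (last item of each row)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def induce(rows, attrs):
    """Build a decision tree from example rows.

    rows  -- list of tuples: attribute values followed by the decision
    attrs -- list of (name, column_index) pairs still available for splitting
    Returns a decision string (leaf) or a (name, {value: subtree}) pair.
    """
    decisions = {row[-1] for row in rows}
    if len(decisions) == 1:
        return rows[0][-1]
    if not attrs:
        # Inconsistent examples: fall back to the most common decision.
        return Counter(row[-1] for row in rows).most_common(1)[0][0]

    def remainder(attr):
        # Weighted entropy left over after splitting on this attribute.
        _, col = attr
        groups = Counter(row[col] for row in rows)
        return sum(n / len(rows) * entropy([r for r in rows if r[col] == v])
                   for v, n in groups.items())

    best = min(attrs, key=remainder)  # least remaining entropy = highest gain
    name, col = best
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[col] for row in rows}):
        subset = [r for r in rows if r[col] == value]
        branches[value] = induce(subset, rest)
    return (name, branches)


def classify(tree, case):
    """Walk the tree for a case given as {attribute name: value}."""
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches[case[name]]
    return tree


# Hypothetical example table: (H2, thermal, decision).
EXAMPLES = [
    ("low", "absent", "unlikely"),
    ("low", "slight", "unlikely"),
    ("med", "absent", "possible"),
    ("med", "slight", "unlikely"),
    ("high", "absent", "possible"),
    ("high", "slight", "possible"),
]
ATTRS = [("H2", 0), ("thermal", 1)]
TREE = induce(EXAMPLES, ATTRS)
```

In this sketch the order of the examples does not affect the decisions the tree returns, which is the property that lets a system builder edit examples without thinking about control flow.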
Consultation of an expert system is accomplished by using its
Radial code representation as input to the Radial interpreter. The
interpreter first performs completeness and consistency checks, and
then provides interactive run-time support.
set by the expert in another Radial module. "Thermal" refers to
thermally generated hydrocarbon gases, which may be absent, slight,
or definitely present. The other two attributes are the
hydrogen-to-acetylene ratio and the estimate of the temperature at
which the hydrocarbon gases were generated. A hierarchy of rules
supplied by the expert determines the value of each of these
attributes, based eventually on the numerical concentrations
received from the gas chromatograph.
The decision for each example is expressed as an "action-next
state" pair. The "action" is a reference to executable Radial code,
which consists of a sequence of Radial statements. These statements
may contain references to external programs in various languages
(this will be discussed further later). The "next state" describes
the context to which control is to pass after the action is
completed. For diagnostic expert systems, such as TOGA, the next
state will usually be the "goal" state of the module. This passes
Rule Language (Radial). RuleMaster expert systems are expressed in
Radial, a block structured interpreted language with a syntax
similar to Pascal and ADA. Radial is a simple, easy-to-learn
language which supports the full range of expert system
capabilities.
The building block of Radial, corresponding to the Pascal
procedure, is called a "module". The syntax within each module is
based on finite automata theory, to provide the control structures
needed to support both diagnostic and planning aspects of expert
systems applications. Other language features include recursive
routine calls, argument passing, scoped variables and functions,
abstract data types, and user-defined overloaded operators.
Built-in data types include string, integer, floating point, and
boolean.
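The finite-automaton view of a module can be sketched in Python as follows. This is not Radial and not RuleMaster's implementation; it only illustrates the idea that each state selects an action and a next state, and that a module finishes when control reaches its goal state. The names `run_module`, `check_state`, `decide_state`, and the `threshold` value are hypothetical.

```python
GOAL = "GOAL"


def run_module(states, start, facts):
    """Run one module until control passes to the goal state.

    states -- {state name: function(facts) -> (action, next_state)}
    Returns the results of the actions performed, in order.
    """
    performed = []
    state = start
    while state != GOAL:
        action, state = states[state](facts)
        performed.append(action(facts))
    return performed


def check_state(facts):
    """First state: compute a derived quantity, then move to 'decide'."""
    def compute_ratio(f):
        f["ratio"] = f["H2"] / f["C2H2"]
        return "ratio computed"
    return compute_ratio, "decide"


def decide_state(facts):
    """Second state: classify the ratio and pass control to the goal."""
    verdict = "high" if facts["ratio"] > facts["threshold"] else "low"

    def report(f):
        f["H2/C2H2"] = verdict
        return verdict
    return report, GOAL


STATES = {"check": check_state, "decide": decide_state}
# Illustrative concentrations and threshold, not values from the text.
FACTS = {"H2": 100.0, "C2H2": 4.0, "threshold": 10.0}
RESULT = run_module(STATES, "check", FACTS)
```

A purely diagnostic module would typically have a single state whose every rule leads straight to the goal; states that hand control to other states are what make planning-style behavior expressible.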
The Radial code for the decision tree of Figure 2 is shown in
Figure 3. This code was generated by RuleMaker. Experts have
difficulty correctly generating a deeply nested conditional phrase
like this, but they are able to inspect it for possible errors or
omissions.
TOGA uses the built-in numerical capabilities of Radial to
compute functions of concentration values, which are used
extensively in the rules. The ratio of hydrogen to acetylene
concentration in the corona rule is a simple example of this.
User-defined compound data types are used to handle blocks of data
as a single named structure. These features are invaluable in
building practical expert systems, but are not available with all
packages.
Most Radial code is constructed by RuleMaker from training sets
[Example table column headings: H2, thermal, H2/C2H2, temperature, action, next state]
[Figure 2: corona decision tree; outcomes unlikely, likely, possible]
IF (temp) IS
  "low" : IF (H2/C2H2) IS
      "high" : IF (H2) IS
          "low" : ( "unlikely" -> result, GOAL )
          "med" : ( "likely" -> result, GOAL )
          ELSE ( "likely" -> result, GOAL )
      ELSE ( "unlikely" -> result, GOAL )
  "moderate" : IF (H2/C2H2) IS
      "high" : IF (H2) IS
          "low" : ( "unlikely" -> result, GOAL )
          "med" : IF (thermal) IS
              "absent" : ( "possible" -> result, GOAL )
              "slight" : ( "unlikely" -> result, GOAL )
              ELSE ( "unlikely" -> result, GOAL )
          ELSE ( "possible" -> result, GOAL )
      ELSE ( "unlikely" -> result, GOAL )
  ELSE ( "unlikely" -> result, GOAL )
of examples, as described in the previous section. However, Radial
code can also be entered directly by the system builders, if they so
desire.
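For readers unfamiliar with Radial, the nested conditional of the corona listing can be mirrored in Python. This is only an illustrative transcription: the function name `corona` and the argument names are assumptions, and the attribute values are the ones that appear in the listing.

```python
def corona(temp, h2_c2h2, h2, thermal):
    """Return the corona likelihood, following the nested tests of the listing."""
    if temp == "low":
        if h2_c2h2 == "high":
            # "low" H2 is unlikely; "med" and every other value are likely.
            return "unlikely" if h2 == "low" else "likely"
        return "unlikely"
    if temp == "moderate":
        if h2_c2h2 == "high":
            if h2 == "low":
                return "unlikely"
            if h2 == "med":
                # Only the absence of thermal gases leaves a corona possible.
                return "possible" if thermal == "absent" else "unlikely"
            return "possible"
        return "unlikely"
    # Any other temperature estimate falls through to the outer ELSE.
    return "unlikely"
```

Written this way, the default branches stand out: every ELSE in the Radial listing becomes a fall-through `return`, which is the part an expert is most likely to want to inspect.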
Explanation. A user may ask for explanation of the line of
reasoning at any time during an expert system consultation.
RuleMaster presents explanation as a list of premises and
conclusions in English-like text. The explanation describes the
execution path which led up to the current conclusion or question.
Explanation is presented in proof ordering, which usually differs
from the order in which the questions and conclusions were
encountered. This is perceived as more relevant and understandable
than the time-ordered presentation of fired rules, as is present in
most expert system approaches.
A sample explanation for the corona decision is as follows:
Since the estimated oil temperature is moderate
when H2/C2H2 is above [illegible]
and the concentration of H2 is medium
and overheating of oil is absent
it follows that a corona is possible
This text was constructed at run-time by the Radial interpreter
from text fragments provided beforehand by the system builders. It
displays, in English, the path through the corona decision tree
(Figure 2).
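A minimal Python sketch of how such text might be assembled from stored fragments follows. It is not the Radial interpreter's method; the fragment wording is modelled on the sample explanation, the ratio fragment is generic because the threshold value is not legible in the source, and the names `FRAGMENTS` and `explain` are hypothetical.

```python
# Text fragments a system builder might supply beforehand (hypothetical).
FRAGMENTS = {
    ("temp", "moderate"): "the estimated oil temperature is moderate",
    ("H2/C2H2", "high"): "H2/C2H2 is high",
    ("H2", "med"): "the concentration of H2 is medium",
    ("thermal", "absent"): "overheating of oil is absent",
}


def explain(path, conclusion):
    """Render the tests passed on the way to a conclusion as English text.

    path       -- list of (attribute, value) pairs in the order tested
    conclusion -- text fragment for the conclusion reached
    """
    leads = ["Since", "when"]  # later premises are joined with "and"
    lines = []
    for i, step in enumerate(path):
        lead = leads[i] if i < len(leads) else "and"
        lines.append(f"{lead} {FRAGMENTS[step]}")
    lines.append(f"it follows that {conclusion}")
    return "\n".join(lines)


PATH = [("temp", "moderate"), ("H2/C2H2", "high"),
        ("H2", "med"), ("thermal", "absent")]
TEXT = explain(PATH, "a corona is possible")
```

The sketch covers a single decision tree only; elaboration would repeat the same rendering for the module that established each premise.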
When explanation is requested at intermediate points in a
session, just the reasoning for the current decision tree is
presented. By asking for elaboration, the user can inspect the
reasoning underlying the current rule. Elaboration of the corona
decision above would yield descriptions of the lines of reasoning
which determined the premises: that the oil temperature was
moderate, that the concentration of H2 was medium, etc. Elaboration
may be repeated until the user is satisfied or until all the steps
have been examined.
If explanation is requested at the end of a session, the entire
line of reasoning leading up to the latest top-level conclusion is
presented in proof order. Intermediate conclusions are derived
before they are used in premises.
The number of levels of explanation available depends on the
nesting of routine calls at run-time. The hierarchical organization
of modules makes it easier to control and understand the run-time
behavior of rule execution.
Explanation-driven expert system building leads to robust
systems. By testing to ensure that the right conclusions are
reached for the right reasons, the probability of the reasoning
being correct for unforeseen situations is enhanced. Quality
explanation also makes systems more useful as teaching tools.
External Processes. The Radial language supports interfacing to
software written in the various computer languages available under
UNIX: Fortran, C, Pascal, Lisp, etc. The Radial language takes
care of the details of passing arguments to and from external
routines. This capability allows Radial to be used just for the
C Code Generation. The primary representation of a RuleMaster
expert system is as Radial code, much of which is generated from
example tables. The building and testing is carried out by
Diagnostic Approach. Possible causes of transformer failure include
general insulation deterioration, overheating due to overload,
shorting at failed joints, corona activity near insulation, arcing,
and grounded core. Each failure mode causes heating of the oil,
which may be local and intense or widespread and moderate. The oil
decomposes when subjected to heat, and some of the decomposition
products are gases which dissolve in the oil: hydrogen, carbon
monoxide, carbon dioxide, and hydrocarbons. The relative
concentrations of the various gases depend on the heating history,
and are thereby related to the cause of failure. The concentrations
of these gases can be accurately measured with gas chromatographs,
and this information is used to diagnose the cause of an incipient
breakdown prior to catastrophic failure.
Diagnosing a transformer's condition from chemical analysis of
its oil is an expert skill which has been developed over the past 20
years. It is relatively easy to find skilled chemists who can
provide the chemical analysis, but experts who can diagnose a
transformer's condition from this data are rare. The diagnosis is
typically based on a mixture of science and heuristic rules
developed from years of experience.
Training. By using a diagnostic system built by an
acknowledged expert, novices can quickly learn to diagnose
transformers by observing decisions which are reached and
lines of reasoning.
Distribute Expertise. TOGA allows novices to perform as
experts at chemistry laboratories and utility sites,
especially for the simpler and more prevalent situations
covered by the rules.
Automate Decision-making Process. For daily operation, TOGA
is run automatically from gas chromatograph output and data
bases (as opposed to interactively) to generate expert
interpretation of data. This speeds up the data analysis
task and removes the element of human error from routine
diagnoses.
Aid Expert With Complex Decisions. TOGA helps prevent the
judgment mistakes which can occur when rare transformer
conditions are encountered or when experts are forced to make
a hurried diagnosis.
Table I. TOGA Validation Results
                           TOGA and Expert:
Transformer Condition      Agreed      Disagreed
No Problem                 651         0
Problem                    20k         k
One would also like to compare the diagnoses with the actual
transformer condition, and not just with the expert's previous
assessment of the condition. Unfortunately, this is usually not
possible; it is expensive to remove a transformer from service,
Using RuleMaster
attributes yet to be defined (like presence or absence of thermally
generated hydrocarbon gases). Whenever a new quantity was
introduced, the expert was asked about the factors used to determine
it. This process was repeated recursively until eventually the
entire solution was described in terms of chromatograph data.
At this point, there was enough information to build a
first prototype. Each intermediate or final conclusion defined a
decision module. These modules were organized into a hierarchical
structure. Within each module, example table structures were
created. Based on the interviewing records, a first cut at the
example sets was entered. At this point, a running prototype expert
system existed.
The value of this approach is that a running expert system is
rapidly created, without forcing the expert to articulate a general
problem-solving procedure. The prototype system is available for
the iterative knowledge refinement process, which draws out more
Programming Skills. One of the first steps in building a RuleMaster
expert system is creating the module hierarchy for the prototype.
This requires skill in top-down design and structured programming.
People without some coursework and experience in these computer
science disciplines tend to make mistakes and flounder at this
stage.
For the majority of the iterative refinement process, however,
only minimal computer skills are required. The modules are small
enough that their logic can be easily understood by anyone familiar
with the application. Changes are usually limited to editing
examples, and the example ordering is not important. The inductive
learning algorithm automatically takes care of control flow. Most
of the knowledge refinement can be done by anyone who knows a little
editing and file management. This is often the expert himself.
Therefore, additional programmers with highly specialized
skills are not required to add an expert reasoning capability to an
existing computer program. The programmers already on the project
can also build the expert system. Not only does this save money,
but these people understand the problem and are likely to do a
better job than someone whose primary interest lies elsewhere.
Conclusions. TOGA is an expert system built with RuleMaster which
has been validated and is in daily use. The primary benefit from
building TOGA is that the transformer diagnostic knowledge now
exists in a form which can be used to pass the skill on to a new
generation of engineers. HSB will not lose its transformer
diagnosis capability when the current expert retires. Other
employees can use the expert system to diagnose transformers, or
they can learn the technique by studying a written version of the
knowledge base.
Other applications built with RuleMaster demonstrate additional
reasons for building expert systems.
WILLARD (3) is a severe storms forecasting expert system which
can obtain all input data from National Weather Service data lines.
When severe storm situations occur, forecasters become very busy and
do not have time to utilize all the data which is available. The
expert system can take over the routine portion of the forecasting,
leaving the experts free to focus on the more difficult and critical
portions of the analysis.
TURBOMAC (4) diagnoses faults in large rotating machinery, such
as power generation turbines. This expert system allows field
engineers to incorporate the reasoning of one of the top experts in
vibration diagnosis in their maintenance and operational decisions.
GloveAID (5) predicts the most effective glove materials to
choose for protection against hazardous chemicals. There are no
established experts in this field, because many of the protection
effectiveness measurements are just now being performed. The
inductive learning aspect of RuleMaster is used to help organize the
data which is available and to suggest which measurements should be
performed next.
The objective of QualAID is to provide advice on how much
and what type of quality assurance (QA) and quality control (QC) is
needed for various types of environmental analyses. The purpose of
this system is to provide consistently good advice to chemists whose
primary field of expertise is other than QA/QC.
Literature Cited
QualAId
[Figure labels: Litigation Importance (high, medium, low); QA/QC (Yes, No)]
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch003
[Figure labels: sampleAId, methodAId, Inorganic, Specific Methods; each branch ends in Routine or Advice]
[Figure label: Determine Extent of Method Verification and Validation]
[Figure labels: Determine Number of Samples Planned; Determine Probable Analyte Concentration Range]
We need to establish what instrument you plan to use for the analysis.
Figure 2. Queries for the Module Detector.
Advice:
You have your methodology both verified and validated as required for measurements needing a high
level of confidence. But, you must also assure that your analyst is experienced in performing the
type of analysis you need, that you have standards for the analytes available, and that you have a
written quality assurance plan that documents good laboratory practice.
You indicated that you need a high level of confidence in your data and GC-MS is an instrument
combination that will provide you with the capability of generating that degree of quality. You
should also use capillary GC columns if at all possible in order to obtain the best chromatographic
resolution. When you have analytes that are completely, or almost completely resolved from other
compounds, the quality of the mass spectra generated will be better. You will also need
documentation that the mass spectrometer is operating and is tuned up correctly and that the resolution
obtained from the gas chromatograph meets your requirements. These requirements should be met with
QC standards for each instrument prior to analyzing your samples.
Less than 10 ppb is the most difficult range of analytes to quantify with a high degree of
confidence. Quantitative measurements which do not fall within the "Region of Quantitation" cannot be
For a high level of confidence you will need to have both "field" and "method" blanks. Field
blanks are blanks from a similar source that do not contain the analytes of interest. Control
sites (uncontaminated sites) are used to obtain field blanks and if field blanks are not available,
every effort should be made to obtain blank samples that best simulate a sample that does not
contain the analyte (such as a simulated or synthetic field blank). Your method blanks will
consist of all solvents, resins, etc. that you will use for extracting, concentrating and cleaning
up the samples prior to analysis. You may want about half of these unspiked and the remainder
spiked with known levels of your analyte standards. Similarly you may want to spike about half of
your field blanks with known levels of your analyte standards so that any matrix effects will be
identified during the analysis. This plan would provide you with:
The total number of blanks you would need, based on the number of samples you plan to take, is: 4.
(RETURN continues)
[Note: 4 blanks were recommended even though only 2 samples were planned.]
GloveAId
Rule Induction
Tactility requirement is moderate tactile
Nitrile
Approximate cost is $3.00 per pair of gloves.
Protection time is probably greater than 5 minutes
Tactility is moderately tactile
Butyl Rubber
Approximate cost is $10.00 per pair of gloves.
Protection time is probably greater than 200 minutes
Tactility is not tactile
/*
   CLASS: AROMATIC NOT HALOGENATED
*/
MODULE: class10
STATE: only
CONDITIONS:
   glove [ ask "What is the glove type?"
           "Butyl_Rubber,Neoprene,Nitrile,PVA,PVC,Viton" ]
   ...
   Viton   78   80   0   3   =>(best,GOAL)
   Viton  130  195   0   0   =>(best,GOAL)
   Viton  106  136   0   0   =>(best,GOAL)
   Viton  106  144   0   1   =>(best,GOAL)
ACTIONS:
   best      [ advise "This glove has a *best* rating." ]
   good      [ advise "This glove has a *good* rating." ]
   fair      [ advise "This glove has a *fair* rating." ]
   poor      [ advise "This glove has a *poor* rating." ]
   very_poor [ advise "This glove has a *very poor* rating." ]
<class10>
0 (all states)
1 only
[glove]
   Butyl_Rubber : [molwt]
      <118  : => ( fair, GOAL )
      >=118 : => ( good, GOAL )
MODULE: class10
STATE: only
IF [ ask "What is the glove type?"
     "Butyl_Rubber,Neoprene,Nitrile,PVA,PVC,Viton" ] IS
  "Butyl_Rubber" : IF [ integer.read "What is the molecular weight?" < "118" ] IS
      "T" : [ advise "This glove has a *fair* rating.", GOAL ]
      ELSE [ advise "This glove has a *good* rating.", GOAL ]
  "Neoprene" : [ advise "This glove has a *fair* rating.", GOAL ]
  "Nitrile" : IF [ integer.read "What is the molecular weight?" < "92" ] IS
      "T" : [ advise "This glove has a *fair* rating.", GOAL ]
      ELSE IF [ integer.read "What is the molecular weight?" < "118" ] IS
          "T" : IF [ integer.read "What is the boiling point?" < "137" ] IS
              "T" : [ advise "This glove has a *fair* rating.", GOAL ]
              ELSE IF [ integer.read "What is the boiling point?" < "139" ] IS
                  "T" : [ advise "This glove has a *fair* rating.", GOAL ]
                  ELSE IF [ integer.read "What is the boiling point?" < "142" ] IS
                      "T" : [ advise "This glove has a *fair* rating.", GOAL ]
                      ELSE [ advise "This glove has a *poor* rating.", GOAL ]
          ELSE IF [ integer.read "What is the molecular weight?" < "139" ] IS
              "T" : [ advise "This glove has a *fair* rating.", GOAL ]
              ELSE [ advise "This glove has a *best* rating.", GOAL ]
  "PVA" : IF [ integer.read "What is the molecular weight?" < "92" ] IS
      "T" : [ advise "This glove has a *fair* rating.", GOAL ]
      ELSE IF [ integer.read "What is the molecular weight?" < "118" ] IS
          "T" : IF [ integer.read "What is the boiling point?" < "137" ] IS
              "T" : [ advise "This glove has a *fair* rating.", GOAL ]
              ELSE [ advise "This glove has a *best* rating.", GOAL ]
          ELSE [ advise "This glove has a *best* rating.", GOAL ]
  "PVC" : [ advise "This glove has a *very poor* rating.", GOAL ]
  ELSE [ advise "This glove has a *best* rating.", GOAL ]
[Figure legend: Nitrile, Neoprene, Butyl Rubber, PVA]
Summary
Neoprene
PVA if                        PVA if
  MW >= 118                     MW < 92
  - or -                        - or -
  92 <= MW < 118                92 <= MW < 118
  and bp >= 137                 and bp < 137
Viton                         PVC
MW = Molecular Weight
bp = Boiling Point
BuR = Butyl rubber
PVA = Polyvinyl acetate
PVC = Polyvinyl chloride
Literature Cited
James C. Bellows
0097-6156/86/0306-0052$06.00/0
© 1986 American Chemical Society
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch004
Definitions
was decided that the data gathering computer at the plant should
be sophisticated enough to determine that something is happening
and make a special transmission of data at that time. The monitor
set has been chosen to allow high reliability diagnosis of common
power plant conditions, but it will support some unusual
conditions as well. Those unusual conditions are included in the
diagnostic system simply because the supporting data are
present. Finally, it was decided that no information which might
be relevant, including manual analysis data, should be rejected
completely, and manual entry points have been included for that
data. Manual entry of data requires validation of the data before
entry.
Monitoring System
Diagnostic System
Sensor                  Condensate Pump   Condensate Polisher   Economizer   Hot
Description             Discharge         Effluent              Inlet        Reheat   Makeup
Cation Conductivity     X X X X
Specific Conductivity   X X X X
Sodium                  X X X X
Chloride                X X X X
Dissolved Oxygen        X X X
Hydrazine               X
pH                      X X X
Silica                  X X
Air Exhaust             X
Makeup Flow
Electrical Load         X
This aspect of the rule is known as sufficiency. The rule will
also state that if the evidence is known to be false with absolute
certainty, the conclusion will be known to be false (or true)
with another specific confidence. This aspect of the rule is
known as necessity. The sufficiency and necessity need not be
equal. There are many times when something may indicate the
presence of a condition but not be a necessary consequence of that
condition. The increases in monitor readings that occur at the
start of malfunctions are good examples of indicators which will
signal the presence of a malfunction, but when the malfunction
becomes stable at some severity, the increase will no longer be
present. Of course a high value for the monitor reading will then
be present. Evidence may be sensors or the conclusions from other
rules. Several rules may support a single conclusion and the same
evidence may be used for several rules.
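The roles of sufficiency and necessity can be sketched in Python. The text gives no formulas, so everything here is an assumption made for illustration: confidences run from -1 (certainly false) to +1 (certainly true), a rule scales its evidence linearly, and several rules supporting one conclusion are merged with a MYCIN-style combination. The names `rule_contribution` and `combine` are hypothetical.

```python
def rule_contribution(evidence, sufficiency, necessity):
    """Confidence that one rule contributes to its conclusion.

    evidence    -- confidence in the evidence, from -1.0 to +1.0
    sufficiency -- confidence in the conclusion when the evidence is +1.0
    necessity   -- size of the confidence against the conclusion when
                   the evidence is -1.0
    """
    if evidence >= 0:
        return evidence * sufficiency
    return evidence * necessity  # a negative (disconfirming) contribution


def combine(a, b):
    """Merge two contributions to the same conclusion (assumed MYCIN-style)."""
    if a >= 0 and b >= 0:
        return a + b - a * b
    if a <= 0 and b <= 0:
        return a + b + a * b
    denominator = 1 - min(abs(a), abs(b))
    if denominator == 0:
        return 0.0  # total conflict between certainties: treated as unknown
    return (a + b) / denominator


# A rising monitor reading: strong when present, uninformative when absent,
# so sufficiency is high and necessity is zero (illustrative values).
RISING_PRESENT = rule_contribution(1.0, 0.6, 0.0)
RISING_ABSENT = rule_contribution(-1.0, 0.6, 0.0)
```

Setting necessity to zero captures the case described above: once a malfunction has stabilized the reading stops rising, and the absence of a rise should not count against the malfunction.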
When the system is used to diagnose the power plant
chemistry, the inference engine will activate all the rules for
which evidence exists. Thus all possible conclusions are examined
[Figure labels: EVIDENCE, NODE 1, RULE, NODE 2]
Since the diagnostic process must be broken down into small steps,
the process of building the rule base is much like that of
training an able, but rather ignorant person.
It has been arbitrarily decided to say that any malfunction
for which there is less than 30% confidence is probably not
present with enough severity to cause concern. Between 30% and
50% confidence, one should be concerned that the malfunction may
be developing. This represents an early warning, but with
increased possibility of error. Between 50% and 70% confidence,
action is appropriate to confirm or disconfirm the presence of the
malfunction by collecting additional information, if necessary.
Above 75% confidence, a plant malfunction is present with enough
confidence that action ought to be taken to correct the
malfunction. Action on a sensor malfunction indication should
take place above 50% confidence, since by that time the system has
lost substantial sensitivity to the plant malfunctions supported
by the sensor.
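The confidence bands above can be written as a small lookup. The
following Python sketch is an illustration only; the function names,
the returned strings, and the treatment of the exact boundary values
are assumptions. The text leaves the range between 70% and 75%
unassigned, and the sketch reports that range as unspecified rather
than guessing.

```python
def plant_malfunction_action(confidence: float) -> str:
    """Map a plant-malfunction confidence (0.0 to 1.0) to the stated response."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < 0.30:
        return "no concern"
    if confidence < 0.50:
        return "early warning"
    if confidence <= 0.70:
        return "confirm or disconfirm"
    if confidence <= 0.75:
        # The source gives no instruction for this band.
        return "unspecified in source"
    return "corrective action"


def sensor_malfunction_action(confidence: float) -> str:
    """Sensor malfunctions are acted on earlier, above 50% confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return "act on sensor" if confidence > 0.50 else "no action"


for value in (0.20, 0.40, 0.60, 0.90):
    print(value, plant_malfunction_action(value))
```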
Malfunction Description
Numbers
Cation Conductivity 5
Specific Conductivity 4
pH 4
Dissolved Oxygen 3
Sodium 2
Note: These values were held constant to show the effect of the
variation in the single variable.
[Figure residue: block diagram with labels SENSOR, OTHER SENSORS,
SENSOR DIAGNOSTICS, INTERPRETATION, SENSOR MALFUNCTION, VALIDATED
INTERPRETATION, and PLANT EQUIPMENT MALFUNCTIONS.]
[Figure residue: plot with labels CONDENSATE CATION CONDUCTIVITY,
RESIN EXHAUSTION, CONDENSER LEAK, CF, and SF; axis values are not
recoverable.]
[Figure residue: RECOMMENDATION SUMMARY display. Recommendations
shown: "Find and repair air leak above hotwell waterline within
100 hr." and "Remove polisher vessel #3 from service and regenerate
within 8 hr." Menu items: Select Unit, Diagnostic Summary, Diagnostic
Procedures, Explanation, Recommendation Explanation, Summary.]
Acknowledgments
Literature Cited
0097-6156/ 86/0306-O069$06.00/0
© 1986 American Chemical Society
Process Intelligent Control
[Figure residue: diagram labels CAPTURE, RULES, DIAGRAM, I/O, RTIME,
and MEMORY.]
0097-6156/86/0306-0075$06.00/0
© 1986 American Chemical Society
lated language, which were originally not as well suited for
calculation as for logical manipulation. More recently it has been
possible to get an expert system to supervise calculations, digest
considerable masses of observational data, and draw conclusions which
are not strictly computational, as in the case of ELAS and the
oil-well drilling programs. These involve the EXPERT system builder
(1), which has the following advantages: it is written in FORTRAN
and can therefore easily communicate with FORTRAN programs; a PROLOG
version has also recently been prepared; it has data base
capabilities; and it is good at explaining what it is doing and why.
Interaction between artificial intelligence and modeling has evolved to
where modeling societies routinely program artificial intelligence
sessions at meetings, and are forming technical committees on this
subject.
This paper reflects the past activities of some of its authors
in computer modeling of the chemical aspects of biological systems.
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch006
This activity requires expertise in both model-building and in the
relevant biology. It also involves examination of the actions of and
results obtained by experts, like that routinely done in building
expert systems. It also involves keeping track of and coherently
explaining sequences of decisions, which expert systems are equipped
to do.
In this paper we are concerned with a set of relatively similar
possible applications involving management of calculations and of
modeling. These involve actions (calculation, information retrieval,
and "intelligent" reasoning) at more than one hierarchical level.
Particular attention will be given to the design and interpretation
of experiments in enzyme kinetics. Designing an experiment may
involve computation of optimal conditions, and its interpretation may
involve fitting of optimal parameters of a model, but non-numerical
reasoning procedures are also involved. Attention is therefore
required to the kinds of reasoning employed in designing experiments
and to the critiquing of the reasoning and techniques involved in
such experiments. A high-level description of an experimental design
cycle can be given in such steps as: definition of the problem (what
questions are to be addressed? what hypotheses are to be tested?);
quantitative modeling; design and then performance of the necessary
experiments; analysis of the results; and then model reinterpretation
and possible problem redefinition (2).
A Problem of Definition
The process of building expert systems usually involves determining
the conceptual framework and pattern of decision making of experts
(often one outstanding expert). These are often not written down
and may not be clearly explainable because there is heavy reliance on
heuristics and even hunches. However, we would like to suggest that
this may not be the only way to apply expertise. We have encountered
workers in different fields handling the same subject matter
differently because they have different conceptual frameworks and
different jargon as well as different heuristics and priorities. We
offer the following example involving a relatively simple multiple
equilibrium calculation.
Although there is no controversy about the basic definition of
stability constants, physical chemists and biochemists handle the
concepts involved and the resulting calculations differently.
Physical chemists think in terms of reactive species and biochemists in
terms of total concentrations of components. A further source of
confusion is the differing definitions of "apparent constant". To a
physical chemist the stability constant for MgATP formation
$$\mathrm{Mg^{2+} + ATP^{4-} = MgATP^{2-}}$$
is defined as
$$K_{\mathrm{MgATP}} = \frac{[\mathrm{MgATP^{2-}}]}{[\mathrm{Mg^{2+}}]\,[\mathrm{ATP^{4-}}]}$$
[Equation residue: a second expression giving K in terms of K_MgATP,
with a denominator that begins 1 + [H+] K_HATP and continues with
further terms; the accompanying text and the complete expression are
not recoverable from the extraction.]
While it is relatively easy to show that the two calculations
are equivalent in simple systems, it is not so easy with more
complex in vivo systems, as when these equilibria are studied with
NMR spectra from perfused or intact organs. We recently (3) became
involved in a controversy where a 4-fold difference in magnesium ion
level was calculated from substantially identical NMR spectra as a
result of such differences in definition. Our experience indicates
that an intelligent program to supervise such calculations would be
quite useful.
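For the simplest case, a single 1:1 MgATP equilibrium with no competing
protons or other cations, the step from total concentrations (the
biochemist's quantities) to reactive species (the physical chemist's
quantities) is the solution of a quadratic. The Python sketch below
illustrates that step only; the function name and the numerical value of
the stability constant are illustrative assumptions, not data from this
paper.

```python
import math


def mgatp_complex(total_mg: float, total_atp: float, k_stab: float) -> float:
    """Return [MgATP] (mol/L) for a single 1:1 equilibrium.

    Solves K = x / ((Mt - x) * (At - x)) for x and takes the root that
    keeps both free concentrations non-negative.
    """
    b = k_stab * (total_mg + total_atp) + 1.0
    discriminant = b * b - 4.0 * k_stab * k_stab * total_mg * total_atp
    return (b - math.sqrt(discriminant)) / (2.0 * k_stab)


# Illustrative inputs only: 1 mM total Mg, 2 mM total ATP, K = 1e4 L/mol.
K_ASSUMED = 1.0e4
complex_conc = mgatp_complex(1.0e-3, 2.0e-3, K_ASSUMED)
free_mg = 1.0e-3 - complex_conc
free_atp = 2.0e-3 - complex_conc
print(complex_conc, free_mg, free_atp)
```

Competing equilibria such as protonation would add further mass-balance
equations, which is where the bookkeeping discussed in the text becomes
error-prone.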
In such a situation an intelligent program may function as an
"intelligent interface", a program which can translate information
from one conceptual framework to another. Even though there are many
experts in the subject matter involved, programs of this type would
be useful for the many others who are not expert in the subject
matter or the calculations involved or who have difficulties in
communication. The advent of software for small expert systems on
microcomputers would add the advantage of convenience as well.
Applications
We describe here three possible applications of expert systems to
supervise calculations and design experiments which are largely
chemically based, although they have biological content as well. These
are arranged in a hierarchically increasing order of complexity
(i.e., each level needs the capabilities of the preceding one). The
simplest of these applications is to supervise complex equilibrium
calculations. The example described is of a type which often occurs
in studying biological systems where it is necessary to control
concentrations of reactive species. Such calculations are often not
properly handled.
Calculations Involving Magnesium Ions
considerable difficulty, and which could be handled by providing an
expert system with the necessary conversion algorithms and data.
Such a system would include a program similar to that of Storer and
Cornish-Bowden to do equilibrium calculations. A
communication-control subprogram would be linked to an expert model by
using the EXPERT knowledge-base shell (or system-builder), which is
advantageous here because it can interact with procedures such as those
written in FORTRAN for numerical computation. Additional programs and a
small data base, which EXPERT can handle, would keep track of which
chemical was what array element, and other requirements mentioned above.
The system could be used to answer questions such as:
How could I add to a solution combinations of ATP and magnesium
ion so their chelate is constant and free ATP varies systematically
so as to define families of curves with (different) constant magnesium
ion?
This type of capability could be extended to magnesium-controlled
enzymes without substantial expertise regarding their kinetics
by adding another limited expert system to manage simple calculations
involving modifications to their kinetics. This would require adding
a small data base of the binding and inhibition constants of
magnesium ion with important enzymes. We have assembled this information
for some of the enzymes we have worked with (6). This would permit
answering questions like:
How much magnesium ion can I add to solution X without
inhibiting enzyme Y by more than 10%?
Calculations Involving Enzyme Kinetics
Selection of a conceptual model. As the first step in modeling, it is
necessary to decide what kind of a conceptual model to try. For an
enzyme this includes a choice of mechanism and an indication of the
numerical values that go with it (determination of the best values
comes later). Probably this will be better done by an expert human
than by a program for some time. Examples of rules (domain knowledge)
for enzyme kinetics which are applicable (regardless of the methods
of calculation used) are:
1. Kinases usually have Km's for ATP considerably lower than
tissue levels of ATP.
2. Most other Km's approximate the usual tissue level of the
substrate involved.
3. Certain classes of enzymes tend to have characteristic
mechanisms. (Examples: transaminases often have ping-pong
mechanisms, kinases usually do not).
4. The commonly used linearized plots of kinetic data are a
usable initial guide to determining the mechanism.
A major part of the slow and expensive drug development process
consists of testing to determine that a given potential drug is both
safe and effective. The number of drug (or cosmetic) toxicity tests
performed annually in the United States is very large, involving
perhaps 15 million animals and considerably more dollars. The
expense of testing and qualification may be prohibitive for use in
metabolic rate of the animal, etc. To some extent this approach
("physiological pharmacokinetics") is a chemical engineer's
formulation of the pharmacokinetic problem. The rate at which a given
drug is delivered to a metabolizing or target organ by the plasma (with
its level of drug) is calculated along with the rates of metabolism or
detoxification by such organs, as well as the rate of removal of the
drug (or its metabolites) from the body. From this information the
total and free (after binding to proteins, etc.) organ content of the
drug and the level at the active site are calculated. This method is
based on the orderly change of many anatomical and physiological
properties with body weight. Anatomical dimensions increase nearly
linearly with weight, while physiological rates vary as the 0.7 to 0.8
power (14). Physiological processes are therefore slower in larger
animals; the cardiac output of a mouse per body weight is about an
order of magnitude higher than that of a man. This trend is coherent:
the disposition half-life of hexobarbital approximates 1,680 gut-
ability to handle design optimality problems like those mentioned
above are important in the simulation program, in addition to the
good data-base and explanation capabilities of EXPERT. Such an
expert system could then build multi-species pharmacokinetic models by
the method of Bischoff and Dedrick. After repeating their work as
the test case, this expert system could be used for the other drugs
whose kinetics have been sufficiently studied (including sampling in
several tissues) as required for such analysis. Subsequent extension
to include additional methodologies is possible (e.g. detailed
representation of enzyme kinetics). Model construction with only part of
the original data could then be repeated to determine the need for
completeness of (experimentally determined) information, i.e., which
and how many animal experiments are really necessary. Such
considerations are important in drug testing, and an expert system
would help both by doing the modeling faster than a human, and also
more systematically.
A well-established specialized expert system with which the
proposed expert system could be compared is the digitalis advisor of
Szolovitz and Long (19), which represents a well-understood clinical
situation. It performs clinical functions beyond the scope of this
proposed system, but it does do some things, like maintaining the
blood level of the drug involved and monitoring its toxicity, that
this proposed system is concerned with and should perform adequately.
Starting with appropriate knowledge of the behavior of a
prospective drug in one species, one could then extrapolate to other
species, ultimately including humans. This capability could be used
in testing a proposed drug to determine proper dosage and regimen,
under what conditions it (and possibly its metabolites) is toxic, and
how sensitive its behavior might be to perturbing conditions, which
presently have to be re-performed for each species involved by
empirically and heuristically guided experiments. It is reasonable to
hope for significantly improved efficiency in performing these
expensive operations.
Conclusion
We have described a set of applications of a conventional expert
system which extend the usual functions of such systems from primarily
logical reasoning and solution of classification problems to include
supervision of calculations and of modeling, i.e., systems
management. A hierarchy of applications arising from biochemical
research has been discussed. These follow biological systems in being
primarily chemical at the lowest level but acquire more biological
character at the higher levels. At the lowest level, these permit
the convenient performance of calculation which is not being done or
not being done properly. At the intermediate level, they provide a
better research tool, especially for experimental design. At the most
complex level, they would permit a complex, slow, and expensive process
to be carried out with less resource expenditure (calendar time,
money, and animal experiments).
Acknowledgments
Literature Cited
10. Weiss, S.; Kulikowski, C.; Apte, C.; Uschold, M.; Patchett, J.;
Briggham, R. M.; Spitzer, B. Proc. 2nd Annual Nat'l. Conf. on
Artificial Intelligence, Pittsburgh, PA, 1982, 322.
11. Cornish-Bowden, A. Biochem. J. 1977, 165, 55.
12. Endrenyi, L. "Kinetic Data Analysis"; Endrenyi, L., Ed.;
Plenum Press: New York, 1981; p. 137.
13. Bischoff, K. B. Cancer Chemotherapy Reports 1975, 59, Part 1,
p. 777.
14. Adolph, E. F. Science 1949, 109, 579.
15. Dedrick, R. L. J. Pharmacokinet. Biopharm. 1, 1978, 435.
16. Bischoff, K. B.; Dedrick, R. L.; Zaharko, D. S.; Longstreth, J.
A. J. Pharmaceutical Sciences 1971, 60, 1128.
17. Zaharko, D. S.; Dedrick, R. L.; Oliverio, V. T. Comp. Biochem.
Physiol. 1974, 42A, 183.
18. Bischoff, K. B. Fed. Proc. 1980, 39, 2456.
19. Szolovitz, P.; Long, W. J. "Artificial Intelligence in
Medicine"; Szolovitz, P., Ed.; AAAS Selected Symposium 51;
Westover Press: Boulder, Colo., 1982; p. 79.
What Is An Agricultural Formulation
0097-6156/86/0306-O087$06.00/0
© 1986 American Chemical Society
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch007
[Figure residue: tree of boxes labeled Emulsifiable Concentrate,
Determine Solvent, Determine Emulsifier, Emulsifier 1, and
Emulsifier 2.]
Figure 1. Structure of the Problem
[Figure residue: flowchart with the steps Load Relevant Rules and
Hypotheses; Collect Background User Information; Forward Chain; Sort
Hypotheses; Reverse Chain on Best Hypothesis; Collect Additional User
Information.]
Fact name
  Property 1
    Value
    Where to find it (Ask, prove, calculate)
    Prompt (How to ask user)
    Allowable response (Checks user's response)
    Explanation (For prove and calculate)
  Property 2
    Value
Conclusion Name
  Branch Point 1
    Value
    Where to find it (prove)
    Explanation (Tell the user if it is true)
    Next level name
    Background facts (Questions always asked)
    Rule names (List of relevant rules)
  Branch Point 2
    Value
AgRule_1
If-1 (Isequal Active_Ingredient Desired_Level Value >40)
Then-1 (Suggest Form_Type EC -.5)
Then-2 (Suggest Form_Type WSL -.5)
Then-3 (Suggest Form_Type Flowable -.5)
Agrule1
IF
1. The value of the active ingredient's desired concentration
is >40%
THEN
1. There is suggestive evidence (-0.5) that the
formulation type should not be emulsifiable concentrate
2. There is suggestive evidence (-0.5) that the
formulation type should not be water soluble liquid
3. There is suggestive evidence (-0.5) that the
formulation type should not be flowable concentrate
BECAUSE:
EC's, WSL's and Flowables rarely have that high an AI level
AgRule_13
If-1 (Isequal Solvent Req_EPA_Clear Value C)
Then-1 (Avoid NotEqual EC_Solvent EPA_Clear C -1)
Why It's the law
Date 12/20/83
Author Hohne
Agrule13
IF
1. The value of the solvent's required EPA clearance
is C
THEN
1. Avoid (-1) emulsifiable concentrate solvents where EPA clearance
is not equal to C
BECAUSE
It's the law
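The rule format above can be mimicked with plain data structures. The
Python sketch below encodes AgRule_1 and applies its SUGGEST actions; it
is an illustration, not the system's code. The dictionary layout, the
function names, the use of BIGGER for the ">40" test (the listing writes
Isequal), and the MYCIN-style way of merging confidence factors are all
assumptions, since the text does not say how a property's value is
adjusted.

```python
import operator

# Predicate names follow the predicate table; the implementations are assumed.
PREDICATES = {
    "BIGGER": operator.gt,
    "SMALLER": operator.lt,
    "ISEQUAL": operator.eq,
    "NOTEQUAL": operator.ne,
    "MEMB": lambda value, items: value in items,
    "NOTMEMB": lambda value, items: value not in items,
}


def combine(old: float, new: float) -> float:
    """Merge two confidence factors in [-1, 1] (MYCIN-style; an assumption)."""
    if old >= 0.0 and new >= 0.0:
        return old + new - old * new
    if old <= 0.0 and new <= 0.0:
        return old + new + old * new
    denominator = 1.0 - min(abs(old), abs(new))
    return 0.0 if denominator == 0.0 else (old + new) / denominator


AG_RULE_1 = {
    "if": [("BIGGER", ("Active_Ingredient", "Desired_Level"), 40)],
    "then": [
        ("SUGGEST", "Form_Type", "EC", -0.5),
        ("SUGGEST", "Form_Type", "WSL", -0.5),
        ("SUGGEST", "Form_Type", "Flowable", -0.5),
    ],
    "why": "EC's, WSL's and Flowables rarely have that high an AI level",
}


def apply_rule(rule: dict, facts: dict, scores: dict) -> bool:
    """Fire the rule if every condition holds; return True when it fired."""
    for predicate, key, target in rule["if"]:
        if key not in facts or not PREDICATES[predicate](facts[key], target):
            return False
    for action, prop, value, factor in rule["then"]:
        if action == "SUGGEST":
            slot = (prop, value)
            scores[slot] = combine(scores.get(slot, 0.0), factor)
    return True


facts = {("Active_Ingredient", "Desired_Level"): 50}
scores = {}
fired = apply_rule(AG_RULE_1, facts, scores)
print(fired, scores)
```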
Predicate    Meaning
BIGGER       Bigger than
SMALLER      Smaller than
MEMB         Member of the list
NOTMEMB      Not a member of the list
ISEQUAL      Is equal to
NOTEQUAL     Not equal to
Table V. Actions
Action       Meaning
SUGGEST      Adjust the property's value using the listed
             confidence factor
SET_EQUAL    Set the property's value equal to the listed value
Literature Cited
Richard Pavelle
interest (5).
There are other routines for calculations in number theory,
combinatorics, continued fractions, set theory and complex
Acoustics                   Fluid Dynamics
Algebraic Geometry          General Relativity
Antenna Theory              Number Theory
Celestial Mechanics         Numerical Analysis
Computer-Aided Design       Particle Physics
Control Theory              Plasma Physics
Deformation Analysis        Solid-State Physics
Econometrics                Structural Mechanics
Experimental Mathematics    Thermodynamics
Examples of MACSYMA
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch008
(C1) X^3+B*X^2+A^2*X^2-9*A*X^2+A^2*B*X-2*A*B*X-
9*A^3*X+14*A^2*X-2*A^3*B+14*A^4=0;
(D1) X^3 + B X^2 + A^2 X^2 - 9 A X^2 + A^2 B X - 2 A B X - 9 A^3 X
+ 14 A^2 X - 2 A^3 B + 14 A^4 = 0
(C2) SOLVE(D1,X);
(D2) [X = 7 A - B, X = - A^2, X = 2 A]
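The three roots can be checked independently of any algebra system by
substituting them back into the cubic with exact rational arithmetic.
The Python sketch below does this for one arbitrary choice of A and B;
the function names and the test values are illustrative assumptions.

```python
from fractions import Fraction


def cubic(x, a, b):
    """Left-hand side of the cubic entered at (C1)."""
    return (x**3 + b * x**2 + a**2 * x**2 - 9 * a * x**2
            + a**2 * b * x - 2 * a * b * x
            - 9 * a**3 * x + 14 * a**2 * x
            - 2 * a**3 * b + 14 * a**4)


def reported_roots(a, b):
    """The three solutions listed at (D2)."""
    return [7 * a - b, -a**2, 2 * a]


# Arbitrary exact test values for the parameters.
a, b = Fraction(3, 2), Fraction(-5, 7)
residuals = [cubic(x, a, b) for x in reported_roots(a, b)]
print(residuals)
```

A residual of exactly zero for each root confirms the solutions at the
chosen parameter values; it is a spot check, not a symbolic proof.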
(D1) X^(X^X)
(C2) DIFF(D1,X);
Time= 30 msec.
(D2) X^(X^X) (X^X LOG(X) (LOG(X) + 1) + X^(X - 1))
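A symbolic derivative like this one can be spot-checked numerically by
comparing it with a central finite difference. The Python sketch below
does so at a single point; the function names, the evaluation point, and
the step size are illustrative choices.

```python
import math


def tower(x: float) -> float:
    """The function X^(X^X) from (D1)."""
    return x ** (x ** x)


def tower_derivative(x: float) -> float:
    """The derivative reported at (D2)."""
    log_x = math.log(x)
    return tower(x) * (x**x * log_x * (log_x + 1.0) + x ** (x - 1.0))


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Second-order finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x0 = 1.5
numeric = central_difference(tower, x0)
symbolic = tower_derivative(x0)
print(numeric, symbolic)
```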
satisfies a trigonometric identity, namely TAN(ACOS(X)) =
SQRT(1-X^2)/X. It takes this into account before displaying (D1).
(C1) ERF(TAN(ACOS(LOG(X))));
(D1) ERF(SQRT(1 - LOG(X)^2)/LOG(X))
(C2) DIFF(D1,X),RATSIMP;
Time= 1585 msec.
(D2) - 2 %E^(1 - 1/LOG(X)^2)/(SQRT(%PI) X LOG(X)^2 SQRT(1 - LOG(X)^2))
Factorization
(D1) [The expanded polynomial in W, X, Y, and Z given to FACTOR; its
two-dimensional exponent layout did not survive extraction. It is the
expansion of the product shown at (D2).]
(C2) FACTOR(D1);
Time= 111998 msec.
(D2) - X^6 Y^3 Z^2 (3 Z^3 + 2 W Z - 8 X Y^2 + 14 W^2 Y^2 - Y^2 + 18 X^3 Y)
(12 W^2 X Y Z^3 - W^2 Z^3 - 3 X Y^2 - 29 X + W^2)
Simplification. A very important feature of MACSYMA is its ability
to simplify expressions. When I studied plane-wave metrics for a
new gravitation theory (11, 12), one particular calculation produced
an expression with several hundred thousand terms. From geometrical
arguments I knew the expression must simplify and indeed, using
MACSYMA, the expression collapsed to a small number of pages of
output. The following expression occurred repeatedly in the course
of the calculation and caused the collapse of the larger expression
during simplification.
(D1) (SQRT(R^2 + A^2) + A) (SQRT(R^2 + B^2) + B)/R^2
- (SQRT(R^2 + B^2) + SQRT(R^2 + A^2) + B + A)
/(SQRT(R^2 + B^2) + SQRT(R^2 + A^2) - B - A)
(C2) RATSIMP(D1);
Time= 138 msec.
(D2) 0
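That this expression vanishes can be spot-checked numerically. The
Python sketch below evaluates it at a few arbitrary positive values of
R, A, and B; the function name and the sample points are illustrative
choices, and a numerical check of this kind supports the identity
without proving it.

```python
import math


def collapsing_expression(r: float, a: float, b: float) -> float:
    """Evaluate the expression (D1): product over R^2 minus the ratio of sums."""
    root_a = math.sqrt(r * r + a * a)
    root_b = math.sqrt(r * r + b * b)
    first = (root_a + a) * (root_b + b) / (r * r)
    second = (root_b + root_a + b + a) / (root_b + root_a - b - a)
    return first - second


samples = [(2.0, 0.5, 3.0), (1.0, 1.0, 1.0), (7.5, 0.1, 4.2)]
values = [collapsing_expression(r, a, b) for r, a, b in samples]
print(values)
```

The identity follows from R^2 = (SQRT(R^2 + A^2) - A)(SQRT(R^2 + A^2) + A)
and the corresponding relation for B.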
(D1) INTEGRATE((LOG(X) - 1)/(LOG(X)^2 - X^2), X)
(C2) INTEGRATE(D1,X);
Time= 744 msec.
(D2) LOG(LOG(X) + X)/2 - LOG(LOG(X) - X)/2
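An antiderivative can be spot-checked by differentiating it numerically
and comparing the result with the integrand. In the Python sketch below
the logarithms are taken of absolute values, because LOG(X) - X is
negative for every positive real X and the real logarithm would
otherwise be undefined; that change, like the function names and the
evaluation point, is an assumption made for the check and alters the
antiderivative only by a constant.

```python
import math


def integrand(x: float) -> float:
    """(LOG(X) - 1)/(LOG(X)^2 - X^2)."""
    log_x = math.log(x)
    return (log_x - 1.0) / (log_x * log_x - x * x)


def antiderivative(x: float) -> float:
    """The result (D2), using absolute values inside the logarithms."""
    log_x = math.log(x)
    return 0.5 * math.log(abs(log_x + x)) - 0.5 * math.log(abs(log_x - x))


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Second-order finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x0 = 2.0
slope = central_difference(antiderivative, x0)
print(slope, integrand(x0))
```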
Definite Integration. Definite integration is far more difficult to
code than indefinite integration because the number of known
techniques is much larger. One has the added complication of taking
limits at the endpoints of the integral. MACSYMA has impressive
capabilities for definite integration. Here is an example of a
(D1) X^2 %E^(- U X^2) LOG(X)
(C2) INTEGRATE(D1,X,0,INF),FACTOR;
Time= 138442 msec.
(D2) [numerator not recoverable from the extraction]/(8 U^(3/2))
MACSYMA syntax for the Euler-Mascheroni constant 0.577215664...
(D1) A SIN(X^3) + B LOG(X^2 - X + 1)/X^5
(C2) TAYLOR(D1,X,0,15);
Time= 365 msec.
(D2)/T/ - B/X^4 + B/(2 X^3) + 2 B/(3 X^2) + B/(4 X) - B/5 - B X/3
- B X^2/7 + (B + 8 A) X^3/8 + 2 B X^4/9 + B X^5/10 - B X^6/11
- B X^7/6 - B X^8/13 + (3 B - 7 A) X^9/42 + 2 B X^10/15
+ B X^11/16 - B X^12/17 - B X^13/9 - B X^14/19
+ (6 B + A) X^15/120 + . . .
Ordinary Differential Equations. Another powerful feature is the
MACSYMA program ODE. ODE is a collection of algorithms for solving
ordinary differential equations. It was built over several years by
E. L. Lafferty, J. P. Golden, R. A. Bogen and B. Kuipers, and its
capabilities are described in the MACSYMA Reference Manual (6) in
V2-4-14.
In (C1), we first declare that Y is a function of X. This
assures that the derivative (2nd) of Y with respect to X will not
vanish when (C2) is evaluated.
(C1) DEPENDS(Y,X)$
(C2) (1+X^2)*DIFF(Y,X,2)-2*Y=0;
(D2) (X^2 + 1) Y_XX - 2 Y = 0
(C3) ODE(D2,Y,X);
Time= 2068 msec.
(D3) Y = %K2 (X^2 + 1) (ATAN(X)/2 + X/(2 X^2 + 2)) + %K1 (X^2 + 1)
(C4) ODE(D2,Y,X,SERIES);
Time= 8766 msec.
(D4) Y = %K1 (X^2 + 1)
- %K2 X SUM((- 1)^I X^(2 I)/((I - 1/2) (I + 1/2)), I, 0, INF)
(C5) D2,D3,DIFF,RATSIMP;
Time= 2051 msec.
(D5) 0=0
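The check performed at (C5) can be imitated numerically: substitute each
closed-form solution into the differential equation and confirm that
the residual is close to zero. The Python sketch below estimates the
second derivative with a central difference; the function names, sample
points, step size, and tolerance are illustrative choices.

```python
import math


def first_solution(x: float) -> float:
    """The %K1 part of (D3): X^2 + 1."""
    return x * x + 1.0


def second_solution(x: float) -> float:
    """The %K2 part of (D3): (X^2 + 1) (ATAN(X)/2 + X/(2 X^2 + 2))."""
    return (x * x + 1.0) * (math.atan(x) / 2.0 + x / (2.0 * x * x + 2.0))


def second_derivative(func, x: float, h: float = 1e-3) -> float:
    """Central-difference estimate of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


def residual(func, x: float) -> float:
    """Left-hand side of (1 + X^2) Y'' - 2 Y = 0 for the given Y."""
    return (1.0 + x * x) * second_derivative(func, x) - 2.0 * func(x)


points = [-1.2, 0.3, 2.0]
residuals = [residual(f, x) for f in (first_solution, second_solution) for x in points]
print(residuals)
```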
Literature Cited
Allan L. Smith
Most of the applications of artificial intelligence in chemistry so far have not involved
numerical computation as a primary goal. Yet there are aspects of the AI approach to
problem-solving which have relevance to computation. In scientific computation, one
could view the knowledge base as the set of equations, input variable values, and unit
conversions relevant to the problem, and the inference engine the numerical method
used to solve the equations. This paper describes such a software system,
TK!Solver.
Since the beginning of electronic computing, one of the major incentives for
developing computer languages has been to improve the ease of solving mathematical
problems arising in science and engineering. Many such problems can be reduced to
the solution of a set of Ν algebraic equations - not necessarily linear - in Ν
unknowns. The earliest ways of doing this involved direct hand coding in
hexadecimal machine language or in assembly language mnemonics, specifying in
excruciating detail the procedures needed to transform input data into results. My
first experience with computers (1) was on a Bendix laboratory computer, generating
three-component polymer-copolymer phase diagrams in assembly language. After a
summer of this I became quickly convinced that there must be a better way.
In the early 1960's the first compiled procedural programming language for
scientific computation, FORTRAN, became widely used in the US, with a parallel
development of the use of ALGOL in Europe. Later in the decade, the interpretive
procedural language BASIC emerged, followed by the powerful algebraic notational
language APL. The first structured, procedural language developed to teach the
concepts of programming, Pascal, appeared in 1971, followed later in the decade by
the C language.
In all of these procedural languages (also called imperative languages (2)), one of
the basic elements of syntax is the assignment statement, in which an algebraic
TK!SQlygr
TK!Solver (8) is a high-level computer language for solving sets of algebraic
equations and tabulating or plotting their results. In TK!Solver, equations are viewed
as relationships or rules, not as assignment statements, and in that sense it may be
viewed as a declarative language. The basic computational approach taken by
TK!Solver grew out of the research of textile engineer Milos Konopasek in the
1970's. It was realized early on by Konopasek and Papaconstadopoulos (2) that a
high-level computational language need not be procedural but could be declarative;
this point has been recently amplified by Konopasek and Jayaraman (10), who also
make the case for TK!Solver's being an expert system for equation solving.
To produce TK!Solver, the problem-solving methodology implemented by
Konopasek in his Question Answering System (2) was combined with the experience
in designing full-screen user interfaces of Software Arts, Inc. (the originators of the
electronic spreadsheet). The goal of the language was to obviate three of the
time-consuming stages of procedural program development (11): (1) algebraic
transformations necessary for formulating assignment statements; (2) sequencing
assignment statements to secure the desired flow of information through the program;
and (3) setting up input and output statements. The capabilities of TK!Solver, which
runs on a number of different personal computers, are as follows (10,11):
(1) It parses entered algebraic equations and generates a list of variables.
(2) It solves sets of equations using a consecutive substitution procedure (the
direct solver).
(3) It solves sets of simultaneous (non-linear) algebraic equations by a modified
Newton-Raphson iterative procedure when consecutive substitution fails
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch009
Take, for example (12), the problem of solving for the P-V-T properties of a real gas
obeying the van der Waals equation of state,
$$P = \frac{nRT}{V - nb} - \frac{n^{2}a}{V^{2}} \qquad (1)$$
where a and b are coefficients characteristic of a given gas. Solving for P, given n,
V, and T is a simple assignment statement, but solving for n given P, V, and T
requires considerable algebraic manipulation, followed either by applying the formula
for the roots of a cubic equation or by using a numerical technique for determining
roots (the latter usually requires more mathematical analysis - for example, finding
first derivatives using the Newton-Raphson method).
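To make the procedural burden concrete, here is a minimal Python sketch of the numerical route just described: Newton-Raphson applied to Equation 1 to recover n from P, V, and T. It is not TK!Solver code. The function names, the ideal-gas starting guess, and the tolerances are illustrative assumptions; the derivative with respect to n had to be worked out by hand, which is exactly the extra analysis the text refers to.

```python
R = 0.0820568  # gas constant in l*atm/(mol*K), the value used in the text


def vdw_pressure(n, V, T, a, b):
    """Equation 1: van der Waals pressure (atm) for n mol in V liters at T kelvins."""
    return n * R * T / (V - n * b) - a * n**2 / V**2


def solve_moles(P, V, T, a, b, tol=1e-12, max_iter=50):
    """Newton-Raphson for n such that vdw_pressure(n, ...) equals P.

    Assumption: the ideal-gas value P*V/(R*T) is an adequate starting guess.
    """
    n = P * V / (R * T)
    for _ in range(max_iter):
        f = vdw_pressure(n, V, T, a, b) - P
        # Hand-derived derivative of Equation 1 with respect to n.
        dfdn = R * T * V / (V - n * b) ** 2 - 2.0 * a * n / V**2
        step = f / dfdn
        n -= step
        if abs(step) < tol * max(1.0, abs(n)):
            return n
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Round trip: compute P for a known n, then recover n from P.
    a, b = 4.39, 0.05136
    P = vdw_pressure(2.0, 10.0, 300.0, a, b)
    print(solve_moles(P, 10.0, 300.0, a, b))
```

A different unknown (V or T, say) would need a different residual and a different hand-derived derivative, which is the repetition a declarative solver is meant to remove.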
Figure 1 shows the Rule Sheet for a TK!Solver model REALGAS.TK (12).
The first rule is the van der Waals equation of state. The second defines the gas
constant, and the third rule defines the number density. The fourth defines the
compressibility factor z, a dimensionless variable which measures the amount of
departure of a real gas from ideality. The next three rules give the critical pressure,
molar volume, and temperature of a van der Waals gas in terms of the coefficients a and
b. The van der Waals equation can be recast in a form which uses only reduced,
dimensionless variables; these are defined in the next three rules. The last two rules
provide values for the van der Waals coefficients a and b when the name of the gas
is given (user-defined functions with symbolic domain elements and numerical range
elements can be used in any model which requires reference to built-in data tables).
S Rule
  "Equation of State of a van der Waals Gas. Chap. 4. Model name: REALGAS.TK
* R = 0.0820568          "Value of gas constant
* nd = n/V               "Number density
* z = P*V/(n*R*T)        "Compressibility factor
* Pc = a/(27*b^2)        "Critical Pressure
* Vc = 3*b               "Critical Molar Volume
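The critical-constant rules can be checked with a few lines of Python. This is a sketch, not part of REALGAS.TK. The rule for the critical temperature is not legible in the extracted Rule Sheet, so the standard van der Waals result Tc = 8a/(27Rb) is assumed here; the function names are hypothetical.

```python
R = 0.0820568  # gas constant in l*atm/(mol*K), as in the Rule Sheet


def critical_constants(a, b):
    """Critical pressure, molar volume, and temperature of a van der Waals gas."""
    Pc = a / (27.0 * b**2)       # rule legible in the Rule Sheet
    Vc = 3.0 * b                 # rule legible in the Rule Sheet
    Tc = 8.0 * a / (27.0 * R * b)  # assumed: standard van der Waals result
    return Pc, Vc, Tc


def reduced_variables(P, Vm, T, a, b):
    """Reduced (dimensionless) pressure, molar volume, and temperature."""
    Pc, Vc, Tc = critical_constants(a, b)
    return P / Pc, Vm / Vc, T / Tc


if __name__ == "__main__":
    # Coefficients a and b as they appear for acetylene on the Variable Sheet.
    print(critical_constants(4.39, 0.05136))
```

With the a and b values printed on the Variable Sheet, the first two results can be compared directly against the Pc and Vc entries shown there.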
A typical use for this model would be to solve for the number of moles of a gas,
given its identity, pressure, volume, and temperature. The iterative solver is used for
this purpose. You must decide which variable to choose for iteration and what a
reasonable initial guess is. Real gases approach ideal behavior at low pressure and
moderate temperatures. Since the compressibility factor z is 1 for an ideal gas, and
since knowing z along with P, V, and T allows a calculation of n, we choose z as
the iteration variable and 1.0 as the initial guess.
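The choice of z as the iteration variable can be imitated outside TK!Solver. The Python sketch below is an illustration only: it uses a secant iteration as a stand-in for the modified Newton-Raphson procedure (whose details are not given here), starts from the guess z = 1.0, and includes hypothetical helper functions for the psi, cubic foot, and Fahrenheit conversions. The conversion factors are standard values, not taken from the model's Units Sheet.

```python
R = 0.0820568  # gas constant in l*atm/(mol*K)


def psi_to_atm(p):
    return p / 14.6959


def cubic_feet_to_liters(v):
    return v * 28.316846592


def fahrenheit_to_kelvin(t):
    return (t - 32.0) * 5.0 / 9.0 + 273.15


def solve_z(P, V, T, a, b, z0=1.0, tol=1e-10, max_iter=100):
    """Iterate on the compressibility factor z, starting from the guess z0.

    For a trial z, n follows from z = P*V/(n*R*T); the residual is the
    mismatch between the van der Waals pressure for that n and the given P.
    """
    def residual(z):
        n = P * V / (z * R * T)
        return n * R * T / (V - n * b) - a * n**2 / V**2 - P

    z_prev, z = z0, 0.99 * z0
    f_prev, f = residual(z_prev), residual(z)
    for _ in range(max_iter):
        if f == f_prev:          # flat secant: cannot improve further
            return z
        z_next = z - f * (z - z_prev) / (f - f_prev)
        z_prev, f_prev = z, f
        z, f = z_next, residual(z_next)
        if abs(z - z_prev) < tol:
            return z
    raise RuntimeError("secant iteration did not converge")


if __name__ == "__main__":
    P = psi_to_atm(300.0)
    V = cubic_feet_to_liters(100.0)
    T = fahrenheit_to_kelvin(66.0)
    print(solve_z(P, V, T, a=4.39, b=0.05136))
```

Starting from z = 1.0 amounts to starting from the ideal gas, which is why it is a reasonable guess whenever the gas is not too far from ideal behavior.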
The Variable Sheet with the solution to such a problem is shown in Figure 2.
Unit conversions from psi to atmospheres, from cubic feet to liters, and from
Fahrenheit to Kelvins have been built into the model via the Units Sheet. For input
values of 100 cubic feet of acetylene at 300 psi and 66°F, there are 728.9 moles of
acetylene and the value of z of 0.874 indicates that the deviation from ideality is
12.6%.
a    4.39        atm*l^2/m   van der Waals a coefficient
b    .05136      l/mol       van der Waals b coefficient
Pc   61.638310   atm         critical pressure
Vc   .15408      l           critical molar volume
There are five laws of chemical equilibrium relevant to the Charlson-Vong model:
(1) the ideal gas law, relating gas species density to its temperature and partial
pressure; (2) Henry's law, relating the partial pressure to the concentration of
dissolved gas; (3) the law of mass action, giving equilibrium constant expressions for
the hydrolysis reactions of the dissolved gases; (4) conservation of mass for species
containing sulfur(IV), sulfur(VI), carbon(IV), nitrogen(V), and nitrogen(-III); and
(5) conservation of charge. Applying these laws, Vong and Charlson were able to
calculate the pH of a raindrop by solving a set of 17 equations in 29 variables (cloud
water content, temperature, partial pressures, and species concentrations) and 9
parameters (Henry's law constants, equilibrium constants, and the gas constant).
They wrote a FORTRAN program which solved all equations but one, that of charge
conservation. The pH at electrical neutrality was determined by a graphical method, in
which the total positive and negative charge concentrations were calculated and
plotted for a series of assumed pH's and the crossing point found.
A TK!Solver model called RAINDROP.TK has been developed to incorporate
the full Charlson-Vong model of cloud water equilibrium (12), including the
temperature dependence of all equilibrium constants. The iterative solver makes it
possible to compute the pH at charge neutrality without having to make plots of
intermediate results. The Rule Sheet is shown in Figure 3.
The Unit Sheet contains a number of conversions necessary to accommodate the
variety of units used in experimental atmospheric chemistry. The Variable Sheet is
arranged so that the variables at the top are the ones normally chosen as input
variables. Since the usual goal of running the model is to determine the pH of the
raindrop, the variable pH is chosen as the one on which to iterate.
The following problem, taken to match the conditions in Figure 2 of reference
13, is typical of those solved in less than one minute on an IBM PC with this model:
"A cloud at 278 K contains 0.5 grams of liquid water per cubic meter of air. The
atmosphere of the cloud contains 5 ppb sulfur dioxide, 340 ppm carbon dioxide, 0.29
μg/m³ of nitrogen base, 3 μg/m³ of sulfate aerosol, and no nitrate aerosol. What is
the pH of the cloud water?" Figure 4 shows the Variable Sheet after solution.
Rule
K1S = CHSO3m * CHp / CSO2        "Mass action law for SO2 - HSO3m
K2S = CSO32m * CHp / CHSO3m      "Mass action law for HSO3m - SO32m
KB = CNH4p * COHm / CNH3         "Mass action law for NH3 - NH4p
K1C = CHCO3m * CHp / CCO2        "Mass action law for CO2 - HCO3m
K2C = CCO32m * CHp / CHCO3m      "Mass action law for HCO3m - CO32m
CNH4p + CHp = CHSO3m + 2*CSO32m + COHm + CHCO3m + 2*CCO32m + CNO3m + 2*CSO42m   "Charge balance
pH = -log(CHp)                   "Definition of pH
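The idea of iterating on pH until the charge balance closes can be shown on a deliberately reduced system. The Python sketch below is not the RAINDROP.TK model: it keeps only carbon dioxide and water, drops the sulfur and nitrogen species, and ignores temperature dependence. The equilibrium constants are assumed round values near 25 °C chosen for illustration, not the constants of the Charlson-Vong model, and bisection stands in for the iterative solver.

```python
def charge_imbalance(pH, p_co2_atm, KH=3.4e-2, K1=4.45e-7, K2=4.69e-11, Kw=1.0e-14):
    """Positive minus negative charge concentration (mol/l) at a trial pH.

    Assumed constants (illustrative, roughly 25 C): KH is the Henry's law
    constant for CO2 in mol/(l*atm); K1 and K2 are the carbonic acid
    dissociation constants; Kw is the ion product of water.
    """
    CHp = 10.0 ** (-pH)            # definition of pH
    CCO2 = KH * p_co2_atm          # Henry's law
    CHCO3m = K1 * CCO2 / CHp       # mass action law for CO2 - HCO3m
    CCO32m = K2 * CHCO3m / CHp     # mass action law for HCO3m - CO32m
    COHm = Kw / CHp                # water dissociation
    return CHp - (CHCO3m + 2.0 * CCO32m + COHm)


def solve_pH(p_co2_atm, lo=0.0, hi=14.0, tol=1e-10):
    """Bisect on pH until the charge balance is satisfied.

    The imbalance is positive at low pH and negative at high pH, and it
    decreases monotonically in between, so bisection is safe.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if charge_imbalance(mid, p_co2_atm) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(solve_pH(340e-6))  # 340 ppm CO2 at a total pressure of 1 atm
```

This replaces the graphical search for the crossing point of the positive and negative charge curves with a direct root search on the same quantity.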
In addition to the two examples above, I have developed TK!Solver models for the
ideal gas, for two-component mixture concentrations, for acid-base chemistry
(including the generation of titration curves), for transition metal complex equilibria,
for general gaseous and solution equilibria, and for linear regression (12).
Drexel undergraduate students in both the lecture and the laboratory of physical
chemistry have been using TK!Solver for such calculations as least squares fitting of
experimental data, van der Waals gas calculations, and quantum mechanical
computations (plotting particle-in-a-box wavefunctions, atomic orbital electron
densities, etc.). I use TK!Solver in lectures (on a Macintosh with video output to a
25" monitor) to solve simple equations and plot functions of chemical interest.
TK!Solver has also had heavy use in the material balance course in chemical
engineering, and in a mathematical methods course in materials engineering. Graduate
students in chemistry are using it in research projects in spectroscopy and kinetics.
[Figure 4 fragment, Variable Sheet: T = 278 K, temperature; L = .5 g/m^3, liquid water content of the cloud]
In the teaching of quantum mechanics, TK!Solver has proved especially useful.
For example, Berry, Rice, and Ross (14) give several problems on the regions of
bonding and anti-bonding in diatomics, one of which requires the calculation and
plotting of contours of constant bonding force. They suggest calculation of the
bonding force on a large grid of points and then connecting points of constant force,
but with TK!Solver it is possible to solve directly the set of parametric equations in r
and theta and to plot the resulting contours.
In summary, the rule-based, declarative approach to solving sets of algebraic
equations presented by TK!Solver has proved to be a fruitful medium for chemical
computations.
Literature Cited
Computer simulation of chemical reaction or reaction-transport
systems has long been used in chemical engineering process
design, and has more recently moved into the chemical research
1 Current address: Department of Chemistry, Florida State University, Tallahassee,
FL 32306
0097-6156/86/0306-0119$06.00/0
© 1986 American Chemical Society
preceding the latter by a + or - sign. Charge indication can be
by appropriate repetition of the sign, or by a single sign
followed by numerical indication. Parenthesized expressions are
accommodated and expanded in the usual way, and nesting to
several levels is allowed.
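The two charge notations and the expansion of nested parentheses can be sketched in Python as follows. This is not the interpreter described in this chapter, whose exact input grammar is not reproduced here; the function names and the accepted syntax (a trailing run of signs such as `Fe+++`, or a single sign followed by a number such as `Fe+3`) are assumptions made for illustration.

```python
import re
from collections import Counter

_ELEMENT = re.compile(r"[A-Z][a-z]?")
_CHARGE = re.compile(r"(\++|-+)(\d*)$")


def _read_count(text, i):
    """Read an optional integer multiplier starting at index i."""
    j = i
    while j < len(text) and text[j].isdigit():
        j += 1
    count = int(text[i:j]) if j > i else 1
    return j, count


def parse_formula(formula):
    """Expand a formula with nested parentheses into element counts."""
    stack = [Counter()]
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch == "(":
            stack.append(Counter())
            i += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            i, mult = _read_count(formula, i + 1)
            for element, count in group.items():
                stack[-1][element] += count * mult
        else:
            m = _ELEMENT.match(formula, i)
            if m is None:
                raise ValueError(f"unexpected character {ch!r}")
            i, mult = _read_count(formula, m.end())
            stack[-1][m.group()] += mult
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return dict(stack[0])


def parse_species(text):
    """Split off a trailing charge, then parse the remaining formula."""
    charge = 0
    m = _CHARGE.search(text)
    if m is not None:
        signs, digits = m.groups()
        if digits and len(signs) > 1:
            raise ValueError("use repeated signs or a number, not both")
        magnitude = int(digits) if digits else len(signs)
        charge = magnitude if signs[0] == "+" else -magnitude
        text = text[: m.start()]
    return parse_formula(text), charge


if __name__ == "__main__":
    for species in ("Al2(SO4)3", "SO4--", "Fe+3", "Mg(Al(OH)4)2"):
        print(species, parse_species(species))
```

A stack of partial counts handles nesting to any depth: each closing parenthesis multiplies the innermost group and folds it into the level below.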
Where it is felt that clarity (for the chemist) is better
served by using compound names rather than formulas, text input
is accepted by surrounding it with quotation marks. This text is
not subject to lexical analysis; subsidiary tasks such as syntax
checking cannot be performed in this case. Quoted text can also
be attached to a compound expressed by formula; the formula is
interpreted and the text passed through unchanged.
Syntax Analysis
As each reaction equation is entered, several checks are
performed to catch errors in formulation or typing: the correct
number of tabs, equals, balanced quotes or parentheses, and
conformance to the syntax rules which allow the equation to be
separated into reactants and products, and these in turn to be
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch010
Program Output
The principal task of the interpreter is to provide two
subroutines for use by the simulation program, which is a solver
for ordinary or partial differential equations. Since chemical
systems are for the most part "stiff" as a result of negative
feedback (72) the interpreter expects the simulation package to
Literature Cited
0097-6156/86/0306-0125$06.00/0
© 1986 American Chemical Society
however, consider a category of software that could be labeled
Computer-Assisted Learning (CAL). In such software, the student
makes decisions about what he or she will investigate while using the
software. Simulations fall into this category. Professor John
Gelder's ideal gas law program (2) is a classic example of using
simulations in chemical education. In using that program the student
has control over the parameters, and by exploring the model could
potentially learn aspects of the behavior of ideal gases unknown to
the author of the program. Other simulations may also fall into the
category of computer-assisted learning. Apart from simulations,
examples of software with which the student is in control are
difficult to find.
This paper describes an example of a different style of program
which is under the control of the student. The project began in the
fall of 1983 when Dick Cornelius spent part of a sabbatical at the
Université de Nice working with Daniel Cabrol and Claude Cachet. The
first task there was to write a chapter on microcomputers in chemical
education for a book on computers in chemistry. During the course of
writing this chapter we described programs available in the different
software styles: page turners, drill and practice, tutorial dialogs,
simulation, pre-laboratory activities, and problem-solving. In the
area of problem-solving, however, there was little that we could
discuss. Some software could be used for problem-solving, but there
were no examples of programs written for the primary purpose of
helping students learn general problem-solving techniques. It was to
this area, then, that we turned our programming attention. The
result was a program that we called GEORGE (3) that runs on the
Apple II series of computers. GEORGE differs very much from most
programs available for chemical education: GEORGE asks no questions
of students. Instead, students take problems to GEORGE. GEORGE
solves the problems that students provide and, most importantly,
explains the solutions using both text and diagrams. If insufficient
or contradictory information is available, GEORGE can provide
diagnostic comments to help the student.
The domain in which GEORGE operates is a small but important one
for introductory chemistry. He works with problems involving the
fundamental quantities mass, volume, and number of moles. He can
also work with derived quantities such as density, molar mass, molar
The Logic
simply contains a set of heuristic rules which he follows to search
for a solution. One result of using these heuristic rules is that he
can solve problems never worked by the authors of the program.
Another result is that GEORGE may be able to make progress toward a
solution even if incomplete information is available. In such an
instance, GEORGE may be able to respond with a statement such as "If
you could give me the density of alcohol, then I could solve the
problem." The rules are very simple in concept. First GEORGE
examines the various pieces of data available. He examines all
possible pairs of data to see whether any pair can be multiplied or
divided to give immediately the solution. If he cannot find a
solution in that way, he checks to see whether he can apply a
relation to generate a new piece of data. If GEORGE cannot apply a
relation, he searches for intermediate results that might represent a
step toward the solution. GEORGE can search for two types of
intermediates. The preferred type is the result of units cancelling
to yield a fundamental quantity. Thus dividing the mass of a
substance by its molar mass is a preferred method to form an
intermediate result. Less desirable is the formation of an
intermediate result which is not a fundamental quantity but which
represents information expressed in a manner not represented by other
data or intermediates. Each time GEORGE calculates a new quantity,
he begins again to look for an immediate solution. These are all the
rules that GEORGE needs to find solutions to millions of different
problem statements. The result is usually a solution approached in
the same manner that a teacher might use for an explanation.
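The pair-search heuristic described above can be sketched in Python. This is a simplified illustration and not GEORGE's code: each quantity carries unit exponents for (mass, volume, moles), only the preferred kind of intermediate (a fundamental quantity) is generated, the "apply a relation" step is omitted, and substances are not distinguished. All names and the sample numbers are hypothetical.

```python
from itertools import combinations

# Unit exponents are (mass, volume, moles); these are the fundamental quantities.
FUNDAMENTAL = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}


def combine(x, y):
    """Yield the product and both quotients of two (label, value, dims) items."""
    (lx, vx, dx), (ly, vy, dy) = x, y
    yield (f"({lx} * {ly})", vx * vy, tuple(p + q for p, q in zip(dx, dy)))
    yield (f"({lx} / {ly})", vx / vy, tuple(p - q for p, q in zip(dx, dy)))
    yield (f"({ly} / {lx})", vy / vx, tuple(q - p for p, q in zip(dx, dy)))


def solve(data, target_dims, max_rounds=5):
    """Search pairs for the target; otherwise add one fundamental intermediate."""
    known = list(data)
    for _ in range(max_rounds):
        candidates = [c for x, y in combinations(known, 2) for c in combine(x, y)]
        # First look for an immediate solution among all pairs.
        for candidate in candidates:
            if candidate[2] == target_dims:
                return candidate
        # Otherwise keep a new fundamental quantity and start over.
        new = [c for c in candidates
               if c[2] in FUNDAMENTAL and all(c[2] != k[2] for k in known)]
        if not new:
            return None  # no progress possible with the data given
        known.append(new[0])
    return None


if __name__ == "__main__":
    data = [
        ("mass", 10.0, (1, 0, 0)),         # g
        ("molar mass", 58.44, (1, 0, -1)), # g/mol
        ("volume", 0.5, (0, 1, 0)),        # L
    ]
    print(solve(data, target_dims=(0, -1, 1)))  # molarity, mol/L
```

Returning to the pair search after every new intermediate mirrors the rule that GEORGE "begins again to look for an immediate solution" each time a quantity is calculated.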
The Program
Instructions (page 1)
George understands 11 different quantities. Each of these quantities
has a symbol and a name:

Symbol   Name
m        mass
n        no. of moles
P        no. of particles
v        volume
d        density
c        molarity
mc       mass conc.
M        molar mass
mr       mass ratio
nr       molar ratio
vr       volume ratio
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch011
[Figure residue: GEORGE screens showing the "Available Options" menu (entries legible: "Calculate Molar Mass", "Network"); a unit conversion of .0034 g to 3.40 mg; and a data list: A. Volume of C6H5NH2, 3.00 mL; B. Volume of solution, 0.100 L; D. Molar Mass of C6H5NH2.]
[Screen text: "Network. Here is a diagram of how I used the various pieces of information to reach a solution. For details type the relevant letter or number. ESC displays Menu."]
Figure 6. Diagram showing how pieces of data are used together to
find the solution to the problem involving the molarity of
aniline. (Reproduced with permission from Ref. 3. Copyright 1985
COMPress.)
[Figure residue: "Relation" screen with columns "Coef." and "Quantity", showing: no. of moles of HCl = no. of moles of NaOH]
Figure 7. A sample of how a relation can be defined. (Reproduced
with permission from Ref. 3. Copyright 1985 COMPress.)
Extending the Domain
Conclusion
Literature Cited
Background
0097-6156/86/0306-0136$06.00/0
© 1986 American Chemical Society
force minimization to the stereochemistry specified. 5) Use analogy
to select parameters for force constants that are not available.
PRXBLD never balked for lack of a parameter, thus always gave an
answer. Using PRXBLD, the chemist could, for the first time, obtain
a three-dimensional model by simply drawing the two-dimensional
structural diagram. Although it was the fastest model builder of its
time, certain types of structures still required considerable
computation because PRXBLD used numerical minimization.
More recently, the SCRIPT program by Cohen (8) also takes a
drawing as input and uses a limited library of ring conformations to
generate approximate geometry for minimization. Dolata, using PROLOG
and predicate calculus methods (analogous to those used in our QED (9)
work), developed an expert system called WIZARD (10) to select a
reasonable set of internal coordinates for an acyclic molecule. From
these internal coordinates Cartesian coordinates are derived which
are then given to MM2 for refinement. WIZARD has not yet handled
cyclic systems.
There is a need for quick 3-D model generation. Models are
required where knowledge of molecular shape is essential to the
understanding of structure-activity and structure-reactivity
relationships. Most certainly there will be programs in the future
that hypothesize structures; these programs will need rapid model
generation in order to evaluate 3-D constraints. For these
applications, the models must be created automatically, without
interactive intervention. We also envision that the vast libraries of
molecular structures stored in chemical databases will need to be
converted to 3-D geometry libraries in order to use these databases
in designing new 3-D structures.
Goals of AIMB
Figure 1. Goals of AIMB.
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch012
Components of AIMB
1 Create library of models
2 Enter structural diagram of target
3 Perceive target
4 Target or analogs in library?
5 No, Divide into subproblems, solve each
6 Assemble solved subproblem parts
7 Compute degree of fit of subparts
8 Prepare supporting data
9 Display completed model
Figure 2. AIMB algorithm.
AIMB Procedure
[Equation not recoverable from the source; only the summation indices I and K survive]
where A ∈ TARGET, A' ∈ ANALOG, J = MAP(I)
AIMB 40
Human b e i n g 118
PRXBLD 644
MM2 4436
Conclusion
Acknowledgments
Literature Cited
Many problems in chemistry may benefit from developments in the field of Artificial Intelligence
(AI), particularly the area now known as knowledge engineering. Knowledge can be described as
that which includes both empirical material and that "which is derived by inference or
interpretation". (1) It may consist of descriptions, relationships, and procedures in some
domain of interest. (2) We are now incorporating methods from knowledge engineering
research in computer assisted drug design.
Molecular modeling with interactive color computer graphics in real time is a powerful
method for studying molecular structures and their interactions. Display and manipulation of
computer generated skeletal and surface models provide efficient methods for the chemist to
examine steric interactions of many ligands with the binding sites in their receptors. We have
combined x-ray crystallographic results, quantitative structure-activity relationships (QSAR), and
interactive three-dimensional graphics in earlier attempts to design better ligands for enzyme
binding. (3,4) We are applying knowledge engineering techniques provided by the software KEE
(Knowledge Engineering Environment (5)) to the development of rational drug design methods
without having x-ray crystallographic results in hand.
Our integrated system, KARMA, KEE Assisted Receptor Mapping Analysis, uses knowledge
sources, including QSAR and conformational analysis, in a rule-based system to create an
annotated visualization of the receptor site. This is then used in an iterative manner to guide
the investigator in generating rules, hypotheses, and new candidate structures for drug design. This
approach to receptor mapping and drug design differs from the traditional approach used by
chemists in two significant ways. Classically, in computerized drug design, one superimposes a
set of structurally related molecules (congeners) so that their bioactive functional groups coincide,
yielding a pharmacophore. A surface is then derived based on the composite molecule supposedly
yielding a complementary shape of the receptor. (6) This approach has met with limited success
because compounds that act as substrates or inhibitors of certain receptors do not necessarily bind
similarly. It is our belief that the commonality of the binding mode must be established. The
other shortcoming of the traditional approach is that it provides little information on the qualitative
character of the enzyme surface. The classical lock and key concept of ligand-receptor
binding emphasizes structural geometry and may neglect the importance of interactions such as
hydrophobicity. Processing of the binding data using QSAR prior to receptor mapping analysis
yields information not only about the hydrophobic and polar nature of the surface model, but
also about the steric and electronic properties of the data. (7)
0097-6156/86/0306-0147$06.00/0
© 1986 American Chemical Society
System Design
KARMA is a set of programs residing on several machines connected by a high bandwidth network
(see Figures 1 and 2). The main program resides on the Lisp machine and controls all processing.
The controlling program on the Lisp machine is implemented on top of KEE which embodies many
knowledge engineering techniques. KEE provides a set of software tools that allows for very rapid
software prototyping, evaluation, debugging, and modification. Specifically, KARMA takes
advantage of KEE's capabilities that include frame-based knowledge representation with
inheritance, a rule-based inference system, a graphical interface for debugging and displaying
knowledge bases, and a flexible interface that allows for the integration of outside methods. (5)
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch013
Input to the controlling program consists of congener sets and their related QSAR
equations. A satellite program, based on the Pomona MedChem Software SMILES (Simplified
Molecular Input Line Editor System), is used for input of the structures. (8) SMILES creates a
unique identifying code for each chemical structure, which is useful for searching for structures
and physicochemical parameters and for minimizing duplication of structural information. These
structures are passed to satellite programs, including distance geometry (9) and energy
minimization (10), to generate multiple conformations that are then displayed so that users may
select those of interest. These structures, which constitute the basis set, are used to define
the receptor model.
The receptor model is represented graphically by a set of surfaces. These surfaces are
defined by a set of control points which are calculated on the compute server. Control points,
which are based on minimized structures, are then manipulated by KARMA's rules system. These
rules provide detail to the receptor surface model. During this process, KEE provides a graphical
interface showing which rules and derivations are being accepted as true. The user can also
interact with KARMA's rule system during this time. The surface model is displayed using the
control points to form bicubic patches on the graphics workstation. The user can then manipulate
the surface as well as modify the structure. These modifications are sent back to the controlling
program for refinement by the rules. This iterative process continues until the user is satisfied with
KARMA's results.
As seen in Figure 1, our hardware is connected by an Ethernet. (11) The control server is a
Symbolics 3600 Lisp Machine and the compute server is a DEC VAX 8600. The three-dimensional
graphics workstations include the Silicon Graphics IRIS 2400 and the Evans and Sutherland
PS350. Electronic communication with collaborating scientists at other institutions is available
through the VAX 750 via several networks including the ARPAnet and CSnet.
System Implementation
Input to the controlling program is achieved through a series of "pop-up" menus in the Karma
Window (see Figure 3a). For example, if the user is interested in entering a set of congeners, the
user would select the molecule editor; KARMA will then display the molecule editor layout in the
current window. Users can then enter the chemical structures by selecting structure from the
molecule editor menu (see Figure 3b). Structures are currently entered using the tree structure of
SMILES (see Figure 3c). (The molecule editor will be expanded to allow for graphical input in
the future.) KARMA then displays the two-dimensional structure for user verification (see Figure
3d). Coordinates for the three-dimensional structures are saved in a knowledge base in KEE. The
three-dimensional structures are based on x-ray crystallographic data, standard bond angles, and
bond lengths. All congener data, including physicochemical parameters such as log P or MR
(calculated or experimental), can easily be entered and revised in the molecule editor (see Figure 4).
Three-dimensional coordinates for the congener set are passed to the distance geometry and
minimization programs. These satellite programs provide efficient methods for searching
conformational space. Distance geometry programs include subroutines for controlling ring
planarity of aromatic rings and orientation of the molecules based on a common group of atoms. (12)
[Figure 1 residue (hardware): Ethernet connecting the Control Server (Symbolics 3600 Lisp Machine), the Communication Server (DEC VAX 750), and the Outside World (e.g. ARPAnet, CSnet, etc.).]
[Figure 2 residue (system flow): Display Model; Rules for Surface Generation; Characterization of the Surface; User Modification; 1. Control Points; 2. Bicubic Patches.]
[Figure 3 residue (Karma Window menus): (a) Editors: Molecule, Equation, Rule, Graphics; (b) Edit: Structure, Parameters, Name, Parent; (c) Structure entry, SMILES: c1ccccc1Cc2c(N)nc(N)nc2; (d) two-dimensional structure display.]
[Figure 4 residue (Molecule Editor): Calculated Parameters, Parent: clog P = 2.025, cMR = 5.979; Experimental Parameters, Parent: log P = 1.58, Substituent: π = 0.000; commands Revise, Save, Abort; SMILES: c1ccccc1Cc2c(N)nc(N)nc2.]
The output of the distance geometry and minimization programs is passed to the graphics
program EDGE (Easy Distance Geometry Editor). The structures are displayed
three-dimensionally so users may select structures to represent conformational space. Models are
easily selected by pointing at the desired structure (see Figure 5). X, Y, and Z rotations and
translations, depth cueing, color, and labeling have been incorporated in EDGE. EDGE also
provides an RMS matching routine for N arbitrary atoms designated by the user. The selected
models are then used for surface generation.
Surface generation is based on a set of points derived from the outcome of distance
geometry programs applied to the basis set of structures. The basis set of points, P, is defined as:
where P_i is the uniformly distributed set of points over a sphere corresponding to atom i, and
g(P_i P_j) is the overlap of the two sets of points. The density of points per square angstrom can
be arbitrarily set by the user. If the density is relatively high, a large number of bicubic patches
with small area are generated; to address each bicubic patch at a high density would be
time-consuming and difficult at best. If the density of points is low, the patches become too
large and don't yield enough detailed information about the surface model.
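One plausible reading of this construction can be sketched in Python: place points roughly uniformly on a sphere around each atom at a chosen density, then discard the points that fall inside a neighboring atom's sphere. This is an illustration under stated assumptions, not KARMA's implementation. The defining equation for P is missing from the text, so the overlap rule used here is an assumption, and the Fibonacci lattice is a substituted technique for spreading points; the function names and radii are hypothetical.

```python
import math


def sphere_points(center, radius, n):
    """Return n roughly uniform points on a sphere (Fibonacci lattice)."""
    cx, cy, cz = center
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n      # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at that height
        phi = golden_angle * k
        points.append((cx + radius * r * math.cos(phi),
                       cy + radius * r * math.sin(phi),
                       cz + radius * z))
    return points


def surface_points(atoms, density):
    """Points on each atom's sphere that lie outside every other atom's sphere.

    atoms   -- list of (center, radius) with center an (x, y, z) tuple, in angstroms
    density -- points per square angstrom (user-selectable, as in the text)
    """
    kept = []
    for i, (center_i, radius_i) in enumerate(atoms):
        n = max(1, round(density * 4.0 * math.pi * radius_i**2))
        for p in sphere_points(center_i, radius_i, n):
            buried = any(math.dist(p, center_j) < radius_j
                         for j, (center_j, radius_j) in enumerate(atoms) if j != i)
            if not buried:
                kept.append(p)
    return kept


if __name__ == "__main__":
    atoms = [((0.0, 0.0, 0.0), 1.5), ((1.0, 0.0, 0.0), 1.5)]
    print(len(surface_points(atoms, density=5.0)))
```

Raising `density` increases the point count with the square of the radius, which is the trade-off between patch count and surface detail described above.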
The control points are defined by the basis set of points P. These control points define the
parametric bicubic patches which form the surface model. Advantages of the parametric bicubic
surface include continuity of position, slope, and curvature at the points where two patches meet.
All the points on a bicubic surface are defined by cubic equations of two parameters s and t, where
s and t vary from 0 to 1. The equation for x(s,t) is:
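\[
\begin{aligned}
x(s,t) ={}& a_{11}s^3t^3 + a_{12}s^3t^2 + a_{13}s^3t + a_{14}s^3 \\
&+ a_{21}s^2t^3 + a_{22}s^2t^2 + a_{23}s^2t + a_{24}s^2 \\
&+ a_{31}st^3 + a_{32}st^2 + a_{33}st + a_{34}s \\
&+ a_{41}t^3 + a_{42}t^2 + a_{43}t + a_{44}
\end{aligned}
\]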
Equations for y and z are similar. (13) Either cardinal spline or B-spline bicubic patches can be
used, as they differ only by the starting coefficients. (14) Overlapping sets of control points allow
for the joining of patches. Sixteen points define a bicubic patch. To determine which points
define which patches, an initial triangle is formed from three nearest neighbors. The next triangle
shares one side of the initial triangle and is connected to its next nearest neighbor. This process is
continued iteratively until all points are accounted for. The internal edge of two triangles is then
dropped to form a quadrilateral. Each internal edge is used only once. Nine quadrilaterals define a
single patch. These patches are combined to form the surface model and are manipulated by both
KARMA's rule system and the investigator at the graphics station.
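A minimal sketch of evaluating one coordinate of a bicubic patch may help make the sixteen-coefficient form concrete. It assumes the coefficients are held in a 4 x 4 grid indexed so that entry (i, j) multiplies s^(3-i) t^(3-j); the function name and the sample coefficients are hypothetical and are not taken from KARMA.

```python
def bicubic_point(coeffs, s, t):
    """Evaluate x(s, t) = sum over i, j of coeffs[i][j] * s**(3 - i) * t**(3 - j)."""
    if not (0.0 <= s <= 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("s and t must lie in [0, 1]")
    s_pow = [s ** 3, s ** 2, s, 1.0]
    t_pow = [t ** 3, t ** 2, t, 1.0]
    return sum(coeffs[i][j] * s_pow[i] * t_pow[j]
               for i in range(4) for j in range(4))


# Hypothetical coefficients: only the constant and the two linear terms are nonzero.
coeffs = [[0.0] * 4 for _ in range(4)]
coeffs[3][3] = 1.0  # a44, the value at s = t = 0
coeffs[2][3] = 2.0  # a34, the coefficient of s
coeffs[3][2] = 3.0  # a43, the coefficient of t

corner = bicubic_point(coeffs, 0.0, 0.0)
far_corner = bicubic_point(coeffs, 1.0, 1.0)
```

The same routine would be called three times per surface point, once each with the x, y, and z coefficient grids.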
System Core
The information contained in KARMA's knowledge bases is based upon quantitative structure-
activity relationships (QSAR), kinetic data, and structural chemistry. The combination of QSAR
and kinetic data allows for the study of enzyme-ligand interactions. The Hansch approach to
QSAR, based on a set of congeners, states:
Biological Activity = f(physicochemical parameters)
Physicochemical parameters are used to model the effects of structural changes on the electronic,
hydrophobic, and steric effects for organic molecules. (15) Examples of physicochemical
parameters include, among others:
σ, an electronic constant based on the Hammett equation for the ionization of substituted
benzoic acids;
π, the hydrophobic parameter for a chemical substituent based on the octanol-water
partition coefficient log P;
MR, the molar refractivity, which parameterizes polarizability and steric effects; and
Verloop's parameters, which are steric substituent values calculated from bond angles
and distances.
Using multivariable linear regression, a set of equations can be derived from the parameterized
data. Statistical analysis yields the "best" equations to fit the empirical data. This mathematical
model forms a basis to correlate the biological activity to the chemical structures.
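The regression step can be illustrated with a small least-squares sketch. The descriptor values below are synthetic and hypothetical; the activities are generated exactly from the horse ADH coefficients quoted later in this chapter (0.98, -0.83, 3.69), so the fit should simply recover them. The helper name `fit_linear` is an assumption, and a real study would use measured data and a statistics package.

```python
def fit_linear(rows, y):
    """Ordinary least squares for y = sum_k b[k] * x[k], solved via the normal equations."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Synthetic (log P, sigma) pairs; not experimental values.
descriptors = [(0.5, 0.0), (1.0, 0.23), (1.5, -0.17),
               (2.0, 0.06), (2.5, 0.78), (3.0, -0.27)]
rows = [[log_p, sigma, 1.0] for log_p, sigma in descriptors]  # last column is the intercept
activity = [0.98 * log_p - 0.83 * sigma + 3.69 for log_p, sigma in descriptors]

coef_log_p, coef_sigma, intercept = fit_linear(rows, activity)
```

With noisy experimental data the recovered coefficients would differ from the generating values, and statistics such as n, r, and s would be reported alongside the equation.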
KARMA describes the interactions for enzyme-ligand binding using QSAR equations and
parameters, and the structural information of the congener data. These interactions, with
illustrative examples, are shown below:
[Table: Interaction / Example. Entries not recoverable.]
[Figure: protein class hierarchy. Proteins: CAC; DHFR (E. coli, L. casei, chicken); ADH.]
KEE provides many different mechanisms for inheritance. KEE has the ability to constrain the type
and number of values assigned to attributes for consistency and description in the knowledge base.
Currently, KARMA's rules are formulated in an if-then format. A rule may have multiple
conditions, conclusions, and actions. KARMA takes advantage of both the forward and backward
chainers for derivation of the three-dimensional receptor model. For example, two types of rules,
generic and specific, can be defined empirically from the results of QSAR as well as from
molecular structure.
Generic rules are based on the QSAR equations and their coefficients. Forward chaining
using these rules yields basic characteristics for the receptor site model. For instance, an
abstracted generic rule may take the form:
If the coefficient of the hydrophobic parameter is approximately equal to one, then
expect complete desolvation about substituent X of the ligand.
This rule was derived empirically from some recent work on several species of alcohol
dehydrogenase (ADH). (16) The following equations were found:
Compounds                     Enzyme   Equation
(structure not recoverable)   Horse    log 1/K_i = 0.98 log P - 0.83 σ + 3.69
                                       n = 14, r = 0.937, s = 0.280
The average of the coefficients of the hydrophobic term is approximately equal to one (average =
0.97) suggesting complete desolvation about substituent X. Figure 6 shows complete desolvation
by the enzyme ADH (hydrophobic space - red; polar space - blue) around substituent X of the
pyrazole (green).
Another example of a rule dealing with hydrophobicity may take the form:
If the coefficient of the hydrophobic parameter is greater than 0.5 and less than 1.0, then
expect a concave surface about substituent X of the ligand.
This type of rule is empirically based on enzyme-ligand binding such as that of carbonic
anhydrase c (CAC) and sulfonamides. (4) The following equation was found:
Compound Equation
Figure 7 shows how the solvent accessible surface of the enzyme CAC (hydrophobic space - red;
polar space - blue) is slightly concave about the substituent X of the sulfonamide (green). Similar
rules exist for the coefficients which describe other aspects of hydrophobicity, as well as polar
space, which help define the basic shape, i.e., cleft or hole, of the surface receptor model.
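The two generic hydrophobicity rules quoted above can be sketched as a simple if-then function. This is only an illustration and not KARMA's KEE rule syntax; in particular, the tolerance used to decide that a coefficient is "approximately equal to one" is a hypothetical choice, as is the order in which the two rules are tested.

```python
def hydrophobic_rule(coefficient, tolerance=0.1):
    """Return the expectation suggested by the hydrophobic-term coefficient, or None.

    `tolerance` is an assumed cutoff for "approximately equal to one".
    """
    if abs(coefficient - 1.0) <= tolerance:
        return "complete desolvation about substituent X"
    if 0.5 < coefficient < 1.0:
        return "concave surface about substituent X"
    return None


adh_expectation = hydrophobic_rule(0.97)      # average ADH coefficient from the text
partial_expectation = hydrophobic_rule(0.7)   # hypothetical intermediate coefficient
no_expectation = hydrophobic_rule(0.3)        # hypothetical small coefficient
```

Testing the "approximately one" rule first resolves the overlap between the two conditions; a rule system with explicit priorities could make the same choice declaratively.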
Specific rules are based on the attributes of congeners, including the physicochemical
parameters used to determine the QSAR equation, the biological activity, and the molecular structure.
Backward chaining, using these rules with specific instances of substituents, yields detailed shape
and character for the receptor model. For instance, an abstracted specific rule may take the form:
If the biological activity of compound y is less in enzyme A than that of related enzyme
B, expect possible steric hindrance about substituent X.
One possible interpretation of this type of rule is the enzyme-ligand binding of trimethoprim with
bacterial DHFR and chicken liver DHFR. (17,18)
DHFR Species    Binding Affinity (log 1/K_i)
L. casei        8.87
E. coli         6.88
chicken         3.98
These data show a noticeable drop in binding affinity for trimethoprim and chicken liver DHFR.
Figure 8 illustrates steric interaction between the 5-OMe of trimethoprim (green) and the
sidechain of Tyr 31 of native chicken liver DHFR (red). There is no steric interaction seen
between the 5-OMe of trimethoprim (green) and the sidechain of Phe 30 of L. casei DHFR (red).
(Right view: chicken liver DHFR; left view: L. casei DHFR.) It is known from x-ray
crystallographic results that the sidechain of Tyr 31 of chicken liver DHFR rotates to accommodate
trimethoprim. (18)
A specific rule can also be based upon comparisons of bond lengths and van der Waals
radii, and biological activities. For instance,
If the biological activity of substituent X_2 is less than the biological activity of
substituent X_1 and X_2 is atomically larger than X_1, then expect possible steric hindrance
with the receptor wall about X_2, provided that other factors are equal.
This rule can be exemplified by two compounds that differ by the type of the substituent, i.e., a
chlorine and a bromine atom. If the binding affinity for the bromine compound was lower (and
possibly even lower for the iodine compound), it would suggest that the wall of the receptor model
is contacted by the ligand at the bond distance of the chlorine atom and its related van der Waals
radius. Therefore, one could assume that the larger bromine atom represents an intrusion into the
receptor wall.
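The halogen example can be sketched as a small predicate. The van der Waals radii are the commonly tabulated Bondi values in angstroms; the binding affinities are hypothetical numbers chosen for illustration, and the function considers only the radius, not the carbon-halogen bond length that the text also mentions.

```python
# Bondi van der Waals radii in angstroms
VDW_RADIUS = {"F": 1.47, "Cl": 1.75, "Br": 1.85, "I": 1.98}


def steric_hindrance_suspected(sub_1, activity_1, sub_2, activity_2):
    """Apply the rule: X_2 larger than X_1 and less active suggests contact with the wall."""
    larger = VDW_RADIUS[sub_2] > VDW_RADIUS[sub_1]
    less_active = activity_2 < activity_1
    return larger and less_active


# Hypothetical log 1/K values for a chloro and a bromo congener
chloro_vs_bromo = steric_hindrance_suspected("Cl", 6.5, "Br", 5.9)
# Reversed case: the larger substituent is more active, so the rule does not fire
bromo_more_active = steric_hindrance_suspected("Cl", 5.9, "Br", 6.5)
```

The "other factors are equal" proviso of the rule is not checked here; in practice it would require comparing the remaining physicochemical parameters of the two congeners.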
The above examples used to illustrate the specific rules for backward chaining are similar to
other attempts at receptor mapping. (6) However, these other methods do not account for
interactions that may be based on a combination of effects such as hydrophobicity and ligand potency.
For instance, a rule that might apply to a compound with a substituted phenyl ring may take the
form (19)
If a meta disubstituted compound is symmetrical, and the biological activities differ
between hydrophobic and polar substituents, then expect possible ring rotation to
maximize hydrophobic and polar interactions between the ring substituents and the
hydrophobic and polar surface.
Many rules can be derived from the molecular structures and biological activities, as seen from the
above examples, which add both shape and character to the surface model.
Graphics Interface
KARMA presents the results from the rule system on a three-dimensional graphics workstation.
The bicubic patches of the surface model are displayed graphically and may be manipulated by
the user. The user may also modify the model and return to the control server for another iteration
in the rule system if the results are not satisfactory.
The bicubic patches are characterized with different colors, intensities and line textures to
show attributes such as hydrophobicity and steric properties. Only one attribute may be displayed
at a time, with color and intensity representing the value of the attribute, and line texture
representing KARMA's confidence level in the information. For example, when displaying
hydrophobicity, red patches are hydrophobic space while blue patches are polar space. Patches drawn
with solid lines represent areas which are well explored while patches with short dashes contain
little information. Displaying information using multiple cues allows the user to examine various
aspects of the surface model without having to deal with large amounts of numerical data.
The graphics interface is also the appropriate place to alter the model since it lets the user
look at an overall picture of the model as it is modified. The graphics interface provides
user-friendly tools for this purpose, including a pointing device for selecting the modification site and a
hierarchical menu system to guide the user through the actual process of making changes. Thus,
the user may select a control point on one of the bicubic patches with the pointing device, pop up
a menu of permitted modifications, and select an operation, e.g., move the control point outwards
along the surface normal. After the control point data has been modified, the graphics interface
will recalculate and redraw the bicubic patches of the surface model based on the new data. After
modifying the model to the desired state, the user may simply return to the control server and
initiate the rule system for further refinement.
Conclusion
Currently, KARMA is in the prototyping phase. Although the hardware is connected via the high
bandwidth network, it is necessary to implement the servers for data communications.
Additionally, a completely new graphics package is in development for KARMA. The next two steps in
terms of development are the turnkey and production versions of KARMA.
Current methods in computer-assisted drug design are most successful if the structure of the
receptor is known. Our goal is to aid the investigator in those situations where the structure of the
receptor may or may not be known. KARMA emphasizes two critical factors. First,
three-dimensional graphics presents the results from the rule-based system in a manageable format. Second,
KARMA provides a means for the user to inject knowledge about the model. KARMA is designed as
a tool to aid the chemist, and the ability to incorporate ideas from the user is a very important
aspect. It is our goal to look at computer-assisted drug design from a new perspective
using KARMA.
Acknowledgments
This work was supported in part by NIH RR-1081, DAAG29-83-G-0080, Evans and Sutherland,
Silicon Graphics and IntelliCorp. We also wish to thank Dennis Miller, I.D. Kuntz, Don Kneller,
Greg Couch, Ken Arnold, and Willa Crowell for help and discussion.
Literature Cited
(1) Morris, W., Ed. In "The American Heritage Dictionary of the English Language";
American Heritage and Houghton Mifflin: New York, 1969; p. 725.
(2) Hayes-Roth, F.; Waterman, D.A.; Lenat, D.B., Eds. "Building Expert Systems";
Addison-Wesley: USA, 1983.
(3) Blaney, J.M.; Jorgensen, E.C.; Connolly, M.L.; Ferrin, T.E.; Langridge, R.; Oatley, S.J.;
Burridge, J.M.; Blake, C.C.F. J. Med. Chem. 1982, 25, 785-790.
(4) Hansch, C.; McClarin, J.; Klein, T.; Langridge, R. Molec. Pharm. 1985, 27, 493-498.
(5) KEE User's Manual. 707 Laurel Street, Menlo Park, California, 94025. KEE is a registered
trademark of IntelliCorp.
(6) Marshall, G.R. "Computer Aided Drug Design". First European Seminar and Exhibition on
Computer-Aided Molecular Design. October 18-19, 1984.
(7) Blaney, J.M.; Hansch, C.; Silipo, C.; Vittoria, A. Chem. Rev. 1984, 84, 333.
(8) We wish to thank Dr. David Weininger and Dr. Albert Leo at the MedChem Project,
Department of Chemistry, Pomona College for providing us with the SMILES software.
(9) Crippen, G.M. "Distance Geometry and Conformational Calculations"; Research Studies:
New York, 1981.
(10) Weiner, P.K.; Kollman, P.A. J. Comp. Chem. 1981, 2, 287-303.
(11) Metcalfe, R.M.; Boggs, D.R. Comm. of the ACM 1976, 9. Ethernet is a registered
trademark of Xerox Corporation.
(12) We wish to thank Dr. Gordon Crippen from Texas A&M University and Dr. Jeffrey M.
Blaney from E.I. DuPont de Nemours & Company for providing us with the Distance
Geometry software.
(13) Foley, J.D.; Van Dam, A. "Fundamentals of Interactive Computer Graphics";
Addison-Wesley: 1982; Chap. 13.
(14) Clark, J. "Parametric Curves, Surfaces, and Volumes in Computer Graphics and Computer
Aided Geometric Design," Technical Report No. 221, Computer Systems Laboratory,
Stanford University, 1981.
(15) Hansch, C.; Leo, A. "Substituent Constants for Correlation Analysis in Chemistry and
Biology"; Wiley-Interscience: 1979.
(16) Hansch, C.; Klein, T.; McClarin, J.; Langridge, R.; Cornell, N. J. Med. Chem. (in press).
(17) Hansch, C.; Li, R.; Blaney, J.; Langridge, R. J. Med. Chem. 1982, 25, 777-784.
(18) Selassie, C.; Fang, Z.; Li, R.; Klein, T.; Langridge, R.; Kaufman, B. J. Med. Chem. (in
press).
(19) Smith, R.N.; Hansch, C.; Kim, K.I.; Omiya, B.; Fukumura, G.; Selassie, C.D.; Jow, P.Y.C.;
Blaney, J.M.; Langridge, R. Arch. Biochem. Biophys. 1982, 215, 319-328.
Carl Trindle
0097-6156/86/0306-0159$06.00/0
© 1986 American Chemical Society
A Routine to Assign Lewis Structures. The procedure for assigning
Lewis structures is familiar (8). Given the set of atoms, one must
sum the valence electrons. In our LISP system, each ATOM can be
assigned PROPERTIES which may include the number of valence
electrons it contributes to the molecule, and equally important, its
set of NEIGHBORS by which the skeleton of the molecule is specified.
Each such link is assigned a pair of valence electrons, and a census
is kept of electron pairs in the vicinity of each atom. Among the
PROPERTIES of each atom is an estimate of its electronegativity,
and the program assigns electron pairs to fill octets using the
electronegativity to set priority. The last step is most "difficult."
For each of those atoms which lack a full octet, the system
must look among the NEIGHBORS for atom(s) possessing a lone pair
which it might share. Of all those potential donors, one chooses
the atom with the most negative formal charge. The multiple bond
A Dialogue Accompanying the Entry
of a Molecule of Moderate Complexity
NUMBER: 0
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX
HYDROGENS AT VERTEX 8
HYDROGENS AT VERTEX 9
HYDROGENS AT VERTEX 11 :
HYDROGENS AT VERTEX 12
HYDROGENS AT VERTEX 13
awareness.
The structural formula at minimum identifies the atoms and
their connectivity. This hardly seems to be adequate in complexity
to express much molecular information. This apparent paradox is
resolved when we recognize that the chemist brings much of his
experience to the task of interpreting the sketch, and much of the
information is evoked rather than transmitted by means of the
structural formula. The atoms' names (carbon, nitrogen) call up a flood
of associations which (although they are almost never written
explicitly in the chemist's sketch) are nonetheless part of the
information it can summon. Among these data are the atomic mass,
typical valencies, local geometry, perhaps a van der Waals radius,
and a guide to chemical behavior, its "electronegativity."
The connectivity can define some aspects of the geometry in a
useful semiquantitative way. The chemist has a very reliable idea
of the range of bond lengths: CC(single), 1.54 A; CC(double), 1.33 A,
etc. By counting connections and recognizing the atoms being
connected, one can assign good estimates of the distances between
directly bonded atoms.
The chemist's knowledge of molecular geometry extends beyond
typical values of bond distances. He will also be able to predict
many bond angles fairly accurately. This is equivalent to
specifying a 1-3 nonbonded interatomic distance. The chemist's sketch
portrays cis and trans isomerization, syn and anti, and gauche
conformations which specify either torsion angles, or indirectly,
a 1-4 nonbonded distance.
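The equivalence between a bond angle and a 1-3 distance follows from the law of cosines. The sketch below is an illustration only: the function name is hypothetical, and the tetrahedral angle of 109.5 degrees is an assumed example value, while the 1.54 A single-bond length is the one quoted in the text.

```python
import math


def distance_1_3(r12, r23, angle_deg):
    """Distance between atoms 1 and 3 given bond lengths r12, r23 and the 1-2-3 bond angle."""
    theta = math.radians(angle_deg)
    return math.sqrt(r12 ** 2 + r23 ** 2 - 2.0 * r12 * r23 * math.cos(theta))


# Two C-C single bonds (1.54 A each) meeting at an assumed tetrahedral angle
d13 = distance_1_3(1.54, 1.54, 109.5)
```

For these inputs the 1-3 separation comes out close to 2.5 A; a torsion angle fixes a 1-4 distance in an analogous, if longer, trigonometric expression.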
Besides primary bond distances and angles, and some special
cases of torsional and dihedral angles, the chemist knows more global
features of molecular geometry. However, such knowledge becomes more
and more fragmentary; the longest distances in a molecule are most
poorly defined.
Assignment of a Lewis Structure
FORMULA: HI Π Ν 03 <+)
23 PAIRS ASSIGNED TO L I N K S
VERTEX 4 , 5 , 6 , 7 , 11 UNSATISFIED
VERTEX 5 UNSATISFIED
SHARING BETWEEN VERTICES 5 AND 4
VERTEX 3 UNSATISFIED
SHARING BETWEEN VERTICES 3 AND 2
SHARING BETWEEN VERTICES 6 AND 1
SHARING BETWEEN VERTICES 8 AND 7
SHARING BETWEEN VERTICES 12 AND 11
NONZERO FORMAL CHARGES:
VERTEX 1: «• 1
Literature Cited
13. Benson, S. W., et al. Chem. Rev. 1969, 69, 279; Int. J. Chem.
Kinet. 1974, 6, 813.
14. Benson, S. W. "The Foundations of Chemical Kinetics";
McGraw-Hill: New York, 1960; p. 665.
15. NMR and vibrational spectra of organic molecules are well
described by group-additivity ideas; optical spectra require
corrections to the spectra of chromophores. Cf. discussion
of spectra by Gordon, A. J. and Ford, R. A., "The Chemist's
Companion: A Handbook of Practical Data, Techniques and
References"; Wiley-Interscience: New York, 1972.
16. Almost every elementary textbook of organic chemistry provides
a systematic description of properties of functional groups
and their characteristic reactivity; for example,
Fessenden, R. J. and Fessenden, J. S. "Organic Chemistry";
Willard Grant Press: Boston, 1979.
17. Dewar, M. J. S. "The Molecular Orbital Theory of Organic
Chemistry"; McGraw-Hill: New York, 1969. Albright, T. A.;
Burdett, J. K.; Whangbo, M. H. "Orbital Interactions in
Chemistry"; Wiley-Interscience: New York, 1985.
The Similarity of Graphs and Molecules
Steven H. Bertz (1) and William C. Herndon (2)
(1) AT&T Bell Laboratories, Murray Hill, NJ 07974
(2) University of Texas at El Paso, El Paso, TX 79968-0509
The concept of the similarity of molecules has important ramifications for physical,
chemical, and biological systems. Grunwald (1) has recently pointed out the
constraints of molecular similarity on linear free energy relations and observed that
"Their accuracy depends upon the quality of the molecular similarity." The use of
quantitative structure-activity relationships (2-6) is based on the assumption that
similar molecules have similar properties. Herein we present a general and rigorous
definition of molecular structural similarity. Previous research in this field has usually
been concerned with sequence comparisons of macromolecules, primarily proteins and
nucleic acids (7-9). In addition, there have appeared a number of ad hoc definitions of
molecular similarity (10-15), many of which are subsumed in the present work.
Difficulties associated with attempting to obtain precise numerical indices for
qualitative molecular structural concepts have already been extensively discussed in the
literature and will not be reviewed here.
We begin with the way chemists perceive similarity between two molecules. This
process involves, consciously or unconsciously, comparing several types of structural
features present in the molecules. For example, considering the five aliphatic alcohols
(represented by their H-suppressed molecular graphs) in Figure 1, we note both
similarities and differences: they are all four-carbon alcohols; a, b, c and d are acyclic,
whereas e has a ring; a and b are primary alcohols, c and e are secondary alcohols and
d is a tertiary alcohol; b and c have the same skeleton, but for the labeling of points
(atoms), while the other skeletons are distinct; etc.
0097-6156/86/0306-0169$06.00/0
© 1986 American Chemical Society
The first step in quantifying the concept of similarity is to list all subgraphs of
the given molecular graphs, e.g. a-e, which has been done in the first column of
Table I. The subgraphs include the vertices (atoms), all connected subgraphs, and the
full molecular graphs themselves, since it can be seen that the molecular graphs for a
and c are both subgraphs of e. Next, the number of each subgraph contained in the
molecular graphs must be counted. Row 1 lists the number of C atoms, row 2 the
number of O atoms, row 3 the number of C-C bonds, row 4 the number of C-O bonds,
etc. Gordon and Kennedy (16) defined N_ij as the number of subgraphs of graph j
isomorphic with graph i, and more colloquially as "the number of distinct ways in
which skeleton i can be cut out of skeleton j." The entries in Table I are the number
of ways the subgraphs can be cut out of the molecular graphs (the number of
subgraphs of the molecular graphs isomorphic with the subgraphs in the first column).
In terms of the numbers of C or O atoms, a-e are equally complex. In terms of
C-C bonds (ethane subgraphs) a-d are 3/4 as complex as e; however, in terms of
propane subgraphs (row 5) a and c are 1/2 as complex as e. A simple algorithm that
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch015
takes account of all the subgraphs involves comparison of two columns at a time,
examining them row by row and dividing the smaller of the numbers by the larger. A
similarity index (SI) can then be calculated by taking the average of the quotients. Of
course, for two identical molecular graphs, SI = 1. Inclusion of the molecular graphs in
the list of subgraphs ensures that two different molecules which have the same number
of each proper subgraph will not have SI = 1. The values of SI(1) for a-e are
summarized in the form of a similarity matrix SM(1) in Figure 2.
A simpler similarity index can be calculated by dividing the sum of the lesser of
the two numbers in each row by the sum of the greater. (Only two columns of Table I
are considered at a time, of course.) The values of SI(2) for a-e are summarized in
SM(2), also in Figure 2. According to both SI(1) and SI(2), 1-butanol (a) and
2-butanol (c) are the most similar, whereas t-butanol (d) and cyclobutanol (e) are the
least similar pair. In between these extremes there are a significant number of
disagreements between these indices. For example, based on SI(1), c and e are more
similar than c and d; however, c and d are more similar than c and e based on SI(2).
There are seven such pairs (out of 45 possible pairs), and each index has one
"degeneracy". By considering standard measures of "distance," SI(2) would appear to
be the superior index (vide infra).
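Both indices can be sketched in a few lines. The columns below are the subgraph counts for graphs a-e as read from Table I, one entry per row of the table. One detail is an assumption: rows in which both counts are zero are skipped when averaging the quotients for SI(1), which is the convention that reproduces the tabulated SI(1) values. The function names are hypothetical.

```python
# Subgraph counts for graphs a-e, one entry per row of Table I
TABLE_I = {
    "a": [4, 1, 3, 1, 2, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "b": [4, 1, 3, 1, 3, 1, 0, 2, 1, 0, 0, 0, 1, 0, 0, 0],
    "c": [4, 1, 3, 1, 2, 2, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
    "d": [4, 1, 3, 1, 3, 3, 0, 0, 1, 3, 0, 0, 0, 0, 1, 0],
    "e": [4, 1, 4, 1, 4, 2, 4, 2, 0, 1, 1, 2, 0, 2, 0, 1],
}


def si1(u, v):
    """Average of (smaller / larger) over rows where at least one count is nonzero."""
    quotients = [min(p, q) / max(p, q) for p, q in zip(u, v) if max(p, q) > 0]
    return sum(quotients) / len(quotients)


def si2(u, v):
    """Sum of the lesser counts divided by the sum of the greater counts."""
    return sum(min(p, q) for p, q in zip(u, v)) / sum(max(p, q) for p, q in zip(u, v))


si1_ab = round(si1(TABLE_I["a"], TABLE_I["b"]), 3)
si2_ab = round(si2(TABLE_I["a"], TABLE_I["b"]), 3)
si2_de = round(si2(TABLE_I["d"], TABLE_I["e"]), 3)
```

Looping the two functions over all ten pairs of columns would fill in the similarity matrices SM(1) and SM(2).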
The calculations of similarity indices can also be done with labeled subgraphs of
a labeled molecular graph. The points can be labeled according to the valency of the
corresponding atoms (i.e. whether they are primary, secondary, tertiary, etc.), labeled
with stereochemical descriptors, or labeled to reflect isotopic composition to cite but a
few examples. Furthermore, the number of similarity indices can be doubled by
relaxing the stricture that only connected subgraphs be considered. We have
concentrated on connected subgraphs, as they are more intuitively meaningful to the
average chemist; nevertheless, for some applications the inclusion of disconnected
subgraphs may be desirable or even necessary.
Figure 2. Similarity matrices SM(1) and SM(2) for the graphs in Figure 1.
Table I. Subgraph counts for molecular graphs a-e

Subgraph             a   b   c   d   e
C (atom)             4   4   4   4   4
O (atom)             1   1   1   1   1
C-C                  3   3   3   3   4
C-O                  1   1   1   1   1
C-C-C                2   3   2   3   4
C-C-O                1   1   2   3   2
C-C-C-C              1   0   1   0   4
C-C-C-O              1   2   1   0   2
C(C)(C)C             0   1   0   1   0
C-C(O)-C             0   0   1   3   1
C4 ring              0   0   0   0   1
a                    1   0   0   0   2
b                    0   1   0   0   0
c                    0   0   1   0   2
d                    0   0   0   1   0
e                    0   0   0   0   1
2,-ΙΉ/—Hjl, or Hamming distance, which counts the number of positions in which the
corresponding elements are unequal. It may be noted that these are measures of
dissimilarity; of course, it is easy to draw conclusions about similarity from them (e.g.
by taking their inverse). Table II contains the distances calculated according to each
of the definitions discussed above as applied to molecular graphs a-e. The three
distance functions parallel each other quite closely: there are only two disagreements
between Hamming distance and Euclidean distance, and there are no disagreements
between city-block distance and Euclidean distance. There is a two-fold degeneracy
within city-block distance and Euclidean distance (the same as SI(1) and SI(2)) and a
four-fold one within Hamming distance, which is the crudest measure. Both city-block
and Euclidean distance have only a single disagreement with SI(2), but many with
SI(1); therefore, it is recommended that SI(2) or one of the distance measures that
parallel it be used to index similarity.
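The three distance measures can be sketched directly on the subgraph-count columns. The columns for graphs a, b, and e are those read from Table I; the function names are hypothetical, and the sketch assumes each graph is treated as a plain vector of counts.

```python
import math

# Subgraph-count columns for graphs a, b, and e, one entry per row of Table I
A = [4, 1, 3, 1, 2, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
B = [4, 1, 3, 1, 3, 1, 0, 2, 1, 0, 0, 0, 1, 0, 0, 0]
E = [4, 1, 4, 1, 4, 2, 4, 2, 0, 1, 1, 2, 0, 2, 0, 1]


def hamming(u, v):
    """Number of rows in which the two counts differ."""
    return sum(1 for p, q in zip(u, v) if p != q)


def city_block(u, v):
    """Sum of absolute differences between corresponding counts."""
    return sum(abs(p - q) for p, q in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


d_ab = (hamming(A, B), city_block(A, B), round(euclidean(A, B), 3))
d_ae = (hamming(A, E), city_block(A, E), round(euclidean(A, E), 3))
```

Taking the reciprocal of a distance gives one of the similarity-like quantities mentioned in the text; identical columns would need special handling because their distance is zero.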
Pair     Hamming  City-block  Euclidean  1/City-block  1/Euclidean  SI(1)   SI(2)
d(a,b)      6         6         2.449       0.167         0.408     0.561   0.684
d(a,c)      4         4         2.000       0.250         0.500     0.682   0.778
d(a,d)      8        11         4.359       0.091         0.229     0.417   0.522
d(a,e)     10        14         4.899       0.071         0.204     0.462   0.517
d(b,c)      8         8         2.828       0.125         0.354     0.472   0.619
d(b,d)      5         9         4.359       0.111         0.229     0.576   0.609
d(b,e)     11        16         5.657       0.062         0.177     0.400   0.484
d(c,d)      8         9         3.317       0.111         0.301     0.472   0.609
d(c,e)      8        12         4.690       0.083         0.213     0.577   0.586
d(d,e)     12        19         6.245       0.053         0.160     0.367   0.441
of increasing complexity. The same order is obtained by considering the total number
of subgraphs or by counting only the number of propane subgraphs (19), η (Table III).
Subgraph Enumeration. The total number of subgraphs increases rapidly with the
number of atoms, making hand calculations of SI impractical for large molecules.
Therefore a computer program was written. Our program is based on the fact that the
entries in the nth power of the adjacency matrix of a graph count paths of length n,
which includes retraced pathways and, therefore, branched chains and cycles. A
molecular graph is represented by the string adjacency matrix A$(I,J), where the
I,J-entry is a string of characters describing a bond (I ≠ J) or an atom (I = J).
Matrix multiplication is defined as string concatenation. The concatenated strings are
alphabetized, processed to eliminate duplicates, sorted by number of bonds, and stored
for future use. (A copy of this program can be obtained by writing to WCH.)
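The counting property behind the program can be illustrated numerically. The sketch below uses an ordinary integer adjacency matrix rather than the string matrix A$(I,J) described above, so it only counts the pathways (including retraced ones) and does not list them; the four-vertex chain is a hypothetical example standing in for a butane skeleton.

```python
def mat_mul(x, y):
    """Multiply two square integer matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(adj, n):
    """Return the nth power (n >= 1) of an adjacency matrix."""
    result = adj
    for _ in range(n - 1):
        result = mat_mul(result, adj)
    return result


# Four-vertex chain 0-1-2-3
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

squared = mat_power(chain, 2)
cubed = mat_power(chain, 3)
```

Here `squared[0][0]` counts the single retraced pathway 0-1-0, and `cubed[0][3]` counts the one pathway that traverses the whole chain. Replacing integer multiplication with string concatenation, as the text describes, would record which bonds each pathway uses.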
Table III. Complexity Measures

Graph    SI(2) with K4   Subgraphs    η
P4          0.156           10         2
C4          0.266           17         4
K4-x        0.516           33         8
K4          1.000           64        12
of similarity between two molecular structures." Randic et al. (14) have related
pharmacological activity to the numbers of paths in the molecular graph. The
extension from this one kind of subgraph to all possible subgraphs should improve the
statistical correlation of properties with substructures; but, even more importantly, it
will make the results easier to visualize in a way that is meaningful to a chemist.
Gordon and Kennedy (16) observe that a physical measurable can be expressed as a
linear combination of graph-theoretical invariants (N_ij, see above). By using all
possible subgraphs in such an analysis and optimizing the coefficients, the most
important ones might be found.
Another important subject for similarity considerations is the planning of organic
syntheses. Wipke and Rogers (20) point out that "chemists do not always work
systematically backward but sometimes make an 'intuitive leap' to a specific starting
material from a target without consideration of reactions needed for interconversion.
This intuitive leap probably involves a Gestalt pattern recognition based on the
chemist's knowledge of available starting materials and similarity between the starting
material structure and the target structure." Our method should allow not only the
overall similarity of target and potential starting material to be assessed, but also the
similarity of portions (substructures) of the target and all or part of a starting material.
Literature Cited
Gordon D. Renkes
Chemistry Department, Ohio Northern University, Ada, OH 45810
0097-6156/86/0306-0176$06.00/0
© 1986 American Chemical Society
Lisp as a Language for Implementation
Given that such programs would be useful, we must next decide which
language would be most appropriate for implementation. At least
three reasons justify the symbolic language Lisp.
First, Lisp is designed to be used interactively at a
computer terminal. This would be very convenient for the
investigator in the midst of thinking about a particular problem.
Suppose a question arises which requires the use of group theory
tables. Rather than digging through appendices or searching in the
library, the computer programs would be employed to supply results
on the spot, even if no one has ever done it before.
(1 2 3) and ( (1 2 3) (4 5 6) (7 8 9 ) )
(1 2) * (12 3)= (2 1 3)
(1 2 3) * (1 2 3) =(3 2 1)
Implementation Discussion
(2 3 1)
(23145)
(21354)
((1 2 3))
#:GRP-3
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch016
((1 3 2))
Operators by Classes
1 i s (NIL).
CLASS 1 2 3 4 5 6
A1A 1 1 1 1 1 1
A1B 1 1 1 -1 -1 -1
A2A 1 1 -1 1 1 -1
A2B 1 1 -1 -1 -1 1
EA 2 -1 0 2 -1 0
EB 2 -1 0 -2 1 0
the forward sense, from a subgroup to the product group, and in the
reverse sense, from product group to a subgroup. Table IV shows the
terminal display of both forward correlations.
Character correlation table.
SUBGROUP PRODUCT-GROUP
S3 S3-DP-S2
A1 A1B A1A
A2 A2B A2A
Ε EB EA
Character correlation table.
SUBGROUP PRODUCT-GROUP
S2 S3-DP-S2
A EA A2A A1A
Β EB A2B A1B
EA     2 -1 0  2 -1 0
EB     2 -1 0 -2  1 0
EAxEB  4  1 0 -4 -1 0
The decomposition is
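The product character and its decomposition can be sketched with the standard reduction formula, in which each irreducible character is weighted by class size and the sum is divided by the group order. The character values are those of the six-class table shown earlier in this chapter; the class sizes (1, 2, 3, 1, 2, 3) are an assumption, chosen because they are the sizes expected for a direct product of S3 with S2, and the function names are hypothetical. This is a Python illustration, not the author's Lisp code.

```python
# Characters of the six irreducible representations, by class
CHARACTERS = {
    "A1A": [1, 1, 1, 1, 1, 1],
    "A1B": [1, 1, 1, -1, -1, -1],
    "A2A": [1, 1, -1, 1, 1, -1],
    "A2B": [1, 1, -1, -1, -1, 1],
    "EA": [2, -1, 0, 2, -1, 0],
    "EB": [2, -1, 0, -2, 1, 0],
}
CLASS_SIZES = [1, 2, 3, 1, 2, 3]  # assumed class sizes for S3 x S2
ORDER = sum(CLASS_SIZES)          # group order, 12 under that assumption


def product_character(chi_1, chi_2):
    """Character of a direct product representation: classwise product."""
    return [x * y for x, y in zip(chi_1, chi_2)]


def decompose(chi):
    """Multiplicity of each irreducible representation in a (real) character."""
    result = {}
    for name, irrep in CHARACTERS.items():
        total = sum(size * x * y for size, x, y in zip(CLASS_SIZES, chi, irrep))
        if total % ORDER != 0:
            raise ValueError("character does not reduce to integer multiplicities")
        if total:
            result[name] = total // ORDER
    return result


ea_x_eb = product_character(CHARACTERS["EA"], CHARACTERS["EB"])
pieces = decompose(ea_x_eb)
```

Under the stated class sizes, the product character matches the EAxEB row above and reduces to one copy each of A1B, A2B, and EB, whose dimensions sum to 4.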
ORDER 2
OPERAND-LIST (4 5)
GROUP-LIST (#:GRP-10 #:GRP-11)
SUBGROUPS NIL
SUPERGROUPS (S3-DP-S2)
EA1      2 -1 0 2 2 -1 -1 0 0 0 0
EA1      2 -1 0 2 2 -1 -1 0 0 0 0
EA1xEA1  4  1 0 4 4  1  1 0 0 0 0
The decomposition is
ET2 6 -3 0 -2 0 2 -2 1 0 -1 1 0 0 0 0
ET2 6 -3 0 -2 0 2 -2 1 0 -1 1 0 0 0 0
ET2xET2 36 9 0 4 0 4 4 1 0 1 1 0 0 0 0
The decomposition is
Future Plans
Literature Cited
A Multivalued Logic Predicate Calculus Approach
to Synthesis Planning
- Heuristics
- Macro Operators
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch017
- Abstraction
- Planning
A => Β
Β
:. Β
A VA
Α ν Β 3 χ Β
Α Λ Β A A ( B V C )
Α => Β ( ( Α => Β ) => C )
Α ν ( Β => A )
Bimodal Logic
Socrates is Mortal.
Mortal(socrates) maps to 1
v(Mortal(socrates)) = 1
Definition 7: v(~A) = 1 - v(A)
Atom5 is on a ring
Atom6 might be on a ring
v(On-ring(atom5)) = 2
v(On-ring(atom6)) = 1 (12)
Definition 10: TV(A ∧ B) = CV(A ∧ B) - DV(A ∧ B)
Definition 13: TV(A ∨ B) = CV(A ∨ B) - DV(A ∨ B)
Definition 14: CV(~A) = DV(A)
Definition 15: DV(~A) = CV(A)
Definition 16: TV(~A) = CV(~A) - DV(~A)
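The negation definitions can be made concrete with a short sketch. It assumes only what Definitions 14 to 16 state: negation swaps CV and DV, and TV is always CV minus DV. The class name `Evidence` and the numeric values are illustrative and do not come from QED.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """CV and DV for one formula A (hypothetical container, not QED's)."""
    cv: float  # CV(A)
    dv: float  # DV(A)

    @property
    def tv(self) -> float:
        # Definitions 10, 13 and 16 all share the form TV = CV - DV.
        return self.cv - self.dv

    def negate(self) -> "Evidence":
        # Definition 14: CV(~A) = DV(A); Definition 15: DV(~A) = CV(A).
        return Evidence(cv=self.dv, dv=self.cv)


a = Evidence(cv=0.75, dv=0.25)
print(a.tv)           # TV(A)
print(a.negate().tv)  # TV(~A), which equals -TV(A) under these definitions
```

One consequence visible here is that double negation returns the original pair of values, so no evidence is lost by negating a formula.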
While these formulae are more complex than those of the LT, they
give the same results as do those of LT logic. Two features of the
IMVL are however quite different from the LT: 1) the rules for
incrementally acquiring evidence; and 2) the rules for computing the
value of a consequent of an implication when the antecedent is not
fully True.
1
Definition 26: DV (B, when A => ~ B) = TV (A) * TV(A => ~B)/m
QED was implemented on the SUMEX-AIM DEC 2060 TOPS-20 system. The
source code consists of about 18000 lines of FORTRAN code and 1500
lines of macro code. A block diagram of the program modules is shown
in Figure 3. QED itself contains no chemical information. The
chemical knowledge is stored as postulates in a formal first order
predicate calculus language. The grammar for this language is also
explicitly described in BNF notation. The PARSER interprets the
postulates and the interaction with the user, both for entering
questions and for entering new rules interactively. The QED EXEC
handles opening of files, entry of a molecule, and debugging aids.
The AGENDA EXEC creates, prioritizes, selects, performs, and deletes
tasks. The INFER EXEC selects rules, examines the data base,
instantiates predicates and interprets the logic. All information,
including postulates, rules, dictionary, instantiations, tasks, etc.,
is stored in an associative relational data base. The ANSWER
EXTRACTER and FORMATTER communicates the answer to a question in a
form that both the chemist and SECS can understand. The design of the
system is very much like the Japanese 5th Generation Computer System
design, which is also based on logic.
[Figure 3 diagram: the USER interacts with QED, whose modules are the
PARSER, the QED EXECUTIVE, the AGENDA EXECUTIVE, the INFER EXECUTIVE,
and the ANSWER EXTRACTION and TRANSLATOR module. Below them is the
ASSOCIATIVE MEMORY, holding the DICTIONARY, POSTULATES, AGENDA ITEMS,
and INFERENCES.]
Figure 3. Block diagram of the QED system.
dictionary
Rule ALPHA-TO-SC
$All Atom(x) $All Atom(y)
[IF Stereocenter(x) .AND. Alpha(x,y)
THEN Alpha-anisotropic(x)] CF 0.7
<rule>
  <quants>        $All Atom x
  <quants>        $All Atom y
  <implication>
    <antecedent>  <conjunction>: Stereocenter x .AND. Alpha x y
    <impsymbol>   THEN
    <consequent>  Alpha-anisotropic x
  <CF value>      CF 0.7
1 isa rule
1 Rule-id "Alpha-to-SC"
1 son index # 2
2 isa quant-description
2 quantifier $All
2 variable "x"
2 parent index # 1
2 son index # 3
3 isa quant-description
3 quantifier $All
3 variable "y"
3 parent index # 2
3 son index # 4
4 isa inference
4 antecedent-son index # 5
4 consequent-son index # 6
4 CF value 0.7
4 parent index # 3
5 isa conjunction
5 formula-son index # 7
5 formula-son index # 8
5 parent index # 4
7 isa atomic-formula
7 Predicate "Stereocenter"
7 variable-1 "x"
7 parent index # 5
8 isa atomic-formula
8 Predicate "Alpha"
8 variable-1 "x"
8 variable-2 "y"
8 parent index # 5
6 isa atomic-formula
6 Predicate "alpha-anisotropic"
6 variable-1 "x"
6 parent index # 4
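To show how such a flat listing still carries the whole rule, the sketch below stores the entries as (node, attribute, value) triples and walks the son links to rebuild the rule text. It is a simplified illustration, not QED's FORTRAN data base: the parent indices are dropped, `variable-1` and `variable-2` are collapsed into a repeated `variable` attribute, and the names `build_index` and `render` are hypothetical.

```python
from collections import defaultdict

# (node, attribute, value) triples transcribed from the listing above.
TRIPLES = [
    (1, "isa", "rule"), (1, "rule-id", "Alpha-to-SC"), (1, "son", 2),
    (2, "isa", "quant-description"), (2, "quantifier", "$All"),
    (2, "variable", "x"), (2, "son", 3),
    (3, "isa", "quant-description"), (3, "quantifier", "$All"),
    (3, "variable", "y"), (3, "son", 4),
    (4, "isa", "inference"), (4, "antecedent-son", 5),
    (4, "consequent-son", 6), (4, "cf", 0.7),
    (5, "isa", "conjunction"), (5, "formula-son", 7), (5, "formula-son", 8),
    (6, "isa", "atomic-formula"), (6, "predicate", "alpha-anisotropic"),
    (6, "variable", "x"),
    (7, "isa", "atomic-formula"), (7, "predicate", "Stereocenter"),
    (7, "variable", "x"),
    (8, "isa", "atomic-formula"), (8, "predicate", "Alpha"),
    (8, "variable", "x"), (8, "variable", "y"),
]


def build_index(triples):
    """Group triples as node -> attribute -> list of values."""
    index = defaultdict(lambda: defaultdict(list))
    for node, attr, value in triples:
        index[node][attr].append(value)
    return index


def render(index, node):
    """Rebuild the rule text by following son links from `node`."""
    kind = index[node]["isa"][0]
    if kind == "rule":
        return render(index, index[node]["son"][0])
    if kind == "quant-description":
        head = f'{index[node]["quantifier"][0]} {index[node]["variable"][0]}'
        return head + " " + render(index, index[node]["son"][0])
    if kind == "inference":
        ante = render(index, index[node]["antecedent-son"][0])
        cons = render(index, index[node]["consequent-son"][0])
        return f'[IF {ante} THEN {cons}] CF {index[node]["cf"][0]}'
    if kind == "conjunction":
        sons = index[node]["formula-son"]
        return " .AND. ".join(render(index, s) for s in sons)
    if kind == "atomic-formula":
        args = ",".join(index[node]["variable"])
        return f'{index[node]["predicate"][0]}({args})'
    raise ValueError(f"unknown node type: {kind}")


print(render(build_index(TRIPLES), 1))
```

Because every fact is an independent triple, new rules can be added or inspected without changing any record layout, which is the property an associative store is usually chosen for.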
Agenda L i s t Control
Example Rules
Rule Suggest-control-sc
$All Atom(x)
[IF Stereocenter(x) THEN Control-sc(x)] CF 0.8 ;
Rule Connect-to-control
$All Atom(x) $All Atom(y)
[IF Control-sc(x) .and. Anisotropic(y)
THEN Connect(x,y)] CF 0.8 ;
Rule Connect-apps-for-control
$All Atom(z) $All Appendage(y) $All Ring(r)
[IF Root-of-appendage(z,y) .and.
Control-sc(z) .and. Atom-of-ring(z,r)
THEN Reconnect-app(y,r)] CF 0.8 ;
Example of Analysis
@QED
- QED -
Conclusion
Acknowledgment
Literature Cited
Craig S. Wilcox¹ and Robert A. Levinson²,³
¹Department of Chemistry, University of Texas at Austin, Austin, TX 78712
²Department of Computer Science, University of Texas at Austin, Austin, TX 78712
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch018
Reactions are invariably written this way, and obviously have a left
hand side and a right hand side. To the beginning organic student,
this format naturally suggests a "before and after" or "cause and
effect" perception of reactions. "If the starting material is
treated in this way, then the product will result." This perception
has influenced the design of some computer programs. Reactions have
been represented either as two related structures or as one structure
and a set of changes required to produce the other structure.
Note that bonds which are invariant with time are represented in
the usual way. The dotted lines represent bonds which change over
the time course of the reaction event. Each changed-bond is labeled
(c=:-c1-:»c:-c2=:-c3:-),c2-c4-o-c5-c3,c4»o,c5«o,c1-i. (b)
C-C01
 1  11(C-F11)   2,12
 2  6(C-C12)    1,3,12
 3  5(C-C21)    2,4
 4  2(C-C01)    5,13
 5  1(C-C11)    4,6,7,13
 6  9(C-O22)    5,7
 7  16(C-O11)   5,6,8
 8  16          7,9,10
 9  9           8,10
10  1           8,9,11,13
11  2           10,12,13
12  5           1,2,11
13  5           4,5,10,11
required.
The estimation of one type of validity is a task faced by
organic chemists every day. In the process of reviewing research
grants, experts must predict whether proposed reactions, hitherto
unknown, will succeed. To make this judgement, the expert relies in
part on precedent. Previously observed reactions similar to the
proposed reaction lend credence to the proposal. If many reactions
(very similar to the proposed reaction) are known to proceed in high
yield, the validity or likely yield of the new reaction is high. If
similar reactions are known to give low yields, then the proposed
transform is of low validity.
Before precedent can be used to estimate validity, the meaning
of "similar" (as it is applied to reactions) must be defined. It is
not surprising that problems of conceptualization and similarity
arise in the same project. Philosophers have long recognized the
complexity and interdependence of comparison and concept formation.
What makes one reaction a better precedent than another? Can
similarity be quantified and, if so, can the similarity of a reaction
and a proposed transform be quantified? The ways in which reactions
are similar or dissimilar and the prediction of yields based on
precedent are important questions which deserve further study.
At present, we calculate transform validities (estimated yields)
for a generalization or an unknown reaction as follows.

$$TV(r) = \frac{\sum_{i \in S(r)} CW(i)^{2}}{\sum_{i \in S(r)} CW(i)} \qquad (2)$$

CW(i) is the closeness (similarity) weighted validity of
transform i with respect to the new transform r. If the denominator
in Equation 2 is 0, TV(r) is 0. The constant a determines the
magnitude of the effect of closeness A(r)/A(i) on the calculated
transform validity.
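A small numerical sketch may help in reading Equation 2. It assumes that the equation is the ratio of the sum of squared CW(i) values to the sum of CW(i) values over the precedent set S(r), which is a reading of the damaged typesetting rather than a confirmed form. The CW values are taken as given, since Equation 1 is not reproduced here, and the function name is hypothetical.

```python
def transform_validity(cw_values):
    """TV(r) under the assumed reading of Equation 2.

    cw_values: closeness weighted validities CW(i) for i in S(r).
    Returns 0 when the denominator is 0, as the text specifies.
    """
    denominator = sum(cw_values)
    if denominator == 0:
        return 0.0
    return sum(cw * cw for cw in cw_values) / denominator


# Illustrative numbers only; they are not data from the chapter.
print(transform_validity([0.8, 0.2]))  # dominated by the closer precedent
print(transform_validity([0.5, 0.5]))  # equal precedents give their common value
print(transform_validity([]))          # no precedents: defined as 0
```

Under this reading each precedent is weighted by its own CW, so strong, close precedents count for more than a plain average would allow.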
Equations 1 and 2 were intended to produce the following
results. If there are a large number of reactive bonds in the
precedent not in the proposed transform, the closeness weighted
validity of the precedent is small. If there is the same number of
reactive bonds in both reactions, the closeness weighted validity of
the precedent is equal to its yield or known validity. This is an
which have the most reactive bonds in common with the target. The
result is that precursors are suggested with little sophistication.
In fairness, it should be emphasized that the data base was generated
from only about 230 reactions, and no generalizing concepts were
provided by the operators. We look forward to testing the system
when it has acquired more knowledge.
Conclusions
The system described in this paper stores and retrieves reactions and
structures, creates generalizations which further organize the
knowledge base, estimates the validity of these generalizations, and
uses both specific reactions and machine derived generalizations to
generate precursors. We have shown that the representation of
reactions as single labeled graphs is possible based on the idea of a
bond which changes during a reaction, and that this graph
representation simplifies the machine driven act of induction.
Concepts are generated automatically and these concepts organize the
data base, aid in retrieval, and support the precursor-generation
capabilities of the system. A method for calculating the validity
of a given generalization has been devised and methods of refining
these calculations have been identified.
This study examined some unexplored aspects of conceptualization
in organic chemistry. How are classificatory concepts created? Can
the value of a generalization be quantified? Although here these
questions are presented in relation to organic chemistry, they are in
fact basic questions of epistemology and go beyond organic
chemistry.(9)
This program makes generalizations about real-world reactions
and uses these generalizations to generate precursors. Mitchell's
approach to conceptualization requires an "instance language" to
represent observations, a "generalization language" to create
concepts, and a "matching predicate" to associate observations with
generalizations.(12,23) Our approach to generalization in organic
chemistry relies on a bond-centered labeled graph representation of
reactions and structures (observations). In this language
"more-general-than" is defined as equivalent to "subgraph-of". We
take advantage of the fact that in organic chemistry the instance
language and the generalization language are identical, and matching
predicates are based on graph comparisons.
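The equivalence of "more-general-than" and "subgraph-of" can be sketched with a brute-force matcher. The example is deliberately simplified and is not the system's algorithm: it uses atom-centered graphs rather than the bond-centered graphs of the paper, it tries every node assignment (practical only for very small graphs), and all names and the two example structures are illustrative.

```python
from itertools import permutations


def is_subgraph(small, big):
    """True if labeled graph `small` can be embedded in `big`.

    A graph is a pair (labels, edges): `labels` maps node -> atom label,
    `edges` maps frozenset({u, v}) -> bond label.
    """
    s_labels, s_edges = small
    b_labels, b_edges = big
    s_nodes = list(s_labels)
    # Try every injective assignment of small nodes to big nodes.
    for image in permutations(b_labels, len(s_nodes)):
        mapping = dict(zip(s_nodes, image))
        if any(s_labels[n] != b_labels[mapping[n]] for n in s_nodes):
            continue
        if all(
            b_edges.get(frozenset(mapping[n] for n in edge)) == bond
            for edge, bond in s_edges.items()
        ):
            return True
    return False


def more_general_than(concept, instance):
    """The paper's definition: a concept is more general than any
    observation that contains it as a subgraph."""
    return is_subgraph(concept, instance)


# Illustrative graphs: a carbonyl fragment and an acetone-like skeleton.
carbonyl = ({1: "c", 2: "o"}, {frozenset({1, 2}): "="})
acetone = (
    {1: "c", 2: "c", 3: "o", 4: "c"},
    {frozenset({1, 2}): "-", frozenset({2, 3}): "=", frozenset({2, 4}): "-"},
)
print(more_general_than(carbonyl, acetone))
print(more_general_than(acetone, carbonyl))
```

Since concepts and observations share one representation, the same matcher serves both as the generalization test and as the matching predicate.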
Supergraphs: (196 282 296 432 436 484 509 510 515 526 668 669 670 677 678 682 683 684 766
815 816 817 819 828 829 830 831 987 989 991 1164 1183 1192 1193 1194 1225 1226 1227)
2
Initiating a query...
Please enter the list of classes.   {this time we are interested in only reactions.}
(r)
Supergraphs: (667 681 826 1224)   {four known reactions are supergraphs of the query.}
Close matches: ((508 7) (814 7) (676 7) (136 4) (1063 3) (105 3) (1057 2) (359 2))
   {concept 508, for example, has a 7 bond subgraph in common with the query.}
Number of concepts searched: 21
Figure 5. Reaction retrieval.
The precursors are on list 'pre'.   {the user now views the first three
1 441 77 14
2 11 76 11
3 425 36 13
4 300 20 13
The precursors are on list 'pre'.   {the user now views three of the
Figure 7. The capacity to generalize from specific facts is
revealed by the system's ability to provide these precursors.
Acknowledgments
APPENDIX
S:-U
While there is an unmarked element y in the
database such that each member of IP(y) is marked
T and y has fewer nodes than G:
    If y < G (graph comparison needed)
    Then mark y as T
         S := (S - IP(y)) U {y}
    Else mark y as F.
are found.
Phase 1 and Phase 2 answer parts 1-4 of the query as follows:
- ; single bond
= ; double bond
* ; t r i p l e bond
+ ; delocalized double bond
Other than these symbols, the chemist needs to remember only two
rules: (i) rings are encoded in parentheses wherein the last atom is
followed by a bond which connects it to the first atom in the
parenthetical expression, and (ii) atoms at branching points must be
numbered. Linear or cyclic strings are separated by commas.
Hydrogens are ordinarily ignored. Thus cyclopentane is encoded as
(c-c-c-c-c-) and sec-butanol as c-c1-c-c,c1-o. A menu is available
which contains commonly used structures which can be used in an
abbreviated form to define molecules. The t-butyldimethylsilyl ether
derived from n-propanol can be represented as *tbs*-o-c-c-c. Further
examples of representations based on this system are shown in Figures
4-7.
The chemist can encode a structure in many ways and, provided
the representation follows the above rules, each alphanumeric string
will generate a proper connectivity file. For example,
"(c-c-c-c1-c-c-c-c2-),c1-c2" or "(c1-c2-c-c-c-),c1-c-c-c-c2" are both
proper representations of 3.3.0-bicyclooctane. IUPAC numbering can
be followed or the numbering can be arbitrary.
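The two encoding rules can be exercised with a short parser sketch that turns a line-notation string into an atom list and a bond list. This is an illustration under stated assumptions, not the authors' program: a numbered atom such as `c1` is taken to denote the same atom wherever it reappears, unnumbered atoms are always new, menu abbreviations such as `*tbs*` and the reaction bond labels are not handled, and the function name `parse` is hypothetical.

```python
import re

# An atom is lowercase letters plus an optional number; a bond is one of
# the four symbols listed above (- = * +).
TOKEN = re.compile(r"([a-z]+)(\d*)|([-=*+])")


def parse(notation):
    """Return (atoms, bonds) for a line-notation string.

    atoms: list of element symbols, indexed by atom number
    bonds: list of (atom_index, atom_index, bond_symbol)
    """
    atoms = []
    numbered = {}  # label such as "c1" -> atom index
    bonds = []

    def atom(symbol, number):
        key = symbol + number
        if number and key in numbered:
            return numbered[key]  # a branching atom seen before
        atoms.append(symbol)
        if number:
            numbered[key] = len(atoms) - 1
        return len(atoms) - 1

    for fragment in notation.replace(" ", "").split(","):
        ring = fragment.startswith("(") and fragment.endswith(")")
        body = fragment[1:-1] if ring else fragment
        first = prev = pending = None
        for m in TOKEN.finditer(body):
            if m.group(3):  # a bond symbol: remember it for the next atom
                pending = m.group(3)
                continue
            cur = atom(m.group(1), m.group(2))
            if first is None:
                first = cur
            if prev is not None and pending is not None:
                bonds.append((prev, cur, pending))
            prev, pending = cur, None
        if ring and pending is not None:
            # Rule (i): the trailing bond closes the ring to the first atom.
            bonds.append((prev, first, pending))
    return atoms, bonds


print(parse("(c-c-c-c-c-)"))   # cyclopentane
print(parse("c-c1-c-c,c1-o"))  # sec-butanol
```

Parsing both bicyclooctane strings from the text gives eight atoms and nine bonds in each case, which is one way to see that different encodings lead to the same connectivity.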
Literature Cited
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch019
Expert systems of today are powerful when used in the proper domains.
Unfortunately, the most difficult part of applying these systems is the
structuring of knowledge into rule format. This paper describes methods
developed which allow the capture of Diels-Alder reaction knowledge into simple
and elegant expert system rule format. Essential components of the system
include: a grammar for matching the input molecular structure expressed
in Wiswesser Line Notation (WLN), the unification of many reactions into a
single generalized mechanism using synthon template patterns, use of WLN
rules to produce valid synthons, and use of frontier molecular orbital theory
(FMO) to verify the disconnection. This system is implemented in Prolog,
whose natural backtracking and generation capabilities easily express and
produce the many structural combinations possible.
There have been attempts to apply formal methods to the representation of organic
compounds [1],[2], some attempts to apply artificial intelligence to organic synthesis
[3],[4], and numerous attempts to apply molecular orbital calculations to
the verification of the validity of compounds in the synthesis route. This effort was a
modest attempt to examine the representation issues involved in writing production
rules for Diels-Alder disconnections.
The disconnection approach [5] is adopted in this work because it is amenable
to backward chaining systems. The starting point is the target compound, which is, in
this case, a Diels-Alder product. The target compound is broken or disconnected into
two distinct parts called synthons. The synthons are the ideal representations of the
actual reactants used to produce the target compound. Synthons embody the physical
properties of the actual compounds they represent.
As an initial implementation approach, rules could consist of specific targets and a
list of their synthons. No one uses this method because the naive approach of expressing
every possible chemical disconnection is impracticable: the number of rules needed to
express even trivial synthetic routes grows exponentially. Any expert system solution to
the synthesis problem must attack two fundamental problems: the variety of functional
groups which may participate in a given reaction and the symmetry involved between
functional groups in a reactant (intra-synthon and inter-synthon functional group
interaction, respectively). The thrust of this research has been to capture the reaction
routes for a chemical disconnection in a clear, symbolic notation which accommodates
qualitative reasoning with functional groups and which comprehends the symmetry of this
problem.
Ideally, an implementation language would support symbolic and linguistic
approaches to representation and manipulation, a qualitative approach to verification, and
a deductive approach to disconnection. Prolog [6] is a symbolic language which directly
supports backward chaining deduction. Viewed as a declarative language it naturally
supports elegant grammar formalisms and its procedural aspects support qualitative
reasoning. For these reasons, Prolog was chosen as the implementation language for
this project.
2.1 Background for WLN and DCG
grammars are carefully written to avoid exponential behavior. However, parsing
algorithms exist (e.g., the active chart parser [11]) where the worst case parsing time is
O(n^3) for any CFG grammar and O(n^2) when the grammar is unambiguous (n is the
sentence length). Nevertheless, Prolog provides an adequate DCG grammar parsing
mechanism for the purposes of this work.
This section examines grammars used to recognize parent molecules (carbocyclic rings
for example).
The following regular expression [12] recognizes cyclohexene:
where, if r is any regular expression, [r] is an abbreviation for (ε + r) (in other words,
r is optional), ε is the regular expression that denotes the empty string, and + is the
union operator for the languages represented by the regular expression arguments. The
symbol σ represents an arbitrary substituent, with the subscripts indicating to which
ring locant the substituent belongs.
Using DCG, the more general class of carbocyclic rings can be recognized. The
grammar rule
carbocyclic(Substituents, Number) --> "L", number(Number), "U", "T", "J",
    substituents(Substituents, Number).
achieves the desired result. Within this rule the logical variables are denoted by a
leading capital letter. This declaratively states that carbocyclic rewrites into the letter
"L", followed by a number (which in turn is recognized by DCG grammar rules), followed
by the letters "UTJ", followed by the substituents. The substituents rule recognizes
the Substituents at each ring locant and uses the instantiated value for Number to
verify that the ring locant values are within the proper range. Subsequent steps in the
disconnection process utilize the variables mentioned in the head of the rule.
Finally, using the grammar rule described above (and related rules not presented),
the goal
rewrites the string "L6UTJ A1 BNW F3" into the empty list [] (meaning that the entire
string is recognized) and produces the result
S = [[A,1],[B,N,W],[F,3]], N = 6.
S is a list of ring locants and the corresponding substituents used in subsequent
disconnection stages. N represents the number of ring locants.
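For readers who do not use Prolog, the behaviour of the carbocyclic rule can be approximated with a regular expression. The sketch below is not the DCG from this work: it assumes substituents are separated by single spaces, that each begins with its locant letter, and that every WLN symbol is a single character, which real WLN does not guarantee. The function name mirrors the grammar rule but is otherwise hypothetical.

```python
import re

# "L", the ring size, "UTJ", then zero or more space-separated substituents.
RING = re.compile(r"^L(\d+)UTJ((?: [A-Z][A-Z0-9&]*)*)$")


def carbocyclic(wln):
    """Return (substituents, number) for a recognized string, else None."""
    m = RING.match(wln)
    if m is None:
        return None
    number = int(m.group(1))
    # Split each substituent into its locant letter and following symbols.
    substituents = [list(token) for token in m.group(2).split()]
    last_locant = chr(ord("A") + number - 1)
    for sub in substituents:
        # The locant must lie within the ring, as the substituents rule checks.
        if not ("A" <= sub[0] <= last_locant):
            return None
    return substituents, number


print(carbocyclic("L6UTJ A1 BNW F3"))
print(carbocyclic("L6UTJ H1"))  # locant H is outside a six-membered ring
```

The Prolog version has the advantage that the same rule can also generate strings, whereas this sketch only recognizes them.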
2.3 Application to Other Reactions
The general grammars and the mixture of declarative and procedural Prolog code allow
easy grammar rule writing for other reactions. As an additional example, consider
heterocyclic rings. The grammar rule
Curly braces allow direct inclusion of Prolog terms within DCG grammars (the terms are
not translated). In this case, the member predicate tests the value of the Heteroatom
variable for membership in a list of heteroatoms.
the rule of maximum orbital overlap, then it is a suprafacial, suprafacial process and is
termed a [π4s + π2s] reaction. By the Woodward-Hoffmann rules this is a symmetry-
It is known from molecular orbital theory that molecules possess sets of individual
molecular orbitals (as long as the molecules are sufficiently far apart from each other).
These are the basic unperturbed molecular orbitals used in the evaluation of the reaction.
As the molecules move more closely together, their orbitals begin to overlap. This
interaction between the orbitals on the different molecules results in the mixing of the
orbitals on each molecule [13].
According to frontier molecular orbital theory, the strongest interactions are
between those orbitals that have coefficients with similar magnitudes relative to the
unperturbed molecules, i.e. the interaction is between the small coefficient on the dienophile
and the small coefficient on the diene [16], [17].
If both of the molecular orbitals involved in the bonding are filled, the resulting
orbital is not significantly reduced in energy [18]. The greatest reduction in energy
arises in the interaction between a filled molecular orbital and an empty one. Since
the interaction is strongest between orbitals of like energy, the ideal combination
of orbitals is between the highest occupied molecular orbital (HOMO) on one molecule
and the lowest unoccupied molecular orbital (LUMO) on the other.
Although Diels-Alder reactions can occur in the unsubstituted case, the reaction
is most successful when the diene and the dienophile contain substituents which exert
a favorable electronic influence [19]. In the normal electron demand case, the most
favorable interactions are between dienes with electron-donating groups and dienophiles
with electron-withdrawing groups. Cases have been reported in which inverse electron
demand occurs and the electronic nature of the diene and dienophile are reversed [20],
[21], [22]. This case of inverse electron demand is accounted for in the system.
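The choice between normal and inverse electron demand can be illustrated by comparing the two possible HOMO-LUMO gaps. The sketch below simply picks the smaller gap, which follows from the statement that interaction is strongest between orbitals of like energy. How the system itself makes this decision is not described here, so the function and the energies used are illustrative assumptions only.

```python
def electron_demand(diene, dienophile):
    """Pick the dominant frontier interaction.

    Each argument is a (homo, lumo) pair of orbital energies in eV.
    Returns the demand type and the corresponding energy gap.
    """
    diene_homo, diene_lumo = diene
    phile_homo, phile_lumo = dienophile
    normal_gap = phile_lumo - diene_homo   # HOMO(diene) with LUMO(dienophile)
    inverse_gap = diene_lumo - phile_homo  # HOMO(dienophile) with LUMO(diene)
    if normal_gap <= inverse_gap:
        return "normal", normal_gap
    return "inverse", inverse_gap


# Hypothetical energies, chosen only to show both outcomes.
print(electron_demand(diene=(-9.0, 1.0), dienophile=(-11.0, 0.0)))
print(electron_demand(diene=(-11.0, 0.0), dienophile=(-9.0, 1.0)))
```

Donor groups on the diene raise its HOMO and acceptor groups on the dienophile lower its LUMO, so both kinds of substituent shrink the normal-demand gap.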
3.2 Structural Constraints on Reactants
It became necessary early in the project to develop a method for quickly checking the
reactants for structural features which would make them unsuitable for the Diels-Alder
reaction. The constraints are integrated into the notation package, since they are most
easily recognized in terms of the notation patterns resulting from the disconnection. The
synthons produced by a Diels-Alder disconnection are checked for proper configuration.
All synthons are checked before the FMO algorithm begins; a synthon that fails a check
causes program execution to fail and return a "no" to indicate no reaction. This assures
that synthons produced by the rules are actually reactive.
The following structural features of diene-synthons are considered unreactive in
[4 + 2] cycloadditions:
4. Acyclic compounds that have bulky substituents at the central positions on the
diene-synthon. The substituents at these positions are relatively close to each
other, and bulk leads to steric hindrance.
All double bonds are perceived as possible dienophile synthons by the notation
package. The screening involves only the elimination of all double bonds in aromatic
compounds (WLN symbol "R").
From work performed in 1983 by Burnier and Jorgensen [15], the following ab initio
calculations for the HOMO and LUMO energies of the synthons were developed. The
function n(x, parent) returns the number of atoms of type x in the parent. This
function is abbreviated below as simply n(x) where the parent is understood. The
symbols UU, O, N, S represent triple bonds, oxygen, nitrogen, and sulfur, respectively.
The subscripts 'c' and 't' denote central and terminal locations respectively in the
parent for the elements which they modify. For brevity, the terms diene-synthon and
dienophile-synthon will be replaced with diene and dienophile respectively.
For Dienes:
In the carbocyclic ring case, the HOMO-LUMO values default to the constants at
the end of the equations. The formulas above are used to compute the orbital energies
(both HOMO and LUMO) of the unsubstituted parent compounds. In the case of
substituted compounds, additional formulas account for the electronic effects of the
substituents.
The explanation of the regiospecificity of Diels-Alder reactions requires knowledge
of the effect of substituents on the coefficients of the HOMO and LUMO orbitals. In
the case of normal electron demand, the important orbitals are the HOMO on the
diene and the LUMO on the dienophile. It has been shown that the reaction occurs
in a way which bonds together the terminal atoms with the coefficients of greatest
magnitude and those with the coefficients of smaller magnitude [18]. The additions
are almost exclusively cis and, with only a few exceptions, the relative configurations of
substituents in the components are retained in the products [19].
It is known that the effects of substituent groups on a diene or dienophile vary
between different types of parents [23]. A function, τ(Y), has been determined for several
functional groups, with Y corresponding to their electron donating or withdrawing
capability such that a reasonable estimate of the HOMO energy could be obtained by
use of the equation [15]:
To determine substituent effects, substituent groups are built from primary recognized
atoms and functional groups. A functional group is scanned one Wiswesser symbol
at a time. A Wiswesser symbol can represent either an individual atom (e.g., "G"
for chlorine) or a functional group (e.g., "Z" for the amino group). This allows us to
adapt the "layer" method of Jorgensen to the scanning of the functional groups on
the rings. These groups are provided as Prolog sublists as outlined in the previous
section. Once the functional group elements have been compared with the known
values, τ is calculated by the following method. The formula for the
numeric calculation is:
+ 2 w / ( l + NFG) (?)
          name              WLN      τ
tau_entry(p-methoxyaryl,    "R DO1", 51).
tau_entry(trimethylamino,   "N1&1",  44).
tau_entry(aryl,             "R",     42).
tau_entry('methyl sulfate', "S1",    38).
tau_entry(amino,            "Z",     36).
tau_entry(olefinic,         "1U2",   36).
tau_entry(sulfate,          "SH",    32).
direction.
The above is based on the calculation of a collective τ for the whole molecule. This
value changes the HOMO of either the diene or dienophile, as is necessary. This equation
is accurate to about 0.5 eV on either side of the "known" values [15]. The collective τ
value is inserted into the HOMO-LUMO calculation as the parameter τ(Y). Note that in its
pure form, this equation only yields values for the HOMO orbitals. Corrections are used
for the calculation of the LUMO values. Table 1 contains examples of the Wiswesser
Line Notation and the raw τ values used in the computation of orbital energies.
3.5 Determination of Permutated LUMO Coefficients
The following rules were used for the determination of the LUMO orbital coefficients
from the values determined for the HOMO coefficients [15].
1. An electron donating functional group raises the energy of the HOMO orbital of
a system about twice as much as it raises the LUMO.
3. Groups which add conjugation such as olefinic, acetylenic and aromatic groups
lower the LUMO orbital energy one third to one half as much as the HOMO
energy.
The same equations are used to determine both the HOMO and LUMO values.
This is consistent with the fact that the HOMO and LUMO orbitals are calculated from
the same parent system, and that the difference between the orbital energies can be
adequately covered by the two parameters 7(P) which represents the sensitivity of the
parent to substitution and τ(Υ) which represents the electronic effect exerted by the
functional group acting as a substituent.
To implement the rules mentioned above, only the τ(Y) values for the functional
groups are changed. Thus, the τ(Y) values for the calculation of the LUMO orbitals on
both the diene and dienophile are changed following these rules:
1. Positive τ values except those for conjugated hydrocarbons are divided by a factor
of 2.
3. τ values for conjugated hydrocarbons are divided by a factor of 3 and their signs
are reversed.
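The two rules that survive in this listing can be written as a small function. The sketch covers only rules 1 and 3. Rule 2 is not available here, so the function refuses to guess for values that fall outside those two rules. The function name and the `conjugated_hydrocarbon` flag are illustrative, and the inputs in the usage lines are the raw τ values for amino, aryl and olefinic groups from Table 1.

```python
def lumo_tau(tau, conjugated_hydrocarbon=False):
    """Convert a HOMO tau value into the value used for the LUMO.

    Implements rules 1 and 3 only; rule 2 is not reproduced in the text,
    so any other case raises instead of inventing a factor.
    """
    if conjugated_hydrocarbon:
        # Rule 3: divide by 3 and reverse the sign.
        return -tau / 3
    if tau > 0:
        # Rule 1: positive values (non-conjugated) are divided by 2.
        return tau / 2
    raise NotImplementedError("rule 2 is not given in this text")


print(lumo_tau(36))                               # amino
print(lumo_tau(42, conjugated_hydrocarbon=True))  # aryl
print(lumo_tau(36, conjugated_hydrocarbon=True))  # olefinic
```

Keeping the adjustment in one place means the same τ table serves both the HOMO and the LUMO calculation, as the following paragraph points out.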
This method covers many combinations of functional groups that influence the
orbital energies. A feature of this method is that it uses the same functional group
τ values as in the HOMO energy calculation. The algorithm described above is used
for the calculation of both the HOMO and LUMO atomic coefficients. The τ values of
the substituents are permutated to give the proper values for the LUMO orbitals. The
following steps are required:
1. τ values on terminal positions are taken from the list previously described.
2. Resultant τ values on the central diene positions are divided by a factor of two
to accommodate the fact that the orbital coefficients at these positions are very
small.
       Parent                  Synthons
discon("L6UTJ A1 B1",          ["1UY1&Y1&U1", "1U1"]).
discon("L6UTJ D1Q",            ["1U2U1", "Q2U1"]).
discon("L6UTJ A1 B1 DOV1",     ["1UY1&Y1&U1", "1VO1U1"]).
discon("L6UTJ A1 B1 D1 ENW",   ["1UY1&Y1&U1", "WN1U2"]).
4.1 Motivation: the Naive Approach
In the naive approach, disconnections are simply listed as facts with the molecule to
disconnect as the first argument and a list of the synthons as the second. Table 2
contains some examples. This approach suffers in many ways; primarily, the number
of rules would become unmanageable (very large even for cyclohexene), slowing the
inferencing speed of the expert system.
A sample inference mechanism using these facts (given the natural backward
chaining of Prolog) might be
% Disconnect a parent, then recursively disconnect its synthons.
disconnect(Parent, Given_Synthons) :-
    discon(Parent, Synthons),
    disconnect(Synthons, Given_Synthons).
% A parent that is an available (given) compound needs no disconnection.
disconnect(Parent, [Parent]) :-
    given(Parent).
% Disconnect each member of a list of synthons.
disconnect([First|Rest], [First_Disc|Rest_Disc]) :-
    disconnect(First, First_Disc),
    disconnect(Rest, Rest_Disc).
disconnect([], []).
This procedure recursively disconnects synthons until the final synthons for the
original parent are all available (or given) compounds. Upon successful completion, the
variable Given_Synthons contains a tree (in list notation) which denotes the synthon
combination order to reproduce the parent compound.
4.2 Derivation of the General Form
Consider the domain of a six-membered ring with single unsaturation. Table 3 expresses
the synthetic route with one substituent. Again, the symbol σ represents an arbitrary
substituent. Square brackets surrounding a set of symbols indicate optionality of those
symbols (as in regular expression notation). For example, the string σ[&] may reduce
to the string σ or σ& depending on whether the substituent represented by σ ends
in a terminal symbol or not (following the rules of WLN).
Symmetry in the patterns, however, hides many details in the diene and dienophile
patterns. Table 4, with combinations of symmetric substituents, reveals more of the
details. The order of the symmetric substituents may be chosen arbitrarily. Alphabetical
ordering was chosen here for consistency.
Finally, for a full cyclohexene molecule, the patterns become
It should be clear that this notation applies to many different classes of reactions.
Use and manipulation of this general form will be discussed in the next section. The
following discussion outlines its use in expert system rules.
4.3 Use of the Mechanism in Rule Formation
Given the general form, it is possible to capture many disconnections of a given class
with a single rule. The following example illustrates the approach advocated in this
paper for cyclohexene.
adjacent numbers are summed (for a longer carbon chain), a three way branch reduces
to a carbon when one of the branches is empty, optional ampersands are eliminated, and
required ampersands are retained. The rules must be applied to the string repeatedly
until no changes to the string occur.
N[] --> N.
[]N --> N.
σN --> {number(σ), NN is σ + N}, NN.
Nσ --> {number(σ), NN is N + σ}, NN.
N1 N2 --> {NN is N1 + N2}, NN.
Y[]& --> 1.
Yσ& --> {not(number(σ)), ends_in_terminal(σ)}, Yσ.
Yσ& --> {not(number(σ)), not(ends_in_terminal(σ))}, Yσ&.
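The requirement that the rules be applied repeatedly until the string stops changing can be sketched in Python. This is only an illustration under simplifying assumptions, not the Prolog implementation described in this paper: the pseudo-WLN string is assumed to be pre-split into a token list, the names rewrite_once and normalize are hypothetical, and only three of the rules (summing adjacent numbers, dropping an empty substituent next to a number, and reducing Y[]& to a carbon) are modelled.

```python
def rewrite_once(tokens):
    """Apply the first applicable rule once; return (new_tokens, changed)."""
    for i in range(len(tokens)):
        # Y[]& --> 1 : a three-way branch with an empty branch is a carbon.
        if tokens[i:i + 3] == ["Y", "[]", "&"]:
            return tokens[:i] + [1] + tokens[i + 3:], True
        if i + 1 >= len(tokens):
            continue
        a, b = tokens[i], tokens[i + 1]
        # N1 N2 --> N1 + N2 : adjacent numbers are summed (longer chain).
        if isinstance(a, int) and isinstance(b, int):
            return tokens[:i] + [a + b] + tokens[i + 2:], True
        # []N --> N : an empty substituent before a number disappears.
        if a == "[]" and isinstance(b, int):
            return tokens[:i] + tokens[i + 1:], True
        # N[] --> N : an empty substituent after a number disappears.
        if isinstance(a, int) and b == "[]":
            return tokens[:i + 1] + tokens[i + 2:], True
    return tokens, False


def normalize(tokens):
    """Rewrite until no rule applies (a fixed point is reached)."""
    changed = True
    while changed:
        tokens, changed = rewrite_once(tokens)
    return tokens


diene = normalize([1, "[]", "U", 2, "U", "[]", 1])
chain = normalize([1, "Y", "[]", "&", 1])
```

Each pass removes at least one token, so the loop must terminate; the result is the same list the rules would leave once "no changes to the string occur".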
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch019
[] ... up = []) yields the diene "1U2U1" and the dienophile "1U1". Once the synthons
are in pseudo-WLN form, they are rearranged to conform to the standard WLN form
(described in Section 5).
4.4 Application to Other Reactions
General forms are easily developed for other reactions. The machinery introduced in
this section can then be utilized to write disconnection rules for other reactions. For
example, consider the Diels-Alder adduct bicyclo[2.2.1]hept-2-ene. Using the regular
expression notation described previously, the line notation for these types of compounds
can be represented as
L55 CU ATJ [Aσ_A] [Bσ_B] [Cσ_C] [Dσ_D] [Eσ_E] [Fσ_F] [Gσ_G] [-A&(F+G)] [-B&(F+G)]
The information following the hyphens describes the orientation of the substituents at
locants where stereoisomerism can occur. F and G are the locants where the
stereochemistry may occur.
This compound can be disconnected into a cyclopentadiene synthon and a dienophile
synthon similar to the one previously described. The general form for the
disconnection is then given in the notation by
L5 AHJ Aσ_A Bσ_B Cσ_C Dσ_D Eσ_E + σ_F 1U1 σ_G    (9)
Additional pseudo-WLN rewrite rules would eliminate ring locant symbols which are
followed by an empty substituent.
5 Notation Rearrangement
The previous section illustrated the formation of diene and dienophiles and noted that
the intermediate notation did not necessarily obey the WLN rules. This section
describes the transformation from pseudo-WLN form to legal WLN notation.
A predicate called wln_order occurs within the make_synthon predicate. This
predicate builds a graph from the pseudo-WLN (using WLN Rule 8(a)) and possibly
reorders the graph as described below. The following Prolog code describes this
manipulation:
wln_order(Pseudo_WLN, WLN) :-
    notation_graph(Pseudo_WLN),
    rule6(Chain), % uses graph in database
    rule7and8(Chain, WLN).
Next, Rule 7 orients branch choices along the primary chain chosen above. This
rule orders branches using the branches with the lowest branching factor and with the
fewest notation symbols. Ties are again broken by Rule 2. Rule 8 guides the reassembly
of the molecule in proper WLN form. It reintroduces ampersands and inserts hyphens
where necessary. All of this was easily implemented in Prolog, using DCG to parse the
pseudo-WLN form and the Prolog database to represent the graph.
Many additional rules are required for other reactions. Probably the entire
complement of WLN rules must be implemented for even moderately sophisticated chemistry.
It may be desirable at this point, however, to design a notation which encompasses
WLN's strong points, but is more computationally oriented.
6 Conclusions
Other systems have developed FMO reaction checks and used WLN for cataloging, but
this system has relied heavily on a symbolic approach to chemistry, including application
of grammar techniques to WLN strings. We feel that our system is very successful in
the domain to which it has been applied, eliminating hundreds of naive expert system rules.
We also feel that our techniques are applicable to many other reactions as well.
This paper has primarily stressed concepts rather than implementation details. A
prototype system based on these concepts has been implemented, with concentration in
the cyclohexene domain. The entire system, including grammars, the FMO verification,
and WLN manipulation required only 12 pages of Prolog code. Although execution
speed was never considered a factor at this stage, the system performs the disconnection
Acknowledgments
We wish to express our appreciation for the Texas Instruments IDEA program which
sponsored the majority of this research. This is a unique program within a large
company which provides excellent research opportunities. Texas Instruments' unsurpassed
computing facilities also deserve acknowledgment.
Literature Cited
Tunghwa Wang, Ilene Burnstein, Michael Corbett, Steven Ehrlich, Martha Evens,
Alice Gough, and Peter Johnson
Illinois Institute of Technology, Chicago, IL 60616
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch020
Using LMA (Logic Machine Architecture), a collection of
Pascal programs written by the theorem proving group at
Argonne National Laboratory (1-2), we have developed
SYNLMA (SYNthesis with LMA), an expert system for organic
synthesis that uses a resolution based theorem prover as
the reasoning component. The major advantages of SYNLMA
stem from the independence of the database and the
inferencing. First, the database can be modified or an
entirely different one used without reprogramming the
decision making unit of the system. This conversion
involves modifying a short program that translates a
database representation for molecules into a molecular
representation the theorem prover recognizes; SYNLMA is
not changed at all. Second, the scheme for representing
0097-6156/86/0306-0244$06.00/0
© 1986 American Chemical Society
a molecule can be changed without changing SYNLMA. Once
again SYNLMA remains the same, only the interface between
the database and SYNLMA will have to be altered. This
flexibility makes SYNLMA an attractive alternative to
other organic synthesis programs.
SYNLMA performs a retrosynthetic analysis using a
special purpose theorem prover built from LMA components.
The compound to be synthesized becomes a theorem to be
proved. The reaction rules and starting materials become
axioms. The choice of a knowledge representation has
been one of our greatest problems.
Data for the theorem prover has to be translated
into clauses, the only form the theorem prover
recognizes. A clause is the "OR" of one or more literals
where a literal is a predicate and its arguments. A
predicate is a property or relationship that is true or
false. Its arguments can encompass any number of
functions. A function returns true, false or some other
value. The statement "x + y > y + z" can be written as a
clause using the function "Sum" and the predicate
"GreaterThan." The resulting one-literal clause looks
like this:
GreaterThan(Sum(x,y),Sum(y,z))
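The clause vocabulary used here (functions nested inside predicates, literals with a sign, clauses as an OR of literals) can be made concrete with a small Python sketch. It is not LMA code: the class names Term and Literal and the helper clause_to_str are hypothetical, and the printed syntax simply follows the conventions described in this chapter (a vertical bar for OR, a leading minus for a negative literal, a terminating semicolon).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A constant, variable, function, or predicate with optional arguments."""
    name: str
    args: tuple = ()

    def __str__(self):
        if not self.args:
            return self.name
        return f"{self.name}({','.join(str(a) for a in self.args)})"


@dataclass(frozen=True)
class Literal:
    """A predicate together with its sign."""
    predicate: Term
    positive: bool = True

    def __str__(self):
        return ("" if self.positive else "-") + str(self.predicate)


def clause_to_str(literals):
    """A clause is the OR of its literals, terminated by a semicolon."""
    return "|".join(str(lit) for lit in literals) + ";"


x, y, z = Term("x"), Term("y"), Term("z")
greater = Term("GreaterThan", (Term("Sum", (x, y)), Term("Sum", (y, z))))
one_literal_clause = [Literal(greater)]
text = clause_to_str(one_literal_clause)
```

Here text holds the one-literal clause for "x + y > y + z"; adding further Literal objects to the list would produce a multi-literal clause joined by vertical bars.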
Molecular Representations
The representation of molecular structure in clause form
is crucial to this research as it is a major determinant
of the theorem prover's efficiency. The clause
representation affects the time it takes to retrieve
reaction rules and starting materials and the time
necessary to make comparisons between structures. The
importance of the relationship between efficiency and the
clause representation is illustrated by the difference in
the run times between proving our first clauses and
current ones. Our first representation scheme was a
simple one with one predicate for each atom except
hydrogen and one for each bond (Figure 1a). Using this
clause form, a molecule with ten atoms took several hours
to prove on an IBM mainframe. For SYNLMA to be a viable
system for organic synthesis the "proving time" has to be
reasonable and one key to this is the clause
representation. By using a single predicate to describe
each atom and its "bond environment," the proof of a
molecule has been reduced to a few seconds. We will
continue to experiment with the representation for
molecules, trying to find the right balance between the
number of clauses and their length. We currently
represent starting materials and compounds that we want
to synthesize (targets) by a clause list (Figure 1b). In
this scheme:
1. A molecule is represented by a list of clauses, where
   each clause corresponds to one atom and describes its
   environment (i.e., its bonds, charge, etc.).
2. The number of atoms in a molecule does not correspond
   to the number of clauses in the clause list. An atom
   generates a clause only if it is bonded to two or
   more atoms; otherwise the atom will be ignored as all
   its information will be contained in a clause
   generated by another atom.
3. Each clause consists of the predicate called
   Fragment, a Bond function (Brr1, B211, B1111, etc.)
   listing the types of bonds, such as aromatic,
   resonant, triple, double, single, for the atom being
   described and an Atom function for this central atom
   of reference and for each atom bonded to it. A clause
   is terminated with a semicolon.
4. The arguments for the Atom function are: the chemical
   symbol for the element, a number assigned by our
   numbering scheme, the charge on the atom (-1, 0, +1,
   +2 etc.), a stereochemistry flag and a ring flag
   indicating whether or not the atom is a member of a
   ring. Default values for the last three arguments
   are zero.
Figure 1a. Our First Clause Representation for a Simple
Molecule. The numbers following the element symbols in
the diagram are used to identify atoms in the clauses.
Fragment(B211(Atom(C,1,0,0,0),Atom(O,3,0,0,0),
Atom(C,2,0,0,0),Atom(H,4,0,0,0)));
Fragment(B1111(Atom(C,2,0,0,0),Atom(C,1,0,0,0),
Atom(O,5,0,0,0),Atom(H,6,0,0,0),
Atom(H,7,0,0,0)));
Fragment(B11(Atom(O,5,0,0,0),Atom(C,2,0,0,0),
Atom(H,8,0,0,0)));
Figure 1b is a simple example of a clause list and
the rules for constructing it. In actuality, there are no
spaces between characters in a clause. They are included
to make it easier to grasp the clause notation. Note
that although there are eight atoms in the molecule, only
three generate clauses. For example, O(3) does not
generate a clause since it would be redundant. The clause
for O(3) would be "Fragment(B2(Atom(O,3,0,0,0),
Atom(C,1,0,0,0)))" and all this information is contained
in the clause generated by C(1). The first Fragment
predicate in Figure 1b is:
Fragment(B211(Atom(C,1,0,0,0),Atom(O,3,0,0,0),
Atom(C,2,0,0,0),Atom(H,4,0,0,0)));
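The construction rules for a clause list can be sketched in Python as follows. This is not the authors' translation program; the bond table is an assumption read off the printed clauses (C1=O3, C1-C2, C1-H4, C2-O5, C2-H6, C2-H7, O5-H8), the function names are hypothetical, only numeric bond orders are handled (no aromatic or resonant bonds), and the ordering of neighbours (higher bond order first, then lower atom number) is inferred from the example rather than stated in the text.

```python
# Hypothetical input for the eight-atom example molecule.
ELEMENTS = {1: "C", 2: "C", 3: "O", 4: "H", 5: "O", 6: "H", 7: "H", 8: "H"}
BONDS = [(1, 3, 2), (1, 2, 1), (1, 4, 1), (2, 5, 1),
         (2, 6, 1), (2, 7, 1), (5, 8, 1)]  # (atom, atom, bond order)


def atom_term(elements, n):
    """Atom(symbol, number, charge, stereo flag, ring flag) with defaults 0."""
    return f"Atom({elements[n]},{n},0,0,0)"


def clause_list(elements, bonds):
    """One Fragment clause per atom that is bonded to two or more atoms."""
    neighbours = {n: [] for n in elements}
    for a, b, order in bonds:
        neighbours[a].append((order, b))
        neighbours[b].append((order, a))
    clauses = []
    for n in sorted(elements):
        nbrs = neighbours[n]
        if len(nbrs) < 2:
            continue  # its information is carried by a neighbour's clause
        nbrs.sort(key=lambda ob: (-ob[0], ob[1]))
        bond_fn = "B" + "".join(str(order) for order, _ in nbrs)
        args = [atom_term(elements, n)]
        args += [atom_term(elements, b) for _, b in nbrs]
        clauses.append(f"Fragment({bond_fn}({','.join(args)}));")
    return clauses


clauses = clause_list(ELEMENTS, BONDS)
```

For this input, three clauses are produced (for C(1), C(2), and O(5)); the five atoms with a single bond generate none.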
Reaction Rule Database
Our present reaction rule database is made up of
approximately one hundred rules adapted from a microfiche
generously sent to us by Gelernter (4). For a given
reaction, a rule specifies the reactants (subgoal) and
the product(s) (goal), in connection table format and any
constraints on their composition (Figure 2a). The rules
are identified by chapter and schema numbers. The
connection tables are organized as follows:
[Structure diagrams omitted: not recoverable from the extracted text.]
Figure 2a is the Gelernter reaction rule for the
"reaction of magnesium with alkyl bromides". The number
(six) and type of row atoms (Mg, Br, C, $2, $4, $6)
are identical for both the goal and subgoal
connection tables and are a composite of all atoms in both
the product and reactants. Differences between goal and
subgoal structures are indicated by the numbers to the
right of row atoms and not their presence or absence in
the tables. For example, in the goal table Row Atom 1,
magnesium, is bonded to Row Atom 2 by a single bond
(index:bond = 2:1) and to Row Atom 3 by a single bond
(index:bond = 3:1). While magnesium does not appear in
the subgoal structure, it is still the first row atom in
the subgoal's table. But the values for bond indexes and
bond types are now zero; that is, Mg(1) is not bonded to
other atoms in the table. An example of an atom that
appears in both the goal and subgoal structures is Row
Atom 3. One of the atoms that C(3) is bonded to changes
(Br to Mg) but C(3) is considered the same throughout the
reaction and keeps the same index.
Schema 2
Goal TSD
Subgoal TSD
136 Thiol
126 Oxime
122 Diazoketone
(and others)
The constraints listed under the schema tests give
limitations on the possible values of the variables in
Multistep:
0
II 1) NaOET
ET-O-C 2) RX 0 CH3
\ 3) OH-,H20 II /
CH2 > H-0- C -CH2-C -H
/ 4) H+ \
ET-O-C CH3
II
0
0 0
II II
ET-O-C ET-O-C
\ \
CH2 + NaOET > CH-
/ /
ET-O-C ET-O-C
II II
0 0
0 0
II II
ET-O-C H CH3 ET-O-C CH3
\ - \ / \ /
CH + C > CH-C-H
/ / \ / \
ET-O-C Br CH3 ET-O-C CH3
II II
0 0
0 0
II II
ET-O-C CH3 H-O-C CH3
\ / OH-, H20 \ /
CH-C-H > CH-C-H
/ \ / \
ET-O-C CH3 H-O-C CH3
II II
0 0
In this example a four step synthesis is also
expressed as a very general one step reaction.
We have written a program that translates the
connection tables into clauses, a form that the theorem
prover can process, and stores them in files organized
first by goal or subgoal and then by the functional
groups in the molecule. The constraints are in another
set of files. SYNLMA uses these files; it does not use
the files of Gelernter formatted rules. In addition to
the reaction rule database, we have functional group and
starting material databases (also in clause form).
Each atom in a target or starting material molecule is
defined. This is not true for a reaction rule or
functional group molecule where parts of the molecule are
represented by variables ($1, $J, etc.). SYNLMA treats a
reaction rule or functional group structure as a
molecule, even though some of its atoms are unknown, and
represents it in essentially the same form as known
molecules (Figure 2b). A molecule with a variable
substructure differs from a known molecule in the
following:
1. The predicates are ORed for a molecule with variables
   (one clause per molecule) instead of ANDed (one list
   of clauses for each molecule).
2. The sign of the predicate is negative instead of
   positive.
3. Variable atoms or substructures are represented by
   the letter "y" followed by a number (y1, y2) or the
   letter "j" (yj). "Yj" represents a halide; the
   "y/even numbered" variables can represent any
   substructure or atom; and the "y/odd numbered" can
   represent any substructure or atom except hydrogen.
4. The Atom functions have variables for arguments, not
   constants.
5. Each goal or subgoal clause is terminated with the
   predicate Rxnrule whose first argument is a reaction
   rule identification number. After this number, the
   predicate uses the function LL (for linked list) to
   list all the atoms in the connection table.
   Functional group clauses are terminated with the
-Fragment(B11(Atom(Mg,x1,s1,t1,u1),Atom(Br,x2,s2,t2,u2),
Atom(C,x3,s3,t3,u3)))|
-Fragment(B1111(Atom(C,x3,s3,t3,u3),
Atom(Mg,x1,s1,t1,u1),y2,y4,y6))|
Rxnrule(202,LL(Atom(Mg,x1,s1,t1,u1),
LL(Atom(Br,x2,s2,t2,u2),
LL(Atom(C,x3,s3,t3,u3),
LL(y2,LL(y4,LL(y6,NIL)))))));
-Fragment(B1111(Atom(C,x3,s3,t3,u3),
Atom(Br,x2,s2,t2,u2),y2,y4,y6))|
Rxnrule(202,LL(Atom(Mg,x1,s1,t1,u1),
LL(Atom(Br,x2,s2,t2,u2),
LL(Atom(C,x3,s3,t3,u3),
LL(y2,LL(y4,LL(y6,NIL)))))));
A comparison between the connection tables in Figure
2a and their clause representations in Figure 2b
illustrates the conversion rules and some of the
differences between a known molecule's clause and a
reaction rule clause. Two row atoms, Mg(1) and C(3), in
the goal and only C(3) in the subgoal are bonded to two
or more atoms and therefore generate predicates. Unlike
the clause list (Figure 1b) these predicates are not
separated by semicolons (implicitly ANDed one predicate
clauses) but are joined by a vertical bar, the symbol for
OR. The predicate, Fragment, is constructed in the same
way as for a known molecule with the exception that some
of the Atom function's arguments are variables (e.g. x1,
s1, t1, etc.). Variables are not written using Atom
functions (they are unknowns) but are simply listed in
the proper order in the bond function. The clause is
terminated with an identifying Rxnrule predicate that
lists the reaction rule chapter and schema (chapter
number * 1000 + schema number) and every row atom in the
connection table. Note that the Rxnrule predicate is
identical for the goal and subgoal, linking the two
clauses together.
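How a reaction rule literal with variables can be compared against a fully specified Fragment of a known molecule may be illustrated with a one-way pattern match in Python. This is a simplified stand-in for the unification performed by a resolution theorem prover, not LMA's algorithm. Terms are written as nested tuples, the function names are hypothetical, lowercase strings such as x3 or y2 are assumed to be variables, and the sketch ignores the odd/even restrictions on the y variables and any reordering of neighbours. The bromoethane fragment is an invented example.

```python
def is_var(t):
    # Assumption: lowercase identifiers such as x1, s1, y2 are variables.
    return isinstance(t, str) and t[:1].islower()


def match(pattern, term, bindings=None):
    """Match a rule term against a ground term; return bindings or None."""
    bindings = dict(bindings or {})
    if is_var(pattern):
        if pattern in bindings and bindings[pattern] != term:
            return None  # variable already bound to something else
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


# Subgoal literal of the magnesium rule: carbon bonded to Br and y2, y4, y6.
rule = ("Fragment", ("B1111",
        ("Atom", "C", "x3", "s3", "t3", "u3"),
        ("Atom", "Br", "x2", "s2", "t2", "u2"),
        "y2", "y4", "y6"))

# Hypothetical ground fragment: C(1) of bromoethane, CH3-CH2-Br.
target = ("Fragment", ("B1111",
          ("Atom", "C", 1, 0, 0, 0),
          ("Atom", "Br", 3, 0, 0, 0),
          ("Atom", "C", 2, 0, 0, 0),
          ("Atom", "H", 4, 0, 0, 0),
          ("Atom", "H", 5, 0, 0, 0)))

bindings = match(rule, target)
```

A successful match binds x3 to the index of the carbon and y2, y4, y6 to whole Atom terms, which is why the variable substructures are listed bare in the bond function rather than wrapped in Atom.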
SYNLMA is currently capable of handling the synthetic
design for compounds the size of the analgesic Darvon
using an in-core database of approximately a hundred
reactions. The synthesis process starts with the input
of the structure of the compound (in clause form) that we
are trying to synthesize. Next, an internal
representation of the compound is generated. This
becomes the target (the theorem to be proved). The
theorem prover begins by identifying the target's major
functional groups and uses them as keys into the
database. As the search begins for reactions and
compounds from which the target can be synthesized, the
theorem prover only searches the goal functional group
files corresponding to the functional groups it has
same intensity and a lot of time is spent pursuing
dead-end paths. The program has the potential to be more
clever in its approach. It can generate a number of
trees of varying specificity. First, SYNLMA could
generate a tree of multistep reaction rules. A tree
built from multistep reaction rules would be quicker to
build than one where each step is specified. Then a
second, more specific tree could be generated using the
knowledge gained from the first. For example, some
synthetic pathways could be ruled out on the basis of the
multistep rules. The more pathways that can be
eliminated on the basis of one multistep rule as opposed
to a series of single step rules, the faster the system
can work. For paths that appear promising, the products
and reactants in the first tree form pairs of targets and
starting materials that will direct the growth of the
Future Directions
After the two tree system is functioning, we would like
to add a third tree definition layer that precedes the
others and determines an overall synthetic strategy. The
focus during this stage is on the recognition of cogent
substructures, thus it requires a database of about 200
compounds instead of reaction rules. The target will be
compared to these compounds rather than reaction rules
and "matches" one of these compounds when a large
substructure in the target is identified in a compound.
This matching compound now becomes the new target and the
process is repeated, resulting in a much more abstract
problem solving tree. Then the two-tree system is
applied to this tree to define targets and starting
materials. The system moves from the general to the
specific, using the information from the first tree to
build the second tree and information from the second
tree to build the third. The third and final tree
describes the specific steps in the synthetic pathway.
If an organic synthesis system is to be of practical
use to chemists, it must be set up to interface with
large chemical databases such as the databases made
available by ISI (the Institute for Scientific
Information) and by Chemical Abstracts. We have started
to convert our database to the CAS connection table
format to simplify database interfaces. Fortunately,
this does not require changing SYNLMA. We only need to
write a new program to translate connection tables into
Summary
Acknowledgments
This research was partially supported by the National
Science Foundation under Grant MCS 82-16432.
Literature Cited
Acquisition and Representation of Knowledge
for Expert Systems in Organic Chemistry
J. Gasteiger, M. G. Hutchings¹, P. Löw, and H. Saller
Institute of Organic Chemistry, Technical University Munich, D-8046 Garching,
West Germany
0097-6156/86/0306-0258$06.00/0
© 1986 American Chemical Society
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch021
$$\bar{\alpha} = \frac{4}{N}\Bigl(\sum_i \tau_i\Bigr)^2 \qquad (1)$$
Mean molecular polarizability can be calculated through the
Lorenz-Lorentz equation from the refractive index, n, molecular weight,
MW, and density, d, of a compound, demonstrating that the parameters
τ_i can be derived from these elementary molecular properties (Figure
3).
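As a worked illustration of that calculation, the following Python sketch evaluates the Lorenz-Lorentz relation in its standard textbook form, molar refraction R = (n² - 1)/(n² + 2) · MW/d = (4π/3) N_A α. The function name is hypothetical, and the input values for water at about 20 °C (n = 1.333, MW = 18.015 g/mol, d = 0.998 g/cm³) are supplied here as an example; they are not taken from this chapter.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def mean_polarizability(n, mw, density):
    """Mean molecular polarizability in cm^3 from the Lorenz-Lorentz equation.

    n: refractive index, mw: molecular weight in g/mol,
    density: density in g/cm^3.
    """
    molar_refraction = (n**2 - 1) / (n**2 + 2) * mw / density  # cm^3/mol
    return 3 * molar_refraction / (4 * math.pi * AVOGADRO)


# Example input (assumed values for liquid water near 20 degrees C).
alpha_water = mean_polarizability(1.333, 18.015, 0.998)
alpha_water_cubic_angstrom = alpha_water * 1e24  # 1 cm^3 = 1e24 cubic angstrom
```

The result comes out near 1.5 cubic ångström, the usual order of magnitude for a small molecule.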
Polarizability is a measure of the relative ease of distortion
of a dipolar system when exposed to an external field. The
stabilization energy due to the interaction between an external charge and
the induced dipole is highly distance-dependent and can be
calculated through classical electrostatics. The situation is, however,
less clearly defined when the charge resides within the molecule
that is being polarized. To model the stabilization resulting from
polarizability in these situations, we have modified Equation 1 by
introducing a damping factor $d^{\,n_i - 1}$, where $0 < d < 1$, and $n_i$ gives the
$$\alpha_d = \frac{4}{N}\Bigl(\sum_i d^{\,n_i - 1}\,\tau_i\Bigr)^2 \qquad (2)$$
α_d is called effective polarizability, as the damping factor models
the distance dependent attenuation of the stabilization effect.
Furthermore, this factor gives different values for α_d for the same
molecule depending on where the charge center is located. An
alternative additivity scheme (14) for estimating mean molecular
polarizability can be similarly modified to obtain values of effective
polarizability (15). The significance of these values has been
demonstrated by correlation with physical data (13).
Charge Distribution, Inductive and Resonance Effects. Until now,
the discussion has been concerned with models based on additivity
schemes and their modifications. However, we have also explored
other types of models that can be put into algorithms that are fast,
albeit less convenient for pencil and paper application.
This is true for our procedure for calculating partial atomic
charges in σ-bonded molecules (16). The method starts from Mulliken's
definition of electronegativity, χ, derived from atomic ionization
potentials, IP, and electron affinities, EA (Equation 3)(17).
[Figure residue, not recoverable from the extracted text: reaction examples labelled charge distribution, inductive effect, polarizability, and resonance effect; a scheme relating MW, refractive index, and density to polarizability through the Lorenz-Lorentz equation, the additivity scheme, and the attenuation model; Equation 4; protonation examples; and a plot with the axis ΔH (calc.), kcal/mol.]
negative sign.
Points 2 and 3 are characterized by the same value for the
(homolytic) bond dissociation energy. However, resonance stabilization of
charges can occur only for the heterolysis represented by point 3.
Therefore in this case, the resonance parameter R has a high value,
whereas it is zero for the heterolysis represented by point 2.
Figure 6 shows an additional feature. The points are
distinguished according to whether the associated bond is considered react
Figure 7. The two choices for heterolysis of the carbonyl double
bond, and their representation as points in Figure 8.
1) cyclopropane
2) cyclobutane
3) cyclopentene
4) cyclopentadiene
5) ethyl bromide
6) ethyl iodide
7) methylene chloride
8) a l l y l chloride
9) neopentyl chloride
10) 1-methyl-1-cyclopropyl-ethyl bromide
11) 1-methyl-1-cyclobutyl-ethyl iodide
12) 2,2,4,4-tetramethylcyclobutanol
13) acetaldehyde
14) acetone
15) trimethylacetaldehyde hydrate
16) chloral hydrate
17) aldol
18) methyl propionate
19) ethyl acetoacetate
20) α-chloropropionic acid
21) 5-hydroxy-nona-3,5,8-triene-2-one
22) 2-oxocyclopentane carboxylic acid
23) 5-hydroxy-5-methyl-butyrolactone
24) 1-dimethylamino-propene
25) 4-amino-2,4-dimethyl-2-pentanol
26) succinimide
27) α-picoline
28) 6-chloro-6-methoxy-bicyclo[3.1.0]hex-2-
Figure 9. Example of a problem for reaction prediction
Figure 10. Network of bond-breaking and -making patterns
explored by the reactivity functions leading to the correct
prediction of product 3 from Κ
Acknowledgments
Literature Cited
An Expert System for High Performance Liquid
Chromatography Methods Development
René Bach¹, Joe Karnicky¹, and Seth Abbott²
¹Varian Research Center, Varian Associates, Inc., Palo Alto, CA 94303
²Varian Instrument Group, Varian Associates, Inc., Walnut Creek, CA 94598
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch022
Background
Related Work
System Strategy
[Figure residue: block diagram of the system components, with the labels DOMAIN EXPERT, KNOWLEDGE ENGINEER, USER, USER INTERFACE, KNOWLEDGE BASE CONSTRUCTION AIDS, CMP KNOWLEDGE BASE, META LEVEL, and ECAT SHELL.]
[Figure residue: module diagram with the labels DECIDE ON SAMPLE CLEANUP (MODULE 4), OPTIMIZE THE SEPARATION (MODULE 5), DIAGNOSE HARDWARE FAULTS (MODULE 6), and OPTIMIZED SEPARATION.]
Implementation
(user1
  (type fact)
  (pform (largest-mw 500 daltons)))
(user2
  (type fact)
  (pform (analyte-class phenols)))
(user3
  (type fact)
  (pform (asked (analyte-class $class))))
(cmpgen1
  (descr nil)
  (type rule)
  (text (if sample molecular-weight is > 100 then there are more
         than three carbons in the molecule))
  (pform (if (and (largest-mw $mw daltons)
                  (> $mw 100))
             then (more-than-three-carbons))))
(cmpgen7
  (type rule)
  (text (if the analyte class is not a protein and not a peptide,
         then use the specified analyte class for further
         inferencing))
  (pform (if (and (analyte-class $class)
                  (asked (analyte-class $class))
                  (unknown (analyte-class protein))
                  (unknown (analyte-class peptide)))
             then (consider (analyte-class $class)))))
(cmp1
  (descr (a default rule for selecting separation mode))
  (type rule)
  (text (if the chemical class of the analyte is not a protein,
         and the analyte has more than three carbons, and the
         analyte does not belong to a class for which straight
         phase is recommended, then use a reverse phase
         separation mode))
  (pform (if (and (consider (analyte-class $class))
                  (more-than-three-carbons)
                  (unknown (consider (analyte-class protein)))
                  (unknown (straight-phase-packing $class $x $y)))
             then (separation-mode reverse-phase))))
hypotheses by testing whether the "if" parts of relevant rules are
known or provable using other rules. For example, if the program
was asked the equivalent of "What separation mode should I use?"
it could use backward chaining through the rules in Figure 3 to
infer that it should ask the user about molecular weight and
analyte classes to provide the answer to the question. We use
backward chaining for the column diagnosis. MRS runs in Zetalisp,
Maclisp and Franzlisp. We have made some modifications to the MRS
inferencing capability and provided a better user interface.
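The backward-chaining behavior described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MRS itself: the rule set is a simplified, variable-free paraphrase of the separation-mode rules (no `$class` pattern matching and no `unknown` tests), and the names `RULES`, `ASKABLE`, and `prove` are hypothetical.

```python
# Minimal backward chainer (illustrative sketch; not the MRS implementation).
# Each rule maps a conclusion to the premises that must all hold.
RULES = {
    "separation-mode reverse-phase": ["consider analyte-class",
                                      "more-than-three-carbons"],
    "more-than-three-carbons": ["largest-mw > 100"],
    "consider analyte-class": ["analyte-class known"],
}

# Premises that no rule concludes have to be obtained from the user.
ASKABLE = {"largest-mw > 100", "analyte-class known"}


def prove(goal, answers, asked):
    """Return True if `goal` follows from the rules and the user's answers.

    `answers` maps askable premises to True/False; `asked` records the
    order in which the chainer had to consult the user.
    """
    if goal in ASKABLE:
        if goal not in asked:
            asked.append(goal)
        return answers.get(goal, False)
    premises = RULES.get(goal)
    if premises is None:
        return False  # nothing concludes this goal and it cannot be asked
    # all() stops at the first premise that fails.
    return all(prove(p, answers, asked) for p in premises)


asked = []
result = prove("separation-mode reverse-phase",
               {"largest-mw > 100": True, "analyte-class known": True},
               asked)
print(result, asked)
```

Starting from the question about separation mode, the sketch works backward to the two items it must ask the user about (analyte class and molecular weight), which is the pattern the text describes.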
We selected MRS for the following reasons: The domain expertise
of the column troubleshooting and of the CMP design is readily
expressed in IF-THEN rules that MRS is designed to handle.
Previous users of MRS had indicated that it was a versatile tool
for reasoning with various forms of domain expertise and that the
meta level reasoning could be used to solve particularly difficult
problems. MRS doesn't require, although it runs well on,
specialized hardware such as a Lisp machine supporting high
resolution graphics. Because the source code is provided, it is
easy to write extensions to MRS directly in Lisp (such as the user
interface). Finally, since MRS is academic software, it is
inexpensive.
Results
[Figure residue: hardware configuration diagram with the labels CHAOSNET and TELEPHONE LINES.]
[Figure residue: chart plotted against PROBE NUMBER for the sample probes: phenols; opium alkaloids; acid extract of urine; tetracyclines; SCH 28191 experimental drug; beta-carotene; LDH isoenzymes; HGH tryptic digest; urea, thiourea; tricyclic antidepressants; avermectins; cardiac drugs; ibuprofen; chloro-, nitro-phenols; testosterone steroids.]
Module 3, Column and Mobile Phase Design (CMP). This is the core
module for ECAT. It can currently specify i) analytical column
and mobile phase constituents for reverse phase chromatography of
common classes of organic molecules; ii) reverse phase, ion
exchange phase and hydrophobic interaction chromatography of
proteins and peptides; iii) a limited set of specialty classes
of molecules best treated by straight phase chromatography (e.g.,
mono- and disaccharides). The rules for selection of the HPLC
detector are under development within Module 3. Some of the rules
for detector mobile phase compatibility are already encoded. A
set of rules for detector selection is ready but not yet encoded.
The program infers design parameters using data base information
from Module 1 and user-supplied information, along with an
extensive knowledge base of chromatography heuristics. Module 3
currently contains ca. 160 rules, generated to cover 15 sample
probes which represent some commonly separated classes of
compounds (see Table I). Figure 6 shows an example of the
application of ECAT to a design problem. The items in Figure 6 are
the user inputs and system recommendations in the form in which
they are actually processed and generated by the program.
Figure 7 shows part of the user consultation that elicited
the inputs listed in Figure 6. The current user interface
provides on-line help as well as a menu of numbered valid
responses. The user may type in either the number or the listed
item. In answer to the user typing "?", the system rephrases the
question, redisplays acceptable values, and specifies what other
characters are recognized. If this is not enough information, the
molecules
Probe 2.   opium alkaloids                 polar, basic nitrogen heterocycles, typical of many drugs
Probe 3.   acid extract of urine           carboxylic acids
Probe 4.   tetracyclines                   molecules with significant metal-complexation character
Probe 5.   SCH 28191 (experimental drug)   same as opium alkaloids (Probe 2)
Probe 6.   beta-carotene                   non-polar, neutral molecules
Probe 7.   LDH isoenzymes                  proteins
Probe 8.   HGH tryptic digest              peptide fragments
Probe 9.   urea, thiourea                  small, polar molecules
Probe 10.  tricyclic antidepressants       same as opium alkaloids (Probe 2)
Probe 11.  avermectins                     moderately polar, neutral molecules
Probe 12.  cardiac drugs                   same as opium alkaloids (Probe 2)
Probe 13.  ibuprofen                       moderately polar carboxylic acid
Probe 14.  chlorophenols, nitrophenols     non-fluorescent phenols (see Probe 1)
Probe 15.  testosterone steroids           complex mixture of compounds sharing same hydrocarbon backbone and differing in functional group
USER ENTRIES
(analyte-class phenols)
(specific-analyte phenol)
(pka-of phenol 11)
(largest-mw 400 daltons)
(detector-type fluorescence)
(smallest-analyte-amount 10 ng)
(class-of sample-matrix river-water)
RECOMMENDATIONS
Guard column
(additional-column guard-column)
(packing-of guard-column pellicular)
(packing-of guard-column silica-based)
(diameter-of pellicular 25 micron)
Analytical column
(separation-mode reverse-phase)
(restrict (diameter-of particle $value micron)
          (<= $value 5))
(packing-of $column silica-based)
(prefer (bonded-phase $column C18)
        (bonded-phase $column C8) 0.2)
Mobile phase
(prefer (liquid-of solventb acetonitrile)
        (liquid-of solventb methanol) 0.4)
(liquid-of solventb methanol)
(liquid-of solventb acetonitrile)
(liquid-of solventa water)
(additive-of solventb competing-acid phosphoric-acid)
(additive-of solventa competing-acid phosphoric-acid)
(restrict (ph-of $3 $4) (>= $4 2) (<= $4 7.5))
(concentration-of phosphoric-acid solventb 0.1%)
(concentration-of phosphoric-acid solventa 0.1%)
You are running the Column and Mobile Phase Selection module.
Valid values:
1. amino-acid-hydrolysate             17. oligonucleotides
2. amino-acid-physiological-fluids    18. oligosaccharides
                                      19. oligosaccharides
3. citric-acid-cycle-acids            20. peptide
4. diastereomers                      21. phenols
5. carboxylic-acid                    22. phospholipids
Valid value is a number.
phenol pKas: ? <CR>
Enter the pKa values for phenol.
user can type "h" for help. The help text is not yet completely
written, but is readily extensible. "How" and "why" queries are
not yet recognized.
Future Plans
those known analytes that are best separated by GC. For example,
trace analysis of volatile pesticides at sub-picogram levels is
best performed by GC, and the user should have access to the
recommended GC method before considering an LC development. On
the other hand, analysis of relatively non-volatile ionic drugs is
best done by HPLC, and here the user should be provided with a
standard, qualified HPLC method if such a method exists. Thus,
Module 2 will include a knowledge base guiding the decision
as to GC versus LC and will provide a library of standard,
qualified chromatographic methods.
Knowledge Representation
User Interface
Conclusion
Acknowledgments
Literature Cited
Major Functions
User Interface
Operation
Optimization C r i t e r i a
Lab Rotors
1. Optimal rotors - the rotors best suited both to perform the run
   and to achieve the stated optimization criterion.
2. Alternate rotors - other rotors that are not optimal but can
   perform the run.
3. Not qualifying rotors - rotors that are not recommended for the
   problem, usually because they are too large or too small for the
   sample volume, or because they do not generate sufficiently
   high centrifugal forces.
4. Not compatible rotors - rotors that are not classified, as part
   of the rotor safety program, for running in the ultracentrifuge
   chosen from the lab.
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch023
The investigator can select any rotor from categories 1 and 2 above.
This allows the investigator to experiment with the rotors in the
lab and to design procedures as variations on the theme established
in the Optimal Plan. Ultimately, the rotors selected in the Optimal
Plan by SpinPro and in the Lab Plan by the investigator are the
major source of difference in the run parameters, purity, and
overall effectiveness of the two plans.
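The four rotor categories can be pictured as a simple partition of the lab's rotors. The sketch below is a hedged illustration only: the `Rotor` fields, the volume and centrifugal-force tests, the choice of "shortest run time" as the optimization criterion, and the rotor names and numbers are all hypothetical stand-ins, not SpinPro's rules.

```python
from dataclasses import dataclass


@dataclass
class Rotor:
    name: str
    min_volume_ml: float   # smallest sample volume the rotor suits (assumed field)
    max_volume_ml: float   # largest sample volume the rotor suits (assumed field)
    max_rcf: float         # highest centrifugal force it can generate (assumed field)
    run_hours: float       # estimated run time for this problem (assumed field)
    compatible: bool       # classified for the chosen ultracentrifuge


def categorize(rotors, sample_ml, required_rcf):
    """Sort rotors into the four categories described in the text."""
    groups = {"optimal": [], "alternate": [],
              "not qualifying": [], "not compatible": []}
    qualifying = []
    for r in rotors:
        if not r.compatible:
            groups["not compatible"].append(r.name)
        elif (not (r.min_volume_ml <= sample_ml <= r.max_volume_ml)
              or r.max_rcf < required_rcf):
            groups["not qualifying"].append(r.name)
        else:
            qualifying.append(r)
    if qualifying:
        # Assumed optimization criterion: shortest run time.
        best = min(r.run_hours for r in qualifying)
        for r in qualifying:
            key = "optimal" if r.run_hours == best else "alternate"
            groups[key].append(r.name)
    return groups


# Hypothetical rotors; names and numbers are made up for the example.
rotors = [
    Rotor("rotor-A", 1, 10, 300000, 6.0, True),
    Rotor("rotor-B", 1, 10, 250000, 15.75, True),
    Rotor("rotor-C", 50, 500, 300000, 4.0, True),
    Rotor("rotor-D", 1, 10, 300000, 5.0, False),
]
groups = categorize(rotors, sample_ml=5, required_rcf=200000)
print(groups)
```

Only rotors in the first two groups would be offered to the investigator for the Lab Plan, matching the selection rule stated above.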
Acceleration/deceleration: fast/fast
Prior to the run prepare sample as follows:
No special sample preparation is required.
Load 0.3 mL of the Protein sample in full tubes at
the top position of the gradient.
At the end of the run the 16 S particles will be
approximately 45% from the top of the gradient.
To process the entire sample volume requires approximately
2 centrifuge run(s) with an estimated total run time of
12 hours, 5 minutes.
Run summaries:
Optimal: SW 55 Ti at 55000 rpm for 6 hours per run
in 2 run(s). Requiring a total of approximately
12 hours, 5 minutes
Lab: SW 41 Ti at 41000 rpm for 15 hours,
45 minutes per run in 2 run(s). Requiring a total of
approximately 31 hours, 30 minutes
Comparisons:
The Optimal Plan requires 38% of the Lab Plan run
time for a single run. It requires 38% of the Lab
Plan run time when processing the entire sample.
[Screen menu: Page Forward, Page Backward, Optimal Plan, Lab Plan, Comparisons, Design Inputs, Change Answer, Save Reports, SpinPro Top, Exit to DOS]
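The two 38% figures in the Comparisons section follow directly from the run times quoted in the report. A small check, using only those quoted times (the helper names `minutes` and `percent_of` are hypothetical):

```python
def minutes(hours, mins=0):
    """Convert an hours/minutes pair to minutes."""
    return 60 * hours + mins


def percent_of(a, b):
    """Return a as a whole-number percentage of b."""
    return round(100 * a / b)


# Single run: 6 hours (Optimal) versus 15 hours, 45 minutes (Lab).
single_run = percent_of(minutes(6), minutes(15, 45))
# Entire sample: 12 hours, 5 minutes versus 31 hours, 30 minutes.
entire_sample = percent_of(minutes(12, 5), minutes(31, 30))
print(single_run, entire_sample)
```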
increasing density from the top to the bottom of the tube. The
sample to be separated is layered, as a thin band, on the top of the
gradient. As the run begins, each component in the sample moves
toward the bottom of the tube. Some components sediment faster than
others. This fact is the basis for the separation. If the run
parameters are appropriate, the components will form separate bands
within the gradient. At the conclusion of the run, the band
representing the component of interest can be removed from the tube.
If: 1) VERTICAL.TUBE.ROTORS, and
Development of SpinPro
achieving product status and other expert systems have not, although
we suspect that an early decision to produce a product rather than
to do AI research has been important. The problem domain of
ultracentrifugation appears to have been a good choice. The domain
has proven to be fairly well bounded, even though the 800 rules
required have exceeded early estimates by a factor of four. When
considering the various stages of prototyping, debugging, and
refinement, over 25,000 rules have been written and tossed out.
Perseverance, sustained by having a concrete goal of "completeness"
rather than a more indeterminate goal of "demonstrating feasibility"
or "prototyping", was crucial to the success of the project.
In some ways expert systems programming is little different
from more "traditional" programming. For example, as with most
software programs, about 50% of the code in SpinPro is for the user
interface; debugging has been very time consuming; and
miscommunication was the source of a great deal of additional
effort. Since these problems are a part of traditional programming
as well, techniques designed to assist traditional programmers, such
as organization principles, specification, and effective
communication, also apply to expert systems.
In other ways expert systems programming is much different.
Traditional principles of specification and organization are tested,
in part, because the program undergoes evolutionary and sometimes
revolutionary revisions as an understanding of the problem domain
grows. Despite early detailed specification, the tendency of the
specification and the project to evolve toward its final definition
seems to be unavoidable.
From its inception to completion, the development of SpinPro
has taken about six person years. The development team has included
a manager, two knowledge engineers, one primary expert, four experts
for review, and two people responsible for the content of the
INFORMATION function. During this time, we have completed the
following major activities:
1. specification and prototyping
2. knowledge acquisition from the expert
3. knowledge coding into rules and debugging of rules
4. design and implementation of the "MP" inference engine
5. design and implementation of the user interface
Conclusion
Acknowledgments
Hugh B. Woodruff, Sterling A. Tomellini¹, and Graham M. Smith
Merck Sharp & Dohme Research Laboratories, Rahway, NJ 07065
0097-6156/ 86/0306-0312$06.00/0
© 1986 American Chemical Society
[Figure residue: flow diagram in which the Digitized Spectrum is passed to PAIRS (Interpreter), which produces Chemical Functionality Predictions.]
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch024
[Figure residue, continued: input box labeled CONCISE Rules.]
Structure of the antibiotic actinospectacin.
Peak No.   Position (cm⁻¹)   Relative Intensity   Width
1 3527 9 Broad
2 3401 10 Broad
3 3311 10 Broad
4 3254 10 Broad
5 3071 9 Broad
6 2962 9 Average
7 2796 6 Broad
8 2486 2 Average
9 1645 5 Average
10 1629 5 Average
11 1581 4 Average
12 1566 4 Average
13 1460 8 Average
14 1429 6 Average
15 1392 8 Average
16 1351 6 Sharp
17 1330 7 Average
18 1271 2 Average
19 1235 3 Average
20 1215 3 Average
21 1190 6 Average
22 1176 7 Sharp
23 1145 9 Average
24 1121 7 Average
25 1107 8 Average
26 1087 9 Average
27 1078 10 Average
28 1046 9 Average
29 1037 9 Average
30 1024 9 Sharp
31 999 7 Sharp
32 981 3 Average
33 952 4 Sharp
34 936 4 Sharp
35 923 7 Average
36 891 2 Average
37 875 3 Average
38 859 5 Average
39 814 3 Average
40 728 5 Average
1 ALCOHOL 0.99
2 SULFONE 0.85
3 OLEFIN-(NON-AROM) 0.75
4 OLEFIN-CHR=CH2 0.75
5 ALCOHOL-PHENOL 0.75
6 ALCOHOL-PRIM(*1*) 0.75
7 ALCOHOL-PRIMARY 0.75
8 ALCOHOL-SEC-(*1*) 0.75
9 ALCOHOL-SEC-(*2*) 0.75
10 ALCOHOL-SEC-RING 0.75
11 ALCOHOL-SECONDARY 0.75
12 ALCOHOL-TERT-(*1*) 0.75
13 ALCOHOL-TERT-(*2*) 0.75
14 ALCOHOL-TERT-(*3*) 0.75
15 ALCOHOL-TERT-RING 0.75
16 ALCOHOL-TERTIARY 0.75
17 SULFONAMIDE 0.75
18 SULFONAMIDE-PRIM 0.75
19 SULFONAMIDE-SEC 0.75
20 SULFONAMIDE-TERT 0.75
FUNCTIONALITY SULFONE
PASSED INITIAL EMPIRICAL FORMULA TEST
PEAK QUERY
  ANY PEAK(S)         POSITION: 1290 - 1360
  INTENSITY: 7 - 10   WIDTH: SHARP TO AVERAGE
  ANSWER YES
PEAK QUERY
  ANY PEAK(S)         POSITION: 1110 - 1170
  INTENSITY: 7 - 10   WIDTH: SHARP TO BROAD
  ANSWER YES
ACTION SET SULFONE TO 0.500
CURRENT VALUE = 0.500
PEAK QUERY
  AT LEAST 2 PEAK(S)  POSITION: 1260 - 1360
  INTENSITY: 4 - 10   WIDTH: SHARP TO AVERAGE
  ANSWER YES
ACTION ADD 0.100 TO SULFONE
CURRENT VALUE = 0.600
PEAK QUERY
  AT LEAST 2 PEAK(S)  POSITION: 1260 - 1360
  INTENSITY: 7 - 10   WIDTH: SHARP TO AVERAGE
  ANSWER NO
PEAK QUERY
  AT LEAST 2 PEAK(S)  POSITION: 1065 - 1170
  INTENSITY: 4 - 10   WIDTH: SHARP TO AVERAGE
  ANSWER YES
ACTION ADD 0.100 TO SULFONE
CURRENT VALUE = 0.700
PEAK QUERY
  AT LEAST 2 PEAK(S)  POSITION: 1065 - 1170
  INTENSITY: 7 - 10   WIDTH: SHARP TO AVERAGE
  ANSWER YES
ACTION ADD 0.150 TO SULFONE
CURRENT VALUE = 0.850
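The trace can be replayed with a short Python sketch. It is not the PAIRS interpreter or the CONCISE rule language: the function names are hypothetical, the queries are treated as a flat sequence (the real rule may nest them), and the peak list is assumed to be the 1065-1360 cm⁻¹ portion of the 40-peak table listed earlier. The query that was answered NO is left out because the trace does not show its action.

```python
WIDTH_ORDER = {"sharp": 0, "average": 1, "broad": 2}

# (position in cm^-1, relative intensity, width), copied from the peak table.
PEAKS = [
    (1351, 6, "sharp"), (1330, 7, "average"), (1271, 2, "average"),
    (1235, 3, "average"), (1215, 3, "average"), (1190, 6, "average"),
    (1176, 7, "sharp"), (1145, 9, "average"), (1121, 7, "average"),
    (1107, 8, "average"), (1087, 9, "average"), (1078, 10, "average"),
]


def count_peaks(peaks, lo, hi, imin, imax, wmin, wmax):
    """Count peaks inside the position, intensity, and width windows."""
    return sum(
        1 for pos, inten, width in peaks
        if lo <= pos <= hi
        and imin <= inten <= imax
        and WIDTH_ORDER[wmin] <= WIDTH_ORDER[width] <= WIDTH_ORDER[wmax]
    )


def sulfone_expectation(peaks):
    """Replay the queries and actions shown in the SULFONE trace."""
    both_present = (
        count_peaks(peaks, 1290, 1360, 7, 10, "sharp", "average") >= 1
        and count_peaks(peaks, 1110, 1170, 7, 10, "sharp", "broad") >= 1
    )
    if not both_present:
        return 0.0
    value = 0.50
    # (lo, hi, min intensity, max intensity, increment), each needing 2 peaks.
    increments = [
        (1260, 1360, 4, 10, 0.10),
        (1065, 1170, 4, 10, 0.10),
        (1065, 1170, 7, 10, 0.15),
    ]
    for lo, hi, imin, imax, step in increments:
        if count_peaks(peaks, lo, hi, imin, imax, "sharp", "average") >= 2:
            value += step
    return value


expectation = sulfone_expectation(PEAKS)
print(round(expectation, 3))
```

Under these assumptions the sketch arrives at the same final value as the trace, and the stricter 1260-1360 query (intensity 7 - 10) indeed finds only one qualifying peak.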
Table III. Digitized Propionitrile Spectrum
Peak No.   Position (cm⁻¹)   Relative Intensity   Width
1 2246 10 Average
2 2996 8 Average
3 1461 7 Average
4 2950 6 Average
5 1431 5 Average
6 1074 4 Average
7 787 3 Average
8 2892 3 Average
9 1319 2 Average
10 1386 1 Average
11 546 1 Average
Summary
Literature Cited
K. P. Cross¹, P. T. Palmer, C. F. Beckner², A. B. Giordani³, H. G. Gregg⁴, P. A. Hoffman⁵,
and C. G. Enke
[Figure residue: system block diagram with the labels MATCH LISTS, TEST STRUCTURES, STRUCTURE/SUBSTRUCTURE LIBRARY STORAGE, SUBSTRUCTURE SEARCHING, DATA BASE, MATCHED SUBSTRUCTURES, IDENTIFIED SUBSTRUCTURES, and MOLECULAR STRUCTURE GENERATOR.]
The MS/MS spectra matching program allows the chemist to match any
MS/MS spectrum against either MS or MS/MS spectra in the reference
spectrum data base (3). The program uses inverted data organized by
m/z value to logically eliminate inappropriate reference spectra.
The program first determines the data base frequency (length of the
pointer table) of each major peak in the experimental daughter
spectrum and then ranks the peaks in ascending order of frequency.
Inverted data lists of reference spectra containing peaks are
retrieved in this order and logically ANDed together until the
number of candidate reference spectra is sufficiently small.
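The ranking-and-ANDing step just described can be sketched as follows. This is a minimal illustration, not the authors' program: the function name, the `max_candidates` cutoff, and the toy index (m/z values mapped to made-up reference spectrum identifiers) are assumptions.

```python
def candidate_spectra(inverted, peaks, max_candidates):
    """Intersect inverted lists, rarest peak first, until few candidates remain.

    `inverted` maps an m/z value to the reference spectra containing a
    peak at that value; the length of each list is the peak's data base
    frequency.
    """
    ranked = sorted(peaks, key=lambda mz: len(inverted.get(mz, ())))
    candidates = None
    for mz in ranked:
        ids = set(inverted.get(mz, ()))
        candidates = ids if candidates is None else candidates & ids
        if len(candidates) <= max_candidates:
            break  # sufficiently small; no need to retrieve further lists
    return candidates if candidates is not None else set()


# Toy inverted index; the spectrum identifiers are hypothetical.
index = {
    149: ["A", "B", "C", "D"],
    167: ["A", "B"],
    57: ["A", "C", "D", "E", "F"],
}
loose = candidate_spectra(index, [149, 57, 167], max_candidates=2)
strict = candidate_spectra(index, [149, 57, 167], max_candidates=1)
print(sorted(loose), sorted(strict))
```

Starting with the least frequent peak keeps the first retrieved list short, so the intersection shrinks quickly and the longer lists often never need to be read.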
Molecular weight, parent ion m/z value, and empirical formula
may also be invoked to further reduce the number of
candidate spectra. When matching daughter spectra, specifying the
parent ion m/z value alone usually produces a sufficiently small
[Figure residue: match-list header (PT PC NC NS NR IS IR Compound) and spectrum plots for di-n-octyl-phthalate and di-n-pentyl-phthalate.]
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch025
[Figure residue, continued: spectrum plots for 2-t-butyl-6-methylphenol, benzyl-t-butanol, p-t-butylbenzyl alcohol, 2-t-butyl-4-methylphenol, and 4-t-butyl-1,2-benzenediol; horizontal axis m/z, 0 to 100.]
PT  PC  NC  NS  NR  IS  IR  Compound
87  86   4   0   0   0   0  Di-n-pentylphthalate
87  86   4   0   0   0   0  Di-n-ethylphthalate
54  57   3   1   7   3   2  2-t-butyl-4-methylphenol
44  56   3   1  10   9  15  p-t-Butylbenzyl alcohol
42  35   1   3   1  19  29  p-t-amylphenol
35  61   3   1  10   3  26  2-t-butyl-6-methylphenol
[Figure residue: spectrum plots for di-n-butyl-phthalate, 2-t-butyl-6-methylphenol, benzyl-t-butanol, p-t-butylbenzyl alcohol, 2-t-butyl-4-methylphenol, and p-t-amylphenol; horizontal axis m/z, 0 to 140.]
[Figure 6 plot: intensity on a logarithmic scale (1.0 to 100.0) versus m/z.]
Figure 6. Parent spectrum of mass 149 from di-n-octylphthalate.
[Figure 7 plot: intensity (0 to 100) versus m/z (50 to 400).]
Figure 7. Daughter spectrum of the 13C-containing protonated
molecular ion of di-n-octylphthalate.
Conclusions
Teresa J. Harner, George C. Levy, Edward J. Dudewicz, Frank Delaglio, and Anil Kumar
National Institutes of Health Resource for Multi-Nuclei NMR and Data Processing,
Department of Chemistry, Syracuse University, Syracuse, NY 13210
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch026
MR Imaging
While analytical spectroscopy has been used for many years in order to
obtain information regarding chemical structure, magnetic resonance
imaging is a relatively new field. Misleadingly well-resolved
images may aid an expert physician in diagnosing human tissue
abnormalities, but as little is understood about the relationships
which exist between tissue MRI parameters and tissue health, not to
mention secondary factors (genetic, environmental, macro-physiological,
etc.), such judgements, accurate or not, are often purely subjective.
In similar applications, precedents are well established for the use
of expert systems as medical diagnostic tools (1,1,5).
The optimal research strategy involves the systematic search to
uncover these relationships at the same time as the development of a
computer methodology proceeds. Such software systems will not only
give the kind of information about physicochemical structure that
previously designed systems for NMR spectroscopic analysis have given,
but will also serve as co-investigators, facilitating, through
automated procedures, analytical tasks which are normally
time-consuming and complex.
Experimental data analysis of tissue parameters and construction
of an automated format for MRI research have been proceeding in our
laboratory with some success. Statistical analysis has revealed
that, with the proper normality transformations applied to T1, T2 and
1H density over eight regions of interest in the human brain (left
and right sides respectively of Cortical White Matter, Internal
Capsule, Caudate Nucleus, and Thalamus), the values within tissue
type generally follow a normal distribution(£). This implies that
existing discriminant functions may be able to optimally classify
data according to tissue type (although initial results also show
large overlaps between the normal distributions of several tissue
types). Indeed, preliminary results have yielded correct
classification percentages between 73 and 86%(2).
As shown in Figure 1, however, statistical analysis alone is
only one of the steps towards realizing a fully functional system for
MRI tissue discrimination. Experimental data is passed through
software (eventually NMR1/NMR2) for pre-processing. Since the
statistical analyses must themselves be applied to an ever-increasing
number of regions of interest (ROI's) it would be of great use to
[Figure 1 residue: block diagram with the labels NATURAL LANGUAGE INTERFACE, PRE-PROCESSOR: NMR1/NMR2, PROLOG CONTROL PROGRAM, STATISTICAL EXPERT SYSTEM, LOGICAL INFERENCE ENGINE, and EXPERIMENT CONTROL.]
Obs T1 T2 Density
list_of_variable_names[a, b, ..., obs, ..., T1, ...,
    T2, ..., Density, ...].
symbol_list[<integers>, <real_numbers>, "*", ...].
position_meanings[column(Number,Symbol,Meaning),
    row(Number1,Symbol1,Meaning1)].
Model_data_sets[model1(column(A,B,C),row(D,E,F)), ...,
    modeln(column(I,J,K),row(X,Y,Z))].
Obs T1 T2 Density
formed names, which information, taken with the fact that "*" has
been used in the place of data in certain positions identifies the
set as being raw and untransformed. The program will then proceed by
"cleaning up" the raw data set, making appropriate transformations
and applying a discriminant analysis to the set, under the assumption
of four classes.
In actual practice a number of tests must be passed at various
nodes before final classification takes place. Also, a prohibitive
time would be required to search a large database of models for ones
which most closely approximated the actual data set. For this reason
the concept of similarity nets is introduced. In this case, a more
Specific Applications:
Imaging
WELCOME TO MRILOG
Use the options list below to guide your interaction
while keeping your responses relatively simple, and
you should have few problems working within the system.
GENERAL FEATURES
I. auto_analysis: based on a user specified
   data file, program determines analyses and
   runs them.
II. user_driven_analysis: user specifies files
   and run list.
III. help_file: probably a good place to start.
What is your general objective,
based on the information
supplied above?
NOTE: Answers can take a natural language form,
but user should try to respond within the
context of the prompt.
no,
Figure 3. An Interactive Session with MRI_LOG_ESP.
Please either give a clear and
concise description of
your objective, or type in
a list of the procedures
you wish to invoke for data
analysis.
<twod>      - two dimensional graphics.
<normtest>  - normality testing.
<tran>      - transformation of variables.
<tran_disc> - discriminant analysis of transformed vars.
<disc_fun>  - discriminant analysis (untransformed data)
Let's do a normtest.
The following analyses will be run using data from an
input file. If the list is not correct indicate
that a change is required. Otherwise, type "go",
(or some other affirmative)
RUN LIST:
[normtest]
|: go.
INPUT FILE SPECIFICATION: normtest
Will this be new data?
|: no.
What is (are) your input file name(s)?
|: I want to examine t1cc and t2cl.
Do you wish to examine file t1cc
|: no.
Specified input files scanned.
Starting normtest using input file: t1cc
Starting normtest using input file: t2cl
Do you want a printout
|: no.
Please either give a clear and
concise description of
your objective, or type in
a list of the procedures
you wish to invoke for data
analysis.
<twod>      - two dimensional graphics.
<normtest>  - normality testing.
<tran>      - transformation of variables.
<tran_disc> - discriminant analysis of transformed vars.
<disc_fun>  - discriminant analysis (untransformed data)
halt.
Figure 3. Continued.
[Figure residue: option diagram listing auto_analysis, user_driven_analysis, and help_file, with the run list [twod normtest].]
(yes) or negative (no) responses from the user, for the most part,
communication with the program is governed by what has been termed a
"Context Parser", the main predicate of which has three levels to
handle varying levels of linguistic complexity.
The aim of the graphics software (twod, threeD) is to enable
the user to rapidly examine a large number of two- and
three-dimensional scatter plots. At present the program is capable of
handling up to 120 variables with up to 200 observations for each.
Predicate normtest tests/evaluates normality of the given set of
data points (corresponding to any variable), while disc_fun performs
a linear discriminant analysis on groups of data (maximum of 10
groups) with respect to any selected variables (maximum of 20
variables).
There are only two types of output files and output file names.
These are:
auto_analysis_out<xxx>
and
user_analysis_out<xxx>
Acknowledgments
Literature Cited
An Expert System for Organic Structure Determination
Bo Curry
Chemical Systems Department, Hewlett-Packard Laboratories, Palo Alto, CA 94304-1209
0097-6156/86/0306-0350$06.00/0
© 1986 American Chemical Society
[Figure residue: flow-diagram boxes labeled "Identify Subunits" and "Specify Global Constraints".]
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch027
[Figure residue, continued: mass spectrum (MS) panel with the labels MW 148, methyl-ketone, and monosubst-benzene.]
Program Description
The TEST message asks the Expert to return any evidence it may
have against the presence of the group being tested.
The REEVALUATE message is sent when a piece of evidence supplied
by an Expert has been contradicted. It asks the Expert to
modify or retract the evidence, if possible. Many infrared
correlations have known exceptions in specific cases. For example, a
nitro group on a benzene ring raises the expected frequency ranges of
the hydrogen wags. If the presence of a nitro group is known or
suspected, the aromatic wag assignments must be reevaluated.
The EXPOUND message asks the Expert to print out, for the user's
benefit, the reasons supporting a piece of evidence. Each piece of
evidence originated initially in some feature of the data. The
degree of detail supplied in response to this message depends on the
individual Expert. The IR Expert, for example, can report the
infrared bands which were assigned to a particular vibrational mode
of a substructure, as well as possible alternative assignments. The
STIRS Expert reports the incidence of the substructure among the best
hits in different STIRS data classes.
Example: 4-phenyl-2-butanone
Results
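[Figure residue: structure of 4-phenyl-2-butanone (C6H5-CH2CH2C(=O)CH3) annotated with confidence values for its substructures; the numeric labels cannot be reliably reassigned from the extracted text.]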
Why methyl-benzene?
36% POSITIVE:
11% NEGATIVE:
23% because of failure to satisfy
C-Hsym-methyl-benzene-1
IR band 2860-2883 m
(conflict 45%)
[Figure legend residue: false positives; recall; > 45% confidence.]
Reliability(S) = 1 - Number_falsely_reported(S) / Total_number_reported(S)

FP(S) = [Total_number_present(S) * Recall * (1 - Reliability)] / [Total_number_present(NOT S) * Reliability]
This is the probability that a compound which does not contain
substructure S will be incorrectly reported to contain it. For
substructures which occur rarely in the database, the (1 - FP) rate will
be considerably greater than the reliability, and may be misleading.
For example, for the SO2 group (1% of the database), the FP rate was
< 8%, although the reliability was only 25% (Figure 6). That is,
although the program falsely asserted the presence of an SO2 group
(with > 45% CL) only 8% of the time, 3/4 of the assertions of SO2
were incorrect. The latter statistic is probably of more interest to
an analyst trying to evaluate the program's reports. On the other
hand, the FP is a better measure of the raw discriminating power of
the program, since it would presumably be unchanged by changing the
proportion of the target substructure in the database. The two
measures serve different functions, and should both be reported.
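The relationship between the two measures can be checked with a short
calculation. The sketch below is illustrative only: it takes reliability
to be the fraction of assertions of S that are correct (so that 25%
reliability means 3/4 of the assertions are wrong, as in the SO2
example), and the function names, the 1000-compound database, and the
recall of 0.8 are assumptions made for the example, not values from
the chapter.

```python
def false_positive_rate(n_present, n_absent, recall, reliability):
    """Fraction of compounds lacking S that are reported to contain S.

    Assumes reliability = correct assertions of S / all assertions of S.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    correct_reports = n_present * recall             # true assertions of S
    total_reports = correct_reports / reliability    # all assertions of S
    false_reports = total_reports - correct_reports  # incorrect assertions
    return false_reports / n_absent


def reliability_at_prevalence(prevalence, recall, fp_rate):
    """Reliability expected if recall and FP stay fixed but the share of
    compounds containing S (prevalence) changes."""
    true_reports = prevalence * recall
    false_reports = (1.0 - prevalence) * fp_rate
    return true_reports / (true_reports + false_reports)


if __name__ == "__main__":
    # Hypothetical database: 1000 compounds, 10 of which (1%) contain S.
    fp = false_positive_rate(n_present=10, n_absent=990,
                             recall=0.8, reliability=0.25)
    print(f"FP rate: {fp:.3f}")
    # Same recall and FP, but S present in half of the database.
    print(f"Reliability at 50% prevalence: "
          f"{reliability_at_prevalence(0.5, 0.8, fp):.3f}")
```

With these assumed inputs the FP rate is about 2.4% even though three
of every four assertions are wrong, and the same FP rate would give a
reliability near 97% if half of the database contained S. This is the
sense in which FP is independent of the composition of the database
while reliability is not.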
The tradeoff between reliability and recall can be adjusted for
individual functional groups by changing the frequency ranges allowed
for the IR correlations. For some of the functional groups which are
well represented in the EPA library (e.g. esters, alcohols) we have
manually optimized the rule ranges to maximize (3 * Reliability +
Recall). Since the library is known to contain errors, and is skewed
towards the smallest (often anomalous) members of homologous series,
we have not tried to do this for all groups (e.g. SO2). Further
testing on larger libraries will allow further refinements of the IR
rules.
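The optimization criterion can be illustrated with a small search over
candidate frequency ranges. This is a sketch under stated assumptions,
not the procedure used in the chapter (where the ranges were tuned by
hand): a rule is taken to fire when any band falls inside the range, and
the spectra, band positions, and candidate ranges are hypothetical.

```python
def score_range(spectra, lo, hi):
    """Return (recall, reliability, 3 * reliability + recall) for one range.

    spectra: list of (band_positions, has_group) pairs.
    The rule is assumed to fire if any band lies within [lo, hi].
    """
    fired = [(any(lo <= band <= hi for band in bands), has_group)
             for bands, has_group in spectra]
    true_positives = sum(1 for f, g in fired if f and g)
    reported = sum(1 for f, _ in fired if f)
    present = sum(1 for _, g in fired if g)
    recall = true_positives / present if present else 0.0
    reliability = true_positives / reported if reported else 0.0
    return recall, reliability, 3 * reliability + recall


def best_range(spectra, candidates):
    """Pick the candidate (lo, hi) range with the highest weighted score."""
    return max(candidates, key=lambda r: score_range(spectra, *r)[2])


# Hypothetical training set: band positions in cm-1 and whether the
# compound actually contains the functional group of interest.
SPECTRA = [
    ([1740, 1240], True),
    ([1735, 1180], True),
    ([1750], True),
    ([1715], False),
    ([1725, 2950], False),
    ([1600], False),
]
CANDIDATES = [(1700, 1760), (1730, 1755), (1738, 1745)]

if __name__ == "__main__":
    for lo, hi in CANDIDATES:
        print((lo, hi), score_range(SPECTRA, lo, hi))
    print("best:", best_range(SPECTRA, CANDIDATES))
```

Weighting reliability three times as heavily as recall penalizes a wide
range that catches every true case but also fires on neighboring
groups, while still preferring a range that does not miss most of the
true cases.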
Many of the errors observed result from the consistent confusion
of two particular functional groups. For example, although the
presence of a methyl group was erroneously reported (at >45% confidence)
for 30% of the 400 compounds which lack methyl groups, a methyl group
was reported for only 1 of the 33 compounds lacking both CH3 and CH2
groups. Conversely, the presence of a methylene group was never in-
Conclusions
Acknowledgments
Literature Cited
Concerted Organic Analysis of Materials and
Expert-System Development
S. A. Liebman(1), P. J. Duff(1), M. A. Schroeder(1), R. A. Fifer(1), and A. M. Harper(2)
(1) U.S. Army Ballistic Research Laboratory, Aberdeen Proving Ground, MD 21005-5066
(2) Chemistry Department, University of Texas at El Paso, El Paso, TX 79968-0513
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ch028
0097-6156/86/0306-0365$06.00/0
© 1986 American Chemical Society
[Figure: recoverable labels are EXMAT; REASON, a subprogram produced by GRC; ENABLES; EXMATH]
ES #2 INSTRUMENTAL CONFIGURATION/CONDITIONS
ES #3 DATABASE GENERATION
ES #4 DATA TREATMENT
ES #5 DATA RESULTS
ES #6 DATA INTERPRETATION
ES #7 ANALYTICAL GOAL
DECISION:
ANALY STRATEGY
Choices:
GC/SYS1
FTIR/SYS2
MS/SYS3
LC/SYS4
TA/SYS5
EL/SYS6
FACTORS:
SCOPE
Type of Values: Unordered Descriptive Phrases
Values:
SCREEN
TIME/FUND LIMIT
QUAL/QUANT
QUANT
PURITY
VOLATILES
TRACE DETECT
KINETICS
MECHANISM
CORRELATION
R&D
SAMPLE AMT
Type of Values: Linearly-Ordered Descriptive Phrases
Values:
UNLIMITED
GM
MG
MICROGM
TRACE
SAMPLE FORM
Type of Values: Unordered Descriptive Phrases
Values:
POWDER
BULK
SEMISOLID
LIQUID
FILM/LAMIN
FIBER
MULTIMEDIA
Rule 17
If:
SCOPE IS SCREEN
SAMPLE AMT IS GM
SAMPLE FORM IS MULTIMEDIA
SAMPLING PROCESS IS RANDOM
SAMPLE HISTORY IS UNKWN
INSTR. AVAIL IS NO LC
Then:
ANALY STRATEGY IS GC/SYS1(50)
FTIR/SYS2(50)
Rule 18
If:
Rule 19
If:
SCOPE IS QUANT
SAMPLE AMT IS TRACE
SAMPLE FORM IS FILM/LAMIN
SAMPLING PROCESS IS STATIC
SAMPLE HISTORY IS UNKWN
INSTR. AVAIL IS NO METHOD
Then:
ANALY STRATEGY IS MS/SYS3(100)
Rule 20
If:
SCOPE IS TRACE DETECT
SAMPLE AMT IS TRACE
SAMPLE FORM IS FILM/LAMIN
SAMPLING PROCESS IS RANDOM
SAMPLE HISTORY IS DEGRADATION
INSTR. AVAIL IS NO ELEM
Then:
ANALY STRATEGY IS GC/SYS1(20)
MS/SYS3(80)
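The rule listing above can be read as a table of factor values mapped
to weighted choices. The following is a minimal sketch of that reading,
not a description of how TIMM evaluates rules: it assumes exact
matching on every factor and treats the number in parentheses as a
weight attached to each choice. Rule 18 is omitted because its body is
not present in the listing, and the function name `analy_strategy` is
hypothetical.

```python
# Rules 17, 19, and 20 as listed: (conditions, weighted conclusions).
RULES = {
    17: ({"SCOPE": "SCREEN", "SAMPLE AMT": "GM",
          "SAMPLE FORM": "MULTIMEDIA", "SAMPLING PROCESS": "RANDOM",
          "SAMPLE HISTORY": "UNKWN", "INSTR. AVAIL": "NO LC"},
         {"GC/SYS1": 50, "FTIR/SYS2": 50}),
    19: ({"SCOPE": "QUANT", "SAMPLE AMT": "TRACE",
          "SAMPLE FORM": "FILM/LAMIN", "SAMPLING PROCESS": "STATIC",
          "SAMPLE HISTORY": "UNKWN", "INSTR. AVAIL": "NO METHOD"},
         {"MS/SYS3": 100}),
    20: ({"SCOPE": "TRACE DETECT", "SAMPLE AMT": "TRACE",
          "SAMPLE FORM": "FILM/LAMIN", "SAMPLING PROCESS": "RANDOM",
          "SAMPLE HISTORY": "DEGRADATION", "INSTR. AVAIL": "NO ELEM"},
         {"GC/SYS1": 20, "MS/SYS3": 80}),
}


def analy_strategy(case):
    """Return (rule number, weighted choices) for the first rule whose
    conditions all match the case exactly, or (None, {}) if none match."""
    for number, (conditions, conclusion) in RULES.items():
        if all(case.get(factor) == value
               for factor, value in conditions.items()):
            return number, conclusion
    return None, {}


if __name__ == "__main__":
    case = {"SCOPE": "TRACE DETECT", "SAMPLE AMT": "TRACE",
            "SAMPLE FORM": "FILM/LAMIN", "SAMPLING PROCESS": "RANDOM",
            "SAMPLE HISTORY": "DEGRADATION", "INSTR. AVAIL": "NO ELEM"}
    print(analy_strategy(case))
```

Exact matching is the simplest possible choice here; a case that differs
from every rule in a single factor returns no strategy, which is one
reason a practical system would need some form of partial matching.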
DECISIONS
EXPTL CONFIG
Choices:
GCSYS1/A
GCSYS1/AEC
FTIRSYS2/D
FTIRSYS2/ABCD
MSSYS3/E
MSSYS3/ABCE
LCSYS4/FIK
LCSYS4/GJK
LCSYS4/FIL
LCSYS4/GJL
TASYS5/M
TASYS5/N
TASYS5/O
TASYS5/P
ELSYS6/Q
ELSYS6/R
FACTORS:
ANALY STRATEGY
Type of Values: Unordered Descriptive Phrases
Values:
GC/SYS1
FTIR/SYS2
MS/SYS3
LC/SYS4
TA/SYS5
EL/SYS6
GC CONFIG
Type of Values: Unordered Descriptive Phrases
Values:
DIRECT GC/FID/TCD
DIRECT GC/FID/NPD
DHS/FID/TCD
DHS/FID/NPD
PGC/FID/TCD
PGC/FID/NPD
DHS/PGC/FID/TCD
DHS/PGC/FID/NPD
FTIR CONFIG
Type of Values: Unordered Descriptive Phrases
Values:
DIRECT
MICROSAMPLING
DRIFT
ATR
VARIABLE T
DHS/FTIR
GC-FTIR
DHS/GC-FTIR
PGC-FTIR
DHS/PGC-FTIR
MS CONFIG
Type of Values: Unordered Descriptive Phrases
Values:
RIC
SIM
PYROL/MS
DUG/MS
GC-MS/P1D
Summary
[Figure: EXAMPLE-ALGORITHM BUILDING-EXSPDS. Sample means; target or
hypothesis; reconstruction of measurement information matrix to reflect
correlations. Why? (1) Deconvolution of components in mixtures;
(2) hypothesis tests on interpretation of responses.]
EXMAT
A LINKED NETWORK OF EXPERT SYSTEMS
WITH PATTERN RECOGNITION AND SEARCH PROGRAMS
FOR MATERIALS CHARACTERIZATION

COMPONENTS AND ATTRIBUTES

1. DATABASE MANAGEMENT
A. Storage of parameters and data on samples for selected instrumental techniques
B. Retrieval of selected samples forming a data set formatted for multivariate analysis
C. Create, add, delete, help, and show functions

2. EXPERT SYSTEMS AND EMBEDDING SUBPROGRAMS - TIMM
B. Embedding of TIMM system within user programs
C. Capable of handling metric and non-metric information
D. Heuristic design

[Component 3: heading and first attributes not recoverable; surviving fragment: "evaluation of results"]
D. User intervention for database modifications
E. Implementable as jackknifing procedure

4. [Heading illegible; surviving fragments: "spectral", "match algorithms", "partial interpretation"]
A. PAIRS infrared spectra

Figure 7. Continued.
[Figure: Laboratory automation using expert systems drivers. Problem
statement or hypothesis; measurements; solution or data. Expert systems
and experimental design drive Instrument 1, Instrument 2, ... Instrument N,
each with control, optimization, preprocessing, and interpretation stages.]
Acknowledgments
Literature Cited
Corbett, Michael, 244
Cornelius, Richard, 125
Cross, K. P., 321
Curry, Bo, 350
Delaglio, Frank, 337
Dolata, Daniel P., 188
Dudewicz, Edward J., 337
Duff, P. J., 365
Edelson, David, 119
Ehrlich, Steven, 244
Enke, C. G., 321
Evens, Martha, 244
Ferrin, Thomas E., 147
Fifer, R. A., 365
Garfinkel, David, 75
Garfinkel, Lillian, 75
Gasteiger, J., 258
Giordani, A. B., 321
Gough, Alice, 244
Gregg, H. G., 321
Griffith, Owen Mitch, 297
Hahn, Mathew A., 136
Hansch, Corwin, 147
Harner, Teresa J., 337
Harper, A. M., 365
Hawkinson, Lowell B., 69
Heffron, Matt, 297
Hemphill, Charles T., 231
Herndon, William C., 169
Hoffman, P. A., 321
Knickerbocker, Carl G., 69
Kulikowski, Casimir A., 75
Kumar, Anil, 337
Langridge, Robert, 147
LaRoe, William D., 231
Levinson, Robert A., 209
Levy, George C., 337
Liebman, S. A., 365
Low, P., 258
Martz, Philip R., 297
Moore, Robert L., 69
Moseley, C. Warren, 231
Palmer, P. T., 321
Pavelle, Richard, 100
Renkes, Gordon D., 176
Riese, Charles E., 18
Sailer, H., 258
Schroeder, M. A., 365
Smith, Allan L., 111
Smith, Dennis H., 1
Smith, Graham M., 312
Soo, Von-Wun, 75
Stuart, J. D., 18,31
Tomellini, Sterling A., 312
Trindle, Carl, 159
Wang, Tunghwa, 244
Wilcox, Craig S., 209
Wipke, W. Todd, 136,188
Woodruff, Hugh B., 312
Subject Index
Abstraction, 189
Actinospectacin
  digitized spectrum, 315,317t
  PAIRS interpretation, 315,318t
  structure, 315-316
Actinospectacin—Continued
  trace of sulfone functionality during PAIRS interpretation, 315,318f
Actions, definition, 94,95t,96
Agricultural formulations
  requirements, 87
  structure of the expert system, 89,91-97
Declarative languages
  characteristics, 112
  description, 112
Definite clause grammar (DCG), 232-233
Definite integration, application of MACSYMA, 107
Di-n-octyl phthalate
  daughter spectrum of C-containing M+, 333,334f,335
  mass spectrum, 328,329f
  match of 105+ daughter spectra vs. di-n-octyl phthalate, 328t,330f
  match of 149 daughter spectra vs. di-n-octyl phthalate, 331t,332f
  parent spectrum of mass 149, 331,333,334f
  spectrum-substructure correlations, 331
  structures, 328,329f
Diagnosis
  definition, 56
  expert system, 57
Diels-Alder reactions
  algorithm for regiochemical selection, 238
  basic frontier molecular orbital theory, 234
  basic highest occupied molecular orbital-lowest unoccupied molecular orbital calculations, 235-236
  determination of permutated lowest unoccupied molecular orbital coefficients, 237-238
  determination of substituent effects, 236-237t
  disconnection approach, 231
  general form derivation, 239,240t
  grammar, 233-234
  naive approach derivation, 238,239t
  notation rearrangement, 241-242
  structural constraints on reactants, 235
  use of general form in rule formation, 240-241
  assistance team
Elaboration of reactions for organic synthesis (EROS), reaction schemes, 259,261f
Emulsifiable concentrate, description, 88
EROS—See Elaboration of reactions for organic synthesis
Example set, corona rule, 20-22f
Examples of expert-system applications
  biological reactors, 9,10f
  communication satellites, 9,11,12f
  space stations, 11,13-15
Execution efficiency
  radial, 24-25
  real-time application of expert systems, 69
EXMAT—See Linked network of expert systems for materials analysis
EXMATH—See Expert system for pattern recognition
Expert
  experimental design with PENNZYME, 81-82
  fitting of models to data, 80
  selection of a computational model, 80
  selection of a conceptual model, 80
Expert chromatographic assistance team (ECAT)
  automatic testing, 288
  development equipment, 283,285,287f
  development of knowledge bases, 285-286
  elements involved in development and application, 280,281f
  examples of facts and rules, 283,284f
  expert system programming, 279-280
  first rules, 286
  IF-THEN rules, 286,287f,288
  knowledge representation, 294-295
  limitations of conventional programming, 279
  module development, 292-294
ECAT—Continued
  project motivation, 279
  system strategy, 280,283
  task modules, 280,282f
  user interfaces, 295
Expert system
  definition, 56,279-280
  examples of applications, 29
  formulation of agricultural chemicals, 87-97
  major components, 3
  parts, 56
  TOGA, 20-21
Expert system for agricultural formulations
  accessing external software, 93
  future developments, 96-97
  response checking functions, 92t
Expert systems—Continued
  diagnosis of plant conditions, real-time application of, 69-70
  Diels-Alder reactions, 231-242
  execution efficiency, real-time application of, 69
  hardware technology revolution, 13
  high-performance liquid chromatographic methods developments, 278-295
  knowledge extraction, 27-28
  MS-MS data, 321-335
  NMR spectroscopy, 337-348
  organic chemistry, 258-274
  organic structure determination, 350-363
  organic syntheses, 244-257
  programs for chemistry, 280
Publication Date: April 30, 1986 | doi: 10.1021/bk-1986-0306.ix002
Generic rules
  description, 153
  hydrophobicity examples, 153-154,155-156f
GENOA—See Generation of molecular structures
GEORGE, 126-127,128f
  comparison to other programs, 126
  diagram for determination of aniline molarity, 121,129,132f
  diagram for determination of ethanol density, 129,132f
  display of unit conversion, 129,130f
  domain, 126
  example of a relation page, 131,132f
  extension of the domain of application, 133
  levels of use, 127-132
KARMA—Continued
  interactions for enzyme-ligand binding, 152
  knowledge, 152
  molecule editor, 148,150f
  pop-up menus, 148,150f
  rule formulation, 153
  specific rules, 156-157
  system core, 151-157
  system design, 148,149f
  system implementation, 148,151
KarmaData, description, 152
KEE-assisted receptor mapping analysis
  description, 147-148
  difference from traditional approach to drug design, 147-148
Knowledge, manipulation for use in computer programs, 2
IF-THEN rules, rule-based expert systems, 3
Incremental multivalued logic
  description, 199-200
  implication, 201
  incremental acquisition of evidence, 200-201
Indefinite integration, application of MACSYMA, 107
Intuitive theory, definition, 194

K

KARMA
  generic rules, 153-156
  graphic interface, 157

L

Languages, programming, vs. programming environments, 6
Linked network of expert systems for materials analysis (EXMAT)
  analytical goals, 375
  chemometric-search algorithms, 375
  data generation, 367
  documentation and evaluation of results, 375
  expert system network, 368,370f
  individual systems, 367
  instrumental configuration and conditions, 368
  interpretation, 375
  outline, 376,379-381f
N

Necessity, definition, 58
Nuclear magnetic resonance spectroscopic analysis, systems, 337

O

Once-through boiler system
  cation conductivity sensor malfunction, 62t,64f,65
  description, 60,61t
  diagnoses, 60-61
  number diagnosed for each sensor, 60,61t
  sensor validation, 61-62,63f
Organic structure determination
  accessibility to knowledge base and reasoning process, 352
  chemical data base, 355,356f
  example for 4-phenyl-2-butanone, 356-358
  flow chart, 350,353f
  interpretation of spectra, 352,353f
  IR expert module, 355
  messages, 355-356
  program description, 354-355
  recall, 360
  reliability, 360
  testing of known structures, 357-362

P

PAIRS—See Program for the analysis of infrared spectra
Pattern recognition programs, development, 366
  spectra, 356-357,358f
Physicochemical parameters, examples, 151-152
PICON—See Process intelligent control
Planning, 189
Polarizability, definition, 263
Polynomial equations, applications, application of MACSYMA, 103-104
Postulates, definition, 194-195
Power plant
  malfunction definition, 53
  schematic, 53,54f
  types of boilers, 53
Power plant chemistry
  dependence on boiler, 53,55
  problems, 55
Predicate, definition, 193
Predicate calculus
  formal symbols used QED, 192t
  logic, 190-192
  translation of chemical statements into predicate logic, 192
  working definition, 192
Predicates, definition, 93-94,95t
Problem solving and inference engine, expert systems, 3
Procedural languages
  characteristics, 111-112
  definition, 111
  steps in algebraic equation solving, 113
Process control, real-time expert system for, 69-74
Process intelligent control (PICON)
  backward-chaining inference, 70-71
  design requirements, 70
  example of inference, 73
  focus facility, 71,73
  forward-chaining inference, 70-71
  overall structure of package, 74f
  system for process control, 71,72f
QED program
  agenda list, 204-205
  analysis example, 205,206f,207
  block diagram, 201,202f
  BNF grammar for language, 203f
  compilation process for rules, 202f
  data base, 204
  description, 201-202
  internal form of ALPHA-TO-SC, 204t
  parse tree, 203f
  rule passing, 202
  rules, 205

R

Radial
  discussion, 20-21,23
  error detection at building time, 24-25
  execution efficiency, 24-25
  interfacing software, 23
  language features, 21
  similarities to Pascal and ADA, 21
Reaction rule data base
  connection tables organization, 250-251
  Gelernter reaction rule, 247,249f
  multistep rules, 250-251
  single-step rules, 250-251
Reaction rule translation into clauses
  clause representation of goal and subgoal, 251
  variable substructure molecule vs. known molecule, 251-252
Reactivity space approach
  cluster analysis, 270,272f,273
Reasoning
  symbolic application appropriate to expert systems, 8
  use in problem solving, 3
Rule-based system for spectra classification, 351
Rule-based systems, definition, 306
RuleMaker
  inductive learning, 20-21
  knowledge acquisition system, 20
RuleMaster
  C-code generation, 24
  expert systems, 18-29
  explanation of the line of reasoning, 23
  external processes, 23-24
  history, 19
  knowledge extraction, 27-28
  portability, 25
  programming skills required, 28-29
  two principal components, 20,21,23

S

Scientific and engineering applications, expert systems, 5-6
SECS—See Simulation and evaluation of chemical synthesis program
Self-organized knowledge base for organic chemistry
  calculation of generalization validities, 217-218
  interactive sessions, 219,220-223f
  reaction generalizations based on specific observations, 212,214,215-216f
  reaction representation, 211-212,213f
Sentential calculus, description, 195
Similarity of molecules
  aliphatic alcohols, 169,170,171f
  problems, 120-121
  data structures, 122
  equipment, 121
  input language, 121-122
  mathematical problem, 120
  program output, 122-123
  syntax analysis, 122
Software engineering, traditional, differences, expert systems, 7
Software for scientific computation, review, 111-112
Specific rules
  description, 154
  examples, 154,156f,157
Spectrum-substructure relationships
  example for di-n-octyl phthalate, 328-333
  procedure, 326,328
SpinPro ultracentrifugation expert system
  backward-chaining inference engine, 306-307,308f
  calculation function, 309
  consultation function, 299
  description, 298
  design inputs report, 301-302
  development, 309-310
  information function, 307,308f
  lab plan report, 306
  lab rotors, 300-301
  major functions, 298-299
  methods, 204
  operation, 299-300
  optimal plan report, 301,303f,304
  optimization criteria, 300
  plan comparison report, 301,303f,306
  protein sample separation, 304-305
  user interface, 299
  vs. expert, 310-311
Steam power plant, downtime, 52
Sufficiency, definition, 57-58
Symbolic programs for group theory
  advantages, 176-177
  application appropriate to expert systems, 8
  use in problem solving, 3
Synthesis planning programs
  approaches to large search spaces, 189-190
  complexity of synthesis tree, 189,191f
  first-order predicate calculus, 190,192
  problems, 189
  procedure, 188-189
  strategic basis, 189
  symmetry-based strategy for β-carotene, 190,191f
Synthesis with LMA (SYNLMA)
  advantage, 244-245
  definition, 244
  improvements, 256-257
  process, 245
  reaction rule data base, 247-251
  synthetic design process, 253-256
  translation of reaction rules into clauses, 251,252f
Synthetic design process using SYNLMA
  problem-solving tree for synthesis of darvon, 253,254-255f
  two-tree system, 253,256

T

Taylor-Laurent series, application of MACSYMA, 108-109
The intelligent machine model (TIMM)
  decision and control structure, 368,369f
  sections, 367
Time-ordered presentation of fired rules, proof ordering, 23
TIMM—See The intelligent machine model
TK Solver
  acid rain example, 115-116f,117
  capabilities, 113
  chemical applications, 117-118
  computational approach, 112-113
  definition, 112
  van der Waals gas example, 113,11^,115
Tools, used in constructing expert systems description, 6
Transformer fault diagnosis, expert system for, TOGA, 25-29
Transformer oil gas analysis (TOGA)
  expert system, 20-21
  expert system for transformer fault diagnosis, 25-29
  diagnostic approach, 25
  knowledge extraction, 27-28
  knowledge refinement, 28
  operational use, 27
  reasons for building the system, 26
  validation, 26-27

U

Ultracentrifugation, problems, 297

V

Validity, aid in precursor generation, 218
Variance-covariance matrix of parameters, calculation by PENNZYME, 82

W

Wettable powders, description, 88
Wiswesser line notation (WLN), background, 232