Artificial Intelligence
By
Dr. Zafar M. Alvi
Artificial Intelligence (CS607)
Table of Contents:
Lecture # 1
1 Introduction .................................................................................................................................................. 4
1.1 What is Intelligence? ......................................................................................................................... 4
1.2 Intelligent Machines .......................................................................................................................... 7
Lecture # 2
1.3 Formal Definitions for Artificial Intelligence .................................................................................. 7
Lecture # 3
1.4 History and Evolution of Artificial Intelligence .............................................................................. 9
1.5 Applications ..................................................................................................................................... 13
1.6 Summary .......................................................................................................................................... 14
Lecture # 4
2 Problem Solving ........................................................................................................................................ 15
2.1 Classical Approach .......................................................................................................................... 15
2.2 Generate and Test ............................................................................................................................ 15
2.3 Problem Representation ................................................................................................. 16
2.4 Components of Problem Solving .................................................................................................... 17
Lecture # 5
2.5 The Two-One Problem .................................................................................................................... 18
2.6 Searching .......................................................................................................................................... 21
2.7 Tree and Graphs Terminology ........................................................................................................ 21
2.8 Search Strategies .............................................................................................................................. 23
Lecture # 6
2.9 Simple Search Algorithm ................................................................................................................ 24
2.10 Simple Search Algorithm Applied to Depth First Search ............................................................. 25
Lecture # 7
2.11 Simple Search Algorithm Applied to Breadth First Search .......................................................... 28
2.12 Problems with DFS and BFS .......................................................................................................... 32
2.13 Progressive Deepening .................................................................................................................... 32
Lecture # 8
2.14 Heuristically Informed Searches ..................................................................................................... 37
2.15 Hill Climbing ................................................................................................................................... 39
2.16 Beam Search .................................................................................................................................... 43
Lecture # 9
2.17 Best First Search .............................................................................................................................. 45
2.18 Optimal Searches ............................................................................................................................. 47
2.19 Branch and Bound ........................................................................................................................... 48
2.20 Improvements in Branch and Bound .............................................................................................. 55
2.21 A* Procedure ................................................................................................................................... 56
2.22 Adversarial Search ........................................................................................................................... 62
Lecture # 10
2.23 Minimax Procedure ......................................................................................................................... 63
2.24 Alpha Beta Pruning ......................................................................................................................... 64
2.25 Summary .......................................................................................................................................... 71
2.26 Problems ........................................................................................................................................... 72
Lecture # 11, 12
3 Genetic Algorithms ................................................................................................................................... 76
3.1 Discussion on Problem Solving ...................................................................................................... 76
3.2 Hill Climbing in Parallel ................................................................................................................. 76
3.3 Comment on Evolution .................................................................................................. 77
3.4 Genetic Algorithm ........................................................................................................................... 77
3.5 Basic Genetic Algorithm ................................................................................................................. 77
3.6 Solution to a Few Problems using GA ........................................................................................... 77
Lecture # 13
3.7 Eight Queens Problem ..................................................................................................................... 82
3.8 Problems ........................................................................................................................................... 88
Lecture # 14
© Copyright Virtual University of Pakistan
4 Knowledge Representation and Reasoning .............................................................................................. 89
4.1 The AI Cycle .................................................................................................................................... 89
4.2 The dilemma .................................................................................................................................... 90
4.3 Knowledge and its types ................................................................................................................. 90
4.4 Towards Representation .................................................................................................................. 91
4.5 Formal KR techniques ..................................................................................................................... 93
4.6 Facts ................................................................................................................................................. 94
4.7 Rules ................................................................................................................................................. 95
Lecture # 15
4.8 Semantic networks .......................................................................................................................... 97
4.9 Frames .............................................................................................................................................. 98
4.10 Logic................................................................................................................................................. 98
Lecture # 16, 17
4.11 Reasoning .......................................................................................................................................102
4.12 Types of reasoning ........................................................................................................................102
Lecture # 18
5 Expert Systems ........................................................................................................................................111
Lecture # 39
7.13 LEARNING: Connectionist ..........................................................................................................181
7.14 Biological aspects and structure of a neuron ..............................................................................181
7.15 Single perceptron ...........................................................................................................................182
7.16 Linearly separable problems .........................................................................................................184
7.17 Multiple layers of perceptrons ......................................................................................................186
Lecture # 40
7.18 Artificial Neural Networks: supervised and unsupervised .......................................................... 187
7.19 Basic terminologies .......................................................................................................................187
7.20 Design phases of ANNs ................................................................................................................188
7.21 Supervised ......................................................................................................................................190
Lecture # 41
7.22 Unsupervised .................................................................................................................................190
7.23 Exercise ..........................................................................................................................................192
Lecture # 42
8 Planning ...................................................................................................................................................195
8.1 Motivation ......................................................................................................................................195
8.2 Definition of Planning ...................................................................................................................196
8.3 Planning vs. problem solving ........................................................................................................197
8.4 Planning language..........................................................................................................................197
8.5 The partial-order planning algorithm – POP ................................................................................198
8.6 POP Example .................................................................................................................................199
8.7 Problems .........................................................................................................................................202
Lecture # 43
9 Advanced Topics .....................................................................................................................................203
9.1 Computer vision ............................................................................................................................203
Lecture # 44
9.2 Robotics .........................................................................................................................................204
9.3 Clustering .......................................................................................................................................205
Lecture # 45
10 Conclusion ..........................................................................................................................................206
Artificial Intelligence
Lecture # 1
Introduction to Artificial Intelligence
1 Introduction
This booklet is organized as chapters that elaborate on various concepts of Artificial
Intelligence. The field itself is an emerging area of computer science, and a great deal
of work is underway to mature its concepts.
In this booklet we will nevertheless try to cover some important aspects and basic
concepts which will help the reader gain an insight into the type of topics that
Artificial Intelligence deals with.
So far we have used the name of the field, i.e. Artificial Intelligence (commonly
referred to as AI), without any explanation of the name itself. Let us now look into a
simple but comprehensive way to define the field.
To define AI, let us first try to understand what intelligence is.
If you were asked a simple question, "How can we define intelligence?", many of
you would feel you know exactly what it is, but most of you would not be able to
define it. Is it something tangible? We all know that it exists, but what actually is it?
Some of us will attribute intelligence to living beings and would be of the view that
all living species are intelligent. But what about plants and trees? They are
living species, but are they also intelligent? So can we say that intelligence is a trait
of some living species? Let us try to understand the phenomenon of intelligence by
using a few examples.
Consider the following image, where a mouse is trying to search a maze in order
to find its way from the bottom left to the piece of cheese in the top right corner of the
image.
This can be considered a common real-life problem, one which we deal with
many times in our lives, i.e. finding a path, maybe to a university, to a friend's house,
to a market, or in this case to the piece of cheese. The mouse tries various paths,
as shown by the arrows, and can reach the cheese by more than one path. In other
words, the mouse can find more than one solution to this problem. The mouse was
intelligent enough to find a solution to the problem at hand. Hence the ability of
problem solving demonstrates intelligence.
1, 3, 7, 13, 21,
If you were asked to find the next number in the sequence, what would be your
answer? Just to help you out, let us solve it for you. The rule is "add the next even
number to the previous term", i.e. if we add 2 to 1 we get 3, then we add 4 to 3 and
get 7, then we add 6 to 7 and get 13, then we add 8 to 13 and get 21, and finally if we
add 10 to 21 we get 31 as the answer. Again, answering the question requires a little
bit of intelligence. The characteristic of intelligence comes in when we try to solve
something: we check various ways to solve it, we check different combinations, and
many other things to solve different problems. All this thinking, this memory
manipulation capability, this numerical processing ability, and a lot of other things add
to one's intelligence.
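The "add the next even number" rule just described can be captured in a few lines of code. This is only an illustrative sketch; the function name and structure are our own, not part of any standard library.

```python
# Sketch of the rule above: each term is obtained by adding the next
# even number (2, 4, 6, ...) to the previous term.
def sequence(n):
    """Return the first n terms: 1, 3, 7, 13, 21, 31, ..."""
    terms = [1]
    step = 2
    while len(terms) < n:
        terms.append(terms[-1] + step)  # add the next even number
        step += 2
    return terms

print(sequence(6))  # [1, 3, 7, 13, 21, 31]
```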
All of you have experienced college life. It was very easy for us to look at the
timetable and go to the respective classes to attend them, without even caring how
that timetable was actually developed. In simple cases developing such a timetable
is easy. But in cases where we have hundreds of students studying in different
classes, and we have only a few rooms and limited time to schedule all those classes,
this gets tougher and tougher. The person who makes the timetable has to look into
all the time schedules, the availability of the teachers, the availability of the rooms,
and many other things to fit all the items correctly within a fixed span of time. He has
to evaluate many conditions, like "IF room A is free AND teacher B is ready to
take the class AND the students of the class are not studying any other course at that
time THEN the class can be scheduled". This is a fairly simple one; things get
complex as we add more and more parameters, e.g. if we were to consider that
teacher B might teach more than one course and might prefer to teach in
room C, and many other things like that. The problem gets more and more complex.
We are pretty much sure that none of us had ever realized the complexity
our teachers go through while developing these schedules for our classes.
However, as we know, such timetables can be developed. All this information has to
reside in the developer's brain; his intelligence helps him to create such a
schedule. Hence the ability to think, plan and schedule demonstrates intelligence.
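The scheduling condition quoted above can be sketched as a simple boolean rule. All names and parameters here are hypothetical, chosen only to mirror the discussion; a real timetabling system would handle many more constraints.

```python
# Hypothetical version of the rule: a class can be scheduled only if the
# room is free, the teacher is available, and the students have no
# clashing course at that time.
def can_schedule(room_free, teacher_available, students_free):
    return room_free and teacher_available and students_free

print(can_schedule(True, True, True))   # all conditions hold: schedule it
print(can_schedule(True, False, True))  # teacher busy: cannot schedule
```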
Consider a doctor: he checks many patients daily, diagnoses their diseases, gives
them medicine, and prescribes behaviors that can help them get cured.
Let us think a little and try to understand what he actually does. Though
checking a patient and diagnosing the disease is much more complex, we'll try
to keep our discussion very simple and will intentionally leave things out of this
discussion.
A person goes to the doctor and tells him that he is not feeling well. The doctor asks
a few questions to clarify the patient's situation and takes a few
measurements to check the physical status of the person. These measurements
might include the temperature (T), blood pressure (BP), pulse rate (PR), and
things like that. For simplicity, let us consider a doctor who only checks these
measurements and tries to come up with a diagnosis of the disease. He takes these
measurements and, based on his previous knowledge, tries to diagnose the
disease. His previous knowledge is based on rules like: "If the patient has a high BP
and normal T and normal PR then he is not well", "If only the BP is normal,
then whatever the other measurements may be, the person should be healthy", and
many other such rules.
The key thing to notice is that by using such rules the doctor can classify a person
as healthy or ill, and can prescribe different medicines, using the
information observed from the measurements together with his previous knowledge.
Diagnosing a disease involves much other complex information and
many other observations; we have just presented a very simple case here. However,
the doctor is essentially solving a problem of diagnosis, having looked at
some specific measurements. It is important to note that a doctor who has
a better memory for storing all this precious knowledge, and a better ability to
retrieve the correct portion of the knowledge for the correct patient, will be better
able to classify a patient. This tells us that correct and efficient storage and
manipulation of memory and information also count towards one's
intelligence.
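As a rough illustration, the two diagnosis rules quoted above might be encoded as follows. The rule set and its encoding are invented for this example and are far simpler than real medical knowledge; they are not a diagnostic procedure.

```python
# Toy encoding of the two rules from the text. Each measurement is
# reduced to a boolean "is it normal?" flag for simplicity.
def diagnose(bp_normal, t_normal, pr_normal):
    if not bp_normal and t_normal and pr_normal:
        return "not well"   # high BP with normal T and PR
    if bp_normal:
        return "healthy"    # normal BP dominates the other readings
    return "unknown"        # no rule fires

print(diagnose(bp_normal=False, t_normal=True, pr_normal=True))   # not well
print(diagnose(bp_normal=True, t_normal=False, pr_normal=False))  # healthy
```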
Things are not all that simple. People don't think about problems in the same
manner. Let us give you an extremely simple problem: just tell us about your
height. Are you short, medium or tall? An extremely easy question! Well, you might
think that you are tall, but your friend who is taller than you might say, no,
you are not. The point is that some people might have a distribution in
their mind in which people around 4 ft are short, around 5 ft are medium,
and around 6 ft are tall. Others might have a distribution in which people
around 4.5 ft are short, around 5.5 ft are medium, and around 6.5 ft are tall.
Even with the same measurements, different people can reach completely
different results, as they approach the problem in different ways. Things can be
even more complex when the same person, having observed the same
measurements, solves the same problem in two different ways and reaches
different solutions. Yet we all know that we answer such fuzzy questions very
efficiently in our daily lives; our intelligence actually helps us do this. Hence the
ability to tackle ambiguous and fuzzy problems demonstrates intelligence.
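The idea that different people carry different personal thresholds for "short", "medium" and "tall" can be sketched as follows. The cutoff values are illustrative only; the point is that the same measurement yields different labels under different thresholds.

```python
# Two observers classify the same height using their own thresholds,
# so the same measurement can yield different answers.
def classify(height_ft, short_below, tall_above):
    if height_ft < short_below:
        return "short"
    if height_ft > tall_above:
        return "tall"
    return "medium"

h = 5.8
print(classify(h, short_below=4.5, tall_above=5.5))  # "tall" to one observer
print(classify(h, short_below=5.0, tall_above=6.0))  # "medium" to another
```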
Can you recognize a person just by looking at his or her fingerprint? Though we all
know that every human has a distinct fingerprint pattern, just by looking at a
fingerprint image a human generally can't tell that this print must be of person
XYZ. On the other hand, having a distinct fingerprint is really important information,
as it serves as a unique ID for every human in this world.
Let us consider 5 different people and ask a sixth one to look at different
images of their fingerprints. We ask him to somehow learn the patterns which make
the five prints distinct in some manner. After having seen the images several
times, that sixth person might find something that makes the prints distinct:
things like one of them having fewer lines in the print, another having sharply curved
lines, some having a larger distance between the lines in the print, some
having a smaller displacement between the lines, and many other such features. The
point is that after some time, which may be hours or days or maybe even
months, that sixth person will be able to look at a new fingerprint of one of those five
persons and, with some degree of accuracy, recognize which one
among the five it belongs to. Even with only 5 people the problem was hard to solve.
His intelligence helped him to learn the features that distinguish one fingerprint from
another. Hence the ability to learn and recognize demonstrates intelligence.
Let us give one last thought and then get to why we have discussed all this. A lot
of us regularly watch television. Consider that you switch off the volume of your
TV set. If you are watching a VU lecture, you will somehow perceive that the person
standing in front of you is not singing a song, anchoring a musical show, or playing
some sport. So just by observing the sequence of images of the person, you are
able to perceive meaningful information from the video. Your intelligence helped you
to perceive and understand what was happening on the TV. Hence the ability to
understand and perceive demonstrates intelligence.
Now, if a machine could demonstrate all of the abilities discussed above, we would
have to call such a machine intelligent. Is this real or natural intelligence?
No! This is Artificial Intelligence.
Lecture # 2
Different Definitions of Artificial Intelligence
To make computers think like humans, we first need to devise ways to determine
how humans think. This is not easy; for this we need to get inside the actual
functioning of the human brain. There are two ways to do this:
Introspection: trying to catch our own thoughts as they go by.
Psychological experiments: the scientific study of mental life.
Once we succeed in developing some sort of comprehensive theory of how
humans think, only then can we come up with computer programs that follow the
same rules. The interdisciplinary field of cognitive science brings together computer
models from AI and experimental techniques from psychology to construct testable
theories of the workings of the human mind.
The issue of acting like humans comes up when AI programs have to interact
with people, or when they have to do something physically that humans usually
do in real life: for instance, when a natural language processing system carries on a
dialog with a person, when some intelligent software gives out a medical diagnosis,
or when a robotic arm sorts manufactured goods on a conveyor belt, and
many other such scenarios.
Keeping in view all the above motivations, let us give a fairly comprehensive
comment: Artificial Intelligence is an effort to create systems that can learn, think,
perceive, analyze and act in the same manner as real humans.
People have also looked at the phenomenon of Artificial Intelligence
from a different viewpoint. They call this strong and weak AI.
Strong AI means that machines act intelligently and have real conscious minds.
Weak AI says that machines can be made to act as if they were intelligent. That is,
weak AI treats the brain as a black box and just emulates its functionality, while
strong AI actually tries to recreate the functions of the inside of the brain, as opposed
to simply emulating behavior.
The concept can be explained by an example. Consider that you have a very
intelligent machine that performs a lot of tasks with a lot of intelligence. On the other
hand you have a fairly simple species, e.g. a cat. If you throw both of them into a pool
of water, the cat will try to save its life and will swim out of the pool. The "intelligent"
machine would drown without any effort to save itself. The cat had strong
intelligence; the machine didn't. If the machine had strong artificial intelligence, it
would have used its knowledge to deal with this totally new situation in its
environment. But the machine only knew what we taught it, or in other words only
knew what was programmed into it. It never had the inherent capability of intelligence
which would have helped it to deal with this new situation.
Most researchers are of the view that strong AI can't actually ever be created,
and that whatever we study and understand while dealing with the field of AI is
related to weak AI. A few are also of the view that we can get to the essence of strong
AI as well. It remains a standing debate, but the purpose here was to introduce you to
another aspect of thinking about the field.
Lecture # 3
History and Applications of Artificial Intelligence
1.4 History and Evolution of Artificial Intelligence
AI is a young field. It has inherited its ideas, concepts and techniques from many
disciplines like philosophy, mathematics, psychology, linguistics, biology etc. From
over a long period of traditions in philosophy theories of reasoning and learning have
emerged. From over 400 years of mathematics we have formal theories of logic,
probability, decision-making and computation. From psychology we have the tools
and techniques to investigate the human mind and ways to represent the resulting
theories. Linguistics provides us with the theories of structure and meaning of
language. From biology we have information about the network structure of a human
brain and all the theories on functionalities of different human organs. Finally from
computer science we have tools and concepts to make AI a reality.
McCulloch and Pitts (1943) proposed an artificial model of the human neuron. Their
model treated a human neuron as a bi-state element, i.e. on or off, with the state of
the neuron depending on its response to stimulation by a sufficient number of
neighboring neurons. They showed, for example, that some network of connected
neurons could compute any computable function, and that all the logical connectives
can be implemented by simple net structures. They also suggested that suitably
connected networks can learn, but they didn't pursue this idea much at that time.
Donald Hebb (1949) demonstrated a simple updating rule for modifying the
connection strengths between neurons, such that learning could take place.
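A minimal sketch of such a bi-state threshold neuron is given below, showing how suitable weights and thresholds implement the logical connectives. The particular weights and thresholds are just one of many possible settings, chosen for illustration.

```python
# A McCulloch-Pitts style bi-state neuron: it fires (outputs 1) when the
# weighted sum of its inputs reaches the threshold, and outputs 0 otherwise.
def mcp_neuron(inputs, weights, threshold):
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Logical connectives as single threshold units (illustrative settings):
AND = lambda a, b: mcp_neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: mcp_neuron([a, b], [1, 1], threshold=1)
NOT = lambda a:    mcp_neuron([a],    [-1],   threshold=0)

print(AND(1, 1), AND(1, 0))  # 1 0
print(OR(0, 1),  OR(0, 0))   # 1 0
print(NOT(0),    NOT(1))     # 1 0
```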
1.4.2 The name of the field: "Artificial Intelligence"
In 1956 some U.S. researchers got together and organized a two-month
workshop at Dartmouth. There were altogether only 10 attendees. Allen Newell
and Herbert Simon dominated the workshop. Although all the
researchers had some excellent ideas, and a few even had demo programs
like checkers, Newell and Simon already had a reasoning program, the Logic
Theorist, which came up with proofs for logic theorems. The Dartmouth
workshop didn't lead to any new breakthroughs, but it did introduce all the major
people who were working in the field to each other. Over the next twenty years these
people, their students, and their colleagues at MIT, CMU, Stanford and IBM
dominated the field of artificial intelligence. The most lasting and memorable thing
that came out of that workshop was an agreement to adopt a new name for the field:
Artificial Intelligence. So this was when the term was actually coined.
The early years of AI were full of successes, and many of the first programs
met great success. Newell and Simon's early success was followed up with
the General Problem Solver. Unlike the Logic Theorist, this program was developed
to attack a problem by imitating the steps that humans take when
solving it. Though it catered for only a limited class of problems, it was found
that it addressed those problems in a way very similar to humans. It was
probably the first program to imitate the human thinking approach.
1.4.5 Microworlds
Marvin Minsky (1963), a researcher at MIT, supervised a number of students who
chose limited problems that appeared to require intelligence to solve. These
limited domains became known as microworlds. Some of the students developed
programs that solved calculus problems; some developed programs which were
able to accept input statements in a very restricted subset of the English language
and generate answers to these statements. An example statement and
answer might be:
Statement:
If Ali is 2 years younger than Umar, and Umar is 23 years old, how old is Ali?
Answer:
Ali is 21 years old.
In the same era a few researchers also met significant success in building neural
networks, but neural networks will be discussed in detail in the section titled
"Learning" in this book.
It is not my aim to surprise or shock you -- but the simplest way I can
summarize is to say that there are now in the world machines that think,
that learn and that create. Moreover, their ability to do these things is going
to increase rapidly until -- in a visible future -- the range of problems they
can handle will be coextensive with the range to which the human mind has
been applied.
Many early AI programs performed well on one or two simple examples, but most of
them turned out to fail when tried on a wider selection of problems and on more
difficult tasks.
One of the problems was that early programs often didn't have much knowledge
of their subject matter and succeeded by means of simple syntactic manipulations.
For example, Weizenbaum's ELIZA program (1965), which could apparently engage
in serious conversation on any topic, actually just borrowed and manipulated the
sentences typed into it by a human. Many of the language translation programs
tried to translate sentences by simple word replacement, without catering
for the context in which the words were used, hence totally failing to preserve the
subject matter of the sentence being translated. The famous
retranslation of "the spirit is willing but the flesh is weak" as "the vodka is good
but the meat is rotten" illustrates the difficulties encountered.
A second kind of difficulty was that many of the problems AI was trying to solve were
intractable. Most AI programs in the early years tried to attack a problem by
trying out different combinations of steps until the right solution was found. This
didn't always work: there were many intractable problems for which this approach
failed.
A third problem arose because of fundamental limitations of the basic structures
being used to generate intelligent behavior. For example, in 1969 Minsky and
Papert's book Perceptrons proved that although perceptrons could be shown to
learn anything they were capable of representing, they could represent very little.
In brief, these different setbacks made the researchers realize that as they attempted
harder and more complex problems, the pace of their success decreased, so they
now refrained from making highly optimistic statements.
Even after realizing the basic hurdles and problems in the way of achieving
success in this field, the researchers went on exploring new grounds and techniques.
The first successful commercial expert system, R1, began operation at Digital
Equipment Corporation (McDermott, 1982). The program helped configure the
orders for new computer systems. A detailed study of expert systems will come
later in this book; for now, consider an expert system as a program that solves
a certain problem by using previously stored rules and facts about the domain
to which that problem belongs.
In 1981, the Japanese announced the "Fifth Generation" project, a 10-year plan
to build intelligent computers running Prolog in much the same way that ordinary
computers run machine code. The project proposed to achieve full-scale natural
language understanding, along with many other ambitious goals. By this time
people had begun to invest in the field, and many AI projects got commercially
funded and accepted.
Neural networks returned with the reinvention of the back-propagation learning
algorithm, first developed in 1969 by Bryson and Ho. The algorithm was applied to
many learning problems in computer science, and the widespread dissemination of
the results in the collection Parallel Distributed Processing (Rumelhart and
McClelland, 1986) caused great excitement. People tried back-propagation neural
networks as a solution to many learning problems and met with great success.
1.5 Applications
Artificial Intelligence finds application in a lot of areas, not only in computer
science but in many other fields as well. We will briefly mention a few of the
application areas here; throughout this booklet you will find various applications
of the field discussed in detail.
Many information retrieval systems, like the Google search engine, use artificially
intelligent crawlers and content-based searching techniques to improve the
efficiency and accuracy of information retrieval.
A lot of computer-based games, like chess, 3D combat games and even many arcade
games, use intelligent software to make the user feel as if the machine the game
is running on is intelligent.
Computer vision is a newer area where people are trying to develop the sense of
visual perception in a machine. Computer vision applications help accomplish
tasks which previously required human vision, e.g. recognizing human faces,
understanding and interpreting images, analyzing medical scans, and innumerable
other tasks.
© Copyright Virtual University of Pakistan
Natural language processing is another area which tries to make machines speak
and interact with humans just as humans themselves do. This demands a lot from
the field of Artificial Intelligence.
Expert systems form probably the largest industrial application of AI. Software
like MYCIN and XCON/R1 has been successfully employed in the medical and
manufacturing industries respectively.
Robotics again forms a branch linked with the applications of AI, where people are
trying to develop robots that may be called humanoids. Organizations have
developed robots that act as pets, visitor guides, etc.
In short, the field has vast applications, and a lot of research is going on
around the globe in its sub-branches. As mentioned previously, during the course
of this booklet you will find details of many applications of AI.
1.6 Summary
- Intelligence can be understood as a trait of some living species
- Many factors and behaviors contribute to intelligence
- Intelligent machines can be created
- To create intelligent machines we first need to understand how the real brain functions
- Artificial intelligence deals with making machines think and act like humans
- It is difficult to give one precise definition of AI
- The history of AI is marked by many interesting happenings through which the field gradually evolved
- In the early years people made optimistic claims about AI, but soon they realized that it's not all that smooth
- AI is employed in various fields like gaming, business, law, medicine, engineering, robotics and computer vision
- This book will guide you through basic concepts and some core algorithms that form the fundamentals of Artificial Intelligence
- AI has enormous room for research and possesses a diverse future
Lecture # 4
Problem Solving Techniques
2 Problem Solving
In chapter one, we discussed a few factors that demonstrate intelligence. Problem
solving was one of them, which we illustrated with the examples of a mouse
searching a maze and the next-number-in-the-sequence problem.
Historically, people viewed the phenomenon of intelligence as strongly related to
problem solving. They used to think that a person able to solve more problems
is more intelligent than others.
In order to understand how exactly problem solving contributes to intelligence, we
need to find out how intelligent species solve problems.
Consider the maze searching problem. The mouse travels through one path and finds
that the path leads to a dead end; it then backtracks somewhat and goes along some
other path and again finds that there is no way to proceed. It goes on performing
such a search, trying different solutions, until a sequence of turns in the
maze takes it to the cheese. Hence, of all the solutions the mouse tries, the one
that reaches the cheese is the one that solves the problem.
Consider that a toddler is to switch on the light in a dark room. He sees the
switchboard with a number of buttons on it. He presses one, nothing happens; he
presses the second one, the fan turns on; he goes on trying different buttons till
at last the room lights up and his problem is solved.
Consider another situation where we have to open the combination lock of a briefcase.
It is a lock which most of you will have seen, where we have different
numbers and we adjust the individual dials/digits to obtain a combination that opens
the lock. However, if we don't know the correct combination of digits that opens the
lock, we usually try 0-0-0, 7-7-7, 7-8-6 or some such combination to open the lock.
We are solving this problem in the same manner as the toddler did in the light switch
example.
All this discussion has one thing in common: different intelligent species use a
similar approach to solve the problem at hand. This approach is essentially the
classical way in which intelligent species solve problems. Technically, we call this hit
and trial approach the "Generate and Test" approach.
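The Generate and Test approach described above can be sketched as a short loop: a generator proposes candidate solutions one by one, and a tester checks each candidate until a correct one is found. The sketch below uses the combination lock as the example; the three-digit dials and the particular secret combination are assumptions made purely for illustration.

```python
from itertools import product

def generate_and_test(candidates, test):
    """Classical Generate and Test: try candidates until one passes the test."""
    for candidate in candidates:
        if test(candidate):
            return candidate      # the tester accepted this solution
    return None                   # generator exhausted with no correct solution

# Hypothetical 3-dial briefcase lock whose (unknown to us) secret is 7-8-6.
secret = (7, 8, 6)
opened = generate_and_test(product(range(10), repeat=3),
                           lambda combo: combo == secret)
print(opened)  # (7, 8, 6)
```

Note that the generator enumerates at most 1000 combinations here; for larger problems this blind enumeration is exactly what makes the classical approach expensive.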
(Figure: the Generate and Test cycle. A Generator produces possible solutions; a Tester classifies each as a correct or an incorrect solution.)
It shows the problem of switching on the light by a toddler in graphical form. Each
rectangle represents the state of the switchboard. OFF | OFF | OFF means that all
three switches are OFF. Similarly, OFF | ON | OFF means that the first and the
last switch are OFF and the middle one is ON. Starting from the state where all
the switches are OFF, the child can proceed in any of three ways by
switching one of the switches ON. This brings the toddler to the next level in
the tree. From here he can explore the other options, till he gets to a state where
the switch corresponding to the light is ON. Hence our problem is reduced to
finding a node in the tree in which the switch corresponding to the light is ON.
Observe how representing a problem in a nice manner clarifies the approach to be
taken in order to solve it.
reaches the cheese represents a state called the goal state. The set of the start state,
the goal state and all the intermediate states constitutes what is called
a solution space.
Lecture # 5
Problem Solving Example and Searching in Artificial Intelligence
2.5 The Two-One Problem
In order to explain the four components of problem solving in a better way we have
chosen a simple but interesting problem to help you grasp the [Link]
diagram below shows the setting of our problem called the Two-One Problem.
Start: 1 1 ? 2 2        Goal: 2 2 ? 1 1        (? marks the empty cell)
Rules: a piece may slide (S) into the adjacent empty cell, or hop (H) over a single piece into the empty cell.
Now let us try to solve the problem in a trivial manner just by using a hit and trial
method without addressing the problem in a systematic manner.
Trial 1
In Move 1 we slide a 2 to the left, then we hop a 1 to the right, then we slide the 2
to the left again, then we hop the 2 to the left, and then we slide the 1 to the right.
Hence at least one 2 and one 1 are at the desired positions, as required in the goal
state, but then we are stuck: there is no other valid move which takes us out of this state.
Let us consider another trial.
Trial 2
Starting from the start state we first hop a 1 to the right, then we slide the other 1
to the right and then suddenly we get STUCK!! Hence solving the problem through
a hit and trial approach might not give us the solution.
Let us now try to address the problem in a systematic manner. Consider the diagram
below.
Starting from the goal state if we hop, we get stuck. If we slide we can further carry
on. Keeping this observation in mind let us now try to develop all the possible
combinations that can happen after we slide.
(Figure: a solution-space tree for a larger instance of the problem, 1 1 1 1 ? 2 2 2 2, developed by applying the hop (H) and slide (S) operators at each state.)
The diagram above shows a tree-like structure enumerating all the possible states
and moves. Looking at this diagram we can easily figure out the solution to our
problem. This tree-like structure actually represents the "Solution Space" of this
problem. The labels on the links are H and S, representing the hop and slide
operators respectively. Hence H and S are the operators that help us travel through
this solution space in order to reach the goal state from the start state.
We hope that this example clarifies the terms problem statement, start state,
goal state, solution space and operators in your mind. It will be a nice exercise to
design your own simple problems and try to identify these components in them in
order to develop a better understanding.
2.6 Searching
All the problems that we have looked at can be converted to a form where we have
to start from a start state and search for a goal state by traveling through a solution
space. Searching is a formal mechanism to explore alternatives.
Most solution spaces for problems can be represented as a graph, where nodes
represent different states and edges represent the operators which take us from one
state to another. If we can get a grip on algorithms that deal with searching
techniques in graphs and trees, we'll be all set to perform problem solving in an
efficient manner.
The diagram above is just to refresh your memory of the terminology of a tree.
As for graphs, there are undirected and directed graphs, which can be seen in the
diagram below.
(Figure: an undirected graph and the corresponding directed graph over the nodes C, D, E, F, G, H and I.)
Let us first consider a couple of examples to learn how graphs can represent
important information with the help of nodes and edges.
Graphs can be used to represent city routes.
We will use graphs to represent problems and their solution spaces. One thing to
be noted is that every graph can be converted into a tree, by replicating the nodes.
Consider the following example.
The graph in the figure represents a city map with cities labeled S, A, B, C, D,
E, F and G. By following a simple procedure we can convert this graph to a tree.
Start from node S and make it the root of your tree, then check how many nodes are
adjacent to it. In this case A and D are adjacent to it; hence, in the tree, make A
and D children of S. Go on proceeding in this manner and you'll get a tree
with a few nodes replicated. Depending on the starting node, you can get a
different tree. But recall that when solving a problem we usually know
the start state and the goal state, so we will be able to transform our problem
graphs into problem trees. Now if we develop an understanding of the algorithms
defined for tree searching and tree traversals, we will be in a better shape to
solve problems efficiently.
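The conversion procedure just described can be sketched in a few lines. This is only an illustration of the idea, with a small made-up adjacency list rather than the map in the figure; each recursive call carries the set of cities already on the path, so cycles are cut while nodes reachable along several paths get replicated, as described above.

```python
def graph_to_tree(graph, root, path=frozenset()):
    """Unfold an undirected graph into a tree rooted at `root`.

    Nodes already on the path from the root are skipped (cutting cycles),
    while nodes reachable along several different paths are replicated.
    """
    path = path | {root}
    children = [graph_to_tree(graph, n, path)
                for n in graph[root] if n not in path]
    return (root, children)       # a tree as (node, list-of-subtrees)

# A small hypothetical map: S connects to A and D; both A and D reach B.
city_map = {'S': ['A', 'D'], 'A': ['S', 'B'],
            'D': ['S', 'B'], 'B': ['A', 'D']}
tree = graph_to_tree(city_map, 'S')
print(tree)  # ('S', [('A', [('B', [('D', [])])]), ('D', [('B', [('A', [])])])])
```

Notice how B and the nodes below it appear under both A and D, which is exactly the replication mentioned above.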
Now that we know that problems can be represented as graphs and are familiar with
the components of problem solving, let us address problem solving in a more
formal manner and study searching techniques in detail, so that we can
systematically approach the solution to a given problem.
Suppose the mouse does not know where the cheese is or how far away it is, and is
totally blind to the configuration of the maze. The mouse would blindly search the
maze without any hints to help it turn left or right at any junction. It will
purely use a hit and trial approach and check all combinations till one takes it to
the cheese. Such searching is called blind or uninformed searching.
Consider now that the cheese is fresh and its smell has spread through the
maze. The mouse will now use this smell as a guide, or heuristic (we will comment
on this word in detail later), to guess the position of the cheese and choose the
best from the alternative choices. As the smell gets stronger, the
mouse knows that the cheese is closer. Hence the mouse is informed about the
cheese through the smell and thus performs an informed search in the maze.
For now you might think that the informed search will always give us a better solution
and will always solve our problem. This might not be true as you will find out when
we discuss the word heuristic in detail later.
When solving the maze search problem, we saw that the mouse can reach the cheese
from different paths. In the diagram above two possible paths are shown.
In any-path/non-optimal searches we are concerned with finding any one solution
to our problem. As soon as we find a solution we stop, without considering that
there might be a better way to solve the problem which takes less time
or fewer operators.
Contrary to this, in optimal-path searches we try to find the best solution. For
example, in the diagram above the optimal path is the blue one, because it is
shorter and requires fewer operators. Hence in optimal searches we find solutions
that are least costly, where the cost of a solution may be defined differently for
each problem.
Lecture # 6
Searching Algorithms
2.9 Simple Search Algorithm
Let us now state a simple search algorithm that will try to give you an idea about
the sort of data structures that will be used while searching, and the stop criteria for
your search. The strength of the algorithm is such that we will be able to usethis
algorithm for both Depth First Search (DFS) and Breadth First Search (BFS).
Here Q represents a priority queue. The algorithm is simple and doesn’t need much
explanation. We will use this algorithm to implement blind and uninformed searches.
The algorithm however can be used to implement informed searches as well.
The critical step in the Simple Search Algorithm is the picking of a node X from Q
according to a priority function. Let us call this function P(n). While using this
algorithm for any of the techniques, our priority will be to reduce the value of P(n)
as much as we can. In other words, the node with the highest priority will have the
smallest value of the function P(n), where n is the node referred to as X in the
algorithm.
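Since the listing itself does not appear in the text, the sketch below is our reconstruction of the Simple Search Algorithm from the description above; the example tree and node names are made up. Q holds (node, depth) pairs, and the priority function P(n) decides which node is picked next: choosing P(n) = -depth(n) yields DFS, while P(n) = depth(n) yields BFS.

```python
def simple_search(tree, start, is_goal, P):
    """Simple Search Algorithm: repeatedly pick the node X in Q with the
    smallest P(n); stop when X is the goal or Q is empty; otherwise add
    X's children that are not yet in Visited to both Q and Visited."""
    Q = [(start, 0)]                      # priority queue of (node, depth)
    visited = {start}
    while Q:                              # Step 2: stop when Q is empty
        Q.sort(key=lambda nd: P(nd[1]))   # pick X with minimum P(n)
        X, depth = Q.pop(0)
        if is_goal(X):
            return X
        for child in tree.get(X, []):
            if child not in visited:
                visited.add(child)
                Q.append((child, depth + 1))
    return None

# A made-up tree with root S and goal H.
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F'], 'E': ['H']}
dfs = simple_search(tree, 'S', lambda n: n == 'H', P=lambda d: -d)  # depth-first
bfs = simple_search(tree, 'S', lambda n: n == 'H', P=lambda d: d)   # breadth-first
```

Both calls return H; only the order in which the tree's nodes are examined differs.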
As mentioned previously, we will give priority to the element with the minimum P(n);
hence the node with the largest value of height will have the maximum priority to
be picked from Q. The following sequence of diagrams will show you how DFS works
on a tree using the Simple Search Algorithm.
If Q is not empty, pick the node X with the minimum P(n) (in this case S, as it is
the only node in Q). Check if X is the goal (in this case it is not). Hence find
all the children of X not in Visited and add them to Q and Visited. Go to Step 2.
Again check that Q is not empty, and pick the node X with the minimum P(n) (in this
case either A or B, as both of them have the same value for P(n); suppose we pick A).
Check if X is the goal (A is not). Hence find all the children of A not in Visited
and add them to Q and Visited. Go to Step 2.
Lecture # 7
Searching Algorithms
Go on following the steps in the Simple Search Algorithm till you find a goal node.
The diagrams below show you how the algorithm proceeds.
Here, at the 5th row of the table, we remove H and check if it is the goal; the
algorithm says YES and hence we return H, as we have reached the goal state.
The path followed by DFS is shown by green arrows at each step. The diagram
below also shows that DFS didn't have to search the entire search space; by
traveling through only half the tree, the algorithm was able to find the solution.
Hence simply by selecting a specific P(n) our Simple Search Algorithm was converted
to a DFS procedure.
2.11 Simple Search Algorithm Applied to Breadth First Search
Breadth First Search explores the breadth of the tree first and progresses
downward level by level. Now, we will use the same Simple Search Algorithm to
implement BFS by keeping our priority function as
P(n) = height(n)
As mentioned previously, we will give priority to the element with the minimum P(n);
hence the node with the smallest value of height will have the maximum priority to
be picked from Q. In other words, the greater the depth/height of a node, the lower
its priority. The following sequence of diagrams will show you how BFS works on a
tree using the Simple Search Algorithm.
If Q is not empty, pick the node X with the minimum P(n) (in this case S, as it is
the only node in Q). Check if X is the goal (in this case it is not). Hence find
all the children of X not in Visited and add them to Q and Visited. Go to Step 2.
Again, check that Q is not empty, and pick the node X with the minimum P(n) (in this
case either A or B, as both of them have the same value for P(n); remember, n refers
to the node X). Check if X is the goal (A is not). Hence find all the children of A
not in Visited and add them to Q and Visited. Go to Step 2.
Now we have B, C and D in the list Q. B has height 1, while C and D are at height 2.
As we are to select the node with the minimum P(n), we will select B and repeat.
The following sequence of diagrams shows how the algorithm proceeds till it
reaches the goal state.
When we remove H at the 9th row of the table and check if it is the goal, the
algorithm says YES and hence we return H, since we have reached the goal state.
The path followed by BFS is shown by green arrows at each step. The diagram below
also shows that BFS travels a significant area of the search space if the solution
is located somewhere deep inside the tree.
Hence, simply by selecting a specific P(n) our Simple Search Algorithm was
converted to a BFS procedure.
DFS has small space requirements (linear in depth) but has major problems:
- DFS can run forever in search spaces with infinite length paths
- DFS does not guarantee finding the shallowest goal
BFS guarantees finding the shallowest goal even in the presence of infinite paths,
but it has one great problem:
- BFS requires a great deal of space (exponential in depth)
We can still come up with a better technique which caters for the drawbacks of both
these techniques. One such technique is progressive deepening.
Progressive deepening actually emulates BFS using DFS. The idea is simple: apply
DFS down to a specific level; if you find the goal, exit; otherwise repeat DFS to
the next lower level. Go on doing this until you either reach the goal node or the
full height of the tree has been explored. For example, apply DFS to level 2 in the
tree; if it reaches the goal state, exit; otherwise increase the level of DFS and
apply it again, say to level 4. You can increase the level of DFS by any factor.
An example will further clarify your understanding.
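A minimal sketch of progressive deepening follows, assuming the usual dictionary representation of a tree; the goal I and the depth step of 2 mirror the example discussed next, but the adjacency list itself is our invention.

```python
def depth_limited_dfs(tree, node, is_goal, limit):
    """DFS that refuses to descend below the given depth limit."""
    if is_goal(node):
        return node
    if limit == 0:
        return None
    for child in tree.get(node, []):
        found = depth_limited_dfs(tree, child, is_goal, limit - 1)
        if found is not None:
            return found
    return None

def progressive_deepening(tree, start, is_goal, max_depth, step=1):
    """Emulate BFS using DFS: rerun a depth-limited DFS with growing limits."""
    for limit in range(0, max_depth + 1, step):
        found = depth_limited_dfs(tree, start, is_goal, limit)
        if found is not None:
            return found
    return None

# Made-up tree; I is the goal, found on the pass with limit 4.
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F'], 'E': ['I']}
result = progressive_deepening(tree, 'S', lambda n: n == 'I', max_depth=4, step=2)
print(result)  # I
```

Nodes near the root are re-expanded on every pass, but since a tree's lowest level dominates its node count, the repeated work is modest compared with the memory BFS would need.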
Consider the tree on the previous page with nodes from S … to N, where I is the
goal node.
Apply DFS to level 2 in the tree. The green arrows in the diagrams below show how
DFS proceeds to level 2.
After exploring to level 2, the progressive deepening procedure will find out that the
goal state has still not been reached. Hence, it will increment the level by a factor of,
say 2, and will now perform a DFS in the tree to depth 4. The blue arrows in the
diagrams below show how DFS will proceed to level 4.
As soon as the procedure finds the goal state it will quit. Notice that it guarantees
finding the solution at the minimum depth, like BFS. Imagine that there are a number
of solutions below level 4 in the tree. The procedure would travel only a small
portion of the search space and, without large memory requirements, would find the
solution.
Lecture # 8
Heuristically Informed Searching Techniques
2.14 Heuristically Informed Searches
So far we have looked at procedures that search the solution space in an
uninformed manner. Such procedures are usually costly with respect to time,
space or both. We now focus on a few techniques that search the solution space in
an informed manner, using something called a heuristic. Such techniques
are called heuristic searches. The basic idea of a heuristic search is that rather
than trying all possible search paths, you focus on paths that seem to be
getting you closer to your goal state, using some kind of a "guide". Of course, you
generally can't be sure that you are really near your goal state, but we might
be able to use a good guess for the purpose. Heuristics help us make
that guess. It must be noted that heuristics don't always give us the right guess,
and hence the correct solutions. In other words, educated guesses are not always
correct.
Recall the example of the mouse searching for cheese. The smell of cheese guides
the mouse in the maze; in other words, the strength of the smell informs the mouse
how far it is from the goal state. Here the smell of cheese is the heuristic, and it
is quite accurate.
Similarly, consider the diagram below. The graph shows a map in which the numbers
on the edges are the distances between cities, for example, the distance between
city S and city D is 3 and between B and E is 4.
Suppose our goal is to reach city G starting from S. There can be many choices: we
might take S, A, D, E, F, G, or travel from S to A, to E, to F, and to G. At each
city, if we were to decide which city to go to next, we might be interested in some
sort of information that guides us to the city from which the distance to the goal
is minimum.
If someone can tell us the straight-line distance of G from each city then it might
help us as a heuristic in order to decide our route map. Consider the graph below.
It shows the straight-line distance from every city to the goal. Now, cities that are
closer to the goal should be our preference. These straight-line distances, also known
as "as the crow flies" distances, shall be our heuristic.
It is important to note that heuristics can sometimes misguide us. In the example
we have just discussed, one might try to reach city C, as it is closest to the goal
according to our heuristic, but in the original map you can see that there is no
direct link between city C and city G. Even if someone reaches city C using the
heuristic, he won't be able to travel from C to G directly; hence the heuristic can
misguide. The catch here is that the crow-flight distance does not tell us whether
two cities are directly connected.
Similarly, in the example of the mouse and the cheese, suppose the maze has fences
fixed along some of the paths through which the smell can pass. Our heuristic might
guide us onto a path which is blocked by a fence; hence again the heuristic
misguides us.
The conclusion, then, is that heuristics do help us reduce the search space, but it
is not guaranteed that we'll always find a solution. Still, many people use
them, as most of the time they are helpful. The key lies in how we use the
heuristic, which brings us to the notion of a heuristic function.
To every node/state in our graph we will assign a heuristic value, calculated
by the heuristic function. We will start with a basic heuristically informed search
called Hill Climbing.
Before going to the actual example, let us give the analogy after which this
procedure is named Hill Climbing. Consider a blind person climbing a
hill. He cannot see the peak of the hill. The best he can do is this: from a given
point he takes steps in all possible directions, and wherever he finds that a step
takes him higher, he takes that step and reaches a new, higher point. He goes on doing this
until no possible step in any direction takes him higher; that point is the peak,
hence the name hill climbing. Notice that each step we take gets us closer to
our goal, which in this example is the peak of the hill.
Foothill Problem: Consider the diagram of a mountain below. Before reaching the
global maximum, that is, the highest peak, the blind man will encounter local maxima,
the intermediate peaks. At each of these local maxima, the blind man gets the
perception of having reached the global maximum, as none of the steps takes him to a
higher point. Hence he might reach a local maximum, think that he has reached the
global maximum, and thus get stuck in the middle of searching the solution space.
Plateau Problem: When he reaches a portion of the mountain which is totally flat,
whatever step he takes gives him no improvement in height, hence he gets stuck.
Ridge Problem: Suppose you are standing on what seems like a knife-edge contour
running generally from northeast to southwest. If you take a step in one direction
it takes you lower; on the other hand, when you step in some other direction it
gives you no improvement.
All these problems can be mapped to situations in our solution space searching. If
we are at a state and the heuristics of all the available options take us to a lower
value, we might be at a local maximum. Similarly, if all the available options give
us no improvement, we might be on a plateau. The same is the case with a ridge, as
we can encounter such states in our search tree.
The solution to all these problems is randomness: try taking random steps in random
directions of random lengths, and you might get out of the place where you are stuck.
Example
Let us now take you through an example of searching a tree using hill climbing, to
end our discussion of hill climbing.
Consider the diagram below. The tree corresponds to our problem of reaching city
M starting from city S; in other words, our aim is to find a path from S to M. We
now associate a heuristic with every node, namely the straight-line distance from
the path-terminating city to the goal city.
From C we see that city I gives us more improvement, hence we move to I and
then finally to M.
Notice that we only traveled a small portion of the search space and reached our goal.
Hence the informed nature of the search can help reduce space and time.
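The hill climbing procedure just illustrated can be sketched as follows. The heuristic values and the tree below are invented for illustration (they are not the distances from the figure); lower h means closer to the goal.

```python
def hill_climbing(tree, h, start):
    """Greedy hill climbing on a tree: from the current node, move to the
    child with the best (smallest) heuristic value, stopping when no child
    improves on the current node or the node has no children."""
    current, path = start, [start]
    while True:
        children = tree.get(current, [])
        if not children:
            return path
        best = min(children, key=h)
        if h(best) >= h(current):
            return path           # no step improves: a (possibly local) peak
        current = best
        path.append(current)

# Hypothetical straight-line distances to the goal city M (h(M) = 0).
h = {'S': 10, 'A': 8, 'B': 9, 'C': 5, 'I': 3, 'M': 0}.get
tree = {'S': ['A', 'B'], 'A': ['C'], 'C': ['I'], 'I': ['M']}
print(hill_climbing(tree, h, 'S'))  # ['S', 'A', 'C', 'I', 'M']
```

Because only the current best child is kept, the procedure can stop at a local maximum, a plateau or a ridge, as discussed above.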
The following sequence of diagrams will show you how Beam Search works in a
search tree.
We start with a search tree with L as the goal state and k = 2, that is, at every
level we will only consider the best 2 nodes. Standing on S, we observe that the
only two nodes available are A and B, so we explore both of them as shown below.
From here we have C, D, E and F as the available options to go. Again, we select
the two best of them and we explore C and E as shown in the diagram below.
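The beam search walkthrough above can be sketched as follows; the heuristic values are made up, with lower h meaning closer to the goal, and the tree reproduces only the shape of the example (S branching to A and B, then C, D, E, F, with L as the goal).

```python
def beam_search(tree, h, start, goal, k=2):
    """Beam search: at every level keep only the best k nodes (by h)."""
    beam = [start]
    while beam:
        if goal in beam:
            return goal
        # Gather all children of the current level, then prune to the best k.
        frontier = [c for node in beam for c in tree.get(node, [])]
        beam = sorted(frontier, key=h)[:k]
    return None

# Hypothetical heuristic values; L is the goal, so h(L) = 0.
h = {'S': 9, 'A': 6, 'B': 7, 'C': 4, 'D': 8, 'E': 5, 'F': 8, 'L': 0}.get
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F'], 'C': ['L']}
print(beam_search(tree, h, 'S', 'L'))  # L
```

With k = 2, only C and E survive the second level, matching the walkthrough; a larger k trades memory for a smaller chance of pruning away the path to the goal.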
Lecture # 9
Heuristically Informed Searching Techniques
Just as beam search considers the best k nodes at every level, best first search
considers all the open nodes so far and selects the best amongst them. The following
sequence of diagrams will show you how a best first search procedure works in a
search tree.
We start with a search tree as shown above. From S we observe that A is the
best option so we explore A.
which is D.
At last, from H, we find L as the best. Hence best first search is a greedy approach
which looks for the best amongst the available options, and hence can sometimes
reduce the searching time. All these heuristically informed procedures are considered
better, but they do not guarantee the optimal solution, as they are dependent on the
quality of the heuristic being used.
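Best first search can be sketched with an ordinary priority queue holding all the open nodes seen so far. The heuristic values and the tree are invented (lower h means closer to the goal L); the expansion order S, A, D, H, L mirrors the walkthrough above.

```python
import heapq

def best_first_search(tree, h, start, goal):
    """Greedy best first search: always expand the open node with the
    smallest heuristic value among ALL open nodes so far."""
    open_nodes = [(h(start), start)]
    while open_nodes:
        _, node = heapq.heappop(open_nodes)   # best of all open nodes
        if node == goal:
            return node
        for child in tree.get(node, []):
            heapq.heappush(open_nodes, (h(child), child))
    return None

# Hypothetical heuristic values, with h(L) = 0 at the goal.
h = {'S': 9, 'A': 5, 'B': 7, 'C': 6, 'D': 4, 'H': 2, 'L': 0}.get
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'D': ['H'], 'H': ['L']}
print(best_first_search(tree, h, 'S', 'L'))  # L
```

Unlike beam search, nothing is ever discarded from the open list, so a node passed over early (like B here) can still be expanded later if everything better runs out.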
The simplest approach to finding the optimal solution is this: find all the possible
solutions using either an uninformed or an informed search, and once you have
searched the whole search space and no other solution exists, choose the most optimal
amongst the solutions found. This approach is analogous to the brute force method
and is also called the British Museum procedure.
But in reality, exploring the entire search space is rarely feasible and at times
not even possible. For instance, if we just consider the tree corresponding to a game
of chess (we will learn about game trees later), the effective branching factor is 16
and the effective depth is 100. The number of branches in an exhaustive survey
would be on the order of 10^120. Hence a huge amount of computational power and time
is required to solve optimal search problems in a brute force manner.
In order to solve our problem of optimal search without using a brute force technique,
people have come up with different procedures. One such procedure is called the
branch-and-bound method.
The diagram above shows the same city road map with the distances between the cities labeled on the edges. We convert the map to a tree as shown below.
The length of the complete path from S to G is 9. Also note that while traveling from S to B along the path S-D-A-B we have already covered a distance of 9 units, so traveling further from this path to any other node can only make it longer. We therefore ignore any further paths ahead of the path S-D-A-B.
The open options are now B and D (the children of A) and D (the child of S). Among these, D, the child of S, is the best option, so we explore D.
then B,
then D,
then E.
When we explore E, we find that following this path further will increase its length beyond 9, the known distance from S to G. Hence we bound all the further sub-trees along this path, as shown in the diagram below.
We then move to F, as that is the best option at this point with a value of 7,
then C.
We see that C is a leaf node, so we bound C too, as shown in the next diagram.
Then we move to B on the right-hand side of the tree and bound the sub-trees ahead of B, as they also exceed the path length of 9.
We go on proceeding in this fashion, bounding the paths that exceed 9, and are thus saved from traversing a considerable portion of the tree. The subsequent diagrams complete the search until the optimal solution is found, along the right-hand branch of the tree.
Notice that we have saved ourselves from traversing a considerable portion of the tree and still have found the optimal solution. The basic idea was to reduce the search space by bounding the paths that exceed the known path length from S to G.
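This idea can be sketched as follows, assuming the map is given as nested dictionaries of edge distances. The map below is a hypothetical one, not the one in the figures.

```python
import heapq

def branch_and_bound(graph, start, goal):
    """Branch and bound over path lengths: always extend the cheapest
    partial path, and prune (bound) any partial path whose cost already
    meets or exceeds the best complete path found so far."""
    best_cost, best_path = float('inf'), None
    frontier = [(0, [start])]
    while frontier:
        cost, path = heapq.heappop(frontier)
        if cost >= best_cost:
            continue               # bound: cannot beat the best so far
        node = path[-1]
        if node == goal:
            best_cost, best_path = cost, path
            continue
        for child, d in graph.get(node, {}).items():
            if child not in path:  # avoid revisiting nodes on this path
                heapq.heappush(frontier, (cost + d, path + [child]))
    return best_cost, best_path

# A hypothetical city map with distances on the edges:
graph = {'S': {'A': 3, 'D': 4}, 'A': {'B': 4, 'D': 5}, 'D': {'E': 2},
         'B': {'C': 4, 'E': 5}, 'E': {'F': 4}, 'F': {'G': 3}}
print(branch_and_bound(graph, 'S', 'G'))  # (13, ['S', 'D', 'E', 'F', 'G'])
```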
Two improvements can be made to the basic branch-and-bound procedure:
1. Estimates
2. Dynamic Programming
The idea of estimates is that we can travel through the solution space using a heuristic estimate. By using "guesses" about the remaining distance, as well as facts about the distance already accumulated, we can travel through the solution space more efficiently. A problem here is that if we use an overestimate of the remaining distance, we might lose a solution that is somewhere nearby. Hence we always travel with underestimates of the remaining distance. We will demonstrate this improvement with an example.
The second improvement is dynamic programming. The simple idea behind dynamic programming is that if we can reach a specific node through more than one path, then we shall take the path with the minimum cost.
In the diagram you can see that we can reach node D directly from S with a cost of 3, or via S-A-D with a cost of 6; hence we will never expand the path with the larger cost of reaching the same node.
When we include these two improvements in branch and bound, we get a different technique known as the A* procedure.
2.21 A* Procedure
The A* procedure is the branch-and-bound technique with the improvements of underestimates and dynamic programming.
We will discuss the technique with the same example as that used for branch-and-bound. The values on the nodes (shown in yellow) are the underestimates of the distance of each node from G. The values on the edges are the distances between adjacent cities.
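A rough Python sketch of the A* procedure, combining the cheapest-path ordering with an underestimate h and the dynamic-programming check. The map and the heuristic values here are hypothetical, not those in the figures.

```python
import heapq

def a_star(graph, h, start, goal):
    """A*: order partial paths by g + h, where g is the cost so far and
    h is an underestimate of the remaining distance; keep only the
    cheapest known way to reach each node (dynamic programming)."""
    g = {start: 0}                       # cheapest known cost per node
    frontier = [(h[start], start, [start])]
    while frontier:
        f, node, path = heapq.heappop(frontier)
        if node == goal:
            return g[node], path
        for child, d in graph.get(node, {}).items():
            new_g = g[node] + d
            if new_g < g.get(child, float('inf')):  # dynamic programming
                g[child] = new_g
                heapq.heappush(frontier,
                               (new_g + h[child], child, path + [child]))
    return None

graph = {'S': {'A': 3, 'D': 4}, 'A': {'B': 4, 'D': 5}, 'D': {'E': 2},
         'B': {'E': 5}, 'E': {'F': 4}, 'F': {'G': 3}}
h = {'S': 11, 'A': 10, 'B': 7, 'D': 8, 'E': 6, 'F': 3, 'G': 0}
print(a_star(graph, h, 'S', 'G'))  # (13, ['S', 'D', 'E', 'F', 'G'])
```

Note that the h values are deliberately underestimates of the true remaining distances, as the text requires.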
Then we explore B. As all the sub-trees emerging from B make our path length more than 9 units, we bound this path, as shown in the next diagram.
Now observe that we can reach node D, the child of A, either with a cost of 12 or directly from S with a cost of 9. Hence, using dynamic programming, we will ignore the whole sub-tree beneath D (the child of A), as shown in the next diagram.
Now A and E are equally good nodes so we arbitrarily choose amongst them,
and we move to A.
As the sub-tree beneath A expands, the path length goes beyond 9, so we bound it.
We proceed in this manner. Next we visit E, then B (the child of E), and bound the sub-tree below B. We then visit F and finally reach G, as shown in the subsequent diagrams.
Notice that by using underestimates and dynamic programming the search space
was further reduced and our optimal solution was found efficiently.
Scenarios in which two opponents, also called adversaries, are searching for a goal usually occur in game playing. Their goals are usually contrary to each other. For example, in a game of tic-tac-toe, player one might want to complete a line of crosses while at the same time player two wants to complete a line of zeros. Hence both have different goals. Notice further that if player one puts a cross in any box, player two will intelligently try to make a move that leaves player one with the minimum chance to win; that is, he will try to stop player one from completing a line of crosses and at the same time will try to complete his own line of zeros.
Many games can be modeled as trees as shown below. We will focus on board games
for simplicity.
Searches in which two or more players with contrary goals are trying to explore the
same solution space in search of the solution are called adversarial searches.
Lecture # 10
Heuristically Informed Searching Techniques
2.23 Minimax Procedure
In adversarial searches, one player tries to cater for the opponent's moves by intelligently deciding what the impact of his own move will be on the overall configuration of the game. To develop this stance he uses a look-ahead strategy: before making a move, he looks a few levels down the game tree to see what the impact of his move might be and what options will be open to the opponent once he has made it.
To clarify the concept of adversarial search let us discuss a procedure called the
minimax procedure.
Here we assume that we have a situation analyzer that converts all judgments about board situations into a single, overall quality number. This situation analyzer is also called a static evaluator, and the score calculated by the evaluator is called the static evaluation of that node. Positive numbers, by convention, indicate favor to one player; negative numbers indicate favor to the other. The player hoping for positive numbers is called the maximizing player, or maximizer; the other player is called the minimizing player, or minimizer. The maximizer has to keep in view what choices will be available to the minimizer on the next step, and the minimizer has to keep in view what choices will be available to the maximizer on the next step.
Standing at node A, the maximizer wants to decide which node to visit next, that is, to choose between B and C. The maximizer wishes to maximize the score, so, 7 being the maximum score, the maximizer should apparently go to C and then to G. But when the maximizer reaches C, the next turn to select a node belongs to the minimizer, who will force the game to reach configuration/node F with a score of 2. Hence the maximizer will end up with a score of 2 if he goes to C from A. On the other hand, if the maximizer goes to B from A, the worst the minimizer can do is force him to a score of 3. Since the choice is between scores of 3 and 2, the maximizer will go to node B from A.
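This backed-up reasoning can be sketched as a short recursive function. The tree below follows the situation just described; the leaf values under B are an assumption (3 and 5) consistent with the minimizer forcing a score of 3 there.

```python
def minimax(node, maximizing, tree, scores):
    """Minimax on a game tree: the maximizer picks the child with the
    highest backed-up value, the minimizer the lowest. Leaf scores come
    from a static evaluator."""
    if node in scores:                    # leaf: its static evaluation
        return scores[node]
    values = [minimax(c, not maximizing, tree, scores) for c in tree[node]]
    return max(values) if maximizing else min(values)

# The maximizer at A chooses between B and C; the minimizer then picks
# the worst leaf for the maximizer (F = 2 under C, so A's value is 3):
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
scores = {'D': 3, 'E': 5, 'F': 2, 'G': 7}
print(minimax('A', True, tree, scores))  # 3
```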
Sitting at A, player one observes that if he moves to B, the best he can get is 3.
So the value 3 travels to the root A. Now, after observing the other side of the tree, this score will either increase or remain the same, as this level belongs to the maximizer. When he evaluates the first leaf node on the other side of the tree, he sees that the minimizer can force him to a score of less than 3, hence there is no need to fully explore that side of the tree. The rightmost branch of the tree is therefore pruned and its leaves are never statically evaluated.
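A sketch of minimax with alpha-beta pruning on the small game tree discussed above (with the same assumed leaf values under B). The `evaluated` list records which leaves were statically evaluated, showing that leaf G is pruned.

```python
def alphabeta(node, maximizing, tree, scores,
              alpha=float('-inf'), beta=float('inf'), evaluated=None):
    """Minimax with alpha-beta pruning: stop exploring a branch as soon
    as it cannot affect the decision at the root."""
    if evaluated is None:
        evaluated = []
    if node in scores:                    # leaf: static evaluation
        evaluated.append(node)
        return scores[node], evaluated
    value = float('-inf') if maximizing else float('inf')
    for child in tree[node]:
        v, _ = alphabeta(child, not maximizing, tree, scores,
                         alpha, beta, evaluated)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break                         # prune the remaining children
    return value, evaluated

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
scores = {'D': 3, 'E': 5, 'F': 2, 'G': 7}
value, evaluated = alphabeta('A', True, tree, scores)
print(value, evaluated)  # 3 ['D', 'E', 'F'] -- G is never evaluated
```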
2.25 Summary
People used to think that one who can solve more problems is more intelligent
Generate and test is the classical approach to solving problems
Problem representation plays a key role in problem solving
The components of problem solving include
o Problem Statement
o Operators
o Goal State
o Solution Space
Searching is a formal mechanism to explore alternatives
Searches can be blind (uninformed) or informed (heuristic), and optimal or non-optimal.
Different procedures to implement different search strategies form the major
content of this chapter
2.26 Problems
Q1 Consider that a person has never been to the city airport. It's early in the morning, and assume that no other person is awake in the town who can guide him on the way. He has to drive his car but doesn't know the way to the airport. Clearly identify the four components of problem solving in the above statement, i.e. problem statement, operators, solution space, and goal state. Should he follow a blind or a heuristic search strategy? Try to model the problem in a graphical representation.
Q2 Clearly identify the difference between WSP (Well-Structured Problems) and ISP (Ill-Structured Problems) as discussed in the lecture. Give relevant examples.
Q3 Consider the following tree. Apply DFS and BFS as studied in the chapter. Show the state of the data structure Q and the visited list clearly at every step. S is the initial state and D is the goal state.
Q4 Discuss how progressive deepening uses a mixture of DFS and BFS to eliminate the disadvantages of both and at the same time find the solution in a given tree. Support your answer with examples of a few trees.
[Figure: a search tree rooted at S (value 10); the next level has values 9 and 11 (node B); then nodes C, D, E with values 7, 9, 12; then F, G, H with values 7, 7, 7; then I, with values 7 and 5.]
Q6 Discuss how best first search works in a tree. Support your answer with an
example tree. Is best first search always the best strategy? Will it always guarantee
the best solution?
Q7 Discuss how beam search with degree of the search = 3 propagates in the given search tree. Is it equivalent to best-first search when the degree = 1?
Q8 Discuss the main concept behind the branch-and-bound search strategy. Suggest improvements to the algorithm. Simulate the algorithm on the graph given below. The values on the links are the distances between the cities. The numbers on the nodes are the estimated distances of the nodes from the goal state.
Q9 Run the Minimax procedure on the given tree. The static evaluation score for each leaf node is written under it. For example, the static evaluation score for the leftmost leaf node is 80.
80 10 55 45 65 100 20 35 70
Q10 Discuss how Alpha Beta Pruning minimizes the number of static evaluations
at the leaf nodes by pruning branches. Support your answer with small examples
of a few trees.
Q11 Simulate the Minimax procedure with Alpha Beta Pruning algorithm on the
following search tree.
Lecture # 11
Introduction to Genetic Algorithms
3 Genetic Algorithms
3.1 Discussion on Problem Solving
Another thing we noticed in the previous chapter is that we perform a sequential search through the search space. In order to speed up these techniques we can follow a parallel approach, where we start from multiple locations (states) in the solution space and try to search the space in parallel.
Suppose we were to climb up a hill. Our goal is to reach the top, irrespective of how we get there. We apply different operators at a given position and move in the direction that gives us improvement (more height). What if, instead of starting from one position, we start to climb the hill from different positions, as indicated by the diagram below?
In other words, we start with different independent search instances that begin from different locations to climb up the hill.
Further, we can improve this using a collaborative approach where these instances interact and evolve by sharing information in order to solve the problem. You will soon find out what we mean by interact and evolve.
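The several-climbers idea can be sketched as independent hill climbers on a hypothetical one-dimensional hill; note there is no sharing of information between them yet.

```python
def hill_climb(f, start, neighbors, steps=100):
    """Greedy hill climbing from one start point: move to the best
    neighbor while it improves the height, stop at a local maximum."""
    x = start
    for _ in range(steps):
        best = max(neighbors(x), key=f)
        if f(best) <= f(x):
            break                  # no neighbor is higher: local maximum
        x = best
    return x

f = lambda x: -(x - 7) ** 2        # a hill whose top is at x = 7
neighbors = lambda x: [x - 1, x + 1]

# Several independent climbers started from different positions:
starts = [0, 3, 12]
tops = [hill_climb(f, s, neighbors) for s in starts]
print(tops)  # [7, 7, 7]
```

On this simple hill every climber reaches the same top; on a rugged landscape the different starting positions make it more likely that at least one climber finds the global peak.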
Genetic algorithms are motivated by the biological concept of the evolution of genes, hence the name Genetic Algorithms, commonly termed GA.
A genetic algorithm is a search method in which multiple search paths are followed in parallel. At each step, the current states of different pairs of these paths are combined to form new paths. This way the search paths do not remain independent; instead, they share information with each other and thus try to improve the overall performance of the search.
Quit when you have a satisfactory solution (or you run out of time)
The two terms introduced here are inheritance and mutation. Inheritance has the
same notion of having something or some attribute from a parent while mutation
refers to a small random change. We will explain these two terms as we discuss
the solution to a few problems through GA.
As you can observe, the above solution is totally in accordance with the basic
algorithm you saw in the previous section. The table on the next page shows which
steps correspond to what.
For the sake of simplicity we only use mutation for now to generate the new individuals; we will incorporate inheritance later in the example. Let's introduce the concept of an evaluation function: the criterion that checks whether one individual/solution in the population is better than another. Notice that mutation can be as simple as flipping one bit, or any number of bits, at random.
We go on repeating the algorithm until we either get our required word, a 32-bit number with all ones, or we run out of time. If we run out of time, we either present the best solution found (the one with the largest number of 1-bits) as the answer, or we say that the solution cannot be found. Hence GA is at times used to get a near-optimal solution under given constraints.
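The mutation-only version of the 32-bit word example might be sketched like this. The population size, survivor count, and single-bit-flip mutation follow the description above; the parameter names and the random seed are illustrative choices.

```python
import random

def ga_all_ones(word_bits=32, pop_size=100, keep=10, max_gens=1000, seed=0):
    """Mutation-only GA: evaluate (count 1-bits), keep the best, and
    generate new individuals by flipping one random bit of a survivor."""
    rng = random.Random(seed)
    target = (1 << word_bits) - 1
    pop = [rng.getrandbits(word_bits) for _ in range(pop_size)]
    for gen in range(max_gens):
        pop.sort(key=lambda w: bin(w).count('1'), reverse=True)
        if pop[0] == target:
            return gen, pop[0]             # all 32 bits are ones
        survivors = pop[:keep]
        children = [w ^ (1 << rng.randrange(word_bits))  # flip one bit
                    for w in survivors
                    for _ in range((pop_size - keep) // keep)]
        pop = survivors + children
    return max_gens, max(pop, key=lambda w: bin(w).count('1'))

gens, best = ga_all_ones()
print(bin(best).count('1'))  # number of 1-bits in the best word found
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next; this elitism is one common design choice, not the only possible one.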
Lecture # 12
Genetic Algorithm Example
3.6.2 Problem 2:
• Suppose you have a large number of data points (x, y), e.g., (1, 4), (3,
9), (5, 8), ...
• You would like to fit a polynomial (of up to degree 1) through these
data points
• That is, you want a formula y = mx + c that gives you a
reasonably good fit to the actual data
• Here’s the usual way to compute goodness of fit of the
polynomial on the data points:
• Compute the sum of (actual y – predicted y)^2 for all the data points
• The lowest sum represents the best fit
• You can use a genetic algorithm to find a “pretty good” solution
By a pretty good solution we simply mean that you can get reasonably good
polynomial that best fits the given data.
• Your formula is y = mx + c
• Your unknowns are m and c; where m and c are integers
• Your representation is the array [m, c]
• Your evaluation function for one array is:
• For every actual data point (x, y)
• Compute ý = mx + c
• Find the sum of (y – ý)^2 over all x
• The sum is your measure of “badness” (larger numbers
are worse)
• Example: For [5, 7] and the data points (1, 10) and (2, 13):
• ý = 5x + 7 = 12 when x is 1
• ý = 5x + 7 = 17 when x is 2
• (10 – 12)^2 + (13 – 17)^2 = 2^2 + 4^2 = 20
• If these are the only two data points, the “badness” of [5, 7] is 20
• Your algorithm might be as follows:
• Create two-element arrays of random numbers
• Repeat 50 times (or any other number):
• For each of the arrays, compute its badness (using all
data points)
• Keep the best arrays (with low badness)
• From the arrays you keep, generate new arrays as
follows:
• Convert the numbers in the array to binary, toggle
one of the bits at random
• Quit if the badness of any of the solution is zero
• After all 50 trials, pick the best array as your final answer
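The badness measure and the algorithm's evaluation step can be checked with a few lines of Python, reproducing the worked numbers from the text and the points (1, 5) and (3, 9) used in the detailed example that follows.

```python
def badness(m, c, points):
    """Sum of squared errors of the line y = m*x + c over the data."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)

# Worked example from the text: [5, 7] on the points (1, 10) and (2, 13)
print(badness(5, 7, [(1, 10), (2, 13)]))  # 20

# Individuals from the iterations below, on the points (1, 5) and (3, 9):
points = [(1, 5), (3, 9)]
print(badness(2, 7, points))  # 32
print(badness(1, 3, points))  # 10
print(badness(2, 3, points))  # 0 -- the solution y = 2x + 3
```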
Let us solve this problem in detail. Consider that the given points are as follows.
We start with the following initial population, the arrays representing candidate solutions (m and c).
• [2 7] [1 3]
• ý = 2x + 7 = 9 when x is 1
• ý = 2x + 7 = 13 when x is 3
• (5 – 9)^2 + (9 – 13)^2 = 4^2 + 4^2 = 32
• ý = 1x + 3 = 4 when x is 1
• ý = 1x + 3 = 6 when x is 3
• (5 – 4)^2 + (9 – 6)^2 = 1^2 + 3^2 = 10
Second iteration
• ý = 3x + 3 = 6 when x is 1
• ý = 3x + 3 = 12 when x is 3
• (5 – 6)^2 + (9 – 12)^2 = 1 + 9 = 10
Third Iteration
• ý = 2x + 3 = 5 when x is 1
• ý = 2x + 3 = 9 when x is 3
• (5 – 5)^2 + (9 – 9)^2 = 0^2 + 0^2 = 0
• Solution found [2 3]
• y = 2x+3
So you see how, by going through the iterations of a GA, one can find a solution to the given problem. It is not necessary in the above example that you get a solution with a badness of 0. If we keep iterating and run out of time, we may simply present the solution with the least badness as the best solution obtainable in that number of iterations on this data.
In the examples so far, each “Individual” (or “solution”) had only one parent. The only
way to introduce variation was through mutation (random changes). In Inheritance or
Crossover, each “Individual” (or “solution”) has two parents. Assuming that each
organism has just one chromosome, new offspring are produced by forming a new
chromosome from parts of the chromosomes of each parent.
Let us repeat the 32-bit word example again but this time using crossover instead
of mutation.
• Suppose your “organisms” are 32-bit computer words, and you want
a string in which all the bits are ones
• Here’s how you can do it:
• Create 100 randomly generated computer words
• Repeatedly do the following:
• Count the 1 bits in each word
• Exit if any of the words have all 32 bits set to 1
• Keep the ten words that have the most 1s (discard the
rest)
• From each word, generate 9 new words as follows:
• Choose one of the other words
Notice that we are generating new individuals from the best ones by using crossover.
The simplest way to perform this crossover is to combine the head of one individual
to the tail of the other, as shown in the diagram below.
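Head-to-tail crossover on 32-bit words can be sketched with bit masks. The split point and the parent words below are hypothetical choices for illustration.

```python
import random

def crossover(parent_a, parent_b, bits=32, rng=random):
    """Single-point crossover: take the head (high bits) of parent_a
    and the tail (low bits) of parent_b, split at a random point."""
    point = rng.randrange(1, bits)             # split position, 1..bits-1
    tail_mask = (1 << point) - 1               # low `point` bits
    head_mask = ((1 << bits) - 1) ^ tail_mask  # the remaining high bits
    return (parent_a & head_mask) | (parent_b & tail_mask)

# A deterministic illustration with the split fixed at bit 16:
a, b, point = 0xFFFF0000, 0x0000FFFF, 16
tail = b & ((1 << point) - 1)                       # tail of b
head = a & (((1 << 32) - 1) ^ ((1 << point) - 1))   # head of a
print(hex(head | tail))  # 0xffffffff
```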
Lecture # 13
The Well-Known Eight Queens Problem
3.7 Eight Queens Problem
Let us now solve a famous problem that is discussed under GA in many well-known AI books: the Eight Queens Problem.
The problem is to place 8 queens on a chess board so that none of them can attack another. A chess board can be considered a plain board with eight columns and eight rows, as shown below.
The possible cells that the Queen can move to when placed in a particular square are shown (in black shading).
The representation is a sequence of 8 digits, one for each of the eight columns, each specifying the index of the row where that column's queen is placed. For example, consider the sequence 2 6 8 3 4 5 3 1: in the first column
the queen is placed in the second row, in the second column the queen is in the 6th row, and so on, until in the 8th column the queen is in the 1st row.
Now we need a fitness function, a function by which we can tell which board position
is nearer to our goal. Since we are going to select best individuals at every step,
we need to define a method to rate these board positions or individuals. One fitness
function can be to count the number of pairs of Queens that are not attacking each
other. An example of how to compute the fitness of a board configuration is given in
the diagram on the next page.
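This fitness function, counting non-attacking pairs, can be sketched as follows; 28 (all pairs) is the maximum for 8 queens. The second board below is one known solution, included as an illustration.

```python
from itertools import combinations

def fitness(board):
    """Number of non-attacking queen pairs for a board given as 8 row
    indices, one per column (the representation described above)."""
    pairs = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(board), 2):
        same_row = r1 == r2
        same_diag = abs(r1 - r2) == abs(c1 - c2)
        if not same_row and not same_diag:
            pairs += 1
    return pairs

print(fitness([2, 6, 8, 3, 4, 5, 3, 1]))  # 22 -- the sequence from the text
print(fitness([2, 4, 6, 8, 3, 1, 7, 5]))  # 28 -- a known solution
```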
So once the representation and fitness function are decided, the solution to the problem is simple.
Let us quickly go through an example of how to solve this problem using GA. Suppose the individuals (board positions) chosen for crossover are as shown below, where the numbers 2 and 3 in the boxes to the left and right show the fitness of each board configuration, and the green arrows denote the queens that can attack none.
The individuals in the initial population are shown on the left, and the children generated by swapping their tails are shown on the right. Hence we now have a total of 4 candidate solutions. Depending on their fitness we select the best two. The diagram below shows this selection: the vertical oval shows the children, and the horizontal oval shows the selected individuals, the fittest ones according to the fitness function.
That is, we represent the individual in binary and flip a certain number of bits at random. You might decide to flip 1, 2, 3 or k bits, at random positions. Hence GA is essentially a randomized technique.
This process is repeated until an individual with the required fitness level is found. If no such individual is found, the process is repeated until the overall fitness of the population, or any of its individuals, gets very close to the required fitness level. An upper limit on the number of iterations is usually used to end the process in finite time.
One solution to the problem, whose fitness value is 8, is shown below.
[Figure: GA flowchart. Start, then Initialize Population, then Evaluate Fitness, then check Solution Found?: if yes, End; if no, apply mutation to the population, mate individuals in the population, and evaluate fitness again.]
You are encouraged to explore the internet and other books to find more
applications of GA in various fields like:
• Genetic Programming
• Evolvable Systems
• Composing Music
• Gaming
• Market Strategies
• Robotics
• Industrial Optimization
and many more.
3.8 Problems
Q1 What types of problems can be solved using GA? Give examples of at least 3 problems from different fields of life. Clearly identify the initial population, representation, evaluation function, mutation and crossover procedures, and exit criteria.
Q2 Given pairs of (x, y) coordinates, find the best possible m, c parameters of the line
y = mx + c that generates them. Use mutation only. Present the best possible solution
given the data after at least three iterations of GA or exit if you find the solution earlier.
Q3 Solve the 8 Queens Problem on paper. Use the representations and strategy
as discussed in the chapter.
Lecture # 14
Knowledge Representation Techniques
4 Knowledge Representation and Reasoning
Now that we have looked at general problem solving, let's look at knowledge representation and reasoning, which are important aspects of any artificial intelligence system, and of any computer system in general. In this section we will become familiar with classical methods of knowledge representation and reasoning in AI.
An AI system has a perception component that allows the system to get information from its environment. As with human perception, this may be visual, audio or other forms of sensory information. The system must then form a meaningful and useful internal representation of this information. This knowledge representation may be static, or it may be coupled with a learning component that is adaptive and draws trends from the perceived data.
[Figure 1: The AI Cycle, linking Perception, Representation, Reasoning, and Execution.]
To solve problems about ratios, for example, we would most likely use algebra, but we could also use simple hand-drawn symbols. To say half of something, you could write 0.5x, or you could draw a picture of the object with half of it colored differently. Both convey the same information, but the former is more compact and more useful in complex scenarios where you want to perform reasoning on the information. It is important at this point to understand how knowledge representation and reasoning are interdependent components, and as an AI system designer, you have to consider this relationship when coming up with any solution.
Since we do not know how the KR and reasoning components are implemented
in humans, even though we can see their manifestation in the form of intelligent
behavior, we need a synthetic (artificial) way to model the knowledge representation
and reasoning capability of humans in computers.
Before we go any further, let's try to understand what 'knowledge' is. Durkin refers to it as the "understanding of a subject area". A well-focused subject area is referred to as a knowledge domain, for example, the medical domain, engineering domain, business domain, etc.
If we analyze the various types of knowledge we use in everyday life, we can broadly classify knowledge into the following categories:
[Figure: categories of knowledge. Declarative knowledge: facts about objects. Structural knowledge: relationships between objects and concepts. Procedural knowledge: rules and procedures. Heuristic knowledge: rules of thumb. Knowledge about knowledge.]
There are multiple approaches and schemes that come to mind when we begin to
think about representation
– Pictures and symbols. This is how the earliest humans represented
knowledge when sophisticated linguistic systems had not yet evolved
– Graphs and Networks
– Numbers
4.4.1 Pictures
Each type of representation has its benefits. What types of knowledge is best
represented using pictures? , e.g. can we represent the relationship between
individuals in a family using a picture? We could use a series of pictures to store
procedural knowledge, e.g. how to boil an egg. But we can easily see that pictures
are best suited for recognition tasks and for representing structural information.
However, pictorial representations are not very easily translated to useful information
in computers because computers cannot interpret pictures directly with out complex
reasoning. So even though pictures are useful for human understanding, because
they provide a high level view of a concept to be obtained readily, using them for
representation in computers is not as straight forward.
4.4.2 Graphs and Networks
[Figure: a family graph connecting Tariq, Amina, Hassan, and Ali.]
We can also represent procedural knowledge using graphs, e.g. How to start a
car?
4.4.3 Numbers
4.4.4 An Example
In the context of the above discussion, let’s look at some ways to represent the
knowledge of a family
Using a picture
As you can see, this kind of representation makes sense readily to humans, but if we give this picture to a computer, it will not have an easy time figuring out the relationships between the individuals, or even how many individuals are in the picture. Computers need complex computer vision algorithms to understand pictures.
Using a graph
[Figure: a family graph connecting Tariq, Ayesha, and Mona.]
This example demonstrates the fact that each knowledge representation scheme has
its own strengths and weaknesses.
4.6 Facts
Facts are a basic block of knowledge (the atomic units of knowledge). They represent
declarative knowledge (they declare knowledge about objects). A proposition is the
statement of a fact. Each proposition has an associated truth value. It may be either
true or false. In AI, to represent a fact, we use a proposition and its associated
truth value, e.g.
–Proposition A: It is raining
–Proposition B: I have an umbrella
–Proposition C: I will go to school
Facts may be single-valued or multi-valued, where an attribute can take one or more values at the same time. For example, an individual can have only one eye color, but may have many cars, so the attribute cars may contain more than one value.
Uncertain facts
Fuzzy facts
Fuzzy facts are ambiguous in nature, e.g. "the book is heavy/light". Here it is unclear what heavy means, because it is a subjective description. Fuzzy representation is used for such facts. While defining fuzzy facts, we use certainty-factor values to specify the degree of "truth". We will look at fuzzy representation in more detail later.
Object-Attribute-Value triplets
Object-Attribute-Value triplets, or OAV triplets, are a type of fact composed of three parts: object, attribute and value. Such facts are used to assert a particular property of some object, e.g.
o Object: Ali
o Attribute: eye color
o Value: brown
[Figure: the OAV triplet with object Ali, attribute eye color, value brown.]
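One simple way to hold OAV triplets in a program is a dictionary keyed on (object, attribute). This is a sketch, not a standard representation; it also supports the multi-valued attributes described earlier.

```python
# A minimal OAV-triplet store keyed on (object, attribute):
facts = {}

def assert_fact(obj, attr, value):
    """Record an Object-Attribute-Value triplet; an attribute may
    accumulate several values (multi-valued facts)."""
    facts.setdefault((obj, attr), set()).add(value)

assert_fact('Ali', 'eye color', 'brown')
assert_fact('Ali', 'cars', 'Suzuki')
assert_fact('Ali', 'cars', 'Toyota')
print(facts[('Ali', 'eye color')])     # {'brown'}
print(sorted(facts[('Ali', 'cars')]))  # ['Suzuki', 'Toyota']
```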
4.7 Rules
Rules are another form of knowledge representation. Durkin defines a rule as “A
knowledge structure that relates some known information to other information that
can be concluded or inferred to be true.”
IF it is raining OR it is snowing
THEN I will not go to school
Relationship
Relationship rules are used to express a direct occurrence relationship between
two events, e.g. IF you hear a loud sound THEN the silencer is not working
Recommendation
IF it is raining
THEN bring an umbrella
Directive
Directive rules are like recommendations rule but they offer a specific line of
action, as opposed to the ‘advice’ of a recommendation rule, e.g.
Variable Rules
If the same type of rule is to be applied to multiple objects, we use variable rules,
i.e. rules with variables, e.g.
If X is a Student
AND X’s GPA>3.7
THEN place X on honor roll.
Such rules are called pattern-matching rules. The rule is matched against known facts, and different possibilities for the variables are tested to determine the truth of the fact.
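The honor-roll rule above can be sketched as a pattern match over known facts; the student names and GPAs below are hypothetical data.

```python
def honor_roll(students):
    """Pattern-matching rule: IF X is a student AND X's GPA > 3.7
    THEN place X on the honor roll. X ranges over all known students."""
    return [name for name, gpa in students.items() if gpa > 3.7]

print(honor_roll({'Ali': 3.9, 'Sana': 3.5, 'Hassan': 3.8}))
# ['Ali', 'Hassan']
```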
Uncertain Rules
Uncertain rules introduce uncertain facts into the system, e.g.
IF you have never won a match
THEN you will most probably not win this time.
Meta Rules
Meta rules describe how to use other rules, e.g.
IF you are coughing AND you have chest congestion
THEN use the set of respiratory disease rules.
Rule Sets
As in the previous example, we may group rules into categories in our knowledge
representation scheme, e.g. the set of respiratory disease rules.
Lecture # 15
Knowledge Representation through Networks and Logics
4.8 Semantic networks
Semantic networks are graphs, with nodes representing objects and arcs
representing relationships between objects. Various types of relationships may be
defined using semantic networks. The two most common types of relationships
are
–IS-A (Inheritance relation)
–HAS (Ownership relation)
Let's consider an example semantic network to demonstrate how knowledge in a
semantic network can be used.
[Figure: a semantic network in which Suzuki IS-A Car, Car IS-A Vehicle, and Vehicle is linked to Road by a 'Travels by' arc.]
Network Operation
To infer new information from semantic networks, we can ask questions of
nodes:
– Ask node Vehicle: 'How do you travel?'
– This node looks at its 'Travels by' arc and replies: road
– Ask node Suzuki: ‘How do you travel?’
– This node does not have a link to travel therefore it asks other
nodes linked by the IS-A link
– Asks node Car (because of IS-A relationship)
– Asks node Vehicle (IS-A relationship)
– Node Vehicle Replies: road
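The question-asking procedure above can be sketched in Python: a node first checks its own arcs and, failing that, delegates the question up the IS-A chain. The dictionaries below encode the example network described in the text.

```python
# The example semantic network: IS-A arcs plus a property arc on Vehicle.
is_a = {"Suzuki": "Car", "Car": "Vehicle"}        # inheritance relations
properties = {"Vehicle": {"travels by": "road"}}  # the 'Travels by' arc

def ask(node, question):
    """Answer a question at a node. If the node has no matching arc,
    delegate the question up the IS-A chain, as Suzuki asks Car, then Vehicle."""
    while node is not None:
        if question in properties.get(node, {}):
            return properties[node][question]
        node = is_a.get(node)  # follow the IS-A link upwards
    return None

print(ask("Suzuki", "travels by"))  # road
```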
4.9 Frames
A frame is a knowledge structure that groups the attributes of an object into named
slots, e.g. a frame for a student might contain:
Properties:
Age: 19
GPA: 4.0
Ranking: 1
The various components within the frame are called slots, e.g. Frame Name slot.
4.9.1 Facets
A slot in a frame can hold more than just a value; it can also contain metadata and
procedures. The various aspects of a slot are called facets. They are a feature
of frames that allows us to put constraints on slot values, e.g. IF-NEEDED facets are
called when the data of a particular slot is needed. Similarly, IF-CHANGED facets
are called when the value of a slot changes.
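The facet mechanism can be sketched as follows (a minimal Python illustration; real frame systems offer much more, and the slot names simply mirror the student example above):

```python
# A minimal frame whose slots can carry IF-NEEDED and IF-CHANGED facets.
class Frame:
    def __init__(self, name):
        self.name = name
        self.values = {}      # slot -> stored value
        self.if_needed = {}   # slot -> procedure run when a value is missing
        self.if_changed = {}  # slot -> procedure run when a value changes

    def set(self, slot, value):
        self.values[slot] = value
        if slot in self.if_changed:   # IF-CHANGED facet fires on update
            self.if_changed[slot](value)

    def get(self, slot):
        if slot not in self.values and slot in self.if_needed:
            self.values[slot] = self.if_needed[slot]()  # IF-NEEDED facet
        return self.values.get(slot)

student = Frame("Student")
student.set("Age", 19)
student.if_needed["GPA"] = lambda: 4.0       # computed only on demand
log = []
student.if_changed["Ranking"] = log.append   # observe every change
student.set("Ranking", 1)
print(student.get("GPA"), log)  # 4.0 [1]
```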
4.10 Logic
Just as algebra is a formal system for reasoning about numbers, e.g. 2 + 4 = 6,
propositional logic and predicate calculus are formal systems for reasoning about
propositions. We will consider two basic logic representation techniques:
–Propositional Logic
–Predicate Calculus
4.10.1 Propositional logic
In propositional logic, propositions are represented by symbolic variables, e.g.
p = It is raining
q = I carry an umbrella
The logical connectives are:
AND (Conjunction, ∧)
OR (Disjunction, ∨)
NOT (Negation, ¬)
If … then (Conditional, →)
If and only if (Bi-conditional, ↔)
Their truth table is:
p  q  p∧q  p∨q  p→q  p↔q
T  T   T    T    T    T
T  F   F    T    F    F
F  T   F    T    T    F
F  F   F    F    T    T
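The truth table for the binary connectives can be generated mechanically. A Python sketch, for illustration only:

```python
# Generate the truth table for the four binary connectives mechanically.
import itertools

def truth_table():
    rows = []
    for p, q in itertools.product([True, False], repeat=2):
        rows.append((p, q,
                     p and q,        # conjunction p AND q
                     p or q,         # disjunction p OR q
                     (not p) or q,   # conditional: if p then q
                     p == q))        # bi-conditional: p if and only if q
    return rows

for row in truth_table():
    print(" ".join("T" if v else "F" for v in row))
```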
Quantifiers
Universal quantifier
The symbol for the universal quantifier is ∀. It is read as "for every" or "for all" and
is used in formulae to assert that a statement holds for every value in the domain,
e.g. in the domain of numbers, we can say that (∀x)(x + x = 2x). In words this
is: for every x (where x is a number), x + x = 2x is true. Similarly, in the domain of
shapes, we can say that (∀x)(square(x) → polygon(x)), which is read in words
as: every square is a polygon. In other words, for every x (where x is a shape), if x is
a square, then x is a polygon.
Existential quantifier
The symbol for the existential quantifier is ∃. It is read as "there exists", "for some",
"for at least one" or "there is one", and is used in formulae to say that something is true
for at least one value in the domain, e.g. in the domain of persons, we can say that
(∃x)(person(x) ∧ father(x, Ahmed)). In words this reads as: there exists some
person x who is Ahmed's father.
First order predicate logic is the simplest form of predicate logic. The main types
of symbols used are
–Constants are used to name specific objects or properties, e.g. Ali, Ayesha, blue,
ball.
–Predicates are used to assert properties of objects or relations between them, e.g.
man(ahmed)
father(ahmed, belal)
brother(ahmed, chand)
owns(belal, car)
tall(belal)
hates(ahmed, chand)
–Variables: X, Y and Z
–Constants: ahmed, belal, chand and car
–Formulae combine predicates, quantifiers and variables, e.g.
¬∃Y (sister(Y, ahmed))
∀X ∀Y ∀Z (man(X) ∧ man(Y) ∧ man(Z) ∧ father(Z, Y) ∧ father(Z, X) → brother(X, Y))
The predicate section outlines the known facts about the situation in the form of
predicates, i.e. predicate name and its arguments. So, man(ahmed) means that
ahmed is a man, hates(ahmed, chand) means that ahmed hates chand.
The formulae section outlines formulae that use quantifiers and variables
to define certain rules. ¬∃Y (sister(Y, ahmed)) says that there exists no Y such
that Y is the sister of ahmed, i.e. ahmed has no sister. Similarly,
∀X ∀Y ∀Z (man(X) ∧ man(Y) ∧ man(Z) ∧ father(Z, Y) ∧ father(Z, X) → brother(X, Y))
means that if there are three men X, Y and Z, and Z is the father of both X and
Y, then X and Y are brothers. This expresses the rule for two individuals
being brothers.
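As an illustration of how such a formula can be checked against ground facts, the sketch below enumerates bindings for the variables. Note that the fact father(ahmed, chand) is an assumed addition, not part of the text's fact list, so that the rule has a pair to conclude.

```python
# Checking the brothers formula against ground facts by enumerating
# bindings for X, Y and Z. father(ahmed, chand) is an assumed extra fact.
man = {"ahmed", "belal", "chand"}
father = {("ahmed", "belal"), ("ahmed", "chand")}  # father(Z, X): Z is X's father

def brothers(man, father):
    """All pairs (X, Y) of distinct men that share a father Z."""
    found = set()
    for z, x in father:
        for z2, y in father:
            if z == z2 and x != y and {x, y, z} <= man:
                found.add((x, y))
    return found

print(sorted(brothers(man, father)))  # [('belal', 'chand'), ('chand', 'belal')]
```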
Lecture # 16
Different Types of Reasoning
4.11 Reasoning
Throughout this section, you will notice how representing knowledge in a particular
way is useful for a particular kind of reasoning.
Consider inductive reasoning, which draws a general conclusion from specific
observations:
–Observation: All the crows that I have seen in my life are black.
–Conclusion: All crows are black.
Thus the essential difference is that inductive reasoning is based on experience while
deductive reasoning is based on rules; hence only the latter is guaranteed to be
correct, provided the rules and premises are correct.
Deduction is exact in the sense that deductions follow in a logically provable way from
the axioms. Abduction, by contrast, is a form of plausible inference whose conclusion
might be wrong, e.g. from the rule 'if it rains, she carries an umbrella' and the
observation that she is carrying an umbrella, we may abduce that it is raining.
This conclusion might be false, because there could be other reasons that she is
carrying an umbrella, e.g. she might be carrying it to protect herself from the sun.
Analogical reasoning works by drawing analogies between two situations, looking for
similarities and differences, e.g. when you say driving a truck is just like driving a
car, by analogy you know that there are some similarities in the driving mechanism,
but you also know that there are certain other distinct characteristicsof each.
Non-Monotonic reasoning is used when the facts of the case are likely to change
after some time, e.g.
Rule:
IF the wind blows
THEN the curtains sway
When the wind stops blowing, the curtains should sway no longer. With monotonic
reasoning, however, this would not happen: the fact that the curtains are swaying
would be retained even after the wind stopped blowing. In non-monotonic reasoning,
such facts are retracted once the conditions that supported them no longer hold.
4.12.7 Inference
Inference is the process of deriving new information from known information. Inthe
domain of AI, the component of the system that performs inference is calledan
inference engine. We will look at inference within the framework of ‘logic’, which
we introduced earlier
Logic
Logic, which we introduced earlier, can be viewed as a formal language. As a
language, it has the following components: syntax, semantics and proof systems.
Syntax
Syntax is a description of valid statements, the expressions that are legal in the
language. We have already looked at the syntax of two types of logic system:
propositional logic and predicate logic. The syntax of propositional logic gives us ways
to use propositions, their associated truth values and logical connectives to reason.
Semantics
Semantics pertain to what expressions mean, e.g. the expression 'the cat drove the
car' is syntactically correct, but semantically nonsensical.
Proof systems
A logic framework comes with a proof system, which is a way of manipulating given
statements to arrive at new statements. The idea is to derive ‘new’ information from
the given information.
Recall proofs in math class. You write down all you know about the situation and then
try to apply all the rules you know repeatedly until you come up with the statement
you were supposed to prove. Formally, a proof is a sequence of statements aiming
at inferring some information. While doing a proof, you usually proceed with the
following steps:
–You begin with initial statements, called premises of the proof (or knowledge
base)
–Use rules, i.e. apply rules to the known information
–Add new statements, based on the rules that match
Repeat the above steps until you arrive at the statement you wished to prove.
Rules of inference are logical rules that you can use to prove certain things. As
you look at the rules of inference, try to figure out and convince yourself that the rules
are logically sound, by looking at the associated truth tables. The rules we will use
for propositional logic are:
Modus Ponens
Modus Tollens
And-Introduction
And-Elimination
Modus ponens
"Modus ponens" means "the affirming mode". Note: from now on in our discussion
of logic, anything that is written down in a proof is a statement that is true.
Modus Ponens: from α → β and α, conclude β.
It says that if you know that alpha implies beta, and you know alpha
to be true, you can automatically say that beta is true.
Modus Tollens
Modus Tollens: from α → β and ¬β, conclude ¬α.
In other words, if alpha implies beta is true and beta is known to be not
true, then alpha could not have been true: had alpha been true, beta would
automatically have been true due to the implication.
And-introduction says that from "alpha" and "beta" you can conclude "alpha and
Beta". That seems pretty obvious, but it is a useful tool to have upfront. Conversely,
and-elimination says that from "alpha and beta" you can conclude "alpha" (and
likewise "beta").
And-introduction: from α and β, conclude α ∧ β.
And-elimination: from α ∧ β, conclude α (or β).
Now, we will do an example using the above rules. Steps 1, 2 and 3 are added initially;
they are the given facts. The goal is to prove D. Steps 4-8 use the rules of inference
to reach the required goal from the given facts.
Step  Formula       Justification
1     A ∧ B         Given
2     A → C         Given
3     (B ∧ C) → D   Given
4     A             1, And-elimination
5     C             4, 2, Modus Ponens
6     B             1, And-elimination
7     B ∧ C         5, 6, And-introduction
8     D             7, 3, Modus Ponens
Note: The numbers in the derivation reference the statements of other step numbers.
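The derivation above can be mimicked by a small forward-chaining procedure that applies the four rules until nothing new appears. This is a sketch only; the tuple encoding of statements is our own choice, not part of the course material.

```python
# A small forward-chaining prover for the four rules of inference.
# Statements are tuples: ("and", a, b), ("implies", a, b), ("not", a),
# or plain atom strings such as "A".

def close(known):
    """Return the closure of the premises under the four rules."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        new = set()
        for s in known:
            if isinstance(s, tuple) and s[0] == "and":
                new.update({s[1], s[2]})                 # And-elimination
            if isinstance(s, tuple) and s[0] == "implies":
                if s[1] in known:
                    new.add(s[2])                        # Modus Ponens
                if ("not", s[2]) in known:
                    new.add(("not", s[1]))               # Modus Tollens
        for a in known:                                  # And-introduction
            for b in known:                              # (atoms only, to stay finite)
                if not isinstance(a, tuple) and not isinstance(b, tuple):
                    new.add(("and", a, b))
        if not new <= known:
            known |= new
            changed = True
    return known

premises = {("and", "A", "B"),
            ("implies", "A", "C"),
            ("implies", ("and", "B", "C"), "D")}
print("D" in close(premises))  # True: D is derived, as in the proof above
```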
Lecture # 17
Different Types of Reasoning
The deduction mechanism discussed above, using the four rules of inference, could
be used in practical systems, but it is not feasible: the many inference rules
introduce a large branching factor in the search for a proof. An alternative approach
is called resolution, a strategy that determines the truth of an assertion using only
one rule, the resolution rule:
From (α ∨ β) and (¬β ∨ γ), conclude (α ∨ γ).
To see that this rule is logically correct, look at the table below:
α  β  γ  ¬β  α∨β  ¬β∨γ  α∨γ
F  F  F   T   F    T     F
F  F  T   T   F    T     T
F  T  F   F   T    F     F
F  T  T   F   T    T     T
T  F  F   T   T    T     T
T  F  T   T   T    T     T
T  T  F   F   T    F     T
T  T  T   F   T    T     T
You can see that in every row where both premises of the rule (α ∨ β and ¬β ∨ γ)
are true, the conclusion (α ∨ γ) is true also.
To be able to use the resolution rule for proofs, the first step is to convert all given
statements into conjunctive normal form (CNF), i.e. a conjunction of disjunctions of
literals, e.g.
(¬A ∨ B) ∧ (B ∨ C) ∧ (D)
Note: D ≡ (D ∨ D), so a single literal by itself is also a clause.
The outermost structure is made up of conjunctions. The inner units, called clauses,
are made up of disjunctions. The components of a statement in CNF are clauses and
literals. A clause is the disjunction of one or more units; the units that make up a
clause are called literals. A literal is either a variable or the negation of a variable.
So you get an expression where the negations are pushed in as tightly as possible,
then you have ORs, then you have ANDs. You can think of each clause as a
requirement: each clause has to be satisfied individually to satisfy the entire statement.
1. Eliminate implications using the equivalence
A → B ≡ ¬A ∨ B
2. Drive in negations using De Morgan's Laws, which are given below
¬(A ∧ B) ≡ (¬A ∨ ¬B)
¬(A ∨ B) ≡ (¬A ∧ ¬B)
3. Distribute OR over AND
A ∨ (B ∧ C) ≡ (A ∨ B) ∧ (A ∨ C)
Example of CNF conversion
Convert (A ∨ B) → (C ∨ D) to CNF:
1. Eliminate implication: ¬(A ∨ B) ∨ (C ∨ D)
2. Drive in negations: (¬A ∧ ¬B) ∨ (C ∨ D)
3. Distribute OR over AND: (¬A ∨ C ∨ D) ∧ (¬B ∨ C ∨ D)
Resolution by Refutation
Now, we will look at a proof strategy called resolution refutation. The steps for
proving a statement using resolution refutation are:
• Write all sentences in CNF
• Negate the desired conclusion
• Apply the resolution rule until you derive a contradiction or cannot apply
the rule anymore.
• If we derive a contradiction, then the conclusion follows from the given
axioms
• If we cannot apply the rule anymore, then the conclusion cannot be proved from
the given axioms
The given statements are A ∨ B, A → C, and B → C. Converted to CNF, they become
steps 1, 2 and 3. Our goal is to prove C, so step 4 adds the negation of the desired
conclusion. Steps 5-8 use the resolution rule to derive a contradiction, proving C.
Step  Formula   Justification
1     A ∨ B     Given
2     ¬A ∨ C    Given
3     ¬B ∨ C    Given
4     ¬C        Negated conclusion
5     B ∨ C     1, 2
6     ¬A        2, 4
7     ¬B        3, 4
8     C         5, 7  Contradiction with 4!
Note that you could have come up with multiple ways of proving C:
Proof 1:              Proof 2:
4  ¬C                 4  ¬C
5  ¬B   3, 4          5  B ∨ C  1, 2
6  A    1, 5          6  ¬A     2, 4
7  C    2, 6          7  ¬B     3, 4
                      8  C      5, 7
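Resolution refutation is mechanical enough to automate in a few lines. The sketch below applies it to the clauses {A, B}, {¬A, C}, {¬B, C} (the CNF of A ∨ B, A → C and B → C) to prove C. The (name, polarity) encoding of literals is our own choice.

```python
# Resolution refutation: a literal is (name, polarity); a clause is a
# frozenset of literals; givens are the CNF clauses of the example above.
from itertools import combinations

def negate(lit):
    return (lit[0], not lit[1])

def resolve(c1, c2):
    """All resolvents of two clauses (cancel one complementary pair)."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2]

def refute(clauses, goal_lit):
    """Add the negated goal, then resolve until the empty clause appears."""
    clauses = set(clauses) | {frozenset([negate(goal_lit)])}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True   # empty clause: contradiction derived
                new.add(r)
        if new <= clauses:
            return False          # fixpoint reached: goal not provable
        clauses |= new

A, nA, B, nB, C = ("A", True), ("A", False), ("B", True), ("B", False), ("C", True)
given = [frozenset([A, B]), frozenset([nA, C]), frozenset([nB, C])]
print(refute(given, C))  # True: C follows from the axioms
```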
1. (A→B) →B
2. A→C
3. ¬C → ¬B
Prove C
Convert to CNF
1. (A → B) → B
≡ ¬(A → B) ∨ B
≡ ¬(¬A ∨ B) ∨ B
≡ (A ∧ ¬B) ∨ B
≡ (A ∨ B) ∧ (¬B ∨ B)
≡ (A ∨ B)
2. A → C ≡ ¬A ∨ C
3. ¬C → ¬B ≡ C ∨ ¬B
Proof (two alternative derivations; steps 1-3 are the CNF clauses A ∨ B, ¬A ∨ C
and C ∨ ¬B):
Alternative 1:                    Alternative 2:
4  ¬C  Negation of conclusion     4  ¬C  Negation of conclusion
5  ¬A  2, 4                       5  ¬B  3, 4
6  B   1, 5                       6  A   1, 5
7  C   3, 6  Contradiction!       7  C   2, 6  Contradiction!
As you can see from the examples above, it is often possible to apply more than
one rule at a particular step. We can use several strategies in such cases: we may
apply rules in an arbitrary order, but there are some rules of thumb that may
make the search more efficient.
• Unit preference: prefer using a clause with one literal, since it produces shorter
resolvent clauses.
• Set of support: try to involve the statement you are trying to prove, i.e. choose a
resolution involving the negated goal. These are the relevant clauses, and using
them moves the search 'towards the solution'.
Lecture # 18
Different Expert Systems and their usage
5 Expert Systems
Expert Systems (ES) are a popular and useful application area in AI. Having studied
KRR, it is instructive to study ES to see a practical manifestation of the principles
learnt there.
Before we attempt to define an expert system, we have to look at what we take the
term 'expert' to mean when we refer to human experts. Two traits that characterize
experts are knowledge and the ability to reason with it. Try to think of the various
traits you associate with experts you might know, e.g. a skin specialist, a heart
specialist, a car mechanic, an architect, a software designer. You will see that the
underlying common factors are similar to those outlined above.
Before we begin to study the development of expert systems, let us get some historical
perspective on the earliest practical AI systems. After the so-called dark ages in
AI, expert systems were at the forefront of the rebirth of AI. There was a realization
in the late 1960s that the general framework of problem solving was not
enough to solve all kinds of problems. This was augmented by the realization that
specialized knowledge is a very important component of practical systems. People
observed that systems designed for well-focused problems and domains
outperformed more 'general' systems. These observations provided the motivation
for expert systems. Expert systems are important historically as the earliest AI
systems, and practically as the most used systems. To highlight the utility of expert
systems, we will look at some famous expert systems, which served to define the
paradigms for current expert systems.
The following table compares human experts to expert systems. While looking at
these, consider some examples, e.g. doctor, weather expert.
An expert system may take two main roles, relative to the human expert. It may
replace the expert or assist the expert
Replacement of expert
This proposition raises many eyebrows. It is not very practical in some situations,
but feasible in others. Consider drastic situations where safety or location is an issue,
e.g. a mission to Mars. In such cases replacement of an expert may be the only
feasible option. Also, in cases where an expert cannot be available at a particular
geographical location, e.g. volcanic areas, it is expedient to use an expert system
as a substitute.
An example of this role is a France-based oil exploration company that maintains
a number of oil wells. The company had a problem: the drills would occasionally
become stuck, typically when the drill hits something that prevents it from turning.
Delays due to this problem caused huge losses until an expert could arrive at the
scene to investigate. The company decided to deploy an expert system to solve the
problem. A system called 'Drilling Advisor' (Elf Aquitaine, 1983) was developed,
which saved the company from the huge losses that would otherwise be incurred.
Assisting expert
Assisting an expert is the most commonly found role of an ES. The goal is to aid an
expert in routine tasks to increase productivity, or to aid in managing a complex
situation by using an expert system that may itself draw on the experience of other
(possibly more than one) individuals. Such an expert system helps an expert
overcome shortcomings such as recalling relevant information.
XCON is an example of how an ES can assist an expert.
Control applications
Design
ES are used for design applications to configure objects under given design
constraints, e.g. XCON. Such ES often use non-monotonic reasoning, because later
steps have implications for earlier ones. Another example of a design ES is PEACE
(Dincbas, 1980), a CAD tool to assist in the design of electronic structures.
Simulation
ES can be used to model processes or systems for operational study, or for use along
with tutoring applications
Planning
ES may be used for planning applications, e.g. recommending steps for a robot
to carry out certain tasks, or cash management planning. SMARTPlan is such a
system, a strategic market planning expert (Beeral, 1993). It suggests the appropriate
marketing strategy required to achieve economic success. Similarly, prediction
systems infer likely consequences from a given situation.
When analyzing a particular domain to see if an expert system may be useful, the
system analyst should ask the following questions:
Lecture # 19
Architecture of Expert Systems
5.7 Expert system structure
Having discussed the scenarios and applications in which expert systems may be
useful, let us delve into the structure of expert systems. To facilitate this, we use
the analogy of an expert (say a doctor) solving a problem: the expert holds domain
knowledge in long-term memory, gathers the facts of the case in short-term memory,
and reasons with both to reach a solution and conclusions.
We can view the structure of the ES and its components as shown in the figure
below
Expert System
[Figure: Expert system structure. The working memory (analogy: STM) holds the initial case facts and the inferred facts; the knowledge base (analogy: LTM) holds the domain knowledge; the inference engine links the two and interacts with the USER.]
The knowledge base is the part of an expert system that contains the domain
knowledge. As discussed in the KRR section, one way of encoding that knowledge
is in the form of IF-THEN rules. We saw that such representation is especially
conducive to reasoning.
The working memory is the 'part of the expert system that contains the problem
facts that are discovered during the session', according to Durkin. One session of
the working memory corresponds to one consultation; during a consultation, the
working memory is populated with the facts of the current case.
The inference engine can be viewed as the processor in an expert system that
matches the facts contained in the working memory with the domain knowledge
contained in the knowledge base, to draw conclusions about the problem. It works
with the knowledge base and the working memory, and draws on both to add new
facts to the working memory.
We will illustrate the above features using examples in the following sections
5.7.4 Expert System Example: Family
Let’s look at the example above to see how the knowledge base and working memory
are used by the inference engine to add new facts to the working memory. The
knowledge base column on the left contains the three rules of the system. The
working memory starts out with two initial case facts:
The inference engine matches each rule in turn with the facts in the working memory
to see if the premises are all matched. Once all premises are matched, the rule is
fired and its conclusion is added to the working memory, e.g. the premises of rule 1
match the initial facts, therefore it fires and the fact brother(Ali, Ahmed) is added.
This matching of rule premises and facts continues until no new facts can be added
to the system. The matching and firing is indicated by arrows in the above table.
[Figure: a second worked example, traced in the same way, with rules built from predicates such as person(X), sameSchool(X, Y), weekend() and carryUmbrella(Ahmed), deriving the fact like(Ali, Ahmed).]
Lecture # 20
Architecture of Expert Systems and Inference Mechanism
The arrows above provide the explanation for how the fact like(Ali, Ahmed) was
added to the working memory.
Having looked at the basic operation of expert systems, we can begin to outline
desirable properties or characteristics we would like our expert systems to possess.
ES have an explanation facility. This is the module of an expert system that allows
transparency of operation, by providing an explanation of how the inference engine
reached its conclusion. We want an ES to have this facility so that users can see
how it arrives at its conclusions.
An expert system is different from conventional programs in the sense that program
control and knowledge are separate. We can change one while affecting the other
minimally. This separation is manifest in ES structure; knowledge base, working
memory and inference engine. Separation of these components allows changes to
the knowledge to be independent of changes in control and vice versa.
"There is a clear separation of general knowledge about the problem (the rules
forming the knowledge base) from information about the current problem (the input
data) and methods for applying the general knowledge to a problem (the rule
interpreter). The program itself is only an interpreter (or general reasoning
mechanism) and ideally the system can be changed simply by adding or subtracting
rules in the knowledge base" (Duda)
Besides these properties, an expert system also possesses expert knowledge, in that
it embodies the expertise of a human expert. It focuses that expertise: the larger the
domain, the more complex the expert system becomes, e.g. a car diagnosis expert
is more easily handled if we make separate ES components for engine problems,
electrical problems, etc., instead of designing one component for all problems.
Unlike traditional programs, you don't just program an ES and consider it 'built'. It
grows as you add new knowledge. Once the framework is made, the addition of
knowledge dictates the growth of the ES.
The main people involved in an ES development project are the domain expert,
the knowledge engineer and the end user.
Domain Expert
A domain expert is 'a person who possesses the skill and knowledge to solve a specific
problem in a manner superior to others' (Durkin). For our purposes, an expert should
have expert knowledge in the given domain, good communication skills, availability
and readiness to co-operate.
Knowledge Engineer
A knowledge engineer is ‘a person who designs, builds and tests an Expert System’
(Durkin). A knowledge engineer plays a key role in identifying, acquiring and
encoding knowledge.
End-user
The end users are the people who will use the expert system. Correctness, usability
and clarity are important ES features for an end user.
5.11.1 Forward chaining
Approach
Rules
Rule 1
IF The patient has deep cough
AND We suspect an infection
THEN The patient has Pneumonia
Rule 2
IF The patient’s temperature is above 100
THEN Patient has fever
Rule 3
IF The patient has been sick for over a fortnight
AND The patient has a fever
THEN We suspect an infection
Case facts
First Pass
Second Pass
Third Pass
Now, no more facts can be added to the WM. Diagnosis: Patient has Pneumonia.
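The passes above can be reproduced by a minimal forward chainer. This is an illustrative sketch in Python; the case facts below are assumptions chosen to trigger all three rules.

```python
# A minimal forward chainer for the pneumonia example.
# Rules are (premises, conclusion) pairs; facts are plain strings.
rules = [
    ({"deep cough", "infection suspected"}, "pneumonia"),
    ({"temperature above 100"}, "fever"),
    ({"sick for over a fortnight", "fever"}, "infection suspected"),
]

def forward_chain(facts, rules):
    """Fire every rule whose premises are all in working memory; each
    iteration of the loop is one 'pass', until no new facts appear."""
    wm = set(facts)
    while True:
        fired = {concl for prem, concl in rules if prem <= wm and concl not in wm}
        if not fired:
            return wm
        wm |= fired

case = {"deep cough", "temperature above 100", "sick for over a fortnight"}
print("pneumonia" in forward_chain(case, rules))  # True, after three passes
```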
Undirected search
Conflict resolution
Another important issue is conflict resolution: the question of what to do when the
premises of two (or more) rules match the given facts. Which should be fired first?
If both rules are fired, you will add conflicting recommendations to the working
memory.
To overcome the conflict problem stated above, we may choose one of the
following conflict resolution strategies:
Fire the first rule in sequence (rule ordering in a list). Using this strategy, all the
rules in the list are ordered (the ordering imposes prioritization). When more
than one rule matches, we simply fire the first in the sequence.
Assign rule priorities (rule ordering by importance). Using this approach, we
assign explicit priorities to rules to allow conflict resolution.
Prefer more specific rules (more premises) over general rules. This strategy is
based on the observation that a rule with more premises has, in a sense, more
evidence or votes from its premises, and therefore should be fired in preference
to a rule that has fewer premises.
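The three strategies can be sketched as selections over the set of matching rules. The rule names, premises and priorities below are invented for illustration.

```python
# Three conflict-resolution strategies over the rules whose premises match.
rules = [
    {"name": "r1", "premises": {"raining"}, "priority": 1},
    {"name": "r2", "premises": {"raining", "windy"}, "priority": 5},
]
facts = {"raining", "windy"}
matching = [r for r in rules if r["premises"] <= facts]  # both rules match

first_in_sequence = matching[0]                                  # list ordering
highest_priority = max(matching, key=lambda r: r["priority"])    # explicit priority
most_specific = max(matching, key=lambda r: len(r["premises"]))  # most premises

print(first_in_sequence["name"], highest_priority["name"], most_specific["name"])
# r1 r2 r2
```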
Lecture # 21
Inference Mechanisms
5.11.2 Backward chaining
Approach
As you look at the example for backward chaining below, notice how the approach
of backward chaining is like depth first search.
Consider the same example of doctor and patient that we looked at previously
Rules
Rule 1
IF The patient has deep cough
AND We suspect an infection
THEN The patient has Pneumonia
Rule 2
IF The patient’s temperature is above 100
THEN Patient has fever
Rule 3
IF The patient has been sick for over a fortnight
AND The patient has fever
THEN We suspect an infection
Goal: The patient has Pneumonia.
In the figures below, each node represents a statement. Forward chaining starts
with several facts in the working memory. It uses rules to generate more facts. In
the end, several facts have been added, amongst which one or more may be relevant.
Backward chaining however, starts with the goal state and tries to reach down to all
primitive nodes (marked by ‘?’), where information is sought from the user.
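Backward chaining over the same doctor rules can be sketched as a depth-first recursive procedure. For illustration, the user's answers at the primitive nodes are assumed to be available up front as a set.

```python
# Backward chaining: start at the goal and recurse into premises depth first.
rules = [
    ({"deep cough", "infection suspected"}, "pneumonia"),
    ({"temperature above 100"}, "fever"),
    ({"sick for over a fortnight", "fever"}, "infection suspected"),
]

def prove(goal, answers):
    """True if the goal is directly confirmed at a primitive node, or if
    some rule concluding it has all of its premises provable."""
    if goal in answers:  # primitive node: 'ask the user'
        return True
    return any(conclusion == goal and all(prove(p, answers) for p in premises)
               for premises, conclusion in rules)

answers = {"deep cough", "temperature above 100", "sick for over a fortnight"}
print(prove("pneumonia", answers))  # True
```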
Lecture # 26
Different Designing Phases of Expert Systems
We will now look at software engineering methodology for developing practical ES.
The general stages of the expert system development lifecycle or ESDLC are
Feasibility study
Rapid prototyping
Alpha system (in-house verification)
Beta system (tested by users)
Maintenance and evolution
Linear model
The Linear model (Bochsler, 1988) of software development has been successfully used
in developing expert systems. A linear sequence of steps is applied repeatedly, in
an iterative fashion, to develop the ES. The main phases of the linear sequence are
Planning
Knowledge acquisition and analysis
Knowledge design
Code
Knowledge verification
System evaluation
Planning
Feasibility assessment
Resource allocation
Task phasing and scheduling
Requirements analysis
This is the most important stage in the development of an ES. During this stage the
knowledge engineer works with the domain expert to acquire, organize and analyze
the domain knowledge for the ES: 'Knowledge acquisition is the bottleneck in the
construction of expert systems' (Hayes-Roth et al.). Getting knowledge from the
expert is called knowledge elicitation, a narrower term than the broader knowledge
acquisition. Elicitation methods may be broadly divided into:
Direct Methods
o Interviews
Very good at initial stages
Reach a balance between structured (multiple choice, rating
scale) and un-structured interviewing.
Record interviews (transcribe or tape)
Use a mix of open-ended and closed-ended questions
Various problems may be faced, and have to be overcome, during elicitation.
Cognitive maps
[Figure: an example cognitive map for the domain of medicine. A Patient node is linked to Age, Medical History and Personal History, and gets Tests, which include Echo Cardiogram, Blood Sugar and Blood Hematology.]
The example cognitive map for the domain of medicine shows entities and their
relationships. Concepts and sub-concepts are identified and grouped together to
understand the structure of the knowledge better. Cognitive maps are usually used
to represent static entities.
Inference networks
[Figure: an example inference network whose conclusion node is 'Diagnosis is Anemia'.]
Flowcharts
Flow charts also capture knowledge of strategies. They may be used to represent
a sequence of steps that depict the order of application of rule sets. Try making a flow
chart that depicts the following strategy. The doctor begins by asking symptoms. If
they are not indicative of some disease the doctor will not ask for specific tests. If it is
symptomatic of two or three potential diseases, the doctor decides which disease to
check for first and rules out potential diagnoses in some heuristic sequence.
Knowledge definition
Detailed design
Decision of how to represent knowledge
o Rules and Logic
o Frames
Decision of a development tool. Consider whether it supports your planned
strategy.
Internal fact structure
Mock interface
5.12.7 Code
This phase occupies the least time in the ESDLC. It involves coding, preparingtest
cases, commenting code, developing user’s manual and installation guide. At the end
of this phase the system is ready to be tested.
Lecture # 21
CLIPS Expert System Development Tool
5.12.8 CLIPS
We will now look at a tool for expert system development. CLIPS stands for C
Language Integrated Production System. CLIPS is an expert system tool which
provides a complete environment for the construction of rule- and object-based
expert systems. Download CLIPS for Windows from: [Link]
Also download the complete documentation, including the programming guide, from:
[Link]
All commands use parentheses ( ) as delimiters, i.e. all commands are enclosed in
brackets. A simple example is the command for adding numbers:
CLIPS> (+ 3 4)
7
Lecture # 22
CLIPS Expert System Development Tool
Fields
Fields are the basic types of tokens used in CLIPS. They can be floats, integers,
symbols, strings, external addresses, fact addresses, instance names or instance
addresses.
Before facts can be added, we have to define their format. A relation consists of a
relation name and zero or more slots (the arguments of the relation).
The deftemplate construct defines a relation's structure:
(deftemplate <relation-name> [<optional-comment>] <slot-definition>*)
e.g.
CLIPS> (deftemplate father "Relation father"
(slot fathersName)
(slot sonsName) )
Adding facts
Facts are added in the predicate format. The deftemplate construct is used to inform
CLIPS of the structure of facts. The set of all known facts is called the factlist. To add
facts to the fact list, use the assert command, e.g.
Facts to add:
man(ahmed)
father(ahmed, belal)
brother(ahmed, chand)
After adding facts, you can see the fact list using command: (facts). You will see that
a fact index is assigned to each fact, starting with 0. For long fact lists, use the
format
(facts [<start> [<end>]])
For example:
(facts 1 10) lists fact numbers 1 through 10
Removing facts
First we add a fact:
CLIPS> (assert (father (fathersName "Ahmed") (sonsName "Belal")))
To remove a fact from the fact list, use the retract command with the fact's
index, e.g. (retract 1).
The watch command is used for debugging programs. It is used to view the
assertion and retraction of facts. The command is:
(watch facts)
After entering this command, the whole sequence of events for subsequent
commands will be shown. To turn off this option, use:
(unwatch facts)
Initial facts: the deffacts construct defines a set of facts that are automatically
asserted when the (reset) command is used, to set the working memory to its initial
state. For example, (deffacts startup (weather raining)) asserts the fact
(weather raining) on every reset.
The defrule construct is used to add rules. Before using a rule, the templates of
its component facts need to be defined. For example, consider the rule:
;Rule header
(defrule isSon "An example rule"
; Patterns (the IF part)
(father (fathersName "ali") (sonsName "ahmed"))
;THEN
=>
; Actions
(assert (son (sonsName "ahmed") (fathersName "ali")))
)
CLIPS attempts to match the pattern of the rules against the facts in the fact list.
If all patterns of a rule match, the rule is activated, i.e. placed on the agenda.
Lecture # 23
CLIPS Expert System Development Tool
The agenda is the list of activated rules. We use the run command to run the
agenda. Running the agenda causes the rules in the agenda to be fired.
CLIPS>(run)
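The match-activate-fire cycle that CLIPS runs can be sketched in a few lines of Python. This is only an illustrative model of the idea, not the CLIPS implementation; all names in it are invented:

```python
# A minimal sketch, in Python rather than CLIPS, of the match-activate-fire
# cycle. Facts are tuples; a rule pairs a list of patterns with an action
# that returns the facts to assert.

def run(facts, rules):
    """Fire activated rules until the agenda is empty (each rule fires once)."""
    fired = set()
    while True:
        # Match: a rule is activated when all of its patterns are in the fact list
        agenda = [(name, action) for name, patterns, action in rules
                  if name not in fired and all(p in facts for p in patterns)]
        if not agenda:             # nothing left to fire
            return facts
        name, action = agenda[0]   # fire the first activation on the agenda
        fired.add(name)
        facts |= action(facts)     # the action asserts new facts

facts = {("father", "ali", "ahmed")}
rules = [("isSon",
          [("father", "ali", "ahmed")],           # IF this fact is present
          lambda f: {("son", "ahmed", "ali")})]   # THEN assert the son fact
result = run(facts, rules)
print(result)  # contains both the father fact and the derived son fact
```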
If watching is turned on (e.g. with (watch rules) and (watch activations)), all
subsequent activations and firings will be shown, until you turn watching off
using the unwatch command.
Instead of asserting facts in a rule's actions, you can print out messages using the
printout command:
(printout t "Ali is Ahmed's son" crlf)
Commands cannot be loaded from a file; they have to be entered at the command
prompt. However, constructs like deftemplate, deffacts and defrule can be loaded
from a file that has been saved with the .clp extension. The command to load the file
is:
(load "[Link]")
You can write out constructs in a file editor, save the file, and load it. Similarly,
(save "[Link]") saves all constructs currently loaded in CLIPS to the specified file.
Pattern matching
Below are some examples to help you see the above concept in practice:
Example 1
Lecture # 24
CLIPS Expert System Development Tool
Example 2
;OR example
;note: CLIPS operators use prefix notation
(deffacts startup (weather raining))
(defrule take-umbrella
(or (weather raining)
(weather snowing))
=>
(assert (umbrella required)))
These two are very basic examples. You will find many more examples in the CLIPS
documentation that you downloaded; try them out.
Below is the code for the case study we discussed in the lectures: the automobile
diagnosis problem given in Durkin's book. This is an implementation of the solution
(the solution is presented by Durkin as rules in your book). The code relies on a
helper function, ask-question, which is not shown here; it prints a question and
returns one of the allowed responses, and needs to be defined (as a deffunction)
before these rules are run.
;startup rule
(deffacts startup (task begin))
(defrule startDiagnosis
?fact <- (task begin)
=>
(retract ?fact)
(assert (task test_cranking_system))
(printout t "Auto Diagnostic Expert System" crlf)
)
;
;Test Display Rules
;
(defrule testTheCrankingSystem
?fact <- (task test_cranking_system)
=>
(printout t "Cranking System Test" crlf)
(printout t " -------------------- " crlf)
(printout t "I want to first check out the major components of the cranking system. This includes such items as the battery, cables, ignition switch and starter. Usually, when a car does not start the problem can be found with one of these components" crlf)
(printout t "Steps: Please turn on the ignition switch to energize the starting motor" crlf)
(bind ?response
(ask-question "How does your engine turn: (slowly or not at all/normal)? "
"slowly or not at all" "normal") )
(assert(engine_turns ?response))
)
(defrule testTheBatteryConnection
?fact <- (task test_battery_connection)
=>
(printout t "Battery Connection Test" crlf)
(printout t " ----------------------- " crlf)
(printout t "I next want to see if the battery connections are good. Often, a bad connection will appear like a bad battery" crlf)
(printout t "Steps: Insert a screwdriver between the battery post and the cable clamp. Then turn the headlights on high beam and observe the lights as the screwdriver is twisted." crlf)
(bind ?response
(ask-question "What happens to the lights: (brighten/don't brighten/not on)? "
"brighten" "don't brighten" "not on") )
(assert(screwdriver_test_shows_that_lights ?response))
)
(defrule testTheBattery
?fact <- (task test_battery)
=>
(printout t "Battery Test" crlf)
(printout t " ------------" crlf)
(printout t "The state of the battery can be checked with a hydrometer. This is a good test to determine the amount of charge in the battery and is better than a simple voltage measurement" crlf)
(printout t "Steps: Please test each battery cell with the hydrometer and note each cell's specific gravity reading." crlf)
(bind ?response
(ask-question "Do all cells have a reading above 1.2: (yes/no)? "
"yes" "no" "y" "n") )
(assert(battery_hydrometer_reading_good ?response))
)
(defrule testTheStartingSystem
?fact <- (task test_starting_system)
=>
; the allowed responses match the patterns in the starter test rules below
(bind ?response
(ask-question "Observe the starter: (starter buzzes/engine turns slowly/engine turns normally/nothing)? "
"starter buzzes" "engine turns slowly" "engine turns normally" "nothing") )
(assert(starter ?response))
)
(defrule testTheStarterOnBench
?fact <- (task test_starter_on_bench)
=>
(bind ?response
(ask-question "Check your starter on bench: (meets specifications/doesn't meet specifications)? "
"meets specifications" "doesn't meet specifications") )
(assert(starter_on_bench ?response))
)
(defrule testTheIgnitionOverrideSwitch
?fact <- (task test_ignition_override_switches)
=>
(bind ?response
(ask-question "Check the ignition override switches: starter (operates/doesn't operate)? "
"operates" "doesn't operate") )
(assert(starter_override ?response))
)
(defrule testTheIgnitionSwitch
?fact <- (task test_ignition_switch)
=>
(bind ?response
(ask-question "Test your ignition switch. The voltmeter: (moves/doesn't move)? "
"moves" "doesn't move") )
(assert(voltmeter ?response))
)
(defrule testEngineMovement
?fact <- (task test_engine_movement)
=>
(bind ?response
(ask-question "Test your engine movement: (doesn't move/moves freely)? "
"doesn't move" "moves freely") )
(assert(engine_turns ?response))
)
;
;Test Cranking System Rules
;
(defrule crankingSystemIsDefective
?fact <- (task test_cranking_system)
(engine_turns "slowly or not at all")
=>
(assert(cranking_system defective))
(retract ?fact)
(printout t "It seems like the cranking system is defective! I will now identify the problem with the cranking system" crlf)
(assert (task test_battery_connection))
)
(defrule crankingSystemIsGood
?fact <- (task test_cranking_system)
(engine_turns "normal")
=>
(assert( cranking_system "good"))
(retract ?fact)
(printout t "Your Cranking System Appears to be Good" crlf)
(printout t "I will now check your ignition system" crlf)
(assert(task test_ignition_switch)) ; in the complete system, replace this with test_ignition_system
)
;
;Test Battery Connection Rules
;
(defrule batteryConnectionIsBad
?fact <- (task test_battery_connection)
(or (screwdriver_test_shows_that_lights "brighten")(screwdriver_test_shows_that_lights "not on"))
=>
(assert( problem bad_battery_connection))
(printout t "The problem is a bad battery connection" crlf)
(retract ?fact)
(assert (task done))
)
(defrule batteryConnectionIsGood
?fact <- (task test_battery_connection)
(screwdriver_test_shows_that_lights "don't brighten")
=>
(printout t "The problem does not appear to be a bad battery connection." crlf)
(retract ?fact)
(assert(task test_battery))
)
;
;Test Battery Rules
;
(defrule batteryChargeIsBad
?fact <- (task test_battery)
(battery_hydrometer_reading_good "no")
=>
(assert( problem bad_battery))
(printout t "The problem is a bad battery." crlf)
(retract ?fact)
(assert (task done))
)
(defrule batteryChargeIsGood
?fact <- (task test_battery)
(battery_hydrometer_reading_good "yes")
=>
(retract ?fact)
(printout t "The problem does not appear to be a bad battery." crlf)
; the battery is good, so move on to the starting system tests
(assert (task test_starting_system))
)
;
;Test Starter Rules
;
(defrule RunStarterBenchTest
?fact <- (task test_starting_system)
(or (starter "starter buzzes")(starter "engine turns slowly"))
=>
(retract ?fact)
(assert (task test_starter_on_bench))
)
(defrule solenoidBad
?fact <- (task test_starting_system)
(starter "nothing")
=>
(retract ?fact)
(assert (problem bad_solenoid))
(printout t "The problem appears to be a bad solenoid." crlf)
(assert(task done))
)
(defrule starterTurnsEngineNormally
?fact <- (task test_starting_system)
(starter "engine turns normally")
=>
(retract ?fact)
(printout t "The problem does not appear to be a bad solenoid." crlf)
(assert(task test_ignition_override_switches))
)
;
;Starter Bench Test Rules
;
(defrule starterBad
?fact <- (task test_starter_on_bench)
(starter_on_bench "doesn't meet specifications")
=>
(assert( problem bad_starter))
(printout t "The problem is a bad starter." crlf)
(retract ?fact)
(assert (task done))
)
(defrule starterGood
?fact <- (task test_starter_on_bench)
(starter_on_bench "meets specifications")
=>
(retract ?fact)
(printout t "The problem does not appear to be with starter." crlf)
(assert(task test_engine_movement))
)
;
;Override Switch Test Rules
;
(defrule overrideSwitchBad
?fact <- (task test_ignition_override_switches)
(starter_override "operates")
=>
(assert( problem bad_override_switch))
(printout t "The problem is a bad override switch." crlf)
(retract ?fact)
(assert (task done))
)
(defrule starterWontOperate
?fact <- (task test_ignition_override_switches)
(starter_override "doesn't operate")
=>
(retract ?fact)
(printout t "The problem does not appear to be with override switches." crlf)
(assert(task test_ignition_switch))
)
;
;Engine Movement Test
;
(defrule engineBad
?fact <- (task test_engine_movement)
(engine_turns "doesn't move")
=>
(assert( problem bad_engine))
(printout t "The problem is a bad engine." crlf)
(retract ?fact)
(assert (task done))
)
(defrule engineMovesFreely
?fact <- (task test_engine_movement)
(engine_turns "moves freely")
=>
(retract ?fact)
(printout t "The problem does not appear to be with the engine." crlf)
(printout t "Test your engine timing. That is beyond my scope for now" crlf) ; the actual test goes here in the final system
(assert(task perform_engine_timing_test))
)
;
;Ignition Switch Test
;
;these rules for the ignition system are not complete; they are added only to test the control flow.
(defrule ignitionSwitchConnectionsBad
?fact <- (task test_ignition_switch)
(voltmeter "doesn't move")
=>
(assert( problem bad_ignition_switch_connections))
(printout t "The problem is bad ignition switch connections." crlf)
(retract ?fact)
(assert (task done))
)
(defrule ignitionSwitchBad
?fact <- (task test_ignition_switch)
(voltmeter "moves")
=>
(assert( problem bad_ignition_switch))
(printout t "The problem is a bad ignition switch." crlf)
(retract ?fact)
(assert (task done))
)
Lecture # 28
Classical Sets and Fuzzy Sets
We're driving in a car, and we see an old house. We can easily classify it as an old
house. But what exactly is an old house? Is a 15-year-old house an old house? Is a
40-year-old house an old house? Where is the dividing line between the old and
the new houses? If we agree that a 40-year-old house is an old house, then how
can a house be considered new when it is only 39 years, 11 months and 30 days
old, yet one day later it has suddenly become old? That would be a bizarre world,
had it been like that in all scenarios of life.
Similarly, human beings form vague groups of things such as 'short men', 'warm days'
and 'high pressure'. These are all groups which don't appear to have a well-defined
boundary, yet humans communicate with each other using these terms.
Let's take the example of the set 'days of the week'. This is a classical set in which
all seven days from Monday up until Sunday belong to the set, and everything else
that you can think of (monkeys, computers, fish, telephones, etc.) is definitely not a
part of this set. This is a binary classification system, in which everything must be
asserted or denied. Monday will be asserted to be an element of the set 'days of the
week', but a tuna fish will not be an element of this set.
Figure: The crisp set 'days of the week' contains Monday through Sunday; monkeys, fish and computers lie outside it.
Another diagram that would help distinguish between crisp and fuzzy representation
of days of the weekend is shown below.
The left side of the above figure shows the crisp set 'days of the weekend', which
is a Boolean two-valued function: it gives a value of 0 for all weekdays, jumps
abruptly to 1 for Saturday and Sunday, and drops back to 0 as soon as Sunday
ends. The fuzzy set, on the other hand, is a multi-valued function, shown here as a
smoothly rising curve over the weekend; even Friday has a good membership in the
set 'days of the weekend'.
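The contrast between the two views can be made concrete with a short Python sketch; the fuzzy membership degrees used here are invented for illustration:

```python
# A sketch contrasting the crisp and fuzzy views of 'days of the weekend'.
# The fuzzy degrees below are invented for illustration.

def crisp_weekend(day):
    """Boolean two-valued membership: 1 on Saturday/Sunday, 0 otherwise."""
    return 1.0 if day in ("saturday", "sunday") else 0.0

def fuzzy_weekend(day):
    """Multi-valued membership that rises smoothly toward the weekend."""
    degrees = {"thursday": 0.1, "friday": 0.7, "saturday": 1.0, "sunday": 0.9}
    return degrees.get(day, 0.0)

print(crisp_weekend("friday"))  # 0.0: Friday is abruptly not a weekend day
print(fuzzy_weekend("friday"))  # 0.7: Friday has a good membership
```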
Same is the case with seasons. There are four seasons in Pakistan: Spring, Summer,
Fall and Winter. The classical/crisp set would mark a hard boundary
between two adjacent seasons, whereas we know that this is not the case in
reality. Seasons gradually change from one into the next. This is explained more
clearly in the figure below.
Dr. Lotfi Zadeh of UC Berkeley introduced fuzzy logic in the 1960s as a means to
model the uncertainty of natural language. He was faced with a lot of criticism, but
today the vast number of fuzzy logic applications speak for themselves:
Self-focusing cameras
Washing machines that adjust themselves according to the dirtiness of the
clothes
Automobile engine controls
Anti-lock braking systems
Color film developing systems
Subway control systems
Computer programs trading successfully in financial markets
Lecture # 29
Difference Between Boolean and Fuzzy Logic
How does it work? Reasoning in fuzzy logic is just a matter of generalizing the familiar
yes-no (Boolean) logic. If we give "true" the numerical value 1 and "false" the
numerical value 0, then fuzzy logic also permits in-between values like
0.2 and 0.7453.
“In fuzzy logic, the truth of any statement becomes matter of degree”
We will understand the concept of degree or partial truth by the same example of
days of the weekend. Following are some questions and their respective answers:
Figure: In Boolean logic the degree of tallness is two-valued; here the person is simply classified as 0 (Not Tall).
On the other hand, in fuzzy logic, you can define a function of any mathematical
shape. The output of the function can be discrete or continuous. The output of the
function defines the membership of the input, or the degree of truth.
As in this case, the same person A is termed 'Not Very Tall'. This isn't the absolute
'Not Tall' of the Boolean case. Similarly, person B is termed 'Quite Tall', as
opposed to the absolute 'Tall' classification by the Boolean parameters. In short, fuzzy
logic lets us define more realistically the true functions that describe real-world
scenarios.
Figure: In fuzzy logic the degree of tallness is graded; person A is 'Not Very Tall' and person B is 'Quite Tall'.
In (crisp) set terminology, Amma ji belongs to the set of old people. We define µOLD,
the membership function operating on the fuzzy set of old people. µOLD takes one
input variable, age, and returns a value between 0.0 and 1.0.
For this particular age range, the membership function is defined by a straight line
with positive slope.
In probability theory:
There is a 20% chance that Amber belongs to the set of old people, and an 80%
chance that she doesn't belong to it.
In fuzzy terminology:
Amber is 'definitely not old', or some similar linguistic term corresponding to the
membership value 0.2. There is no chance involved, and no guesswork left for the
system, in classifying Amber as young or old.
The table above lists the AND, OR and NOT operators and their respective values
for Boolean inputs. For fuzzy systems we need operators that act exactly the same
way when given the extreme values 0 and 1, but that also act on the other real
numbers in the range 0.0 to 1.0. If we choose the min (minimum) operator in place
of AND, we get the same output; similarly, the max (maximum) operator replaces OR,
and 1-A replaces NOT A.
In a lot of ways these operators make sense. When we AND two domains, A and B,
we do want their intersection as the result, and the intersection gives us the minimum
overlapping area, hence the two are equivalent. The same reasoning holds for max
and 1-A.
The figure below explains these logical operators in a non-tabular form. If we allow
the fuzzy system to take on only two values, 0 and 1, it becomes Boolean logic,
as can be seen in the top row of the figure.
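These operators can be written down and checked in a few lines of Python; at the extreme values 0 and 1 they reproduce the Boolean truth tables:

```python
# The fuzzy logical operators described above: min for AND, max for OR and
# 1 - A for NOT. At the extreme values 0 and 1 they reduce to Boolean logic.

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

# Boolean special case: the familiar truth tables
assert fuzzy_and(1, 0) == 0 and fuzzy_or(1, 0) == 1 and fuzzy_not(1) == 0

# In-between degrees of truth are combined, not rounded
print(fuzzy_and(0.3, 0.8))  # 0.3
print(fuzzy_or(0.3, 0.8))   # 0.8
print(fuzzy_not(0.3))       # 0.7
```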
It is worth mentioning here that the graphs for A and B are nothing more than
distributions. For instance, if A were the set of short men, then graph A shows the
entire distribution of short men, where the horizontal axis is increasing height and
the vertical axis shows the membership of men of different heights in the set 'short
men'. Taller men would have little or zero membership in this set, whereas they
would have a significant membership in set B, taking it to be the distribution of tall
men.
Lecture # 30
Representation of Fuzzy Sets and Rules
6.4.6 Fuzzy set representation
Usually a triangular graph is chosen to represent a fuzzy set, with the peak around
the mean. This matches most real-world scenarios: the majority of the population lies
around the average height, and there are fewer men who are exceptionally tall or
short, which explains the slopes on both sides of the triangular distribution. The
triangle is also an approximation of the Gaussian curve, which is a more general
function in some respects.
Apart from this graphical representation, there is another representation which is
handier if you want to write down individual members along with their memberships.
With this representation, the set of tall men would be written as follows:
• Tall = (0/5, 0.25/5.5, 0.8/6, 1/6.5, 1/7)
– Numerator: membership value
– Denominator: actual value of the variable
For instance, the first element is 0/5, meaning that a height of 5 feet has 0
membership in the set of tall people; likewise, men who are 6.5 or 7 feet tall have
the maximum membership value of 1.
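This pair representation can be turned into a working membership function with a short Python sketch; interpolating linearly between the listed points is an assumption made here for illustration:

```python
# The set Tall written as membership/value pairs, as listed above
# (0/5, 0.25/5.5, 0.8/6, 1/6.5, 1/7), stored here as (height, membership).
# Linear interpolation between the listed points is an assumed choice.

tall = [(5.0, 0.0), (5.5, 0.25), (6.0, 0.8), (6.5, 1.0), (7.0, 1.0)]

def membership(height, points):
    """Degree of membership, interpolating linearly between known points."""
    if height <= points[0][0]:
        return points[0][1]
    if height >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= height <= x1:
            return m0 + (m1 - m0) * (height - x0) / (x1 - x0)

print(membership(5.0, tall))   # 0.0: five feet is not tall at all
print(membership(6.5, tall))   # 1.0: fully tall
print(membership(5.75, tall))  # roughly 0.525, between the listed points
```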
Apply the fuzzy operator to multiple-part antecedents: if there are multiple parts to
the antecedent, apply the fuzzy logic operators to resolve the antecedent to a single
number between 0 and 1. This is the degree of support for the rule. In the example,
there are two parts to the antecedent with an OR operator between them, so they
are resolved using the max operator: max(0.0, 0.7) = 0.7. That becomes the output
of this step.
Apply the implication method: use the degree of support for the entire rule to shape
the output fuzzy set. The consequent of a fuzzy rule assigns an entire fuzzy set to
the output, represented by a membership function chosen to indicate the qualities of
the consequent. If the antecedent is only partially true (i.e., is assigned a value less
than 1), then the output fuzzy set is truncated according to the implication method.
In general, one rule by itself doesn't do much good. What's needed are two or more
rules that can play off one another. The output of each rule is a fuzzy set. The
output fuzzy sets for all the rules are then aggregated into a single output fuzzy set.
Finally the resulting set is defuzzified, or resolved to a single number. The next
section shows how the whole process works from beginning to end for a particular
type of fuzzy inference system.
Lecture # 31
Different Parts of Fuzzy Inference System
Fuzzy inference systems have been successfully applied in fields such as automatic
control, data classification, decision analysis, expert systems, and computer vision.
Because of their multidisciplinary nature, fuzzy inference systems go by a number
of names, such as fuzzy-rule-based systems, fuzzy expert systems, fuzzy
modeling, fuzzy associative memory, fuzzy logic controllers, and simply (and
ambiguously!) fuzzy systems. Since the terms used to describe the various parts
of the fuzzy inference process are far from standard, we will try to be as clear as
possible about the different terms introduced in this section.
Mamdani's fuzzy inference method is the most commonly seen fuzzy methodology.
Mamdani's method was among the first control systems built using fuzzy set theory.
It was proposed in 1975 by Ebrahim Mamdani as an attempt to control a steam engine
and boiler combination by synthesizing a set of linguistic control rules obtained from
experienced human operators. Mamdani's effort was based on Lotfi Zadeh's 1973
paper on fuzzy algorithms for complex systems and decision processes.
6.5.1 Five parts of the fuzzy inference process
• Fuzzification of the input variables
• Application of fuzzy operator in the antecedent (premises)
• Implication from antecedent to consequent
• Aggregation of consequents across the rules
• Defuzzification of output
Rule1:
If service is poor or food is rancid then tip is cheap
Rule2:
If service is good then tip is average
Rule3:
If service is excellent or food is delicious then tip is generous
Based on these rules and the input given by the diners, the fuzzy inference system
produces the final output using the inference steps listed above. Let's take a look at
those steps one at a time.
The example we're using in this section is built on three rules, and each of the
rules depends on resolving the inputs into a number of different fuzzy linguistic sets:
service is poor, service is good, food is rancid, food is delicious, and so on. Before
the rules can be evaluated, the inputs must be fuzzified according to each of these
linguistic sets. For example, to what extent is the food really delicious? The figure
below shows how well the food at our hypothetical restaurant (rated on a scale of 0
to 10) qualifies, via its membership function, as the linguistic variable "delicious".
In this case, the diners rated the food as an 8, which, given our graphical definition
of delicious, corresponds to µ = 0.7 for the "delicious" membership function.
Shown below is an example of the OR operator max at work. We're evaluating the
antecedent of the rule 3 for the tipping calculation. The two different pieces of the
antecedent (service is excellent and food is delicious) yielded the fuzzy membership
values 0.0 and 0.7 respectively. The fuzzy OR operator simply selects the maximum
of the two values, 0.7, and the fuzzy operation for rule 3 is complete.
Once the proper weight has been assigned to each rule, the implication method
is implemented. A consequent is a fuzzy set represented by a membership function,
which weighs appropriately the linguistic characteristics attributed to it. The
consequent is reshaped using a function associated with the antecedent. The input
to the implication process is a single number given by the antecedent, and the
output is a fuzzy set. Implication is implemented for each rule. We will use the min
(minimum) operator to perform the implication, which truncates the output fuzzy set,
as shown in the figure below.
Figure: Apply Implication Method
Notice that as long as the aggregation method is commutative (which it always should
be), the order in which the rules are executed is unimportant. Any of several logical
operators can be used to perform the aggregation function: max (maximum), probor
(probabilistic OR), and sum (simply the sum of each rule's output set).
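The three aggregation operators can be sketched in Python as follows; the membership values used are invented for illustration:

```python
from functools import reduce

# The aggregation operators named above, applied to the memberships that the
# rule-output sets assign to one particular output value. probor
# (probabilistic OR) combines a and b as a + b - a*b.

def agg_max(values):
    return max(values)

def agg_probor(values):
    return reduce(lambda a, b: a + b - a * b, values)

def agg_sum(values):
    return sum(values)  # may exceed 1, so it is usually clipped or scaled later

memberships = [0.0, 0.3, 0.7]   # one output point, three rules
print(agg_max(memberships))     # 0.7
print(agg_probor(memberships))  # roughly 0.79
print(agg_sum(memberships))     # 1.0
```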
In the diagram below, all three rules have been placed together to show how the
output of each rule is combined, or aggregated, into a single fuzzy set whose
membership function assigns a weighting for every output (tip) value.
Defuzzify
The input to the defuzzification process is a fuzzy set (the aggregate output fuzzy
set) and the output is a single number. As much as fuzziness helps the rule
evaluation during the intermediate steps, the final desired output for each variable is
generally a single number. However, the aggregate fuzzy set encompasses a range
of output values, and so must be defuzzified in order to resolve a single output value
from the set.
Perhaps the most popular defuzzification method is the centroid calculation, which
returns the center of the area under the curve. Other methods in practice are the
bisector, middle of maximum (the average of the maximum values of the output set),
largest of maximum, and smallest of maximum.
Figure: Defuzzification
Thus the FIS calculates that if the food has a rating of 8 and the service has a
rating of 3, then the tip given to the waiter should be 16.7% of the total bill.
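The whole five-step pipeline for the tipping example can be sketched in Python. The triangular membership shapes below are assumptions made for illustration, not the exact curves in the figures, so the defuzzified tip comes out near, though not exactly at, the 16.7% figure:

```python
# A compact Mamdani-style sketch of the tipping system, covering all five
# steps. The membership shapes are assumed for illustration.

def tri(x, a, b, c):
    """Triangular membership: rises a->b, falls b->c (a==b or b==c gives a shoulder)."""
    left = (x - a) / (b - a) if b > a else 1.0
    right = (c - x) / (c - b) if c > b else 1.0
    return max(0.0, min(left, right))

def tip_fis(service, food):
    # 1. Fuzzification of the inputs (both rated 0-10)
    poor, good, excellent = tri(service, 0, 0, 5), tri(service, 2, 5, 8), tri(service, 5, 10, 10)
    rancid, delicious = tri(food, 0, 0, 5), tri(food, 5, 10, 10)

    # 2. Fuzzy operator in the antecedents (OR -> max)
    r1 = max(poor, rancid)          # -> tip is cheap
    r2 = good                       # -> tip is average
    r3 = max(excellent, delicious)  # -> tip is generous

    # 3 + 4. Implication (min truncates each consequent) and aggregation (max),
    # evaluated pointwise over a discretized tip axis (0-25%)
    def aggregate(x):
        return max(min(r1, tri(x, 0, 5, 10)),         # cheap
                   min(r2, tri(x, 7.5, 12.5, 17.5)),  # average
                   min(r3, tri(x, 15, 20, 25)))       # generous

    xs = [i * 0.1 for i in range(251)]

    # 5. Defuzzification by the centroid method
    den = sum(aggregate(x) for x in xs)
    num = sum(x * aggregate(x) for x in xs)
    return num / den

print(round(tip_fis(3, 8), 1))  # a tip percentage in the low-to-mid teens
```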
6.6 Summary
Fuzzy systems map everyday concepts, like age, height and temperature, more
realistically; the variables are given fuzzy values. Classical sets either wholly include
something in a set or wholly exclude it: in a classical set, a man can be either young
or old, with a crisp, rigid boundary between the two age sets. In fuzzy sets, a man
can have partial membership in both sets.
6.7 Exercise
1) Think of the membership functions for the following concepts, from the famous
quote: “Early to bed, and early to rise, makes a man healthy, wealthy and
wise.”
a. Health
b. Wealth
c. Wisdom
2) What do you think would be the implication of using a differently shaped curve
for a membership function, for example triangular, Gaussian or square?
3) Try to come up with at least 5 more rules for the tipping system (the dinner-for-two
case study), such that the system would be more realistic and complete.
Lecture # 33
Different Types of Learning
7 Introduction to learning
7.1 Motivation
Artificial Intelligence (AI) is concerned with programming computers to perform tasks
that are presently done better by humans. AI is about human behavior, and about the
discovery of techniques that will allow computers to learn from humans. One of the
most often heard criticisms of AI is that machines cannot be called intelligent until
they are able to learn to do new things and adapt to new situations, rather than simply
doing as they are told to do. There can be little question that the ability to adapt
to new surroundings and to solve new problems is an important characteristic of
intelligent entities. Can we expect such abilities in programs? Ada Augusta, one
of the earliest philosophers of computing, wrote: "The Analytical Engine has no
pretensions whatever to originate anything. It can do whatever we know how to order
it to perform." This remark has been interpreted by several AI critics as saying that
computers cannot learn. In fact, it does not say that at all. Nothing prevents us from
telling a computer how to interpret its inputs in such a way that its performance
gradually improves. Rather than asking in advance whether it is possible for
computers to "learn", it is much more enlightening to try to describe exactly what
activities we mean when we say "learning" and what mechanisms could be used to
enable us to perform those activities. [Simon, 1993] described learning as "changes
in the system that are adaptive in the sense that they enable the system to do the
same task or tasks drawn from the same population more efficiently and more
effectively the next time".
Once the internal model of what ought to happen is set, it is possible to learn by
practicing the skill until the performance converges on the desired model. One begins
by paying attention to what needs to be done, but with more practice, onewill need
to monitor only the trickier parts of the performance.
Automatic performance of some skills by the brain points out that the brain is capable
of doing things in parallel i.e. one part is devoted to the skill whilst another part
mediates conscious experience.
There is no single decisive definition of learning, but here are some that do it justice:
"Learning denotes changes in a system that ... enables a system to do the
same task more efficiently the next time." --Herbert Simon
"Learning is constructing or modifying representations of what is being
experienced." --Ryszard Michalski
"Learning is making useful changes in our minds." --Marvin Minsky
Lecture # 34
Phases of Machine learning and Learning Techniques
7.5 What are the three phases in machine learning?
Machine learning typically follows three phases, according to Finlay [Janet Finlay,
1996]. They are as follows:
1. Training: a training set of examples of correct behavior is analyzed and some
representation of the newly learnt knowledge is stored. This is often some form of
rules.
2. Validation: the rules are checked and, if necessary, additional training is
given. Sometimes additional test data are used; alternatively, instead of using a
human to validate the rules, some other automatic knowledge-based component may
be used. The role of the tester is often called the critic.
3. Application: the rules are used in responding to some new situations.
These phases may not be distinct. For example, there may not be an explicit
validation phase; instead, the learning algorithm guarantees some form of
correctness. Also in some circumstances, systems learn "on the job", that is, the
training and application phases overlap.
7.5.1 Inputs to training
There is a continuum between knowledge-rich methods that use extensive domain
knowledge and those that use only simple domain-independent knowledge. The
domain-independent knowledge is often implicit in the algorithms; e.g. inductive
learning is based on the knowledge that if something happens a lot it is likely to
be generally true. Where examples are provided, it is important to know the source.
The examples may be simply measurements fromthe world, for example, transcripts
of grand master tournaments. If so, do they represent "typical" sets of behavior or
have they been filtered to be "representative"? If the former is true then it is possible
to infer information about the relative probability from the frequency in the training
set. However, unfiltered data may also be noisy, have errors, etc., and examples
from the world may not be complete, since infrequent situations may simply not be
in the training set.
Alternatively, the examples may have been generated by a teacher. In this case,
it can be assumed that they are a helpful set which cover all the important cases.
Also, it is advisable to assume that the teacher will not be ambiguous.
Some form of representation of the examples also has to be decided. This may partly
be determined by the context, but more often than not there will be a choice. Often
the choice of representation embodies quite a lot of the domain knowledge.
7.5.2 Outputs of training
Outputs of learning are determined by the application. The question that arises is
'What is it that we want to do with our knowledge?'. Many machine learning systems
are classifiers. The examples they are given are from two or more classes, and the
purpose of learning is to determine the common features in each class. When a new
unseen example is presented, the system uses the common features to determine
which class the new example belongs to. For example:
If example satisfies condition
Then assign it to class X
This sort of classification job is often termed concept learning. The simplest case
is when there are only two classes, of which one is seen as the desired "concept" to
be learnt and the other is everything else. The "then" part of the rules is always the
same and so the learnt rule is just a predicate describing the concept.
Not all learning is simple classification. In applications such as robotics one wants to
learn appropriate actions. In such a case, the knowledge may be in terms of
production rules or some similar representation.
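A learnt two-class concept can be represented as a single predicate, as sketched below. The attribute names are illustrative, not from the course:

```python
# A learnt concept as a predicate: the "then" part is always the same
# (assign to the class), so only the condition needs to be stored.

def learnt_concept(example):
    """Returns True if the example satisfies the learnt condition."""
    return example["grade"] == "High"

# Classifying a new, unseen example with the learnt rule:
new_example = {"grade": "High", "class_participation": "Low"}
is_in_class = learnt_concept(new_example)   # condition satisfied -> in the class
```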
memorizes the logical consequences of what is known already. This implies that
virtually all mathematical research would not be classified as learning "new" things.
However, regardless of whether this is termed as new knowledge or not, it certainly
makes the reasoning system more efficient.
7.6.3 Inductive learning
Inductive learning takes examples and generalizes rather than starting with existing
knowledge. For example, having seen many cats, all of which have tails, one might
conclude that all cats have tails. This is an unsound step of reasoning, but it would
be impossible to function without using induction to some extent. In many areas it is
an explicit assumption. There is scope for error in inductive reasoning, but it is still a
useful technique that has been used as the basis of several successful systems.
As another example, there are various ways of generalizing from examples of
fish and non-fish. The simplest description may be that a fish is something that does
not have lungs; no other single attribute would serve to differentiate the fish.
The two very common inductive learning algorithms are version spaces and ID3.
These will be discussed in detail, later.
Lecture # 33
Different Types of Learning
7.8 Applied learning
7.8.1 Solving real world problems by learning
We do not yet know how to make computers learn nearly as well as people learn.
However, algorithms have been developed that are effective for certain types of
learning tasks, and many significant commercial applications have begun to appear.
For problems such as speech recognition, algorithms based on machine learning
outperform all other approaches that have been attempted to date. In other
emergent fields like computer vision and data mining, machine learning algorithms
are being used to recognize faces and to extract valuable information and knowledge
from large commercial databases, respectively. The applications that use learning
algorithms include speech recognition, computer vision and data mining, among
many others; this is just a glimpse of the applications that use some intelligent
learning component. The current era has applied learning in domains ranging from
agriculture to astronomy to the medical sciences.
7.8.2 A general model of learning agents, pattern recognition
Any given learning problem is primarily composed of three things:
Input
Processing unit
Output
The input is composed of examples that can help the learner learn the underlying
problem concept. Suppose we were to build a learner for recognizing spoken digits.
We would ask some of our friends to record their voices for each digit (0 to 9).
Positive examples of the digit '1' would be the recordings of '1' by the speakers.
Negative examples for digit '1' would be all the rest of the digits. For our learner to
learn the digit '1', it would need positive and negative examples of digit '1' in order to
truly learn the difference between digit '1' and the rest.
The processing unit is the learning agent in our focus of study. Any learning agent
or algorithm should in turn have at least the following three characteristics:
but sometimes real-world problems have inputs that cannot be fed to a
learning system directly. For instance, if the learner is to tell the difference
between a good and a not-good student, how do you suppose it would take
the input? And for that matter, what would be an appropriate input to the
system? It would be very interesting if the input were an entire student named
Ali or Umar etc., so that the student goes into the machine and it tells whether
the student it consumed was a good student or not. But that seems like a
far-fetched idea right now. In reality, we usually associate some attributes or
features with every input; for instance, two features that can define a student
are grade and class participation. These become the feature set of
the learning system. Based on these features, the learner processes each
input.
Generalization
In the training phase, the learner is presented with some positive and negative
examples from which it learns. In the testing phase, when the learner comes
across new but similar inputs, it should be able to classify them similarly. This
is called generalization. Humans are exceptionally good at generalization.
A small child learns to differentiate between birds and cats in the early days
of his/her life. Later when he/she sees a new bird, never seen before, he/she
can easily tell that it’s a bird and not a cat.
Lecture # 34
Techniques of Learning and Problem Spaces
7.9 LEARNING: Symbol-based
Ours is a world of symbols. We use symbolic interpretations to understand the world
around us. For instance, if we saw a ship and were to tell a friend about its size, we
would not say that we saw a 254.756-meter-long ship; instead we'd say that we saw
a 'huge' ship, about the size of the 'Eiffel Tower'. Our friend would understand the
relationship between the size of the ship and its hugeness through the analogies of
the symbolic information associated with the two terms used: 'huge' and 'Eiffel
Tower'.
Similarly, the techniques we are to learn now use symbols to represent knowledge
and information. Let us consider a small example to help us see where we’re
headed. Suppose we were to learn the concept of a GOOD STUDENT. We would
need to define, first of all, some attributes of a student, on the basis of which we
could tell apart the good student from the average one. Then we would require some
examples of good students and average students. To keep the problem simple we
can label all the students who are "not good" (average, below average, satisfactory,
bad) as NOT GOOD STUDENT. Let's say we choose two attributes to define a
student: grade and class participation. Both attributes can take either of two
values: High or Low. Our learner program will require some examples from the
concept of a student, for instance:
1. Student (GOOD STUDENT): Grade (High) ^ Class Participation (High)
2. Student (GOOD STUDENT): Grade (High) ^ Class Participation (Low)
3. Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation
(High)
4. Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation (Low)
As you can see the system is composed of symbolic information, based on which
the learner can even generalize that a student is a GOOD STUDENT if his/her
grade is high, even if the class participation is low:
Student (GOOD STUDENT): Grade (High) ^ Class Participation (?)
This is the final rule that the learner has learnt from the enumerated examples. Here
the ‘?’ means that the attribute class participation can have any value, as long as
the grade is high. In this section we will see all the steps the learner has to go
through to actually come up with a final conclusion like this.
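The generalization step just described can be sketched as code: take the positive (GOOD STUDENT) examples and keep an attribute value only where all positives agree, replacing any disagreement with '?'. This is a simplified sketch of the idea, not the full algorithm developed later:

```python
# Generalize over the positive examples: keep an attribute value where all
# positive examples agree, otherwise use '?' (meaning "any value").

def generalize(positives):
    result = list(positives[0])
    for example in positives[1:]:
        for i, value in enumerate(example):
            if result[i] != value:
                result[i] = "?"      # attributes that differ are generalized
    return tuple(result)

# The two GOOD STUDENT examples, as (Grade, Class Participation):
good_students = [("High", "High"), ("High", "Low")]
rule = generalize(good_students)     # -> ("High", "?")
```

The resulting rule ("High", "?") matches the conclusion above: grade must be High, class participation can be anything.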
Those problems that can be solved in polynomial time are termed tractable; the
others are called intractable. The tractable problems are further divided into
structured and complex problems. Structured problems are those which have defined
steps through which the solution to the problem is reached. Complex problems
usually don’t have well-defined steps. Machine learning algorithms are particularly
more useful in solving the complex problems like recognition of patterns in images
or speech, for which it’s hard to come up with procedural algorithms otherwise.
The solution to any problem is a function that converts its inputs to corresponding
outputs. The domain of a problem or the problem space is defined by the elements
explained in the following paragraphs. These new concepts will be best understood
if we take one example and exhaustively use it to justify each construct.
Example:
Let us consider the domain of HEALTH. The problem in this case is to distinguish
between a sick and a healthy person. Suppose we have some domain
knowledge; keeping a simplistic approach, we say that two attributes are
necessary and sufficient to declare a person as healthy or sick. These two
attributes are: Temperature (T) and Blood Pressure (BP). Any patient coming into
the hospital can have three values for T and BP: High (H), Normal (N) and Low
(L). Based on these values, the person is to be classified as Sick (SK). SK is a
Boolean concept, SK = 1 means the person is sick, and SK = 0 means person is
healthy. So the concept to be learnt by the system is of Sick, i.e., SK=1.
7.10.1 Instance space
How many distinct instances can the concept sick have? Since there are two
attributes: T and BP, each having 3 values, there can be a total of 9 possible distinct
instances in all. If we were to enumerate these, we’ll get the following table:
X T BP SK
x1 L L -
x2 L N -
x3 L H -
x4 N L -
x5 N N -
x6 N H -
x7 H L -
x8 H N -
x9 H H -
This is the entire instance space, denoted by X, and the individual instances are
denoted by xi. |X| gives us the size of the instance space, which in this case is 9.
|X| = 9
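The instance space can be enumerated mechanically; a short sketch:

```python
from itertools import product

# Enumerate the instance space X for the SICK problem: two attributes,
# T and BP, each taking one of three values (L, N, H).
values = ["L", "N", "H"]
X = list(product(values, values))   # all (T, BP) combinations

size_of_X = len(X)                  # |X| = 3 * 3 = 9
```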
The set X is the entire data possibly available for any concept. However,
sometimes in real-world problems, we don't have the liberty of access to the
entire instance space; only a subset of it is available. One of the possible
concepts for SICK over X is the following:
X T BP SK
x1 L L 0
x2 L N 0
x3 L H 1
x4 N L 0
x5 N N 0
x6 N H 1
x7 H L 1
x8 H N 1
x9 H H 1
But there are a lot of other possibilities besides this one. The question is: how many
total concepts can be generated out of this given situation? The answer is: 2^|X|. To
see this intuitively, we'll make small tables for each concept and check graphically
whether they come up to the number 2^9, since |X| = 9.
The representation used here is that every box in the following diagram is populated
using C(xi), i.e. the value that the concept C gives as output when xi is given to it as
input.
C(x3) C(x6) C(x9)
C(x2) C(x5) C(x8)
C(x1) C(x4) C(x7)
Since we don't know the concept yet, there can be 2^9 = 512 different concepts
C1, C2, C3, ..., C512, each filling the nine boxes with a different combination of
0s and 1s: C1 assigns 0 to every instance, while C512 assigns 1 to every instance.
Each of these is a different concept, only one of which is the true concept (that
we are trying to learn), but the dilemma is that we don't know which one of the 2^9 = 512
is the true concept of SICK that we're looking for, since in real-world problems we
don’t have all the instances in the instance space X, available to us for learning. If we
had all the possible instances available, we would know the exact concept, but the
problem is that we might just have three or four examples of instances available to us
out of nine.
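The counting argument can be checked in a few lines. The second computation below is our own extension of it, showing how many concepts remain consistent when only three of the nine instances are labeled:

```python
# Each of the 9 instances can be labeled 0 or 1 independently,
# so there are 2**9 possible concepts over the instance space.
num_instances = 9
num_concepts = 2 ** num_instances                          # 512

# If a training set fixes the labels of only 3 instances, the labels of
# the other 6 remain free, so 2**6 concepts are still consistent with it.
num_labeled = 3
consistent_concepts = 2 ** (num_instances - num_labeled)   # 64
```

This is exactly the dilemma described above: the data alone cannot single out the true concept.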
For instance, suppose the following three observations form our training set D:
D T BP SK
x1 N L 1
x2 L N 0
x3 N N 0
Notice that this is not the instance space X, in fact it is D: the training set. We
don’t have any idea about the instances that lie outside this set D. The learner is
to learn the true concept C based on only these three observations, so that once
it has learnt, it could classify the new patients as sick or healthy based on the
input parameters.
Lecture # 35
Techniques of Learning and Problem Spaces
So the learner has to apply some hypothesis language, which has either a search or
a language bias, to reduce the size of the concept space. This reduced concept space
becomes the hypothesis space. For example, the most common language bias is
that the hypothesis space uses conjunctions (AND) of the attributes, i.e.
H = <T, BP>
H is the denotive representation of the hypothesis space; here it is the conjunction
of the attributes T and BP. If written in English it would mean:
H = <T, BP>: [The person is sick if Temperature = ___ AND Blood Pressure = ___]
Now if we fill in these two blanks with some particular values of T and BP, it would
form a hypothesis, e.g. for T = N and BP = N:
Notice that the grid for the hypothesis h = <L, L> is the C2 we presented before in the concept space section:
0 0 0
0 0 0
1 0 0
This means that if the true concept of SICK that we wanted to learn was C2, then
the hypothesis h = <L, L> would have been the solution to our problem. But you must
still be wondering what the use is of having separate conventions for hypotheses
and concepts, when in the end we arrived at the same thing: C2 =
<L, L> = h. Well, the advantage is that now we are not required to look at 2^9 = 512 different
concepts; instead we are only going to have to look at a maximum of 17 different
hypotheses before reaching the concept. We'll see in a moment how that is
possible.
We said H = <T, BP>. Now T and BP here can certainly take the three values L, N
and H, but they can also take two more values: ? and Ø, where ? means that H = 1 for any
value, and Ø means that there is no value for which H will be 1.
For example, h1 = <?, ?>: [For any value of T or BP, the person is sick]
Similarly h2 = <?, N>: [For any value of T AND for BP = N, the person is sick]
Having said all this, how does this reduce the hypothesis space to 17? Well, it's
simple: now each attribute T and BP can take 5 values each: L, N, H, ? and Ø.
So there are 5 x 5 = 25 total hypotheses possible. This is a tremendous reduction
from 2^9 = 512 to 25.
But if we want to represent h4 = < Ø, L >, it would be the same as h3, meaning
that there are some redundancies within the 25 hypotheses. These redundancies are
caused by Ø: if there is a 'Ø' in T or BP or both, we'll have the same
hypothesis h3 as the outcome, all zeros. To calculate the number of semantically
distinct hypotheses, we need one hypothesis which outputs all zeros, since it is
distinct from the others, so that's one; plus we need the rest of the
combinations. This means that T and BP can now take 4 values instead of
5, which are: L, N, H and ?. This implies that there are now 4 x 4 = 16 different
hypotheses possible. So the total number of distinct hypotheses is 16 + 1 = 17. This is a
wonderful idea, but it comes at a vital cost: what if the true concept doesn't lie in the
conjunctive hypothesis space? This is often the case. We can try different
hypothesis languages then. Some prior knowledge about the problem always helps.
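The count of 17 semantically distinct hypotheses can be verified by brute force: enumerate all 5 x 5 syntactic hypotheses and group them by the outputs they produce over the nine instances. A short sketch (the helper name h_output is ours, and '0' is used to stand for Ø):

```python
from itertools import product

values = ["L", "N", "H"]
X = list(product(values, values))            # the 9 instances (T, BP)

def h_output(h, x):
    """Output of the conjunctive hypothesis h = (t, bp) on instance x."""
    t, bp = h
    if t == "0" or bp == "0":                # '0' stands for Ø: never satisfied
        return 0
    return int((t == "?" or t == x[0]) and (bp == "?" or bp == x[1]))

syntactic = list(product(values + ["?", "0"], repeat=2))   # 5 * 5 = 25
# Two hypotheses are semantically equal if they label every instance alike.
behaviours = {tuple(h_output(h, x) for x in X) for h in syntactic}

num_syntactic = len(syntactic)    # 25
num_semantic = len(behaviours)    # 17: the 16 Ø-free hypotheses plus all-zeros
```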
Lecture # 36
Techniques of Learning and Problem Spaces
D T BP SK
x1 H H 1
x2 L L 0
x3 N N 0
Although it classifies some of the unseen instances that are not in the training set
D differently from h1, it is still consistent over all the instances in D.
“We have to assume that the concept lies in the hypothesis space. So we search
for a hypothesis belonging to this hypothesis space that best fits the training
examples, such that the output given by the hypothesis is same as the true output
of concept.”
In short:
Assume C ∈ H; search for an h ∈ H that best fits D,
such that ∀ xi ∈ D, h(xi) = C(xi).
The stress here is on the word ‘search’. We need to somehow search through the
hypothesis space.
Lecture # 36
Algorithms for Concept Learning
7.11.2 FIND-S
FIND-S finds the maximally specific hypothesis possible within the version space
given a set of training data. How can we use the general to specific ordering of
hypothesis space to organize the search for a hypothesis consistent with the
observed training examples? One way is to begin with the most specific possible
hypothesis in H, then generalize this hypothesis each time it fails to cover an observed
positive training example. (We say that a hypothesis "covers" a positive example if it
correctly classifies the example as positive.) To be more precise about how the
partial ordering is used, consider the FIND-S algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training example x: for each attribute constraint a in h, if x
satisfies a, do nothing; otherwise replace a with the next more general constraint
that is satisfied by x.
3. Output the hypothesis h.
To illustrate this algorithm, let us assume that the learner is given the following
sequence of training examples from the SICK domain:
D T BP SK
x1 H H 1
x2 L L 0
x3 N H 1
FIND-S begins with the most specific hypothesis, h = < Ø, Ø >. The first example x1
is positive and is not covered by h, so h is generalized to h = < H, H >. Upon
encountering the second example, in this case a negative example, the
algorithm makes no change to h. In fact, the FIND-S algorithm simply ignores every
negative example. While this may at first seem strange, notice that in the current case
our hypothesis h is already consistent with the new negative example (i.e. h
correctly classifies this example as negative), and hence no revision is needed. In
the general case, as long as we assume that the hypothesis space H contains a
hypothesis that describes the true target concept c and that the training data
contains no errors or conflicts, then the current hypothesis h can never require a
revision in response to a negative example.
To complete our trace of FIND-S, the third (positive) example leads to a further
generalization of h, this time substituting a “?” in place of any attribute value in h that
is not satisfied by the new example. The final hypothesis is:
h = < ?, H >
This hypothesis will classify all future patients who have BP = H as SICK, for all the
different values of T.
There might be other hypotheses in the version space, but this one was the maximally
specific one with respect to the given three training examples. For generalization
purposes we might be interested in the other hypotheses, but FIND-S fails to find
them. Also, in real-world problems, the training data isn't always consistent
and free of errors. This is another drawback of FIND-S: it assumes
consistency within the training set.
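The FIND-S procedure described above can be sketched in a few lines; running it on the three SICK training examples reproduces the trace and its final hypothesis h = < ?, H >. A sketch (the function names are ours, and "0" stands for Ø):

```python
# FIND-S: start with the most specific hypothesis and minimally generalize
# on every positive example; negative examples are ignored.

EMPTY = "0"   # stands for Ø, the attribute constraint satisfied by nothing

def find_s(training_data, num_attributes):
    h = [EMPTY] * num_attributes                  # most specific hypothesis
    for x, label in training_data:
        if label != 1:
            continue                              # ignore negative examples
        for i in range(num_attributes):
            if h[i] == EMPTY:
                h[i] = x[i]                       # first cover: copy the value
            elif h[i] != x[i]:
                h[i] = "?"                        # conflict: generalize to '?'
    return tuple(h)

# The SICK training set, as ((T, BP), SK):
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "H"), 1)]
h = find_s(D, 2)   # -> ("?", "H")
```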
7.11.3 Candidate-Elimination algorithm
Although FIND-S outputs a hypothesis from H that is consistent with the training
examples, this is just one of many hypotheses from H that might fit the training
data equally well. The key idea in the Candidate-Elimination algorithm is to output a
description of the set of all hypotheses consistent with the training examples. This
subset of all hypotheses is actually the version space with respect to the hypothesis
space H and the training examples D, because it contains all possible versions of
the target concept.
The Candidate-Elimination algorithm represents the version space by storing only
its most general members (denoted by G) and its most specific members (denoted
by S). Given only these two sets S and G, it is possible to enumerate all members of
the version space as needed by generating the hypotheses that lie between these
two sets in general-to-specific partial ordering over hypotheses.
For each training example d:
  If d is a positive example:
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is inconsistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s, such that
        h is consistent with d, and some member of G is more general than h
      Remove from S any hypothesis that is more general than another one in S
  If d is a negative example:
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is inconsistent with d:
      Remove g from G
      Add to G all minimal specializations h of g, such that
        h is consistent with d, and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another one in G
Lecture # 37
Algorithms for Concept learning
To trace the Candidate-Elimination algorithm, consider again the training set D:
D T BP SK
x1 H H 1
x2 L L 0
x3 N H 1
G0 = {< ?, ? >}
S0 = {< Ø, Ø >}
G1 = G0 = {< ?, ? >}, since <?, ?> is consistent with d1; both give positive outputs.
Since S0 has only one hypothesis, < Ø, Ø >, which gives 0 on x1 and so is not
consistent with d1, we have to remove < Ø, Ø > from S1. Also, we add
minimally general hypotheses from H to S1, such that those hypotheses are
consistent with d1. The obvious choices are <H,H>, <H,N>, <H,L>,
<N,H>……… <L,N>, <L,L>, but none of these except <H,H> is consistent with d1.
So S1 becomes:
S1 = {< H, H >}
G1 = {< ?, ? >}
S2 = S1 = {< H, H>}, since <H, H> is consistent with d2: both give negative outputs
for x2.
G1 has only one hypothesis: < ?, ? >, which gives a positive output on x2, and hence
is not consistent, since SK(x2) = 0, so we have to remove it and add in its place, the
hypotheses which are minimally specialized. While adding we have to take care of
two things; we would like to revise the statement of the algorithm for the negative
examples:
“Add to G all minimal specializations h of g, such that
h is consistent with d, and some member of S is more specific than h”
The minimal specializations of < ?, ? > are:
{< H, ? >, < N, ? >, < L, ? >, < ?, H >, < ?, N >, < ?, L >}
Out of these we have to get rid of the hypotheses which are not consistent with d2
= (<L, L>, 0). We see that all of the above listed hypotheses will give a 0 (negative)
output on x2 = < L, L >, except for < L, ? > and < ?, L >, which give a 1 (positive) output
on x2, and hence are not consistent with d2, and will not be added to G2. This
leaves us with {< H, ? >, < N, ? >, < ?, H >, < ?, N >}. This takes care of the
inconsistent hypotheses, but there's another condition in the algorithm that we must
take care of before adding all these hypotheses to G2. We will repeat the statement
again, this time highlighting the point under consideration:
"Add to G all minimal specializations h of g, such that
h is consistent with d, and SOME MEMBER OF S IS MORE SPECIFIC THAN h"
This is a very important condition, which is often ignored, and which results in a wrong
final version space. The current S we have is S2 = {< H, H >}.
Now, out of {< H, ? >, < N, ? >, < ?, H >, < ?, N >}, which hypotheses is < H, H >
more specific than? Certainly < H, H > is more specific than
< H, ? > and < ?, H >, so we remove < N, ? > and < ?, N > to get the final G2:
G2 = {< H, ? >, < ?, H >}
Third and final training example is: d3 = (<N, H>, 1) [A positive example]
We see that in G2, < H, ? > is not consistent with d3, so we remove it:
G3 = {< ?, H >}
We also see that in S2, < H, H > is not consistent with d3, so we remove it and add
minimal generalizations of < H, H >. The two choices we have are < H, ? >
and < ?, H >. We only keep < ?, H >, since the other one is not consistent with
d3. So our final version space is encompassed by S3 and G3:
G3 = {< ?, H >}
S3 = {< ?, H >}
It is only a coincidence that both the G and S sets are the same. In bigger problems,
or even here if we had more examples, chances are that we'd get different but
consistent sets. These two sets, G and S, outline the version space of a concept.
Note that the final hypothesis is the same one that was computed by FIND-S.
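Because the hypothesis space here has only 25 syntactic members, the version space can also be computed by brute force, which confirms the S3 and G3 obtained above. A sketch (the helper name h_output is ours, and "0" stands for Ø):

```python
from itertools import product

def h_output(h, x):
    """Output of the conjunctive hypothesis h = (t, bp); '0' stands for Ø."""
    t, bp = h
    if t == "0" or bp == "0":
        return 0
    return int((t == "?" or t == x[0]) and (bp == "?" or bp == x[1]))

# All 25 syntactic hypotheses over the values {L, N, H, ?, Ø}.
hypotheses = list(product(["L", "N", "H", "?", "0"], repeat=2))

# The three training examples from the trace above, as ((T, BP), SK).
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "H"), 1)]

# The version space: every hypothesis consistent with all of D.
version_space = {h for h in hypotheses
                 if all(h_output(h, x) == label for x, label in D)}
```

For this particular training set the version space collapses to the single hypothesis < ?, H >, matching S3 = G3 above.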
where A, B, C and D are the attributes of the problem. This tree gives a positive
output if either attributes A AND B are present in the instance, OR attributes C AND D
are present. This is how we reach the final hypothesis through decision trees. This
is a hypothetical tree; in real problems, every tree has to have a single root node. There
are various algorithms, like ID3 and C4.5, to find decision trees for learning
problems.
7.12.2 ID3
ID stands for interactive dichotomizer. This was the 3rd revision of the algorithm which
got wide acclaims. The first step of ID3 is to find the root node. It uses a special
function GAIN, to evaluate the gain information of each attribute. For example if there
are 3 instances, it will calculate the gain information for each. Whichever attribute
has the maximum gain information, becomes the root node. The rest of the
attributes then fight for the next slots.
Lecture # 38
Techniques of Learning
Entropy
In order to define information gain precisely, we begin by defining a measure
commonly used in statistics and information theory, called entropy, which
characterizes the purity/impurity of an arbitrary collection of examples. Given a
collection S, containing positive and negative examples of some target concept, the
entropy of S relative to this Boolean classification is:
Entropy(S) = - p+log2 p+ - p-log2 p-
where p+ is the proportion of positive examples in S and p- is the proportion of
negative examples in S. In all calculations involving entropy we define 0 log2 0 to
be 0.
Notice that the entropy is 0, if all the members of S belong to the same class
(purity). For example, if all the members are positive (p+ = 1), then p- = 0 and so:
Entropy(S) = -1·log2 1 - 0·log2 0
= -1(0) - 0 [since log2 1 = 0, and 0·log2 0 = 0]
= 0
Note that the entropy is 1 when the collection contains an equal number of positive and
negative examples (impurity). See for yourself by putting p+ and p- equal to 1/2.
If the collection contains unequal numbers of positive and negative
examples, the entropy is between 0 and 1.
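The entropy formula can be written directly as code; the boundary cases just discussed fall out immediately. A sketch (the function name is ours):

```python
import math

def entropy(p_pos):
    """Entropy of a Boolean collection, given the proportion of positives."""
    p_neg = 1.0 - p_pos
    result = 0.0
    for p in (p_pos, p_neg):
        if p > 0:                      # by convention, 0 log2 0 = 0
            result -= p * math.log2(p)
    return result

pure = entropy(1.0)          # all members positive -> 0 (purity)
impure = entropy(0.5)        # half positive, half negative -> 1 (impurity)
mixed = entropy(2 / 5)       # 2 positives out of 5 -> about 0.97
```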
As an example, consider the following training set S, with attributes A, B and E and
Boolean classification C:
S A B E C
d1 a1 b1 e2 YES
d2 a2 b2 e1 YES
d3 a3 b2 e1 NO
d4 a2 b2 e2 NO
d5 a3 b1 e2 NO
The entropy of S is E(S) = -(2/5)·log2(2/5) - (3/5)·log2(3/5) = 0.97. The information
gain of attribute A is:
G(S, A) = E(S) - (|Sa1|/|S|)·E(Sa1) - (|Sa2|/|S|)·E(Sa2) - (|Sa3|/|S|)·E(Sa3)
where |Sa1| is the number of times attribute A takes the value a1, and E(Sa1) is the
entropy of a1, which is calculated by observing the proportion of the total
population with A = a1 and the number of times C is YES or NO within these
observations containing a1 for the value of A.
G(S, A) = 0.97 - (1/5)·(0) - (2/5)·(1) - (2/5)·(0) = 0.57
5 5 5
Similarly for B; since there are only two observable values for attribute B:
G(S, B) = E(S) - (|Sb1|/|S|)·E(Sb1) - (|Sb2|/|S|)·E(Sb2)
G(S, B) = 0.97 - (2/5)·(1) - (3/5)·(-(1/3)·log2(1/3) - (2/3)·log2(2/3))
G(S, B) = 0.97 - 0.4 - (3/5)·(0.52 + 0.39) = 0.02
Similarly for E:
G(S, E) = E(S) - (|Se1|/|S|)·E(Se1) - (|Se2|/|S|)·E(Se2) = 0.02
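The three gains can be reproduced mechanically. The sketch below (the helper names entropy and gain are ours) recomputes G(S, A), G(S, B) and G(S, E) from the five training samples, with d4's E value taken as e2, matching the sub-sample table S' used later:

```python
import math

def entropy(labels):
    """Entropy of a list of YES/NO classifications."""
    n = len(labels)
    result = 0.0
    for c in ("YES", "NO"):
        p = labels.count(c) / n
        if p > 0:                              # convention: 0 log2 0 = 0
            result -= p * math.log2(p)
    return result

def gain(samples, attribute):
    """G(S, attribute) = E(S) - sum over values v of (|Sv|/|S|) * E(Sv)."""
    g = entropy([c for _, c in samples])
    for v in {row[attribute] for row, _ in samples}:
        subset = [c for row, c in samples if row[attribute] == v]
        g -= len(subset) / len(samples) * entropy(subset)
    return g

# The training set S: rows are (A, B, E) with classification C.
S = [(("a1", "b1", "e2"), "YES"),
     (("a2", "b2", "e1"), "YES"),
     (("a3", "b2", "e1"), "NO"),
     (("a2", "b2", "e2"), "NO"),
     (("a3", "b1", "e2"), "NO")]

gain_A = gain(S, 0)   # about 0.57 -> A becomes the root
gain_B = gain(S, 1)   # about 0.02
gain_E = gain(S, 2)   # about 0.02
```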
This tells us that information gain for A is the highest. So we will simply choose A
as the root of our decision tree. By doing that we’ll check if there are any conflicting
leaf nodes in the tree. We’ll get a better picture in the pictorial representation shown
below:
[Decision tree after one iteration: root A with branches a1 → YES, a2 → (unresolved), a3 → NO]
This is a tree of height one, and we have built this tree after only one iteration. This
tree correctly classifies 3 out of 5 training samples, based on only one attribute A,
which gave the maximum information gain. It will classify every forthcoming sample
that has a value of a1 in attribute A as YES, and each sample having a3 as NO. The
correctly classified samples are highlighted below:
S A B E C
d1 a1 b1 e2 YES
d2 a2 b2 e1 YES
d3 a3 b2 e1 NO
d4 a2 b2 e1 NO
d5 a3 b1 e2 NO
Note that a2 was not a good determinant for classifying the output C, because it
gives both YES and NO for d2 and d4 respectively. This means that now we have
to look at other attributes B and E to resolve this conflict. To build the tree further
we will ignore the samples already covered by the tree above. Our new sample
space will be given by S’ as given in the table below:
S' A B E C
d2 a2 b2 e1 YES
d4 a2 b2 e2 NO
We'll apply the same process as above again. First we calculate the entropy for
this sub-sample space S':
E(S') = -p+·log2 p+ - p-·log2 p-
= -(1/2)·log2(1/2) - (1/2)·log2(1/2) = 1
This gives us entropy of 1, which is the maximum value for entropy. This is also
obvious from the data, since half of the samples are positive (YES) and half are
negative (NO).
Since our tree already has a node for A, ID3 assumes that the tree will not have
the attribute repeated again. This is reasonable: A has already divided the data as
much as it can, so it doesn't make any sense to repeat A in the intermediate nodes. Give
this a thought yourself too. Meanwhile, we will calculate the information gain of B:
|S'| = 2
|S'b2| = 2
G(S', B) = E(S') - (|S'b2|/|S'|)·E(S'b2)
G(S', B) = 1 - (2/2)·(-(1/2)·log2(1/2) - (1/2)·log2(1/2)) = 1 - 1 = 0
Similarly for E:
|S’| = 2
|S'e1| = 1 [since there is only one observation of e1, which outputs a YES]
E(S'e1) = -1·log2 1 - 0·log2 0 = 0 [since log2 1 = 0]
|S'e2| = 1 [since there is only one observation of e2, which outputs a NO]
E(S'e2) = -0·log2 0 - 1·log2 1 = 0 [since log2 1 = 0]
Hence:
G(S', E) = E(S') - (|S'e1|/|S'|)·E(S'e1) - (|S'e2|/|S'|)·E(S'e2)
G(S', E) = 1 - (1/2)·(0) - (1/2)·(0) = 1 - 0 - 0 = 1
Therefore E gives us the maximum information gain, which is also true intuitively,
since by looking at the table for S' we can see that B has only one value, b2, which
doesn't help us decide anything, since it gives both a YES and a NO. Whereas
E has two values, e1 and e2: e1 gives a YES and e2 gives a NO. So we put the
node E into the tree which we are already building. The pictorial representation is
shown below:
[Final decision tree: root A; branch a1 → YES, branch a3 → NO, branch a2 → node E; under E, branch e1 → YES, branch e2 → NO]
Now we will stop further iterations since there are no conflicting leaves that we need
to expand. This is our hypothesis h that satisfies each training example.
Lecture # 39
Techniques of Learning and Perceptron’s
While this clearly shows that the human information-processing system is superior
to conventional computers, it is still possible to realize an artificial neural
network which exhibits the above-mentioned properties. We'll start with a single
perceptron, based on pioneering work done in 1943 by McCulloch and Pitts.
[Perceptron: each input (Input 1, Input 2) is multiplied by a weight; the weighted sum, together with a bias, is passed through a threshold function to produce the output]
Input 1 Input 2 OR
0 0 0
0 1 1
1 0 1
1 1 1
[Plot: the OR outputs on the Input 1–Input 2 plane, separable by a single straight line]
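A single perceptron with hand-picked weights realizes the OR table above. A minimal sketch; the particular weights and bias are one of many valid choices, set by hand rather than learnt:

```python
# A single perceptron: weighted sum of the inputs plus a bias, passed
# through a hard threshold. Weights chosen by hand to realize OR.

def perceptron(x1, x2, w1, w2, bias):
    activation = w1 * x1 + w2 * x2 + bias
    return 1 if activation > 0 else 0      # hard-limiter threshold function

def or_gate(x1, x2):
    return perceptron(x1, x2, w1=1.0, w2=1.0, bias=-0.5)

truth_table = [(x1, x2, or_gate(x1, x2))
               for x1 in (0, 1) for x2 in (0, 1)]
```

Geometrically, the weights and bias define the line x1 + x2 = 0.5, which separates the point (0,0) from the other three.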
A single perceptron simply draws a line, which becomes a hyperplane when the data is more
than 2-dimensional. Sometimes there are complex problems (as is the case
in real life). The data for these problems cannot be separated into their respective
classes by a single straight line; such problems are not linearly separable.
Another example of a linearly non-separable problem is the XOR gate (exclusive OR).
It shows how even a small dataset of just 4 rows can make it impossible to draw a
one-line decision boundary that separates the 1s from the 0s.
(Figure: the four XOR points on the Input 1 / Input 2 plane; (0,0) and (1,1) output 0, while (0,1) and (1,0) output 1.)
Can you draw one line which separates the ones from zeros for the output?
199
© Copyright Virtual University of Pakistan
Artificial Intelligence (CS607)
(Figure: the same four XOR points; no single line can separate the ones from the zeros.)
A single-layer perceptron can perform pattern classification only on linearly separable
patterns, regardless of the type of non-linearity (hard limiter, sigmoidal). Papert and
Minsky in 1969 illustrated the limitations of Rosenblatt's single-layer perceptron (e.g.
the requirement of linear separability, the inability to solve the XOR problem) and cast
doubt on the viability of neural networks. However, the multi-layer perceptron and the
back-propagation algorithm overcome many of the shortcomings of the single-layer
perceptron.
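The learning rule for a single perceptron can be sketched in a few lines of Python (the course's examples use MATLAB; the learning rate and epoch count here are arbitrary illustrative choices). Trained on the OR data the perceptron finds a separating line; on the XOR data no weight setting can classify all four points, so at least one example is always wrong:

```python
def predict(w, b, x1, x2):
    # hard-limiter threshold: fire (1) when the weighted sum exceeds 0
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            err = target - predict(w, b, x1, x2)   # 0 when correct
            w[0] += lr * err * x1                  # nudge weights toward target
            w[1] += lr * err * x2
            b += lr * err
    return w, b

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(OR)
print([predict(w, b, x1, x2) for (x1, x2), _ in OR])   # [0, 1, 1, 1]

w, b = train_perceptron(XOR)  # not linearly separable: cannot get all 4 right
print(sum(predict(w, b, x1, x2) == t for (x1, x2), t in XOR))
```

The second print shows fewer than four correct answers no matter how long we train, which is exactly the limitation Papert and Minsky pointed out.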
(Figure: a multi-layer perceptron, with Input 1 and Input 2 feeding a layer of hidden neurons.)
Each neuron in the hidden layer forms a different decision line. Together, all the
lines can construct arbitrary non-linear decision boundaries. These multi-layer
perceptrons are the most basic artificial neural networks.
Lecture # 40
Artificial Neural Networks
Advantages
• Excellent for pattern recognition
• Excellent classifiers
• Handles noisy data well
• Good for generalization
Drawbacks
• The power of ANNs lie in their parallel architecture
– Unfortunately, most machines we have are serial (Von Neumann
architecture)
• Lack of defined rules to build a neural network for a specific problem
– Too many variables, for instance, the learning algorithm, number of
neurons per layer, number of layers, data representation, etc.
• Knowledge is implicit
• Data dependency
But all these drawbacks do not mean that neural networks are useless artifacts.
They are still arguably very powerful general-purpose problem solvers.
Connectivity
o Fully connected
o Partially connected
Learning methodology
o Supervised
Given a set of example input/output pairs, find a rule that does a
good job of predicting the output associated with a new input.
o Unsupervised
Given a set of examples with no labeling, group them into
sets called clusters
• These large feature spaces make algorithms run slower and make the
training process longer. The solution lies in finding a smaller
feature space that is a subset of the existing features.
• The feature space should show discrimination between the classes of the
data. A patient's height is not a useful feature for classifying whether he
is sick or healthy.
Training
• Training is either supervised or unsupervised.
• Remember when we said:
• We assume that the concept lies in the hypothesis space. So we
search for a hypothesis belonging to this hypothesis space
that best fits the training examples, such that the output given
by the hypothesis is the same as the true output of the concept.
• Finding the right hypothesis is the goal of the training session.
So neural networks are doing function approximation, and
training stops when the network has found the closest possible
function, the one that gives the minimum error over all the instances.
• Training is the heart of learning, in which finding the best
hypothesis that covers most of the examples is the objective.
Learning is simply done through adjusting the weights of the
network.
(Figure: the training loop. The weighted sum of the inputs passes through the activation function; a similarity measure compares the output with the desired output, driving the weight update.)
Similarity Measurement
• A measure of the difference between the actual output of the
network during training and the desired labeled output.
• The most common technique for measuring the total error in
each iteration of the neural network (epoch) is Mean Squared
Error (MSE).
Validation
• During training, the training data is divided into k data sets; k-1 sets
are used for training, and the remaining data set is used for
cross-validation. This ensures better results and avoids over-
fitting.
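The k-fold split described above can be sketched as follows (a Python illustration with hypothetical helper names; real toolkits provide this directly):

```python
def k_fold_splits(data, k):
    """Yield (train, validation) pairs; each fold is held out exactly once."""
    folds = [data[i::k] for i in range(k)]          # round-robin assignment
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

data = list(range(10))
for train, val in k_fold_splits(data, 5):
    print(len(train), len(val))   # 8 2 on every split
```

With k = 5 and 10 items, every split trains on 8 items and validates on the remaining 2, and together the folds cover the whole data set.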
Stopping Criteria
• Done through MSE. We define a low threshold, usually 0.01,
which if reached stops the training.
• Another stopping criterion is the number of epochs, which
defines the maximum number of times the data can be presented
to the network for learning.
Application Testing
• A network is said to generalize well when the input-output
relationship computed by the network is correct (or nearly so)
for input-output patterns (test data) never used in creating and
training the network.
7.21 Supervised
Given a set of example input/output pairs, find a rule that does a good job of
predicting the output associated with a new input.
7.21.1 Back propagation algorithm
1. Randomize the weights {ws} to small random values (both positive and
negative)
2. Select a training instance t, i.e.,
a. the vector {xi(t)}, i = 1,...,Ninp (a pair of input and output patterns),
from the training set
3. Apply the network input vector to the network input
4. Calculate the network output vector {zk(t)}, k = 1,...,Nout
5. Calculate the errors δk for each of the outputs, k = 1,...,Nout, as the difference
between the desired output and the network output
6. Calculate the necessary weight updates Δws in a way that minimizes
this error
7. Adjust the weights of the network by Δws
8. Repeat these steps for each instance (pair of input-output vectors) in the training
set until the error for the entire system is acceptably low
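To make the steps concrete, here is a minimal Python sketch of back-propagation on a 2-2-1 network for the XOR data. The starting weights, learning rate and epoch count are arbitrary illustrative choices, and this tiny hand-rolled network is only meant to show steps 2-8 in code; the mean squared error over the training set should fall as training proceeds:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Fixed, non-symmetric starting weights (hypothetical values; step 1 of the
# algorithm would normally randomize them).
W1 = [[0.5, -0.4], [0.3, 0.8]]   # input -> hidden weights
b1 = [0.1, -0.1]                 # hidden biases
W2 = [0.6, -0.7]                 # hidden -> output weights
b2 = 0.2                         # output bias

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    return h, sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)

def mse():
    return sum((t - forward(x)[1]) ** 2 for x, t in data) / len(data)

lr = 0.5
error_before = mse()
for _ in range(5000):
    for x, t in data:                      # steps 2-4: pick instance, forward pass
        h, o = forward(x)
        d_out = (o - t) * o * (1 - o)      # step 5: output error term
        d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):                 # steps 6-7: weight updates
            W2[j] -= lr * d_out * h[j]
            W1[j][0] -= lr * d_hid[j] * x[0]
            W1[j][1] -= lr * d_hid[j] * x[1]
            b1[j] -= lr * d_hid[j]
        b2 -= lr * d_out
print(error_before, mse())   # the error typically drops substantially
```

This is pure stochastic gradient descent on the squared error, looping over the training instances (one pass is one epoch) exactly as steps 2-8 describe.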
Lecture # 41
Artificial Neural Networks
7.22 Unsupervised
Given a set of examples with no labeling, group them into sets called
clusters
A cluster represents some specific underlying patterns in the data
Useful for finding patterns in large data sets
Form clusters of input data
Map the clusters into outputs
Given a new example, find its cluster, and generate the associated output
7.23 Exercise
1) We will change the problem size for SICK a little bit. Suppose T can take
4 values and BP can take 5 values. For the conjunctive bias, determine the size of
the instance space and of the hypothesis space.
2) Is the following concept possible through conjunctive or disjunctive
hypothesis? ( T AND BP ) or ( T OR BP )
makeTrainData.m
for i = 1:7
filename = strcat('alif', int2str(i),'.bmp');
tempImage = imread(filename);
trainData(i,:) = reshape(tempImage,1,100);
end
for i = 1:7
filename = strcat('bay', int2str(i),'.bmp');
tempImage = imread(filename);
trainData(i+7,:) = reshape(tempImage,1,100);
end
for i = 1:7
filename = strcat('jeem', int2str(i),'.bmp');
tempImage = imread(filename);
trainData(i+14,:) = reshape(tempImage,1,100);
end
targetData = zeros(21,3);
targetData(1:7,1) = 1;
targetData(8:14,2) = 1;
targetData(15:21,3) = 1;
save 'trainData' trainData targetData ;
makeTestData.m
for i = 1:3
filename = strcat('alif', int2str(i),'.bmp');
tempImage = imread(filename);
testData(i,:) = reshape(tempImage,1,100);
end
for i = 1:3
filename = strcat('bay', int2str(i),'.bmp');
tempImage = imread(filename);
testData(i+3,:) = reshape(tempImage,1,100);
end
for i = 1:3
filename = strcat('jeem', int2str(i),'.bmp');
tempImage = imread(filename);
testData(i+6,:) = reshape(tempImage,1,100);
end
targetData = zeros(9,3);
targetData(1:3,1) = 1;
targetData(4:6,2) = 1;
targetData(7:9,3) = 1;
save 'testData' testData targetData ; % presumably saved here, since testNN.m later loads 'testData'
load '[Link]';
[Link] = 15;
[Link] = 0.01;
bpn = train(bpn,trainData',targetData');
testNN.m
load('trainData');
load('bpnNet');
Y = sim(bpn, trainData');
[X,I] = max(Y);
errorCount = 0;
for i = 1 : length(targetData)
if ceil(i/7) ~= I(i)
errorCount = errorCount + 1;
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%
load('testData');
Y = sim(bpn, testData');
[X,I] = max(Y);
errorCount = 0;
for i = 1 : length(targetData)
if ceil(i/3) ~= I(i)
errorCount = errorCount + 1;
end
end
Lecture # 42
Planning and Problem Solving Algorithm
8 Planning
8.1 Motivation
We started our study of AI with the classical approach of problem solving that the founders
of AI used to exhibit intelligence in programs. Looking at problem solving again,
you might now be able to imagine that it could work for realistically complex
problems too. But when you think more, you might guess that there is
some limitation to this old approach.
Let's take an example. I have just landed at Lahore airport as a cricket-loving tourist.
At night I have to listen to live cricket commentary on the radio at a hotel where I have to
reserve a room. For that, I have to find the hotel and get my room reserved before
it's too late, and I also have to find the market to buy the radio from. Now this is
a more realistic problem. Is this a tougher problem? Let's see.
One thing easily visible is that this problem can be broken into multiple problems,
i.e. it is composed of smaller problems like finding the market and finding the hotel.
Another observation is that the parts depend on one another: listening to the radio
depends on the sub-problems of buying the radio and finding the market.
Ignore the observations made above for a moment. If we start formulating this
problem as usual, be assured that the state design will have more information in it.
There will be more operators. Consequently, the search tree we generate will be
much bigger. The poor system that runs this search will have a much greater load
than in any of the examples we have studied so far. The search tree will consume
more space, and the process will take more calculations.
A state design and operators for the sample problem formulation could be as shown
in the figure.

(Figure: a possible state design and operator set.
State fields: Location, Has radio?, Sells radio?, IsHotel?, IsMarket?, Reservation done?, and maybe more…
Operators: Turn right, Turn left, Move forward, Buy radio, Get reservation, Listen radio, Sleep, and maybe more…)
If we apply, say, BFS to this problem, the tree can easily become something huge, like
this rough illustration.
(Figure: search space of a moderate problem. The root state is Location=Airport, Has radio?=No, Sells radio?=No, IsHotel?=No, IsMarket?=No, ReservationDone?=No; operators such as BuyRadio, TurnRight and TurnLeft expand it into a rapidly growing tree.)

Although this tree is just a depiction of how a search space grows for realistic problems, after seeing it we can very well imagine that for even more complex problems the search tree could be too big, big enough to trouble us. So the question is, can we make such inefficient problem solving any better?

The good news is that the answer is yes. How? Simply speaking, this 'search' technique could be improved by acting a bit logically instead of blindly. For example, we should not use operators at a state
The field of acting logically to solve problems is known as planning. Planning is based
on the logic representation that we have already studied, so you will not find it too
difficult, and thus we have kept it short.
The key in planning is to use logic in order to solve problems elegantly. People working
in AI have devised different techniques and algorithms for planning. We will now
introduce a basic definition of planning.
STRIPS is one of the founding languages developed particularly for planning. Let us
understand planning at a deeper level by seeing what a planning language can
represent.
8.4.1 Predicate
The basic unit of representation is a predicate, for example,
at(X)
8.4.2 State
A state is a conjunction of predicates represented in the well-known form. For example, a
state where we are at the hotel and have neither cash nor a radio could be represented
as,
at(hotel) ~have(cash) ~have(radio)
8.4.3 Goal
A goal is represented in the same manner as a state. For example, if the goal
of a planning problem is to be at the hotel with a radio, it is represented as,
at(hotel) have(radio)
An action is a predicate used to change states. It has three components, namely the
action predicate itself, the pre-condition predicates, and the post-condition predicates.
For example, the action to buy some item can be represented as,
Action:
buy(X)
Pre-conditions:
at(Place) sells(Place, X)
Post-conditions/Effect:
have(X)
What this example action says is that to buy any item 'X', you have to be (pre-
conditions) at a place 'Place' where 'X' is sold. And when you apply this operator,
i.e. buy 'X', the consequence is that you have item 'X' (post-conditions).
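The pre-condition/post-condition mechanics can be sketched directly: represent a state as a set of predicate strings, and an action as its pre-condition, add and delete sets (a Python illustration; the names Action, applicable and apply_action are ours, not part of STRIPS):

```python
from collections import namedtuple

# a STRIPS-style action: pre-conditions, predicates it adds, predicates it deletes
Action = namedtuple("Action", ["name", "pre", "add", "delete"])

buy_radio = Action(name="buy(radio)",
                   pre={"at(market)", "sells(market, radio)"},
                   add={"have(radio)"},
                   delete=set())

def applicable(state, act):
    # all pre-conditions must hold in the current state
    return act.pre <= state

def apply_action(state, act):
    # remove deleted predicates, then add the effects
    return (state - act.delete) | act.add

state = {"at(market)", "sells(market, radio)"}
if applicable(state, buy_radio):
    state = apply_action(state, buy_radio)
print(sorted(state))   # 'have(radio)' is now part of the state
```

The same pattern covers the go action, whose delete set would contain the old at(…) predicate.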
If you think about this algorithm, it is quite simple. You start with an empty plan
in which, naturally, no condition predicate of the goal state is met, i.e. the pre-conditions
of the finish action are not met. You backtrack by adding actions that satisfy these
unsatisfied pre-condition predicates. New unsatisfied pre-conditions will be generated for
each newly added action. Then you try to satisfy those with appropriate actions, in the
same way as was done for the goal state initially. You keep on doing that until there is
no unsatisfied pre-condition.
Now, at some point there might be two actions at the same level of ordering where
one action's effect conflicts with another action's pre-condition. This is called a
threat and must be resolved. Threats are resolved by simply reordering such
actions so that no threat remains.
Because this algorithm does not order actions unless absolutely necessary, it is
known as a partial-order planning (POP) algorithm.
Let us understand it further by means of the example we discussed in the lecture from
[??].
The problem to solve is to shop for bananas, milk and a drill from the market and
come back home. Before going into the dry run of POP, let us reproduce the
predicates.
The initial state and the goal state for our algorithm are formally specified as under.
Initial State:
At(Home) Sells(HWS, Drill) Sells(SM, Banana) Sells(SM, Milk)
Path(Home, SM) Path(SM, HWS) Path(Home, HWS)
Goal State:
At(Home) Has(Banana) Has(Milk) Has(Drill)
The actions for this problem are only two, i.e. buy and go. We have added the special
actions start and finish for our POP algorithm to work. The definitions for these four
actions are:
Go (x)
Preconditions: at(y) path(y,x)
Postconditions: at(x) ~at(y)
Buy (x)
Preconditions: at(s) sells (s, x)
Postconditions: has(x)
Start ()
Preconditions: nil
Postconditions: At(Home) Sells(HWS, Drill) Sells(SM, Banana)
Sells(SM, Milk) Path(Home, SM) Path(SM, HWS) Path(Home, HWS)
Finish ()
Preconditions: At (Home) Has (Banana) Has (Milk) Has (Drill)
Postconditions: nil
Note that the post-condition of the start action is exactly our initial state. That is how
we make sure that our end plan starts with the given initial state configuration.
Similarly, note that the pre-conditions of the finish action are exactly the same as the
goal state. Thus we can ensure that the plan satisfies all the conditions of the goal
state. Also note that, naturally, there is no pre-condition for the start action and no
post-condition for the finish action.
Now we start the algorithm by just putting the start and finish actions in our plan
and linking them. After this first initial step the situation is as follows.
(Plan: Start linked directly to Finish. Start's post-conditions At(Home), Sells(SM, Banana), Sells(SM, Milk), Sells(HWS, Drill), etc. are available; Finish's pre-conditions are still unsatisfied.)
We now enter the main loop of the POP algorithm, where we iteratively find any
unsatisfied pre-condition in our existing plan and then satisfy it with an appropriate
action.
(Plan after the first iterations: Buy(Drill), Buy(Milk) and Buy(Bananas) are added between Start and Finish to satisfy the Has(...) pre-conditions of Finish.)
We now move forward and see which other pre-conditions are not satisfied. At(HWS)
is not satisfied in the action Buy(Drill). Similarly, At(SM) is not satisfied in the actions
Buy(Milk) and Buy(Banana). Only the action Go() has post-conditions that can satisfy
these pre-conditions. Adding them one by one to satisfy all these pre-conditions, our
plan becomes,
(Plan: Start feeds At(Home) to both Go(HWS) and Go(SM). Go(HWS) provides At(HWS), which with Sells(HWS, Drill) enables Buy(Drill); Go(SM) provides At(SM), which with Sells(SM, Milk) and Sells(SM, Bananas) enables Buy(Milk) and Buy(Bananas). Have(Drill), Have(Milk), Have(Bananas) and At(Home) feed into Finish.)

Now if we check for threats, we find that if we go to HWS from Home we cannot then
go to SM from Home. Meaning, the post-condition of Go(HWS) threatens the pre-condition
At(Home) of Go(SM), and vice versa. So, as given in our POP algorithm, we have to
resolve the threat by reordering these actions such that no action threatens the
pre-conditions of another action.
That is how POP proceeds: adding actions to satisfy pre-conditions and reordering
actions to resolve any threats in the plan. The final plan using this algorithm becomes:
(Final plan: Start, Go(HWS), Buy(Drill), Go(SM), Buy(Milk) and Buy(Bananas) in either order, Go(Home), Finish.)
To feel more comfortable with the plan we have achieved for this problem, let's
narrate our solution in plain English.
"Start by going to the hardware store. There you can buy the drill, and then go to the super
market. At the super market, buy milk and bananas in any order, and then go home.
You are done."
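As a cross-check on this plan, even a blind breadth-first search over the same grounded go/buy operators finds an equivalent six-step plan. The Python sketch below assumes the Path predicates can be traversed in both directions; the exact order of the two buys at the super market may vary from run to run:

```python
from collections import deque

paths = {("home", "sm"), ("sm", "hws"), ("home", "hws")}
paths |= {(b, a) for a, b in paths}        # assume paths are two-way
sells = {("hws", "drill"), ("sm", "banana"), ("sm", "milk")}

def successors(state):
    loc = next(p for kind, p in state if kind == "at")
    for a, b in paths:
        if a == loc:                                     # Go(b)
            yield f"go({b})", frozenset((state - {("at", a)}) | {("at", b)})
    for shop, item in sells:
        if shop == loc and ("has", item) not in state:   # Buy(item)
            yield f"buy({item})", frozenset(state | {("has", item)})

start = frozenset({("at", "home")})
goal = {("at", "home"), ("has", "drill"), ("has", "banana"), ("has", "milk")}

def plan():
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [name]))

print(plan())   # e.g. go(hws), buy(drill), go(sm), the two buys, go(home)
```

Six actions is the minimum here (three buys plus three moves), which matches the plan POP produced; the difference is that POP reached it with far less blind expansion.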
8.7 Problems
1. A farmer has a tiger, a goat and a bundle of grass. He is standing at one side
of the river with a very weak boat which can hold only one of his belongings at
a time. His goal is to take all three of his belongings to the other side.
The constraint is that the farmer cannot leave the goat and tiger, or the goat
and grass, unattended on either side of the river, because one of them will eat
the other. Using the simple POP algorithm we studied in the lecture, solve
this problem. Show all the intermediate and final plans step by step.
2. A robot has three slots available to put the blocks A, B, C. The blocks are
initially placed at slot 1, one upon the other (A placed on B placed on C), and
its goal is to move all three to slot 3 in the same order. The constraint on this
robot is that it can only move one block at a time from any slot to any other slot,
and it can only pick the topmost block of a slot to move. Using the simple POP
algorithm we studied in the lecture, solve this problem. Show all the
intermediate and final plans step by step.
Lecture # 43
Advanced Topics in Artificial Intelligence
9 Advanced Topics
9.1 Computer vision
Exercise Question
Search through the internet and read about interesting happenings and research going on around
the globe in the area of computer vision.
[Link]
The above link might be useful for exploring knowledge about computer vision.
Lecture # 44
Advanced Topics in Artificial Intelligence
9.2 Robotics
Robotics is a highly advanced and much-hyped field today. Literally speaking,
robotics is the study of robots. Robots are nothing but a complex combination of
hardware and intelligence, or mechanics and brains. Thus robotics is truly a multi-
disciplinary area, with active contributions from physics, mechanics, biology,
mathematics, computer science, statistics, control theory, philosophy, etc.
Mobility
Perception
Planning
Searching
Reasoning
Dealing with uncertainty
Vision
Learning
Autonomy
Physical Intelligence
What we can see from the list is that robotics is the most profound manifestation of
AI in practice. The most crucial or defining items in the list above are mobility,
autonomy and dealing with uncertainty.
The area of robotics has been followed with enthusiasm by the masses through fiction,
science and industry. Robots have now entered the common household, as robot pets
(the Sony Aibo entertainment robot), old-age assistants and people carriers (the Segway
human transporter).
Exercise Question
Search through the internet and read about interesting happenings and research
going on around the globe in the area of robotics.
[Link]
9.2.1 Softcomputing
Genetic algorithms have been employed to find optimal initial weights for neural
networks.
Exercise Question
Search through the internet and read about interesting happenings and research
going on around the globe in the area of softcomputing.
[Link]
9.3 Clustering
The famous clustering algorithms include self-organizing maps (SOM), k-means,
learning vector quantization, and density-based data analysis.
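To give a flavour of clustering, here is a minimal k-means sketch in Python on made-up one-dimensional data: points are repeatedly assigned to their nearest center, and each center is then moved to the mean of its cluster:

```python
def kmeans(points, centers, iters=10):
    """Plain k-means on 1-D points; centers is the list of starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans(points, [0.0, 6.0])
print(centers)   # converges near [1.0, 5.0]
```

The two clusters the algorithm discovers correspond to the two underlying groups in the data, with no labels involved, which is exactly the unsupervised setting described above.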
Exercise Question
Search through the internet and read about interesting happenings and research
going on around the globe in the area of clustering.
[Link]
Lecture # 45
Revision of All Lectures
10 Conclusion
We have now come to the end of this course, in which we have tried to cover all the core
technologies of AI at a basic level. We hope that the set of topics we have studied
so far can give you the essential base to work in specialized, cutting-edge areas
of AI.
Let us recap what we have studied and concluded so far. The list of major topics that
we covered in the course is:
problem solving. Genetic algorithms are inspired by the biological theory of evolution
and provide parallel search agents performing collaborative hill climbing. We
have seen that many problems which are otherwise difficult to solve through classical
programming or blind search techniques are easily, though non-deterministically, solved
using genetic algorithms.
At this point we introduced the cycle of AI to set the base for a systematic approach to
studying contemporary techniques in AI.
Predicate logic and the classical, successful expert systems were limited in that
they could deal only with perfect Boolean logic. Fuzzy logic provided a new
basis for knowledge and logic representation to capture uncertain information, and
thus fuzzy reasoning systems were developed. Just like expert systems, fuzzy
systems have recently found exceptional success and are among the most
used AI systems of today, with applications ranging from self-focusing cameras to
automatic intelligent stock trading systems.
10.7 Learning
knowledge at all, and that is where learning was felt essential, i.e. the ability of
knowledge-based systems to improve through experience.
Learning has been categorized into rote, inductive and deductive learning. Of
these, almost all the prevalent learning techniques are attributed to inductive
learning, including concept learning, decision tree learning and neural networks.
10.8 Planning
In the end we have studied a rather specialized part of AI namely planning. Planning
is basically advancement to problem solving in which concepts of KRR are fused
with the knowledge of classical problem solving to construct advanced systems to
solve reasonably complex real world problems with multiple, interrelated and
unrelated goals. We have learned that using predicate logic and regression, problems
could be elegantly solved which would have been nightmare for machines in case
of classical problem solving approach.
Now it is up to you to take these thoughts and directions, along with the basics, and
move forward into advanced study and true application of the field of Artificial
Intelligence.