Notes Artificial Intelligence Unit 3
UNIT-03
Probabilistic Reasoning
UNIT-03/LECTURE-01
Bayes Theorem
This reads that, given some evidence E, the probability that hypothesis H_i is true is equal to the ratio of
the probability that E will be true given H_i times the a priori probability of H_i, to the sum over the set of
all hypotheses H_k of the probability of E given H_k times the probability of H_k:
P(H_i | E) = P(E | H_i) P(H_i) / Σ_k P(E | H_k) P(H_k)
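As a minimal sketch of the rule just described, the posterior for every hypothesis can be computed from its prior and the likelihood of the evidence. The hypothesis names and all numbers below are made up for illustration, and the function name posterior is an assumption of this sketch.

```python
def posterior(priors, likelihoods):
    """Return P(H | E) for every hypothesis H.

    priors[h]      -- P(h), the a priori probability of hypothesis h
    likelihoods[h] -- P(E | h), the probability of the evidence given h
    """
    # Denominator: probability of the evidence summed over all hypotheses.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical numbers, for illustration only.
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihoods = {"flu": 0.9, "cold": 0.5, "healthy": 0.05}
post = posterior(priors, likelihoods)
```

Note that the hypotheses must be mutually exclusive and exhaustive for the denominator to be the probability of E, which is one reason the approach needs so many probabilities in practice.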
In practice, simple Bayes rule-based systems are not well suited to uncertain reasoning, for several reasons:
• Knowledge acquisition is very hard.
• Too many probabilities needed -- too large a storage space.
• Computation time is too large.
• Updating new information is difficult and time consuming.
• Exceptions like "none of the above" cannot be represented.
• Humans are not very good probability estimators.
However, Bayesian statistics still provide the core to reasoning in many uncertain reasoning systems with
suitable enhancement to overcome the above problems.
We will look at three broad categories:
• Certainty factors,
• Dempster-Shafer models,
• Bayesian networks.
UNIT-03/Lecture 02
Dempster-Shafer Models
This can be regarded as a more general approach to representing uncertainty than the Bayesian
approach.
Bayesian methods are sometimes inappropriate:
Let A represent the proposition "Demi Moore is attractive". Then the axioms of probability insist that P(A) + P(¬A) = 1.
Now suppose that Andrew does not even know who Demi Moore is.
Then
• We cannot say that Andrew believes the proposition if he has no idea what it means.
• Also, It is not fair to say that he disbelieves the proposition.
• It would therefore be meaningful to denote Andrew's belief in A and in ¬A, B(A) and B(¬A), as both being 0.
• Certainty factors do not allow this.
Dempster-Shafer Calculus
The basic idea in representing uncertainty in this model is:
• Set up a confidence interval -- an interval of probabilities within which the true probability lies with a
certain confidence -- based on the Belief B and plausibility PL provided by some evidence E for a
proposition P.
• The belief brings together all the evidence that would lead us to believe in P with some certainty.
• The plausibility brings together the evidence that is compatible with P and is not inconsistent with it.
• This method allows for further additions to the set of knowledge and does not assume disjoint
outcomes.
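The belief and plausibility described above can be sketched in a few lines, assuming the usual Dempster-Shafer definitions: a mass is assigned to subsets of the possible outcomes, belief sums the masses of subsets contained in the proposition, and plausibility sums the masses of subsets that overlap it. The outcome names and mass values are hypothetical.

```python
def belief(mass, proposition):
    """Sum of the masses of all sets wholly contained in the proposition."""
    return sum(m for s, m in mass.items() if s <= proposition)


def plausibility(mass, proposition):
    """Sum of the masses of all sets that overlap the proposition."""
    return sum(m for s, m in mass.items() if s & proposition)


# Hypothetical mass assignment over the outcomes {a, b, c}.
mass = {
    frozenset({"a"}): 0.4,
    frozenset({"a", "b"}): 0.3,
    frozenset({"a", "b", "c"}): 0.3,  # mass on the whole set models ignorance
}
p = frozenset({"a", "b"})
interval = (belief(mass, p), plausibility(mass, p))

# Total ignorance: all mass on the whole set, so belief in A and in not-A is 0.
ignorance = {frozenset({"A", "not A"}): 1.0}
b_a = belief(ignorance, frozenset({"A"}))
b_not_a = belief(ignorance, frozenset({"not A"}))
```

The second mass assignment mirrors the Andrew example: both beliefs are 0, which a single probability value cannot express.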
Given the set of possible outcomes, a mass probability, M, is defined for each subset of that set and takes values in the range [0, 1].
Implementation
• A Bayesian Network is a directed acyclic graph:
o A graph whose directed links indicate dependencies that exist between nodes.
o Nodes represent propositions about events or events themselves.
o Conditional probabilities quantify the strength of dependencies.
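A minimal sketch of such a network, assuming Boolean nodes and a dictionary layout chosen for this example: each node stores its parents and a conditional probability table, and the probability of a full assignment is the product of each node's conditional probability given its parents. The node names and numbers are hypothetical.

```python
# Each node maps to (parents, cpt); cpt maps a tuple of parent values
# to P(node = True | parents). All values are illustrative.
network = {
    "Rain": ((), {(): 0.2}),
    "Sprinkler": (("Rain",), {(True,): 0.01, (False,): 0.4}),
    "WetGrass": (
        ("Sprinkler", "Rain"),
        {
            (True, True): 0.99,
            (True, False): 0.9,
            (False, True): 0.8,
            (False, False): 0.0,
        },
    ),
}


def joint(network, assignment):
    """Probability of a complete True/False assignment to every node."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


p_all_true = joint(network, {"Rain": True, "Sprinkler": True, "WetGrass": True})
```

Because the graph is acyclic, only the local tables need to be stored rather than the full joint distribution.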
UNIT-03/Lecture 03
Fuzzy Logic
This topic is treated more formally in other courses. Here we summarize the main points for the sake of
completeness.
Fuzzy logic is a totally different approach to representing uncertainty:
• It focuses on ambiguities in describing events rather than the uncertainty about the occurrence of an event.
• Changes the definitions of set theory and logic to allow this.
• Traditional set theory defines set memberships as a Boolean predicate.
Fuzzy Set Theory
• Fuzzy set theory defines set membership as a possibility distribution.
This basically states that we can take n possible events and use f to generate a single possible outcome.
This extends set membership since we could have varying definitions of, say, hot curries. One person
might declare that only curries of Vindaloo strength or above are hot whilst another might say Madras
and above are hot. We could allow for these variations in definition by allowing both possibilities in fuzzy
definitions.
Once set membership has been redefined we can develop new logics based on combining of sets etc. and reason effectively.
Uncertain Reasoning
Sometimes the knowledge in rules is not certain. Rules then may be enhanced by adding information
about how certain the conclusions drawn from the rules may be. Here we describe certainty factors and
their manipulation.
Often, experts can't give definite answers.
This may require an inference mechanism that derives conclusions by combining uncertainties.
Fuzzy Inferencing
The process of fuzzy reasoning is incorporated into what is called a Fuzzy Inferencing System. It is
comprised of three steps that process the system inputs to the appropriate system outputs. These steps
are 1) Fuzzification, 2) Rule Evaluation, and 3) Defuzzification. The system is illustrated in the following
figure.
1. Fuzzification is the first step in the fuzzy inferencing process. This involves a domain transformation where
crisp inputs are transformed into fuzzy inputs. Crisp inputs are exact inputs measured by sensors and
passed into the control system for processing, such as temperature, pressure, rpm, etc. Each crisp input
that is to be processed by the FIU has its own group of membership functions or sets to which it is
transformed. This group of membership functions exists within a universe of discourse that holds all
relevant values that the crisp input can possess. The following shows the structure of membership functions.
2. Degree of membership: the degree to which a crisp value is compatible with a membership function, a value
from 0 to 1, also known as truth value or fuzzy input.
Membership function, MF: defines a fuzzy set by mapping crisp values from its domain to the set's
associated degree of membership.
3. Crisp inputs: distinct or exact inputs to a certain system variable, usually measured.
6. Scope: or domain, the width of the membership function; the range of concepts, usually numbers, over
which a membership function is mapped.
7. Universe of discourse: the range of all possible values, or concepts, applicable to a system variable.
When designing the number of membership functions for an input variable, labels must initially be determined
for the membership functions. The number of labels corresponds to the number of regions into which the
universe should be divided, such that each label describes a region of behavior. A scope must be assigned
to each membership function that numerically identifies the range of input values that correspond to a
label. The shape of the membership function should be representative of the variable. However, this
shape is also restricted by the computing resources available. Complicated shapes require more complex
descriptive equations or large lookup tables. The next figure shows examples of possible shapes for
membership functions.
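The fuzzification step can be sketched as follows, assuming triangular membership functions (one of the cheaper shapes to compute). The labels, scopes and the temperature variable are hypothetical choices for this example.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0 to 1) of crisp value x in a triangular set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical labels and scopes for a temperature input (degrees Celsius).
# Each scope is (left edge, peak, right edge) within the universe of discourse.
TEMPERATURE_SETS = {
    "cold": (-10.0, 0.0, 15.0),
    "warm": (10.0, 20.0, 30.0),
    "hot": (25.0, 35.0, 50.0),
}


def fuzzify(x, sets):
    """Map one crisp input to a degree of membership for every label."""
    return {label: triangular(x, *scope) for label, scope in sets.items()}


fuzzy_input = fuzzify(12.0, TEMPERATURE_SETS)
```

A crisp reading of 12 degrees falls in the overlap of two scopes, so it is partly "cold" and partly "warm" at the same time, which is the point of the transformation.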
UNIT-03/Lecture 04
Certainty Factors
Logic and rules provide all-or-nothing answers.
An expert might want to say that something provides evidence for a conclusion, but it is not definite.
For example, the MYCIN system, an early expert system that diagnosed bacterial blood infections, used
rules of this form:
if the infection is primary-bacteremia
and the site of the culture is one of the sterile sites
and the suspected portal of entry is the gastrointestinal tract
then there is suggestive evidence (0.7) that the infection is bacteroid
0.7 is a certainty factor
Certainty factors have been quantified using various different systems, including linguistic ones (certain,
fairly certain, likely, unlikely, highly unlikely, definitely not) and various numeric scales, such as 0-10, 0-1,
and -1 to 1. We shall concentrate on the -1 to 1 version.
Certainty factors may apply both to facts and to rules, or rather to the conclusion(s) of rules.
A "Theory" of Certainty
The combined CF of the premises is then multiplied by the CF of the rule to get the CF of the conclusion.
Example
if (P1 and P2) or P3 then C1 (0.7) and C2 (0.3)
Assume CF(P1) = 0.6, CF(P2) = 0.4, CF(P3) = 0.2
CF(P1 and P2) = min(0.6, 0.4) = 0.4
CF((P1 and P2) or P3) = max(0.4, 0.2) = 0.4
CF(C1) = 0.7 * 0.4 = 0.28
CF(C2) = 0.3 * 0.4 = 0.12
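The worked example above can be reproduced with a short sketch: "and" takes the minimum of the premise CFs, "or" takes the maximum, and the result is multiplied by the CF attached to each conclusion. The function names are assumptions of this sketch.

```python
def cf_and(*cfs):
    """CF of a conjunction of premises: the weakest premise dominates."""
    return min(cfs)


def cf_or(*cfs):
    """CF of a disjunction of premises: the strongest premise dominates."""
    return max(cfs)


def conclude(premise_cf, rule_cf):
    """CF of a conclusion: combined premise CF times the rule's CF."""
    return premise_cf * rule_cf


# if (P1 and P2) or P3 then C1 (0.7) and C2 (0.3)
p1, p2, p3 = 0.6, 0.4, 0.2
premise = cf_or(cf_and(p1, p2), p3)
c1 = conclude(premise, 0.7)
c2 = conclude(premise, 0.3)
```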
________________________________________
Combining Multiple CF's
Suppose two rules make conclusions about C.
How do we combine evidence from two rules?
Let CFR1(C) be the current CF for C.
Let CFR2(C) be the CF for C resulting from a new rule.
The new CF is calculated as follows:
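A minimal sketch of this step, assuming the combining function commonly attributed to MYCIN/EMYCIN on the -1 to 1 scale (it may differ in detail from the version intended in these notes): two positive CFs reinforce each other, two negative CFs reinforce each other, and CFs of opposite sign partly cancel.

```python
def combine_cf(cf1, cf2):
    """Combine the current CF for C (cf1) with the CF from a new rule (cf2).

    Assumed rule (MYCIN/EMYCIN style):
      both >= 0 : cf1 + cf2 * (1 - cf1)
      both <  0 : cf1 + cf2 * (1 + cf1)
      otherwise : (cf1 + cf2) / (1 - min(|cf1|, |cf2|))
    The mixed case is undefined when one CF is 1 and the other is -1.
    """
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


both_positive = combine_cf(0.28, 0.56)
both_negative = combine_cf(-0.5, -0.5)
mixed = combine_cf(0.5, -0.5)
```

Under this rule the result never leaves the range -1 to 1, and the order in which the rules fire does not matter when all CFs have the same sign.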
A certainty factor of 0 means that there is no information about whether the proposition is true or not. A certainty factor of -1 means that the
proposition is certainly false. A certainty factor of 0.7 means that the proposition is quite likely to be true,
and so on.
The certainty factors of conditions are associated with facts held in working memory. Certainty factors for
actions are stored as part of the rules.
Rules for manipulating certainty factors are given in the lecture notes on uncertain reasoning.
However, here is a simple example. Suppose that there is a rule
if P then Q (0.7)
meaning that if P is true, then, with certainty factor 0.7, Q follows. Suppose also that P is stored in
working memory with an associated certainty factor of 0.8. Suppose that the rule above fires (see also
match-resolve-act cycle). Then Q will be added to working memory with an associated certainty factor of
0.7 * 0.8 = 0.56.
condition-action rule
A condition-action rule, also called a production or production rule, is a rule of the form
if condition then action.
The condition may be a compound one using connectives like and, or, and not. The action, too, may be
compound. The action can affect the value of working memory variables, or take some real world action,
or potentially do other things, including stopping the production system.
Rule-Based Systems
The knowledge of many expert systems is principally stored in their collections of rules.
One of the most popular methods for representing knowledge is in the form of Production Rules. These
are in the form of:
if conditions then conclusion
If 1) the gram stain of the organism is gram negative, and
2) the morphology of the organism is rod, and
3) the aerobicity of the organism is anaerobic,
Then: There is suggestive evidence (0.6) that the identity of the organism is Bacteroides.
Advantages of Rules
• Knowledge comes in meaningful chunks.
• New knowledge can be added incrementally.
• Rules can make conclusions based on different kinds of data, depending on what is available.
• Rule conclusions provide "islands" that give multiplicative power.
• Rules can be used to provide explanations, control problem-solving process, check new rules for errors.
EMYCIN
EMYCIN was the first widely used expert system tool.
• Good for learning expert systems
• Limited in applicability to "finite classification" problems:
o Diagnosis
o Identification
• Good explanation capability
• Certainty factors
Several derivative versions exist.
Rule-Based Expert Systems [Shortliffe, E. Computer-based medical consultations: MYCIN. New York:
Elsevier, 1976.]
MYCIN diagnoses infectious blood diseases using a backward-chained (exhaustive) control strategy.
The algorithm, ignoring certainty factors, is basically back chaining:
Given:
1. list of diseases, Goal-list
2. initial symptoms, DB
3. Rules
For each g ∈ Goal-list do
If prove(g, DB, Rules) then Print ("Diagnosis:", g)
Function prove (goal, DB, Rules)
If goal ∈ DB then return True
elseif ∃ r ∈ Rules such that RHS(r) contains goal
then return provelist(LHS(r), DB, Rules) [provelist calls prove with each condition of LHS(r)]
else Ask user about goal and return answer
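The back-chaining procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not MYCIN's actual code: rules are (LHS, RHS) tuples, every rule that concludes the goal is tried in turn (a small extension of the pseudocode), a guard against circular rules is added, and the ask callback stands in for querying the user. The rule contents are made up.

```python
def prove(goal, db, rules, ask=None, _seen=frozenset()):
    """Return True if goal is in db or can be derived by back chaining."""
    if goal in db:
        return True
    if goal in _seen:  # the goal depends on itself: give up on this path
        return False
    # Left-hand sides of all rules whose right-hand side contains the goal.
    matching = [lhs for lhs, rhs in rules if goal in rhs]
    for lhs in matching:
        if all(prove(c, db, rules, ask, _seen | {goal}) for c in lhs):
            db.add(goal)  # remember the derived fact
            return True
    # No rule concludes the goal: fall back to asking the user.
    if not matching and ask is not None and ask(goal):
        db.add(goal)
        return True
    return False


# Hypothetical rules and symptoms, for illustration only.
rules = [
    (("fever", "cough"), ("flu",)),
    (("flu", "rash"), ("measles",)),
]
db = {"fever", "cough"}
goal_list = ["flu", "measles"]
diagnoses = [g for g in goal_list if prove(g, db, rules, ask=lambda q: False)]
```

Here the stand-in user answers "no" to every question, so only the goal supported by the initial symptoms is diagnosed.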
SLOT AND FILLER STRUCTURE
Why use this data structure?
• It enables attribute values to be retrieved quickly
o assertions are indexed by the entities
o binary predicates are indexed by first argument, e.g. team(Mike-Hall, Cardiff).
• Properties of relations are easy to describe.
• It allows ease of consideration as it embraces aspects of object oriented programming.
So called because:
• A slot is an attribute-value pair in its simplest form.
• A filler is a value that a slot can take -- could be a numeric, string (or any data type) value or a pointer to
another slot.
• A weak slot and filler structure does not consider the content of the representation.
We will study two types:
• Semantic Nets.
• Frames.
UNIT-03/Lecture 04
Semantic Network
Semantic networks are a knowledge representation technique. More specifically, a semantic network is a way of
recording all the relevant relationships between members of a set of objects and types. "Object" means an individual
(a particular person, or other particular animal or object, such as a particular cat, tree, chair, brick, etc.).
"Type" means a set of related objects - the set of all persons, cats, trees, chairs, bricks, mammals, plants,
furniture, etc. Possible relationships include the special set-theoretic relationships isa (set membership)
and ako (the subset relation), and also general relationships like likes, child-of. Technically a semantic
network is a node- and edge-labelled directed graph, and it is frequently depicted that way. Here is a
pair of labelled nodes and a single labelled edge (relationship) between them (there could be more than
one relationship between a single pair):
Here is a larger fragment of a semantic net, showing 4 labelled nodes (Fifi, cat, mammal, milk) and three
labelled edges (isa, ako, likes) between them.
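A minimal sketch of such a net as a list of labelled edges, with a lookup that follows isa and ako links so that an object inherits relationships from its types. The figure is not reproduced here, so the exact placement of the likes edge (from cat to milk) is an assumption, as are the function names.

```python
# Each edge is (source node, edge label, target node).
edges = [
    ("Fifi", "isa", "cat"),      # set membership
    ("cat", "ako", "mammal"),    # subset relation
    ("cat", "likes", "milk"),    # general relationship (assumed placement)
]


def related(node, label, edges):
    """Targets of all edges with the given label leaving the node."""
    return [target for source, lab, target in edges
            if source == node and lab == label]


def types_of(node, edges):
    """All types of an object: its isa targets plus their ako ancestors."""
    found = []
    frontier = related(node, "isa", edges)
    while frontier:
        t = frontier.pop()
        if t not in found:
            found.append(t)
            frontier.extend(related(t, "ako", edges))
    return found


def inherited(node, label, edges):
    """Values of a relationship on the node itself or on any of its types."""
    values = list(related(node, label, edges))
    for t in types_of(node, edges):
        values.extend(related(t, label, edges))
    return values


fifi_types = types_of("Fifi", edges)
fifi_likes = inherited("Fifi", "likes", edges)
```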
Slot: A slot in a frame is like a field in a record or struct in languages like Pascal, Modula-2 and C.
However, slots can be added dynamically to frames, and slots contain substructure, called facets. The
facets would normally include a value, perhaps a default, quite likely some demons, and possibly some
flags like the iProlog frame system's cache and multi_valued facets.
The major idea is that:
• The meaning of a concept comes from its relationship to other concepts, and that,
• The information is stored by interconnecting nodes with labelled arcs.
Representation in a Semantic Net
These values can also be represented in logic as: isa(person, mammal), instance(Mike-Hall, person),
team(Mike-Hall, Cardiff).
We have already seen how conventional predicates such as lecturer(dave) can be written as
instance(dave, lecturer). Recall that isa and instance represent inheritance and are popular in many knowledge
representation schemes. But we have a problem: how can we have predicates with more than 2 places in
semantic nets? E.g. score(Cardiff, Llanelli, 23-6). Solution:
• Create new nodes to represent new objects either contained or alluded to in the knowledge, game and
fixture in the current example.
As a more complex example consider the sentence: John gave Mary the book. Here we have several
aspects of an event.
In making certain inferences we will also need to distinguish between the link that defines a new entity
and holds its value and the other kind of link that relates two existing entities. Consider the example
shown where the height of two people is depicted and we also wish to compare them.
We need extra nodes for the concept as well as its value.
Special procedures are needed to process these nodes, but without this distinction the analysis would be
very limited.
Here we have to construct two spaces, one for each of x and y. NOTE: We can express variables as existentially
quantified variables and express the event of love having an agent p and receiver b for every parent p,
which could simplify the network (see Exercises).
Also, if we change the sentence to "Every parent loves child" then the node of the object being acted on
(the child) lies outside the form of the general statement. Thus it is not viewed as an existentially quantified
variable whose value may depend on the agent (see Exercises and the Rich and Knight book for examples of
this). So we could construct a partitioned network as in Fig. 16.
Generic Frame: A frame that serves as a template for building instance frames. For example, a generic
frame might describe the "elephant" concept in general, giving defaults for various elephant features
(number of legs, ears, presence of trunk and tusks, colour, size, weight, habitat, membership of the class
of mammals, etc.), while an instance frame would describe a particular elephant, say "Dumbo", who
might have a missing tusk and who would thus have the default for number of tusks overridden by
specifically setting number of tusks to 1. Instance frames are said to inherit their slots from the generic
frame used to create them. Generic frames may also inherit slots from other generic frames of which they
are a subconcept (as with mammal and elephant - elephant inherits all the properties of mammal that are
encoded in the mammal generic frame - warm blood, bear young alive, etc.).
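The default-and-override behaviour described above can be sketched with a small class. The Frame class, its get method and the slot names are assumptions made for this illustration and do not correspond to any particular frame system.

```python
class Frame:
    """A frame with named slots and an optional parent to inherit from."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Return the slot value, searching up the chain of parent frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Generic frames hold defaults; the instance frame overrides one of them.
mammal = Frame("mammal", warm_blooded=True)
elephant = Frame("elephant", parent=mammal, legs=4, tusks=2, colour="grey")
dumbo = Frame("Dumbo", parent=elephant, tusks=1)
```

Looking up tusks on Dumbo finds the overriding value first, while legs and warm_blooded are found further up the chain.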
Frames can also be regarded as an extension to semantic nets. Indeed it is not clear where the distinction
between a semantic net and a frame ends. Semantic nets were initially used to represent labelled
connections between objects. As tasks became more complex the representation needed to be more
structured. The more structured the system, the more beneficial it becomes to use frames. A frame is a
collection of attributes or slots and associated values that describe some real world entity. Frames on
their own are not particularly helpful but frame systems are a powerful way of encoding information to
support reasoning. Set theory provides a good basis for understanding frame systems. Each frame
represents:
• a class (set), or
• an instance (an element of a class).
Frame Knowledge Representation
Figure: A simple frame system
Here the frames Person, Adult-Male, Rugby-Player and Rugby-Team are all classes and the frames Robert-
Howley and Cardiff-RFC are instances.
• The isa relation is in fact the subset relation.
• The instance relation is in fact element of.
• The isa attribute possesses a transitivity property. This implies: Robert-Howley is a Back and a Back is a
Rugby-Player who in turn is an Adult-Male and also a Person.
• Both isa and instance have inverses, which are called subclasses and all-instances respectively.
• There are attributes that are associated with the class or set such as cardinality and on the other hand
there are attributes that are possessed by each member of the class or set.
DISTINCTION BETWEEN SETS AND INSTANCES
It is important that this distinction is clearly understood.
Cardiff-RFC can be thought of as a set of players or as an instance of a Rugby-Team.
If Cardiff-RFC were a class then
• its instances would be players
• it could not be a subclass of Rugby-Team otherwise its elements would be members of Rugby-Team
• Here is a list of the demon types supported by the iProlog frame implementation:
if_added
demons are triggered when a new value is put into a slot.
if_removed
demons are triggered when a value is removed from a slot.
if_replaced
is triggered when a slot value is replaced.
if_needed
demons are triggered when there is no value present in an instance frame and a value must be computed
from a generic frame.
if_new
is triggered when a new frame is created.
range
is triggered when a new value is added. The value must satisfy the range constraint specified for the slot.
help
is triggered when the range demon is triggered and returns false.
The following are not demons but demon-related slots in a frame.
cache
means that when a value is computed it is stored in the instance frame.
multi_valued
means that the slot may contain more than one value.
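To make the idea of demons concrete, here is a minimal Python analogue of a slot with if_needed and if_added demons and a cache flag. It is a sketch only: the class DemonSlot and its methods are hypothetical and are not the iProlog API.

```python
class DemonSlot:
    """A slot whose demons run when a value is needed or added."""

    _EMPTY = object()  # marker meaning "no value stored"

    def __init__(self, if_needed=None, if_added=None, cache=False):
        self.value = self._EMPTY
        self.if_needed = if_needed
        self.if_added = if_added
        self.cache = cache

    def put(self, value):
        """Store a value, then trigger the if_added demon if there is one."""
        self.value = value
        if self.if_added is not None:
            self.if_added(value)

    def get(self):
        """Return the stored value, or compute one with the if_needed demon."""
        if self.value is not self._EMPTY:
            return self.value
        if self.if_needed is None:
            raise LookupError("slot has no value and no if_needed demon")
        computed = self.if_needed()
        if self.cache:  # cache facet: keep the computed value in the slot
            self.value = computed
        return computed


calls = []
log = []


def compute_weight():
    calls.append("computed")
    return 5000


weight = DemonSlot(if_needed=compute_weight, cache=True)
first, second = weight.get(), weight.get()

name = DemonSlot(if_added=log.append)
name.put("Dumbo")
```

With cache set, the if_needed demon runs only on the first lookup; without it, the value would be recomputed every time.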
Strong Slot and Filler Structures: Represent links between objects according to more rigid rules.
• Specific notions of what types of object and relations between them are provided.
• Represent knowledge about common situations.
UNIT-03/Lecture 05
• Arrows indicate the direction of dependency. Letters above indicate certain relationships:
o-- object.
R-- recipient-donor.
I -- instrument e.g. eat with a spoon.
D-- destination e.g. going home.
• Double arrows indicate two-way links between the actor (PP) and action (ACT).
• The actions are built from the set of primitive acts (see above).
o These can be modified by tense etc.
The use of tense and mood in describing events is extremely important and Schank introduced the
following modifiers:
p -- past
f -- future
t -- transition
t_s -- start transition
t_f -- finished transition
k -- continuing
? -- interrogative
/ -- negative
delta -- timeless
c -- conditional
The absence of any modifier implies the present tense.
So the past tense of the above example:
John gave Mary a book becomes:
The double arrow has an object (actor), PP, and an action, ACT, i.e. PP <=> ACT. The triple arrow is also a
two-way link, but between an object, PP, and its attribute, PA, i.e. PP <≡> PA.
It represents isa type dependencies. E.g.
Dave <≡> lecturer: Dave is a lecturer.
Primitive states are used to describe many state descriptions such as height, health, mental state, physical
state.
There are many more physical states than primitive actions. They use a numeric scale.
E.g.
John height(+10): John is the tallest.
John height(< average): John is short.
Frank Zappa health(-10): Frank Zappa is dead.
Dave mental_state(-10): Dave is sad.
Vase physical_state(-10): The vase is broken.
You can also specify things like the time of occurrence in the relationship.
For Example: John gave Mary the book yesterday
Now let us consider a more complex sentence: Since smoking can kill you, I stopped. Let's look at how we
represent the inference that smoking can kill:
• Use the notion of one to apply the knowledge to.
• Use the primitive act of INGESTing smoke from a cigarette to one.
• Killing is a transition from being alive to dead. We use triple arrows to indicate a transition from one
state to another.
• Have a conditional, c causality link. The triple arrow indicates dependency of one concept on another.
Advantages of CD:
• Using these primitives involves fewer inference rules.
• Many inference rules are already represented in CD structure.
• The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
• Knowledge must be decomposed into fairly low level primitives.
• Impossible or difficult to find correct set of primitives.
• A lot of inference may still be required.
• Representations can be complex even for relatively simple actions. Consider:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
Complex representations require a lot of storage.
UNIT-03/Lecture 06
Scripts
A script is a structure that prescribes a set of circumstances which could be expected to follow on from
one another. It is similar to a thought sequence or a chain of situations which could be anticipated.
Scripts are beneficial because:
• Events tend to occur in known runs or patterns.
• Causal relationships between events exist.
• Entry conditions exist which allow an event to take place
• Prerequisites exist upon events taking place. E.g. when a student progresses through a degree scheme
or when a purchaser buys a house.
The components of a script include:
Entry Conditions -- these must be satisfied before events in the script can occur.
Results -- Conditions that will be true after events in script occur.
Props -- Slots representing objects involved in events.
Roles -- Persons involved in the events.
Track -- Variations on the script. Different tracks may share components of the same script.
Scenes -- The sequence of events that occur. Events are represented in conceptual dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
• Getting a gun.
• Holding up a bank.
• Escaping with the money.
Here the Props might be
• Gun, G.
• Loot, L.
• Bag, B.
• Getaway car, C.
The Roles might be:
• Robber, S.
• Cashier, M.
• Bank Manager, O.
• Policeman, P.
The Entry Conditions might be:
• S is poor.
• S is destitute.
The Results might be:
• S has more money.
• O is angry.
• M is in a state of shock.
• P is shot.
There are 3 scenes: obtaining the gun, robbing the bank and the getaway.
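The bank robbery script can be written down as a simple record, one field per script component. This is a sketch: the Script class, its field names and the can_start check are assumptions of this example, and the scenes are plain strings rather than conceptual dependency structures.

```python
from dataclasses import dataclass


@dataclass
class Script:
    name: str
    props: dict             # symbol -> object involved in the events
    roles: dict             # symbol -> person involved in the events
    entry_conditions: list  # must hold before the script can apply
    results: list           # hold after the events have occurred
    scenes: list            # ordered sequence of events

    def can_start(self, facts):
        """True if every entry condition is among the known facts."""
        return all(c in facts for c in self.entry_conditions)


robbery = Script(
    name="robbing a bank",
    props={"G": "gun", "L": "loot", "B": "bag", "C": "getaway car"},
    roles={"S": "robber", "M": "cashier", "O": "bank manager", "P": "policeman"},
    entry_conditions=["S is poor", "S is destitute"],
    results=["S has more money", "O is angry",
             "M is in a state of shock", "P is shot"],
    scenes=["obtaining the gun", "robbing the bank", "the getaway"],
)
```

A track, such as a robbery that goes wrong, could be modelled as a second Script that shares the same props and roles but has different scenes and results.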
• If a particular script is to be applied it must be activated and the activating depends on its significance.
• If a topic is mentioned in passing then a pointer to that script could be held.
• If the topic is important then the script should be opened.
• The danger lies in having too many active scripts much as one might have too many windows open on
the screen or too many recursive calls in a program.
• Provided events follow a known trail we can use scripts to represent the actions involved and use them
to answer detailed questions.
• Different trails may be allowed for different outcomes of Scripts ( e.g. The bank robbery goes wrong).
CYC
What is CYC?
• An ambitious attempt to form a very large knowledge base aimed at capturing commonsense reasoning.
• Initial goals to capture knowledge from a hundred randomly selected articles in the EnCYClopedia
Britannica.
• Both Implicit and Explicit knowledge encoded.
• Emphasis on study of underlying information (assumed by the authors but not needing to be told to the
readers).
Example: Suppose we read that Wellington learned of Napoleon's death.
Then we (humans) can conclude Napoleon never knew that Wellington had died.
How do we do this?
We require special implicit knowledge or commonsense such as:
• We only die once.
• You stay dead.
• You cannot learn of anything when dead.
• Time cannot go backwards.
Why build large knowledge bases:
Brittleness
-- Specialised knowledge bases are brittle. Hard to encode new situations and non-graceful degradation in
performance. Commonsense based knowledge bases should have a firmer foundation.
Form and Content
-- Knowledge representation may not be suitable for AI. Commonsense strategies could point out where
difficulties in content may affect the form.
Shared Knowledge
-- Should allow greater communication among systems with common bases and assumptions.
Machine Learning: Machine learning refers to the ability of computers to automatically acquire new
knowledge, learning from, for example, past cases or experience, from the computer's own experiences,
or from exploration. Machine learning has many uses such as finding rules to direct marketing campaigns
based on lessons learned from analysis of data from supermarket loyalty campaigns; or learning to
recognize characters from people's handwriting. Machine learning enables computer software to adapt to
changing circumstances, enabling it to make better decisions than non-AI software. Synonyms: learning,
automatic learning.
Natural Language Processing: English is an example of a natural language; a computer language is not. For
a computer to process a natural language, it would have to mimic what a human does. That is, the
computer would have to recognize the sequence of words spoken by a person or another computer,
understand the syntax or grammar of the words (i.e., do a syntactical analysis), and then extract the
meaning of the words. A limited amount of meaning can be derived from a sequence of words taken out
of context (i.e., by semantic analysis); but much more of the meaning depends on the context in which
the words are spoken (e.g., who spoke them, under what circumstances, with what tone, and what else
was said, particularly before the words), which would require a pragmatic analysis to extract. To date,
natural language processing is poorly developed and computers are not yet able to even approach the
ability of humans to extract meaning from natural languages; yet there are already valuable practical
applications of the technology.
Very Simply
Probability = (number of desired outcomes) / (total number of outcomes)
So given a pack of playing cards the probability of being dealt an ace from a full normal deck is 4 (the
number of aces) / 52 (number of cards in deck) which is 1/13. Similarly the probability of being dealt a
spade suit is 13 / 52 = 1/4.
If you have a choice of a number of items k from a set of n items then the formula n! / (k! (n - k)!) is applied to
find the number of ways of making this choice (! = factorial).
So the chance of winning the national lottery (choosing 6 from 49) is 13,983,816 to 1.
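These figures can be checked with a few lines of Python. The sketch assumes the standard "n choose k" formula n! / (k! (n - k)!) for the number of ways of choosing k items from n; the function name choose is an assumption of this example.

```python
import math


def choose(n, k):
    """Number of ways of choosing k items from n: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


p_ace = 4 / 52      # four aces in a full deck
p_spade = 13 / 52   # thirteen spades in a full deck
lottery_combinations = choose(49, 6)
p_lottery = 1 / lottery_combinations
```

Each of the possible selections of 6 numbers from 49 is equally likely, so the probability of any one ticket winning is 1 divided by that count.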
• Conditional probability, P(A|B), indicates the probability of event A given that we know event B has
occurred.