Fai Gtu All Lecture 1
AI Lecture Notes
UNIT 1 Introduction
PREPARED BY
PROF. VISHVA UPADHYAY
What is AI?
● Artificial intelligence (AI) refers to the simulation of human intelligence in machines that
are programmed to think like humans and mimic their actions.
OR
● AI is accomplished by studying how the human brain thinks, and how humans learn,
decide, and work while trying to solve a problem, and then using the outcomes of this
study as a basis of developing intelligent software and systems.
OR
● Artificial Intelligence is composed of two words Artificial and Intelligence, where
Artificial defines "man- made," and intelligence defines "thinking power", hence AI
means "a man-made thinking power."
● There are three main types of AI based on its capabilities - weak AI, strong AI, and super
AI.
● Weak AI - Focuses on one task and cannot perform beyond its limitations (common in
our daily lives)
● Strong AI - Can understand and learn any intellectual task that a human being can
(researchers are striving to reach strong AI)
● Super AI - Surpasses human intelligence and can perform any task better than a
human (still a concept)
Advantages of AI
Reduction in Human Error:
● The phrase “human error” exists because humans make mistakes from time to time. Computers, however, do not make these mistakes if they are programmed properly.
● With Artificial Intelligence, decisions are taken from previously gathered information by applying a certain set of algorithms, so errors are reduced and a greater degree of precision becomes achievable.
● Example : AI-based weather forecasting has removed the majority of human error.
Available 24x7:
● Using AI we can make machines work 24x7 without any breaks, and unlike humans they do not get bored.
● Example : Educational Institutes and Helpline centers are getting many queries and issues
which can be handled effectively using AI.
Digital Assistance:
● Digital assistants are used on many websites to provide the things that users want. We can chat with them about what we are looking for. Some chatbots are designed so well that it is hard to tell whether we are chatting with a chatbot or a human being.
● Example: We all know that organizations have a customer support team that needs to
clarify the doubts and queries of the customers. Using AI the organizations can set up a
Voice bot or Chatbot which can help customers with all their queries.
Faster Decisions:
● Using AI alongside other technologies we can make machines take decisions faster than a
human and carry out actions quicker. While making a decision humans will analyze many
factors both emotionally and practically but AI-powered machines work on what is
programmed and deliver the results in a faster way.
● Example: We all have played Chess games on Windows. It is nearly impossible to beat
the CPU in hard mode because of the AI behind that game. It will take the best possible
step in a very short time according to the algorithms used behind it.
Daily Applications:
● Daily applications such as Apple’s Siri, Windows Cortana, Google’s OK Google are
frequently used in our daily routine whether it is for searching a location, taking a selfie,
making a phone call, replying to a mail and many more.
● Example : Around 20 years ago, when we were planning to go somewhere, we used to ask a person who had already been there for directions. But now all we have to do is say “OK Google, where is Visakhapatnam?”. It will show Visakhapatnam’s location on Google Maps and the best path between you and Visakhapatnam.
New Inventions:
● AI is powering many inventions in almost every domain which will help humans solve
the majority of complex problems.
● Example : Recently doctors can predict breast cancer in women at earlier stages using
advanced AI-based technologies.
Disadvantages of AI
High Costs of Creation:
● As AI is updating every day, the hardware and software need to be updated with time to meet the latest requirements. Machines need repair and maintenance, which cost a great deal. Creating AI systems also requires huge costs, as they are very complex machines.
Unemployment:
● As AI replaces the majority of repetitive tasks and other work with robots, human involvement is becoming less, which will cause a major problem for employment. Every organization is looking to replace its minimally qualified workers with AI robots which can do similar work with more efficiency.
No Emotions:
● There is no doubt that machines are much better when it comes to working efficiently but
they cannot replace the human connection that makes the team. Machines cannot develop
a bond with humans which is an essential attribute when it comes to Team Management.
The AI Problems
AI problems are the problems we pose to AI so that we can get the output we need.
● Early problems in AI focused on formal tasks.
● Another focus area is commonsense reasoning.
● These include perception, natural language understanding, and problem solving in specialized domains like medical diagnosis and chemical analysis.
When looking at AI problems and solution techniques today, it is important to discuss the following questions:
1. What are the underlying assumptions about intelligence?
2. What kinds of techniques will be useful for solving AI problems?
3. At what level can human intelligence be modeled?
4. How will we know when an intelligent program has been built?
Underlying assumption of AI
● The underlying assumption is the physical symbol system hypothesis: any physical symbol system (such as a computer) has the means for general intelligent (clever) action. Example : installing any software on a system.
● That means that even before AI existed, computers could already perform some clever actions.
Task Domains of Artificial Intelligence (AI)
Mundane Tasks:
● Perception (vision, speech)
● Natural language (understanding, generation, translation)
● Common sense reasoning
● Robot control
● Humans have been learning mundane (ordinary) tasks since birth. They learn by perception, speaking, using language, and locomotion.
● For humans, the mundane tasks are easiest to learn. The same was considered true before
trying to implement mundane tasks in machines.
● Earlier, all work of AI was concentrated in the mundane task domain.
Formal Tasks:
● Games: chess, checkers, etc
● Mathematics: Geometry, logic, Proving properties of programs
● Formal Tasks are the tasks that deal with verification, theorem proving, mathematics, games, etc.
Expert Tasks:
● Engineering ( Design, Fault finding, Manufacturing planning)
● Scientific Analysis
● Medical Diagnosis
● Financial Analysis
● Expert Tasks are those tasks which involve scientific analysis and analysis in different domains, such as finance, healthcare, creative work, etc.
● Now researchers have understood that to solve mundane tasks they need better and more efficient algorithms, and a much larger knowledge base, to help them tackle the problems they have set out to solve. That is why AI has shifted more toward working on Expert Tasks, to enhance the capabilities of AI systems.
AI techniques
● These are the techniques we use to solve problems when applying AI.
● AI technique is a method that exploits knowledge that should be represented in such a
way that:
● Situations that share common properties are grouped together.
● It can easily be modified to correct errors and to reflect changes in the world.
● It can be used in many situations even though it may not be totally accurate or complete.
There are three important AI techniques:
Search : Provides a way of solving problems for which no direct approach is available; the problem is solved using the data that has been given to the AI.
Example :
1 2 3 4
5 6 7 8
9 10 11 ??
Use of knowledge : Provides a way of solving complex problems. It has to use the knowledge
that has already been given to it to solve any problem.
Example : Maze
Abstraction : It should abstract all the possible ways and use only one optimal path to solve the
problem.
Example : Map
Criteria :
● If the targeted goal is achieved, then we can say that the AI model is successful.
● State Space Search : A state-space defined as a set of all possible states of a problem.
OR
● It is a complete set of states including start and goal states, where the answer of the
problem is to be searched.
● S = (Start State, Goal State, Action, Result, Cost)
● For Example:
● The eight tile puzzle problem formulation
● The eight tile puzzle consists of a 3 by 3 (3*3) square frame board which holds 8 movable tiles numbered 1 to 8. One square is empty, allowing the adjacent tiles to be shifted. The objective of the puzzle is to find a sequence of tile movements that leads from a starting configuration to a goal configuration.
3 8 1
6 2 5
4 7 _
Start State
● Here we have our goal state as
1 2 3
8 _ 4
7 6 5
Goal State
● The states of 8 tile puzzle are the different permutations of the tiles within the frame.
● Start State and Goal State both are defined above.
● After every intermediate state we will match it with our goal state. If the goal state is
reached then it will stop searching otherwise it will continue to search.
● Here, Action defines which move is chosen among all possible moves from a given state.
● Result is the resulting board (matrix) after applying the action.
● Cost : Each step costs 1, so the path cost is the number of steps in the path.
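The S = (Start State, Goal State, Action, Result, Cost) formulation can be sketched in Python for the eight-tile puzzle. This is an illustrative sketch, not from the notes: states are tuples of nine entries with 0 standing for the empty square, `successors()` plays the role of Action/Result, and an uninformed breadth-first search finds a path where each step costs 1. A simple start state two moves from the goal is used for a quick demonstration.

```python
from collections import deque

# The goal board from the notes; 0 marks the empty square.
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)

def successors(state):
    """All states reachable by sliding one tile into the empty square."""
    i = state.index(0)                      # position of the empty square
    row, col = divmod(i, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = r * 3 + c
            s = list(state)
            s[i], s[j] = s[j], s[i]         # slide the adjacent tile
            result.append(tuple(s))
    return result

def bfs(start, goal):
    """Uninformed breadth-first search; since each step costs 1, BFS
    returns a path with the minimum number of steps."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                             # goal not reachable

# Illustrative start state, two moves away from the goal.
start = (1, 2, 3,
         8, 4, 5,
         7, 6, 0)
path = bfs(start, GOAL)
```

Here `path` is a list of boards from the start to the goal; matching each intermediate state against the goal is exactly the stopping test described above.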
● Searching can be of 2 types
1. Uninformed Search
2. Informed Search
● Informed search uses knowledge for the searching process; uninformed search does not use knowledge.
● Informed search consumes less time because of quick searching; uninformed search consumes moderate time because of slow searching.
● In informed search there is a direction given about the solution; in uninformed search no suggestion is given regarding the solution.
Production System
● A production system in AI helps create AI-based computer programs.
● A production system in AI is a type of cognitive architecture that defines specific actions
as per certain rules. The rules represent the declarative knowledge of a machine to
respond according to different conditions
● A production system (popularly known as a production rule system) is a kind of cognitive
architecture that is used to implement search algorithms and replicate human
problem-solving skills.
Global Database
● A database contains all the necessary data and information required for the successful
completion of a task.
● It can be divided into two parts: permanent and temporary. The permanent part of the
database consists of fixed actions, whereas the temporary part alters according to
circumstances.
Production Rules
● Production rules in AI are the set of rules that operate on the data fetched from the global
database.
● These production rules are bound with precondition and postcondition that gets checked
by the database. If a condition is passed through a production rule and gets satisfied by
the global database, then the rule is successfully applied.
● The rules are of the form A → B, where the right-hand side represents an outcome corresponding to the problem state represented by the left-hand side.
Control System
● The control system checks the applicability of a rule. It helps decide which rule should be
applied and terminates the process when the system gives the correct output.
● It also resolves the conflict of multiple conditions arriving at the same time. The strategy
of the control system specifies the sequence of rules that compares the condition from the
global database to reach the correct result.
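The three components above (global database, production rules, control system) can be sketched as a small Python loop. All names here are illustrative, and the "fire the first applicable rule" policy is just one possible conflict-resolution strategy:

```python
def run_production_system(database, rules, goal_test, max_steps=100):
    """Control system: repeatedly match the production rules against the
    global database and apply the first applicable one, terminating when
    the goal test is satisfied or no rule applies."""
    for _ in range(max_steps):
        if goal_test(database):
            return database
        for condition, action in rules:     # rules are (IF, THEN) pairs
            if condition(database):
                database = action(database)
                break
        else:
            break                           # no rule applies: halt
    return database

# Toy example: one rule "IF n < 5 THEN add 1", goal "n == 5".
rules = [(lambda db: db["n"] < 5,
          lambda db: {**db, "n": db["n"] + 1})]
result = run_production_system({"n": 0}, rules, lambda db: db["n"] == 5)
```

The database dictionary stands in for the temporary part of the global database; the rule list is the fixed, permanent part.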
Simplicity
● The production rule in AI is in the form of an ‘IF-THEN’ statement. Every rule in the
production system has a unique structure.
● It helps represent knowledge and reasoning in the simplest way possible to solve
real-world problems. Also, it helps improve the readability and understanding of the
production rules.
Modularity
● The modularity of a production rule helps in its incremental improvement as the
production rule can be in discrete parts.
● The production rule is made from a collection of information and facts that may not have
dependencies unless there is a rule connecting them together.
● The addition or deletion of single information will not have a major effect on the output.
● Modularity helps enhance the performance of the production system by adjusting the
parameters of the rules.
Modifiability
● The feature of modifiability helps alter the rules as per requirements.
● Initially, the skeletal form of the production system is created.
● We then gather the requirements and make changes in the raw structure of the production
system.
● This helps in the iterative improvement of the production system.
Knowledge-intensive
● Production systems contain knowledge in the form of a human spoken language, i.e., English.
● It is not built using any programming languages.
● The knowledge is represented in plain English sentences. Production rules help make
productive conclusions from these sentences.
● The direction in which to conduct the search (forward versus backward reasoning): if the search proceeds from the start state towards a goal state, it is a forward search; alternatively, we can search backward from the goal.
● How to select applicable rules (Matching). Production systems typically spend most of
their time looking for rules to apply. So, it is critical to have efficient procedures for
matching rules against states.
● How to represent each node of the search process (knowledge representation problem).
Heuristic Search
● Heuristic search is a class of methods which is used in order to search a solution space for
an optimal solution for a problem. The heuristic here uses some method to search the
solution space while assessing where in the space the solution is most likely to be and
focusing the search on that area.
The classic example of heuristic search methods is the traveling salesman problem.
Properties
● Complete: Good Generators need to be complete i.e. they should generate all the
possible solutions and cover all the possible states.
● Non Redundant: Good Generators should not yield a duplicate solution at any point of
time as it reduces the efficiency of the algorithm thereby increasing the time of search
and making the time complexity exponential.
● Informed: Good Generators have the knowledge about the search space which they
maintain in the form of an array of knowledge. This can be used to search how far the
agent is from the goal, calculate the path cost and even find a way to reach the goal.
Example problem: “Arrange four 6-sided cubes in a row, with each side of each cube painted one of four colors, such that on all four sides of the row one block face of each color is shown.”
Heuristic: If there are more red faces than faces of other colors then, when placing a block with several red faces, use as few of them as possible as outside faces.
Problem Statement :
A salesman has a list of cities, each of which he must visit exactly once. There are direct roads
between each pair of cities on the list. Find the route the salesman should follow for the shortest
possible round trip that both starts and finishes at any one of the cities.
Rules :
● The traveler needs to visit n cities.
● The distance between each pair of cities is known.
● We want to know the shortest route that visits each city exactly once.
Heuristics for the TSP mainly trade speed against closeness to the optimal solution.
Possible Solutions :
Path Length
ABCD 19
ABDC 18
ACBD 12
ACDB 13
ADBC 16
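A common TSP heuristic is the nearest-neighbor rule: always travel to the closest unvisited city. The sketch below uses a hypothetical distance matrix (the notes' actual distances come from a figure that is not reproduced here) and compares the greedy tour against the exact answer obtained by brute force:

```python
import itertools

# Hypothetical symmetric distances between four cities A-D (illustrative only).
DIST = {
    ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 4,
    ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 7,
}

def d(x, y):
    """Distance lookup for either orientation of a city pair."""
    return DIST.get((x, y)) or DIST[(y, x)]

def tour_length(order, start="A"):
    """Length of the round trip start -> order... -> start."""
    route = [start] + list(order) + [start]
    return sum(d(a, b) for a, b in zip(route, route[1:]))

def nearest_neighbor(cities, start="A"):
    """Greedy heuristic: repeatedly visit the closest unvisited city.
    Fast, but not guaranteed to be optimal."""
    unvisited = set(cities) - {start}
    tour, current = [], start
    while unvisited:
        nxt = min(unvisited, key=lambda c: d(current, c))
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour

greedy = nearest_neighbor("ABCD")
best = min(itertools.permutations("BCD"), key=tour_length)  # exact answer
```

On this particular matrix the greedy tour happens to match the optimum; in general the heuristic only approximates it, which is exactly the speed-versus-optimality trade-off mentioned above.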
Hill Climbing
● Hill climbing algorithm is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain or best solution to
the problem. It terminates when it reaches a peak value where no neighbor has a higher
value.
● It is also called greedy local search as it only looks to its good immediate neighbor state
and not beyond that.
● Depends on Heuristic.
Algorithm
● Evaluate the initial state; if it is a goal state, then return success and stop.
● Loop until a solution is found or there is no new operator left to apply:
● Select and apply an operator to the current state; evaluate the new state, and if it is better than the current state, make it the new current state.
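The loop above can be sketched as a short Python function. The objective function below is a toy example (a single smooth peak) chosen so the greedy climb reaches the maximum; on the plateau and ridge landscapes described next it could just as easily get stuck:

```python
def hill_climbing(initial, neighbors, value):
    """Greedy local search: keep moving to the best neighbor while it
    improves the objective; stop at a peak (possibly only a local maximum)."""
    current = initial
    while True:
        candidates = neighbors(current)
        best = max(candidates, key=value) if candidates else current
        if value(best) <= value(current):
            return current                 # no neighbor is better: peak reached
        current = best

# Toy objective: maximize f(x) = -(x - 7)^2 over the integers, stepping by 1.
def f(x):
    return -(x - 7) ** 2

peak = hill_climbing(0, lambda x: [x - 1, x + 1], f)
```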
Global Maximum: Global maximum is the best possible state of the state space landscape. It
has the highest value of objective function.
Local Maximum: A local maximum is a peak state in the landscape which is better than each of
its neighboring states, but there is another state also present which is higher than the local
maximum.
Flat local maximum/Plateau:A plateau is the flat area of the search space in which all the
neighboring states of the current state contain the same value, because this algorithm does not
find any best direction to move. A hill-climbing search might be lost in the plateau area.
Ridge:It is a region that is higher than its neighbors but itself has a slope. It is a special kind of
local maximum.
Advantages
● Memory Efficient
● Helpful in solving pure optimization problems, where the focus is on finding the best
state.
● Hill Climbing technique can be used to solve many problems, where the current state
allows for an accurate evaluation function, such as Network-Flow, Traveling Salesman
problem, 8-Queens problem, Integrated Circuit design, etc.
● On each iteration, the node n with the lowest heuristic value is expanded, all its successors are generated, and n is placed on the closed list. The algorithm continues until a goal state is found.
In the informed search there are two algorithms
● Best First Search Algorithm(Greedy search)
● A* Search Algorithm
Best-first Search Algorithm (Greedy Search):
● Greedy best-first search algorithm always selects the path which appears best at that
moment. It is the combination of depth-first search and breadth-first search algorithms.
● It uses the heuristic function and search. Best-first search allows us to take the advantages
of both algorithms. With the help of best-first search, at each step, we can choose the
most promising node.
● In the best first search algorithm, we expand the node which is closest to the goal node, where closeness is estimated by the heuristic function, i.e.
f(n) = h(n)
● where h(n) = estimated cost from node n to the goal.
● The greedy best first algorithm is implemented by the priority queue.
Best first search algorithm:
Step 1: Place the starting node into the OPEN list.
Step 2: If the OPEN list is empty, Stop and return failure.
Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and
place it in the CLOSED list.
Step 4: Expand the node n, and generate the successors of node n.
Step 5: Check each successor of node n, and find whether any node is a goal node or not.
If any successor node is a goal node, then return success and terminate the search, else
proceed to Step 6.
Step 6: For each successor node, the algorithm checks for evaluation function f(n), and
then checks if the node has been in either OPEN or CLOSED list. If the node has not
been in both lists, then add it to the OPEN list.
Step 7: Return to Step 2.
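Steps 1-7 above can be sketched with a priority queue ordered by h(n). The graph and heuristic values below are hypothetical, not the notes' example:

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Greedy best-first search: always expand the OPEN node with the
    lowest heuristic value h(n). Fast, but not guaranteed optimal."""
    open_list = [(h(start), start, [start])]   # OPEN list keyed on h(n)
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path                        # success
        if node in closed:
            continue
        closed.add(node)
        for nxt in neighbors(node):
            if nxt not in closed:
                heapq.heappush(open_list, (h(nxt), nxt, path + [nxt]))
    return None                                # OPEN list empty: failure

# Hypothetical graph and heuristic values (illustrative only).
graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h_values = {"S": 5, "A": 3, "B": 4, "G": 0}
path = greedy_best_first("S", "G", lambda n: graph[n], lambda n: h_values[n])
```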
Advantages:
● Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
● This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
● It can behave as an unguided depth-first search in the worst case scenario.
Example:
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute the
evaluation function for n' and place it into the Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then attach it to the back
pointer which reflects the lowest g(n') value.
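The A* steps above can be sketched in Python. The weighted graph and heuristic values below are hypothetical, chosen so that the optimal path S--->A--->C--->G costs 6, matching the notes' later worked example:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: expand the OPEN node with the smallest f(n) = g(n) + h(n).
    With an admissible heuristic, the returned path has the least cost."""
    open_list = [(h(start), 0, start, [start])]     # entries are (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for nxt, step_cost in neighbors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):  # found a cheaper route to nxt
                best_g[nxt] = g2
                heapq.heappush(open_list, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")                       # failure

# Hypothetical weighted graph where S->A->C->G costs 1 + 2 + 3 = 6.
graph = {"S": [("A", 1), ("B", 4)], "A": [("C", 2), ("B", 2)],
         "B": [("G", 5)], "C": [("G", 3)], "G": []}
h_values = {"S": 5, "A": 4, "B": 2, "C": 2, "G": 0}
path, cost = a_star("S", "G", lambda n: graph[n], lambda n: h_values[n])
```

The `best_g` dictionary implements Step 5's back-pointer idea: a node is re-queued only when a cheaper g(n') is found.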
Advantages:
● A* search algorithm performs better than many other search algorithms.
● A* search algorithm is optimal and complete.
● This algorithm can solve very complex problems.
Disadvantages:
● It does not always produce the shortest path as it is mostly based on heuristics and
approximation.
● A* search algorithm has some complexity issues.
● The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
Example:
Solution:
Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal path with cost
6.
● A* algorithm returns the path which occurred first, and it does not search for all
remaining paths.
● The efficiency of A* algorithm depends on the quality of heuristic.
● A* algorithm expands all nodes which satisfy the condition f(n) ≤ C*, where C* is the cost of the optimal solution.
Complete: A* algorithm is complete as long as:
● Branching factor is finite.
● Cost at every action is fixed.
Optimal: A* search algorithm is optimal if it follows below two conditions:
● Admissible: the first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature: it never overestimates the true cost to the goal.
● Consistency: Second required condition is consistency for only A* graph-search.
● If the heuristic function is admissible, then A* tree search will always find the least cost
path.
AO* Algorithm (for AND-OR graphs)
f(n) = g(n) + h(n), where
● g(n): The actual cost of traversal from the initial state to the current state.
● h(n): The estimated cost of traversal from the current state to the goal state.
● f(n): The estimated total cost of traversal from the initial state to the goal state through the current state.
Example-
Here, in the above example all numbers in brackets are the heuristic value i.e h(n). Each edge is
considered to have a value of 1 by default.
Step-1
f(A-B) = g(B) + h(B) = 1+4= 5 , where 1 is the default cost value of traveling from A to B and 4
is the estimated cost from B to Goal state.
f(A-C-D) = g(C) + h(C) + g(D) + h(D) = 1+2+1+3 = 7 , here we are calculating the path cost as
both C and D because they have the AND-Arc. The default cost value of traveling from A-C is 1,
and from A-D is 1, but the heuristic value given for C and D are 2 and 3 respectively hence
making the cost as 7.
Step-2
Using the same formula as step-1, the path is now calculated from the B node,
f(B-E) = 1 + 6 = 7.
f(B-F) = 1 + 8 = 9
Hence, the B-E path has a lower cost. Now the heuristic of B has to be updated, since there is a difference between the actual cost and the heuristic value of B. The minimum cost path is chosen and its cost is stored as the updated heuristic; in our case the value is 7. And because of the change in the heuristic of B, the heuristic of A also changes and has to be calculated again.
Step-3
Comparing the path of f(A-B) and f(A-C-D) it is seen that f(A-C-D) is smaller. Hence f(A-C-D)
needs to be explored.
Now the current node becomes C node and the cost of the path is calculated,
f(C-G) = 1+2 = 3
f(C-H-I) = 1+0+1+0 = 2
f(C-H-I) is chosen as the minimum cost path; also, there is no change in heuristic since it matches the actual cost. The heuristics of the paths through H and I are 0, and hence they are solved, but path A-D also needs to be calculated, since it has an AND-arc.
f(D-J) = 1+0 = 1, hence heuristic of D needs to be updated to 1. And finally the f(A-C-D) needs
to be updated.
Constraint Satisfaction
● Constraint satisfaction technique: by the name, it is understood that constraint satisfaction means solving a problem under certain constraints or rules.
● Constraint satisfaction is a technique where a problem is solved when its values satisfy
certain constraints or rules of the problem. This type of technique leads to a deeper
understanding of the problem structure as well as its complexity.
A constraint satisfaction problem (CSP) consists of three components:
● X: It is a set of variables.
● D: It is a set of domains where the variables reside. There is a specific domain for each variable.
● C: It is a set of constraints which are followed by the set of variables.
● The constraint value consists of a pair {scope, rel}. The scope is a tuple of variables which participate in the constraint, and rel is a relation which includes a list of values which the variables can take to satisfy the constraints of the problem.
Solving a CSP requires:
● A state-space, and
● The notion of the solution.
● Consistent or Legal Assignment: An assignment which does not violate any constraint or
rule is called Consistent or legal assignment.
● Complete Assignment: An assignment where every variable is assigned with a value, and
the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
● Partial Assignment: An assignment which assigns values to some of the variables only.
Such types of assignments are called Partial assignments.
● Infinite (discrete) Domain: a domain with infinitely many possible values for a variable. For example, a start time can be allocated any one of infinitely many values.
● Finite Domain: a domain with a finite number of possible values for a variable; this is the most common case in CSPs. (Domains whose values range over the reals are called continuous domains.)
● Unary Constraints: It is the simplest type of constraints that restricts the value of a single
variable.
● Binary Constraints: It is the constraint type which relates exactly two variables, for example the constraint x1 ≠ x2.
● Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
Some special types of solution algorithms are used to solve the following types of constraints:
● Linear Constraints: These types of constraints are commonly used in linear programming
where each variable containing an integer value exists in linear form only.
● Non-linear Constraints: These types of constraints are used in nonlinear programming
where each variable (an integer value) exists in a non-linear form.
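A minimal CSP solver can be sketched as recursive backtracking over binary constraints. The three-region map-colouring instance below is a standard illustration and is not from the notes; all names are illustrative:

```python
def backtracking_csp(variables, domains, constraints, assignment=None):
    """Assign variables one at a time, backtracking whenever a binary
    constraint is violated; returns a complete, consistent assignment."""
    assignment = dict(assignment or {})
    unassigned = [v for v in variables if v not in assignment]
    if not unassigned:
        return assignment                  # complete assignment found
    var = unassigned[0]
    for value in domains[var]:
        assignment[var] = value
        consistent = all(check(assignment[a], assignment[b])
                         for (a, b), check in constraints.items()
                         if a in assignment and b in assignment)
        if consistent:                     # partial assignment is legal
            result = backtracking_csp(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]                # undo and try the next value
    return None

# Three-region map colouring: adjacent regions must get different colours.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
neq = lambda x, y: x != y
constraints = {("WA", "NT"): neq, ("WA", "SA"): neq, ("NT", "SA"): neq}
solution = backtracking_csp(variables, domains, constraints)
```

Each intermediate state of the recursion is a partial assignment; the returned dictionary is a complete and consistent (legal) assignment in the terms defined above.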
Means-Ends Analysis
● We have studied the strategies which can reason either in forward or backward, but a
mixture of the two directions is appropriate for solving a complex and large problem.
Such a mixed strategy makes it possible to first solve the major part of a problem and
then go back and solve the small problems that arise during combining the big parts of
the problem. Such a technique is called Means-Ends Analysis.
● Means-Ends Analysis is a problem-solving technique used in Artificial Intelligence for limiting search in AI programs.
● It is a mixture of Backward and forward search techniques.
● The MEA technique was first introduced in 1961 by Allen Newell, and Herbert A. Simon
in their problem-solving computer program, which was named as General Problem
Solver (GPS).
● The MEA process is centered on the evaluation of the difference between the current state and the goal state.
● The means-ends analysis process can be applied recursively for a problem. It is a strategy
to control search in problem-solving. Following are the main Steps which describe the
working of MEA techniques for solving a problem.
a. First, evaluate the difference between Initial State and final State.
b. Select the various operators which can be applied for each difference.
c. Apply the operator at each difference, which reduces the difference between the current
state and goal state.
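Steps a-c above can be sketched as a simple loop: measure the difference, pick an operator that reduces it, apply it, and repeat. The numeric example and operator names below are illustrative only:

```python
def means_ends_analysis(current, goal, operators, difference):
    """MEA sketch: repeatedly apply an operator that reduces the difference
    between the current state and the goal state; fail if none does."""
    steps = []
    while difference(current, goal):
        applicable = [op for op in operators
                      if difference(op(current), goal) < difference(current, goal)]
        if not applicable:
            return None                     # no operator reduces the difference
        op = applicable[0]                  # take the first difference-reducing operator
        current = op(current)
        steps.append(op.__name__)
    return steps

# Toy example: reach 10 from 0; the difference is the absolute numeric gap.
def add_three(x): return x + 3
def add_one(x): return x + 1

plan = means_ends_analysis(0, 10, [add_three, add_one],
                           lambda s, g: abs(g - s))
```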
Operator Subgoaling
● In the MEA process, we detect the differences between the current state and goal state.
Once these differences occur, then we can apply an operator to reduce the differences.
But sometimes it is possible that an operator cannot be applied to the current state. So we create a subproblem of the current state in which the operator can be applied. Such a type of backward chaining, in which operators are selected and then subgoals are set up to establish the preconditions of the operator, is called Operator Subgoaling.
Algorithm
● Let the current state be CURRENT and the goal state be GOAL; the following are the steps of the MEA algorithm.
● Step 1: Compare CURRENT to GOAL, if there are no differences between both then
return Success and Exit.
● Step 2: Else, select the most significant difference and reduce it by doing the following
steps until the success or failure occurs.
a. Select a new operator O which is applicable to the current difference; if there
is no such operator, then signal failure.
b. Attempt to apply operator O to CURRENT. Make a description of two states:
i) O-START, a state in which O's preconditions are satisfied.
ii) O-RESULT, the state that would result if O were applied in O-START.
c. If
(FIRST-PART ← MEA (CURRENT, O-START))
and
(LAST-PART ← MEA (O-RESULT, GOAL))
are successful, then signal success and return the result of combining FIRST-PART, O, and LAST-PART.
Let's take an example where we know the initial state and goal state as given below. In this
problem, we need to get the goal state by finding differences between the initial state and goal
state and applying operators.
Solution:
To solve the above problem, we will first find the differences between initial states and goal
states, and for each difference, we will generate a new state and will apply the operators. The
operators we have for this problem are:
● Move
● Delete
● Expand
1. Evaluating the initial state: In the first step, we will evaluate the initial state and will
compare the initial and Goal state to find the differences between both states.
2. Applying the Delete operator: The first difference is that the dot symbol present in the initial state is absent from the goal state, so we first apply the Delete operator to remove this dot.
3. Applying Move Operator: After applying the Delete operator, the new state occurs which we
will again compare with the goal state. After comparing these states, there is another difference
that is the square is outside the circle, so we will apply the Move Operator.
4. Applying Expand Operator: Now a new state is generated in the third step, and we will
compare this state with the goal state. After comparing the states there is still one difference
which is the size of the square, so, we will apply the Expand operator, and finally, it will
generate the goal state.
Types of Knowledge
Declarative Knowledge:
● Declarative knowledge is to know about something.
● It includes concepts, facts, and objects.
● It is also called descriptive knowledge and is expressed in declarative sentences.
● It is simpler than procedural knowledge.
Procedural Knowledge
● It is also known as imperative knowledge.
● It is knowledge about how to accomplish something; it includes rules, strategies, and procedures.
Meta-knowledge:
● Knowledge about the other types of knowledge is called Meta-knowledge.
Heuristic knowledge:
● Heuristic knowledge represents the knowledge of experts in a field or subject.
● Heuristic knowledge consists of rules of thumb based on previous experiences and awareness of approaches that tend to work well but are not guaranteed.
Structural knowledge:
● Structural knowledge is basic knowledge of problem-solving.
● It describes relationships between various concepts such as kind of, part of, and grouping
of something.
● It describes the relationship that exists between concepts or objects.
1. Relational knowledge:
● The simplest way of storing facts, in which each fact about a set of objects is set out in the columns of a relational table. Example:
Player Weight Age
Player1 65 23
Player2 58 18
Player3 75 24
2. Inheritable knowledge:
● In the inheritable knowledge approach, all data must be stored into a hierarchy of classes.
● All classes should be arranged in a generalized form or a hierarchical manner.
● In this approach, we apply inheritance property.
● Elements inherit values from other members of a class.
● This approach contains inheritable knowledge which shows a relation between instance
and class, and it is called instance relation.
● Every individual frame can represent the collection of attributes and its value.
● In this approach, objects and values are represented in Boxed nodes.
● We use Arrows which point from objects to their values.
● Example:
3. Inferential knowledge:
● Inferential knowledge approach represents knowledge in the form of formal logics.
● This approach can be used to derive more facts.
● It guarantees correctness.
● Example: Let's suppose there are two statements:
a. Marcus is a man: man(Marcus)
b. All men are mortal: ∀x man(x) → mortal(x)
From these two facts we can infer a new fact: mortal(Marcus).
4. Procedural knowledge:
● Procedural knowledge approach uses small programs and codes which describe how to
do specific things, and how to proceed.
● In this approach, one important rule is used which is the If-Then rule.
● With this knowledge, we can use various coding languages such as LISP (List Processing) and
Prolog (Programming in Logic).
● We can easily represent heuristic or domain-specific knowledge using this approach.
● However, it is not necessary that all cases can be represented using this approach.
Types of propositions:
a. Atomic Propositions: atomic propositions are simple statements consisting of a single
proposition symbol. Example: "2 + 2 is 4" is an atomic proposition.
b. Compound Propositions: compound propositions are constructed by combining simpler, atomic
propositions using logical connectives. Example: "It is raining today, and the street is wet."
Logical Connectives:
Negation ¬ P
● It represents a Negative condition. P is a positive statement, and ¬ P indicates NOT
condition. Example: Today is Monday (P), Today is not a Monday (¬ P)
Conjunction P ^ Q
● It joins two statements P, Q with the AND clause.
● Example: Ram is a cricket player (P). Ram is a Hockey player (Q). Ram plays both
cricket and Hockey is represented by (P ^ Q)
● P= Rohan is intelligent, Q= Rohan is hardworking. → P∧ Q.
Disjunction P v Q
● It joins two statements P, Q with OR Clause.
● Example: Ram leaves for Mumbai (P). Ram leaves for Chennai (Q). "Ram leaves for
Chennai or Mumbai" is represented by (P v Q). The disjunction is true if at least one of
P and Q is true.
● P = Ritika is a Doctor. Q = Ritika is an Engineer, so we can write it as P ∨ Q.
Implication P → Q
● Sentence (Q) is dependent on sentence (P), and it is called implication. It follows the rule
of If then clause. If sentence P is true, then sentence Q is true. The condition is
unidirectional.
● Example: If it is Sunday (P) then I will go to Movie (Q), and it is represented as P →
Q
● P= It is raining, and Q= Street is wet, so it is represented as P → Q.
Bi-conditional P ⇔ Q
● Sentence (Q) is dependent on sentence (P), and vice versa and conditions are
bi-directional in this connective. If a conditional statement and its converse are true,
then it is called as bi-conditional connective (Implication condition in both the
directions P → Q and Q → P). If and only if all conditions are true, then the end
statement is true.
● P= I am breathing, Q= I am alive, it can be represented as P ⇔ Q.
Truth Table
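The truth table for these connectives (shown as a figure in the original slides) can be regenerated with a short script. A minimal Python sketch:

```python
from itertools import product

# Build the truth table for the five propositional connectives.
def truth_table():
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append({
            "P": p, "Q": q,
            "¬P": not p,
            "P∧Q": p and q,
            "P∨Q": p or q,
            "P→Q": (not p) or q,   # implication is false only when P is true and Q is false
            "P⇔Q": p == q,         # bi-conditional: true when both sides match
        })
    return rows

for row in truth_table():
    print(row)
```

Note that the implication P→Q is encoded as ¬P ∨ Q, which is its standard definition.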
Precedence of connectives:
● Just like arithmetic operators, there is a precedence order for propositional connectors or
logical operators. This order should be followed while evaluating a propositional
problem.
● From highest to lowest, the precedence order is: Parenthesis, Negation (¬), Conjunction (∧), Disjunction (∨), Implication (⇒), Bi-conditional (⇔).
Basic elements of First-Order Logic:
Variables: x, y, z, a, b, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: ==
Quantifier: ∀, ∃
Atomic sentences:
● Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.
● We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).
Chinky is a cat: => cat (Chinky).
Complex Sentences:
● Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
● Subject: Subject is the main part of the statement.
● Predicate: A predicate can be defined as a relation, which binds two atoms together in a
statement.
Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of
the statement and second part "is an integer," is known as a predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.
The Universal quantifier is represented by a symbol ∀, which
resembles an inverted A.
Example:
All men drink coffee.
Let a variable x which refers to a man so all x can be represented
∀x man(x) → drink (x, coffee).
It will be read as: for all x, if x is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.
Example:
Some boys are intelligent: ∃x boys(x) ∧ intelligent(x).
It will be read as: there is at least one x, where x is a boy who is intelligent.
Properties of Quantifiers:
● In universal quantifier, ∀x∀y is similar to ∀y∀x.
● In Existential quantifier, ∃x∃y is similar to ∃y∃x.
● ∃x∀y is not similar to ∀y∃x.
Some Examples of FOL using quantifier:
1. All birds fly.
In this question the predicate is "fly(bird)."
And since there are all birds who fly, it will be represented as follows.
∀x bird(x) →fly(x).
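Quantified formulas like this can be checked mechanically over a finite domain. A small Python sketch; the domain and the bird/fly predicate values below are illustrative assumptions, not part of the notes:

```python
# Evaluate ∀x bird(x) → fly(x) and ∃x bird(x) ∧ fly(x) over a finite domain.
# The individuals and predicate extensions here are made up for illustration.
domain = ["sparrow", "crow", "penguin", "dog"]
bird = {"sparrow", "crow", "penguin"}
fly = {"sparrow", "crow"}

# ∀x bird(x) → fly(x): the implication must hold for every x in the domain.
all_birds_fly = all((x not in bird) or (x in fly) for x in domain)
print(all_birds_fly)  # False: penguin is a bird that does not fly

# ∃x bird(x) ∧ fly(x): at least one x must satisfy both predicates.
some_bird_flies = any((x in bird) and (x in fly) for x in domain)
print(some_bird_flies)  # True
```

The universal quantifier maps to `all(...)` over the domain and the existential quantifier maps to `any(...)`, with → rewritten as ¬P ∨ Q.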
Propositional Logic vs Predicate Logic:
● In propositional logic, propositions are combined with logical operators or logical connectives like Negation (¬), Conjunction (∧), Disjunction (∨), Exclusive OR (⊕), Implication (⇒), and Bi-Conditional or Double Implication (⇔).
● Predicate logic extends propositional logic by introducing quantifiers (∀, ∃) over the variables in a proposition.
Example:
a. John likes all kind of food.
b. Apple and vegetable are food
c. Anything anyone eats and not killed is food.
d. Anil eats peanuts and is still alive
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
Hence the negation of the conclusion has been proved to be a complete contradiction with the given
set of statements, which proves that John likes peanuts.
● The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known
facts. The first inference engine was part of the expert system. Inference engine
commonly proceeds in two modes, which are:
a. Forward chaining
b. Backward chaining
Forward Chaining
● Forward chaining is also known as a forward deduction or forward reasoning method
when using an inference engine.
● The Forward-chaining algorithm starts from known facts, triggers all rules whose
premises are satisfied, and adds their conclusion to the known facts. This process repeats
until the problem is solved.
Properties of Forward-Chaining:
● It is a down-up approach, as it moves from bottom to top.
● It is a process of making a conclusion based on known facts or data, by starting from the initial state and reaching the goal state.
● The forward-chaining approach is also called data-driven, as we reach the goal using available data.
● It is commonly used in expert systems and production rule systems.
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an
enemy of America, has some missiles, and all the missiles were sold to it by Robert, who is an
American citizen."
Prove that "Robert is a criminal."
Step-1:
In the first step we will start with the known facts and will choose the sentences which do not
have implications, such as: American(Robert), Enemy(A, America), Owns(A, T1), and
Missile(T1). All these facts will be represented as below.
Step-2:
At the second step, we will see those facts which infer from available facts and with satisfied
premises.
Rule-(1) does not satisfy premises, so it will not be added in the first iteration.
Rule-(2) and (3) are already added.
Rule-(4) satisfies with the substitution {p/T1}, so Sells (Robert, T1, A) is added, which infers
from the conjunction of Rule (2) and (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which infers from
Rule-(7).
Step-3:
At step-3, as we can check Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we
can add Criminal(Robert) which infers all the available facts. And hence we reached our goal
statement.
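The three steps above can be sketched as a naive forward-chaining loop. In the Python sketch below, the first-order rules are grounded by hand (substitutions such as {p/T1} are pre-applied), which is a simplification for illustration:

```python
# Naive forward chaining over grounded Horn rules: repeatedly fire any rule
# whose premises are all known facts, until no new fact can be added.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Known facts from Step-1 of the crime example.
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}
# Implication rules, grounded by hand for this sketch.
rules = [
    (["Missile(T1)"], "Weapon(T1)"),
    (["Missile(T1)", "Owns(A,T1)"], "Sells(Robert,T1,A)"),
    (["Enemy(A,America)"], "Hostile(A)"),
    (["American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"],
     "Criminal(Robert)"),
]
print("Criminal(Robert)" in forward_chain(facts, rules))  # True: goal derived
```

A full implementation would also perform unification to compute the substitutions automatically instead of hard-coding them.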
Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning method when
using an inference engine. A backward chaining algorithm is a form of reasoning, which starts
with the goal and works backward, chaining through rules to find known facts that support the
goal.
Properties of backward chaining:
● It is known as a top-down approach.
● Backward-chaining is based on modus ponens inference rule.
● In backward chaining, the goal is broken into sub-goal or sub-goals to prove the facts
true.
● It is called a goal-driven approach, as a list of goals decides which rules are selected and
used.
● The backward-chaining algorithm is used in game theory, automated theorem proving tools,
inference engines, proof assistants, and various AI applications.
● The backward-chaining method mostly uses a depth-first search strategy for proofs.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the rules.
● American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
● Owns(A, T1) ...(2)
● Missile(T1) ...(3)
● Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ...(4)
● Missile(p) → Weapon(p) ...(5)
● Enemy(p, America) → Hostile(p) ...(6)
● American(Robert) ...(7)
● Enemy(A, America) ...(8)
Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then
infer further rules.
Step-1:
At the first step, we will take the goal fact. And from the goal fact, we will infer other facts, and
at last, we will prove those facts true. So our goal fact is "Robert is Criminal," so following is the
predicate of it.
Step-2:
At the second step, we will infer other facts from the goal fact which satisfy the rules. As we can
see in Rule-(1), the goal predicate Criminal(Robert) is present with substitution {Robert/p}. So
we will add all the conjunctive facts below the first level and will replace p with Robert.
Here we can see American (Robert) is a fact, so it is proved here.
Step-3: At step-3, we will extract the further fact Missile(q), which infers from Weapon(q), as it
satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 at q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which
satisfies Rule-(4) with the substitution of A in place of r. So these two statements are proved
here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6).
And hence all the statements are proved true using backward chaining.
● Forward chaining starts from known facts and applies inference rules to extract more data
until it reaches the goal.
● Backward chaining starts from the goal and works backward through inference rules to find
the known facts that support the goal.
● Forward chaining tests all the available rules, while backward chaining tests only the few
required rules.
● Forward chaining is aimed at any conclusion that follows, while backward chaining is aimed
only at the required data.
Monotonic Reasoning:
● In monotonic reasoning, once the conclusion is taken, then it will remain the same even if
we add some other information to existing information in our knowledge base.
● In monotonic reasoning, adding knowledge does not decrease the set of propositions that
can be derived.
● To solve monotonic problems, we can derive the valid conclusion from the available facts
only, and it will not be affected by new facts.
● Any theorem proving is an example of monotonic reasoning.
Example:
● Earth revolves around the Sun.
● It is a true fact, and it cannot be changed even if we add another sentence in the
knowledge base like, "The moon revolves around the earth" Or "Earth is not round," etc.
Non-monotonic Reasoning
● In non-monotonic reasoning, some conclusions may be invalidated if we add more information
to our knowledge base.
Example:
● Birds can fly
● Penguins cannot fly
● Pitty is a bird
● So from the above sentences, we can conclude that Pitty can fly.
● However, if we add one another sentence into the knowledge base "Pitty is a penguin",
which concludes "Pitty cannot fly", it invalidates the above conclusion.
Uncertainty:
So far, we have represented knowledge using first-order logic and propositional logic with
certainty, which means we were sure about the predicates. With this knowledge representation,
we might write A→B, which means if A is true then B is true. But consider a situation where we
are not sure whether A is true or not; then we cannot express this statement. This situation
is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
● Probabilistic reasoning is a way of knowledge representation where we apply the concept
of probability to indicate the uncertainty in knowledge.
● In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
● We use probability in probabilistic reasoning because it provides a way to handle the
uncertainty that is the result of someone's laziness and ignorance.
● In the real world, there are lots of scenarios, where the certainty of something is not
confirmed, such as "It will rain today," "behavior of someone for some situations," "A
match between two teams or two players." These are probable sentences for which we
can assume that it will happen but are not sure about it, so here we use probabilistic
reasoning.
Probability:
● Probability can be defined as the chance that an uncertain event will occur. It is the
numerical measure of the likelihood that an event will occur. The value of probability
always lies between 0 and 1, where 0 represents an impossible event and 1 a certain event.
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
Conditional probability:
● Conditional probability is a probability of occurring an event when another event has
already happened.
● Let's suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition of B". It can be written as:
P(A|B) = P(A ∧ B) / P(B)
where P(A ∧ B) is the joint probability of A and B, and P(B) is the marginal probability of B.
Example:
In a class, there are 70% of the students who like English and 40% of the students who like
English and mathematics, and then what is the percentage of students who like English and also
like mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English.
P(A|B) = P(A ∧ B) / P(B) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
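The arithmetic can be checked with a few lines of Python:

```python
# Conditional probability: P(A|B) = P(A ∧ B) / P(B).
# A = likes Mathematics, B = likes English (from the example above).
p_b = 0.70        # P(B): fraction of students who like English
p_a_and_b = 0.40  # P(A ∧ B): fraction who like both subjects

p_a_given_b = p_a_and_b / p_b
print(f"P(A|B) = {p_a_given_b:.2f}")  # 0.57, i.e. about 57%
```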
Bayes' theorem:
● Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
● In probability theory, it relates the conditional probability and marginal probabilities of
two random events.
● Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian
statistics.
● It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
● Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
● Example: If cancer corresponds to one's age then by using Bayes' theorem, we can
determine the probability of cancer more accurately with the help of age.
● Bayes' theorem can be derived using the product rule and the conditional probability of event A
with known event B. From the product rule, P(A ∧ B) = P(A|B) P(B), and likewise
P(A ∧ B) = P(B|A) P(A); equating the two gives:
P(A|B) = P(B|A) P(A) / P(B) ...(a)
● The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.
● It shows the simple relationship between joint and conditional probabilities. Here, P(A|B)
is known as posterior, which we need to calculate, and it will be read as Probability of
hypothesis A when we have occurred evidence B.
● P(B|A) is called the likelihood, in which we consider that hypothesis is true, then we
calculate the probability of evidence.
● P(A) is called the prior probability, probability of hypothesis before considering the
evidence
● P(B) is called marginal probability, pure probability of an evidence.
● In equation (a), in general we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule
can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)
Example-1:
Question: what is the probability that a patient has meningitis with a stiff neck?
Given Data:
● A doctor is aware that the disease meningitis causes a patient to have a stiff neck, and it
occurs 80% of the time. He is also aware of some more facts, which are given as follows:
● The known probability that a patient has meningitis is 1/30,000.
● The known probability that a patient has a stiff neck is 2%.
● Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' theorem:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133
Hence, we can assume that 1 patient out of 750 patients has meningitis with a stiff neck.
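The same calculation in Python:

```python
# Bayes' theorem: P(b|a) = P(a|b) * P(b) / P(a)
# b = patient has meningitis, a = patient has a stiff neck.
p_a_given_b = 0.8   # stiff neck occurs in 80% of meningitis cases
p_b = 1 / 30000     # prior probability of meningitis
p_a = 0.02          # prior probability of a stiff neck

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)             # ≈ 0.001333
print(round(1 / p_b_given_a))  # 750, i.e. 1 patient in 750
```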
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the
card is king is 4/52, then calculate posterior probability P(King|Face), which means the drawn
face card is a king card.
Solution:
Let King be the event that the drawn card is a king and Face be the event that it is a face card.
P(Face|King) = 1 (every king is a face card)
P(King) = 4/52
P(Face) = 12/52 (there are 12 face cards in a standard deck)
P(King|Face) = P(Face|King) × P(King) / P(Face) = (1 × 4/52) / (12/52) = 1/3
Application
● It is used to calculate the next step of the robot when the already executed step is given.
● Bayes' theorem is helpful in weather forecasting.
● It can solve the Monty Hall problem.
● The certainty-factor model was one of the most popular models for the representation and
manipulation of uncertain knowledge in early (1980s) rule-based expert systems.
● The model was criticized by researchers in artificial intelligence and statistics as being
ad hoc in nature, and researchers and developers have largely stopped using it.
● Its place has been taken by more expressive formalisms of Bayesian belief networks for
the representation and manipulation of uncertain knowledge.
● If there is a power failure then (see rules 1, 2, 3 mentioned above) Rule 3 states that there
is a pump failure, and
● Rule 1 tells that the pressure is low, and
● Rule 2 gives a (useless) recommendation to check the oil level.
● It is very difficult to control such a mixture of inference back and forth in the same
session and resolve such uncertainties.
● A problem with rule-based systems is that often the connections reflected by the rules are
not absolutely certain (i.e. deterministic), and the gathered information is often subject to
uncertainty.
● In such cases, a certainty measure is added to the premises as well as the conclusions in
the rules of the system.
● A rule then provides a function that describes : how much a change in the certainty of the
premise will change the certainty of the conclusion.
● In its simplest form, this looks like :
● If A (with certainty x) then B (with certainty f(x))
● A Bayesian belief network is a key technology for dealing with probabilistic events
and for solving problems that involve uncertainty. We can define a Bayesian network as:
● "A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
● It is also called a Bayes network, belief network, decision network, or Bayesian model.
● Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
● Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning,
time series prediction, and decision making under uncertainty.
● A Bayesian network can be used for building models from data and expert opinions, and it
consists of two parts:
● Directed Acyclic Graph
● Table of conditional probabilities.
● The generalized form of Bayesian network that represents and solves decision problems
under uncertain knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
● Each node corresponds to the random variables, and a variable can be continuous or
discrete.
● Arc or directed arrows represent the causal relationship or conditional probabilities
between random variables. These directed links or arrows connect the pair of nodes in the
graph.
● These links represent that one node directly influences the other node; if there is no
directed link, the nodes are independent of each other.
● In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
● If we are considering node B, which is connected with node A by a directed arrow, then
node A is called the parent of Node B.
● Node C is independent of node A.
● Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parent(Xi)), which determines the effect of the parents on that node.
● The Bayesian network is based on Joint probability distribution and conditional
probability. So let's first understand the joint probability distribution:
Solution:
● The Bayesian network for the above problem is given below. The network structure shows that
burglary and earthquake are the parent nodes of the alarm and directly affect the probability
of the alarm going off, while David's and Sophia's calls depend on the alarm probability.
● The network represents that David and Sophia do not perceive the burglary directly, do not
notice minor earthquakes, and do not confer before calling.
● The conditional distributions for each node are given as a conditional probabilities table
or CPT.
● Each row in the CPT must sum to 1 because all the entries in the table represent
an exhaustive set of cases for the variable.
● In a CPT, a boolean variable with k boolean parents contains 2^k rows of probabilities. Hence, if
there are two parents, the CPT will contain 4 probability values.
● We can write the events of the problem statement in the form of the probability P[D, S, A, B, E],
and can rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]
= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]
● Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, Which is the probability that an earthquake did not occur
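With the priors above plus assumed CPT entries (the alarm, David, and Sophia tables are not reproduced in these notes, so the three conditional values below are illustrative assumptions), the joint-probability factorization can be evaluated directly:

```python
# Joint probability from the Bayesian network factorization:
# P(D, S, A, ¬B, ¬E) = P(D|A) · P(S|A) · P(A|¬B, ¬E) · P(¬B) · P(¬E)
# The two priors come from the notes; the three conditional values
# below are assumed CPT entries used only for illustration.
p_not_b = 0.998                 # P(B = False), from the notes
p_not_e = 0.999                 # P(E = False), from the notes
p_a_given_not_b_not_e = 0.001   # assumed: alarm rings with no cause
p_d_given_a = 0.91              # assumed: David calls when the alarm rings
p_s_given_a = 0.75              # assumed: Sophia calls when the alarm rings

joint = p_d_given_a * p_s_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(joint)  # P(both call, alarm rings, no burglary, no earthquake)
```

Any other query about the domain can be answered the same way, by multiplying the matching CPT entries along the factorization.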
From the formula of joint distribution, we can write the problem statement in the form of
probability distribution:
● Hence, a Bayesian network can answer any query about the domain by using Joint
distribution.
The semantics of Bayesian networks:
There are two ways to understand the semantics of a Bayesian network, which are given
below:
1. To understand the network as the representation of the Joint probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence
statements. It is helpful in designing inference procedures.
Dempster-Shafer Theory
This theory was proposed for the following reasons:
● Bayesian theory is only concerned with single pieces of evidence.
● Bayesian probability cannot describe ignorance.
● DST is an evidence theory, it combines all possible outcomes of the problem. Hence it is
used to solve problems where there may be a chance that a different evidence will lead to
some different result.
For eg:-
● Let us consider a room where four people are present, A, B, C and D. Suddenly the lights
go out and when the lights come back, B has been stabbed in the back by a knife, leading
to his death. No one came into the room and no one left the room. We know that B has
not committed suicide. Now we have to find out who the murderer is.
● There will be the possible evidence by which we can find the murderer by measure of
plausibility.
● Using the above example we can say: Set of possible conclusions (P): {p1, p2, ..., pn}
● where P is a set of possible conclusions which must be exhaustive, i.e., at least one pi
must be true, and all pi must be mutually exclusive.
● The power set will contain 2^n elements, where n is the number of elements in the possible
set.
For eg:-
If P = {a, b, c}, then the power set is given as
{∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}} = 2^3 = 8 elements.
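Generating the power set programmatically confirms the count. A short Python sketch:

```python
from itertools import combinations

# Power set of P: all 2^n subsets, including the empty set and P itself.
def power_set(elements):
    subsets = []
    for r in range(len(elements) + 1):
        subsets.extend(combinations(elements, r))
    return subsets

ps = power_set(["a", "b", "c"])
print(len(ps))  # 8, i.e. 2**3
print(ps)
```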
● Mass function m(K): m({K or B}) is interpreted as meaning there is evidence for {K or B}
that cannot be divided among more specific beliefs for K and B alone.
● Belief in K: The belief in element K of the Power Set is the sum of masses of elements
which are subsets of K. This can be explained through an example
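A minimal sketch of mass, belief, and plausibility for the murder example above; the mass values assigned below are illustrative assumptions:

```python
# Belief in K = sum of masses of all subsets of K.
# Plausibility of K = sum of masses of all sets that intersect K.
# The mass assignment below is an illustrative assumption for the
# murder example (suspects A, C, D; victim B excluded).
def belief(k, masses):
    return sum(m for s, m in masses.items() if set(s) <= set(k))

def plausibility(k, masses):
    return sum(m for s, m in masses.items() if set(s) & set(k))

masses = {
    ("A",): 0.2,
    ("C",): 0.1,
    ("A", "C"): 0.3,       # evidence for "A or C" that cannot be divided
    ("A", "C", "D"): 0.4,  # remaining mass on the whole frame (ignorance)
}
print(round(belief(("A", "C"), masses), 2))    # 0.2 + 0.1 + 0.3 = 0.6
print(round(plausibility(("A",), masses), 2))  # 0.2 + 0.3 + 0.4 = 0.9
```

Note that belief and plausibility bound the true probability, and the interval between them shrinks as more evidence is added.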
Advantages:
● As we add more information, the uncertainty interval reduces.
● DST has a much lower level of ignorance.
● Diagnosis hierarchies can be represented using it.
● A person dealing with such problems is free to think about the evidence.
Disadvantages:
● In this, the computational effort is high, as we have to deal with 2^n sets.
Fuzzy Logic
● The term Fuzzy means something that is a bit vague. When a situation is vague, the
computer may not be able to produce a result that is True or False. As per Boolean Logic,
the value 1 refers to True and 0 means False. But a Fuzzy Logic algorithm considers all
the uncertainties of a problem, where there may be possible values besides True or False.
● The fuzzy logic in artificial intelligence operates on the levels of possibilities of input to
obtain the definite output. It can be executed in systems with different capabilities and
sizes, varying from tiny microcontrollers to huge, networked, workstation-centered
control systems. Furthermore, it can be executed in software, hardware, or a combination
of both.
● Similar to humans, there are many possible values between True and False that a
computer can incorporate. These can be:
● Certainly yes
● Possibly yes
● Can’t say
● Possibly no
● Certainly no
Problem – Is it hot outside?
Boolean Logic
Solution
● Yes (1.0)
● No (0)
According to conventional Boolean Logic, the algorithm will take a definite input and produce a
precise result Yes or No. This is represented by 0 and 1, respectively.
Fuzzy Logic
Solution
● Very hot (0.9)
● Little hot (0.20)
● Moderately hot (0.35)
● Not hot (1.0)
● As per the above example, Fuzzy Logic has a wider range of outputs, such as very hot,
moderately hot and not hot. These values between 0 and 1 display the range of
possibilities.
● So, in cases where accurate reasoning cannot be provided, Fuzzy Logic provides an
acceptable method of reasoning. An algorithm based on Fuzzy Logic takes all available
data while solving a problem. It then takes the best possible decision according to the
given input.
Rule base
● This is the set of rules along with the If-Then conditions that are used for making
decisions. But, modern developments in Fuzzy Logic have reduced the number of rules in
the rule base. These sets of rules are also called a knowledge base.
Fuzzification
● This is the step where crisp numbers are converted into fuzzy sets. A crisp set is a set of
elements that have identical properties. Based on certain logic, an element can either
belong to the set or not. Crisp sets are based on binary logic – Yes or No answers.
● Here, the error signals and physical values are converted into a normalized fuzzy subset.
In any Fuzzy Logic system, the fuzzifier separates the input signals into five states that
are:
● Large positive
● Medium positive
● Small
● Medium negative
● Large negative
● The fuzzification process converts crisp inputs, such as the room temperature fetched by
sensors, into fuzzy values and passes them to the control system for further processing. A
Fuzzy Logic control system is based on Fuzzy Logic. Common household appliances, such as
air-conditioners and washing machines, have fuzzy control systems within them.
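A minimal fuzzifier can be sketched with triangular membership functions; the set names and breakpoints below are illustrative assumptions, not values from the notes:

```python
# Fuzzification: map a crisp temperature reading onto fuzzy sets.
# The breakpoints of the triangular membership functions are
# illustrative assumptions chosen for this sketch.
def triangular(x, a, b, c):
    """Membership rises from a to a peak at b, falls to c; 0 outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp_c):
    return {
        "not hot":        triangular(temp_c, -10, 10, 25),
        "moderately hot": triangular(temp_c, 15, 25, 35),
        "very hot":       triangular(temp_c, 25, 40, 55),
    }

# One crisp input yields several partial memberships at once.
print(fuzzify(30))
```

This is exactly the step that replaces the crisp Yes/No of Boolean logic with degrees of membership between 0 and 1.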
Inference Engine
● The inference engine determines how much the input values and the rules match. The
rules are applied based on the input values received. Then, the rules are used to develop
control actions. The inference engine and the knowledge base together are called a
controller in a Fuzzy Logic system.
Defuzzification
● This is the inverse process of fuzzification. Here, the fuzzy values are converted into
crisp values by mapping. There are several defuzzification methods for doing this, and the
best one is selected as per the input. This is a complicated process in which methods
such as the maximum membership principle, the weighted average method, and the centroid
method are used.
Advantages
● It is a robust system where no precise inputs are required
● These systems are able to accommodate several types of inputs including vague, distorted
or imprecise data
● In case the feedback sensor stops working, you can reprogram it according to the
situation
● The Fuzzy Logic algorithms can be coded using less data, so they do not occupy a huge
memory space
● As it resembles human reasoning, these systems are able to solve complex problems
where ambiguous inputs are available and take decisions accordingly
● These systems are flexible and the rules can be modified
● The systems have a simple structure and can be constructed easily
● You can save system costs as inexpensive sensors can be accommodated by these systems
● It is easily understandable.
● It efficiently solves complex problems by enhancing its capability to accomplish
human-like decision-making and reasoning tasks.
● It deals with uncertainties in engineering.
● The fuzzy logic’s flexibility makes it easier to adapt an FLS by simply adding or deleting
rules.
Disadvantages
● The accuracy of these systems is compromised as the system mostly works on inaccurate
data and inputs
● There is no single systematic approach to solve a problem using Fuzzy Logic. As a result,
many solutions arise for a particular problem, leading to confusion
● Due to inaccuracy in results, they are not always widely accepted
● A major drawback of Fuzzy Logic control systems is that they are completely dependent
on human knowledge and expertise
● You have to regularly update the rules of a Fuzzy Logic control system
● These systems do not have the learning capabilities of machine learning or neural networks
● The systems require a lot of testing for validation and verification
Medicine
● Controlling arterial pressure when providing anesthesia to patients
● Used in diagnostic radiology and diagnostic support systems
● Diagnosis of prostate cancer and diabetes
Transportation systems
● Handling underground train operations
● Controlling train schedules
● Braking and stopping vehicles based on parameters, such as car speed, acceleration and
wheel speed
Defense
● Locating and recognizing targets underwater
● Supports naval decision making
● Using thermal infrared images for target recognition
● Used for controlling hypervelocity interceptors
Industry
● Controlling water purification plants
● Handling problems in constraint satisfaction in structural design
● Pattern analysis for quality assurance
● Fuzzy Logic is used for tackling sludge wastewater treatment
Naval control
● Steer ships properly
● Selecting the optimal or best possible routes for reaching a destination
● Autopilot is based on Fuzzy Logic
● Autonomous underwater vehicles are controlled using Fuzzy Logic
Step-1: In the first step, the algorithm generates the entire game tree and applies the utility
function to get the utility values for the terminal states. In the below tree diagram, let's take A as
the initial state of the tree. Suppose the maximizer takes the first turn, which has a worst-case
initial value of -∞, and the minimizer takes the next turn, which has a worst-case initial value of +∞.
Step 2: Now, first we find the utility value for the Maximizer, its
initial value is -∞, so we will compare each value in terminal state
with the initial value of the Maximizer and determine the higher
nodes values. It will find the maximum among them all.
● For node D: max(-1, -∞) = -1, then max(-1, 4) = 4
● For node E: max(2, -∞) = 2, then max(2, 6) = 6
● For node F: max(-3, -∞) = -3, then max(-3, -5) = -3
● For node G: max(0, -∞) = 0, then max(0, 7) = 7
Step 3: In the next step, it's a turn for minimizer, so it will compare
all nodes value with +∞, and will find the 3rd layer node values.
● For node B= min(4,6) = 4
● For node C= min (-3, 7) = -3
Step 4: Now it's a turn for Maximizer, and it will again choose the maximum of all nodes values
and find the maximum value for the root node. In this game tree, there are only 4 layers, hence
we reach immediately to the root node, but in real games, there will be more than 4 layers.
● For node A max(4, -3)= 4
That was the complete workflow of the minimax algorithm for a two-player game.
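The four steps above can be sketched as a short recursive function; the nested-list tree below encodes the example's terminal values (D = -1, 4; E = 2, 6; F = -3, -5; G = 0, 7).

```python
# Minimax sketch for the worked example above.
# A node is either a terminal number or a list of child nodes:
# A (MAX) -> B, C (MIN) -> D, E, F, G (MAX) -> terminal values.

def minimax(node, maximizing):
    """Return the minimax value of a node."""
    if isinstance(node, (int, float)):   # terminal state: utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tree from the example: D=(-1,4), E=(2,6), F=(-3,-5), G=(0,7)
tree = [[[-1, 4], [2, 6]], [[-3, -5], [0, 7]]]
print(minimax(tree, True))  # 4: max(min(4,6), min(-3,7)) = max(4, -3)
```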
● Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the
standard algorithm does, but it prunes all the nodes that do not affect the final decision yet
make the algorithm slow. By pruning these nodes, it makes the algorithm fast.
Step 2: At Node D, the value of α will be calculated as its turn for Max. The value of α is
compared with firstly 2 and then 3, and the max (2, 3) = 3 will be the value of α at node D and
node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β will change, as it is
Min's turn. Now β = +∞ is compared with the available subsequent node value, i.e. min(∞, 3) = 3;
hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of Node B which is node E, and
the values of α= -∞, and β= 3 will also be passed.
Step 4: At node E, Max will take its turn, and the value of alpha will
change. The current value of alpha will be compared with 5, so max
(-∞, 5) = 5, hence at node E α= 5 and β= 3, where α>=β, so the right
successor of E will be pruned, and algorithm will not traverse it, and
the value at node E will be 5.
Step 5: In the next step, the algorithm again backtracks the tree, from node B
to node A. At node A, the value of alpha changes to the maximum
available value, 3, as max(-∞, 3) = 3, and β = +∞. These two values are now
passed to the right successor of A, which is node C.
At node C, α=3 and β= +∞, and the same values will be passed on to node F.
Step 6: At node F, the value of α is again compared with the left child, which is 0:
max(3, 0) = 3; and then with the right child, which is 1: max(3, 1) = 3. α remains 3,
but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of
beta will change, as it is compared with 1: min(∞, 1) = 1. Now at C, α = 3 and β = 1, which
again satisfies the condition α >= β, so the next child of C, which is G, will be pruned, and
the algorithm will not compute the entire subtree of G.
Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3. The final
game tree shows which nodes were computed and which were never computed. Hence the optimal
value for the maximizer is 3 in this example.
Advantages:
● It combines the benefits of the BFS and DFS search algorithms: fast search and
memory efficiency.
● IDDFS gives us the hope to find the solution if it exists in the tree.
● When the solution is found at a lower depth, say n, the algorithm proves to be
efficient and fast.
● The great advantage of IDDFS is found in game-tree searching, where the IDDFS search
operation tries to improve the depth definition, heuristics, and scores of searched nodes
so as to make the search algorithm efficient.
● Another major advantage of the IDDFS algorithm is its quick responsiveness. Early
indications of the result are a plus point of this algorithm, followed by multiple
refinements after each iteration is completed.
Disadvantages:
● The main drawback of IDDFS is that it repeats all the work of the previous phase.
● The time taken is exponential to reach the goal node.
● The main problem with IDDFS is the time and wasted calculations that take place at each
depth.
● The situation is not as bad as we may think, especially when the branching factor is found
to be high.
● The IDDFS might fail when the BFS fails. When we want multiple answers from
IDDFS, it returns a success node and its path only once, even if the node can be reached
again in later iterations; to prevent this, the depth bound is not increased further.
Example:
● The following tree structure shows the iterative deepening depth-first search. The IDDFS
algorithm performs iterations until it finds the goal node. The iterations
performed by the algorithm are given as:
● 1st Iteration -----> A
● 2nd Iteration ----> A, B, C
● 3rd Iteration ----> A, B, D, E, C, F, G
● 4th Iteration ----> A, B, D, H, I, E, C, F, K, G
● In the fourth iteration, the algorithm will find the goal node.
Completeness:
● This algorithm is complete if the branching factor is finite.
Time Complexity:
● Let b be the branching factor and d the depth; then the worst-case time complexity
is O(b^d).
Space Complexity:
● The space complexity of IDDFS is O(bd).
Optimal:
● The IDDFS algorithm is optimal if the path cost is a non-decreasing function of the depth of
the node.
UNIT 7 Planning
What is Planning
● Planning in artificial intelligence is about the decision-making actions performed by robots
or computer programs to achieve a specific goal.
● Executing a plan means choosing a sequence of actions with a high probability of
accomplishing a specific task.
Goal : To change the configuration of the blocks from the Initial State to the Goal State
Predicates can be thought of as statements that convey information about a
configuration in the Blocks World.
1. ON(A,B) : Block A is on B
2. ONTABLE(A) : A is on table
3. CLEAR(A) : Nothing is on top of A
4. HOLDING(A) : Arm is holding A.
5. ARMEMPTY : Arm is holding nothing
Using these predicates, we represent the Initial State and the Goal State
The effect of these operations is represented using two lists ADD and DELETE. DELETE List
contains the predicates which will cease to be true once the operation is performed. ADD List on
the other hand contains the predicates which will become true once the operation is performed.
The Precondition
For example, to perform the STACK(X,Y) operation i.e. to Stack Block X on top of Block Y, No
other block should be on top of Y (CLEAR(Y)) and the Robot Arm should be holding the Block
X (HOLDING(X)).
Once the operation is performed, these predicates will cease to be true, thus they are included in
the DELETE List as well.
On the other hand, once the operation is performed, The robot arm will be free (ARMEMPTY)
and block X will be on top of Y (ON(X,Y)).
Solution :
Steps = [PICKUP(C), PUTDOWN(C), UNSTACK(B,A), PUTDOWN(B), PICKUP(C),
STACK(C,A), PICKUP(B), STACK(B,D)]
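The ADD/DELETE-list mechanics described above can be sketched as follows; the STACK preconditions and effects follow the text, while the particular state sets and the string encoding of predicates are illustrative.

```python
# STRIPS-style operator sketch for the Blocks World description above.
# A state is a set of predicate strings; STACK(X, Y) checks its
# preconditions, then applies its DELETE and ADD lists.

def stack(state, x, y):
    pre = {f"CLEAR({y})", f"HOLDING({x})"}        # preconditions from the text
    if not pre <= state:
        raise ValueError("preconditions not satisfied")
    delete = {f"CLEAR({y})", f"HOLDING({x})"}     # cease to be true
    add = {"ARMEMPTY", f"ON({x},{y})"}            # become true
    return (state - delete) | add

state = {"HOLDING(C)", "CLEAR(A)", "ONTABLE(A)"}
state = stack(state, "C", "A")
print(sorted(state))  # ['ARMEMPTY', 'ON(C,A)', 'ONTABLE(A)']
```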
Hierarchical Planning
● Hierarchical Planning is an Artificial Intelligence (AI) problem solving approach for a
certain kind of planning problems -- the kind focusing on problem decomposition, where
problems are step-wise refined into smaller and smaller ones until the problem is finally
solved.
● A solution here is a sequence of actions that is executable in a given initial state (together
with a refinement of the initial compound tasks that needed to be refined). This form of
hierarchical planning is usually referred to as Hierarchical Task Network (HTN) planning.
● Example: a one-level planner
● Planning for "Going to Goa this Christmas"
Switch on computer
Start web browser
Open Indian Railways website
Select date
Select class
Select train
so on
● Practical problems are too complex to be solved at one level
● Here we use Hierarchy in Planning
● Hierarchy of actions
● In terms of major action or minor action
● Lower level activities would detail more precise steps for accomplishing the higher level
tasks.
Example by Hierarchy Planning
Major Steps :
Hotel Booking
Ticket Booking
Reaching Goa
Staying and enjoying there
Coming Back
Minor Steps :
Take a taxi to reach station / airport
Have dinner on beach
Take photos
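The decomposition of major steps into minor steps can be sketched as a tiny HTN-style refinement; the method table below is an illustrative fragment of the Goa-trip example, not a complete planner.

```python
# Hierarchical decomposition sketch for the Goa trip example above.
# Compound tasks are refined via METHODS until only primitive steps remain.

METHODS = {
    "GoaTrip": ["TicketBooking", "HotelBooking", "ReachGoa", "Enjoy", "ComeBack"],
    "TicketBooking": ["open railway website", "select date", "select train", "pay"],
    "ReachGoa": ["take taxi to station", "board train"],
}

def refine(task):
    """Recursively replace compound tasks with their subtasks."""
    if task not in METHODS:          # primitive task: executable as-is
        return [task]
    plan = []
    for subtask in METHODS[task]:
        plan.extend(refine(subtask))
    return plan

plan = refine("GoaTrip")
print(plan[:2])  # ['open railway website', 'select date']
```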
Reactive Systems
● Reactive machines are the most basic type of AI. They cannot form memories or use past
experiences to influence present decisions; they can only react to currently existing
situations, hence "reactive."
● Reactive machines have no concept of the world and therefore cannot function beyond
the simple tasks for which they are programmed. A characteristic of reactive machines is
that no matter the time or place, these machines will always behave the way they were
programmed. There is no growth with reactive machines, only stagnation in recurring
actions and behaviors.
● Example of reactive machines
● An existing form of a reactive machine is Deep Blue, a chess-playing supercomputer
developed by IBM (the project traces back to the mid-1980s).
● Deep Blue was created to play chess against a human competitor with the intent of defeating
the competitor. It was programmed to identify a chess board and its pieces while
understanding and predicting the moves of each piece. After losing the first match in 1996,
Deep Blue defeated Russian chess grandmaster Garry Kasparov 3½ to 2½ in a 1997 rematch,
becoming the first computer system to defeat a reigning world chess champion in a match.
● Deep Blue’s unique skill of accurately and successfully playing chess matches
highlighted its reactive abilities. In the same vein, its reactive mind also indicates that it
has no concept of the past or future; it only comprehends and acts on the
presently-existing world and its components within it. To simplify, reactive machines are
programmed for the here and now, but not the before and after.
● Limited memory
● For example, "He lifted the beetle with the red cap." − Did he
use a cap to lift the beetle, or did he lift a beetle that had a
red cap?
NLP Terminology
● Phonology − It is the study of organizing sound
systematically.
● Morphology − It is a study of construction of words from
primitive meaningful units.
● Morpheme − It is a primitive unit of meaning in a language.
● Syntax − It refers to arranging words to make a sentence.
It also involves determining the structural role of words
in the sentence and in phrases.
● Semantics − It is concerned with the meaning of words and
how to combine words into meaningful phrases and
sentences.
● Pragmatics − It deals with using and understanding
sentences in different situations and how the
interpretation of the sentence is affected.
● Discourse − It deals with how the immediately preceding
sentence can affect the interpretation of the next
sentence.
● World Knowledge − It includes general knowledge about the
world.
Steps in NLP
● Lexical Analysis − It involves identifying and analyzing
the structure of words. The lexicon of a language is the
collection of words and phrases in that language.
Context-Free Grammar
● It is a grammar that consists of rules with a single symbol on
the left-hand side of the rewrite rules. Let us create a grammar
to parse the sentence −
● “The bird pecks the grains”
● Articles (DET) − a | an | the
● Nouns − bird | birds | grain | grains
● Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
● = DET N | DET ADJ N
● Verbs − pecks | pecking | pecked
● Verb Phrase (VP) − NP V | V NP
● Adjectives (ADJ) − beautiful | small | chirping
● The parse tree breaks down the sentence into structured parts so that the computer can
easily understand and process it. In order for the parsing algorithm to construct this parse
tree, a set of rewrite rules, which describe what tree structures are legal, need to be
constructed.
● These rules say that a certain symbol may be expanded in the
tree by a sequence of other symbols. If there are two
constituents, a Noun Phrase (NP) and a Verb Phrase (VP), then the
string formed by NP followed by VP is a sentence. The rewrite
rules for the sentence are as follows −
● S → NP VP
● NP → DET N | DET ADJ N
● VP → V NP
● Lexicon −
● DET → a | the
● ADJ → beautiful | perching
● N → bird | birds | grain | grains
● V → peck | pecks | pecking
● The parse tree can be created as shown −
● Now consider the above rewrite rules. Since V can be replaced by both "peck" and
"pecks", sentences such as "The bird peck the grains" can be wrongly permitted, i.e., the
subject-verb agreement error is accepted as correct.
● Merit − The simplest style of grammar, therefore widely used.
Demerits −
● They are not highly precise. For example, "The grains peck the bird" is
syntactically correct according to the parser, but even though it makes no sense, the
parser takes it as a correct sentence.
● To achieve high precision, multiple sets of grammar need to be prepared. It may
require completely different sets of rules for parsing singular and plural
variations, passive sentences, etc., which can lead to the creation of a huge set of rules
that are unmanageable.
Top-Down Parser
● Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence until it
consists entirely of terminal symbols.
● These are then checked against the input sentence to see if they match. If not, the process
starts over again with a different set of rules. This is repeated until a specific rule is
found that describes the structure of the sentence.
Merit − It is simple to implement.
Demerits −
● It is inefficient, as the search process has to be repeated if an error occurs.
● It is slow.
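A minimal top-down (recursive-descent) recognizer for the grammar above can be sketched as follows; note that it happily accepts "The grains peck the bird", illustrating the precision demerit mentioned earlier.

```python
# Top-down recognizer sketch for the grammar above:
# S -> NP VP, NP -> DET N | DET ADJ N, VP -> V NP.

LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}

def match(words, i, category):
    """Return i + 1 if words[i] belongs to the lexical category, else None."""
    if i < len(words) and words[i] in LEXICON[category]:
        return i + 1
    return None

def parse_np(words, i):
    j = match(words, i, "DET")
    if j is None:
        return None
    k = match(words, j, "ADJ")        # try NP -> DET ADJ N first
    if k is not None and match(words, k, "N") is not None:
        return match(words, k, "N")
    return match(words, j, "N")       # fall back to NP -> DET N

def parse_s(sentence):
    words = sentence.lower().split()
    i = parse_np(words, 0)            # S -> NP VP
    if i is None:
        return False
    j = match(words, i, "V")          # VP -> V NP
    if j is None:
        return False
    return parse_np(words, j) == len(words)

print(parse_s("The bird pecks the grains"))   # True
print(parse_s("The grains peck the bird"))    # True: syntactically legal nonsense
print(parse_s("Bird the pecks grains"))       # False
```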
Semantic Analysis
● Semantic Analysis is a subfield of Natural Language Processing (NLP) that attempts to
understand the meaning of Natural Language
Parts of Semantic Analysis
Semantic Analysis of Natural Language can be classified into two broad parts:
1. Lexical Semantic Analysis: Lexical semantic analysis involves understanding the meaning
of each word of the text individually. It basically refers to fetching the dictionary meaning
that a word in the text is meant to carry.
2. Compositional Semantics Analysis: Although knowing the meaning of each word of the text
is essential, it is not sufficient to completely understand the meaning of the text.
Tasks involved in Semantic Analysis
In order to understand the meaning of a sentence, the following are the major processes involved
in Semantic Analysis:
1. Word Sense Disambiguation
2. Relationship Extraction
Relationship Extraction:
● Homonymy: Homonymy refers to two or more lexical terms with the same spellings
but completely distinct in meaning. For example: ‘Rose‘ might mean ‘the past form
of rise‘ or ‘a flower‘, – same spelling but different meanings; hence, ‘rose‘ is a
homonymy.
● Synonymy: When two or more lexical terms that may be spelt differently have the
same or a similar meaning, this is called synonymy. For example: (Job, Occupation),
(Large, Big), (Stop, Halt).
● Polysemy: Polysemy refers to lexical terms that have the same spelling but multiple
closely related meanings. It differs from homonymy because the meanings of the
terms need not be closely related in the case of homonymy. For example: ‘man‘ may
mean ‘the human species‘ or ‘a male human‘ or ‘an adult male human‘ – since all
these different meanings bear a close association, the lexical term ‘man‘ is a
polysemy.
For example:
● In Sentiment Analysis, we try to label the text with the prominent emotion they
convey. It is highly beneficial when analyzing customer reviews for improvement.
● In Topic Classification, we try to categorize our text into some predefined categories.
For example: Identifying whether a research paper is of Physics, Chemistry or Maths
● In Intent Classification, we try to determine the intent behind a text message. For
example: Identifying whether an e-mail received at customer care service is a query,
complaint or request.
Text Extraction
In Text Extraction, we aim to obtain specific information from our text.
For Example,
● In Keyword Extraction, we try to obtain the essential words that define the entire
document.
● In Entity Extraction, we try to obtain all the entities involved in a document.
Spell Checking
● A spell checker is an application, program or a function of a program which determines
the correctness of the spelling of a given word based on the language set being used. It
can either be a standalone program or part of a larger program which operates on blocks
of text such as a word processor, search engine or an email client.
● A spell checker is also known as spell check.
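A minimal sketch of a dictionary-based spell checker, assuming a tiny illustrative word list; real spell checkers add suggestion ranking, morphology, and much larger language sets.

```python
# Minimal dictionary-based spell-checker sketch (the word list is illustrative).

DICTIONARY = {"the", "bird", "pecks", "grain", "grains", "language"}

def check(text):
    """Return the words not found in the language set being used."""
    return [w for w in text.lower().split() if w not in DICTIONARY]

print(check("The bird peks the grains"))  # ['peks']
```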
Hopfield Network
● A Hopfield network is a special kind of neural network whose response differs from that of
other neural networks. Its output is computed by a convergent iterative process. It has just
one layer of neurons, corresponding to the size of the input and output, which must be the same.
● When such a network recognizes, for example, digits, we present a list of correctly
rendered digits to the network. Subsequently, the network can transform a noise input to
the related perfect output.
● A Hopfield network is a single-layered, recurrent network in which the neurons are
fully connected, i.e., each neuron is connected to every other neuron. If there are two
neurons i and j, then there is a connectivity weight wij between them, which is
symmetric: wij = wji.
● The self-connectivity is zero, i.e., wii = 0. In the figure below, three neurons with
indices i = 1, 2, 3 and values Xi = ±1 have connectivity weights wij.
Updating rule:
Synchronously:
In this approach, the updates of all the nodes take place simultaneously at each time step.
Asynchronously:
In this approach, at each point of time, update one node chosen randomly or according to some
rule. Asynchronous updating is more biologically realistic.
Example:
Case 1: w12 = w21 = 1
Case 2: w12 = w21 = -1
Asynchronous updating:
In the first case, there are two attracting fixed points, [1, 1] and [-1, -1], and all orbits
converge to one of these. In the second case, the fixed points are [-1, 1] and [1, -1], and all
orbits converge to one of these. For any fixed point, swapping all the signs gives another
fixed point.
Synchronous updating:
In the first and second cases, although there are fixed points, they do not attract nearby
points, i.e., they are not attracting fixed points. Some orbits oscillate forever.
● Here, we update Xm to X'm, denote the new energy by E', and show that E' − E ≤ 0.
● E' − E = (Xm − X'm) ∑i≠m Wmi Xi.
● Using the above equation, if X'm = Xm then E' = E.
● If Xm = −1 and X'm = 1, then Xm − X'm = −2 and hm = ∑i Wmi Xi ≥ 0; thus E' − E ≤ 0.
● Similarly, if Xm = 1 and X'm = −1, then Xm − X'm = 2 and hm = ∑i Wmi Xi < 0; thus E' − E < 0.
● Suppose the connection weight Wij = Wji between two neurons i and j.
● If Wij > 0, the updating rule implies:
● If Xj = 1, then the contribution of j in the weighted sum, i.e., WijXj, is positive. Thus the
value of Xi is pulled by j towards its value Xj= 1
● If Xj= -1 then WijXj , is negative, and Xi is again pulled by j towards its value Xj = -1
● Thus, if Wij > 0 , then the value of i is pulled by the value of j. By symmetry, the value of
j is also pulled by the value of i.
● If Wij < 0, then the value of i is pushed away by the value of j.
● It follows that for a particular set of values Xi ∈ {−1, 1} for 1 ≤ i ≤ N, the choice of
weights Wij = Xi Xj for 1 ≤ i, j ≤ N corresponds to the Hebbian rule.
● If we select Wij = η Xi Xj for 1 ≤ i, j ≤ N (with i ≠ j), where η > 0 is the learning rate,
then the value of Xi will not change under the updating rule, as we illustrate below.
● We have hi = ∑j≠i Wij Xj = η ∑j≠i (Xi Xj) Xj = η (N − 1) Xi, which has the same sign as Xi.
● It implies that the value of Xi, whether 1 or −1, will not change, so that the vector x is a
fixed point.
● Note that −x also becomes a fixed point when we train the network with x, confirming
that Hopfield networks are sign blind.
Neural Network
● A neural network is a method in artificial intelligence that teaches computers to process
data in a way that is inspired by the human brain.
● It is a type of machine learning process, called deep learning, that uses interconnected
nodes or neurons in a layered structure that resembles the human brain.
● It creates an adaptive system that computers use to learn from their mistakes and improve
continuously. Thus, artificial neural networks attempt to solve complicated problems, like
summarizing documents or recognizing faces, with greater accuracy.
Computer vision is the ability of computers to extract information and insights from images and
videos. With neural networks, computers can distinguish and recognize images similar to
humans. Computer vision has several applications, such as the following:
● Visual recognition in self-driving cars so they can recognize road signs and other road
users
● Content moderation to automatically remove unsafe or inappropriate content from image
and video archives
● Facial recognition to identify faces and recognize attributes like open eyes, glasses, and
facial hair
● Image labeling to identify brand logos, clothing, safety gear, and other image details
Speech recognition
Neural networks can analyze human speech despite varying speech patterns, pitch, tone,
language, and accent. Virtual assistants like Amazon Alexa and automatic transcription software
use speech recognition to do tasks like these:
Natural language processing (NLP) is the ability to process natural, human-created text. Neural
networks help computers gather insights and meaning from text data and documents. NLP has
several use cases, including in these functions:
Input Layer
Information from the outside world enters the artificial neural network from the input layer. Input
nodes process the data, analyze or categorize it, and pass it on to the next layer.
Hidden Layer
Hidden layers take their input from the input layer or other hidden layers. Artificial neural
networks can have a large number of hidden layers. Each hidden layer analyzes the output from
the previous layer, processes it further, and passes it on to the next layer.
Output Layer
The output layer gives the final result of all the data processing by the artificial neural network. It
can have single or multiple nodes. For instance, if we have a binary (yes/no) classification
problem, the output layer will have one output node, which will give the result as 1 or 0.
However, if we have a multi-class classification problem, the output layer might consist of more
than one output node.
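The flow through the input, hidden, and output layers can be sketched as a forward pass; the weights and biases below are illustrative numbers, with a single sigmoid output node for the binary (yes/no) case described above.

```python
# Forward-pass sketch of the input / hidden / output layers described above.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: activation(W . x + b) per node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]                                   # input layer: raw features
hidden = dense(x, [[0.8, 0.2], [-0.4, 0.9]], [0.1, 0.0])
output = dense(hidden, [[1.5, -2.0]], [0.3])      # single node: binary yes/no
print(0.0 < output[0] < 1.0)  # True: sigmoid output usable as a probability
```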
Recurrent Networks
● Recurrent Neural Network(RNN) is a type of Neural Network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the
inputs and outputs are independent of each other, but in cases like when it is required to
predict the next word of a sentence, the previous words are required and hence there is a
need to remember the previous words.
● Thus RNN came into existence, which solved this issue with the help of a Hidden Layer.
The main and most important feature of RNN is the Hidden state, which remembers some
information about a sequence.
● RNN has a memory which remembers all information about what has been calculated. It
uses the same parameters for each input as it performs the same task on all the inputs or
hidden layers to produce the output. This reduces the complexity of parameters, unlike
other neural networks.
Working of RNN
● Example: Suppose there is a deeper network with one input layer, three hidden layers,
and one output layer. Then like other neural networks, each hidden layer will have its
own set of weights and biases, let’s say, for hidden layer 1 the weights and biases are (w1,
b1), (w2, b2) for the second hidden layer, and (w3, b3) for the third hidden layer. This
means that each of these layers is independent of the other, i.e. they do not memorize the
previous outputs.
Now the RNN will do the following:
● RNN converts the independent activations into dependent activations by providing
the same weights and biases to all the layers, thus reducing the complexity of
increasing parameters and memorizing each previous output by giving each output as
input to the next hidden layer.
● Hence these three layers can be joined together such that the weights and bias of all
the hidden layers are the same, in a single recurrent layer.
The hidden state is updated as ht = tanh(wxh · xt + whh · ht−1), where:
whh -> weight at recurrent neuron
wxh -> weight at input neuron
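A scalar sketch of the recurrent update, reusing the same input weight and recurrent weight at every step as described above; the weight values are illustrative.

```python
# Recurrent step sketch: h_t = tanh(w_xh * x_t + w_hh * h_{t-1}).
# The same parameters are reused at every time step (scalar case for clarity).
import math

def rnn(inputs, w_xh=0.5, w_hh=0.8, h0=0.0):
    h = h0
    states = []
    for x in inputs:                         # same weights for each input
        h = math.tanh(w_xh * x + w_hh * h)   # hidden state remembers the past
        states.append(h)
    return states

states = rnn([1.0, 0.5, -1.0])
print(all(-1.0 < h < 1.0 for h in states))  # True: tanh keeps states bounded
```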
Distributed Representations
● Transferring knowledge from the human expert to a computer is often the most difficult
part of building an expert system.
● The knowledge acquired from the human expert must be encoded in such a way that it
remains a faithful representation of what the expert knows, and it can be manipulated by
a computer.
● Three common methods of knowledge representation evolved over the years are
IF-THEN rules, Semantic networks and Frames.
1. IF-THEN rules
IF-THEN rules are the predominant form of encoding knowledge in expert systems. They are of
the form:
If a1, a2, . . . , an
Then b1, b2, . . . , bn
where each ai is a condition or situation, and each bi is an action or a conclusion.
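A minimal forward-chaining sketch of such rules, firing each rule when all of its conditions are in working memory and adding its conclusions; the medical-style rules and facts below are purely illustrative.

```python
# Forward-chaining sketch for IF-THEN rules: a rule fires when all of its
# conditions a1..an hold, adding its conclusions b1..bn to working memory.

RULES = [
    ({"fever", "rash"}, {"suspect measles"}),
    ({"suspect measles", "recent travel"}, {"order blood test"}),
]

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:                      # keep firing until nothing new appears
        changed = False
        for conditions, conclusions in RULES:
            if conditions <= facts and not conclusions <= facts:
                facts |= conclusions
                changed = True
    return facts

facts = forward_chain({"fever", "rash", "recent travel"})
print("order blood test" in facts)  # True: derived via two chained rules
```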
2. Semantic Networks
3. Frames
● In this technique, knowledge is decomposed into highly modular pieces called frames,
which are generalized record structures. Knowledge consists of concepts, situations,
attributes of concepts, relationships between concepts, and procedures to handle
relationships as well as attribute values.
● Each concept may be represented as a separate frame.
● The attributes, the relationships between concepts, and the procedures are allotted to
slots in a frame.
● The contents of a slot may be of any data type - numbers, strings, functions or procedures
and so on.
● The frames may be linked to other frames, providing the same kind of inheritance as that
provided by a semantic network.
● A frame-based representation is ideally suited for object-oriented programming
techniques. An example of Frame-based representation of knowledge is shown in the
next slide.
Working Memory
● Working memory refers to task-specific data for a problem. The contents of the working
memory change with each problem situation. Consequently, it is the most dynamic
component of an expert system, assuming that it is kept current.
● Every problem in a domain has some unique data associated with it.
● Data may consist of the set of conditions leading to the problem, its parameters and so on.
● Data specific to the problem needs to be input by the user at the time of use, that is,
when consulting the expert system. The working memory is related to the user interface.
● Knowledge Base
A store of factual and heuristic knowledge. Expert system tool provides one or more knowledge
representation schemes for expressing knowledge about the application domain. Some tools use
both Frames (objects) and IF-THEN rules. In PROLOG the knowledge is represented as logical
statements.
● Reasoning Engine
Inference mechanisms manipulate the symbolic information and knowledge in the
knowledge base to form a line of reasoning in solving a problem. The inference mechanism can
range from simple modus ponens-based backward chaining of IF-THEN rules to Case-Based reasoning.
● Explanation subsystem
A subsystem that explains the system's actions. The explanation can range from how the final or
intermediate solutions were arrived at to justifying the need for additional data.
● User Interface
A means of communication with the user. The user interface is generally not a part of the expert
system technology. It was not given much attention in the past. However, the user interface can
make a critical difference in the perceived utility of an Expert system
So, now we can define a genetic algorithm as a heuristic search algorithm for solving
optimization problems. It is a subset of evolutionary algorithms used in computing. A genetic
algorithm uses the concepts of genetics and natural selection to solve optimization problems.
The genetic algorithm proceeds through five phases:
1. Initialization
2. Fitness Assignment
3. Selection
4. Reproduction
5. Termination
1. Initialization
● The process of a genetic algorithm starts by generating a set of individuals, called the
population. Each individual is a candidate solution to the given problem. An
individual is characterized by a set of parameters called genes; genes are joined into a
string to form a chromosome, which encodes the solution. One of the most popular
techniques for initialization is the use of random binary strings.
2. Fitness Assignment
● The fitness function determines how fit an individual is, i.e., its ability to compete with
other individuals. In every iteration, individuals are evaluated by the fitness function,
which assigns a fitness score to each individual. This score determines the probability of
being selected for reproduction: the higher the fitness score, the greater the chance of
being selected.
3. Selection
● The selection phase involves selecting individuals for the reproduction of offspring.
The selected individuals are arranged in pairs, and these individuals transfer their
genes to the next generation.
4. Reproduction
● After the selection process, the creation of a child occurs in the reproduction step. In this
step, the genetic algorithm uses two variation operators that are applied to the parent
population. The two operators involved in the reproduction phase are given below:
● Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the
genes. The crossover operator then swaps the genetic information of two parents from the
current generation to produce a new individual representing the offspring.
The genes of the parents are exchanged with each other up to the crossover point.
The newly generated offspring are added to the population. Types of crossover available:
○ One-point crossover
○ Two-point crossover
○ Uniform crossover
○ Arithmetic crossover
● Mutation
The mutation operator inserts random genes in the offspring (new child) to maintain the
diversity in the population. It can be done by flipping some bits in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances
diversification. The below image shows the mutation process:
Types of mutation styles available,
○ Flip bit mutation
○ Gaussian mutation
○ Exchange/Swap mutation
5. Termination
● After the reproduction phase, a stopping criterion is applied as a base for termination. The
algorithm terminates after the threshold fitness solution is reached. It will identify the
final solution as the best solution in the population.
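The five phases can be put together in a minimal sketch on the toy OneMax problem (maximize the number of 1s in a binary string); the population size, generation limit, and mutation rate below are illustrative choices.

```python
# End-to-end sketch of the five GA phases on the OneMax toy problem.
import random

random.seed(0)
LENGTH, POP, GENERATIONS = 20, 30, 60

def fitness(chrom):                     # 2. fitness assignment
    return sum(chrom)

def select(pop):                        # 3. selection: tournament of size two
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                  # 4a. one-point crossover
    point = random.randrange(1, LENGTH)
    return p1[:point] + p2[point:]

def mutate(chrom, rate=0.01):           # 4b. bit-flip mutation
    return [1 - g if random.random() < rate else g for g in chrom]

# 1. initialization: random binary strings
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):            # 5. termination: fixed generation count
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))  # close to LENGTH after selection pressure
```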
Limitations:
● Genetic algorithms are not efficient for solving simple problems.
● They do not guarantee the quality of the final solution to a problem.
● Repetitive calculation of fitness values may create computational challenges.
Genetic Operators
● Genetic operators are used to create and maintain genetic diversity (mutation operator),
combine existing solutions (also known as chromosomes) into new solutions (crossover)
and select between solutions (selection).
Selection
● Some methods copy the best solutions of a generation directly to the next generation
without mutating them; this is known as elitism or elitist selection.
Crossover
● Crossover is the process of taking more than one parent solution (chromosomes) and
producing a child solution from them. By recombining portions of good solutions, the
genetic algorithm is more likely to create a better solution. As with selection, there are a
number of different methods for combining the parent solutions, including the edge
recombination operator (ERO) and the 'cut and splice crossover' and 'uniform crossover'
methods. The crossover method is often chosen to closely match the chromosome's
representation of the solution; this may become particularly important when variables are
grouped together as building blocks, which might be disrupted by a non-respectful
crossover operator. Similarly, crossover methods may be particularly suited to certain
problems; the ERO is generally considered a good option for solving the traveling
salesman problem.
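One-point and uniform crossover, two of the methods named above, can be sketched as follows; the four-gene parents are illustrative.

```python
# Sketches of one-point and uniform crossover on two parent chromosomes.
import random

def one_point(p1, p2, point):
    """Swap the tails of the two parents after the crossover point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform(p1, p2, rng):
    """Pick each gene independently from either parent."""
    pairs = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in zip(p1, p2)]
    return [a for a, _ in pairs], [b for _, b in pairs]

c1, c2 = one_point([0, 0, 0, 0], [1, 1, 1, 1], 2)
print(c1, c2)  # [0, 0, 1, 1] [1, 1, 0, 0]
```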
Mutation
● The mutation operator encourages genetic diversity among solutions and attempts to
prevent the genetic algorithm from converging to a local minimum by stopping the solutions
from becoming too similar to one another. In mutating the current pool of solutions, a given
solution may change entirely from the previous solution. By mutating the solutions, a
genetic algorithm can reach an improved solution solely through the mutation operator.[1]
Again, different methods of mutation may be used; these range from a simple bit
mutation (flipping random bits in a binary string chromosome with some low probability)
to more complex mutation methods, which may replace genes in the solution with
random values chosen from the uniform distribution or the Gaussian distribution. As with
the crossover operator, the mutation method is usually chosen to match the representation
of the solution within the chromosome.
Termination Parameters
● The termination condition of a Genetic Algorithm is important in determining when a GA
run will end. It has been observed that initially, the GA progresses very fast with better
solutions coming in every few iterations, but this tends to saturate in the later stages
where the improvements are very small. We usually want a termination condition such
that our solution is close to the optimal, at the end of the run.
● Usually, we keep one of the following termination conditions −
○ When there has been no improvement in the population for X iterations.
○ When we reach an absolute number of generations.
○ When the objective function value has reached a certain predefined value.
● For example, in a genetic algorithm we keep a counter that tracks the number of
generations for which there has been no improvement in the population. Initially, we set
this counter to zero. Each time we do not generate offspring that are better than the
individuals in the population, we increment the counter.
● However, if the fitness of any of the offspring is better, then we reset the counter to
zero. The algorithm terminates when the counter reaches a predetermined value.
● Like other parameters of a GA, the termination condition is also highly problem-specific,
and the GA designer should try out various options to see what suits their particular
problem best.