
AI Lecture Notes

Artificial Intelligence (Gujarat Technological University)


UNIT 1 Introduction

PREPARED BY
PROF. VISHVA UPADHYAY


AHMEDABAD INSTITUTE OF TECHNOLOGY AI(3170716)

What is AI?
● Artificial intelligence (AI) refers to the simulation of human intelligence in machines that
are programmed to think like humans and mimic their actions.
OR
● AI is accomplished by studying how the human brain thinks, and how humans learn,
decide, and work while trying to solve a problem, and then using the outcomes of this
study as a basis of developing intelligent software and systems.
OR
● The term Artificial Intelligence is composed of two words, Artificial and Intelligence, where
Artificial means "man-made" and Intelligence means "thinking power"; hence AI
means "a man-made thinking power."
● There are three main types of AI based on its capabilities - weak AI, strong AI, and super
AI.
● Weak AI - Focuses on one task and cannot perform beyond its limitations (common in
our daily lives)
● Strong AI - Can understand and learn any intellectual task that a human being can
(researchers are striving to reach strong AI)
● Super AI - Surpasses human intelligence and can perform any task better than a
human (still a concept)

Why artificial intelligence?


● With the help of AI, you can create software or devices that solve real-world problems,
such as health issues, marketing, and traffic issues, easily and accurately.
● With the help of AI, you can create your own personal virtual assistant, such as Cortana,
Google Assistant, or Siri.
● With the help of AI, you can build robots that can work in environments where
human survival would be at risk.
● AI opens a path for other new technologies, new devices, and new opportunities.

Advantages of AI
Reduction in Human Error:
● The phrase “human error” was born because humans make mistakes from time to time.
Computers, however, do not make these mistakes if they are programmed properly.
● With artificial intelligence, decisions are taken from previously gathered
information by applying a certain set of algorithms, so errors are reduced and results can
be reached with a greater degree of accuracy and precision.
● Example : In weather forecasting, the use of AI has removed much of the human
error.

Takes risks instead of Humans:


● We can overcome many risky limitations of humans by developing an AI robot which in
turn can do the risky things for us.
● Example : Whether it is going to Mars, defusing a bomb, exploring the deepest parts of the oceans,
or mining for coal and oil, an AI robot can be used effectively in any kind of natural or man-made
disaster.
● Example : The Chernobyl nuclear power plant explosion in Ukraine is the kind of disaster where
sending machines instead of people reduces the risk to humans.

Available 24x7:
● Humans need breaks and rest, but using AI we can make machines work 24x7 without any
breaks, and they do not get bored, unlike humans.
● Example : Educational Institutes and Helpline centers are getting many queries and issues
which can be handled effectively using AI.

Helping in Repetitive Jobs:


● Using artificial intelligence we can productively automate these mundane tasks and can
even remove “boring” tasks for humans and free them up to be increasingly creative.
● Example : In banks, we often see many verifications of documents to get a loan which is
a repetitive task for the owner of the bank. Using AI Cognitive Automation the owner can
speed up the process of verifying the documents by which both the customers and the
owner will be benefited.

Digital Assistance:
● Digital assistants are used on many websites to provide the things that users want.
We can chat with them about what we are looking for. Some chatbots are designed in
such a way that it becomes hard to determine whether we are chatting with a chatbot or a
human being.
● Example: We all know that organizations have a customer support team that needs to
clarify the doubts and queries of the customers. Using AI the organizations can set up a
Voice bot or Chatbot which can help customers with all their queries.

Faster Decisions:
● Using AI alongside other technologies we can make machines take decisions faster than a
human and carry out actions quicker. While making a decision humans will analyze many
factors both emotionally and practically but AI-powered machines work on what is
programmed and deliver the results in a faster way.
● Example: We all have played Chess games on Windows. It is nearly impossible to beat
the CPU in hard mode because of the AI behind that game. It will take the best possible
step in a very short time according to the algorithms used behind it.

Daily Applications:


● Daily applications such as Apple’s Siri, Microsoft’s Cortana, and Google’s OK Google are
frequently used in our daily routine, whether for searching for a location, taking a selfie,
making a phone call, replying to a mail, and many more.
● Example : Around 20 years ago, when we were planning to go somewhere we used to ask
a person who had already been there for directions. Now all we have to do is say “OK
Google where is Visakhapatnam”. It will show Visakhapatnam’s location on Google
Maps and the best path between you and Visakhapatnam.

New Inventions:
● AI is powering many inventions in almost every domain which will help humans solve
the majority of complex problems.
● Example : Doctors can now predict breast cancer in women at earlier stages using
advanced AI-based technologies.

Disadvantages of AI
High Costs of Creation:
● As AI is updated every day, the hardware and software need to be updated over time to
meet the latest requirements. Machines need repair and maintenance, which cost
a lot. Their creation also requires huge costs, as they are very complex machines.

Making Humans Lazy:


● AI is making humans lazy, with its applications automating the majority of the work.
Humans tend to get addicted to these inventions, which can cause a problem for future
generations.

Unemployment:
● As AI is replacing the majority of repetitive tasks and other work with robots, human
involvement is becoming less, which will cause a major problem for employment.
Every organization is looking to replace minimally qualified individuals
with AI robots which can do similar work with more efficiency.

No Emotions:
● There is no doubt that machines are much better when it comes to working efficiently but
they cannot replace the human connection that makes the team. Machines cannot develop
a bond with humans which is an essential attribute when it comes to Team Management.

Lacking Out of Box Thinking:


● Machines can perform only those tasks which they are designed or programmed to do.
Given anything outside of that, they tend to crash or give irrelevant outputs, which could be a major
drawback.


The AI Problems
AI problems are the problems that we give to an AI system so that we get the output we need.
● Early work in AI focused on formal tasks.
● Another area of focus is commonsense reasoning.
● These problems include perception, natural language understanding, and problem solving in
specialized domains like medical diagnosis and chemical analysis.
When looking at AI problems and solution techniques, it is important to discuss the
following questions:
1. What are the underlying assumptions about intelligence?
2. What kinds of techniques will be useful for solving AI problems?
3. At what level can human intelligence be modeled?
4. How will we know when an intelligent program has been built?

The Underlying Assumptions of AI


● Underlying : The cause of, or reason for, something.
● Assumption : Something accepted as true without proof.
● So the underlying assumption of AI is the unproven idea that explains why AI is possible at all.

Underlying assumption of AI
● Any physical symbol system (such as a computer) has the ability to perform general intelligent (clever)
actions. Example : Installing software on a system.
● That means that, even before AI existed, computers could already perform some clever
actions.
Task Domains of Artificial Intelligence (AI)
Mundane Tasks:
● Perception
○ Vision
○ Speech
● Natural Languages
○ Understanding
○ Generation
○ Translation
● Common sense reasoning
● Robot Control

● Humans have been learning mundane (ordinary) tasks since birth. They learn through
perception, speaking, using language, and locomotion.
● For humans, the mundane tasks are the easiest to learn. The same was assumed to be true for
machines before anyone tried to implement mundane tasks in them.
● Earlier, all work in AI was concentrated in the mundane task domain.


Formal Tasks:
● Games: chess, checkers, etc
● Mathematics: Geometry, logic, Proving properties of programs

● Formal Tasks are tasks that deal with verification, theorem proving,
mathematics, games, etc.

Expert Tasks:
● Engineering ( Design, Fault finding, Manufacturing planning)
● Scientific Analysis
● Medical Diagnosis
● Financial Analysis

● Expert Tasks are tasks which involve scientific analysis and analysis in other
domains, such as finance, healthcare, creative work, etc.
● Researchers now understand that solving mundane tasks needs better and more
efficient algorithms, and a much larger knowledge base, to tackle the problems
they have set out to solve. That is the reason AI has shifted towards working
with Expert Tasks, to enhance the capabilities of AI systems.

AI techniques
● AI techniques are the methods that we apply to solve AI problems.
● An AI technique is a method that exploits knowledge, which should be represented in such a
way that:
○ Situations that share common properties are grouped together.
○ It can easily be modified to correct errors and to reflect changes in the world.
○ It can be used in many situations even though it may not be totally accurate or complete.
There are three important AI techniques:

Search : Provides a way of solving problems for which no direct approach is available. It will be
solved by the data that has been given to AI.
Example :
1 2 3 4

5 6 7 8

9 10 11 ??

Use of knowledge : Provides a way of solving complex problems. It has to use the knowledge
that has already been given to it to solve any problem.


Example : Maze

Abstraction : Provides a way of separating the important features of a problem from the many
unimportant ones, so that only one optimal path is used to solve the problem instead of every possible way.
Example : Map

The Level of The Model & Criteria For Success


Model :
● A model is something that copies a decision process so that the process can be automated
and understood by the AI.
● Example : If we give a task to an AI, it first understands the task, then makes
decisions based on its knowledge, and then automatically follows the chosen path.
● The level of the model shows how easily the AI solves the problem or task and how it modifies
its solution.
● Suppose a human and an AI are given the same task. If the human performs that task with
85% accuracy, then we have to see how well the AI solves the same task in comparison.

Criteria :
● If the targeted goal is achieved, then we can say that the AI model is successful.


UNIT 2 Problems, State Space Search & Heuristic Search Techniques

PREPARED BY
PROF. VISHVA UPADHYAY


Defining The Problems As A State Space Search


● Defining the problem : The problem is the task that we give to the machine, that is, the
question to be solved.
1. It should be defined precisely for the machine.
2. On that basis it can be analyzed properly.

● State Space Search : A state space is defined as the set of all possible states of a problem.
OR
● It is the complete set of states, including the start and goal states, in which the answer to the
problem is searched for.
● S = (Start State, Goal State, Action, Result, Cost)
● For Example:
● The eight tile puzzle problem formulation
● The eight tile puzzle consists of a 3 by 3 (3*3) square frame board which holds 8 movable
tiles numbered 1 to 8. One square is empty, allowing the adjacent tiles to be shifted into it. The
objective of the puzzle is to find a sequence of tile movements that transforms a starting
configuration into a goal configuration.

3 8 1

6 2 5

4 7
Start State
● Here we have our goal state as
1 2 3

8 4

7 6 5

● The states of 8 tile puzzle are the different permutations of the tiles within the frame.
● Start State and Goal State both are defined above.
● Every intermediate state is compared with the goal state. If the goal state has been
reached, the search stops; otherwise it continues.
● Action is the move chosen in a state from among all the moves that are possible there.
● Result is the board (matrix) obtained after applying the action.
● Cost : Each step costs 1, so the path cost is the number of steps in the path.
● Searching can be of 2 types
1. Uninformed Search


2. Informed Search

Informed Search | Uninformed Search
It is also known as Heuristic Search. | It is also known as Blind Search.
It uses knowledge for the searching process. | It doesn’t use knowledge for the searching process.
It finds a solution more quickly. | It finds solutions slowly compared to an informed search.
Cost is low. | Cost is high.
It consumes less time because of quick searching. | It consumes moderate time because of slow searching.
There is a direction given about the solution. | No suggestion is given regarding the solution.
Examples: Greedy Search, A* Algorithm, AO* Algorithm, Hill Climbing Algorithm | Examples: Depth First Search, Breadth First Search
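
To make the state space formulation concrete, here is a minimal Python sketch of the eight tile puzzle solved with breadth-first search (an uninformed search). It is an illustration rather than part of the original notes: the tuple encoding with 0 for the empty square, the names `successors` and `breadth_first_search`, and the start state are assumptions chosen for the example. The goal layout assumes that the empty square of the goal state given earlier sits in the centre.

```python
from collections import deque

# Assumed encoding: the board is read row by row, 0 marks the empty square.
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)

# Actions move the empty square; the value is the change in its index.
MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}


def successors(state):
    """Yield (action, result) pairs for every legal move of the empty square."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for action, delta in MOVES.items():
        if action == "up" and row == 0:
            continue
        if action == "down" and row == 2:
            continue
        if action == "left" and col == 0:
            continue
        if action == "right" and col == 2:
            continue
        target = blank + delta
        board = list(state)
        board[blank], board[target] = board[target], board[blank]
        yield action, tuple(board)


def breadth_first_search(start, goal=GOAL):
    """Return the list of actions leading from start to goal, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:  # compare each state taken from the frontier with the goal
            return path
        for action, result in successors(state):
            if result not in visited:
                visited.add(result)
                frontier.append((result, path + [action]))
    return None


# Hypothetical start state, two moves away from the goal.
start = (0, 2, 3,
         1, 8, 4,
         7, 6, 5)
plan = breadth_first_search(start)
print(plan, "path cost =", len(plan))
```

Because each step costs 1, the path cost is simply the number of actions in the returned plan.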

Production System
● A production system in AI helps create AI-based computer programs.
● A production system in AI is a type of cognitive architecture that defines specific actions
as per certain rules. The rules represent the declarative knowledge of a machine to
respond according to different conditions
● A production system (popularly known as a production rule system) is a kind of cognitive
architecture that is used to implement search algorithms and replicate human
problem-solving skills.

Components of a Production System in AI


● For making an AI-based intelligent system that performs specific tasks, we need an
architecture. The architecture of a production system in Artificial Intelligence consists of
production rules, a database, and the control system.

Global Database

● The global database is the central data structure of the architecture.


● A database contains all the necessary data and information required for the successful
completion of a task.
● It can be divided into two parts: permanent and temporary. The permanent part of the
database consists of fixed actions, whereas the temporary part alters according to
circumstances.

Production Rules

● Production rules in AI are the set of rules that operate on the data fetched from the global
database.
● Each production rule has a precondition and a postcondition, and the precondition is checked
against the global database. If the precondition of a rule is satisfied by
the global database, then the rule can be applied.
● The rules are of the form A → B, where the right-hand side represents an outcome
corresponding to the problem state represented by the left-hand side.

Control System

● The control system checks the applicability of a rule. It helps decide which rule should be
applied and terminates the process when the system gives the correct output.
● It also resolves the conflict that arises when several rules are applicable at the same time. The strategy
of the control system specifies the order in which rules are compared against the
global database to reach the correct result.
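
The three components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts, the rules, and the function name `run_production_system` are hypothetical, and the conflict-resolution strategy (fire the first applicable rule in list order) is just one simple choice.

```python
# Production rules of the form A -> B (hypothetical example):
# if every fact in A is in the database, the facts in B are added to it.
RULES = [
    ({"kettle_full", "power_on"}, {"water_hot"}),
    ({"water_hot", "cup_has_teabag"}, {"tea_ready"}),
    ({"cup_dirty"}, {"wash_cup"}),
]


def run_production_system(initial_facts, goal, rules=RULES):
    """Control system: keep firing rules until the goal fact is in the database."""
    database = set(initial_facts)  # global database (working memory)
    fired = []                     # indices of the rules that were applied, in order
    while goal not in database:
        # A rule is applicable if its precondition holds and it would add something new.
        applicable = [
            index for index, (pre, post) in enumerate(rules)
            if pre <= database and not post <= database
        ]
        if not applicable:
            return database, fired, False  # nothing left to apply: stop without the goal
        chosen = applicable[0]             # conflict resolution: first rule in list order
        database |= rules[chosen][1]
        fired.append(chosen)
    return database, fired, True


facts, fired, success = run_production_system(
    {"kettle_full", "power_on", "cup_has_teabag"}, goal="tea_ready"
)
print(fired, success)
```

Adding or removing a rule from `RULES` does not require touching the control loop, which is the modularity that production systems are valued for.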

Characteristics of a Production System


● There are mainly four characteristics of the production system in AI that are simplicity,
modifiability, modularity, and knowledge-intensive.

Simplicity

● The production rule in AI is in the form of an ‘IF-THEN’ statement. Every rule in the
production system has a unique structure.


● It helps represent knowledge and reasoning in the simplest way possible to solve
real-world problems. Also, it helps improve the readability and understanding of the
production rules.

Modularity
● The modularity of a production rule helps in its incremental improvement as the
production rule can be in discrete parts.
● The production rule is made from a collection of information and facts that may not have
dependencies unless there is a rule connecting them together.
● The addition or deletion of single information will not have a major effect on the output.
● Modularity helps enhance the performance of the production system by adjusting the
parameters of the rules.

Modifiability
● The feature of modifiability helps alter the rules as per requirements.
● Initially, the skeletal form of the production system is created.
● We then gather the requirements and make changes in the raw structure of the production
system.
● This helps in the iterative improvement of the production system.

Knowledge-intensive
● Production systems contain knowledge in the form of a human spoken language,
i.e. English.
● The knowledge itself is not written in any programming language.
● The knowledge is represented in plain English sentences. Production rules help make
productive conclusions from these sentences.

Issues in the design of search programs

● The direction in which to conduct the search (forward versus backward reasoning). If the
search proceeds from the start state towards a goal state, it is a forward search; if it proceeds
from the goal state back towards the start state, it is a backward search.
● How to select applicable rules (Matching). Production systems typically spend most of
their time looking for rules to apply. So, it is critical to have efficient procedures for
matching rules against states.
● How to represent each node of the search process (knowledge representation problem).

Heuristic Search
● Heuristic search is a class of methods used to search a solution space for
an optimal solution to a problem. The heuristic assesses where in the space the solution
is most likely to be and focuses the search on that area.

A heuristic is a method that

● might not always find the best solution
● but usually finds a good solution in reasonable time.
● By sacrificing completeness it increases efficiency.
● Useful in solving tough problems which
○ could not be solved any other way.
○ have solutions that take an infinite or very long time to compute.

The classic example of a problem tackled with heuristic search methods is the traveling salesman problem.

Generate and Test Algorithm


● Generate and Test Search is a heuristic search technique based on Depth First Search with
Backtracking which guarantees to find a solution if done systematically and there exists a
solution.
● In this technique, all the solutions are generated and tested for the best solution. It ensures
that the best solution is checked against all possible generated solutions.
● It is also known as the British Museum Search Algorithm as it’s like looking for an
exhibit at random or finding an object in the British Museum by wandering randomly.
● The solutions are generated systematically and the evaluation is carried out by the
heuristic function, so paths which are very unlikely to lead to a result are not
considered.
Algorithm
1. Generate a possible solution. For some problems, this means generating a particular
point in the problem space. For others, it means generating a path from a start state.
2. Test to see if this is actually a solution by comparing the chosen point or the endpoint of
the chosen path to the set of acceptable goal states.
3. If a solution has been found, quit. Otherwise, return to step 1.


● Generate-and-test, like depth-first search, requires that complete solutions be generated
for testing.
● Solutions can also be generated randomly, but then finding a solution is not guaranteed.
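
The generate-and-test loop can be sketched in Python as follows. This is an illustrative sketch, not the notes' own code: the function name `generate_and_test` and the small digit puzzle (find distinct digits a, b, c from 1 to 4 with a + b = c and a > b) are assumptions chosen so that the systematic generator is easy to follow.

```python
from itertools import permutations


def generate_and_test(candidates, is_solution):
    """Return the first candidate that passes the test and how many were tested."""
    tested = 0
    for candidate in candidates:      # step 1: generate a possible solution
        tested += 1
        if is_solution(candidate):    # step 2: test it against the goal condition
            return candidate, tested  # step 3: quit as soon as a solution is found
    return None, tested               # generator exhausted without success


def is_solution(candidate):
    """Hypothetical goal test for the digit puzzle."""
    a, b, c = candidate
    return a + b == c and a > b


# A systematic, non-redundant generator: every ordered triple of distinct digits
# appears exactly once, so a solution is found whenever one exists.
solution, tested = generate_and_test(permutations([1, 2, 3, 4], 3), is_solution)
print(solution, tested)
```

Replacing `permutations(...)` with a random generator would keep the loop unchanged but lose the guarantee of termination with an answer.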

Properties
● Complete: Good Generators need to be complete i.e. they should generate all the
possible solutions and cover all the possible states.
● Non Redundant: Good Generators should not yield a duplicate solution at any point of
time as it reduces the efficiency of the algorithm thereby increasing the time of search
and making the time complexity exponential.
● Informed: Good Generators have the knowledge about the search space which they
maintain in the form of an array of knowledge. This can be used to search how far the
agent is from the goal, calculate the path cost and even find a way to reach the goal.

Example : coloured blocks


Problem Statement :

Arrange four 6-sided cubes in a row, with each side of each cube painted one of four colors, such
that on all four sides of the row one block face of each color is shown.

Heuristic: If there are more red faces than other colors then, when placing a block with several
red faces, use as few of them as possible as outside faces.

Example – Traveling Salesman Problem (TSP)

Problem Statement :
A salesman has a list of cities, each of which he must visit exactly once. There are direct roads
between each pair of cities on the list. Find the route the salesman should follow for the shortest
possible round trip that both starts and finishes at any one of the cities.

Rules :
● The traveler needs to visit n cities.
● The distance between each pair of cities is known.
● We want to know the shortest route that visits every city exactly once.


How we can apply heuristic for TSP


1. Randomly select one vertex as the root.
2. Find the vertex that is closest (more precisely, has the lowest cost) to the current
position but is not yet part of the route, and add it into the route.
3. Repeat until the route includes each vertex.
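
The three steps above are the nearest-neighbour heuristic, which can be sketched in Python as below. The distance table `DIST`, the city names, and the function name are hypothetical values used only for illustration, and the start city is passed in rather than chosen at random so that the run is repeatable.

```python
# Hypothetical symmetric distances between four cities.
DIST = {
    "A": {"B": 8, "C": 3, "D": 7},
    "B": {"A": 8, "C": 5, "D": 4},
    "C": {"A": 3, "B": 5, "D": 6},
    "D": {"A": 7, "B": 4, "C": 6},
}


def nearest_neighbour_tour(distances, start):
    """Greedy heuristic: always travel to the closest city not yet on the route."""
    route = [start]                      # step 1: the chosen root
    unvisited = set(distances) - {start}
    total = 0
    while unvisited:                     # step 3: repeat until every city is included
        current = route[-1]
        # step 2: lowest-cost city that is not yet part of the route
        # (sorted() only makes tie-breaking deterministic)
        nearest = min(sorted(unvisited), key=lambda city: distances[current][city])
        total += distances[current][nearest]
        route.append(nearest)
        unvisited.remove(nearest)
    total += distances[route[-1]][start]  # return to the start to close the round trip
    return route, total


route, total = nearest_neighbour_tour(DIST, "A")
print(route, total)
```

The heuristic is fast because it never reconsiders a choice, but for the same reason the tour it returns is not guaranteed to be the shortest one.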
A heuristic function is a function that ranks all the possible alternatives at any branching step
in the search algorithm based on the available information. It helps the algorithm select the
best route out of the possible routes.

Heuristics for the TSP are judged mainly on their speed and on how close their solutions are to the optimal one.

Search flow with Generate and Test


Possible Solutions :

Path | Length
ABCD | 19
ABDC | 18
ACBD | 12
ACDB | 13
ADBC | 16

Finally, select the path with the smallest length (here ACBD, with length 12).
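
The table can be reproduced with an exhaustive generate-and-test sketch in Python. The edge distances in `DIST` are not given in the text (the original figure is missing); they are hypothetical values chosen so that the five listed paths come out with the listed lengths. The sixth path, ADCB, is not in the table, and its length here follows only from those assumed distances.

```python
from itertools import permutations

# Hypothetical edge lengths, chosen to be consistent with the table above.
DIST = {
    frozenset("AB"): 8, frozenset("AC"): 3, frozenset("AD"): 7,
    frozenset("BC"): 5, frozenset("BD"): 4, frozenset("CD"): 6,
}


def path_length(path):
    """Sum the lengths of consecutive edges along the path, e.g. 'ABCD'."""
    return sum(DIST[frozenset(pair)] for pair in zip(path, path[1:]))


# Generate: every ordering of B, C, D after the fixed start city A.
candidates = ["A" + "".join(rest) for rest in permutations("BCD")]

# Test: measure each candidate and keep the shortest one.
lengths = {path: path_length(path) for path in candidates}
best = min(lengths, key=lengths.get)
print(lengths)
print("best:", best, lengths[best])
```

With n cities this approach tests (n-1)! orderings, which is why it is only acceptable for small problems.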

Advantages & Disadvantages:


● Acceptable for simple problems.
● Inefficient for problems with large space.

Hill Climbing
● Hill climbing algorithm is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain or best solution to
the problem. It terminates when it reaches a peak value where no neighbor has a higher
value.
● It is also called greedy local search, as it looks only at its immediate neighbor states
and not beyond them.
● Depends on Heuristic.

Features of Hill Climbing:


● Generate and Test variant: Hill climbing is a variant of the Generate and Test method in which
feedback from the test procedure helps decide which direction to
move in the search space.
● Greedy approach: Hill-climbing algorithm search moves in the direction which optimizes
the cost.
● No backtracking: It does not backtrack the search space, as it does not remember the
previous states.

Algorithm
1. Evaluate the initial state. If it is a goal state, then return success and stop.
2. Loop until a solution is found or there is no new operator left to apply.
3. Select and apply an operator to the current state.
4. Check the new state:
○ If it is a goal state, then return success and quit.
○ Else if it is better than the current state, then make the new state the current state.
○ Else if it is not better than the current state, then return to step 2.
5. Exit.
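
A minimal Python sketch of this loop is shown below. It is an assumption-laden illustration: the one-dimensional landscape `HEIGHTS`, the helper names, and the choice of "move left" and "move right" as the only operators are all hypothetical, and there is no separate goal test because the example is a pure optimization problem.

```python
# Hypothetical landscape: the index is the state, the entry is its value.
HEIGHTS = [1, 3, 5, 4, 2, 6, 9, 7]


def value(state):
    """Evaluation function: the height of the landscape at this state."""
    return HEIGHTS[state]


def neighbours(state):
    """States reachable with one operator: one step left, then one step right."""
    return [s for s in (state - 1, state + 1) if 0 <= s < len(HEIGHTS)]


def simple_hill_climb(start):
    """Move to the first better neighbour; stop when no operator improves the state."""
    current = start
    while True:
        for candidate in neighbours(current):  # apply operators one at a time
            if value(candidate) > value(current):
                current = candidate            # better state becomes the current state
                break
        else:
            return current                     # no operator left that improves: exit


print(simple_hill_climb(0), simple_hill_climb(5))
```

Starting from state 0 the climb stops at state 2 (value 5), while starting from state 5 it reaches state 6 (value 9), so the result depends on the starting point and previous states are never revisited.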

Different regions in the state space landscape


Local Maximum: Local maximum is a state which is better than its neighbor states, but there is
also another state which is higher than it.

Global Maximum: Global maximum is the best possible state of the state space landscape. It
has the highest value of objective function.

Current state: It is a state in a landscape diagram where an agent is currently present.

Flat local maximum: It is a flat space in the landscape where all the neighbor states of current
states have the same value.

Shoulder: It is a plateau region which has an uphill edge.

Problems in Hill Climbing:


Local Maximum: A local maximum is a peak state in the landscape which is better than each of
its neighboring states, but there is another state also present which is higher than the local
maximum.

Flat local maximum/Plateau: A plateau is a flat area of the search space in which all the
neighboring states of the current state have the same value, so the algorithm cannot
find a best direction to move in. A hill-climbing search might get lost in the plateau area.

Ridge: It is a region that is higher than its neighbors but itself has a slope. It is a special kind of
local maximum.


Advantages
● Memory Efficient
● Helpful in solving pure optimization problems, where the focus is on finding the best
state.
● Hill Climbing technique can be used to solve many problems, where the current state
allows for an accurate evaluation function, such as Network-Flow, Traveling Salesman
problem, 8-Queens problem, Integrated Circuit design, etc.

Best First Search


● It is an informed search algorithm.
● Works on the greedy concept
Algorithm
● Let OPEN be a priority queue containing the initial state.
● LOOP
○ If OPEN is empty, return failure.
○ Else Node <- Remove-First(OPEN)
○ If Node is a goal, then return the path from the initial state to Node.
○ Else generate all successors of Node and put the newly generated nodes into OPEN
according to their f values.
Pure Heuristic Search:
● Pure heuristic search is the simplest form of heuristic search algorithms. It expands nodes
based on their heuristic value h(n). It maintains two lists, OPEN and CLOSED list. In the
CLOSED list, it places those nodes which have already expanded and in the OPEN list, it
places nodes which have yet not been expanded.


● On each iteration, the node n with the lowest heuristic value is expanded, all its successors
are generated, and n is placed in the CLOSED list. The algorithm continues until a goal
state is found.
In the informed search there are two algorithms
● Best First Search Algorithm(Greedy search)
● A* Search Algorithm
Best-first Search Algorithm (Greedy Search):
● Greedy best-first search algorithm always selects the path which appears best at that
moment. It is the combination of depth-first search and breadth-first search algorithms.
● It uses the heuristic function and search. Best-first search allows us to take the advantages
of both algorithms. With the help of best-first search, at each step, we can choose the
most promising node.
● In the best first search algorithm, we expand the node which is closest to the goal node,
where the closeness is estimated by the heuristic function, i.e.

f(n) = h(n).
● Where h(n) = estimated cost from node n to the goal.
● The greedy best first algorithm is implemented by the priority queue.
Best first search algorithm:
Step 1: Place the starting node into the OPEN list.
Step 2: If the OPEN list is empty, Stop and return failure.
Step 3: Remove the node n which has the lowest value of h(n) from the OPEN list, and
place it in the CLOSED list.
Step 4: Expand the node n, and generate the successors of node n.
Step 5: Check each successor of node n, and find whether any node is a goal node or not.
If any successor node is a goal node, then return success and terminate the search, else
proceed to Step 6.
Step 6: For each successor node, the algorithm computes the evaluation function f(n), and
then checks whether the node is already in either the OPEN or the CLOSED list. If the node is
in neither list, then add it to the OPEN list.
Step 7: Return to Step 2.
Advantages:
● Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
● This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
● It can behave as an unguided depth-first search in the worst case scenario.


● It can get stuck in a loop as DFS.


● This algorithm is not optimal.

Example:

Expand node S and put it in the CLOSED list.

Initialization: Open [A, B], Closed [S]

Iteration 1: Open [A], Closed [S, B]

Iteration 2: Open [E, F, A], Closed [S, B]
             Open [E, A], Closed [S, B, F]


Iteration 3: Open [I, G, E, A], Closed [S, B, F]
             Open [I, E, A], Closed [S, B, F, G]

● Hence the final solution path will be: S → B → F → G


● Time Complexity: The worst case time complexity of greedy best first search is O(b^m).
● Space Complexity: The worst case space complexity of greedy best first search is O(b^m).
Here b is the branching factor and m is the maximum depth of the search space.
● Complete: Greedy best-first search is incomplete, even if the given state space is
finite.
● Optimal: Greedy best first search algorithm is not optimal.
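
Greedy best-first search can be sketched in Python with a priority queue, as below. The graph and the heuristic values are assumptions: the original figure is not available, so they were chosen only to be consistent with the worked trace above (S, then B, then F, then G). Unlike Step 5 of the listed algorithm, this sketch tests for the goal when a node is removed from OPEN, which is a common simplification.

```python
import heapq

# Hypothetical graph and heuristic values, consistent with the trace above.
GRAPH = {"S": ["A", "B"], "B": ["E", "F"], "F": ["I", "G"]}
H = {"S": 13, "A": 12, "B": 4, "E": 8, "F": 2, "I": 9, "G": 0}


def greedy_best_first(graph, h, start, goal):
    """Always expand the OPEN node with the lowest heuristic value h(n)."""
    open_list = [(h[start], start, [start])]  # priority queue ordered by h(n)
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue                          # already expanded via another path
        closed.add(node)
        for child in graph.get(node, []):
            if child not in closed:
                heapq.heappush(open_list, (h[child], child, path + [child]))
    return None                               # OPEN is empty: failure


path = greedy_best_first(GRAPH, H, "S", "G")
print(path)
```

Only h(n) decides the order of expansion, so the cost already paid to reach a node is ignored, which is why the path found need not be the cheapest one.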
A* Search Algorithm:
● A* search is the most commonly known form of best-first search. It uses the heuristic
function h(n), and costs to reach the node n from the start state g(n).
● It has combined features of UCS and greedy best-first search, by which it solves the
problem efficiently. A* search algorithm finds the shortest path through the search space
using the heuristic function.
● This search algorithm expands the search tree and provides optimal results faster. The A* algorithm is similar to UCS except that it uses g(n)+h(n) instead of g(n).
● In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as follows, and this sum is called the fitness number:

f(n) = g(n) + h(n)

Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.

Step 2: Check if the OPEN list is empty or not. If the list is empty, then return failure and stop.

Step 3: Select the node n from the OPEN list which has the smallest value of the evaluation function (g+h). If node n is the goal node, then return success and stop; otherwise continue to Step 4.

Step 4: Expand node n and generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list. If not, then compute the evaluation function for n' and place it into the OPEN list.


Step 5: Otherwise, if node n' is already in OPEN or CLOSED, update its back pointer so that it reflects the path with the lowest g(n') value.

Step 6: Return to Step 2.

Advantages:
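● With a good heuristic, A* search usually performs better than other search algorithms.
● A* search algorithm is optimal and complete under certain conditions on the branching factor, the action costs, and the heuristic.
● This algorithm can solve very complex problems.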
Disadvantages:
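● It does not always produce the shortest path when the heuristic is not admissible, since the search then relies on an approximation that may overestimate the remaining cost.
● In the worst case, the number of nodes that A* expands grows exponentially with the depth of the solution.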
● The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.

Example:

Solution:


Each entry below is a path together with its f value.

Initialization: {(S, 5)}

Iteration 1: {(S → A, 4), (S → G, 10)}
Iteration 2: {(S → A → C, 4), (S → A → B, 7), (S → G, 10)}
Iteration 3: {(S → A → C → G, 6), (S → A → C → D, 11), (S → A → B, 7), (S → G, 10)}

Iteration 4 gives the final result: S → A → C → G is the optimal path, with cost 6.
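The A* steps can be sketched in Python as below. This is a minimal sketch under stated assumptions: the edge costs and heuristic values are not given in the text of these notes, so they are reconstructed here to be consistent with the f values listed in the iterations (for example, S → A has g = 1 and h(A) = 3, giving f = 4). The function name a_star and the dictionary layout are illustrative choices.

```python
import heapq

def a_star(graph, h, start, goal):
    """Expand the OPEN entry with the smallest f = g + h; return (path, cost)."""
    open_heap = [(h[start], 0, start, [start])]      # entries: (f, g, node, path)
    best_g = {start: 0}                              # lowest g found so far per node
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue                                 # stale entry, a cheaper path exists
        for succ, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g                 # keep the back pointer with lowest g
                heapq.heappush(open_heap, (new_g + h[succ], new_g, succ, path + [succ]))
    return None, float("inf")

# Assumed edge costs and heuristic values, chosen to match the iterations above.
graph = {"S": {"A": 1, "G": 10}, "A": {"B": 2, "C": 1}, "B": {"D": 5}, "C": {"D": 3, "G": 4}}
h = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}

print(a_star(graph, h, "S", "G"))
```

Keeping the lowest g value per node plays the role of the back pointer update in Step 5.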
● A* algorithm returns the path which occurred first, and it does not search for all
remaining paths.
● The efficiency of A* algorithm depends on the quality of heuristic.
● A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the cost of the optimal solution.
Complete: A* algorithm is complete as long as:
● The branching factor is finite.
● The cost of every action is fixed and greater than zero.
Optimal: A* search algorithm is optimal if it satisfies the following two conditions:
● Admissibility: The first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic never overestimates the cost to reach the goal; it is optimistic in nature.
● Consistency: The second condition, consistency, is required only for A* graph search.
● If the heuristic function is admissible, then A* tree search will always find the least cost path.


● Time Complexity: The time complexity of A* search algorithm depends on the heuristic function, and in the worst case the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is O(b^d), where b is the branching factor.
● Space Complexity: The space complexity of A* search algorithm is O(b^d).
Problem Reduction
● AO* is an informed search algorithm that works on the basis of a heuristic. It builds on the divide and conquer strategy: a solution to a problem can be obtained by decomposing it into smaller sub-problems.
● Each of these sub-problems can then be solved to get its sub-solution. These sub-solutions can then be recombined to get a solution as a whole. This is called Problem Reduction. AND-OR graphs or AND-OR trees are used for representing the solution.
● This method generates arcs which are called AND arcs. One AND arc may point to any number of successor nodes, all of which must be solved in order for the arc to point to a solution. The AND-OR graph is used to represent various kinds of complex problem solutions.
● The AO* search algorithm is based on the AND-OR graph, which is how it gets its name.
AO* Algorithm
● Example: In the following figure, we have taken the example of the goal: Acquire TV Set.
● This goal is subdivided into two alternatives: 1) Steal TV Set, or 2) Earn some money and Buy TV Set. If we select the second alternative, then both "Earn some money" and "Buy TV Set" must be solved, because they are joined by an AND arc.
● The first alternative, Steal TV Set, is an ordinary OR branch.
● Just as in an OR graph, several arcs may emerge from a single node, indicating a variety of ways in which the original problem might be solved.
● This is why the structure is called not simply an OR graph but rather an AND-OR graph (which also happens to be an AND-OR tree).

Algorithm


1. Initialize the graph to the starting node.

2. Loop until the starting node is labeled SOLVED or until its cost goes above FUTILITY. In the AO* algorithm, f'(n) serves as the estimate of the goodness of a node. FUTILITY is a threshold value: if the estimated cost of a solution becomes greater than FUTILITY, the search is abandoned as too expensive to be practical.
(i) Traverse the graph, starting at the initial node and following the current best path, and accumulate the set of nodes that are on that path and have not yet been expanded.
(ii) Pick one of these unexpanded nodes and expand it. If there are no successors, assign FUTILITY as the value of this node. Otherwise, add its successors to the graph and for each of them compute f'(n). If f'(n) of any node is 0, mark that node as SOLVED.
(iii) Change the f'(n) estimate of the newly expanded node to reflect the new information provided by its successors. Propagate this change backwards through the graph. If any node contains a successor arc whose descendants are all solved, label the node itself as SOLVED.
Advantages of AO*:
● It is Complete
● Will not go in infinite loop
● Less Memory Required
Disadvantages of AO*:
● It is not optimal as it does not explore all the paths once it finds a solution.
Working of the AO* algorithm:
The AO* algorithm works on the formula given below:

f(n) = g(n) + h(n)

where,

● g(n): The actual cost of traversal from the initial state to the current state.
● h(n): The estimated cost of traversal from the current state to the goal state.
● f(n): The estimated total cost of traversal from the initial state to the goal state through n.
Example-


Here, in the above example, all numbers in brackets are heuristic values, i.e., h(n). Each edge is considered to have a cost of 1 by default.

Step-1

Starting from node A, we first calculate the best path.

f(A-B) = g(B) + h(B) = 1 + 4 = 5, where 1 is the default cost of traveling from A to B and 4 is the estimated cost from B to the goal state.

f(A-C-D) = g(C) + h(C) + g(D) + h(D) = 1 + 2 + 1 + 3 = 7. Here the costs of both C and D are added because they are joined by an AND arc. The default cost of traveling from A to C is 1 and from A to D is 1, and the heuristic values given for C and D are 2 and 3 respectively, which makes the total cost 7.


The minimum cost path is chosen, i.e., A-B.

Step-2

Using the same formula as in Step-1, the path costs are now calculated from node B:

f(B-E) = 1 + 6 = 7

f(B-F) = 1 + 8 = 9

Hence, the B-E path has the lower cost. The heuristic of B now has to be updated, since its revised value (7) differs from its original heuristic value (4). The minimum cost, 7, becomes the updated heuristic of B. Because the heuristic of B has changed, the cost at A also has to be calculated again.

f(A-B) = g(B) + updated(h(B)) = 1 + 7 = 8


Step-3

Comparing f(A-B) = 8 and f(A-C-D) = 7, we see that f(A-C-D) is smaller. Hence the path A-C-D needs to be explored.

Now the current node becomes C, and the path costs from C are calculated:

f(C-G) = 1 + 2 = 3

f(C-H-I) = 1 + 0 + 1 + 0 = 2

f(C-H-I) is chosen as the minimum cost path. There is no change in the heuristic of C, since the computed cost (2) matches it. The heuristics of H and I are 0, so they are solved. But the path A-D also needs to be calculated, since D is joined to C by an AND arc.

f(D-J) = 1 + 0 = 1, hence the heuristic of D needs to be updated to 1. Finally, f(A-C-D) needs to be updated.

f(A-C-D) = g(C) + h(C) + g(D) + updated(h(D)) = 1 + 2 + 1 + 1 = 5.


The solved path is therefore A-C-D, with f(A-C-D) = 5.
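The cost revisions in the worked example can be checked with a short Python sketch. Note the assumptions: the AND-OR graph structure below is reconstructed from the steps of the example, and the function applies the cost revision rule exhaustively from the leaves upward. It is therefore not the incremental AO* algorithm itself, only a check of the numbers it arrives at. The name revised_cost is hypothetical.

```python
EDGE_COST = 1   # every edge costs 1 by default, as in the example

# Each node maps to its arcs. An arc is a tuple of children that must ALL be
# solved (an AND arc); a one-element tuple is an ordinary OR alternative.
graph = {
    "A": [("B",), ("C", "D")],
    "B": [("E",), ("F",)],
    "C": [("G",), ("H", "I")],
    "D": [("J",)],
}
# Heuristic values h(n) read off the worked example.
h = {"B": 4, "C": 2, "D": 3, "E": 6, "F": 8, "G": 2, "H": 0, "I": 0, "J": 0}

def revised_cost(node):
    """Return (cost, best_arc) for a node after propagating costs upward."""
    if node not in graph:                      # unexpanded leaf: use its heuristic
        return h[node], None
    best = None
    for arc in graph[node]:
        # An AND arc costs the sum over all of its children.
        cost = sum(EDGE_COST + revised_cost(child)[0] for child in arc)
        if best is None or cost < best[0]:
            best = (cost, arc)
    return best

print(revised_cost("B"))   # revised value of B and the arc it came from
print(revised_cost("A"))   # revised value of A and the chosen arc
```

The value for B comes out as 7 through E, and the value for A as 5 through the AND arc (C, D), matching Step-2 and Step-3.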

Constraint Satisfaction
● As the name suggests, constraint satisfaction means solving a problem under certain constraints or rules.
● Constraint satisfaction is a technique in which a problem is solved when the values assigned to its variables satisfy the constraints or rules of the problem. This type of technique leads to a deeper understanding of the problem structure as well as its complexity.

Constraint satisfaction depends on three components, namely:

● X: A set of variables.
● D: A set of domains, one for each variable, giving the values that the variable may take.
● C: A set of constraints that the variables must satisfy.
● Each constraint is a pair {scope, rel}. The scope is a tuple of the variables which participate in the constraint, and rel is a relation which lists the combinations of values that those variables can take to satisfy the constraint.

Solving a constraint satisfaction problem (CSP) requires:

● A state space
● The notion of a solution.

A state in the state space is defined by assigning values to some or all of the variables, such as {X1=v1, X2=v2, and so on…}.


An assignment of values to variables can be classified in three ways:

● Consistent or Legal Assignment: An assignment which does not violate any constraint or rule.
● Complete Assignment: An assignment in which every variable is assigned a value. A complete assignment that is also consistent is a solution to the CSP.
● Partial Assignment: An assignment which assigns values to only some of the variables.
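The three kinds of assignment can be illustrated with a small Python sketch. The variables X1, X2, X3 and the two "not equal" constraints are hypothetical, chosen only for illustration; classify is an assumed helper name.

```python
def classify(assignment, variables, constraints):
    """Return (is_consistent, is_complete) for a dict of variable -> value."""
    consistent = all(check(assignment) for check in constraints)
    complete = all(var in assignment for var in variables)
    return consistent, complete

variables = ["X1", "X2", "X3"]

# Each constraint returns True unless the values assigned so far violate it.
constraints = [
    lambda a: "X1" not in a or "X2" not in a or a["X1"] != a["X2"],   # X1 != X2
    lambda a: "X2" not in a or "X3" not in a or a["X2"] != a["X3"],   # X2 != X3
]

print(classify({"X1": 1}, variables, constraints))                     # partial, consistent
print(classify({"X1": 1, "X2": 1, "X3": 2}, variables, constraints))   # complete, inconsistent
print(classify({"X1": 1, "X2": 2, "X3": 1}, variables, constraints))   # complete, consistent
```

Only the last assignment is both complete and consistent, so only it is a solution.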

Types of Domains in CSP


There are the following two types of domains which are used by the variables:

● Discrete Domain: The variable takes values from a countable set. The set can be finite (for example, the colors available in map coloring) or infinite (for example, the set of all integers).
● Continuous Domain: The variable takes values from a continuous range, such as a real-valued start time in a scheduling problem.

Constraint Types in CSP


With respect to the variables, there are the following types of constraints:

● Unary Constraints: The simplest type of constraint, which restricts the value of a single variable.
● Binary Constraints: A constraint which relates two variables, for example x1 ≠ x2.
● Global Constraints: A constraint which involves an arbitrary number of variables. For example, requiring the value of x2 to lie between x1 and x3 involves three variables.

Some special types of solution algorithms are used to solve the following types of constraints:

● Linear Constraints: These constraints are commonly used in linear programming, where each variable (holding an integer value) appears in linear form only.
● Non-linear Constraints: These constraints are used in non-linear programming, where each variable (holding an integer value) appears in a non-linear form.

Popular Problems with CSP


The following problems are some of the popular problems that can be solved using CSP:

1. CryptArithmetic (coding alphabets to numbers)
2. n-Queen (n queens should be placed on an n×n board such that no two queens share the same row, column, or diagonal)
3. Map Coloring (coloring different regions of map, ensuring no adjacent regions have the
same color)
4. Crossword (everyday puzzles appearing in newspapers)
5. Sudoku (a number grid)


6. Latin Square Problem

Converting Process & Example


A problem to be converted to CSP requires the following steps:

● Step 1: Create a variable set.
● Step 2: Create a domain set.
● Step 3: Create a constraint set with variables and domains (if possible) after considering the constraints.
● Step 4: Find a solution, that is, a complete assignment that satisfies every constraint.
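The four steps can be sketched for a small map coloring problem. This is a minimal sketch: the four regions, their adjacency, and the three colors are hypothetical, and plain backtracking is used as one possible way to carry out Step 4 (the notes do not prescribe a particular solver).

```python
def solve_csp(variables, domains, neighbors, assignment=None):
    """Backtracking search: extend a consistent partial assignment one variable at a time."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment                              # complete and consistent
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint: adjacent regions must not share a color.
        if all(assignment.get(n) != value for n in neighbors[var]):
            result = solve_csp(variables, domains, neighbors, {**assignment, var: value})
            if result is not None:
                return result
    return None                                        # no value works: backtrack

# Step 1: variable set (hypothetical regions).
variables = ["WA", "NT", "SA", "Q"]
# Step 2: domain set.
domains = {v: ["red", "green", "blue"] for v in variables}
# Step 3: constraint set, written as an adjacency list.
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
# Step 4: find a solution.
solution = solve_csp(variables, domains, neighbors)
print(solution)
```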

Means-Ends Analysis
● We have studied the strategies which can reason either in forward or backward, but a
mixture of the two directions is appropriate for solving a complex and large problem.
Such a mixed strategy makes it possible to first solve the major part of a problem and
then go back and solve the small problems that arise during combining the big parts of
the problem. Such a technique is called Means-Ends Analysis.
● Means-Ends Analysis is a problem-solving technique used in Artificial Intelligence for limiting search in AI programs.
● It is a mixture of backward and forward search techniques.
● The MEA technique was first introduced in 1961 by Allen Newell and Herbert A. Simon in their problem-solving computer program, which was named General Problem Solver (GPS).
● The MEA process centers on the evaluation of the difference between the current state and the goal state.

● The means-ends analysis process can be applied recursively for a problem. It is a strategy
to control search in problem-solving. Following are the main Steps which describe the
working of MEA techniques for solving a problem.

a. First, evaluate the difference between Initial State and final State.
b. Select the various operators which can be applied for each difference.
c. Apply the operator at each difference, which reduces the difference between the current
state and goal state.

Operator Subgoaling

● In the MEA process, we detect the differences between the current state and the goal state. Once a difference is detected, we can apply an operator to reduce it. Sometimes, however, an operator cannot be applied to the current state. In that case we set up a subproblem of reaching a state in which the operator can be applied. This kind of backward chaining, in which operators are selected and then subgoals are set up to establish the preconditions of the operator, is called Operator Subgoaling.

Algorithm
● Let the current state be CURRENT and the goal state be GOAL. The steps of the MEA algorithm are then as follows.

● Step 1: Compare CURRENT to GOAL. If there are no differences between them, return Success and exit.
● Step 2: Otherwise, select the most significant difference and reduce it by performing the following steps until success or failure occurs.
a. Select a new operator O which is applicable to the current difference. If there is no such operator, signal failure.
b. Attempt to apply operator O to CURRENT. Make a description of two states:
i) O-Start, a state in which O's preconditions are satisfied.
ii) O-Result, the state that would result if O were applied in O-Start.
c. If
FIRST-PART <- MEA(CURRENT, O-Start)
and
LAST-PART <- MEA(O-Result, GOAL)
are successful, then signal Success and return the result of combining FIRST-PART, O, and LAST-PART.

Example of Means-Ends Analysis:

Let's take an example where we know the initial state and goal state as given below. In this
problem, we need to get the goal state by finding differences between the initial state and goal
state and applying operators.

Solution:
To solve the above problem, we will first find the differences between initial states and goal
states, and for each difference, we will generate a new state and will apply the operators. The
operators we have for this problem are:


● Move
● Delete
● Expand

1. Evaluating the initial state: In the first step, we will evaluate the initial state and will
compare the initial and Goal state to find the differences between both states.

2. Applying Delete operator: As we can check the first difference is that in the goal state there
is no dot symbol which is present in the initial state, so, first we will apply the Delete operator
to remove this dot.

3. Applying Move Operator: After applying the Delete operator, the new state occurs which we
will again compare with the goal state. After comparing these states, there is another difference
that is the square is outside the circle, so we will apply the Move Operator.


4. Applying Expand Operator: Now a new state is generated in the third step, and we will
compare this state with the goal state. After comparing the states there is still one difference
which is the size of the square, so, we will apply the Expand operator, and finally, it will
generate the goal state.
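The difference-reduction loop of this example can be sketched in Python. The encoding is an assumption made for illustration: a state is a dictionary with three hypothetical features (dot, square_inside_circle, square_size), and each of the operators Delete, Move, and Expand removes one difference. Operator preconditions and operator subgoaling are not modeled here.

```python
def means_ends(current, goal, operators):
    """Return the list of operator names that turns `current` into `goal`, or None."""
    state = dict(current)
    plan = []
    while state != goal:
        # Compare the states and collect the features on which they still differ.
        differences = [key for key in goal if state[key] != goal[key]]
        # Pick the first difference for which an operator is available.
        key = next((k for k in differences if k in operators), None)
        if key is None:
            return None                       # no applicable operator: failure
        name, apply_op = operators[key]
        state = apply_op(state)
        if state[key] != goal[key]:
            return None                       # operator did not reduce the difference
        plan.append(name)
    return plan

initial = {"dot": True, "square_inside_circle": False, "square_size": "small"}
goal = {"dot": False, "square_inside_circle": True, "square_size": "large"}

# Each operator is keyed by the difference it reduces.
operators = {
    "dot": ("Delete", lambda s: {**s, "dot": False}),
    "square_inside_circle": ("Move", lambda s: {**s, "square_inside_circle": True}),
    "square_size": ("Expand", lambda s: {**s, "square_size": "large"}),
}

print(means_ends(initial, goal, operators))
```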


UNIT 3 Knowledge Representation

PREPARED BY
PROF. VISHVA UPADHYAY


What is Knowledge Representation in AI?


● Knowledge representation and reasoning (KR, KRR) is the part of Artificial intelligence
which is concerned with AI agents thinking and how thinking contributes to intelligent
behavior of agents.
● It is responsible for representing information about the real world so that a computer can
understand and can utilize this knowledge to solve complex real world problems such as
diagnosing a medical condition or communicating with humans in natural language.
● Knowledge representation is not just storing data into some database, but it also enables
an intelligent machine to learn from that knowledge and experiences so that it can behave
intelligently like a human.

Knowledge Representations And Mappings


● In order to solve complex problems encountered in artificial intelligence, one needs both
a large amount of knowledge and some mechanism for manipulating that knowledge to
create solutions.
● Knowledge and Representation are two distinct entities.
● They play central but distinguishable roles in the intelligent system.
● Knowledge is a description of the world. It determines a system’s competence by what it
knows.
● Representation is the way knowledge is encoded. It defines a system’s performance
in doing something.

What needs to be represented in AI systems:

● Objects: All the facts about objects in our world domain.
● Events: Events are the actions which occur in our world.
● Performance: It describes behavior which involves knowledge about how to do things.
● Meta-knowledge: It is knowledge about what we know.
● Facts: Facts are the truths about the real world and what we represent.
● Knowledge-Base: The central component of knowledge-based agents is the knowledge base, represented as KB. The knowledge base is a group of sentences.

Types of Knowledge
Declarative Knowledge:
● Declarative knowledge is to know about something.
● It includes concepts, facts, and objects.
● It is also called descriptive knowledge and expressed in declarative sentences.
● It is simpler than procedural knowledge.

Procedural Knowledge
● It is also known as imperative knowledge.


● Procedural knowledge is a type of knowledge which is responsible for knowing how to do something.
● It can be directly applied to any task.
● It includes rules, strategies, procedures, agendas, etc.
● Procedural knowledge depends on the task on which it can be applied.

Meta-knowledge:
● Knowledge about the other types of knowledge is called Meta-knowledge.
Heuristic knowledge:
● Heuristic knowledge represents the knowledge of experts in a field or subject.
● Heuristic knowledge consists of rules of thumb based on previous experience and awareness of approaches; they usually work well but are not guaranteed.

Structural knowledge:
● Structural knowledge is basic knowledge of problem-solving.
● It describes relationships between various concepts such as kind of, part of, and grouping
of something.
● It describes the relationship that exists between concepts or objects.

Approaches To Knowledge Representation


1. Simple relational knowledge:
● It is the simplest way of storing facts. It uses the relational method, in which each fact about a set of objects is set out systematically in columns.
● This approach to knowledge representation is common in database systems, where the relationships between different entities are represented.
● This approach has little opportunity for inference.

Example: The following is the simple relational knowledge representation.

Player     Weight   Age
Player1    65       23
Player2    58       18
Player3    75       24
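The same table can be held as rows in Python, which also shows why this approach offers little inference: we can only look up or filter the stored facts. The helper younger_than is a hypothetical name used for illustration.

```python
# Each row is one fact set out in columns, as in the table above.
players = [
    {"player": "Player1", "weight": 65, "age": 23},
    {"player": "Player2", "weight": 58, "age": 18},
    {"player": "Player3", "weight": 75, "age": 24},
]

def younger_than(rows, age):
    """Return the names of the players whose age is below the given value."""
    return [row["player"] for row in rows if row["age"] < age]

heaviest = max(players, key=lambda row: row["weight"])["player"]

print(younger_than(players, 24))
print(heaviest)
```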


2. Inheritable knowledge:
● In the inheritable knowledge approach, all data must be stored into a hierarchy of classes.
● All classes should be arranged in a generalized form or a hierarchical manner.
● In this approach, we apply inheritance property.
● Elements inherit values from other members of a class.
● This approach contains inheritable knowledge which shows a relation between instance
and class, and it is called instance relation.
● Every individual frame can represent the collection of attributes and its value.
● In this approach, objects and values are represented in Boxed nodes.
● We use Arrows which point from objects to their values.
● Example:
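The inheritance lookup described in this approach can be sketched as follows. This is only an illustration under assumptions: the frames Person, Player, and Player1, their attributes, and the link names isa and instance are hypothetical and are not taken from the notes.

```python
# Each frame stores its own attribute values plus a link to its class.
frames = {
    "Person":  {"legs": 2},
    "Player":  {"isa": "Person", "plays": "cricket"},
    "Player1": {"instance": "Player", "age": 23},
}

def get_value(frame, attribute):
    """Look up an attribute on a frame, climbing instance/isa links if needed."""
    while frame is not None:
        slots = frames[frame]
        if attribute in slots:
            return slots[attribute]
        # Not stored here: inherit from the class one level up.
        frame = slots.get("instance") or slots.get("isa")
    return None

print(get_value("Player1", "age"))     # stored on the instance itself
print(get_value("Player1", "plays"))   # inherited from Player
print(get_value("Player1", "legs"))    # inherited from Person
```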

3. Inferential knowledge:
● The inferential knowledge approach represents knowledge in the form of formal logic.
● This approach can be used to derive more facts.
● It guarantees correctness.
● Example: Let's suppose there are two statements:
a. Marcus is a man


b. All men are mortal


Then they can be represented as:

man(Marcus)
∀x man(x) → mortal(x)
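Deriving the new fact mortal(Marcus) from these two statements can be sketched with a tiny forward chaining loop. The encoding is an assumption for illustration: facts are (predicate, argument) pairs and each rule (premise, conclusion) stands for ∀x premise(x) → conclusion(x).

```python
facts = {("man", "Marcus")}        # man(Marcus)
rules = [("man", "mortal")]        # ∀x man(x) → mortal(x)

def forward_chain(facts, rules):
    """Apply every rule to every matching fact until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, argument in list(derived):
                if predicate == premise and (conclusion, argument) not in derived:
                    derived.add((conclusion, argument))
                    changed = True
    return derived

print(forward_chain(facts, rules))
```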

4. Procedural knowledge:
● Procedural knowledge approach uses small programs and codes which describe how to
do specific things, and how to proceed.
● In this approach, one important rule is used which is the If-Then rule.
● With this knowledge, we can use various programming languages such as LISP (List Processing) and Prolog.
● We can easily represent heuristic or domain-specific knowledge using this approach.
● However, not all cases can be represented with this approach.

Representing Simple Facts in Logic (Propositional Logic OR Boolean Logic)


● Simplest way of knowledge representation.
● "Propositional" means that the logic deals with sentences (propositions).
● Propositional logic (PL) is the simplest form of logic where all the statements are made
by propositions. A proposition is a declarative statement which is either true or false.
● It is a technique of knowledge representation in logical and mathematical form.
● The answer/output should be true or false. Propositional logic is also called Boolean logic
as it works on 0 and 1.
● 1+2 = 3 // True Proposition
● 3+3 = 9 // False Proposition
● The Sun rises from West // False Proposition
● Syntax and semantics are the two most important aspects of PL.
● Syntax means the structure or format of the sentences.
● Semantics means the meaning of a sentence with that particular syntax.
Syntax of propositional logic:
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:

a. Atomic Propositions
b. Compound propositions

Atomic Proposition: Atomic propositions are simple propositions. Each consists of a single proposition symbol. These are sentences which must be either true or false.


Example:

a) "2+2 is 4" is an atomic proposition, and it is true.

b) "The Sun is cold" is also an atomic proposition, and it is false.

Compound proposition: Compound propositions are constructed by combining simpler or atomic propositions, using parentheses and logical connectives.

Example:

a) "It is raining today, and the street is wet."


b) "Ankit is a doctor, and his clinic is in Mumbai."

Facts about propositional logic:


● Upper case letters A, B, C, P, Q, R are used to represent statements.
● ^, v, →, ↔ (also written ⇔), ¬ are used to represent AND, OR, implies, bi-conditional, and NOT respectively.
● Complex statements are handled by grouping connectives within parentheses.
● Propositional logic is also called Boolean logic as it works on 0 and 1.

Logical Connectives:
Negation ¬ P
● It represents a Negative condition. P is a positive statement, and ¬ P indicates NOT
condition. Example: Today is Monday (P), Today is not a Monday (¬ P)
Conjunction P ^ Q
● It joins two statements P, Q with the AND clause.
● Example: Ram is a cricket player (P). Ram is a Hockey player (Q). Ram plays both cricket and Hockey is represented by (P ^ Q).
● P = Rohan is intelligent, Q = Rohan is hardworking. Rohan is intelligent and hardworking is represented by P ^ Q.
Disjunction P v Q
● It joins two statements P, Q with the OR clause.
● Example: Ram leaves for Mumbai (P). Ram leaves for Chennai (Q). Ram leaves for Chennai or Mumbai is represented by (P v Q). The disjunction is inclusive: P v Q is true when at least one of P and Q is true, and false only when both are false.
● P = Ritika is a Doctor, Q = Ritika is an Engineer. Ritika is a Doctor or an Engineer is represented by P v Q.
Implication P → Q


● Sentence (Q) is dependent on sentence (P), and this is called implication. It follows the rule of the if-then clause: if sentence P is true, then sentence Q is true. The condition is unidirectional.
● Example: If it is Sunday (P) then I will go to a movie (Q), and it is represented as P → Q.
● P = It is raining, and Q = Street is wet, so "If it is raining, then the street is wet" is represented as P → Q.
Bi-conditional P ⇔ Q
● Sentence (Q) is dependent on sentence (P) and vice versa; the conditions are bi-directional in this connective. If a conditional statement and its converse are both true, the result is a bi-conditional (implication in both directions, P → Q and Q → P). P ⇔ Q is true exactly when P and Q have the same truth value.
● P = I am breathing, Q = I am alive. "I am breathing if and only if I am alive" is represented as P ⇔ Q.

Truth Table

Columns: P, Q, Negation (¬P, ¬Q), Conjunction (P^Q), Disjunction (PvQ), Implication (P→Q), Bi-conditional (P⇔Q)

P       Q       ¬P      ¬Q      P^Q     PvQ     P→Q     P⇔Q
True    True    False   False   True    True    True    True
True    False   False   True    False   True    False   False
False   True    True    False   False   True    True    False
False   False   True    True    False   False   True    True
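The truth table can be regenerated with a short Python sketch. The helper names implies and iff are illustrative; Python has no built-in implication operator, so P → Q is written as (not P) or Q.

```python
from itertools import product

def implies(p, q):
    """P → Q is false only when P is true and Q is false."""
    return (not p) or q

def iff(p, q):
    """P ⇔ Q is true exactly when P and Q have the same truth value."""
    return p == q

def truth_table():
    """Rows of (P, Q, ¬P, ¬Q, P^Q, PvQ, P→Q, P⇔Q) in the order used above."""
    return [
        (p, q, not p, not q, p and q, p or q, implies(p, q), iff(p, q))
        for p, q in product([True, False], repeat=2)
    ]

for row in truth_table():
    print(row)
```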

Precedence of connectives:
● Just like arithmetic operators, there is a precedence order for propositional connectors or
logical operators. This order should be followed while evaluating a propositional
problem.


Precedence            Operators
First Precedence      Parenthesis
Second Precedence     Negation
Third Precedence      Conjunction (AND)
Fourth Precedence     Disjunction (OR)
Fifth Precedence      Implication
Sixth Precedence      Biconditional
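The effect of this order can be seen with the formula ¬P v Q ^ R. Python's not, and, and or happen to follow the same relative precedence (not binds tighter than and, which binds tighter than or), so a short sketch can show that the formula groups as (¬P) v (Q ^ R) and not left to right. The function names are hypothetical.

```python
def formula(p, q, r):
    """¬P v Q ^ R written without parentheses."""
    return not p or q and r

def by_precedence(p, q, r):
    """Grouping given by the precedence table: (¬P) v (Q ^ R)."""
    return (not p) or (q and r)

def left_to_right(p, q, r):
    """A different grouping, ((¬P) v Q) ^ R, which the table rules out."""
    return ((not p) or q) and r

# With P, Q, R all false the two groupings give different answers.
print(formula(False, False, False))
print(by_precedence(False, False, False))
print(left_to_right(False, False, False))
```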

What are the disadvantages of Propositional Logic?


● We cannot represent relations like all, some, or none with propositional logic. Example: All the girls are intelligent.
● Propositional logic has limited expressive power.
● In propositional logic, we cannot describe statements in terms of their properties or
logical relationships.

Predicate Logic OR First Order Logic


● First-order logic is another way of knowledge representation in artificial intelligence. It is
an extension to propositional logic.
● FOL is sufficiently expressive to represent the natural language statements in a concise
way.
● First-order logic is also known as Predicate logic or First-order predicate logic. It is a powerful language that expresses information about objects more easily and can also express the relationships between those objects.
● First-order logic (like natural language) does not only assume that the world contains facts, as propositional logic does, but also assumes the following things in the world:
○ Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
○ Relations: These can be unary relations such as: red, round, or n-ary relations such as: is adjacent, the sister of, brother of, has color, comes between
○ Function: Father of, best friend, third inning of, end of, ......
● Like a natural language, first-order logic also has two main parts:
○ Syntax
○ Semantics


Syntax of First-Order logic:


The syntax of FOL determines which collection of symbols is a logical expression in first-order
logic. The basic syntactic elements of first-order logic are symbols. We write statements in
short-hand notation in FOL.

Basic Elements of First-order logic:


Following are the basic elements of FOL syntax:
Constant       1, 2, A, John, Mumbai, cat, ....
Variables      x, y, z, a, b, ....
Predicates     Brother, Father, >, ....
Function       sqrt, LeftLegOf, ....
Connectives    ∧, ∨, ¬, ⇒, ⇔
Equality       =
Quantifier     ∀, ∃

Atomic sentences:
● Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.
● We can represent atomic sentences as Predicate(term 1, term 2, ......, term n).
Example: Ravi and Ajay are brothers: Brothers(Ravi, Ajay).
Chinky is a cat: cat(Chinky).

Complex Sentences:
● Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
● Subject: Subject is the main part of the statement.
● Predicate: A predicate can be defined as a relation, which binds two atoms together in a
statement.
Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of
the statement and second part "is an integer," is known as a predicate.


Quantifiers in First-order logic:


● A quantifier is a language element which generates quantification, and quantification specifies the quantity of specimens in the universe of discourse.
● Quantifiers are the symbols that determine the range and scope of a variable in a logical expression. There are two types of quantifier:
a. Universal quantifier (for all, everyone, everything)
b. Existential quantifier (for some, at least one).

Universal Quantifier:
A universal quantifier is a logical symbol which specifies that the statement
within its range is true for everything or every instance of a particular thing.
The universal quantifier is represented by the symbol ∀, which resembles an inverted A.

With the universal quantifier, the main connective is usually the implication "→".

If x is a variable, then ∀x is read as:
● For all x
● For each x
● For every x.

Example:
All men drink coffee.
Let x be a variable that ranges over all objects; the statement can then be represented as:
∀x man(x) → drink(x, coffee).
It is read as: For all x, if x is a man, then x drinks coffee.

Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.


It is denoted by the logical operator ∃, which resembles a reversed E. When it is used with a
predicate variable, it is called an existential quantifier.

With the existential quantifier, the main connective is usually the conjunction symbol (∧).

If x is a variable, then the existential quantifier is written ∃x or ∃(x), and
it is read as:
● There exists an x.
● For some x.
● For at least one x.

Example:
Some boys are intelligent.

∃x: boys(x) ∧ intelligent(x)

It is read as: There exists some x such that x is a boy and x is intelligent.

Properties of Quantifiers:
● For the universal quantifier, ∀x∀y is equivalent to ∀y∀x.
● For the existential quantifier, ∃x∃y is equivalent to ∃y∃x.
● ∃x∀y is not equivalent to ∀y∃x.

Some Examples of FOL using quantifier:
1. All birds fly.
In this statement the predicate is "fly(x)," where x is a bird.
Since the statement is about all birds, it is represented as follows:
∀x bird(x) → fly(x).

2. Every man respects his parents.

In this statement, the predicate is "respects(x, y)," where x = man and y = parent.
Since the statement is about every man, we use ∀, and it is represented as
follows:
∀x man(x) → respects(x, parent).

3. Some boys play cricket.

In this statement, the predicate is "play(x, y)," where x = boys and y =
game. Since the statement is about some boys, we use ∃, and it is
represented as:


∃x boys(x) ∧ play(x, cricket).

4. Not all students like both Mathematics and Science.

In this statement, the predicate is "like(x, y)," where x = student and y = subject.
Since the statement is about not all students, we use ∀ with negation, which gives
the following representation:
¬∀(x) [student(x) → like(x, Mathematics) ∧ like(x, Science)].

5. Only one student failed in Mathematics.

In this statement, the predicate is "failed(x, y)," where x = student and y = subject.
Since exactly one student failed in Mathematics, we use the following
representation:
∃(x) [student(x) ∧ failed(x, Mathematics) ∧ ∀(y) [¬(x==y) ∧
student(y) → ¬failed(y, Mathematics)]].
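
Over a finite domain, the two quantifiers can be checked directly, which also shows why ∃ is paired with ∧ rather than →. The following is a minimal Python sketch: the two-element domain and the helper names (implies, all_boys_play, some_boy_plays, some_boy_plays_wrong) are hypothetical and chosen only for illustration.

```python
def implies(p, q):
    """Material implication: p → q is false only when p is true and q is false."""
    return (not p) or q

def all_boys_play(domain):
    # ∀x boy(x) → cricket(x)
    return all(implies(x["boy"], x["cricket"]) for x in domain)

def some_boy_plays(domain):
    # ∃x boy(x) ∧ cricket(x)
    return any(x["boy"] and x["cricket"] for x in domain)

def some_boy_plays_wrong(domain):
    # ∃x boy(x) → cricket(x): satisfied by any x that is not a boy
    return any(implies(x["boy"], x["cricket"]) for x in domain)

# Hypothetical domain: one boy who does not play cricket, and one cat.
domain = [
    {"name": "Ravi", "boy": True, "cricket": False},
    {"name": "Chinky", "boy": False, "cricket": False},
]

print(all_boys_play(domain))         # Ravi is a counterexample
print(some_boy_plays(domain))        # no boy plays cricket here
print(some_boy_plays_wrong(domain))  # Chinky makes the implication vacuously true
```

In this domain no boy plays cricket, yet the version written with → still evaluates to true, because the implication holds vacuously for any object that is not a boy.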

Free and Bound Variables:


Quantifiers interact with the variables that appear in a formula. There are two types of
variables in first-order logic, which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of
any quantifier that binds it.
Example: ∀x ∃(y)[P(x, y, z)], where z is a free variable.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the
scope of a quantifier that binds it.
Example: ∀x ∀y [A(x) ∧ B(y)], here x and y are the bound variables.

Difference between Propositional Logic and Predicate Logic in AI

Propositional Logic | Predicate Logic
Propositional logic is the logic that deals with a collection of declarative statements which have a truth value, true or false. | Predicate logic is an expression consisting of variables with a specified domain. It consists of objects, relations and functions between the objects.
It is the basic and most widely used logic. Also known as Boolean logic. | It is an extension of propositional logic covering predicates and quantification.
A proposition has a specific truth value, either true or false. | A predicate's truth value depends on the variables' value.
Scope analysis is not done in propositional logic. | Predicate logic helps analyze the scope of the subject over the predicate. There are three quantifiers: Universal Quantifier (∀) depicting for all, Existential Quantifier (∃) depicting there exists some, and Uniqueness Quantifier (∃!) depicting exactly one.
Propositions are combined with Logical Operators or Logical Connectives like Negation (¬), Disjunction (∨), Conjunction (∧), Exclusive OR (⊕), Implication (⇒), Bi-Conditional or Double Implication (⇔). | Predicate logic adds to this by introducing quantifiers to the existing proposition.
It is a more generalized representation. | It is a more specialized representation.
It cannot deal with sets of entities. | It can deal with sets of entities with the help of quantifiers.

Resolution in First Order Logic


● Resolution is a theorem-proving technique that proceeds by building refutation proofs,
i.e., proofs by contradiction.
● Resolution is used when a set of statements is given and we need to prove a conclusion from
those statements. Unification is a key concept in proofs by resolution. Resolution is a
single inference rule which can efficiently operate on the conjunctive normal form or
clausal form.
● Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to
be in conjunctive normal form or CNF.

Steps for Resolution:


1. Convert the facts into first-order logic.
2. Convert the FOL statements into CNF.
3. Negate the statement to be proved (proof by contradiction).
4. Draw the resolution graph (unification).

Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats and is not killed by is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.

Step-1: Conversion of Facts into FOL


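In the first step we convert all the given statements into first-order logic. Statements f and g
relate "killed" and "alive"; statement h is the conclusion to be proved.
a. ∀x food(x) → likes(John, x)
b. food(Apple) ∧ food(vegetables)
c. ∀x ∀y eats(x, y) ∧ ¬killed(x) → food(y)
d. eats(Anil, Peanuts) ∧ alive(Anil)
e. ∀x eats(Anil, x) → eats(Harry, x)
f. ∀x ¬killed(x) → alive(x)
g. ∀x alive(x) → ¬killed(x)
h. likes(John, Peanuts).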


Step-2: Conversion of FOL into CNF


In first-order logic resolution, the FOL statements must be converted into CNF, because the CNF form makes
resolution proofs easier.

● Eliminate all implications (→) and rewrite

a. ∀x ¬food(x) ∨ likes(John, x)
b. food(Apple) ∧ food(vegetables)
c. ∀x ∀y ¬[eats(x, y) ∧ ¬killed(x)] ∨ food(y)
d. eats(Anil, Peanuts) ∧ alive(Anil)
e. ∀x ¬eats(Anil, x) ∨ eats(Harry, x)
f. ∀x ¬[¬killed(x)] ∨ alive(x)
g. ∀x ¬alive(x) ∨ ¬killed(x)
h. likes(John, Peanuts).

● Move negation (¬) inwards and rewrite

a. ∀x ¬food(x) ∨ likes(John, x)
b. food(Apple) ∧ food(vegetables)
c. ∀x ∀y ¬eats(x, y) ∨ killed(x) ∨ food(y)
d. eats(Anil, Peanuts) ∧ alive(Anil)
e. ∀x ¬eats(Anil, x) ∨ eats(Harry, x)
f. ∀x killed(x) ∨ alive(x)
g. ∀x ¬alive(x) ∨ ¬killed(x)
h. likes(John, Peanuts).

● Rename variables or standardize variables

a. ∀x ¬food(x) ∨ likes(John, x)
b. food(Apple) ∧ food(vegetables)
c. ∀y ∀z ¬eats(y, z) ∨ killed(y) ∨ food(z)
d. eats(Anil, Peanuts) ∧ alive(Anil)
e. ∀w ¬eats(Anil, w) ∨ eats(Harry, w)
f. ∀g killed(g) ∨ alive(g)
g. ∀k ¬alive(k) ∨ ¬killed(k)
h. likes(John, Peanuts).

● Eliminate existential quantifiers.

In this step, we eliminate the existential quantifier ∃; this
process is known as Skolemization. Since this example problem
contains no existential quantifier, all the statements
remain the same in this step.

● Drop universal quantifiers.

In this step we drop all universal quantifiers: every remaining variable is
implicitly universally quantified, so the quantifiers need not be written.
a. ¬food(x) ∨ likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬eats(y, z) ∨ killed(y) ∨ food(z)
e. eats(Anil, Peanuts)
f. alive(Anil)
g. ¬eats(Anil, w) ∨ eats(Harry, w)
h. killed(g) ∨ alive(g)
i. ¬alive(k) ∨ ¬killed(k)
j. likes(John, Peanuts).

The statements "food(Apple) ∧ food(vegetables)" and "eats(Anil, Peanuts) ∧ alive(Anil)" are each
written as two separate statements.
● Distribute disjunction ∨ over conjunction ∧.
This step does not make any change in this problem.

Step-3: Negate the statement to be proved


In this step, we apply negation to the conclusion statement, which is written as
¬likes(John, Peanuts)

Step-4: Draw Resolution graph:


Now in this step, we will solve the problem by a resolution tree using substitution. For the above
problem, it will be given as follows:


Hence the negation of the conclusion leads to a contradiction with the given
set of statements, which proves the conclusion.

Explanation of Resolution graph:


● In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x) get
resolved (canceled) by the substitution {Peanuts/x}, and we are left with ¬food(Peanuts).
● In the second step of the resolution graph, ¬food(Peanuts) and food(z) get resolved
(canceled) by the substitution {Peanuts/z}, and we are left with ¬eats(y, Peanuts) ∨
killed(y).
● In the third step of the resolution graph, ¬eats(y, Peanuts) and eats(Anil, Peanuts) get
resolved by the substitution {Anil/y}, and we are left with killed(Anil).
● In the fourth step of the resolution graph, killed(Anil) and ¬killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬alive(Anil).
● In the last step of the resolution graph, ¬alive(Anil) and alive(Anil) get resolved, leaving the empty clause.
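
The refutation above can be replayed mechanically. The following is a minimal Python sketch, assuming the substitutions {Peanuts/x}, {Peanuts/z}, {Anil/y} and {Anil/k} have already been applied by hand so that every clause is ground. The literal strings and the helper names negate, resolve and refutes are illustrative choices, not part of the notes. It performs plain propositional resolution by saturation; it is not a full first-order prover with unification.

```python
from itertools import combinations

def negate(lit):
    """'~p' <-> 'p' for literals written as strings."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of two clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def refutes(clauses):
    """Saturate under resolution; True if the empty clause is derived."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                if not r:          # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:           # nothing new can be derived
            return False
        known |= new

# Ground instances of the clauses used in the resolution graph.
kb = [
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"}),
    frozenset({"eats(Anil,Peanuts)"}),
    frozenset({"alive(Anil)"}),
    frozenset({"~alive(Anil)", "~killed(Anil)"}),
]
negated_goal = frozenset({"~likes(John,Peanuts)"})

proved = refutes(kb + [negated_goal])
print(proved)
```

Adding the negated goal makes the clause set unsatisfiable, so the empty clause is derived; the knowledge base on its own is consistent, so refutes(kb) finds no contradiction.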

Difference between Procedural and Declarative Knowledge

Procedural Knowledge | Declarative Knowledge
It is also known as interpretive knowledge. | It is also known as descriptive knowledge.
Procedural knowledge means how a particular thing can be accomplished. | Declarative knowledge means basic knowledge about something.
Procedural knowledge is generally less popular. | Declarative knowledge is more popular.
Procedural knowledge can't be easily communicated. | Declarative knowledge can be easily communicated.
Procedural knowledge is generally process-oriented in nature. | Declarative knowledge is data-oriented in nature.
In procedural knowledge, debugging and validation are not easy. | In declarative knowledge, debugging and validation are easy.
Procedural knowledge is less effective in competitive programming. | Declarative knowledge is more effective in competitive programming.

Forward Reasoning/Forward Chaining and Backward Reasoning/Backward Chaining


Inference engine:

● The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known
facts. The first inference engine was part of the expert system. Inference engine
commonly proceeds in two modes, which are:
a. Forward chaining
b. Backward chaining

Forward Chaining
● Forward chaining is also known as a forward deduction or forward reasoning method
when using an inference engine.
● The Forward-chaining algorithm starts from known facts, triggers all rules whose
premises are satisfied, and adds their conclusion to the known facts. This process repeats
until the problem is solved.

Properties of Forward-Chaining:


● It is a bottom-up approach, as it moves from bottom to top.
● It is a process of making a conclusion based on known facts or data, by starting from the
initial state and reaching the goal state.
● The forward-chaining approach is also called data-driven, as we reach the goal using available
data.
● The forward-chaining approach is commonly used in expert systems such as CLIPS, and in
business and production rule systems.

Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an
enemy of America, has some missiles, and all the missiles were sold to it by Robert, who is an
American citizen."
Prove that "Robert is a criminal."

Facts Conversion into FOL:


● It is a crime for an American to sell weapons to hostile nations.
(Let's say p, q, and r are variables)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) →
Criminal(p) ...(1)
● Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). It can
be written as two definite clauses by using Existential
Instantiation, introducing a new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
● All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
● Missiles are weapons.
Missile(p) → Weapon(p) .......(5)
● An enemy of America is known as hostile.
Enemy(p, America) → Hostile(p) ........(6)
● Country A is an enemy of America.
Enemy(A, America) .........(7)
● Robert is an American.
American(Robert). ..........(8)

Forward chaining proof:


Step-1:


In the first step we start with the known facts and choose the sentences which do not
have implications: American(Robert), Enemy(A, America), Owns(A, T1), and
Missile(T1). These facts form the first level of the proof.

Step-2:
At the second step, we look for the facts that can be inferred from the available facts, i.e. the rules
whose premises are satisfied.
Rule-(1) does not have its premises satisfied, so it is not applied in the first iteration.
Facts (2) and (3) are already added.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is inferred
from the conjunction of facts (2) and (3).
Rule-(5) is satisfied with the substitution {p/T1}, so Weapon(T1) is added, which is inferred from fact (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred from
fact (7).

Step-3:
At step-3, Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we
can add Criminal(Robert), which is inferred from all the available facts. Hence we have reached our goal
statement.


Hence it is proved that Robert is Criminal using a forward chaining approach.
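
The three steps above can be sketched in code. The following is a minimal Python illustration, assuming atoms are written as tuples, argument names starting with a lowercase letter are variables, and all other arguments are constants. The helper names (is_var, match, match_all, substitute, forward_chain) are hypothetical and chosen for this sketch only; it handles flat atoms without nested function terms.

```python
def is_var(term):
    """Assumption of this sketch: lowercase-initial arguments are variables."""
    return term[0].islower()

def match(pattern, fact, subst):
    """Extend subst so that pattern becomes equal to the ground fact, else None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    subst = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if subst.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return subst

def match_all(premises, facts, subst):
    """Yield every substitution that satisfies all premises against the facts."""
    if not premises:
        yield subst
        return
    for fact in facts:
        s = match(premises[0], fact, subst)
        if s is not None:
            yield from match_all(premises[1:], facts, s)

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(t, t) for t in atom[1:])

def forward_chain(facts, rules, goal):
    """Fire all satisfied rules repeatedly until the goal appears or nothing new is added."""
    facts = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            for s in match_all(premises, facts, {}):
                derived = substitute(conclusion, s)
                if derived not in facts:
                    new.add(derived)
        if goal in facts | new:
            return True
        if not new:
            return False
        facts |= new

rules = [
    ([("American", "p"), ("Weapon", "q"), ("Sells", "p", "q", "r"), ("Hostile", "r")],
     ("Criminal", "p")),                                               # rule (1)
    ([("Missile", "p"), ("Owns", "A", "p")], ("Sells", "Robert", "p", "A")),  # rule (4)
    ([("Missile", "p")], ("Weapon", "p")),                             # rule (5)
    ([("Enemy", "p", "America")], ("Hostile", "p")),                   # rule (6)
]
facts = [("Owns", "A", "T1"), ("Missile", "T1"),
         ("Enemy", "A", "America"), ("American", "Robert")]            # facts (2), (3), (7), (8)

result = forward_chain(facts, rules, ("Criminal", "Robert"))
print(result)
```

The first pass of the loop adds Sells(Robert, T1, A), Weapon(T1) and Hostile(A); the second pass fires rule (1), which matches Step-2 and Step-3 of the proof.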

Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning method when
using an inference engine. A backward chaining algorithm is a form of reasoning, which starts
with the goal and works backward, chaining through rules to find known facts that support the
goal.
Properties of backward chaining:
● It is known as a top-down approach.
● Backward-chaining is based on modus ponens inference rule.
● In backward chaining, the goal is broken into sub-goal or sub-goals to prove the facts
true.
● It is called a goal-driven approach, as a list of goals decides which rules are selected and
used.
● The backward-chaining algorithm is used in game theory, automated theorem proving tools,
inference engines, proof assistants, and various AI applications.
● The backward-chaining method mostly uses a depth-first search strategy for proof.

Example:
In backward chaining, we use the same example as above and rewrite all the rules.
● American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
● Owns(A, T1) ........(2)
● Missile(T1) .......(3)
● ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
● Missile(p) → Weapon(p) .......(5)
● Enemy(p, America) → Hostile(p) ........(6)
● Enemy(A, America) .........(7)
● American(Robert). ..........(8)

Backward-Chaining proof:
In backward chaining, we start with our goal predicate, which is Criminal(Robert), and then
work backward through the rules.

Step-1:
At the first step, we take the goal fact. From the goal fact we derive sub-goals, and
finally we prove those sub-goals true. Our goal fact is "Robert is a criminal," so its
predicate is Criminal(Robert).

Step-2:
At the second step, we find the rules whose conclusion matches the goal. As we can
see in Rule-(1), the goal predicate Criminal(Robert) matches with the substitution {p/Robert}. So
we add all the conjunctive premises below the first level and replace p with Robert.
Here American(Robert) is a known fact, so it is proved here.

Step-3:
At step-3, the sub-goal Weapon(q) is reduced to Missile(q) by Rule-(5).
Missile(q) is satisfied by the fact Missile(T1), so Weapon(q) is true with the substitution {q/T1}.


Step-4:
At step-4, the sub-goal Sells(Robert, T1, r) is reduced by Rule-(4) to Missile(T1) and Owns(A, T1),
with the substitution {r/A}. Both are known facts, so these two statements are proved
here.

Step-5:
At step-5, the sub-goal Hostile(A) is reduced by Rule-(6) to Enemy(A, America), which is a known fact.
Hence all the statements are proved true using backward chaining.
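
The goal-driven search can be sketched as a small depth-first prover. The following is a minimal Python illustration, assuming the same tuple encoding as before (lowercase-initial arguments are variables) and treating facts as rules with no premises. The helper names (is_var, walk, unify, rename, prove) are hypothetical; the sketch handles flat atoms only and does no loop detection, which is sufficient for this non-recursive rule set.

```python
import itertools

def is_var(t):
    """Assumption of this sketch: lowercase-initial arguments are variables."""
    return t[0].islower()

def walk(t, s):
    """Follow variable bindings in substitution s to their current value."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two flat atoms under s; return the extended substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

_counter = itertools.count()

def rename(rule):
    """Give a rule fresh variable names so separate uses do not clash."""
    n = next(_counter)
    def fix(atom):
        return (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    premises, conclusion = rule
    return [fix(p) for p in premises], fix(conclusion)

def prove(goals, rules, s):
    """Depth-first search: yield each substitution that proves every goal."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for rule in rules:
        premises, conclusion = rename(rule)
        s2 = unify(conclusion, first, s)
        if s2 is not None:
            yield from prove(premises + rest, rules, s2)

kb = [
    ([("American", "p"), ("Weapon", "q"), ("Sells", "p", "q", "r"), ("Hostile", "r")],
     ("Criminal", "p")),                                               # rule (1)
    ([], ("Owns", "A", "T1")),                                         # fact (2)
    ([], ("Missile", "T1")),                                           # fact (3)
    ([("Missile", "p"), ("Owns", "A", "p")], ("Sells", "Robert", "p", "A")),  # rule (4)
    ([("Missile", "p")], ("Weapon", "p")),                             # rule (5)
    ([("Enemy", "p", "America")], ("Hostile", "p")),                   # rule (6)
    ([], ("Enemy", "A", "America")),                                   # fact (7)
    ([], ("American", "Robert")),                                      # fact (8)
]

answer = next(prove([("Criminal", "Robert")], kb, {}), None)
print(answer is not None)
```

Each recursive call replaces the first open goal with the premises of a matching rule, which corresponds to expanding one level of the proof tree in Steps 2 to 5.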


Difference between Forward Chaining and Backward Chaining

Forward Chaining | Backward Chaining
Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal. | Backward chaining starts from the goal and works backward through inference rules to find the required facts that support the goal.
It is a bottom-up approach. | It is a top-down approach.
Forward chaining is known as a data-driven inference technique, as we reach the goal using the available data. | Backward chaining is known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts.
Forward chaining reasoning applies a breadth-first search strategy. | Backward chaining reasoning applies a depth-first search strategy.
Forward chaining tests all the available rules. | Backward chaining tests only the few required rules.
Forward chaining is suitable for planning, monitoring, control, and interpretation applications. | Backward chaining is suitable for diagnostic, prescription, and debugging applications.
Forward chaining can generate an infinite number of possible conclusions. | Backward chaining generates a finite number of possible conclusions.
It operates in the forward direction. | It operates in the backward direction.
Forward chaining is aimed at any conclusion. | Backward chaining is aimed only at the required data.


UNIT 4 Symbolic Reasoning Under Uncertainty

PREPARED BY
PROF. VISHVA UPADHYAY


What is Nonmonotonic Reasoning?

Monotonic Reasoning:
● In monotonic reasoning, once a conclusion is drawn, it remains the same even if
we add other information to the existing information in our knowledge base.
● In monotonic reasoning, adding knowledge does not decrease the set of propositions that
can be derived.
● To solve monotonic problems, we derive valid conclusions from the available facts
only, and they are not affected by new facts.
● Any theorem proving is an example of monotonic reasoning.
Example:
● Earth revolves around the Sun.
● It is a true fact, and it cannot be changed even if we add another sentence to the
knowledge base, like "The moon revolves around the earth" or "Earth is not round," etc.

Advantages of Monotonic Reasoning:

● In monotonic reasoning, each old proof always remains valid.
● If we deduce some facts from the available facts, they always remain valid.

Disadvantages of Monotonic Reasoning:

● We cannot represent real-world scenarios using monotonic reasoning.
● Hypothetical knowledge cannot be expressed with monotonic reasoning, which means
facts should be true.
● Since we can only derive conclusions from the old proofs, new knowledge from the real
world cannot be added.

Non-monotonic Reasoning

● In non-monotonic reasoning, some conclusions may be invalidated if we add more
information to our knowledge base.
● A logic is said to be non-monotonic if some conclusions can be invalidated by adding
more knowledge to our knowledge base.
● Non-monotonic reasoning deals with incomplete and uncertain models.
● "Human perceptions of various things in daily life" is a general example of
non-monotonic reasoning.

Example:
● Birds can fly
● Penguins cannot fly
● Pitty is a bird

● From the above sentences, we can conclude that Pitty can fly.
● However, if we add another sentence to the knowledge base, "Pitty is a penguin",
we conclude "Pitty cannot fly", which invalidates the above conclusion.
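
The Pitty example can be sketched as a default rule whose conclusion is withdrawn when an exception becomes known. The following is a minimal Python illustration; the tuple encoding of facts and the function name can_fly are assumptions made for this sketch, not a standard non-monotonic logic.

```python
def can_fly(name, kb):
    """Default rule: a bird flies unless it is known to be a penguin."""
    if ("penguin", name) in kb:
        return False
    return ("bird", name) in kb

kb = {("bird", "Pitty")}
before = can_fly("Pitty", kb)    # conclusion drawn from the default rule

kb.add(("penguin", "Pitty"))     # new knowledge arrives
after = can_fly("Pitty", kb)     # the earlier conclusion is withdrawn

print(before, after)
```

Adding a fact shrank the set of conclusions, which is exactly what monotonic reasoning rules out.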


Advantages of Non-monotonic reasoning:


● For real-world systems such as Robot navigation, we can use non-monotonic reasoning.
● In Non-monotonic reasoning, we can choose probabilistic facts or can make assumptions.

Disadvantages of Non-monotonic Reasoning:


● In non-monotonic reasoning, the old facts may be invalidated by adding new sentences.
● It cannot be used for theorem proving.

Difference between monotonic and nonmonotonic reasoning

Monotonic Reasoning | Non-Monotonic Reasoning
Monotonic reasoning is the process which does not change its direction; it moves in one direction only. | Non-monotonic reasoning is the process which changes its direction or values as the knowledge base increases.
Monotonic reasoning deals with very specific types of models, which have valid proofs. | Non-monotonic reasoning deals with incomplete or unknown facts.
The addition of knowledge won't change the result. | The addition of knowledge can invalidate the previous conclusions and change the result.
In monotonic reasoning, results are always true; therefore, the set of propositions will only increase. | In non-monotonic reasoning, results and sets of propositions will increase and decrease based on the added knowledge.
Monotonic reasoning is based on true facts. | Non-monotonic reasoning is based on assumptions.
Deductive reasoning is a type of monotonic reasoning. | Abductive reasoning and human reasoning are non-monotonic types of reasoning.


UNIT 5 Probabilistic Reasoning

PREPARED BY
PROF. VISHVA UPADHYAY


Uncertainty:
So far, we have represented knowledge using first-order logic and propositional logic with certainty, which
means we were sure about the predicates. With this knowledge representation, we might
write A→B, which means if A is true then B is true. But consider a situation where we are
not sure whether A is true or not; then we cannot express this statement. This situation
is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:
Following are some leading causes of uncertainty in the real world.
1. Information obtained from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

Probabilistic reasoning:
● Probabilistic reasoning is a way of knowledge representation where we apply the concept
of probability to indicate the uncertainty in knowledge.
● In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
● We use probability in probabilistic reasoning because it provides a way to handle the
uncertainty that is the result of someone's laziness and ignorance.
● In the real world, there are lots of scenarios, where the certainty of something is not
confirmed, such as "It will rain today," "behavior of someone for some situations," "A
match between two teams or two players." These are probable sentences for which we
can assume that it will happen but are not sure about it, so here we use probabilistic
reasoning.

Need of probabilistic reasoning in AI:


● When there are unpredictable outcomes.
● When specifications or possibilities of predicates become too large to handle.
● When an unknown error occurs during an experiment.

Probability:
● Probability can be defined as the chance that an uncertain event will occur. It is the
numerical measure of the likelihood that an event will occur. The value of a probability
always lies between 0 and 1.
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A is certain not to occur.
3. P(A) = 1 indicates that event A is certain to occur.
We can find the probability of an uncertain event by using the formula below.

Probability of occurrence = Number of desired outcomes / Total number of outcomes

● P(¬A) = probability of event A not happening.
● P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real
world.
Prior probability: The prior probability of an event is probability computed before observing
new information.
Posterior Probability: The probability that is calculated after all evidence or information has
been taken into account. It is a combination of prior probability and new information.

Conditional probability:
● Conditional probability is the probability of an event occurring when another event has
already happened.
● Suppose we want to calculate the probability of event A when event B has already occurred, "the
probability of A under the conditions of B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

● Where P(A⋀B) = joint probability of A and B
● P(B) = marginal probability of B.
● If event A has already occurred and we need to find the probability of B, then it is
given as:

P(B|A) = P(A⋀B) / P(A)

● It can be explained using a Venn diagram: since B has
occurred, the sample space is reduced to set B, and
the probability of A given B is obtained
by dividing P(A⋀B) by P(B).


Example:
In a class, 70% of the students like English and 40% of the students like both
English and Mathematics. What percentage of the students who like English also
like Mathematics?
Solution:
Let A be the event that a student likes Mathematics and
B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57 = 57%

Hence, 57% of the students who like English also like Mathematics.
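
The same calculation can be written as a small function. This is a minimal Python sketch; the function name conditional and its parameter names are illustrative only.

```python
def conditional(p_joint, p_given):
    """P(A|B) = P(A and B) / P(B); P(B) must be positive."""
    if p_given <= 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_given

# P(Mathematics | English) with P(A and B) = 0.40 and P(B) = 0.70
p_math_given_english = conditional(0.40, 0.70)
print(round(p_math_given_english, 2))
```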

Bayes' theorem:

● Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
● In probability theory, it relates the conditional probability and marginal probabilities of
two random events.
● Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian
statistics.
● It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
● Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.

● Example: If cancer corresponds to one's age then by using Bayes' theorem, we can
determine the probability of cancer more accurately with the help of age.


● Bayes' theorem can be derived using the product rule and the conditional probability of event A
with known event B.

From the product rule we can write:

1. P(A ⋀ B) = P(A|B) P(B)

Similarly, for the probability of event B with known event A:

2. P(A ⋀ B) = P(B|A) P(A)

Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B) ......(a)

● The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.
● It shows the simple relationship between joint and conditional probabilities. Here, P(A|B)
is known as the posterior, which we need to calculate; it is read as the probability of
hypothesis A given that evidence B has occurred.
● P(B|A) is called the likelihood: assuming that the hypothesis is true, we
calculate the probability of the evidence.
● P(A) is called the prior probability, the probability of the hypothesis before considering the
evidence.
● P(B) is called the marginal probability, the probability of the evidence alone.

● In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule
can be written as:

P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)

Where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.

Applying Bayes' rule:


● Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and
P(A). This is very useful in cases where we have good estimates of these three terms
and want to determine the fourth one. Suppose we perceive the effect of some
unknown cause and want to compute the probability of that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example-1:
Question: What is the probability that a patient has meningitis, given that the patient has a stiff neck?

Given Data:
● A doctor is aware that the disease meningitis causes a patient to have a stiff neck
80% of the time. He is also aware of some more facts, which are given as follows:
● The known probability that a patient has meningitis is 1/30,000.
● The known probability that a patient has a stiff neck is 2%.
● Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis. Then we have:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.00133

Hence, we can expect about 1 patient out of every 750 patients with a stiff neck to have meningitis.

Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the
card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that a drawn
face card is a king.
Solution:

P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that a card is a face card = 12/52 = 3/13
P(Face|King): probability that the card is a face card given that it is a king = 1
Putting all values into Bayes' rule, we get:

P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3
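
Both worked examples can be checked with exact arithmetic. The following is a minimal Python sketch using the standard-library fractions module; the function name bayes and its parameter names are illustrative choices.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Example-1: P(meningitis | stiff neck)
p_meningitis = bayes(Fraction(8, 10), Fraction(1, 30000), Fraction(2, 100))

# Example-2: P(King | Face)
p_king = bayes(Fraction(1), Fraction(1, 13), Fraction(3, 13))

print(p_meningitis, p_king)
```

Using Fraction avoids floating-point rounding, so the results come out as the exact ratios 1/750 and 1/3.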

Application
● It is used to calculate the next step of the robot when the already executed step is given.
● Bayes' theorem is helpful in weather forecasting.
● It can solve the Monty Hall problem.

Certainty Factors in Rule-Based Systems

● The certainty-factor model was one of the most popular models for the representation and
manipulation of uncertain knowledge in the rule-based expert systems of the early 1980s.


● The model was criticized by researchers in artificial intelligence and statistics as being
ad hoc in nature, and researchers and developers have since stopped using it.
● Its place has been taken by the more expressive formalism of Bayesian belief networks for
the representation and manipulation of uncertain knowledge.

Rule Based Systems


● A rule is an expression of the form "if A then B" where A is an assertion and B can be
either an action or another assertion.
Example : Troubleshooting of water pumps

● Rule 1: If pump failure then the pressure is low
● Rule 2: If pump failure then check oil level
● Rule 3: If power failure then pump failure

● A rule-based system consists of a library of such rules.

● Rules reflect essential relationships within the domain.
● Rules reflect ways to reason about the domain.
● Rules draw conclusions and point to actions when specific information about the
domain comes in. This is called inference.

The inference is a kind of chain reaction, for example:

● If there is a power failure, then (see rules 1, 2 and 3 mentioned above) Rule 3 states that
there is a pump failure,
● Rule 1 then tells us that the pressure is low, and
● Rule 2 gives a (useless) recommendation to check the oil level.
● It is very difficult to control such a mixture of inferences going back and forth in the same
session and to resolve the resulting uncertainties.
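The chain reaction described above corresponds to forward chaining: keep firing every rule whose premise is already known until nothing new can be derived. The following is a minimal sketch under the assumption that each rule has a single premise and a single conclusion, both stored as plain strings; the function name forward_chain is illustrative.

```python
# Each rule is a (premise, conclusion) pair taken from the water-pump example.
RULES = [
    ("pump failure", "pressure is low"),   # rule 1
    ("pump failure", "check oil level"),   # rule 2
    ("power failure", "pump failure"),     # rule 3
]


def forward_chain(facts, rules):
    """Repeatedly fire rules whose premise is known until no new fact appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


result = forward_chain({"power failure"}, RULES)
print(sorted(result))
```

Starting from the single fact "power failure", the loop derives the pump failure via rule 3 and then both consequences of rules 1 and 2, including the unhelpful oil-level recommendation.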

● How to deal with uncertainties in a rule-based system?

● A problem with rule-based systems is that often the connections reflected by the rules are
not absolutely certain (i.e. deterministic), and the gathered information is often subject to
uncertainty.
● In such cases, a certainty measure is added to the premises as well as the conclusions in
the rules of the system.
● A rule then provides a function that describes how much a change in the certainty of the
premise will change the certainty of the conclusion.
● In its simplest form, this looks like:
● If A (with certainty x) then B (with certainty f(x))


● This is a new rule, say rule 4, added to the earlier three rules.

● There are many schemes for treating uncertainty in rule-based systems. The most
common are:
● Adding certainty factors.
● Adoption of Dempster-Shafer belief functions.
● Inclusion of fuzzy logic.
● In these schemes, uncertainty is treated locally, meaning that the treatment is connected
directly to the incoming rules and the uncertainty of their elements. Example: in addition to
rule 4 above, suppose we have the rule
● If C (with certainty x) then B (with certainty g(x))
● Now, if the information is that A holds with certainty a and C holds with certainty c, then
what is the certainty of B?
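The notes leave this question open. One classical answer comes from the MYCIN-style certainty-factor model, which combines two positive certainties for the same conclusion as cf1 + cf2 × (1 − cf1). The sketch below assumes that f and g simply scale the premise certainty by a rule strength; those strengths (0.8 and 0.5) and the input certainties are hypothetical numbers chosen for illustration.

```python
def attenuate(rule_strength):
    """Build a rule function x -> rule_strength * x (an assumed form for f and g)."""
    return lambda premise_certainty: rule_strength * premise_certainty


def combine_positive(cf1, cf2):
    """MYCIN-style combination of two positive certainty factors for one conclusion."""
    return cf1 + cf2 * (1 - cf1)


f = attenuate(0.8)   # hypothetical strength of "If A then B"
g = attenuate(0.5)   # hypothetical strength of "If C then B"

a = 0.9              # hypothetical certainty that A holds
c = 0.6              # hypothetical certainty that C holds

cf_b = combine_positive(f(a), g(c))
print(f"certainty of B = {cf_b:.3f}")
```

The combined certainty is higher than either rule contributes on its own but never exceeds 1, and the result does not depend on the order in which the two rules fire.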

Bayesian Belief Network

● A Bayesian belief network is a key computational technique for dealing with probabilistic
events and for solving problems that involve uncertainty. We can define a Bayesian network as:
● "A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
● It is also called a Bayes network, belief network, decision network, or Bayesian model.
● Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
● Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning,
time series prediction, and decision making under uncertainty.
● A Bayesian network can be used for building models from data and expert opinions, and it
consists of two parts:
● Directed Acyclic Graph
● Table of conditional probabilities.

● The generalized form of Bayesian network that represents and solves decision problems
under uncertain knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:


● Each node corresponds to a random variable, which can be continuous or discrete.
● Arcs (directed arrows) represent the causal relationships or conditional probabilities
between random variables. These directed links connect pairs of nodes in the graph.
● A link indicates that one node directly influences the other; if there is no directed link
between two nodes, they do not directly influence each other.
● In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
● If node B is connected to node A by a directed arrow from A, then node A is called the
parent of node B.
● Node C is independent of node A.

The Bayesian network has mainly two components:


● Causal Component
● Actual numbers

● Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
● The Bayesian network is based on the joint probability distribution and conditional
probability. So let's first understand the joint probability distribution:

Joint probability distribution:


● If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations
of x1, x2, x3, ..., xn are known as the joint probability distribution.
● Using the chain rule, P[x1, x2, x3, ..., xn] can be written in the following way:
P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi in a Bayesian network, we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))


Explanation of Bayesian network:


● Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm
responds reliably to a burglary but also responds to minor earthquakes. Harry has two
neighbors, David and Sophia, who have taken responsibility for informing Harry at work
when they hear the alarm. David always calls Harry when he hears the alarm, but
sometimes he confuses the phone ringing with the alarm and calls then too. Sophia, on the
other hand, likes to listen to loud music, so sometimes she fails to hear the alarm. Here we
would like to compute probabilities involving the burglar alarm.
Problem:
● Calculate the probability that the alarm has sounded, neither a burglary nor an
earthquake has occurred, and both David and Sophia called Harry.

Solution:
● The Bayesian network for the above problem is given below. The network structure
shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the
probability of the alarm going off, whereas David's and Sophia's calls depend only on the
alarm.
● The network encodes our assumptions that the neighbors do not directly perceive the
burglary, do not notice the minor earthquake, and do not confer before calling.
● The conditional distribution for each node is given as a conditional probability table,
or CPT.
● Each row in the CPT must sum to 1 because the entries in a row represent an exhaustive
set of cases for the variable.
● In a CPT, a boolean variable with k boolean parents contains 2^k probabilities. Hence, if
there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:


● Burglary (B)
● Earthquake(E)
● Alarm(A)
● David Calls(D)
● Sophia calls(S)

● We can write the events of the problem statement in the form of a probability,
P[D, S, A, B, E], and rewrite it using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] * P[S, A, B, E]
= P[D | S, A, B, E] * P[S | A, B, E] * P[A, B, E]
= P[D | A] * P[S | A, B, E] * P[A, B, E]
= P[D | A] * P[S | A] * P[A | B, E] * P[B, E]
= P[D | A] * P[S | A] * P[A | B, E] * P[B | E] * P[E]
= P[D | A] * P[S | A] * P[A | B, E] * P[B] * P[E] (since B and E are independent in this network)


● Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake.
P(E= False)= 0.999, which is the probability that an earthquake did not occur.

Conditional probability table for Alarm A:


The conditional probability of Alarm A depends on Burglary and Earthquake:

B        E        P(A= True)   P(A= False)
True     True     0.94         0.06
True     False    0.95         0.05
False    True     0.31         0.69
False    False    0.001        0.999


Conditional probability table for David Calls:


The conditional probability that David calls depends on the state of the Alarm.

A        P(D= True)   P(D= False)
True     0.91         0.09
False    0.05         0.95

Conditional probability table for Sophia Calls:


The conditional probability that Sophia calls depends on its parent node, "Alarm."

A        P(S= True)   P(S= False)
True     0.75         0.25
False    0.02         0.98

From the formula for the joint distribution, we can write the problem statement in the form of
a probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ^ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045.
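The same calculation can be written as code by storing each CPT as a dictionary and multiplying the five factors of the factorized joint distribution. This is a minimal sketch: the table values are the ones given above, while the names (P_ALARM, pick, joint) are illustrative. Only the P(· = True) columns are stored, and the False column is derived as 1 − p.

```python
from itertools import product

P_BURGLARY = 0.002                 # P(B = True)
P_EARTHQUAKE = 0.001               # P(E = True)
P_ALARM = {                        # P(A = True | B, E)
    (True, True): 0.94,
    (True, False): 0.95,
    (False, True): 0.31,
    (False, False): 0.001,
}
P_DAVID = {True: 0.91, False: 0.05}    # P(D = True | A)
P_SOPHIA = {True: 0.75, False: 0.02}   # P(S = True | A)


def pick(p_true, value):
    """Probability that a boolean variable takes `value`, given P(variable = True)."""
    return p_true if value else 1.0 - p_true


def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) = P(d|a) P(s|a) P(a|b,e) P(b) P(e)."""
    return (pick(P_DAVID[a], d) * pick(P_SOPHIA[a], s)
            * pick(P_ALARM[(b, e)], a)
            * pick(P_BURGLARY, b) * pick(P_EARTHQUAKE, e))


answer = joint(d=True, s=True, a=True, b=False, e=False)
total = sum(joint(*values) for values in product([True, False], repeat=5))

print(f"P(S, D, A, not B, not E) = {answer:.8f}")
print(f"sum over all 32 assignments = {total:.6f}")
```

Summing the joint over all 32 assignments gives 1, a quick sanity check that the factorization defines a valid distribution.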

● Hence, a Bayesian network can answer any query about the domain by using the joint
distribution.
● The semantics of Bayesian Network:
● There are two ways to understand the semantics of the Bayesian network, which are given
below:
1. To understand the network as a representation of the joint probability distribution.
This view is helpful for understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence
statements. This view is helpful in designing inference procedures.

Dempster-Shafer Theory
This theory was introduced for the following reasons:


● Bayesian theory is only concerned with single pieces of evidence.

● Bayesian probability cannot describe ignorance.

● DST is an evidence theory; it combines all possible outcomes of the problem. Hence it is
used to solve problems where different pieces of evidence may lead to different results.

The uncertainty in this model is handled as follows:


1. Consider all possible outcomes.
2. Belief will lead to believing in some possibility by bringing out some evidence.
3. Plausibility will make evidence compatible with possible outcomes.

For example:
● Let us consider a room where four people are present, A, B, C and D. Suddenly the lights
go out and when the lights come back, B has been stabbed in the back by a knife, leading
to his death. No one came into the room and no one left the room. We know that B has
not committed suicide. Now we have to find out who the murderer is.

To solve this, there are the following possibilities:

● Either {A} or {C} or {D} has killed him.

● Either {A, C} or {C, D} or {A, D} have killed him.
● Or the three of them have killed him, i.e. {A, C, D}.
● None of them has killed him, i.e. the empty set ∅.

● There will be possible evidence by which we can find the murderer using the measure of
plausibility.
● Using the above example we can say: the set of possible conclusions is P = {p1, p2, ..., pn}
● where P must be exhaustive, i.e. at least one pi must be true, and the pi must be
mutually exclusive.
● The power set will contain 2^n elements, where n is the number of elements in the set of
possible conclusions.
For example:
If P = {a, b, c}, then the power set is given as
{∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, which has 2^3 = 8 elements.
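The power set in this example can be generated mechanically, which also confirms the 2^n count. The sketch below uses the standard library's itertools; the function name power_set is an illustrative choice.

```python
from itertools import chain, combinations


def power_set(elements):
    """Return every subset of `elements`, from the empty set up to the full set."""
    items = list(elements)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


subsets = power_set(["a", "b", "c"])
for subset in subsets:
    print(sorted(subset))
print(len(subsets), "subsets in total")
```

Because the number of subsets doubles with each added element, any method that assigns a mass to every subset becomes expensive for large frames, which is the computational drawback noted later for Dempster-Shafer theory.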

● Mass function m(K): The mass m({K or B}) is interpreted as evidence for {K or B} that
cannot be divided among the more specific beliefs for K and for B.


● Belief in K: The belief in an element K of the power set is the sum of the masses of all
elements that are subsets of K. This can be explained through an example.

● Let's say K = {a, b, c}

● Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)

● Plausibility in K: It is the sum of the masses of all sets that intersect K,

i.e. Pl(K) = m(a) + m(b) + m(c) + m(a, b) + m(b, c) + m(a, c) + m(a, b, c)
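Belief and plausibility differ once K is a proper subset of the frame. The sketch below applies both definitions to the murder example with suspects A, C and D. The mass values are hypothetical numbers chosen only so that they sum to 1; they are not taken from the notes.

```python
def belief(mass, k):
    """Bel(K): total mass of the non-empty focal sets that are subsets of K."""
    return sum(m for focal, m in mass.items() if focal and focal <= k)


def plausibility(mass, k):
    """Pl(K): total mass of the focal sets that intersect K."""
    return sum(m for focal, m in mass.items() if focal & k)


# Hypothetical mass assignment over subsets of the frame {A, C, D}.
mass = {
    frozenset("A"): 0.3,
    frozenset("D"): 0.1,
    frozenset("AC"): 0.2,
    frozenset("CD"): 0.1,
    frozenset("ACD"): 0.3,   # mass on the whole frame expresses ignorance
}

k = frozenset("AC")          # hypothesis: the murderer is A or C
bel_k = belief(mass, k)
pl_k = plausibility(mass, k)

print(f"Bel(K) = {bel_k:.2f}, Pl(K) = {pl_k:.2f}")
```

The interval [Bel(K), Pl(K)] brackets the support for K. Here Pl(K) also equals 1 − Bel({D}), since {D} is the complement of K in the frame.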

Characteristics of Dempster Shafer Theory:


● Ignorance is represented explicitly, such that the masses of all events still aggregate to 1.
● Ignorance is reduced in this theory by adding more and more evidence.
● A combination rule is used to combine various types of possibilities.

Advantages:
● As we add more information, the uncertainty interval reduces.
● DST has a much lower level of ignorance.
● Diagnostic hierarchies can be represented using this theory.
● A person dealing with such problems is free to think about evidence.

Disadvantages:
● The computational effort is high, as we have to deal with 2^n sets.

Fuzzy Logic
● The term Fuzzy means something that is a bit vague. When a situation is vague, the
computer may not be able to produce a result that is True or False. As per Boolean Logic,
the value 1 refers to True and 0 means False. But a Fuzzy Logic algorithm considers all
the uncertainties of a problem, where there may be possible values besides True or False.
● The fuzzy logic in artificial intelligence operates on the levels of possibilities of input to
obtain the definite output. It can be executed in systems with different capabilities and
sizes, varying from tiny microcontrollers to huge, networked, workstation-centered
control systems. Furthermore, it can be executed in software, hardware, or a combination
of both.
● Similar to humans, there are many possible values between True and False that a
computer can incorporate. These can be:

● Certainly yes
● Possibly yes
● Can’t say


● Possibly no
● Certainly no
Problem – Is it hot outside?

Boolean Logic
Solution
● Yes (1.0)
● No (0)

According to conventional Boolean Logic, the algorithm takes a definite input and produces a
precise result, Yes or No, represented by 1 and 0, respectively.

Fuzzy Logic
Solution
● Very hot (0.9)
● Little hot (0.20)
● Moderately hot (0.35)
● Not hot (0.0)
● As per the above example, Fuzzy Logic has a wider range of outputs, such as very hot,
moderately hot and not hot. These values between 0 and 1 display the range of
possibilities.
● So, in cases where accurate reasoning cannot be provided, Fuzzy Logic provides an
acceptable method of reasoning. An algorithm based on Fuzzy Logic takes all available
data while solving a problem. It then takes the best possible decision according to the
given input.
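The contrast between the two answers to "Is it hot outside?" can be sketched with a membership function that maps a temperature to a degree of "hot" between 0 and 1. The linear ramp and its limits (20 °C and 40 °C), as well as the 30 °C Boolean threshold, are assumptions made for illustration only.

```python
def boolean_hot(temp_c, threshold=30.0):
    """Crisp answer: 1 (Yes) at or above the assumed threshold, otherwise 0 (No)."""
    return 1 if temp_c >= threshold else 0


def fuzzy_hot(temp_c, not_hot_below=20.0, fully_hot_above=40.0):
    """Degree of 'hot' in [0, 1], rising linearly between the two assumed limits."""
    if temp_c <= not_hot_below:
        return 0.0
    if temp_c >= fully_hot_above:
        return 1.0
    return (temp_c - not_hot_below) / (fully_hot_above - not_hot_below)


for temp in (15, 24, 27, 38, 45):
    print(f"{temp:>3} C  boolean={boolean_hot(temp)}  fuzzy={fuzzy_hot(temp):.2f}")
```

With these assumed limits, 24 °C, 27 °C and 38 °C receive degrees of 0.20, 0.35 and 0.90, whereas the Boolean version can only report 0 or 1.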

Fuzzy Logic Architecture

Rule base
● This is the set of rules along with the If-Then conditions that are used for making
decisions. But, modern developments in Fuzzy Logic have reduced the number of rules in
the rule base. These sets of rules are also called a knowledge base.

Fuzzification
● This is the step where crisp numbers are converted into fuzzy sets. A crisp set is a set of
elements that have identical properties. Based on certain logic, an element can either
belong to the set or not. Crisp sets are based on binary logic – Yes or No answers.
● Here, the error signals and physical values are converted into a normalized fuzzy subset.
In a typical Fuzzy Logic system, the fuzzifier separates the input signals into five states,
which are:
● Large positive


● Medium positive
● Small
● Medium negative
● Large negative
● The fuzzification process takes crisp inputs measured by sensors, such as room
temperature, converts them into fuzzy values, and passes them to the control system for
further processing. Common household appliances, such as air-conditioners and washing
machines, have Fuzzy Logic control systems within them.

Inference Engine
● The inference engine determines how much the input values and the rules match. The
rules are applied based on the input values received. Then, the rules are used to develop
control actions. The inference engine and the knowledge base together are called a
controller in a Fuzzy Logic system.

Defuzzification
● This is the inverse process of fuzzification. Here, the fuzzy values are converted into
crisp values by mapping. There are several defuzzification methods for doing this, and
the most suitable one is selected according to the input. This is a complicated process in
which methods such as the maximum membership principle, the weighted average method
and the centroid method are used.
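The four architecture stages can be illustrated end to end with a toy fan controller. This is a sketch under stated assumptions: the triangular membership functions, the three labels (cold, warm, hot), the rule outputs (fan speeds of 0, 50 and 100 percent) and the choice of the weighted average method are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(temp_c):
    """Fuzzification: turn a crisp temperature into degrees for each fuzzy label."""
    return {
        "cold": triangular(temp_c, -10.0, 10.0, 25.0),
        "warm": triangular(temp_c, 15.0, 25.0, 35.0),
        "hot": triangular(temp_c, 25.0, 40.0, 60.0),
    }


# Rule base: IF temperature is <label> THEN fan speed is <percent>.
RULE_BASE = {"cold": 0.0, "warm": 50.0, "hot": 100.0}


def infer(memberships):
    """Inference: pair each rule's firing strength with the output it recommends."""
    return [(degree, RULE_BASE[label])
            for label, degree in memberships.items() if degree > 0.0]


def defuzzify(fired_rules):
    """Defuzzification by the weighted average method."""
    total_strength = sum(strength for strength, _ in fired_rules)
    if total_strength == 0.0:
        raise ValueError("no rule fired for this input")
    return sum(strength * output for strength, output in fired_rules) / total_strength


fan_speed = defuzzify(infer(fuzzify(30.0)))
print(f"fan speed at 30 C = {fan_speed:.1f}%")
```

At 30 °C the input is partly "warm" (0.5) and partly "hot" (1/3), so both rules fire and the crisp output is their strength-weighted average rather than the output of a single rule.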

Advantages
● It is a robust system where no precise inputs are required
● These systems are able to accommodate several types of inputs including vague, distorted
or imprecise data
● In case the feedback sensor stops working, you can reprogram it according to the
situation
● The Fuzzy Logic algorithms can be coded using less data, so they do not occupy a huge
memory space
● As it resembles human reasoning, these systems are able to solve complex problems
where ambiguous inputs are available and take decisions accordingly
● These systems are flexible and the rules can be modified
● The systems have a simple structure and can be constructed easily
● You can save system costs as inexpensive sensors can be accommodated by these systems
● It is easily understandable.
● It efficiently solves complex problems by enhancing its capability to accomplish
human-like decision-making and reasoning tasks.
● It deals with uncertainties in engineering.
● The flexibility of fuzzy logic makes it easier to adapt an FLS by simply adding or deleting
rules.


Disadvantages
● The accuracy of these systems is compromised as the system mostly works on inaccurate
data and inputs
● There is no single systematic approach to solve a problem using Fuzzy Logic. As a result,
many solutions arise for a particular problem, leading to confusion
● Due to inaccuracy in results, they are not always widely accepted
● A major drawback of Fuzzy Logic control systems is that they are completely dependent
on human knowledge and expertise
● You have to regularly update the rules of a Fuzzy Logic control system
● These systems lack the learning capabilities of machine learning or neural networks
● The systems require a lot of testing for validation and verification

Applications of Fuzzy Logic


The applications of Fuzzy Logic are spread across several fields. They are as follows:

Medicine
● Controlling arterial pressure when providing anesthesia to patients
● Used in diagnostic radiology and diagnostic support systems
● Diagnosis of prostate cancer and diabetes

Transportation systems
● Handling underground train operations
● Controlling train schedules
● Braking and stopping vehicles based on parameters, such as car speed, acceleration and
wheel speed

Defense
● Locating and recognizing targets underwater
● Supports naval decision making
● Using thermal infrared images for target recognition
● Used for controlling hypervelocity interceptors

Industry
● Controlling water purification plants
● Handling problems in constraint satisfaction in structural design
● Pattern analysis for quality assurance
● Fuzzy Logic is used for tackling sludge wastewater treatment

Naval control
● Steer ships properly
● Selecting the optimal or best possible routes for reaching a destination
● Autopilot is based on Fuzzy Logic
● Autonomous underwater vehicles are controlled using Fuzzy Logic


UNIT 6 Game Playing

PREPARED BY
PROF. VISHVA UPADHYAY


MiniMax Search Procedure


● The minimax algorithm is a recursive, backtracking algorithm used in decision-making
and game theory. It provides an optimal move for the player, assuming that the opponent
is also playing optimally.
● The minimax algorithm uses recursion to search through the game tree.
● The minimax algorithm is mostly used for game playing in AI, such as Chess, Checkers,
tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax
decision for the current state.
● In this algorithm two players play the game; one is called MAX and the other is called MIN.
● Each player plays so that the opponent gets the minimum benefit while they themselves
get the maximum benefit.
● The two players are opponents of each other: MAX selects the maximized value and
MIN selects the minimized value.
● The minimax algorithm performs a depth-first search to explore the complete game tree.
● The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then
backtracks up the tree as the recursion unwinds.

Pseudo-code for MinMax Algorithm:


function minimax(node, depth, maximizingPlayer) is
    if depth == 0 or node is a terminal node then
        return static evaluation of node

    if maximizingPlayer then // for Maximizer player
        maxEva = -infinity
        for each child of node do
            eva = minimax(child, depth-1, false)
            maxEva = max(maxEva, eva) // gives maximum of the values
        return maxEva

    else // for Minimizer player
        minEva = +infinity
        for each child of node do
            eva = minimax(child, depth-1, true)
            minEva = min(minEva, eva) // gives minimum of the values
        return minEva

Initial call:
minimax(node, 3, true)


Working of Min-Max Algorithm:


● The working of the minimax algorithm can be easily described using an example. Below
we take an example game tree representing a two-player game.
● In this example, there are two players; one is called Maximizer and the other is called
Minimizer.
● Maximizer will try to get the maximum possible score, and Minimizer will try to get the
minimum possible score.
● This algorithm applies DFS, so in this game tree we have to go all the way down to the
leaves, i.e. the terminal nodes.
● At the terminal nodes, the terminal values are given, so we compare those values and
backtrack up the tree until the initial state is reached. The following are the main steps
involved in solving the two-player game tree:

Step 1: In the first step, the algorithm generates the entire game tree and applies the utility
function to get the utility values for the terminal states. In the tree diagram below, let's take A as
the initial state of the tree. Suppose Maximizer takes the first turn, with worst-case initial
value -infinity, and Minimizer takes the next turn, with worst-case initial value +infinity.


Step 2: First we find the utility values for the Maximizer. Its initial value is -∞, so we
compare each value in the terminal states with the initial value of the Maximizer and determine
the values of the higher nodes. It finds the maximum among them all.
● For node D: max(-1, -∞) = -1, then max(-1, 4) = 4
● For node E: max(2, -∞) = 2, then max(2, 6) = 6
● For node F: max(-3, -∞) = -3, then max(-3, -5) = -3
● For node G: max(0, -∞) = 0, then max(0, 7) = 7

Step 3: In the next step, it is the Minimizer's turn, so it compares the node values with +∞
and finds the values of the nodes in the layer above (B and C).
● For node B: min(4, 6) = 4
● For node C: min(-3, 7) = -3


Step 4: Now it is the Maximizer's turn again. It chooses the maximum of all node values and
finds the maximum value for the root node. In this game tree there are only 4 layers, so we
reach the root node immediately, but in real games there will be more than 4 layers.
● For node A: max(4, -3) = 4
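The four steps above can be reproduced in a few lines of Python. This is a minimal sketch that assumes the game tree is stored as nested lists with integer utilities at the leaves; the depth limit from the pseudo-code is omitted because the example tree is small enough to search completely.

```python
def minimax(node, maximizing_player):
    """Return the minimax value of `node`; a leaf is any value that is not a list."""
    if not isinstance(node, list):
        return node  # terminal node: its utility value
    child_values = [minimax(child, not maximizing_player) for child in node]
    return max(child_values) if maximizing_player else min(child_values)


# A -> (B, C); B -> (D, E); C -> (F, G); leaf utilities as in the worked example.
GAME_TREE = [
    [[-1, 4], [2, 6]],     # B: D = (-1, 4), E = (2, 6)
    [[-3, -5], [0, 7]],    # C: F = (-3, -5), G = (0, 7)
]

root_value = minimax(GAME_TREE, True)   # Maximizer moves first at A
print("value of root A:", root_value)
```

The recursion alternates between max and min at each level, so D to G are evaluated as Maximizer nodes, B and C as Minimizer nodes, and A as a Maximizer node, exactly as in Steps 2 to 4.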


That was the complete workflow of the minimax two player game.

Properties of Minimax algorithm:


● Complete: The minimax algorithm is complete. It will definitely find a solution (if one
exists) in a finite search tree.
● Optimal: The minimax algorithm is optimal if both opponents are playing optimally.
● Time complexity: As it performs DFS on the game tree, the time complexity of the
minimax algorithm is O(b^m), where b is the branching factor of the game tree and m is
the maximum depth of the tree.
● Space complexity: The space complexity of the minimax algorithm is similar to that of
DFS, which is O(bm), i.e. linear in b × m.

Limitation of the minimax Algorithm:


● The main drawback of the minimax algorithm is that it gets really slow for complex
games such as Chess and Go. These games have a huge branching factor, and the
player has many choices to decide between. This limitation of the minimax algorithm can
be mitigated by alpha-beta pruning, which is discussed in the next topic.

Alpha-Beta Cut-offs/Alpha-Beta Pruning


● Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization
technique for the minimax algorithm.
● As we have seen in the minimax search algorithm, the number of game states it has to
examine is exponential in the depth of the tree. We cannot eliminate the exponent, but
we can effectively cut it in half. There is a technique by which we can compute the correct
minimax decision without checking each node of the game tree, and this technique is called
pruning. It involves two threshold parameters, alpha and beta, for future expansion, so
it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
● Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only
the tree leaves but also entire sub-trees.

● The two parameters can be defined as:

Alpha: The best (highest-value) choice we have found so far at any point along the path of
Maximizer. The initial value of alpha is -∞.
Beta: The best (lowest-value) choice we have found so far at any point along the path of
Minimizer. The initial value of beta is +∞.


● Applying alpha-beta pruning to a standard minimax algorithm returns the same move as the
standard algorithm does, but it removes all the nodes that do not really affect the
final decision and only make the algorithm slow. Hence, pruning these nodes makes the
algorithm fast.

Condition for Alpha-beta pruning:


The main condition required for alpha-beta pruning is:
1. α>=β

Key points about alpha-beta pruning:


● The Max player will only update the value of alpha.
● The Min player will only update the value of beta.
● While backtracking the tree, the node values will be passed to upper nodes instead of
values of alpha and beta.
● We will only pass the alpha, beta values to the child nodes.

Pseudo-code for Alpha-beta Pruning:


function minimax(node, depth, alpha, beta, maximizingPlayer) is
    if depth == 0 or node is a terminal node then
        return static evaluation of node

    if maximizingPlayer then // for Maximizer player
        maxEva = -infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, false)
            maxEva = max(maxEva, eva)
            alpha = max(alpha, maxEva)
            if beta <= alpha
                break
        return maxEva

    else // for Minimizer player
        minEva = +infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, true)
            minEva = min(minEva, eva)
            beta = min(beta, eva)
            if beta <= alpha
                break
        return minEva

Working of Alpha-Beta Pruning:


Step 1: At the first step, the Max player makes the first move from node A, where α = -∞ and
β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and
β = +∞, and node B passes the same values to its child D.

Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is
compared first with 2 and then with 3, and max(2, 3) = 3 becomes the value of α at node D;
the node value is also 3.

Step 3: The algorithm now backtracks to node B, where the value of β changes, as this is
Min's turn. β = +∞ is compared with the value of the available successor node, i.e.
min(∞, 3) = 3, hence at node B now α = -∞ and β = 3.


In the next step, the algorithm traverses the next successor of node B, which is node E, and
the values α = -∞ and β = 3 are passed down as well.

Step 4: At node E, Max takes its turn and the value of alpha changes. The current value of
alpha is compared with 5, so max(-∞, 5) = 5; hence at node E α = 5 and β = 3, where α >= β,
so the right successor of E is pruned and the algorithm does not traverse it. The value at
node E is 5.


Step 5: In the next step, the algorithm again backtracks up the tree, from node B to node A.
At node A, the value of alpha changes to the maximum available value, 3, as max(-∞, 3) = 3,
and β = +∞. These two values are now passed to the right successor of A, which is node C.

At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is again compared with the left child, which is 0, giving
max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3. α remains 3,
but the node value of F becomes 1.


Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of
beta changes, as it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, which
again satisfies the condition α >= β, so the next child of C, which is G, is pruned, and
the algorithm does not compute the entire subtree G.


Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3. The
final game tree shows which nodes were computed and which were never computed. Hence the
optimal value for the maximizer is 3 for this example.
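The eight steps can be traced in code by recording which leaves are actually evaluated. This is a minimal sketch assuming a nested-list tree with numeric leaves. The walkthrough gives the leaves under D (2, 3) and F (0, 1) and the first leaf under E (5); the pruned leaves (9 under E, and 7 and 5 under G) are placeholder values assumed here, and they never influence the result.

```python
import math


def alphabeta(node, alpha, beta, maximizing_player, visited):
    """Minimax with alpha-beta pruning; evaluated leaves are appended to `visited`."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing_player:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # remaining children cannot change the decision
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best


# A -> (B, C); B -> (D, E); C -> (F, G). Pruned leaves are assumed placeholders.
GAME_TREE = [
    [[2, 3], [5, 9]],   # B: D = (2, 3), E = (5, 9 assumed)
    [[0, 1], [7, 5]],   # C: F = (0, 1), G = (7, 5 assumed)
]

visited_leaves = []
root_value = alphabeta(GAME_TREE, -math.inf, math.inf, True, visited_leaves)
print("value of root A:", root_value)
print("leaves evaluated:", visited_leaves)
```

Only five of the eight leaves are evaluated: the second leaf of E is cut off in Step 4 and the whole subtree G in Step 7, yet the root value is the same as plain minimax would return.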


Move Ordering in Alpha-Beta pruning:


The effectiveness of alpha-beta pruning is highly dependent on the order in which the nodes
are examined. Move ordering is therefore an important aspect of alpha-beta pruning.
It can be of two types:
● Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the
leaves of the tree and works exactly like the minimax algorithm. It then even consumes
more time because of the bookkeeping for alpha and beta. Such an ordering is called
worst ordering. In this case, the best move occurs on the right side of the tree. The time
complexity for such an ordering is O(b^m).
● Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning
happens in the tree and the best moves occur on the left side of the tree. Since we apply
DFS, the algorithm searches the left of the tree first and can go twice as deep as the
minimax algorithm in the same amount of time. The complexity with ideal ordering is
O(b^(m/2)).

Rules to find good ordering:


● Try the best move from the shallowest node first.
● Order the nodes in the tree such that the best nodes are checked first.
● Use domain knowledge while finding the best move. Ex: for chess, try this order: captures first, then threats, then forward moves, then backward moves.
● We can bookkeep the states already visited, as there is a possibility that states may repeat.
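
To make the effect of ordering concrete, the sketch below counts how many leaves alpha-beta evaluates on the same small tree under best-first and worst-first ordering. The tree and the helper names (`minimax`, `reorder`, `count_leaves`) are hypothetical, and sorting children by their true minimax value is only an illustration device; a real program would have to estimate the order with heuristics such as the rules above.

```python
import math

def minimax(node, maximizing):
    """Plain minimax value; used here only as an 'oracle' to sort children."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def reorder(node, maximizing, best_first):
    """Sort every node's children so that the best (or worst) move comes first."""
    if not isinstance(node, list):
        return node
    children = [reorder(child, not maximizing, best_first) for child in node]
    # The best child is the largest for MAX and the smallest for MIN.
    descending = (maximizing == best_first)
    return sorted(children, key=lambda c: minimax(c, not maximizing),
                  reverse=descending)

def count_leaves(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search returning (value, number of leaves evaluated)."""
    if not isinstance(node, list):
        return node, 1
    value = -math.inf if maximizing else math.inf
    leaves = 0
    for child in node:
        child_value, n = count_leaves(child, not maximizing, alpha, beta)
        leaves += n
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return value, leaves

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical MAX -> MIN -> leaves
ideal = count_leaves(reorder(tree, True, best_first=True), True)
worst = count_leaves(reorder(tree, True, best_first=False), True)
print(ideal, worst)   # same root value, different number of leaves evaluated
```

Both orderings return the same root value; only the amount of work differs, which is the point of the ordering rules.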

Iterative deepening search


● Iterative deepening depth-first search (IDDFS) is a hybrid algorithm that combines DFS and BFS. It finds the best depth limit by gradually increasing the limit until a goal is found.
● The algorithm performs depth-first search up to a certain "depth limit" and keeps increasing the depth limit after each iteration until the goal node is found.
● It combines the benefits of breadth-first search's fast search and depth-first search's memory efficiency.
● Iterative deepening is useful for uninformed search when the search space is large and the depth of the goal node is unknown.

Advantages:
● It combines the benefits of the BFS and DFS search algorithms in terms of fast search and memory efficiency.
● IDDFS is guaranteed to find a solution if one exists in the tree.

● When solutions are found at shallow depths, say n, the algorithm proves to be efficient and fast.
● A great advantage of IDDFS shows up in game-tree searching, where the results of shallower iterations help refine the depth limit, heuristics, and scores of the nodes searched, making the search more efficient.
● Another major advantage of the IDDFS algorithm is its quick responsiveness: it gives early indications of the result, which are then refined after each individual iteration is completed.

Disadvantages:
● The main drawback of IDDFS is that it repeats all the work of the previous phase.
● The time taken to reach the goal node is exponential in its depth.
● The main problem with IDDFS is the time and the wasted computation spent at each depth.
● The situation is not as bad as we may think, especially when the branching factor is high, because most of the nodes lie at the deepest level searched.
● IDDFS may fail where BFS fails. When multiple answers are required, IDDFS reports each success node and its path only once, even though the node is found again in later iterations. To stop the search, the depth bound is simply not increased further.

Example:
● The following tree structure illustrates iterative deepening depth-first search. The IDDFS algorithm performs successive iterations until it finds the goal node. The iterations performed by the algorithm are:
● 1st iteration: A
● 2nd iteration: A, B, C
● 3rd iteration: A, B, D, E, C, F, G
● 4th iteration: A, B, D, H, I, E, C, F, K, G
● In the fourth iteration, the algorithm finds the goal node.

Completeness:
● This algorithm is complete if the branching factor is finite.

Time Complexity:
● Suppose b is the branching factor and d is the depth; then the worst-case time complexity is $O(b^d)$.

Space Complexity:
● The space complexity of IDDFS is $O(bd)$, i.e. linear in the depth d.

Optimal:
● The IDDFS algorithm is optimal if path cost is a non-decreasing function of the depth of the node.
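
The example iterations can be reproduced with a minimal sketch. The tree below is inferred from the iteration listing in the example, and the choice of K as the goal node is an assumption (the notes do not name the goal); the function names are illustrative.

```python
def depth_limited(tree, node, goal, limit, order):
    """DFS from `node` that never goes deeper than `limit` edges."""
    order.append(node)                      # record the visit
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(depth_limited(tree, child, goal, limit - 1, order)
               for child in tree.get(node, []))

def iddfs(tree, root, goal, max_depth):
    """Run depth-limited DFS with limits 0, 1, 2, ... until the goal is found."""
    iterations = []
    for limit in range(max_depth + 1):
        order = []
        found = depth_limited(tree, root, goal, limit, order)
        iterations.append(order)
        if found:
            return limit, iterations
    return None, iterations

# Tree inferred from the iteration listing; the goal K is an assumption.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": ["H", "I"], "F": ["K"]}
depth, iterations = iddfs(tree, "A", "K", max_depth=5)
for number, order in enumerate(iterations, start=1):
    print(number, order)
```

Because this sketch stops as soon as the goal is reached, its fourth pass ends at K and does not go on to G, whereas the listing in the example shows the complete traversal at that depth. Each pass restarts from A, which is the repeated work mentioned under the disadvantages.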


UNIT 7 Planning

PREPARED BY
PROF. VISHVA UPADHYAY


What is Planning
● Planning in artificial intelligence is about the decision-making performed by robots or computer programs to achieve a specific goal.
● Executing a plan is about choosing a sequence of actions with a high probability of accomplishing that goal.

Blocks World Problem


Problem Statement : There is a table on which some blocks are placed. Some blocks may or
may not be stacked on other blocks. We have a robot arm to pick up or put down the blocks. The
robot arm can move only one block at a time, and no other block should be stacked on top of the
block which is to be moved by the robot arm.

Goal : To change the configuration of the blocks from the Initial State to the Goal State

Predicates can be thought of as statements that convey information about a configuration in Blocks World.
1. ON(A,B) : Block A is on B
2. ONTABLE(A) : A is on table
3. CLEAR(A) : Nothing is on top of A
4. HOLDING(A) : Arm is holding A.
5. ARMEMPTY : Arm is holding nothing

Using these predicates, we represent the Initial State and the Goal State

Initial State: ON(B,A) ∧ ONTABLE(A) ∧ ONTABLE(C) ∧ ONTABLE(D) ∧ CLEAR(B) ∧ CLEAR(C) ∧ CLEAR(D) ∧ ARMEMPTY

Goal State: ON(C,A) ∧ ON(B,D) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ CLEAR(B) ∧ CLEAR(C) ∧ ARMEMPTY


Operations performed by the robot arm


The Robot Arm can perform 4 operations:
1. STACK(X,Y) : Stacking Block X on Block Y
2. UNSTACK(X,Y) : Picking up Block X which is on top of Block Y
3. PICKUP(X) : Picking up Block X which is on top of the table
4. PUTDOWN(X) : Put Block X on the table
All four operations have preconditions that must be satisfied before they can be performed. These preconditions are represented in the form of predicates.

The effect of these operations is represented using two lists, ADD and DELETE. The DELETE list contains the predicates that cease to be true once the operation is performed. The ADD list, on the other hand, contains the predicates that become true once the operation is performed.

The Precondition

For example, to perform the STACK(X,Y) operation, i.e. to stack Block X on top of Block Y, no other block should be on top of Y (CLEAR(Y)) and the robot arm should be holding Block X (HOLDING(X)).


Once the operation is performed, these predicates cease to be true, so they are included in the DELETE list as well.

On the other hand, once the operation is performed, the robot arm will be free (ARMEMPTY) and Block X will be on top of Y (ON(X,Y)), so these predicates form the ADD list.

Solution :
Steps = [PICKUP(C), PUTDOWN(C), UNSTACK(B,A), PUTDOWN(B), PICKUP(C),
STACK(C,A), PICKUP(B), STACK(B,D)]
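
The solution can be checked mechanically with a small sketch of the precondition / DELETE / ADD scheme. Only the lists for STACK(X,Y) are spelled out in these notes; the lists used below for UNSTACK, PICKUP and PUTDOWN follow a common textbook formulation and are assumptions, as are the names `operator` and `run`. The four-step plan `shorter` is not from the notes; it is included only to show that the same checker accepts other valid plans.

```python
def operator(name, x, y=None):
    """Return (preconditions, DELETE list, ADD list) for one arm operation."""
    if name == "STACK":        # lists as described in the notes
        return ({f"CLEAR({y})", f"HOLDING({x})"},
                {f"CLEAR({y})", f"HOLDING({x})"},
                {"ARMEMPTY", f"ON({x},{y})"})
    if name == "UNSTACK":      # assumed formulation
        return ({f"ON({x},{y})", f"CLEAR({x})", "ARMEMPTY"},
                {f"ON({x},{y})", "ARMEMPTY"},
                {f"HOLDING({x})", f"CLEAR({y})"})
    if name == "PICKUP":       # assumed formulation
        return ({f"CLEAR({x})", f"ONTABLE({x})", "ARMEMPTY"},
                {f"ONTABLE({x})", "ARMEMPTY"},
                {f"HOLDING({x})"})
    if name == "PUTDOWN":      # assumed formulation
        return ({f"HOLDING({x})"},
                {f"HOLDING({x})"},
                {f"ONTABLE({x})", "ARMEMPTY"})
    raise ValueError(f"unknown operation: {name}")

def run(state, plan):
    """Apply each step in order, checking its preconditions first."""
    state = set(state)
    for step in plan:
        pre, delete, add = operator(*step)
        missing = pre - state
        if missing:
            raise ValueError(f"{step}: unmet preconditions {sorted(missing)}")
        state = (state - delete) | add
    return state

initial = {"ON(B,A)", "ONTABLE(A)", "ONTABLE(C)", "ONTABLE(D)",
           "CLEAR(B)", "CLEAR(C)", "CLEAR(D)", "ARMEMPTY"}
goal = {"ON(C,A)", "ON(B,D)", "ONTABLE(A)", "ONTABLE(D)",
        "CLEAR(B)", "CLEAR(C)", "ARMEMPTY"}

plan = [("PICKUP", "C"), ("PUTDOWN", "C"), ("UNSTACK", "B", "A"),
        ("PUTDOWN", "B"), ("PICKUP", "C"), ("STACK", "C", "A"),
        ("PICKUP", "B"), ("STACK", "B", "D")]
shorter = [("UNSTACK", "B", "A"), ("STACK", "B", "D"),
           ("PICKUP", "C"), ("STACK", "C", "A")]

print(goal <= run(initial, plan))      # does the plan from the notes reach the goal?
print(goal <= run(initial, shorter))   # does the shorter alternative reach it too?
```

Representing a state as a set of predicate strings keeps the DELETE and ADD lists as plain set difference and union.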

Components of a planning system

A planning system typically performs the following steps:

● Choose the best rule to apply next, based on the best available heuristic information.
● Apply the chosen rule to compute the new problem state.
● Detect when a solution has been found.
● Detect dead ends so they can be discarded and system effort directed in more useful directions.
● Detect when a nearly correct solution has been found.


Hierarchical Planning
● Hierarchical planning is an Artificial Intelligence (AI) problem-solving approach for a certain kind of planning problem: the kind that focuses on problem decomposition, where problems are refined step by step into smaller and smaller ones until the problem is finally solved.
● A solution is a sequence of actions that is executable in a given initial state (and is a refinement of the initial compound tasks that needed to be refined). This form of hierarchical planning is usually referred to as Hierarchical Task Network (HTN) planning.
● Example: one-level planner
● Planning for “Going to Goa this Christmas”

Switch on computer
Start web browser
Open Indian Railways website
Select date
Select class
Select train
and so on
● Practical problems are too complex to be solved at one level.
● Here we use hierarchy in planning: a hierarchy of actions, in terms of major and minor actions.
● Lower-level activities detail more precise steps for accomplishing the higher-level tasks.
Example of hierarchical planning

Planning for “Going to Goa this Christmas”

Major Steps :
Hotel Booking
Ticket Booking
Reaching Goa
Staying and enjoying there
Coming Back
Minor Steps :
Take a taxi to reach station / airport
Have dinner on beach

Take photos

● It reduces the size of the search space.
● Instead of having to try out a large number of possible plan orderings, plan hierarchies limit the ways in which an agent can select and order its primitive operators.
● If the entire plan had to be synthesized at the level of the most detailed actions, it would be impossibly long.
● Attempts to solve mere details are postponed until the major steps are in place.
● Higher-level plans may run into difficulties at a lower level, causing the need to return to the higher level again to produce appropriately ordered sequences.
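
As a rough illustration of step-wise refinement, the sketch below expands a compound task into primitive actions using a decomposition table. The table is hypothetical: the notes list the major and minor steps of the Goa example but do not say which minor step belongs to which major step, so the grouping here is an assumption, as are the names `METHODS` and `decompose`.

```python
# Hypothetical decomposition table: compound task -> ordered subtasks.
# Any task that is not a key of this table is treated as a primitive action.
METHODS = {
    "Going to Goa this Christmas": [
        "Hotel booking", "Ticket booking", "Reaching Goa",
        "Staying and enjoying there", "Coming back",
    ],
    "Ticket booking": [
        "Open Indian Railways website", "Select date",
        "Select class", "Select train",
    ],
    "Reaching Goa": ["Take a taxi to reach station / airport"],
    "Staying and enjoying there": ["Have dinner on beach", "Take photos"],
}

def decompose(task, plan=None):
    """Refine `task` depth-first until only primitive actions remain."""
    if plan is None:
        plan = []
    subtasks = METHODS.get(task)
    if subtasks is None:          # primitive action: keep it as a plan step
        plan.append(task)
    else:                         # compound task: refine each subtask in order
        for subtask in subtasks:
            decompose(subtask, plan)
    return plan

for step in decompose("Going to Goa this Christmas"):
    print(step)
```

The major steps fix the overall order first; the details of each one are filled in only when that step is expanded.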

Reactive Systems
● Reactive machines are the most basic type of unsupervised AI. They cannot form memories or use past experiences to influence present decisions; they can only react to currently existing situations, hence “reactive.”
● Reactive machines have no concept of the world and therefore cannot function beyond the simple tasks for which they are programmed. A characteristic of reactive machines is that, no matter the time or place, they always behave the way they were programmed. There is no growth with reactive machines, only stagnation in recurring actions and behaviors.
● Example of reactive machines
● An existing form of a reactive machine is Deep Blue, a chess-playing supercomputer built by IBM, whose development began in the mid-1980s.
● Deep Blue was created to play chess against a human competitor with the intent to defeat the competitor. It was programmed to identify a chess board and its pieces while understanding and predicting the moves of each piece. In two matches played in 1996 and 1997, Deep Blue faced Russian chess grandmaster Garry Kasparov; it won the 1997 rematch 3½ to 2½, becoming the first computer to defeat a reigning world chess champion in a match.
● Deep Blue’s skill at playing chess accurately and successfully highlighted its reactive abilities. In the same vein, its reactive mind also indicates that it has no concept of the past or future; it only comprehends and acts on the presently existing world and the components within it. To simplify, reactive machines are programmed for the here and now, but not the before and after.
● Limited memory


● Limited memory consists of supervised AI systems that derive knowledge from experimental data or real-life events. Unlike reactive machines, limited memory systems learn from the past by observing actions or data fed to them in order to build a good-fit model.
● Although limited memory builds on observational data in conjunction with pre-programmed data the machines already contain, these sample pieces of information are fleeting. An existing form of limited memory is autonomous vehicles.
● Example of limited memory
● Autonomous vehicles, or self-driving cars, work on a combination of observational knowledge and computer vision. To observe how to drive properly among human-driven vehicles, self-driving cars segment their environment, detect patterns or changes in external factors, and adjust.


UNIT 8 Natural Language Processing

PREPARED BY
PROF. VISHVA UPADHYAY


Introduction to Natural Language Processing


● What is NLP?
● Natural Language Processing (NLP) refers to an AI method of communicating with intelligent systems using a natural language such as English.
● It is the technology used by machines to understand, analyze, manipulate, and interpret human languages. It helps developers organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation.
● Processing of natural language is required when you want an intelligent system such as a robot to act on your instructions, when you want to hear decisions from a dialogue-based clinical expert system, etc.
Components of NLP
Natural Language Understanding (NLU)
● Mapping the given input in natural language into useful representations.
● Analyzing different aspects of the language.

Natural Language Generation (NLG)


It is the process of producing meaningful phrases and sentences in the form of natural language
from some internal representation.
It involves −
● Text planning − It includes retrieving the relevant
content from the knowledge base.
● Sentence planning − It includes choosing required words,
forming meaningful phrases, setting tone of the sentence.
● Text Realization − It is mapping sentence plan into
sentence structure.
NLU is harder than NLG.
Difficulties in NLU
Natural language has an extremely rich form and structure.
It is very ambiguous. There can be different levels of ambiguity −
● Lexical ambiguity − It is at a very primitive level, such as the word level. For example, should the word “board” be treated as a noun or a verb?
● Syntax-level ambiguity − A sentence can be parsed in different ways. For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
● Referential ambiguity − Referring to something using pronouns. For example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
● One input can have several different meanings.
● Many inputs can mean the same thing.

NLP Terminology
● Phonology − It is the study of organizing sound
systematically.
● Morphology − It is a study of construction of words from
primitive meaningful units.
● Morpheme − It is a primitive unit of meaning in a language.
● Syntax − It refers to arranging words to make a sentence.
It also involves determining the structural role of words
in the sentence and in phrases.
● Semantics − It is concerned with the meaning of words and
how to combine words into meaningful phrases and
sentences.
● Pragmatics − It deals with using and understanding
sentences in different situations and how the
interpretation of the sentence is affected.
● Discourse − It deals with how the immediately preceding
sentence can affect the interpretation of the next
sentence.
● World Knowledge − It includes general knowledge about the
world.

Steps in NLP
● Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a language is the collection of words and phrases in that language. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.
● Syntactic Analysis (Parsing) − It involves analyzing the words in the sentence for grammar and arranging the words in a manner that shows the relationships among them. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
● Semantic Analysis − It draws the exact meaning, or the dictionary meaning, from the text. The text is checked for meaningfulness. This is done by mapping between syntactic structures and objects in the task domain. The semantic analyzer disregards phrases such as “hot ice-cream”.
● Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just before it. In addition, it also influences the meaning of the immediately succeeding sentence.
● Pragmatic Analysis − During this step, what was said is re-interpreted to determine what it actually meant. It involves deriving those aspects of language which require real-world knowledge.


Implementation Aspects of Syntactic Analysis


There are a number of algorithms researchers have developed for
syntactic analysis, but we consider only the following simple
methods −
● Context-Free Grammar
● Top-Down Parser

Context-Free Grammar
● It is a grammar whose rewrite rules have a single symbol on the left-hand side. Let us create a grammar to parse the sentence −
● “The bird pecks the grains”
● Articles (DET) − a | an | the
● Nouns − bird | birds | grain | grains
● Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
● = DET N | DET ADJ N
● Verbs − pecks | pecking | pecked
● Verb Phrase (VP) − NP V | V NP
● Adjectives (ADJ) − beautiful | small | chirping
● The parse tree breaks down the sentence into structured parts so that the computer can easily understand and process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which describe what tree structures are legal, needs to be constructed.
● These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols. According to the first-order logic rule, if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP), then the string formed by NP followed by VP is a sentence. The rewrite rules for the sentence are as follows −
● S → NP VP
● NP → DET N | DET ADJ N
● VP → V NP
● Lexicon −
● DET → a | the
● ADJ → beautiful | perching
● N → bird | birds | grain | grains
● V → peck | pecks | pecking
● The parse tree can be created as shown −


● Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks", sentences such as "The bird peck the grains" are wrongly permitted, i.e. the subject-verb agreement error is accepted as correct.
● Merit − It is the simplest style of grammar and is therefore widely used.
Demerits −
● They are not highly precise. For example, “The grains peck the bird” is syntactically correct according to the parser; even though it makes no sense, the parser accepts it as a correct sentence.
● To achieve high precision, multiple sets of grammar rules need to be prepared. Completely different sets of rules may be required for parsing singular and plural variations, passive sentences, etc., which can lead to a huge set of rules that is unmanageable.

Top-Down Parser
● Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that matches the classes of the words in the input sentence, continuing until it consists entirely of terminal symbols.
● These are then checked against the input sentence to see if they match. If not, the process is started over again with a different set of rules. This is repeated until a specific rule is found which describes the structure of the sentence.
Merit − It is simple to implement.
Demerits −
● It is inefficient, as the search process has to be repeated if an error occurs.
● It is slow.
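
The rewrite rules and lexicon listed under Context-Free Grammar can be tried out with a small top-down parser. This is a minimal backtracking sketch, not a production parser; the names `RULES`, `LEXICON`, `parse`, `parse_sequence` and `parse_sentence` are illustrative, and the nested-tuple tree format is an assumption.

```python
RULES = {
    "S": [("NP", "VP")],
    "NP": [("DET", "N"), ("DET", "ADJ", "N")],
    "VP": [("V", "NP")],
}
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}

def parse(symbol, words, pos):
    """Yield (tree, next_position) for every way `symbol` matches from `pos`."""
    if symbol in LEXICON:                       # word class: match one word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in RULES[symbol]:                   # try each rewrite rule in turn
        for children, end in parse_sequence(rhs, words, pos):
            yield (symbol, *children), end

def parse_sequence(symbols, words, pos):
    """Match a sequence of symbols one after another, with backtracking."""
    if not symbols:
        yield [], pos
        return
    for tree, nxt in parse(symbols[0], words, pos):
        for rest, end in parse_sequence(symbols[1:], words, nxt):
            yield [tree] + rest, end

def parse_sentence(text):
    """Return a parse tree covering the whole sentence, or None."""
    words = text.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse_sentence("The bird pecks the grains"))
print(parse_sentence("The bird peck the grains") is not None)   # agreement error
print(parse_sentence("The grains peck the bird") is not None)   # nonsense sentence
print(parse_sentence("bird the pecks grains"))                  # bad word order
```

The second and third calls show the demerits noted above: the grammar has no way to reject an agreement error or a meaningless but well-formed sentence. The parser terminates because none of the rules is left-recursive.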

Semantic Analysis
● Semantic Analysis is a subfield of Natural Language Processing (NLP) that attempts to
understand the meaning of Natural Language
Parts of Semantic Analysis
Semantic Analysis of Natural Language can be classified into two broad parts:


1. Lexical Semantic Analysis: Lexical Semantic Analysis involves understanding the meaning of each word of the text individually. It basically refers to fetching the dictionary meaning that a word in the text is intended to carry.
2. Compositional Semantics Analysis: Although knowing the meaning of each word of the text is essential, it is not sufficient to completely understand the meaning of the text. Compositional Semantics Analysis therefore looks at how the individual words combine to form the meaning of the whole text.
Tasks involved in Semantic Analysis
In order to understand the meaning of a sentence, the following are the major processes involved
in Semantic Analysis:
1. Word Sense Disambiguation
2. Relationship Extraction

Word Sense Disambiguation:


● In Natural Language, the meaning of a word may vary as per its usage in sentences and
the context of the text. Word Sense Disambiguation involves interpreting the meaning of
a word based upon the context of its occurrence in a text.
● For example, the word ‘bark’ may mean ‘the sound made by a dog’ or ‘the outermost layer of a tree’.
● Likewise, the word ‘rock’ may mean ‘a stone’ or ‘a genre of music’; hence, the accurate meaning of the word is highly dependent upon its context and usage in the text.
● Thus, the ability of a machine to overcome the ambiguity involved in identifying the
meaning of a word based on its usage and context is called Word Sense Disambiguation.
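
One simple way to illustrate the idea is to pick the sense whose definition shares the most words with the surrounding sentence. This word-overlap heuristic is not described in the notes; it is a simplified sketch in the spirit of dictionary-based methods. The two glosses for ‘bark’ are the ones quoted above, while the sense labels, the stop-word list and the function names are assumptions.

```python
STOPWORDS = {"a", "an", "the", "of", "by", "to", "was", "is"}

# Glosses taken from the example above; the sense labels are illustrative.
SENSES = {
    "bark": {
        "dog sound": "the sound made by a dog",
        "tree layer": "the outermost layer of a tree",
    }
}

def content_words(text):
    """Lower-case the text, strip punctuation and drop stop words."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def disambiguate(word, sentence):
    """Return the sense label whose gloss overlaps most with the context."""
    context = content_words(sentence) - {word}
    scores = {label: len(context & content_words(gloss))
              for label, gloss in SENSES[word].items()}
    return max(scores, key=scores.get)

print(disambiguate("bark", "The dog began to bark loudly."))
print(disambiguate("bark", "The bark of the old tree was rough."))
```

A sentence with no overlapping words would simply get the first listed sense, which is one reason practical systems rely on much richer context.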

Relationship Extraction:

● Another important task involved in Semantic Analysis is Relationship Extraction. It involves first identifying the various entities present in the sentence and then extracting the relationships between those entities.
Elements of Semantic Analysis
Some of the critical elements of Semantic Analysis that must be scrutinized and taken into
account while processing Natural Language are:
● Hyponymy: Hyponymy refers to a term that is an instance of a generic term. The relationship can be understood by taking class-object as an analogy. For example: ‘Color’ is a hypernym while ‘grey’, ‘blue’, ‘red’, etc., are its hyponyms.

● Homonymy: Homonymy refers to two or more lexical terms with the same spelling but completely distinct meanings. For example: ‘Rose’ might mean ‘the past form of rise’ or ‘a flower’: same spelling but different meanings; hence, ‘rose’ is a homonym.

● Synonymy: When two or more lexical terms that might be spelt differently have the same or a similar meaning, they are called synonyms. For example: (Job, Occupation), (Large, Big), (Stop, Halt).


● Antonymy: Antonymy refers to a pair of lexical terms that have contrasting meanings; they are symmetric about a semantic axis. For example: (Day, Night), (Hot, Cold), (Large, Small).

● Polysemy: Polysemy refers to lexical terms that have the same spelling but multiple closely related meanings. It differs from homonymy because the meanings of the terms need not be closely related in the case of homonymy. For example: ‘man’ may mean ‘the human species’ or ‘a male human’ or ‘an adult male human’; since all these different meanings bear a close association, the lexical term ‘man’ is polysemous.

● Meronymy: Meronymy refers to a relationship wherein one lexical term is a constituent of some larger entity. For example: ‘Wheel’ is a meronym of ‘Automobile’.

Basic Units of Semantic System:


In order to accomplish Meaning Representation in Semantic Analysis, it is vital to understand the
building units of such representations.
1. Entity: An entity refers to a particular individual unit, such as a person or a location. For example: GeeksforGeeks, Delhi, etc.
2. Concept: A Concept may be understood as a generalization of entities. It refers to a
broad class of individual units. For example Learning Portals, City, Students.
3. Relations: Relations help establish relationships between various entities and
concepts. For example: ‘GeeksforGeeks is a Learning Portal’, ‘Delhi is a City.’, etc.
4. Predicate: Predicates represent the verb structures of the sentences.

Approaches to Meaning Representations:


1. First-order predicate logic (FOPL)
2. Semantic Nets
3. Frames
4. Conceptual dependency (CD)
5. Rule-based architecture
6. Case Grammar
7. Conceptual Graphs

Semantic Analysis Techniques


Based upon the end goal one is trying to accomplish, Semantic Analysis can be used in various
ways. Two of the most common Semantic Analysis techniques are:
Text Classification
In Text Classification, our aim is to label the text according to the insights we intend to gain from the textual data.


For example:
● In Sentiment Analysis, we try to label the text with the prominent emotion it conveys. This is highly beneficial when analyzing customer reviews for improvement.
● In Topic Classification, we try to categorize our text into some predefined categories. For example: identifying whether a research paper is about Physics, Chemistry or Maths.
● In Intent Classification, we try to determine the intent behind a text message. For
example: Identifying whether an e-mail received at customer care service is a query,
complaint or request.

Text Extraction
In Text Extraction, we aim at obtaining specific information from our text.
For example:
● In Keyword Extraction, we try to obtain the essential words that define the entire
document.
● In Entity Extraction, we try to obtain all the entities involved in a document.

Spell Checking
● A spell checker is an application, program or a function of a program which determines
the correctness of the spelling of a given word based on the language set being used. It
can either be a standalone program or part of a larger program which operates on blocks
of text such as a word processor, search engine or an email client.
● A spell checker is also known as spell check.

The spell checking process is:


● Scan blocks of text and extract individual words.
● Compare each extracted word to known words contained in a dictionary file of correctly
spelled words, which may also contain punctuation and grammatical rules.
● Morphological algorithms might also be applied for handling alternative forms of words used in different grammatical contexts.
● Mark the words with incorrect spelling and offer the correct spelling to the user. Some
spell checkers change the incorrect words automatically if the setting is activated.
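
The process above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny word set stands in for a real dictionary file, the morphological step is left out, and suggestions come from the standard library's `difflib.get_close_matches`, which ranks candidates by string similarity. The function name `check_spelling` is hypothetical.

```python
import difflib
import re

def check_spelling(text, dictionary, max_suggestions=3):
    """Return {misspelled_word: [suggestions]} for words not in `dictionary`."""
    report = {}
    for word in re.findall(r"[A-Za-z']+", text):     # scan the text, extract words
        lower = word.lower()
        if lower in dictionary:                       # compare with known words
            continue
        # mark the word and offer the closest known words as corrections
        report[lower] = difflib.get_close_matches(
            lower, sorted(dictionary), n=max_suggestions)
    return report

# Stand-in for a dictionary file of correctly spelled words.
dictionary = {"a", "spell", "checker", "can", "receive", "text",
              "from", "word", "processor"}
report = check_spelling(
    "A spell checker can recieve text from a word processor.", dictionary)
print(report)
```

A checker that corrects automatically would replace the word with the first suggestion instead of only reporting it.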


UNIT 9 Connectionist Models

PREPARED BY
PROF. VISHVA UPADHYAY


What are Connectionist Models?


● It is an approach to artificial intelligence (AI) that developed out of attempts to
understand how the human brain works at the neural level and, in particular, how people
learn and remember. (For that reason, this approach is sometimes referred to as
neuronlike computing.)

Hopfield Network
● A Hopfield network is a special kind of neural network whose response differs from that of other neural networks: it is calculated by an iterative process that converges. It has just one layer of neurons, whose size matches that of the input and the output, which must be the same.
● When such a network is used to recognize, for example, digits, we present a list of correctly rendered digits to the network. Subsequently, the network can transform a noisy input into the related perfect output.
● A Hopfield network is a single-layered, recurrent network in which the neurons are fully connected, i.e., each neuron is connected to every other neuron. If there are two neurons i and j, then there is a connectivity weight wij between them, which is symmetric: wij = wji.
● There is no self-connectivity: wii = 0. For example, three neurons i = 1, 2, 3 with values xi = ±1 have connectivity weights wij.


Updating rule:

● Consider N neurons i = 1, …, N with values xi = ±1.
● The update rule applied to node i is given by:
● If hi ≥ 0 then xi → 1, otherwise xi → -1,
● where $h_i = \sum_{j=1}^{N} w_{ij} x_j + b_i$ is called the field at i, with $b_i \in \mathbb{R}$ a bias.
● Thus, xi → sgn(hi), where sgn(r) = 1 if r ≥ 0 and sgn(r) = -1 if r < 0.
● We put bi = 0 so that it makes no difference in training the network with random patterns.
● We therefore consider $h_i = \sum_{j=1}^{N} w_{ij} x_j$.
● We have two different approaches to updating the nodes:

● We have two different approaches to update the nodes:

Synchronously:
In this approach, all the nodes are updated simultaneously at each time step.

Asynchronously:
In this approach, at each point of time, update one node chosen randomly or according to some
rule. Asynchronous updating is more biologically realistic.

Hopfield Network as a Dynamical system:


● Consider $X = \{-1, 1\}^N$, so that each state x ∈ X is given by xi ∈ {-1, 1} for 1 ≤ i ≤ N.
● Here, we get $2^N$ possible states or configurations of the network.
● We can define a metric on X by using the Hamming distance between any two states:
● H(x, y) = #{i : xi ≠ yi}
● Here, H is a metric with 0 ≤ H(x, y) ≤ N. It is clearly symmetric and reflexive.
● With either the asynchronous or the synchronous updating rule, we get a discrete-time dynamical system.
● The updating rule Up: X → X describes a map.
● And Up: X → X is trivially continuous.

Example:

Suppose we have only two neurons: N = 2

There are two non-trivial choices for connectivities:

w12 = w21 = 1

w12= w21 = -1

Asynchronous updating:


In the first case, there are two attracting fixed points, [1, 1] and [-1, -1]. All orbits converge to one of these. In the second case, the fixed points are [-1, 1] and [1, -1], and all orbits converge to one of these. For any fixed point, swapping all the signs gives another fixed point.
Synchronous updating:
In both the first and second cases, although there are fixed points, none of them attracts nearby points, i.e., they are not attracting fixed points. Some orbits oscillate forever.
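
The two-neuron example is small enough to check by enumeration. The sketch below assumes zero bias and the convention sgn(0) = 1 from the updating rule; the function names are illustrative.

```python
def sgn(r):
    return 1 if r >= 0 else -1

def sync_step(state, w):
    """Update both neurons at once (N = 2, zero bias, w12 = w21 = w)."""
    x1, x2 = state
    return (sgn(w * x2), sgn(w * x1))

def async_step(state, w, i):
    """Update only neuron i (0 or 1), leaving the other one unchanged."""
    x = list(state)
    x[i] = sgn(w * x[1 - i])
    return tuple(x)

def async_fixed_points(w):
    """States that no single-neuron update can change."""
    states = [(a, b) for a in (-1, 1) for b in (-1, 1)]
    return [s for s in states
            if async_step(s, w, 0) == s and async_step(s, w, 1) == s]

print(async_fixed_points(1))     # case w12 = w21 = 1
print(async_fixed_points(-1))    # case w12 = w21 = -1

# Synchronous updating from (1, -1) with w = 1 flips both neurons at every step.
orbit = [(1, -1)]
for _ in range(4):
    orbit.append(sync_step(orbit[-1], 1))
print(orbit)
```

The synchronous orbit keeps alternating between two states and never settles, which is the oscillation mentioned above.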

Energy function evaluation:

● Hopfield networks have an energy function that decreases or stays unchanged with asynchronous updating.
● For a given state $x \in \{-1, 1\}^N$ of the network and for any set of connection weights wij with wij = wji and wii = 0, let
● $E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} x_i x_j$
● Suppose we update xm to x'm and denote the new energy by E'. We show that
● $E' - E = (x_m - x'_m) \sum_{i \ne m} w_{mi} x_i$.
● Using the above equation, if xm = x'm then E' = E.
● If xm = -1 and x'm = 1, then xm - x'm = -2 and hm = ∑i wmi xi ≥ 0.
● Thus, E' - E ≤ 0.
● Similarly, if xm = 1 and x'm = -1, then xm - x'm = 2 and hm = ∑i wmi xi < 0.
● Thus, E' - E < 0.
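
The argument above can be checked numerically. The sketch below builds a small network with random symmetric integer weights and zero diagonal (the size, weight range and seed are arbitrary assumptions), applies asynchronous updates, and records the energy after each one.

```python
import random

def energy(w, x):
    """E = -1/2 * sum over i != j of w[i][j] x[i] x[j], written as a sum over i < j."""
    n = len(x)
    return -sum(w[i][j] * x[i] * x[j]
                for i in range(n) for j in range(i + 1, n))

def async_update(w, x, m):
    """Apply x_m -> sgn(h_m), with h_m = sum over i != m of w[m][i] x[i] (zero bias)."""
    h = sum(w[m][i] * x[i] for i in range(len(x)) if i != m)
    x[m] = 1 if h >= 0 else -1

rng = random.Random(0)                 # fixed seed so the run is repeatable
n = 8
w = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.randint(-3, 3)   # symmetric, zero self-connection

x = [rng.choice((-1, 1)) for _ in range(n)]
energies = [energy(w, x)]
for _ in range(200):
    async_update(w, x, rng.randrange(n))          # one randomly chosen neuron
    energies.append(energy(w, x))

# The energy should never increase from one asynchronous update to the next.
print(all(later <= earlier for earlier, later in zip(energies, energies[1:])))
```

Integer weights are used so that the comparison between successive energies is exact.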

Neurons pull in or push away from each other:

● Consider the connection weight wij = wji between two neurons i and j.
● If wij > 0, the updating rule implies:
● If xj = 1, then the contribution of j to the weighted sum, i.e., wij xj, is positive. Thus the value of xi is pulled by j towards its value xj = 1.
● If xj = -1, then wij xj is negative, and xi is again pulled by j towards its value xj = -1.
● Thus, if wij > 0, the value of i is pulled towards the value of j. By symmetry, the value of j is also pulled towards the value of i.
● If wij < 0, then the value of i is pushed away from the value of j.
● It follows that, for a particular set of values xi ∈ {-1, 1} for 1 ≤ i ≤ N, choosing the weights as wij = xi xj for 1 ≤ i, j ≤ N corresponds to the Hebbian rule.

Training the network: One pattern (bi = 0)

● Suppose the vector x→ = (x1, …, xi, …, xN) ∈ $\{-1, 1\}^N$ is a pattern that we would like to store in the Hopfield network.
● To build a Hopfield network that recognizes x→, we need to select the connection weights wij accordingly.


● If we select Wij = η XiXj for 1 ≤ i, j ≤ N (with i ≠ j), where η > 0 is the learning rate,
then the value of Xi will not change under the updating condition, as we illustrate below.
● We have
hi = ∑j≠i WijXj = η ∑j≠i XiXjXj = η (N − 1) Xi,
so hi has the same sign as Xi.
● It implies that the value of Xi, whether 1 or -1, will not change, so that x→ is a fixed point.
● Note that −x→ also becomes a fixed point when we train the network with x→, confirming
that Hopfield networks are sign blind.
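
The one-pattern training rule and the energy argument above can be checked numerically. The following is a minimal sketch, assuming the standard Hopfield energy E = −(1/2) ∑i≠j WijXiXj, a zero threshold, and the convention that a weighted input of exactly 0 maps to +1. The function names (`train_one_pattern`, `energy`, `recall`) and the example pattern are illustrative and not part of the original notes.

```python
import random

def train_one_pattern(x, eta=1.0):
    """Hebbian weights W[i][j] = eta * x[i] * x[j], with a zero diagonal."""
    n = len(x)
    return [[0.0 if i == j else eta * x[i] * x[j] for j in range(n)] for i in range(n)]

def energy(W, x):
    """E = -1/2 * sum over i != j of W[i][j] * x[i] * x[j]."""
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j)

def update_neuron(W, x, m):
    """Asynchronous update of neuron m: take the sign of its weighted input h_m."""
    h = sum(W[m][i] * x[i] for i in range(len(x)) if i != m)
    x[m] = 1 if h >= 0 else -1

def recall(W, x, sweeps=5, seed=0):
    """Update neurons one at a time in random order; record the energy after each update."""
    rng = random.Random(seed)
    x = list(x)
    energies = [energy(W, x)]
    for _ in range(sweeps):
        for m in rng.sample(range(len(x)), len(x)):
            update_neuron(W, x, m)
            energies.append(energy(W, x))
    return x, energies

pattern = [1, -1, 1, 1, -1, -1]        # the stored pattern (illustrative)
W = train_one_pattern(pattern)
noisy = [1, -1, -1, 1, -1, -1]         # the same pattern with one sign flipped
restored, energies = recall(W, noisy)
```

Starting from the noisy state, the network settles back on the stored pattern, and the recorded energies never increase from one update to the next. Starting from the negated pattern leaves it unchanged, which is the sign-blind behavior noted above.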

Neural Network
● A neural network is a method in artificial intelligence that teaches computers to process
data in a way that is inspired by the human brain.
● It is a type of machine learning process, called deep learning, that uses interconnected
nodes or neurons in a layered structure that resembles the human brain.
● It creates an adaptive system that computers use to learn from their mistakes and improve
continuously. Thus, artificial neural networks attempt to solve complicated problems, like
summarizing documents or recognizing faces, with greater accuracy.

Applications of Neural Network


Computer vision

Computer vision is the ability of computers to extract information and insights from images and
videos. With neural networks, computers can distinguish and recognize images similar to
humans. Computer vision has several applications, such as the following:

● Visual recognition in self-driving cars so they can recognize road signs and other road
users
● Content moderation to automatically remove unsafe or inappropriate content from image
and video archives
● Facial recognition to identify faces and recognize attributes like open eyes, glasses, and
facial hair
● Image labeling to identify brand logos, clothing, safety gear, and other image details

Speech recognition

Neural networks can analyze human speech despite varying speech patterns, pitch, tone,
language, and accent. Virtual assistants like Amazon Alexa and automatic transcription software
use speech recognition to do tasks like these:

● Assist call center agents and automatically classify calls
● Convert clinical conversations into documentation in real time
● Accurately subtitle videos and meeting recordings for wider content reach

Natural language processing


Natural language processing (NLP) is the ability to process natural, human-created text. Neural
networks help computers gather insights and meaning from text data and documents. NLP has
several use cases, including in these functions:

● Automated virtual agents and chatbots
● Automatic organization and classification of written data
● Business intelligence analysis of long-form documents like emails and forms
● Indexing of key phrases that indicate sentiment, like positive and negative comments on
social media
Simple neural network architecture
A basic neural network has interconnected artificial neurons in three layers:

Input Layer
Information from the outside world enters the artificial neural network from the input layer. Input
nodes process the data, analyze or categorize it, and pass it on to the next layer.

Hidden Layer
Hidden layers take their input from the input layer or other hidden layers. Artificial neural
networks can have a large number of hidden layers. Each hidden layer analyzes the output from
the previous layer, processes it further, and passes it on to the next layer.

Output Layer
The output layer gives the final result of all the data processing by the artificial neural network. It
can have single or multiple nodes. For instance, if we have a binary (yes/no) classification
problem, the output layer will have one output node, which will give the result as 1 or 0.
However, if we have a multi-class classification problem, the output layer might consist of more
than one output node.
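
As a rough illustration of the three-layer architecture described above, the sketch below passes one input vector through a hidden layer and a single output node. It assumes a sigmoid activation and hand-picked weights; both are assumptions made for the example, since the notes do not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs plus a bias, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(x, hidden_w, hidden_b)   # hidden layer processes the input layer's values
    return layer(hidden, out_w, out_b)      # output layer produces the final result

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

score = forward([1.0, 0.0], hidden_w, hidden_b, out_w, out_b)[0]
label = 1 if score >= 0.5 else 0            # binary (yes/no) decision from the single output node
```

For a multi-class problem, `out_w` and `out_b` would simply contain one row and one bias per class, giving several output nodes.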

Recurrent Networks
● A Recurrent Neural Network (RNN) is a type of neural network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the
inputs and outputs are independent of each other, but in tasks such as predicting the next
word of a sentence, the previous words are required, so the network needs to remember
them.
● RNNs address this issue with the help of a hidden layer. The main and most important
feature of an RNN is the hidden state, which remembers some information about a sequence.
● An RNN has a memory that retains information about what has been calculated so far. It
uses the same parameters for each input, as it performs the same task on all the inputs or
hidden layers to produce the output. This reduces the number of parameters, unlike
other neural networks.


Working of RNN
● Example: Suppose there is a deeper network with one input layer, three hidden layers,
and one output layer. Then like other neural networks, each hidden layer will have its
own set of weights and biases, let’s say, for hidden layer 1 the weights and biases are (w1,
b1), (w2, b2) for the second hidden layer, and (w3, b3) for the third hidden layer. This
means that each of these layers is independent of the other, i.e. they do not memorize the
previous outputs.
Now the RNN will do the following:
● The RNN converts the independent activations into dependent activations by providing
the same weights and biases to all the layers. This keeps the number of parameters from
growing with each layer, and it memorizes each previous output by giving each output as
input to the next hidden layer.


● Hence these three layers can be joined together such that the weights and bias of all
the hidden layers are the same, in a single recurrent layer.


● The formula for calculating the current state:
ht = f(ht-1, xt)
where:
ht -> current state
ht-1 -> previous state
xt -> input state
● Formula for applying the activation function (tanh):
ht = tanh(whh · ht-1 + wxh · xt)
where:
whh -> weight at recurrent neuron
wxh -> weight at input neuron

● The formula for calculating the output:
Yt = Why · ht
where:
Yt -> output
Why -> weight at output layer
Training through RNN
1. A single time step of the input is provided to the network.
2. The current state is calculated from the current input and the previous state.
3. The current ht becomes ht-1 for the next time step.
4. This is repeated for as many time steps as the problem requires, combining the
information from all the previous states.
5. Once all the time steps are completed, the final current state is used to calculate the
output.
6. The output is then compared to the actual output, i.e., the target output, and the error is
generated.
7. The error is then back-propagated through the network to update the weights, and hence
the network (RNN) is trained.
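
Steps 1 to 6 above can be sketched for a scalar RNN with one input, one hidden unit, and one output. The weights, inputs, and the squared-error measure are assumptions chosen only for illustration, and step 7 (the weight update by back-propagation) is deliberately left out to keep the example short.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, w_hy, h0=0.0):
    """Scalar RNN: h_t = tanh(w_hh * h_{t-1} + w_xh * x_t); output y = w_hy * h_T."""
    h = h0
    states = []
    for x_t in inputs:                          # step 1: one time step at a time
        h = math.tanh(w_hh * h + w_xh * x_t)    # step 2: state from input and previous state
        states.append(h)                        # step 3: this h is h_{t-1} at the next step
    y = w_hy * h                                # step 5: output from the final state
    return y, states

def squared_error(y, target):
    """Step 6: compare the output with the target output."""
    return (y - target) ** 2

# Hypothetical weights and data, chosen only for illustration.
y, states = rnn_forward([0.5, -1.0, 0.25], w_xh=0.8, w_hh=0.5, w_hy=1.5)
loss = squared_error(y, target=0.0)
```

Note that the same three weights are reused at every time step, which is the parameter sharing the notes describe.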

Advantages of Recurrent Neural Network


1. An RNN carries information from earlier inputs forward through its hidden state, which
makes it useful for time series prediction. Long Short-Term Memory (LSTM) networks are
an RNN variant designed to retain this information over longer sequences.
2. Recurrent neural networks are even used with convolutional layers to extend the
effective pixel neighborhood.
Disadvantages of Recurrent Neural Network
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It struggles to process very long sequences when tanh or ReLU is used as the activation
function.


Applications of Recurrent Neural Network


1. Language Modelling and Generating Text
2. Speech Recognition
3. Machine Translation
4. Image Recognition, Face detection
5. Time series Forecasting

Distributed Representations

What is Symbolic AI?


● Symbolic AI is also known as rule-based AI, good old-fashioned AI (GOFAI), and classic
AI. Earlier AI research was based on Symbolic AI, which relied on encoding human
behavior and knowledge in the form of computer code.
● We humans use symbols to derive meaning from things and events in the
environment around us. For example, imagine you told your friend to buy you a bottle of
Coke. Your friend would first have an image of a bottle of Coke in his mind. This is the
very idea behind symbolic AI development: these symbols become the building
blocks of cognition.
● Any application made with Symbolic AI uses combinations of characters (symbols) to
signify real-world concepts or entities. These symbols can easily be
arranged through networks and lists or arranged hierarchically. Such arrangements tell the
AI algorithms how the symbols are related to each other.
● Information in Symbolic AI is processed through what is called an expert
system, in which if/then pairings direct the algorithm on how it should
behave. These expert systems are man-made knowledge bases. The inference engine
is the component that refers to the knowledge base and selects rules to
apply to given symbols.

Pros and Cons of Symbolic AI


● Symbolic AI is well suited for applications that are based on crystal clear rules and goals.
If we want this AI to beat a human at chess, we need to teach the
algorithm the specifics of chess. This framework acts like a boundary that helps it operate
properly.
● Symbolic AI falls short when it encounters variations. Take the example of
machine vision, which might look at a product from all possible angles. It would be
tedious and time-consuming to create rules for all the possible combinations. The real
world contains huge amounts of data and numerous variations. It is difficult to anticipate
all the possible alterations in a given environment.

What is a Connectionist AI?


● Experts focused on symbolic AI for many decades; however,
Connectionist AI is more popular now. This AI is based on how a human mind functions
and on its neural interconnections. This approach is sometimes associated with the
perceptron, a model of a single neuron.
● An application built with Connectionist AI tends to get more intelligent as we keep on
feeding it data, learning patterns and relations associated with the environment and with
itself. Symbolic AI, on the other hand, is hand-coded. To understand connectionist AI,
let's take the example of an artificial neural network. Each network is made up of hundreds of
simple processing elements, the artificial neurons. They are arranged in layers, and the
connections between them carry weights, which are adjustable parameters.
● In Connectionist AI all the processing elements have weighted inputs, an output, and a
transfer function. Keep in mind that a processing element takes in
multiple inputs and combines them into a single output value. Each weight in the
algorithm expresses the direction and importance of its input, and the weighted
sum is what activates the neuron. The activated signal then
passes through the transfer function and produces one output.
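
To make the contrast with hand-coded rules concrete, here is a minimal sketch of a single artificial neuron that learns the logical AND from examples instead of being programmed with it. It assumes a step transfer function and the classic perceptron learning rule; neither is spelled out in these notes, and the function names are illustrative.

```python
def step(z):
    """Transfer function: output 1 when the weighted sum reaches the threshold 0."""
    return 1 if z >= 0 else 0

def predict(weights, bias, inputs):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)

def train_perceptron(samples, lr=1, epochs=20):
    """Perceptron learning rule (an assumption): adjust each weight by lr * error * input."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy training data: the logical AND of two binary inputs.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
```

The weights are the adjustable parameters mentioned above: nothing in the code states the AND rule, yet after training the neuron reproduces it on all four inputs.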

Pros and cons of Connectionist AI


● Connectionist AI is a good option when you have high-quality training data to feed it.
Even though this AI model gets smarter as data is fed into it, it still needs
accurate information to start the whole learning process. Connectionist AI
is widely used in the healthcare industry, most commonly when there is a large set
of medical images that humans need to verify for correctness and
annotate with context.
● For all its advantages, this AI often cannot explain how it reached a solution. It is therefore
advised not to select this AI as the primary or sole choice, as the conclusions drawn by
it cannot be explained and would require the help of a third party. Consider, for example, using
connectionist AI to decide whether a person is a murderer. People are bound to consider
it unjust and cruel to rely on an AI that does not explain how it reached its
conclusion.


UNIT 10 Expert Systems

PREPARED BY
PROF. VISHVA UPADHYAY


What is an Expert System?


● An expert system is a computer program that is designed to solve complex problems and
to provide decision-making ability like a human expert. It does this by extracting
knowledge from its knowledge base, using reasoning and inference rules according to
the user's queries.
● The system helps in decision making for complex problems using both facts and
heuristics, like a human expert. It is called so because it contains the expert knowledge of
a specific domain and can solve complex problems within that domain. These
systems are designed for a specific domain, such as medicine, science, etc.
● The performance of an expert system is based on the expert's knowledge stored in its
knowledge base. The more knowledge stored in the KB, the better the system
performs. One common example of an ES is the suggestion of spelling corrections
while typing in the Google search box.

Knowledge Base (Representing and Using Domain Knowledge)


● An expert system is built around a knowledge base module.
● An expert system contains a formal representation of the information provided by the
domain expert. This information may be in the form of problem-solving rules,
procedures, or data intrinsic to the domain. To incorporate this information into the
system, it is necessary to make use of one or more knowledge representation methods.
Some of these methods are described here.

● Transferring knowledge from the human expert to a computer is often the most difficult
part of building an expert system.

● The knowledge acquired from the human expert must be encoded in such a way that it
remains a faithful representation of what the expert knows, and it can be manipulated by
a computer.


● Three common methods of knowledge representation that have evolved over the years are
IF-THEN rules, semantic networks, and frames.

1. IF-THEN rules

Human experts usually tend to think along the lines of:

condition ⇒ action or situation ⇒ conclusion

IF-THEN rules are the predominant form of encoding knowledge in expert systems. These are of
the form:
If a1 , a2 , . . . . . , an
Then b1 , b2 , . . . . . , bn where
each ai is a condition or situation, and
each bi is an action or a conclusion.
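
A small forward-chaining sketch shows how rules of this form can be applied: a rule fires when all of its conditions a1 … an are known, and its conclusions b1 … bn are then added to the known facts. The rule contents and the function name `forward_chain` are hypothetical and serve only as an illustration.

```python
def forward_chain(rules, facts):
    """Fire IF-THEN rules until no rule adds a new fact.

    rules: list of (conditions, conclusions), both sets of strings.
    facts: the initially known facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusions in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and not conclusions <= known:
                known |= conclusions
                changed = True
    return known

# Hypothetical toy rules, invented for illustration only.
rules = [
    ({"has_fever", "has_cough"}, {"suspect_flu"}),
    ({"suspect_flu"}, {"recommend_rest", "recommend_fluids"}),
]
derived = forward_chain(rules, {"has_fever", "has_cough"})
```

With both symptoms given, the first rule fires and enables the second. With only one symptom, no rule fires and the facts stay as they were.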

2. Semantic Networks

● In this scheme, knowledge is represented in terms of objects and relationships between
objects.
● The objects are denoted as nodes of a graph. The relationship between two objects is
denoted as a link between the corresponding two nodes.
● The most common form of semantic network uses the links between nodes to represent
IS-A and HAS relationships between objects.

Example of Semantic Network


The Fig. below shows a car IS-A vehicle; a vehicle HAS wheels.
● This kind of relationship establishes an inheritance hierarchy in the network, with the
objects lower down in the network inheriting properties from the objects higher up.
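
The car/vehicle example can be written down as a list of labeled links, with a lookup that follows IS-A links upward to collect inherited HAS properties. This is a minimal sketch; the third link and the function names are assumptions added for illustration.

```python
# Each link is (subject, relation, object); nodes are plain strings.
links = [
    ("car", "IS-A", "vehicle"),
    ("vehicle", "HAS", "wheels"),
    ("car", "HAS", "steering wheel"),   # hypothetical extra link for illustration
]

def parents(node):
    """Nodes that `node` points to through an IS-A link."""
    return [o for s, r, o in links if s == node and r == "IS-A"]

def has(node):
    """Collect HAS properties of a node, including those inherited via IS-A links."""
    found = {o for s, r, o in links if s == node and r == "HAS"}
    for parent in parents(node):
        found |= has(parent)
    return found

car_parts = has("car")
```

Here `car` inherits `wheels` from `vehicle` in addition to its own property, while `vehicle` does not pick up anything from `car`.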


3. Frames
● In this technique, knowledge is decomposed into highly modular pieces called frames,
which are generalized record structures. Knowledge consists of concepts, situations,
attributes of concepts, relationships between concepts, and procedures to handle
relationships as well as attribute values.
● Each concept may be represented as a separate frame.
● The attributes, the relationships between concepts, and the procedures are allotted to
slots in a frame.
● The contents of a slot may be of any data type - numbers, strings, functions or procedures
and so on.
● The frames may be linked to other frames, providing the same kind of inheritance as that
provided by a semantic network.
● A frame-based representation is ideally suited for object-oriented programming
techniques. An example of Frame-based representation of knowledge is shown in the
next slide.

Example : Frame-based Representation of Knowledge.


Two frames, their slots and the slots filled with data type are shown.
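
Since the figure is not reproduced here, the idea can be sketched in code instead: a frame is a record of named slots, a slot may hold data or a procedure, and a frame may link to a parent frame from which it inherits. The class name `Frame`, the slot names, and the values below are hypothetical.

```python
class Frame:
    """A named record of slots, optionally linked to a parent frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then up the chain of parent frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                value = frame.slots[slot]
                # A slot may hold a procedure; call it with the frame that was asked.
                return value(self) if callable(value) else value
            frame = frame.parent
        raise KeyError(slot)

# Hypothetical frames; the slot names and values are invented for illustration.
vehicle = Frame("vehicle", wheels=4,
                describe=lambda f: f"{f.name} with {f.get('wheels')} wheels")
motorbike = Frame("motorbike", parent=vehicle, wheels=2, engine_cc=150)
```

`motorbike` overrides the `wheels` slot but inherits the `describe` procedure from `vehicle`, which mirrors how object-oriented classes inherit methods.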


Working Memory
● Working memory refers to task-specific data for a problem. The contents of the working
memory change with each problem situation. Consequently, it is the most dynamic
component of an expert system, assuming that it is kept current.

● Every problem in a domain has some unique data associated with it.
● Data may consist of the set of conditions leading to the problem, its parameters and so on.
● Data specific to the problem needs to be input by the user at the time of use, that is,
when consulting the expert system. The working memory is related to the user interface.


Expert System Shells


● An Expert system shell is a software development environment. It contains the basic
components of expert systems. A shell is associated with a prescribed method for
building applications by configuring and instantiating these components.

Shell components and description


● The generic components of a shell (knowledge acquisition, the knowledge base,
reasoning, explanation, and the user interface) are shown below. The knowledge base
and reasoning engine are the core components.

● Knowledge Base
A store of factual and heuristic knowledge. An expert system tool provides one or more knowledge
representation schemes for expressing knowledge about the application domain. Some tools use
both frames (objects) and IF-THEN rules. In PROLOG the knowledge is represented as logical
statements.

● Reasoning Engine
Inference mechanisms for manipulating the symbolic information and knowledge in the
knowledge base to form a line of reasoning in solving a problem. The inference mechanism can
range from simple modus ponens backward chaining of IF-THEN rules to case-based reasoning.

● Knowledge Acquisition subsystem


A subsystem to help experts in building knowledge bases. However, collecting knowledge,
needed to solve problems and build the knowledge base, is the biggest bottleneck in building
expert systems.

● Explanation subsystem
A subsystem that explains the system's actions. The explanation can range from how the final or
intermediate solutions were arrived at to justifying the need for additional data.

● User Interface
A means of communication with the user. The user interface is generally not a part of the expert
system technology, and it was not given much attention in the past. However, the user interface can
make a critical difference in the perceived utility of an expert system.
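
To show how a reasoning engine and an explanation subsystem can fit together, the sketch below proves a goal by backward chaining over IF-THEN rules and records which rules it used. It assumes the rule set has no cycles; the rules, facts, and the function name `backward_chain` are hypothetical and do not describe any particular shell.

```python
def backward_chain(goal, rules, facts, trace=None):
    """Try to prove `goal` by working backwards through IF-THEN rules.

    rules: list of (conditions, conclusion); facts: set of known facts.
    `trace` collects the rules used, as a simple explanation record.
    Assumes the rules contain no cycles.
    """
    if trace is None:
        trace = []
    if goal in facts:
        return True, trace
    for conditions, conclusion in rules:
        if conclusion != goal:
            continue
        attempt = list(trace)
        # The goal holds if every condition of this rule can itself be proved.
        if all(backward_chain(c, rules, facts, attempt)[0] for c in conditions):
            attempt.append(f"{' and '.join(conditions)} => {conclusion}")
            trace[:] = attempt
            return True, trace
    return False, trace

# Hypothetical knowledge base and working-memory facts.
rules = [
    (["engine_cranks", "no_spark"], "ignition_fault"),
    (["ignition_fault"], "replace_spark_plugs"),
]
facts = {"engine_cranks", "no_spark"}
proved, explanation = backward_chain("replace_spark_plugs", rules, facts)
```

The `explanation` list plays the role of a very small explanation subsystem: it answers "how was this conclusion reached?" by listing the rules in the order they were established.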


UNIT 11 Genetic Algorithms

PREPARED BY
PROF. VISHVA UPADHYAY


What are Genetic Algorithms?


● A genetic algorithm is used to solve complicated problems with a large number of
variables and possible outcomes/solutions.
● Population: The population is a subset of all possible or probable solutions that can
solve the given problem.
● Chromosomes: A chromosome is one of the solutions in the population for the given
problem; a collection of genes forms a chromosome.
● Gene: A chromosome is divided into genes; a gene is one element of the
chromosome.
● Allele: An allele is the value given to a gene within a particular chromosome.
● Fitness Function: The fitness function is used to determine an individual's fitness level
in the population, that is, the ability of an individual to compete with other individuals.
In every iteration, individuals are evaluated based on their fitness function.
● Genetic Operators: In a genetic algorithm, the fittest individuals mate to produce
offspring that are better than the parents. Genetic operators play a role in changing the genetic
composition of the next generation.
● Selection: After calculating the fitness of every individual in the population, a selection
process is used to determine which of the individuals in the population will get to
reproduce and produce the offspring that will form the next generation.

Types of selection methods available

● Roulette wheel selection
● Tournament selection
● Rank-based selection

So, now we can define a genetic algorithm as a heuristic search algorithm to solve optimization
problems. It is a subset of evolutionary algorithms, which is used in computing. A genetic
algorithm uses genetic and natural selection concepts to solve optimization problems.

How Does Genetic Algorithm Work?

● The genetic algorithm works on the evolutionary generational cycle to generate
high-quality solutions. These algorithms use different operations that either enhance or
replace the population to give an improved fit solution.
● It basically involves five phases to solve complex optimization problems, which are
given below:
● Initialization
● Fitness Assignment
● Selection
● Reproduction
● Termination

1. Initialization

● The process of a genetic algorithm starts by generating a set of individuals, which is
called the population. Each individual is a candidate solution to the given problem. An
individual is characterized by a set of parameters called genes. Genes are
combined into a string to form a chromosome, which encodes the solution to the problem.
One of the most popular techniques for initialization is the use of random binary strings.

2. Fitness Assignment

● The fitness function is used to determine how fit an individual is, that is, the ability of an
individual to compete with other individuals. In every iteration, individuals are evaluated
based on their fitness function. The fitness function provides a fitness score to each
individual. This score further determines the probability of being selected for
reproduction. The higher the fitness score, the greater the chance of being selected for
reproduction.
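
The first two phases can be sketched with random binary strings as chromosomes. The fitness function used here, the number of 1-genes in a chromosome, is an assumption chosen because it is easy to follow; a real problem would supply its own.

```python
import random

def random_population(size, n_genes, rng):
    """Each individual is a chromosome: a list of binary genes."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(size)]

def fitness(chromosome):
    """Toy fitness (an assumption for this sketch): the number of 1-genes."""
    return sum(chromosome)

rng = random.Random(42)                      # fixed seed so runs are repeatable
population = random_population(size=6, n_genes=8, rng=rng)
scores = [fitness(ind) for ind in population]
total = sum(scores)
# Selection probability proportional to fitness (uniform if every score is zero).
probabilities = [s / total if total else 1 / len(scores) for s in scores]
```

The `probabilities` list makes the last sentence above concrete: an individual with twice the fitness score gets twice the chance of being picked for reproduction.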

3. Selection

● The selection phase involves the selection of individuals for the reproduction of
offspring. The selected individuals are then arranged in pairs for
reproduction. These individuals then transfer their genes to the next generation.

There are three types of Selection methods available, which are:


● Roulette wheel selection
● Tournament selection
● Rank-based selection
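
The three selection methods can be sketched as follows. These are common textbook formulations rather than the only possible ones: the tournament size `k` and the linear rank weights (worst individual gets weight 1) are assumptions, and scores are taken to be non-negative.

```python
import random

def roulette_wheel(population, scores, rng):
    """Pick one individual with probability proportional to its (non-negative) score."""
    pick = rng.uniform(0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def tournament(population, scores, rng, k=3):
    """Sample k individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def rank_based(population, scores, rng):
    """Like the roulette wheel, but weighted by rank (worst = 1) instead of raw score."""
    order = sorted(range(len(population)), key=lambda i: scores[i])
    ranks = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return roulette_wheel(population, ranks, rng)
```

Rank-based selection is useful when one individual's score dwarfs the rest, because ranks compress such differences and keep weaker individuals in play.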

4. Reproduction

● After the selection process, the creation of a child occurs in the reproduction step. In this
step, the genetic algorithm uses two variation operators that are applied to the parent
population. The two operators involved in the reproduction phase are given below:
● Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the
genes. Then the crossover operator swaps genetic information of two parents from the
current generation to produce a new individual representing the offspring.

The genes of the parents are exchanged with each other up to the crossover point.
The newly generated offspring are then added to the population.
Types of crossover available:
○ One-point crossover
○ Two-point crossover
○ Uniform crossover
○ Inheritable Algorithms crossover
● Mutation
The mutation operator inserts random genes in the offspring (new child) to maintain
diversity in the population. It can be done by flipping some bits in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances
diversification. The below image shows the mutation process:
Types of mutation available:
○ Flip bit mutation
○ Gaussian mutation


○ Exchange/Swap mutation
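
The crossover and mutation variants listed above can be sketched for chromosomes stored as Python lists. The mutation `rate` default is an assumption, and two-point crossover as written needs chromosomes of at least three genes. Gaussian mutation, which applies to real-valued genes, is not shown.

```python
import random

def one_point_crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)          # cut somewhere inside the chromosome
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a, b, rng):
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform_crossover(a, b, rng):
    """Each gene of a child comes from either parent with equal probability."""
    mask = [rng.random() < 0.5 for _ in a]
    child1 = [x if m else y for x, y, m in zip(a, b, mask)]
    child2 = [y if m else x for x, y, m in zip(a, b, mask)]
    return child1, child2

def flip_bit_mutation(chromosome, rng, rate=0.1):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def swap_mutation(chromosome, rng):
    """Exchange the genes at two randomly chosen positions."""
    i, j = rng.sample(range(len(chromosome)), 2)
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

Swap mutation keeps the multiset of genes unchanged, which is why it is a common choice for permutation encodings where every value must appear exactly once.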

5. Termination

● After the reproduction phase, a stopping criterion is applied as a base for termination. The
algorithm terminates after the threshold fitness solution is reached. It will identify the
final solution as the best solution in the population.


Advantages of Genetic Algorithm


● Genetic algorithms parallelize well.
● It helps in optimizing various problems such as discrete functions, multi-objective
problems, and continuous functions.
● It provides a solution for a problem that improves over time.
● A genetic algorithm does not need derivative information.

Limitations of Genetic Algorithms


● Genetic algorithms are not efficient algorithms for solving simple problems.
● It does not guarantee the quality of the final solution to a problem.
● Repeated calculation of fitness values can be computationally expensive.

Difference between Genetic Algorithms and Traditional Algorithms


● A search space is the set of all possible solutions to the problem. A traditional
algorithm maintains only one candidate solution at a time, whereas a genetic algorithm
maintains a population of several candidate solutions in the search space.
● Traditional algorithms need more information in order to perform a search, whereas
genetic algorithms need only one objective function to calculate the fitness of an
individual.
● Traditional algorithms typically do not work in parallel, whereas genetic algorithms can
(the fitness of each individual is calculated independently).
● One big difference is that, rather than operating directly on candidate solutions,
genetic algorithms operate on their representations (or encodings), frequently
referred to as chromosomes.
● Traditional algorithms generate only one result in the end, whereas genetic
algorithms can generate multiple good results from different generations.
● A traditional algorithm is less likely to reach an optimal result, whereas a genetic
algorithm, although it does not guarantee a globally optimal result, has a good
chance of finding one because it uses genetic operators such as
crossover and mutation.
● Traditional algorithms are deterministic in nature, whereas genetic algorithms are
probabilistic and stochastic in nature.

Genetic Operators
● Genetic operators are used to create and maintain genetic diversity (mutation operator),
combine existing solutions (also known as chromosomes) into new solutions (crossover)
and select between solutions (selection).

Selection

● Selection operators give preference to better solutions (chromosomes), allowing them to
pass on their 'genes' to the next generation of the algorithm. The best solutions are
determined using some form of objective function (also known as a 'fitness function' in
genetic algorithms), before being passed to the crossover operator. Different methods for
choosing the best solutions exist, for example, fitness proportionate selection and
tournament selection; different methods may choose different solutions as being 'best'.
The selection operator may also simply pass the best solutions from the current
generation directly to the next generation without being mutated; this is known as elitism
or elitist selection.

Crossover

● Crossover is the process of taking more than one parent solution (chromosomes) and
producing a child solution from them. By recombining portions of good solutions, the
genetic algorithm is more likely to create a better solution. As with selection, there are a
number of different methods for combining the parent solutions, including the edge
recombination operator (ERO) and the 'cut and splice crossover' and 'uniform crossover'
methods. The crossover method is often chosen to closely match the chromosome's
representation of the solution; this may become particularly important when variables are
grouped together as building blocks, which might be disrupted by a non-respectful
crossover operator. Similarly, crossover methods may be particularly suited to certain
problems; the ERO is generally considered a good option for solving the traveling
salesman problem.

Mutation

● The mutation operator encourages genetic diversity amongst solutions and attempts to
prevent the genetic algorithm converging to a local minimum by stopping the solutions
becoming too close to one another. In mutating the current pool of solutions, a given
solution may change entirely from the previous solution. By mutating the solutions, a
genetic algorithm can reach an improved solution solely through the mutation operator.[1]
Again, different methods of mutation may be used; these range from a simple bit
mutation (flipping random bits in a binary string chromosome with some low probability)
to more complex mutation methods, which may replace genes in the solution with
random values chosen from the uniform distribution or the Gaussian distribution. As with
the crossover operator, the mutation method is usually chosen to match the representation
of the solution within the chromosome.

Termination Parameters
● The termination condition of a Genetic Algorithm is important in determining when a GA
run will end. It has been observed that initially, the GA progresses very fast with better
solutions coming in every few iterations, but this tends to saturate in the later stages
where the improvements are very small. We usually want a termination condition such
that our solution is close to the optimal, at the end of the run.
● Usually, we keep one of the following termination conditions −
○ When there has been no improvement in the population for X iterations.
○ When we reach an absolute number of generations.
○ When the objective function value has reached a certain predefined value.
● For example, in a genetic algorithm we keep a counter which keeps track of the
generations for which there has been no improvement in the population. Initially, we set
this counter to zero. Each time we don't generate offspring which are better than the
individuals in the population, we increment the counter.


● However, if the fitness of any of the offspring is better, then we reset the counter to
zero. The algorithm terminates when the counter reaches a predetermined value.
● Like other parameters of a GA, the termination condition is also highly problem specific,
and the GA designer should try out various options to see what suits the particular
problem best.
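
The three termination conditions and the no-improvement counter can be combined in one loop. The sketch below is a toy run under stated assumptions: bit-string chromosomes, fitness equal to the number of 1-genes, tournament selection of size 3, one-point crossover, and a 2% bit-flip mutation rate. The parameter names `patience` and `max_generations` are illustrative.

```python
import random

def run_ga(n_genes=20, pop_size=30, patience=15, max_generations=500, seed=1):
    """Toy GA on bit strings; fitness = number of 1-genes (an assumption of this sketch)."""
    rng = random.Random(seed)
    fitness = sum
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    stall = 0                                   # generations without improvement
    generation = 0
    while (stall < patience                     # no improvement for X iterations
           and generation < max_generations     # absolute number of generations
           and fitness(best) < n_genes):        # predefined objective value reached
        offspring = []
        while len(offspring) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            point = rng.randint(1, n_genes - 1)             # one-point crossover
            child = p1[:point] + p2[point:]
            child = [1 - g if rng.random() < 0.02 else g for g in child]  # bit-flip mutation
            offspring.append(child)
        population = offspring
        generation += 1
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stall = champion, 0           # improvement: reset the counter
        else:
            stall += 1                          # no improvement: increment it
    return best, generation, stall

best, generation, stall = run_ga()
```

Whichever condition triggers first ends the run, so `patience`, `max_generations`, and the target fitness are the knobs a designer would tune for a particular problem.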
