Artificial Intelligence

What does the cognitive approach resemble in AI?

Cognitive computing focuses on mimicking human behavior and reasoning to solve complex problems.

What is a rational agent?

A rational agent can be anything that makes decisions, such as a person, firm, machine, or software. It carries out the action with the best outcome after considering past and current percepts (the agent's perceptual inputs at a given instance). An AI system is composed of an agent and its environment.

What are the problems in hill climbing search?

A hill-climbing algorithm is a local search algorithm that moves continuously upward (increasing) until the best solution is attained. The algorithm comes to an end when the peak is reached. Hill climbing is a technique used for optimizing mathematical problems, and it is mostly used when a good heuristic is available.
A hill-climbing algorithm has four main features:
It employs a greedy approach: it moves in the direction in which the cost function is optimized. The greedy approach enables the algorithm to establish local maxima or minima.
No backtracking: a hill-climbing algorithm only works on the current state and succeeding (future) states. It does not look at previous states.
Feedback mechanism: the algorithm has a feedback mechanism that helps it decide on the direction of movement (up or down the hill). The feedback mechanism is enhanced through the generate-and-test technique.
Incremental change: the algorithm improves the current solution by incremental changes.

Local maximum: a solution that surpasses other neighboring solutions or states but is not the best possible solution.
Global maximum: the best possible solution achievable by the algorithm.
Current state: the existing or present state.
Flat local maximum: a flat region where the neighboring solutions attain the same value.
Shoulder: a plateau whose edge stretches upwards.
Problems with hill climbing
1. Local maximum: at this point, the neighboring states have lower values than the current state. The greedy approach will not move the algorithm to a worse state, so the hill-climbing process terminates even though the result is not the best possible solution.
2. Plateau: a plateau is a flat area of the search space in which all neighbors of the current state have the same value, so the algorithm cannot find a best direction to move. A hill-climbing search may get lost in the plateau area.
3. Ridge: the hill-climbing algorithm may terminate when it reaches a ridge, because the peak of the ridge is followed by downward rather than upward movement.
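A minimal hill-climbing sketch in Python, assuming a one-dimensional integer state with a unit-step neighborhood (the objective function and all names here are illustrative, not from the original). It shows how the greedy, no-backtracking search can stop at a local maximum:

```python
def hill_climb(f, state, max_steps=1000):
    """Greedy hill climbing: move to the best neighbor while it improves f."""
    for _ in range(max_steps):
        neighbors = [state - 1, state + 1]   # unit-step neighborhood
        best = max(neighbors, key=f)         # greedy choice
        if f(best) <= f(state):              # no uphill move left:
            return state                     # local maximum (or plateau)
        state = best
    return state

# Illustrative objective: a local maximum at x=2, the global maximum at x=8.
f = lambda x: -(x - 2) ** 2 if x < 5 else 20 - (x - 8) ** 2
print(hill_climb(f, 0))   # 2: the search gets stuck, missing the global maximum 8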

Differentiate omniscience and artificial intelligence.

God's omniscience refers to God's complete knowledge of all things actual and potential. AI cannot do things beyond the comprehension of an omniscient being: AI is based on algorithms, a set of pre-defined rules created by humans to help machines think and learn.
List the parameters of NLP.
What is meant by natural language processing?
Natural language processing (NLP) refers to the branch of computer
science—and more specifically, the branch of artificial intelligence or AI—
concerned with giving computers the ability to understand text and spoken
words in much the same way human beings can.

Write the modus ponens rule.

Modus ponens: from P → Q and P, infer Q. Example: from "If it is raining, then the street is wet" and "It is raining", we infer "The street is wet".

What do you mean by a dynamic environment?

Dynamic vs Static
An environment that keeps constantly changing while the agent is acting is said to be dynamic.
A roller coaster ride is dynamic: it is set in motion, and the environment keeps changing every instant.
An idle environment with no change in its state is called a static environment.
An empty house is static, as there is no change in the surroundings when an agent enters.

1. Fully observable vs Partially observable
If an agent's sensors can sense or access the complete state of the environment at each point in time, it is a fully observable environment; otherwise it is partially observable.
A fully observable environment is easy, as there is no need to maintain an internal state to keep track of the history of the world.
If an agent has no sensors in any environment, the environment is called unobservable.
Chess: the board is fully observable, and so are the opponent's moves.
Driving: the environment is partially observable, because what is around the corner is not known.

2. Deterministic vs Stochastic
If an agent's current state and selected action completely determine the next state of the environment, the environment is deterministic.
A stochastic environment is random in nature and cannot be determined completely by the agent.
In a deterministic, fully observable environment, the agent does not need to worry about uncertainty.
Examples:
Chess: there are only a few possible moves for a piece in the current state, and these moves can be determined.
Self-driving cars: the actions of a self-driving car are not unique; they vary from time to time.
3. Competitive vs Collaborative
An agent is said to be in a competitive environment when it competes against another agent to optimize the output.
The game of chess is competitive, as the agents compete with each other to win the game, which is the output.
An agent is said to be in a collaborative environment when multiple agents cooperate to produce the desired output.
When multiple self-driving cars are on the roads, they cooperate with each other to avoid collisions and reach their destinations, which is the desired output.
4. Single-agent vs Multi-agent
An environment consisting of only one agent is a single-agent environment.
A person left alone in a maze is an example of a single-agent system.
An environment involving more than one agent is a multi-agent environment.
The game of football is multi-agent, as it involves 11 players in each team.
5. Known vs Unknown
In a known environment, the output for all probable actions is given.
In an unknown environment, the agent has to gain knowledge about how the environment works in order to make a decision.
6. Episodic vs Sequential
In an episodic task environment, each of the agent's actions is divided into atomic incidents or episodes. There is no dependency between current and previous incidents: in each incident, the agent receives input from the environment and then performs the corresponding action.
Example: consider a pick-and-place robot used to detect defective parts on a conveyor belt. Each time, the robot (agent) makes its decision based on the current part alone; there is no dependency between current and previous decisions.
In a sequential environment, previous decisions can affect all future decisions. The next action of the agent depends on what actions it has taken previously and what action it is supposed to take in the future.
Example: Checkers, where the previous move can affect all the following moves.

What is the use of a sensor?

A sensor is a device that detects and responds to some type of input from the physical environment. The input can be light, heat, motion, moisture, pressure, or any number of other environmental phenomena.

Define Disjunctive Normal Form (DNF).

A statement form that consists of a disjunction of conjunctions is called DNF.
Example: (P ∧ Q) ∨ R, where (P ∧ Q) and R are the conjuncts and ∨ is the disjunction.

Define Conjunctive Normal Form (CNF).

A statement form that consists of a conjunction of disjunctions is called CNF.
Example: (P ∨ Q) ∧ R.

What is the theme of the monkey and banana problem?

The monkey and banana problem is a classic toy problem in AI planning: a monkey in a room must figure out how to reach bananas hanging from the ceiling, for example by moving a box beneath them and climbing on it. Its theme is goal-directed reasoning, i.e., planning a sequence of actions to achieve a goal.

What is AI?
Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. Specific applications of AI include expert systems, natural language processing, speech recognition, and machine vision.

How is the performance of a search algorithm measured?

Completeness: a search algorithm is complete if it provides a solution for a given input whenever at least one solution exists for that input.
Optimality: search algorithms are also characterized by optimal solutions, i.e., the best solutions found at the lowest path cost.
Time complexity: the maximum time an algorithm needs to accomplish a task or provide a solution. The time taken usually depends on the complexity of the task.
Space complexity: the maximum memory or storage space needed when conducting the search. This memory also depends on the complexity of the task.
Define omniscience. (answered above)

What is cryptarithmetic?
A cryptarithmetic puzzle is a mathematical exercise in which the digits of some numbers are represented by letters (or symbols). Each letter represents a unique digit. Example:

   SEND        9567
 + MORE      + 1085
 ------      ------
  MONEY       10652
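A brute-force sketch that verifies the SEND + MORE = MONEY puzzle above, assuming distinct digits and no leading zeros (illustrative, not an efficient solver):

```python
from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                    # the 8 distinct letters in the puzzle
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["S"] == 0 or a["M"] == 0:      # no leading zeros
            continue
        send  = int("".join(str(a[c]) for c in "SEND"))
        more  = int("".join(str(a[c]) for c in "MORE"))
        money = int("".join(str(a[c]) for c in "MONEY"))
        if send + more == money:
            return send, more, money

print(solve_send_more_money())              # (9567, 1085, 10652)
```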

What is the limitation of propositional logic compared to predicate logic?

Propositional logic has limited expressive power. In propositional logic, we cannot describe statements in terms of their properties or logical relationships.
State modus ponens, with an example. (answered above)

What is CNF, with an example? (answered above)

What is a genetic algorithm?

A genetic algorithm (GA) is a search-based optimization technique based on the principles of genetics and natural selection. It is frequently used to find optimal or near-optimal solutions to difficult problems which would otherwise take a lifetime to solve. It is frequently used to solve optimization problems, in research, and in machine learning.
Before studying the genetic algorithm, let's first understand some basic terminology:
Population: the subset of all possible or probable solutions which can solve the given problem.
Chromosome: one of the solutions in the population for the given problem; a collection of genes makes up a chromosome.
Gene: an element of the chromosome; a chromosome is divided into genes.
Allele: the value given to a gene within a particular chromosome.
Fitness function: used to determine an individual's fitness level in the population, i.e., the ability of an individual to compete with other individuals. In every iteration, individuals are evaluated by their fitness function.
Genetic operators: in a genetic algorithm, the best individuals mate to produce offspring better than the parents. Genetic operators change the genetic composition of the next generation.
Selection: after calculating the fitness of every individual in the population, a selection process determines which individuals in the population get to reproduce and create the offspring that will form the next generation.

How does a genetic algorithm work?


Five phases are considered in a genetic algorithm.
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Initial Population
The process begins with a set of individuals which is called a Population. Each
individual is a solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes.
Genes are joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string,
in terms of an alphabet. Usually, binary values are used (string of 1s and 0s). We say
that we encode the genes in a chromosome.
Fitness Function
The fitness function determines how fit an individual is (the ability of an individual to
compete with other individuals). It gives a fitness score to each individual. The
probability that an individual will be selected for reproduction is based on its fitness
score.
Selection
The idea of selection phase is to select the fittest individuals and let them pass their
genes to the next generation.
Two pairs of individuals (parents) are selected based on their fitness scores.
Individuals with high fitness have more chance to be selected for reproduction.
Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of
parents to be mated, a crossover point is chosen at random from within the genes.
For example, consider the crossover point to be 3 as shown below.

Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached. The new offspring are added to the population.

Mutation
The mutation operator inserts random genes into the offspring (new child) to maintain the diversity of the population. It can be done by flipping some bits in the chromosome.

Termination
The algorithm terminates after the threshold fitness solution is reached. It will
identify the final solution as the best solution in the population.

• Genotype: the population in the computation space, in which solutions are represented in a way that can be easily understood and manipulated by a computing system.
• Phenotype: the population in the actual real-world solution space, in which solutions are represented the way they appear in real-world situations.
• Decoding and encoding: for simple problems, the phenotype and genotype spaces are the same; in most cases, however, they differ. Decoding transforms a solution from the genotype space to the phenotype space, while encoding transforms it from the phenotype space to the genotype space. Decoding should be fast, as it is carried out repeatedly in a GA during fitness evaluation.
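A toy sketch that wires the five phases together. It maximizes the number of 1-bits in an 8-bit chromosome (the OneMax problem, chosen here purely for illustration); selection is fitness-proportionate, crossover is single-point, and mutation is bit-flip:

```python
import random

GENES, POP_SIZE, GENERATIONS = 8, 20, 50

def fitness(chrom):                      # fitness = number of 1-bits (OneMax)
    return sum(chrom)

def select(pop):                         # fitness-proportionate choice of two parents
    return random.choices(pop, weights=[fitness(c) + 1 for c in pop], k=2)

def crossover(p1, p2):                   # single-point crossover
    point = random.randrange(1, GENES)
    return p1[:point] + p2[point:]

def mutate(chrom, rate=0.05):            # bit-flip mutation maintains diversity
    return [g ^ 1 if random.random() < rate else g for g in chrom]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):             # each iteration breeds a new generation
    pop = [mutate(crossover(*select(pop))) for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
print(best, fitness(best))               # typically converges to all 1s, fitness 8
```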

Define rule-based expert system.

A rule-based expert system is the simplest form of artificial intelligence; it uses prescribed knowledge-based rules to solve a problem.

What do you mean by machine translation in NLP?

Machine translation is the process of using artificial intelligence to automatically translate text from one language to another without human involvement.

When is a machine termed intelligent in the Turing test?

If a machine can engage in a conversation with a human without being detected as a machine, it has demonstrated human intelligence.

Define agent function.

The agent function is a mathematical function that maps a sequence of percepts to an action.

Why is pragmatic analysis necessary in NLP?

Pragmatic analysis allows you to analyze what a given text actually means; the aim is to draw inferences from the text. Sentiment analysis, which aims to reveal the emotions in a given text, is one field of study within pragmatic analysis.

In what type of situation is fuzzy logic used?

Fuzzy logic has been used in numerous applications such as facial pattern recognition, air conditioners, washing machines, vacuum cleaners, antiskid braking systems, transmission systems, control of subway systems and unmanned helicopters, and knowledge-based systems.

What is unsupervised learning?

As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find the hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things.
Write any two conflict resolution strategies in a production system in AI.
Two common strategies are specificity (prefer the rule whose conditions are more specific) and recency (prefer the rule that matches the most recently added facts in working memory).

What is Skolemization in artificial intelligence?

Skolemization is a transformation on first-order logic formulae which removes all existential quantifiers from a formula.

What is meant by an admissible heuristic?

An admissible heuristic is used to estimate the cost of reaching the goal state in an informed search algorithm. For a heuristic to be admissible for the search problem, the estimated cost must always be lower than or equal to the actual cost of reaching the goal state.

What is alpha-beta pruning?

Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes evaluated by the minimax algorithm in its search tree.

What is a learning agent?

A learning agent in AI is an agent which can learn from its past experiences, i.e., it has learning capabilities. It starts acting with basic knowledge and is then able to act and adapt automatically through learning.
What is an inference engine (rule engine)?
The inference engine is known as the brain of the expert system, as it is the main processing unit of the system. It applies inference rules to the knowledge base to derive a conclusion or deduce new information, helping to derive an error-free solution to the queries asked by the user. With the help of the inference engine, the system extracts knowledge from the knowledge base.

What is logical consequence?

Logical consequence is a fundamental concept in logic which describes the relationship between statements that holds when one statement logically follows from one or more other statements.

In which situations can fuzzy logic be used? (answered above)

What is an expert system shell?

Expert system shells are toolkits that can be used to develop expert systems. They consist of pre-built expert system components with an empty knowledge base.

With a suitable example, describe the crossover operator in a genetic algorithm. (answered above; give the definition of crossover)

What is pragmatic analysis in NLP?

It deals with using and understanding sentences in different situations, and with how the interpretation of a sentence is affected by context.

What is DENDRAL ?
The software program Dendral is considered the first expert system because it
automated the decision-making process and problem-solving behavior of organic
chemists.

Define omniscience. (answered above)

What is a sequential environment? (answered above)

Why is semantic analysis important?

It allows computers to understand and interpret sentences, paragraphs, or whole documents by analyzing their grammatical structure and identifying the relationships between individual words in a particular context.

What does a production rule consist of?

A production rule consists of a set of rules (condition-action pairs) and the sequence of steps for applying them.

What are a crisp set and a fuzzy set?

A crisp set defines membership as either 0 or 1; it is also called a classical set.
A fuzzy set defines membership as a value between 0 and 1, including both 0 and 1; it specifies the degree to which something is true.
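The contrast in a small sketch: crisp membership returns only 0 or 1, while fuzzy membership returns a degree in [0, 1]. The "tall" set and its thresholds below are made up purely for illustration:

```python
def crisp_tall(height_cm):
    return 1 if height_cm >= 180 else 0      # classical set: fully in or fully out

def fuzzy_tall(height_cm):
    # degree of tallness rises linearly from 160 cm to 190 cm (assumed bounds)
    return min(1.0, max(0.0, (height_cm - 160) / 30))

print(crisp_tall(175), fuzzy_tall(175))      # 0 0.5
```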

Human experts vs Expert systems
• Human experts use knowledge in the form of rules of thumb or heuristics to solve problems in a narrow domain. An expert system processes knowledge expressed in the form of rules and uses symbolic reasoning in a narrow domain.
• In a human expert we deal with the human brain, in which knowledge exists in a compiled form. An expert system provides a clear separation of knowledge from its processing.
• A human expert uses inexact reasoning and is able to deal with incomplete, uncertain, and fuzzy information. An expert system permits inexact reasoning and is also able to deal with incomplete, uncertain, and fuzzy data.
• A human expert enhances the quality of problem solving through years of learning and practical training. An expert system enhances the quality of problem solving by the addition of new rules or by adjusting old ones in the knowledge base; when new knowledge is acquired, the changes are easy to observe.
• A human expert is available only on specific working days. An expert system is available anywhere, at any time.
• To solve a problem, a human expert may take a variable amount of time. An expert system takes a very short interval of time.
• A human expert is not replaceable. An expert system is replaceable.

What is a fringe node?

The fringe is a data structure used to store all the possible states (nodes) that can be reached next from the current states.

What is DNF? (answered above)

Importance of AI
AI technology is important because it enables human capabilities (understanding, reasoning, planning, communication, and perception) to be undertaken by software increasingly effectively, efficiently, and at low cost.
Describe the concept of Hebbian learning.

The term "artificial neural network" is derived from the biological neural networks that develop the structure of the human brain. Similar to the human brain, which has neurons interconnected with one another, artificial neural networks also have neurons interconnected with one another in the various layers of the network. These neurons are known as nodes.

Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network; otherwise, no data is passed along to the next layer.

The dendrite is where a neuron receives input from other neurons. The axon is the output of a neuron; it transmits the signal to other neurons. The cell body contains a nucleus and genetic material, which controls the cell's activities. Neurons communicate with each other by sending signals, called neurotransmitters, across a narrow space, called a synapse, between the axon of the sender neuron and the dendrites of the receiver neuron.

What is a synapse?
A synapse is the connection between nodes, or neurons, in an artificial neural network (ANN). As in biological brains, the connection is controlled by the strength or amplitude of the connection between the two nodes, also called the synaptic weight. Multiple synapses can connect the same neurons, with each synapse having a different level of influence (trigger) on whether that neuron "fires" and activates the next neuron.
Dendrites: input
Synapse: weight
Axon: output
An artificial neural network primarily consists of three layers. The network is made up of a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Input layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output layer:
The input goes through a series of transformations in the hidden layers, which finally results in the output conveyed by this layer.

The structure of an artificial neuron is very similar to that of a biological neuron. It consists of three main parts: the weights and bias (corresponding to the dendrites), denoted w and b respectively; the output (the axon), denoted y; and the activation function (the cell body or nucleus), denoted f(x). Here x is the input signal received by the dendrites.
What is an activation function, and why use them?
The activation function decides whether a neuron should be activated or not by computing the weighted sum and adding a bias to it. A neural network's neurons work with weights, a bias (a constant added to the product of features and weights, used to offset the result and shift the activation function towards the positive or negative side), and their respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output; this process is known as back-propagation. Activation functions make back-propagation possible, since the gradients (a gradient measures the change in all weights with respect to the change in error) are supplied along with the error to update the weights and biases.
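A single artificial neuron as just described, in a minimal sketch: a weighted sum plus bias, passed through an activation function (a step function here; the weights and inputs are illustrative):

```python
def step(x):                         # activation function f(x)
    return 1 if x >= 0 else 0

def neuron(inputs, weights, bias):
    x = sum(i * w for i, w in zip(inputs, weights)) + bias   # weighted sum + bias
    return step(x)                   # activated output y

print(neuron([1, 0, 1], [0.5, -0.2, 0.8], bias=-1.0))        # 1, since 1.3 - 1.0 >= 0
```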

Hebbian learning
The Hebbian learning rule, also known as the Hebb learning rule, was proposed by Donald O. Hebb. It is one of the first and simplest learning rules for neural networks: a mechanism for updating the weights between neurons in a neural network. This method of weight updating enables neurons to learn new things.

The basis of the theory is that when our brains learn something new, neurons are activated and connected with other neurons, forming a neural network. These connections start off weak, but each time the stimulus is repeated, the connections grow stronger and stronger, and the action becomes more intuitive. A good example is learning to drive. When you start out, everything you do is incredibly deliberate: you remind yourself to turn on your indicator, to check your blind spot, and so on. After years of experience, however, these processes become so automatic that you perform them without even thinking.

If two neighboring neurons activate and deactivate at the same time, the weight connecting them should increase. For neurons operating in the opposite phase, the weight between them should decrease. If there is no signal correlation, the weight should not change.
When the inputs of both nodes are either positive or negative, a strong positive weight exists between the nodes. If the input of one node is positive and the other's is negative, a strong negative weight exists between the nodes.
At the start, the values of all weights are set to zero.
Information is stored in the connections between neurons in a neural network, in the form of weights. The weight change between neurons is proportional to the product of the neurons' activation values.

A Hebb network is a single-layer neural network: it has one input layer, which can have many units, say n, and one output layer, which has only one unit. The Hebbian rule works by updating the weights between the neurons in the network for each training sample.

Hebbian Learning Algorithm

The Hebb network was proposed by Donald Hebb in 1949. According to Hebb's rule, the weights increase in proportion to the product of input and output: in a Hebb network, if two neurons are interconnected, the weights associated with them can be increased by changes in the synaptic gap.
This network is suitable for bipolar data, and the Hebbian learning rule is generally applied to logic gates.
The weights are updated as:
w(new) = w(old) + x*y
Training algorithm for the Hebbian learning rule:
1. Initially, the weights are set to zero: wi = 0 for all inputs i = 1 to n, where n is the total number of input neurons.
2. For each training sample, set the input activations; the activation function for the inputs is generally the identity function.
3. Set the activation of the output to the target: y = t.
4. Adjust the weights and bias:
wi(new) = wi(old) + xi*y,  b(new) = b(old) + y
Steps 2 to 4 are repeated for each input vector and output.
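A sketch of the training loop above applied to the bipolar AND gate, using the stated updates w(new) = w(old) + x*y and b(new) = b(old) + y:

```python
# Bipolar AND gate: inputs and targets are +1 / -1
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w, b = [0, 0], 0                       # weights and bias start at zero (step 1)
for (x1, x2), y in samples:            # one pass over the training samples (steps 2-4)
    w[0] += x1 * y                     # w(new) = w(old) + x * y
    w[1] += x2 * y
    b    += y                          # b(new) = b(old) + y

print(w, b)                            # [2, 2] -2
# Check: sign(w1*x1 + w2*x2 + b) reproduces AND for all four bipolar inputs.
```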

Turing Test
In 1950, Alan Turing introduced a test to check whether a machine can think like a human or not; this test is known as the Turing Test. In this test, Turing proposed that a computer can be said to be intelligent if it can mimic human responses under specific conditions. The Turing Test was introduced in Turing's 1950 paper, "Computing Machinery and Intelligence," which considered the question, "Can machines think?"
The Turing test is based on a party game, the "imitation game," with some modifications. This game involves three players: one player is a computer, another is a human responder, and the third is a human interrogator, who is isolated from the other two players and whose job is to find out which of the two is the machine.
Consider Player A to be the computer, Player B a human, and Player C the interrogator. The interrogator is aware that one of them is a machine but needs to identify which, on the basis of questions and their responses.
The conversation between all players is via keyboard and screen, so the result does not depend on the machine's ability to render words as speech.
The test result does not depend on each answer being correct, but only on how closely the responses resemble human answers. The computer is permitted to do everything possible to force a wrong identification by the interrogator.
The questions and answers might go like this:
Interrogator: Are you a computer?
Player A (computer): No.
Interrogator: Multiply two large numbers, such as 256896489 * 456725896.
Player A: pauses for a long time and gives the wrong answer.
In this game, if the interrogator cannot identify which player is the machine and which is the human, the computer passes the test successfully, and the machine is said to be intelligent and able to think like a human.

Acting humanly
To be considered intelligent, a program must be able to act sufficiently like a human to fool an interrogator.

AI also means acting rationally, i.e., performing actions that increase the value of the state of the agent or of the environment in which the agent is acting. For example, an agent playing a game acts rationally if it tries to win the game.

Point out some practical difficulties associated with implementing an ANN on an agent.

• The training data is noisy, complex sensor data.
• It also applies to problems where symbolic algorithms are used (e.g., decision tree learning, DTL); ANN and DTL produce results of comparable accuracy.
• Instances are attribute-value pairs; attributes may be highly correlated or independent, and values can be any real value.
• The target function may be discrete-valued, real-valued, or a vector.
• Training examples may contain errors.
• Long training times must be acceptable.
• Fast evaluation of the learned target function is required.
• Humans do not need to understand the learned target function.

McCulloch-Pitts Neuron (linear threshold gate model)

• The first mathematical model of a biological neuron.
• Proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
• The basic building block of neural networks.
• A directed weighted graph is used to connect the neurons.
• There are two possible states: active (1) and silent (0).
• There is a fixed threshold for each neuron; if the net input to the neuron is greater than or equal to the threshold, the neuron fires.
The neuron aggregates the weighted inputs into a single numeric value,
x = x1*w1 + x2*w2 + ... + xn*wn = ∑ xi*wi,
and produces the output using the threshold T:
y = f(x) = 1 if x ≥ T, 0 if x < T
so the output is always either 0 or 1.
Bias / threshold: the minimum value of the weighted active input for a neuron to fire.
A McCulloch-Pitts model neuron i with three inputs (dendrites) and one output (axon) has a weight at each synapse (junction): wi1, wi2, wi3. The neuron receives information as the weighted sum x = ∑ xj*wij of the three inputs, which is passed through a discontinuous threshold non-linearity to obtain the activation value y = f(x).

AND Function
An AND-function neuron only fires when ALL the inputs are ON, i.e., x ≥ 3 here (three inputs, each with weight 1).
OR Function
An OR-function neuron fires if ANY of the inputs is ON, i.e., x ≥ 1 here.
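A minimal McCulloch-Pitts neuron in code; with three binary inputs and unit weights, threshold T = 3 realizes AND and T = 1 realizes OR:

```python
def mp_neuron(inputs, weights, threshold):
    x = sum(i * w for i, w in zip(inputs, weights))   # aggregated weighted input
    return 1 if x >= threshold else 0                 # fire only at/above threshold T

inputs = (1, 1, 0)
print(mp_neuron(inputs, (1, 1, 1), threshold=3))      # AND -> 0 (not all inputs ON)
print(mp_neuron(inputs, (1, 1, 1), threshold=1))      # OR  -> 1 (at least one input ON)
```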

Minimax algorithm
• Minimax is a backtracking algorithm used in decision making and game theory to find the optimal move for a player, assuming that the opponent also plays optimally. It is widely used in two-player turn-based games such as Tic-Tac-Toe, Backgammon, Mancala, and Chess.
• In minimax, the two players are called the maximizer and the minimizer. The maximizer tries to get the highest score (utility) possible, while the minimizer tries to do the opposite and get the lowest score possible.
• The minimax algorithm performs a depth-first search to explore the complete game tree.
• The algorithm proceeds all the way down to the terminal nodes of the tree, then backtracks up the tree; at the root, max chooses the best move.
Example:
Consider a game which has 4 final states, with the paths to reach them running from the root to the 4 leaves of a perfect binary tree, as shown below. Assume you are the maximizing player and get the first move, i.e., you are at the root and your opponent is at the next level. Which move would you make as the maximizing player, considering that your opponent also plays optimally?

Since this is a backtracking-based algorithm, it tries all possible moves, then backtracks and makes a decision.
Maximizer goes LEFT: it is now the minimizer's turn. The minimizer has a choice between 3 and 5. Being the minimizer, it will certainly choose the lesser of the two, namely 3.
Maximizer goes RIGHT: it is now the minimizer's turn. The minimizer has a choice between 2 and 9, and will choose 2, as it is the lesser of the two values.
Being the maximizer, you would choose the larger value, 3. Hence the optimal move for the maximizer is to go LEFT, and the optimal value is 3.
The resulting game tree shows the two possible scores when the maximizer makes the left and right moves.
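A minimax sketch over the four-leaf example tree above, representing the tree as nested lists (an assumption made here for brevity):

```python
def minimax(node, maximizing):
    if not isinstance(node, list):            # terminal node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 5], [2, 9]]                       # the example game tree
print(minimax(tree, maximizing=True))         # 3: maximizer goes LEFT
```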

Alpha-beta pruning:

Alpha-beta pruning is a modified version of the minimax algorithm: an optimization technique for minimax.
It reduces the computation time by a huge factor, which allows us to search much faster and even go into deeper levels of the game tree. It cuts off branches in the game tree which need not be searched, because a better move is already available. It is called alpha-beta pruning because it passes two extra parameters to the minimax function, namely alpha and beta.

The two parameters are defined as:

Alpha: the best (highest-value) choice found so far at any point along the path of the maximizer. The initial value of alpha is -∞.
Beta: the best (lowest-value) choice found so far at any point along the path of the minimizer. The initial value of beta is +∞.
Alpha-beta pruning returns the same move as the standard minimax algorithm, but it removes all the nodes that do not really affect the final decision and only make the algorithm slow. By pruning these nodes, it makes the algorithm fast.
Let's make the algorithm clear with an example.

The initial call starts from A. The value of alpha here is -INFINITY and the value of beta is +INFINITY. These values are passed down to subsequent nodes in the tree. At A the maximizer must choose the max of B and C, so A calls B first.
At B the minimizer must choose the min of D and E, and hence calls D first.
At D, it looks at its left child, which is a leaf node. This node returns a value of 3. Now the value of alpha at D is max(-INF, 3), which is 3.
To decide whether it is worth looking at the right node, it checks the condition beta <= alpha. This is false, since beta = +INF and alpha = 3, so the search continues.
D now looks at its right child, which returns a value of 5. At D, alpha = max(3, 5), which is 5. The value of node D is now 5.
D returns a value of 5 to B. At B, beta = min(+INF, 5), which is 5. The minimizer is now guaranteed a value of 5 or less. B now calls E to see if it can get a lower value than 5.
At E the values of alpha and beta are not -INF and +INF, but -INF and 5 respectively, because the value of beta was changed at B, and that is what B passed down to E.
Now E looks at its left child, which is 6. At E, alpha = max(-INF, 6), which is 6. Here the condition becomes true: beta is 5 and alpha is 6, so beta <= alpha holds. Hence E breaks off and returns 6 to B.
Note that it did not matter what the value of E's right child was. It could have been +INF or -INF; we never even had to look at it, because the minimizer was guaranteed a value of 5 or less. As soon as the maximizer saw the 6, it knew the minimizer would never come this way, because it can get a 5 on the left side of B. This way we did not have to look at that 9, and hence saved computation time.
E returns a value of 6 to B. At B, beta = min(5, 6), which is 5. The value of node B is also 5.
So far this is how our game tree looks; the 9 is crossed out because it was never computed.

B returns 5 to A. At A, alpha = max(-INF, 5), which is 5. Now the maximizer is guaranteed a value of 5 or greater. A now calls C to see if it can get a higher value than 5.
At C, alpha = 5 and beta = +INF. C calls F.
At F, alpha = 5 and beta = +INF. F looks at its left child, which is a 1. alpha = max(5, 1), which is still 5.
F looks at its right child, which is a 2. Hence the best value of this node is 2. Alpha still remains 5.
F returns a value of 2 to C. At C, beta = min(+INF, 2). The condition beta <= alpha becomes true, as beta = 2 and alpha = 5, so it breaks off, and it does not even have to compute the entire subtree of G.
The intuition behind this break-off is that at C the minimizer was guaranteed a value of 2 or less. But the maximizer was already guaranteed a value of 5 if it chose B. So why would the maximizer ever choose C and get a value less than 2?
Again, it did not matter what those last two values were, and we saved a lot of computation by skipping a whole subtree.
C now returns a value of 2 to A. Therefore the best value at A is max(5, 2), which is 5.
Hence the optimal value that the maximizer can get is 5.
This is how the final game tree looks; G has been crossed out, as it was never computed.
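The walkthrough above as a sketch. The tree mirrors the example (D = [3, 5], E = [6, 9], F = [1, 2]); G's leaves are placeholders, since, as shown above, they are never evaluated:

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):                 # leaf node: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:                      # cut-off: minimizer blocks this path
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:                          # cut-off: maximizer blocks this path
            break
    return value

# A -> B, C; B -> D, E; C -> F, G. G's leaves are placeholders: they get pruned.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, 0]]]
print(alphabeta(tree, -math.inf, math.inf, True))  # 5; E's 9 and all of G are skipped
```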

Why is an understanding of the environment necessary for an agent to perform well?

An environment in artificial intelligence is the surroundings of the agent. The agent takes input from the environment through sensors and delivers output to the environment through actuators.

Agents
An agent can be anything that perceives its environment through sensors and acts upon that environment through actuators. An agent runs in a cycle of perceiving, thinking, and acting.
A (rational) agent could be anything that makes decisions, such as a person, firm, machine, or software. It carries out the action with the best outcome after considering past and current percepts (the agent's perceptual inputs at a given instance). An AI system is composed of an agent and its environment. Agents act in their environment, and the environment may contain other agents.
An agent is anything that can be viewed as:
perceiving its environment through sensors, and
acting upon that environment through actuators.
Note: every agent can perceive its own actions (but not always their effects).

An agent can be:

Human agent: a human agent has eyes, ears, and other organs which work as sensors, and hands, legs, and the vocal tract, which work as actuators.
Robotic agent: a robotic agent can have cameras, infrared range finders, and NLP as sensors, and various motors as actuators.
Software agent: a software agent can take keystrokes and file contents as sensory input, act on those inputs, and display output on a screen.
Hence the world around us is full of agents: thermostats, cellphones, cameras, and even ourselves.
Before moving forward, we should first know about sensors, effectors, and actuators.

Sensor: a sensor is a device which detects changes in the environment and sends this information to other electronic devices. An agent observes its environment through sensors.
Actuators: actuators are the components of machines that convert energy into motion. The actuators are solely responsible for moving and controlling a system. An actuator can be an electric motor, gears, rails, etc.
Effectors: effectors are the devices which affect the environment. Effectors can be legs, wheels, arms, fingers, wings, fins, or a display screen.

Intelligent agent
An intelligent agent is an autonomous entity which acts upon an environment using sensors and actuators to achieve goals. Intelligent agents may learn from the environment to achieve those goals. Driverless cars and the Siri virtual assistant are examples of intelligent agents in AI; a thermostat is another example.
The four main rules for an AI agent are:
Rule 1: An AI agent must have the ability to perceive the environment.
Rule 2: The observations must be used to make decisions.
Rule 3: Decisions should result in action.
Rule 4: The action taken by an AI agent must be a rational action.
Rational agent
A rational agent could be anything that makes decisions, such as a person, firm, machine, or software. It carries out the action with the best outcome after considering past and current percepts (the agent's perceptual inputs at a given instance). It performs optimal actions based on the given premises and information.

For example, consider a vacuum cleaner as a rational agent. Its environment is the floor it is trying to clean. It has sensors, such as cameras or dirt sensors, which sense the environment, and brushes and suction pumps as actuators which take action. A percept is the agent's perceptual input at a given point in time, and the action the agent takes on the basis of this perceptual input is defined by the agent function.
Hence, before an agent is put into its environment, a percept sequence and the corresponding actions are fed into the agent. This allows it to take action on the basis of its inputs.
An example would be a table like the following:
Percept sequence    Action
Area1 Dirty         Clean
Area1 Clean         Move to Area2
Area2 Clean         Move to Area1
Area2 Dirty         Clean

Based on the input (percept), the vacuum cleaner either keeps moving between Area1 and Area2 or performs a clean operation.
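The percept table above as a minimal table-driven agent function (a sketch; the dictionary encoding is an assumption):

```python
# Table-driven agent function for the two-area vacuum world
AGENT_TABLE = {
    ("Area1", "Dirty"): "Clean",
    ("Area1", "Clean"): "Move to Area2",
    ("Area2", "Clean"): "Move to Area1",
    ("Area2", "Dirty"): "Clean",
}

def agent_function(percept):
    return AGENT_TABLE[percept]                 # map percept -> action

print(agent_function(("Area1", "Dirty")))       # Clean
```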

PEAS in Artificial Intelligence


We know that there are different types of agents in AI. The PEAS system is used to group similar agents together; it specifies the performance measure, environment, actuators, and sensors for the respective agent. Most of the highest-performing agents are rational agents.

Rational Agent: The rational agent evaluates all options and selects the most
efficient action. For example, it selects the shortest road with the lowest cost for
maximum efficiency. PEAS is a term that stands for Performance
Measure, Environment, Actuator, and Sensor.
Performance Measure: Performance measure is the unit used to define an
agent’s success. The performance of agents changes according to their distinct
principles.
Environment: The environment is an agent’s immediate surroundings. If the
agent is set in motion, it changes over time. There are five primary types of
environments:
Fully Observable & Partially Observable
Episodic & Sequential
Static & Dynamic
Discrete & Continuous
Deterministic & Stochastic
Actuator: An actuator is a component of the agent that provides the action’s
output to the environment.
Sensor: Sensors are the receptive components of an agent that receive input.

PEAS for Self-Driving Cars


For a self-driving car, the PEAS representation will be:
◼ Performance: Safety, time, legal drive, comfort
◼ Environment: Roads, other vehicles, road signs, pedestrian
◼ Actuators: Steering, accelerator, brake, signal, horn
◼ Sensors: Camera, GPS, speedometer, odometer, accelerometer, sonar.

PEAS for Vacuum Cleaners


For a vacuum cleaner, the PEAS representation will be:
• Performance: cleanness, efficiency, battery life, security.
• Environment: room, table, wood floor, carpet, different obstacles
• Actuators: wheels, different brushes, vacuum extractors.
• Sensors: camera, dirt detection sensor, cliff sensor, bump sensors,
infrared wall sensors.

PEAS for medical diagnosis


• Agent: Medical diagnosis system
• Performance measure: Healthy patient, minimize costs, lawsuits
• Environment: Patient, hospital, staff
• Actuators: Screen display (questions, tests, diagnoses, treatments,
referrals)
• Sensors: Keyboard (entry of symptoms, findings, patient's answers)
Uninformed search (no domain-level knowledge)
Uninformed search algorithms are also called blind search algorithms. The search algorithm produces the search tree without using any domain knowledge, i.e., by brute force.
Uninformed search algorithms have no additional information about the state or search space other than how to traverse the tree, which is why they are also called blind search.
Examples: breadth-first search, depth-first search, depth-limited search.
Characteristics: search without information; no knowledge; time consuming; higher complexity (time and space).
BFS
o BFS is an uninformed search technique that starts traversing the graph from the root node and explores all the neighboring nodes. Then it selects the nearest node and explores all the unexplored nodes. Breadth-first search is implemented using a FIFO queue data structure.
o The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
o Breadth-first search is the most common search strategy for traversing a tree or graph. The algorithm searches breadthwise in a tree or graph, hence the name breadth-first search.
o If there is more than one solution for a given problem, BFS will provide the minimal solution, i.e., the one requiring the least number of steps.
o It requires a lot of memory, since each level of the tree must be saved in memory in order to expand the next level.
Example:
In the tree structure below, we show the traversal of the tree using the BFS algorithm from the root node S to the goal node K.

FIFO queue contents at each step:
A, B
B, C, D
C, D, G, H
D, G, H, E, F
G, H, E, F
H, E, F, I
E, F, I
F, I, K
I, K

Time complexity: the time complexity of the BFS algorithm is given by the number of nodes traversed until the shallowest solution node, where d is the depth of the shallowest solution and b is the branching factor at every state:
T(b) = 1 + b + b² + b³ + ... + b^d = O(b^d)
Space complexity: the space complexity of BFS is given by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which means that if the shallowest goal node is at some finite depth, BFS will find a solution.
Optimality: BFS is optimal if the path cost is a non-decreasing function of the depth of the node.
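A BFS sketch using a FIFO queue of paths, as described above; the adjacency-dict graph here is illustrative, not the exact tree from the example:

```python
from collections import deque

def bfs(graph, start, goal):
    frontier = deque([[start]])                  # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()                # expand the shallowest node first
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None

graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"], "E": ["K"]}
print(bfs(graph, "S", "K"))                      # ['S', 'B', 'E', 'K']
```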

Depth-first search:

Depth-first search is an uninformed recursive algorithm for traversing a tree or graph data structure.
It is called depth-first search because it starts from the root node and follows each path to its greatest depth before moving to the next path.
DFS uses a stack data structure for its implementation.
DFS requires very little memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantage:
The DFS algorithm searches deep down and may sometimes enter an infinite loop.
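The matching DFS sketch; the only structural change from BFS is replacing the FIFO queue with a LIFO stack:

```python
def dfs(graph, start, goal):
    stack = [[start]]                            # LIFO stack of paths
    visited = set()
    while stack:
        path = stack.pop()                       # expand the deepest node first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            stack.append(path + [neighbor])
    return None

graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"], "E": ["K"]}
print(dfs(graph, "S", "K"))                      # ['S', 'B', 'E', 'K']
```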
Informed search algorithm
So far we have talked about uninformed search algorithms, which look through the search space for all possible solutions to the problem without any additional knowledge about the search space. An informed search algorithm, by contrast, has knowledge such as how far we are from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents explore less of the search space and find the goal node more efficiently.
Informed search algorithms are more useful for large search spaces. Because they use the idea of a heuristic, they are also called heuristic search.

The heuristic function (rule of thumb) is a way to inform the search about the direction to a goal. It provides an informed way to guess which neighbor of a node will lead to a goal. A heuristic is a function used in informed search to find the most promising path: it takes the current state of the agent as input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time. In short, the heuristic function estimates how close a state is to the goal, assigning a cost (or time) value to every state like a value attached to each node; it is a technique designed to solve a problem quickly.

Heuristic function: a heuristic is a function that finds the most promising path in informed search. It takes the agent's current state as input and outputs an estimate of how near the agent is to the goal. The heuristic method may not always provide the optimum solution, but it guarantees that a good solution will be found in a fair amount of time.
A heuristic function estimates how close a state is to the desired outcome. It approximates the cost of an ideal path between two states and is denoted h(n); its value is always positive. For the heuristic to be admissible, it must satisfy
h(n) <= h*(n)
where h(n) is the heuristic (estimated) cost and h*(n) is the actual optimal cost of reaching the goal. Hence the heuristic cost should be less than or equal to the actual cost.

Best-first Search Algorithm (Greedy Search):

Best-first search is an informed search technique. The greedy best-first search algorithm always selects the path which appears best at that moment. It combines aspects of the depth-first and breadth-first search algorithms, using the heuristic function to guide the search, and so lets us take advantage of both. With best-first search, at each step we can choose the most promising node: we expand the node which is closest to the goal node, where closeness is estimated by the heuristic function, i.e.,
f(n) = h(n)
where h(n) is the estimated cost from node n to the goal.
Greedy best-first search is implemented using a priority queue.

Best-first search algorithm:

Step 1: Place the starting node into the OPEN list.
Step 2: If the OPEN list is empty, stop and return failure.
Step 3: Remove the node n with the lowest value of h(n) from the OPEN list and place it in the CLOSED list.
Step 4: Expand node n and generate its successors.
Step 5: Check each successor of node n to see whether it is a goal node. If any successor is a goal node, return success and terminate the search; otherwise, proceed to Step 6.
Step 6: For each successor node, the algorithm computes the evaluation function f(n) and then checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.
Step 7: Return to Step 2.

Example:
Consider the search problem below, which we will traverse using greedy best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n), given in the table below. In this example, we use two lists, OPEN and CLOSED.
The iterations for traversing the example are:
Expand the nodes of S and put them in the CLOSED list.
Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]; then Open [E, A], Closed [S, B, F]
Iteration 3: Open [I, G, E, A], Closed [S, B, F]; then Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path is: S ---> B ---> F ---> G
Time complexity: the worst-case time complexity of greedy best-first search is O(b^m).
Space complexity: the worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space.
Completeness: greedy best-first search is incomplete, even if the given state space is finite.
Optimality: the greedy best-first search algorithm is not optimal.
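A greedy best-first sketch using a priority queue ordered by h(n) alone. The graph and heuristic values below are assumptions, chosen so the search reproduces the S → B → F → G path from the example:

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    open_list = [(h[start], start, [start])]     # priority queue ordered by h(n)
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list) # node with the lowest h(n)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for neighbor in graph.get(node, []):
            heapq.heappush(open_list, (h[neighbor], neighbor, path + [neighbor]))
    return None

graph = {"S": ["A", "B"], "B": ["E", "F"], "F": ["I", "G"]}
h = {"S": 13, "A": 12, "B": 4, "E": 8, "F": 2, "I": 9, "G": 0}
print(greedy_best_first(graph, h, "S", "G"))     # ['S', 'B', 'F', 'G']
```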

A* Search Algorithm:
A* search is the most widely known form of best-first search. It uses the heuristic function h(n) and the cost g(n) of reaching node n from the start state. A* finds the shortest path through the search space using the heuristic function; it expands a smaller search tree and provides optimal results faster. In the A* algorithm, we use the search heuristic as well as the cost to reach the node, combining both costs as follows; this sum is called the fitness number:
f(n) = g(n) + h(n)
Advantages:
The A* search algorithm performs better than other search algorithms.
A* search is optimal and complete.
The algorithm can solve very complex problems.
Disadvantages:
It does not always produce the shortest path, as it relies on heuristics and approximation.
A* search has some complexity issues.
The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is not practical for various large-scale problems.
Example:
In this example, we traverse the given graph using the A* algorithm. The heuristic value of each state is given in the table below, so we calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost of reaching the node from the start state.
Here we use OPEN and CLOSED lists.
Solution:

Initialization: {(S, 5)}
Iteration 1: {(S → A, 4), (S → G, 10)}
Iteration 2: {(S → A → C, 4), (S → A → B, 7), (S → G, 10)}
Iteration 3: {(S → A → C → G, 6), (S → A → C → D, 11), (S → A → B, 7), (S → G, 10)}
Iteration 4 gives the final result: S → A → C → G, the optimal path with cost 6.
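An A* sketch ordering the OPEN list by f(n) = g(n) + h(n). The edge costs and heuristic values below are inferred from the f-values in the iterations above (the original table is not shown), so treat them as assumptions:

```python
import heapq

def a_star(graph, h, start, goal):
    open_list = [(h[start], 0, start, [start])]      # entries: (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)  # lowest f(n) = g(n) + h(n)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for neighbor, cost in graph.get(node, []):
            g2 = g + cost
            heapq.heappush(open_list,
                           (g2 + h[neighbor], g2, neighbor, path + [neighbor]))
    return None

graph = {"S": [("A", 1), ("G", 10)],
         "A": [("B", 2), ("C", 1)],
         "C": [("D", 3), ("G", 4)]}
h = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}
print(a_star(graph, h, "S", "G"))    # (['S', 'A', 'C', 'G'], 6)
```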
Informed search vs. uninformed search can be summarized as follows:

Known as: Informed search is also known as heuristic search; uninformed search is also known as blind search.
Using knowledge: Informed search uses knowledge during the searching process; uninformed search does not.
Performance: Informed search finds a solution more quickly; uninformed search finds a solution slowly by comparison.
Completion: Informed search may or may not be complete; uninformed search is always complete.
Cost factor: Cost is low for informed search and high for uninformed search.
Time: Informed search consumes less time because of quick searching; uninformed search consumes moderate time because of slow searching.
Direction: In informed search, a direction is given about the solution; in uninformed search, no suggestion is given regarding the solution.
Implementation: Informed search is less lengthy to implement; uninformed search is more lengthy.
Efficiency: Informed search is more efficient, as efficiency takes into account cost and performance: the incurred cost is less and solutions are found quickly. Uninformed search is comparatively less efficient, as the incurred cost is greater and the speed of finding a solution is slow.
Computational requirements: Computational requirements are lessened for informed search, and comparatively higher for uninformed search.
Size of search problems: Informed search has wide scope in terms of handling large search problems; for uninformed search, solving a massive search task is challenging.
Examples of algorithms: Informed: greedy search, A* search, AO* search, hill climbing. Uninformed: depth-first search (DFS), breadth-first search (BFS), branch and bound.

List operators used in propositional logic ?

Propositional logic (PL) is the simplest form of logic where all the statements
are made by propositions. A proposition is a declarative statement which is
either true or false. It is a technique of knowledge representation in logical
and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from West (False proposition)
c) 3+3= 7(False proposition)
d) 5 is a prime number.
Following are some basic facts about propositional logic:
Propositional logic is also called Boolean logic as it works on 0 and 1.
In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol for representing a proposition, such as A, B, C, P, Q, R, etc.
A proposition can be either true or false, but not both.
Propositional logic consists of propositions and logical connectives (unlike first-order logic, it has no objects, relations, or functions).
These connectives are also called logical operators.
The propositions and connectives are the basic elements of propositional logic.
A connective can be described as a logical operator which connects two sentences.
A proposition formula which is always true is called a tautology, and it is also called a valid sentence.
A proposition formula which is always false is called a contradiction.
A proposition formula which has both true and false values is called a contingency.
Statements which are questions, commands, or opinions, such as "Where is Rohini", "How are you", and "What is your name", are not propositions.
Syntax of propositional logic:
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:
Atomic Propositions
Compound propositions
Atomic Proposition: Atomic propositions are simple propositions. An atomic proposition consists of a single proposition symbol. These are the sentences which must be either true or false.
Example:
a) 2+2 is 4, it is an atomic proposition as it is a true fact.
b) "The Sun is cold" is also a proposition as it is a false fact.
Compound proposition: Compound propositions are constructed by combining
simpler or atomic propositions, using parenthesis and logical connectives.
Example:
a) "It is raining today, and the street is wet."
b) "Ankit is a doctor, and his clinic is in Mumbai."
Logical Connectives:
Logical connectives are used to connect two simpler propositions or to represent a sentence logically. We can create compound propositions with the help of logical connectives. There are mainly five connectives, which are given as follows:
Negation: A sentence such as ¬ P is called negation of P. A literal can be either
Positive literal or negative literal.
Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a
conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called disjunction,
where P and Q are the propositions.
Example: "Ritika is a doctor or Engineer",
Here P = Ritika is a doctor, Q = Ritika is an engineer, so we can write it as P ∨ Q.
Implication: A sentence such as P → Q, is called an implication. Implications are also
known as if-then rules. It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
Biconditional: A sentence such as P ⇔ Q is a biconditional sentence. Example: "If I am breathing, then I am alive."
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.
Following is the summarized table for Propositional Logic Connectives:
Truth Table:
In propositional logic, we need to know the truth values of propositions in all
possible scenarios. We can combine all the possible combination with logical
connectives, and the representation of these combinations in a tabular format is
called Truth table. Following are the truth table for all logical connectives:
Truth table with three propositions:
We can build a proposition composed of three propositions P, Q, and R. This truth table is made up of 8 rows (2^3 combinations), as we have taken three proposition symbols.
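As a sketch, the 8 combinations for three propositions can be generated in Python with itertools; the compound formula (P ∧ Q) → R below is just an illustrative choice:

from itertools import product

for P, Q, R in product([True, False], repeat=3):
    value = (not (P and Q)) or R      # (P AND Q) -> R, using p -> q == (not p) or q
    print(P, Q, R, value)             # prints all 8 rows of the truth table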

Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional
connectors or logical operators. This order should be followed while evaluating a
propositional problem. Following is the list of the precedence order for operators:

Precedence Operators

First Precedence Parenthesis

Second Precedence Negation

Third Precedence Conjunction(AND)

Fourth Precedence Disjunction(OR)

Fifth Precedence Implication

Sixth Precedence Biconditional


Note: For better understanding, use parentheses to make sure of the correct interpretation. For example, ¬R ∨ Q is interpreted as (¬R) ∨ Q.
Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions
are said to be logically equivalent if and only if the columns in the truth table are
identical to each other.
Let's take two propositions A and B; for logical equivalence, we can write A ⇔ B. In the truth table below, the columns for ¬A ∨ B and A → B are identical, hence ¬A ∨ B is equivalent to A → B.
Properties of Operators:
Commutativity:
P∧ Q= Q ∧ P, or
P ∨ Q = Q ∨ P.
Associativity:
(P ∧ Q) ∧ R= P ∧ (Q ∧ R),
(P ∨ Q) ∨ R= P ∨ (Q ∨ R)
Identity element:
P ∧ True = P,
P ∨ True= True.
Distributive:
P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
De Morgan's Law:
¬ (P ∧ Q) = (¬P) ∨ (¬Q)
¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
Double-negation elimination:
¬ (¬P) = P.
Limitations of Propositional logic:
We cannot represent relations like ALL, some, or none with propositional logic.
Example:
All the girls are intelligent.
Some apples are sweet.
Propositional logic has limited expressive power.
In propositional logic, we cannot describe statements in terms of their properties or
logical relationships.

Inference:
In artificial intelligence, we need intelligent computers which can create new logic from old logic or from evidence; generating conclusions from evidence and facts is termed inference.

Inference rules:
Inference rules are templates for generating valid arguments. Inference rules are applied to derive proofs in artificial intelligence, and a proof is a sequence of conclusions that leads to the desired goal.
In inference rules, the implication among all the connectives plays an important role.
Following are some terminologies related to inference rules:
Implication: It is one of the logical connectives which can be represented as P → Q.
It is a Boolean expression.
Converse: The converse of implication, which means the right-hand side proposition
goes to the left-hand side and vice-versa. It can be written as Q → P.
Contrapositive: The negation of converse is termed as contrapositive, and it can be
represented as ¬ Q → ¬ P.
Inverse: The negation of implication is called inverse. It can be represented as ¬ P →
¬ Q.
From the above terms, some of the compound statements are equivalent to each other, which we can prove using a truth table:

Hence from the above truth table, we can prove that P → Q is equivalent to ¬ Q → ¬
P, and Q→ P is equivalent to ¬ P → ¬ Q.
Types of Inference rules:
1. Modus Ponens:
The Modus Ponens rule is one of the most important rules of inference. It states that if P and P → Q are true, then we can infer that Q will be true. It can be represented as:

Example:
Statement-1: "If I am sleepy then I go to bed" ==> P→ Q
Statement-2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q.
Hence, we can say that, if P→ Q is true and P is true then Q will be true.
Proof by Truth table:

2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It can be represented as:

Statement-1: "If I am sleepy then I go to bed" ==> P→ Q


Statement-2: "I do not go to the bed."==> ~Q
Statement-3: Which infers that "I am not sleepy" => ~P
Proof by Truth table:
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that P → R is true whenever both P → Q and Q → R are true. It can be represented with the following notation:
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
Proof by truth table:

4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will be true. It can be represented as:

Example:
Statement-1: Today is Sunday or Monday. ==>P∨Q
Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q
Proof by truth-table:

5. Addition:
The Addition rule is one of the common inference rules. It states that if P is true, then P ∨ Q will be true.
Example:
Statement-1: I have vanilla ice-cream. ==> P
Let Q: I have chocolate ice-cream.
Conclusion: I have vanilla or chocolate ice-cream. ==> (P ∨ Q)
Proof by Truth-Table:

6. Simplification:
The Simplification rule states that if P ∧ Q is true, then P (or Q) will also be true. It can be represented as:

Proof by Truth-Table:

7. Resolution:
The Resolution rule states that if P ∨ Q and ¬P ∨ R are true, then Q ∨ R will also be true. It can be represented as:

Proof by Truth-Table:
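A small sketch to confirm such rules programmatically: an inference rule is valid when the corresponding implication is a tautology, i.e., true in every row of its truth table. The helper names below are made up for illustration:

from itertools import product

def implies(p, q):
    return (not p) or q

def is_tautology(formula, n_vars):
    # check the formula on every combination of truth values
    return all(formula(*row) for row in product([True, False], repeat=n_vars))

# Modus Ponens: (P AND (P -> Q)) -> Q
print(is_tautology(lambda P, Q: implies(P and implies(P, Q), Q), 2))                  # True
# Resolution: ((P OR Q) AND (NOT P OR R)) -> (Q OR R)
print(is_tautology(lambda P, Q, R: implies((P or Q) and ((not P) or R), Q or R), 3))  # True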

Quantifiers
Quantifiers are words that refer to quantities such as "some" or "all".
A quantifier tells for how many elements a given predicate is true.
In English, quantifiers are used to express quantities without giving an exact number.
Example: all, some, many, etc.

Universal Quantifier
The universal quantifier states that the statements within its scope are true for every value of the specific variable. It is denoted by the symbol ∀.
∀x P(x) is read as: for every value of x, P(x) is true.
Example − "Man is mortal" can be transformed into the form ∀x P(x), where P(x) is the predicate denoting that x is mortal, and the universe of discourse is all men.
Existential Quantifier
The existential quantifier states that the statements within its scope are true for some values of the specific variable. It is denoted by the symbol ∃.
∃x P(x) is read as: for some values of x, P(x) is true.
Example − "Some people are dishonest" can be transformed into the form ∃x P(x), where P(x) is the predicate denoting that x is dishonest, and the universe of discourse is all people.
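Over a finite universe of discourse, the two quantifiers correspond directly to Python's all() and any(); the domain and predicate below are made up for illustration:

universe = [2, 4, 6, 7]          # hypothetical universe of discourse
P = lambda x: x % 2 == 0         # predicate: "x is even"

print(all(P(x) for x in universe))   # ∀x P(x): False, since 7 is not even
print(any(P(x) for x in universe))   # ∃x P(x): True, since 2 is even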

Explain the different components of an expert system? (2015)

Expert system
An expert system is a computer program that is designed to solve complex
problems and to provide decision-making ability like a human expert. It
performs this by extracting knowledge from its knowledge base using the
reasoning and inference rules according to the user queries.
The performance of an expert system is based on the expert's knowledge stored
in its knowledge base. The more knowledge stored in the KB, the more that
system improves its performance. One of the common examples of an ES is a
suggestion of spelling errors while typing in the Google search box.
These systems are designed for a specific domain, such as medicine, science, etc.
Characteristics of Expert System
High Performance: The expert system provides high performance for solving
any type of complex problem of a specific domain with high efficiency and
accuracy.
Understandable: It responds in a way that can be easily understandable by the
user. It can take input in human language and provides the output in the same
way.
Reliable: It is highly reliable, generating efficient and accurate output.
Highly responsive: ES provides the result for any complex query within a very
short period of time.

Components of Expert System


An expert system mainly consists of three components:
• User Interface
• Inference Engine
• Knowledge Base

1. User Interface
With the help of a user interface, the expert system interacts with the user, takes queries as input in a readable format, and passes them to the inference engine. After getting the response from the inference engine, it displays the output to the user. In other words, it is an interface that helps a non-expert user communicate with the expert system to find a solution.
2. Inference Engine(Rules of Engine)
The inference engine is known as the brain of the expert system as it is the main
processing unit of the system. It applies inference rules to the knowledge base to
derive a conclusion or deduce new information. It helps in deriving an error-free
solution of queries asked by the user.
With the help of an inference engine, the system extracts the knowledge from the
knowledge base.
3. Knowledge Base
The knowledge base is a type of storage that stores knowledge acquired from different experts of the particular domain. It is considered big storage of knowledge. The larger the knowledge base, the more precise the expert system will be.
It is similar to a database that contains information and rules of a particular domain
or subject.
One can also view the knowledge base as collections of objects and their attributes.
Such as a Lion is an object and its attributes are it is a mammal, it is not a domestic
animal, etc.

NLP
Natural language processing (NLP) refers to the branch of computer
science—and more specifically, the branch of artificial intelligence or AI—
concerned with giving computers the ability to understand text and spoken
words in much the same way human beings can.

What is an "Activation function"?

An activation function is a very important feature of an artificial neural network; it basically decides whether a neuron should be activated or not.
In artificial neural networks, the activation function defines the output of a node given an input or set of inputs.
An important use of any activation function is to introduce non-linear properties to our network.

Fig 1
In simple terms, it calculates a weighted sum of its inputs Xi, adds a bias, and then decides whether the neuron should be "fired" or not.
Explanation of the above figure (Fig 1):
All the inputs Xi are multiplied by the weights Wi assigned to each link and summed together along with the bias b.
Note: the Xi's and Wi's are vectors and b is a scalar.
Let Y be the summation: Y = Σ(Wi * Xi) + b.
The value of Y can be anything from -inf to +inf, so the neuron must learn to distinguish between the "useful" and "not-so-useful" information. To build this sense into our network we add an activation function f, which decides whether the information passed on is useful or not; based on the result, the neuron gets fired.
Working of single layer network
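A minimal sketch of this computation, using made-up weights and a step activation for concreteness:

import numpy as np

x = np.array([0.5, 1.0, -0.3])   # inputs Xi (hypothetical values)
w = np.array([0.4, 0.6, 0.9])    # weights Wi (hypothetical values)
b = 0.1                          # bias (scalar)

y = np.dot(w, x) + b             # weighted sum: Y = sum(Wi * Xi) + b
output = 1 if y > 0 else 0       # activation f decides whether the neuron "fires"
print(y, output)                 # 0.63 1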
Properties that an activation function should hold:
Derivative or differential: the change along the y-axis with respect to the change along the x-axis, also known as the slope (needed for backpropagation).
Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.
Activation function Types :
Linear function
Binary Step function
Non-Linear function
Linear Function :

Linear function
A linear activation function takes the form:
y = mx + c (in neural nets the slope m corresponds to the weight W and the intercept c to the bias b, so the equation can be written as y = Wx + b)
It takes the inputs Xi, multiplies them by the weights Wi for each neuron, and creates an output proportional to the input. In simple terms, the output is proportional to the weighted sum of the inputs.
As mentioned above, an activation function should hold certain properties, which the linear function fails to satisfy.
Problems with the linear function:
1) Its derivative is constant.
2) All layers of the neural network collapse into one.
The derivative of a linear function is constant and has no relation to the input, which implies that the weights and bias will be updated during backprop but the updating factor (gradient) would be the same every time.
With linear activation functions, no matter how many layers are in the neural network, the last layer will be a linear function of the first layer, meaning the output of the first layer is effectively the same as the output of the nth layer.
A neural network with a linear activation function is simply a linear regression model.
Pros and cons: the linear function has limited power and ability to handle complexity. It can be used for simple tasks where interpretability matters.
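The collapse of stacked linear layers can be seen numerically; this sketch with arbitrary random weights shows two linear layers being equivalent to a single one:

import numpy as np

np.random.seed(0)
x = np.random.randn(4)                               # input vector
W1, b1 = np.random.randn(3, 4), np.random.randn(3)   # layer 1 (linear activation)
W2, b2 = np.random.randn(2, 3), np.random.randn(2)   # layer 2 (linear activation)

two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)           # the same map as one linear layer
print(np.allclose(two_layers, one_layer))            # True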

Binary Step Function:


Binary Step function

Binary step functions are popularly known as "threshold functions". It is a very simple function.

Pros and Cons :


The gradient (derivative) of the binary step function is zero, which is a very big problem in backprop for weight updates.
Another problem with a step function is that it can only handle binary-class problems (though with some tweaks it can be used for multi-class problems).
Non-Linear function:
Deep learning has rocketed to the sky because of non-linear functions. Most modern neural networks use a non-linear function as their activation function to fire the neuron. The reason is that they allow the model to create complex mappings between the network's inputs and outputs, which is essential for learning and modeling complex data, such as images, video, audio, and data sets which are non-linear or have high dimensionality.
Advantages of non-linear functions over the linear function:
Differentiation is possible for all the non-linear functions.
Stacking of layers is possible, which helps us in creating deep neural nets.
Non-linear Types:
The Nonlinear Activation Functions are mainly divided on the basis of their range
or curves.
Sigmoid or logistic Activation Function:

Sigmoid activation , Derivative


The output of the sigmoid function always ranges between 0 and 1.
Sigmoid is an S-shaped, monotonic, and differentiable function.
The derivative of the sigmoid function, f'(x), lies between 0 and 0.25.
The derivative of the sigmoid function is not monotonic.
Cons:
The derivative of the sigmoid function suffers from the vanishing gradient and exploding gradient problems.
The sigmoid function is not zero-centered; since 0 < output < 1, gradient updates tend to go too far in one direction, which makes optimization harder.
Slow convergence, as it is computationally heavy (due to the use of the exponential function).
“Sigmoid is very popular in classification problems”
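A small numpy sketch of the sigmoid and its derivative, showing why the gradient is capped at 0.25 and vanishes at the tails:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)               # maximum value is 0.25, reached at x = 0

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))                    # outputs lie strictly between 0 and 1
print(sigmoid_derivative(x))         # ~[0.0066, 0.25, 0.0066] -> vanishing gradient at the tails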
2. Tanh Activation Function:
Tanh function and derivative
Tanh is a modified version of the sigmoid function, and hence has properties similar to the sigmoid function.

Tanh function(left) , Sigmoidal representation of Tanh(right)


The tanh function is monotonic, while its derivative is not.
The output is zero-centered.
Optimization is easier.
The derivative of the tanh function, f'(x), lies between 0 and 1.
Cons:
The derivative of the tanh function suffers from the vanishing gradient and exploding gradient problems.
Slow convergence, as it is computationally heavy (due to the use of the exponential function).
“Tanh is preferred over the sigmoid function since it is zero centered and the gradients
are not restricted to move in a certain direction”
3. ReLU Activation Function (ReLU — Rectified Linear Units):

ReLu function(Blue) , Derivative of ReLu (Green)


ReLU is the non-linear activation function that has gained popularity in AI. The ReLU function is represented as f(x) = max(0, x).
The function and its derivative are both monotonic.
The main advantage of using the ReLU function is that it does not activate all the neurons at the same time.
It is computationally efficient.
The derivative of the ReLU function is f'(x) = 1 if x > 0, else 0.
It converges very fast.
Cons:
The ReLU function is not zero-centered, and its output is unbounded above (output ≥ 0), which can make optimization harder.
The dead neuron is the biggest problem; ReLU is also non-differentiable at zero.
"Problem of the dying neuron/dead neuron: since the ReLU derivative is f'(x) = 1 for x > 0, ReLU does not saturate and no dead neurons are reported for positive values. Saturation and vanishing gradient occur only for negative values, which ReLU turns into 0; neurons stuck in this region stop updating. This is called the problem of the dying neuron."
4. leaky ReLu Activation Function:
The Leaky ReLU function is an improved version of the ReLU function with the introduction of a small constant slope for negative inputs.
Leaky ReLU activation (blue), derivative (orange)
Leaky ReLU is defined to address the problem of the dying neuron/dead neuron.
The problem of the dying neuron is addressed by introducing a small slope: the negative values are scaled by α, which enables the corresponding neurons to "stay alive".
The function and its derivative are both monotonic.
It allows negative values during backpropagation.
It is efficient and easy to compute.
The derivative of Leaky ReLU is 1 when x > 0 and equals α (between 0 and 1) when x < 0.
Cons:
Leaky ReLU does not provide consistent predictions for negative input values.
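A sketch contrasting ReLU and Leaky ReLU; α = 0.01 here is a conventional but arbitrary choice:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))          # [ 0.  0.  0.  2.]        negatives zeroed -> dying-neuron risk
print(leaky_relu(x))    # [-0.03 -0.005  0.  2.]   small negative slope keeps neurons alive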
5. ELU (Exponential Linear Units) Activation Function:
ELU and its derivative
ELU is also proposed to solve the problem of dying neuron.
No Dead ReLU issues
Zero-centric
Cons:
Computationally intensive.
Similar to Leaky ReLU, although theoretically better than ReLU, there is currently
no good evidence in practice that ELU is always better than ReLU.
f(x) is monotonic only if alpha is greater than or equal to 0.
f’(x) derivative of ELU is monotonic only if alpha lies between 0 and 1.
Slow convergence due to exponential function.
6. P ReLu (Parametric ReLU) Activation Function:
Leaky ReLU vs P Relu
The idea of Leaky ReLU can be extended even further: instead of multiplying x by a constant term, we can multiply it by a hyperparameter (a trainable parameter), which seems to work better than Leaky ReLU. This extension of Leaky ReLU is known as Parametric ReLU.
The parameter α is generally a number between 0 and 1, and it is usually relatively small.
It has a slight advantage over Leaky ReLU due to the trainable parameter.
It handles the problem of the dying neuron.
Cons:
Same as Leaky ReLU.
f(x) is monotonic when a ≥ 0, and f'(x) is monotonic when a = 1.
7. Swish (A Self-Gated) Activation Function:(Sigmoid Linear Unit)
Google Brain Team has proposed a new activation function, named Swish, which
is simply f(x) = x · sigmoid(x).
Their experiments show that Swish tends to work better than ReLU on deeper
models across a number of challenging data sets.
The curve of the Swish function is smooth and the function is differentiable at all
points. This is helpful during the model optimization process and is considered to
be one of the reasons that swish outperforms ReLU.
Swish function is “not monotonic”. This means that the value of the function may
decrease even when the input values are increasing.
Function is unbounded above and bounded below.

"Swish tends to continuously match or outperform ReLU"


Note that the output of the swish function may fall even when the input increases.
This is an interesting and swish-specific feature.(Due to non-monotonic character)
f(x) = 2x * sigmoid(beta * x)
If beta = 0 (a simple version of Swish, where beta is a learnable parameter), the sigmoid part is always 1/2 and f(x) is linear. On the other hand, if beta is a very large value, the sigmoid becomes nearly a binary step function (0 for x < 0, 1 for x > 0), and f(x) converges to the ReLU function. Therefore the standard Swish function selects beta = 1. In this way a smooth interpolation between the linear function and ReLU is provided, which helps mitigate the vanishing gradient problem.
8.Softplus
Activation function,first order derivative,second order derivative
The softplus function is similar to the ReLU function, but it is relatively smoother. Softplus (or SmoothReLU) is f(x) = ln(1 + exp(x)).
The derivative of the softplus function is the logistic (sigmoid) function: f'(x) = 1 / (1 + exp(-x)).
The function's values range over (0, +inf). Both f(x) and f'(x) are monotonic.
9.Softmax or normalized exponential function:

The “softmax” function is also a type of sigmoid function but it is very useful to
handle multi-class classification problems.
“Softmax can be described as the combination of multiple sigmoidal function.”
“Softmax function returns the probability for a datapoint belonging to each
individual class.”

While building a network for a multiclass problem, the output layer will have as many neurons as there are classes in the target.
For instance, if you have three classes [A, B, C], there will be three neurons in the output layer. Suppose you got the outputs [2.2, 4.9, 1.75] from the neurons. Applying the softmax function over these values gives approximately [0.06, 0.90, 0.04]. These represent the probability of the data point belonging to each class; from this result we can see that the input belongs to class B.
"Note that the sum of all the values is 1."
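A sketch that reproduces these numbers (subtracting the max before exponentiating is a standard numerical-stability trick):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.2, 4.9, 1.75])
probs = softmax(scores)
print(probs.round(2))                # [0.06 0.9  0.04], summing to 1
print(probs.argmax())                # 1 -> class B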
Which one is better to use ? How to choose a right one?
To be honest, there is no hard and fast rule for choosing an activation function. Each activation function has its own pros and cons, and what works well is decided through trial. But based on the properties of the problem, we may be able to make a better choice for easy and quicker convergence of the network.
Sigmoid functions and their combinations generally work better in the case of
classification problems
Sigmoids and tanh functions are sometimes avoided due to the vanishing
gradient problem
The ReLU activation function is widely used in the modern era.
If dead neurons appear in our networks due to ReLU, then the leaky ReLU function is the best choice.
The ReLU function should only be used in the hidden layers.
“As a rule of thumb, one can begin with using ReLU function and then move over to
other activation functions in case ReLU doesn’t provide with optimum results”

What is an Agent?(2017)
An agent can be anything that perceives its environment through sensors and acts upon that environment through actuators. An agent runs in the cycle of perceiving, thinking, and acting. An agent can be:
Human-Agent: A human agent has eyes, ears, and other organs which work for
sensors and hand, legs, vocal tract work for actuators.
Robotic Agent: A robotic agent can have cameras, infrared range finder, NLP for
sensors and various motors for actuators.
Software Agent: Software agent can have keystrokes, file contents as sensory
input and act on those inputs and display output on the screen.
Hence the world around us is full of agents such as thermostat, cellphone,
camera, and even we are also agents.
Sensor: Sensor is a device which detects the change in the environment and
sends the information to other electronic devices. An agent observes its
environment through sensors.
Actuators: Actuators are the component of machines that converts energy into
motion. The actuators are only responsible for moving and controlling a system.
An actuator can be an electric motor, gears, rails, etc.
Effectors: Effectors are the devices which affect the environment. Effectors can
be legs, wheels, arms, fingers, wings, fins, and display screen.

Types of AI Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability. All these agents can improve their performance and generate better actions over time. These are given below:
Simple Reflex Agent
Model-based reflex agent
Goal-based agents
Utility-based agent
Learning agent
1. Simple Reflex agent:
The Simple reflex agents are the simplest agents. These agents take decisions on
the basis of the current percepts and ignore the rest of the percept history.
These agents only succeed in the fully observable environment.
The Simple reflex agent does not consider any part of percepts history during
their decision and action process.
The simple reflex agent works on the condition-action rule, which means it maps the current state to an action. For example, a room cleaner agent acts only if there is dirt in the room.
Problems with the simple reflex agent design approach:
They have very limited intelligence.
They do not have knowledge of non-perceptual parts of the current state.
The condition-action rule table is mostly too big to generate and to store.
They are not adaptive to changes in the environment.
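A sketch of a condition-action rule for the room-cleaner example; the percept format and action names are made up:

def simple_reflex_vacuum(percept):
    location, status = percept          # uses the current percept only; no percept history
    if status == "dirty":
        return "suck"                   # condition-action rule: dirt -> clean it
    elif location == "A":
        return "move_right"
    else:
        return "move_left"

print(simple_reflex_vacuum(("A", "dirty")))   # suck
print(simple_reflex_vacuum(("A", "clean")))   # move_right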

2. Model-based reflex agent
The model-based agent can work in a partially observable environment and track the situation.
A model-based agent has two important factors:
Model: it is knowledge about "how things happen in the world," so it is called a model-based agent.
Internal State: it is a representation of the current state based on percept history.
These agents have the model, "which is knowledge of the world," and based on the model they perform actions.
Updating the agent state requires information about:
How the world evolves.
How the agent's actions affect the world.
3. Goal-based agents
The knowledge of the current state of the environment is not always sufficient for an agent to decide what to do.
The agent needs to know its goal, which describes desirable situations.
Goal-based agents expand the capabilities of the model-based agent by having the "goal" information.
They choose an action so that they can achieve the goal.
These agents may have to consider a long sequence of possible actions before deciding whether the goal is achieved or not. Such consideration of different scenarios is called searching and planning, which makes an agent proactive.

4. Utility-based agents
These agents are similar to the goal-based agent but provide an extra component of utility measurement, which makes them different by providing a measure of success at a given state.
A utility-based agent acts based not only on goals but also on the best way to achieve the goal.
The utility-based agent is useful when there are multiple possible alternatives and an agent has to choose in order to perform the best action.
The utility function maps each state to a real number to check how efficiently each action achieves the goals.
5. Learning Agents
A learning agent in AI is the type of agent which can learn from its past experiences, or it has learning capabilities.
It starts to act with basic knowledge and is then able to act and adapt automatically through learning.
A learning agent has mainly four conceptual components, which are:
Learning element: responsible for making improvements by learning from the environment.
Critic: the learning element takes feedback from the critic, which describes how well the agent is doing with respect to a fixed performance standard.
Performance element: responsible for selecting external actions.
Problem generator: responsible for suggesting actions that will lead to new and informative experiences.
Hence, learning agents are able to learn, analyze their performance, and look for new ways to improve it.

McCulloch–Pitt neural
The McCulloch–Pitt neural network is considered to be the first neural network.
The neurons are connected by directed weighted paths. McCulloch–Pitt neuron
allows binary activation (1 ON or 0 OFF), i.e., it either fires with an activation 1
or does not fire with an activation of 0. If w > 0, then the connected path is said
to be excitatory else it is known as inhibitory. Excitatory connections have
positive weights and inhibitory connections have negative weights. Each neuron
has a fixed threshold for firing. That is, if the net input to the neuron is greater
than the threshold, it fires. Various MATLAB programs have been written to generate the output of various logical functions using the McCulloch–Pitts neural network algorithm.

It may be divided into two parts. The first part, g, takes the inputs (like the dendrites of a biological neuron) and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
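A sketch of a McCulloch–Pitts neuron computing logical AND; weights (1, 1) with threshold 2 are the classic choice for this gate:

def mcp_neuron(inputs, weights, threshold):
    net = sum(x * w for x, w in zip(inputs, weights))   # part g: aggregation
    return 1 if net >= threshold else 0                 # part f: fire if net reaches the threshold

# AND gate: fires only when both binary inputs are 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcp_neuron((x1, x2), (1, 1), threshold=2))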

Backward and forward


Backward and forward chaining are methods of reasoning that exist in the
Expert System Domain of artificial intelligence. These techniques are used in
expert systems such as MYCIN and DENDRAL to generate solutions to real life
problems.
This article provides an overview of these techniques, and how they work. By
the end of the article, readers will have learned real life examples of how
backward and forward chaining are applied in artificial intelligence.
Introduction to the Expert System
A brief overview of an expert system can help us gain more insights on the origin
of backward and forward chaining in artificial intelligence.
An expert system is a computer application that uses rules, approaches, and facts
to provide solutions to complex problems. Examples of expert systems include
MYCIN and DENDRAL. MYCIN uses the backward chaining technique to diagnose
bacterial infections. DENDRAL employs forward chaining to establish the
structure of chemicals.
There are three components in an expert system: user interface, inference
engine, and knowledge base. The user interface enables users of the system to
interact with the expert system. High-quality and domain-specific knowledge is
stored in the knowledge base.
Backward and forward chaining stem from the inference engine component.
This is a component in which logical rules are applied to the knowledge base to
get new information or make a decision. The backward and forward chaining
techniques are used by the inference engine as strategies for proposing solutions
or deducing information in the expert system.
Forward chaining
Forward chaining is a method of reasoning in artificial intelligence in which
inference rules are applied to existing data to extract additional data until an
endpoint (goal) is achieved.
In this type of chaining, the inference engine starts by evaluating existing facts,
derivations, and conditions before deducing new information. An endpoint
(goal) is achieved through the manipulation of knowledge that exists in the
knowledge base.

Forward chaining can be used in planning, monitoring, controlling, and interpreting applications.
Properties of forward chaining
The process uses a down-up approach (bottom to top).
It starts from an initial state and uses facts to make a conclusion.
This approach is data-driven.
It's employed in expert systems and production rule systems.
Examples of forward chaining
A simple example of forward chaining can be explained in the following
sequence.
A
A->B
B
A is the starting point. A->B represents a fact. This fact is used to achieve a
decision B.
A practical example will go as follows;
Tom is running (A)
If a person is running, he will sweat (A->B)
Therefore, Tom is sweating. (B)
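A toy forward-chaining loop over propositional rules, mirroring the A, A->B example; the rule format here is made up for the sketch:

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                          # keep firing rules until no new fact appears
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)       # the rule fires: add the derived fact
                changed = True
    return facts

rules = [(["running"], "sweating")]         # A -> B
print(forward_chain(["running"], rules))    # {'running', 'sweating'}: Tom is sweating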
A DENDRAL expert system is a good example of how forward chaining is used in
artificial intelligence. DENDRAL is used in the prediction of the molecular
structure of substances.
Deducing the chemical structure starts by finding the number of atoms in every
molecule. The mass spectrum of the sample is then used to establish the
arrangement of the atoms. We can summarize these steps as follows.
The chemical formula is determined ( the number of atoms in every molecule).
The spectrum machine is used to form mass spectrums of the sample.
The isomer and structure of the chemical are identified.
In this example, the identification of the chemical structure is the endpoint. In
the DENDRAL expert system, a generate and test technique is employed.
There are two elements in the generator: a synthesiser and structural
enumerator. The synthesiser plays the role of producing the mass spectrum. The
structural enumerator identifies the structure of substances and prevents
redundancy in the generator.
Advantages
It can be used to draw multiple conclusions.
It provides a good basis for arriving at conclusions.
It’s more flexible than backward chaining because it does not have a limitation
on the data derived from it.
Disadvantages
The process of forward chaining may be time-consuming. It may take a lot of time to eliminate and synchronize the available data.
Unlike backward chaining, the explanation of facts or observations for this type of chaining is not very clear. Backward chaining uses a goal-driven method that arrives at conclusions efficiently.
Backward chaining
Backward chaining is a concept in artificial intelligence that involves
backtracking from the endpoint or goal to steps that led to the endpoint. This
type of chaining starts from the goal and moves backward to comprehend the
steps that were taken to attain this goal.
The backtracking process can also enable a person to establish logical steps that can be used to find other important solutions.

Backward chaining can be used in debugging, diagnostics, and prescription


applications.
Properties of backward chaining
The process uses an up-down approach (top to bottom).
It’s a goal-driven method of reasoning.
The endpoint (goal) is subdivided into sub-goals to prove the truth of facts.
A backward chaining algorithm is employed in inference engines, game theories,
and complex database systems.
The modus ponens inference rule is used as the basis for the backward chaining process. This rule states that if both the conditional statement (p->q) and the antecedent (p) are true, then we can infer the consequent (q).
Example of backward chaining
The information provided in the previous example (forward chaining) can be
used to provide a simple explanation of backward chaining. Backward chaining
can be explained in the following sequence.
B
A->B
A
B is the goal or endpoint, that is used as the starting point for backward tracking.
A is the initial state. A->B is a fact that must be asserted to arrive at the endpoint
B.
A practical example of backward chaining will go as follows:
Tom is sweating (B).
If a person is running, he will sweat (A->B).
Tom is running (A).
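A matching backward-chaining sketch that recurses from the goal down to known facts (same made-up rule format; no cycle handling in this toy version):

def backward_chain(goal, facts, rules):
    if goal in facts:
        return True                            # the goal is already a known fact
    for premises, conclusion in rules:
        if conclusion == goal:                 # find a rule that could produce the goal
            if all(backward_chain(p, facts, rules) for p in premises):
                return True                    # all sub-goals proved
    return False

rules = [(["running"], "sweating")]            # A -> B
print(backward_chain("sweating", {"running"}, rules))   # True: Tom is running, so he sweats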
The MYCIN expert system is a real-life example of how backward chaining works. This is a system that is used in the diagnosis of bacterial infections. It also recommends suitable treatments for these infections.
The knowledge base of MYCIN comprises many antecedent-consequent rules that enable the system to recognize various causes of (bacterial) infections. This system is suitable for patients who have a bacterial infection but don't know the specific infection. The system will gather information relating to the symptoms and history of the patient, and then analyze this information to establish the bacterial infection.
A suitable sequence can be as follows:
The patient has a bacterial infection.
The patient is vomiting.
He/she is also experiencing diarrhea and severe stomach upset.
Therefore, the patient has typhoid (salmonella bacterial infection).
The MYCIN expert system uses the information collected from the patient to
recommend suitable treatment.
The recommended treatment corresponds to the identified bacterial infection. In
the case above, the system may recommend the use of ciprofloxacin.
Advantages
The result is already known, which makes it easy to deduce inferences.
It’s a quicker method of reasoning than forward chaining because the endpoint is
available.
In this type of chaining, correct solutions can be derived effectively if pre-
determined rules are met by the inference engine.
Disadvantages
The process of reasoning can only start if the endpoint is known.
It doesn’t deduce multiple solutions or answers.
It only derives data that is needed, which makes it less flexible than forward
chaining.
Conclusion
Backward and forward chaining are important methods of reasoning in artificial
intelligence. These concepts differ mainly in terms of approach, strategy,
technique, speed, and operational direction.
Forward chaining is important to developers that want to use data-driven
algorithms to develop effective computer-based systems. Backward chaining is
important to developers that are interested in using goal-driven algorithms to
design effective solutions in complex database systems.

S.No. | Forward Chaining | Backward Chaining
1 | Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal. | Backward chaining starts from the goal and works backward through inference rules to find the required facts that support the goal.
2 | It is a bottom-up approach. | It is a top-down approach.
3 | Forward chaining is known as a data-driven inference technique, as we reach the goal using the available data. | Backward chaining is known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts.
4 | Forward chaining reasoning applies a breadth-first search strategy. | Backward chaining reasoning applies a depth-first search strategy.
5 | Forward chaining tests all the available rules. | Backward chaining only tests the few required rules.
6 | Forward chaining is suitable for planning, monitoring, control, and interpretation applications. | Backward chaining is suitable for diagnostic, prescription, and debugging applications.
7 | Forward chaining can generate an infinite number of possible conclusions. | Backward chaining generates a finite number of possible conclusions.
8 | It operates in the forward direction. | It operates in the backward direction.
9 | Forward chaining is aimed at any conclusion. | Backward chaining is aimed only at the required data.
10 | It is data-driven: the data is available. | It is goal-driven: the goal state is given.

Example:
Data: X = 1, Y = 2
Rules: IF (X == 1 AND Y == 2) THEN Z = 3; IF (Z == 3) THEN A = 4
Conclusion: A = 4
Forward chaining derives A = 4 from the given data; backward chaining starts from the goal A = 4 and works back to the data.

Why is the genetic algorithm important?

By simulating the process of natural selection, reproduction, and mutation, genetic algorithms can produce high-quality solutions for various problems, including search and optimization.
By the effective use of the theory of evolution, genetic algorithms are able to surmount problems faced by traditional algorithms.
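As a minimal sketch of selection, crossover (reproduction), and mutation, here is a toy genetic algorithm maximizing the number of 1-bits in a string; the problem and all parameters are chosen arbitrarily:

import random

def fitness(bits):                 # quality of a solution: count of 1-bits
    return sum(bits)

def evolve(pop_size=20, length=12, generations=40):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]                # selection: keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)        # single-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:                # mutation: occasionally flip one bit
                i = random.randrange(length)
                child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

random.seed(1)
print(evolve())   # typically converges to all (or nearly all) 1s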

What is game playing?

Game playing is a search problem defined by:
Initial state
Successor function
Goal test
Path cost / utility / payoff function

Why is alpha-beta pruning better than minimax?

Both algorithms give the same answer. However, their main difference is that alpha-beta does not explore all paths, as minimax does, but prunes those that are guaranteed not to lead to an optimal state for the current player (MAX or MIN). So, alpha-beta is a more efficient implementation of minimax.
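A sketch of minimax with alpha-beta pruning on a toy game tree; representing the tree as nested lists of leaf utilities is just a convenience for this example:

def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, (int, float)):          # leaf: return its utility
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # prune: MIN would never allow this branch
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                   # prune: MAX would never allow this branch
                break
        return value

tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, True))   # 6, the same answer minimax gives; the leaf 2 is pruned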
What is Reinforcement Learning?
Reinforcement Learning is a feedback-based Machine learning technique in which an
agent learns to behave in an environment by performing the actions and seeing the
results of actions. For each good action, the agent gets positive feedback, and for
each bad action, the agent gets negative feedback or penalty.
In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.
Since there is no labeled data, the agent is bound to learn from its experience only.
RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive reward.
The agent learns by trial and error, and based on experience it learns to perform the task in a better way. Hence, we can say that "reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
It is a core part of artificial intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.

Example: The problem is as follows: We have an agent and a reward, with many
hurdles in between. The agent is supposed to find the best possible path to reach the
reward. The following problem explains the problem more easily.

The above image shows the robot, diamond, and fire. The goal of the robot is to get the reward, which is the diamond, and avoid the hurdle, which is the fire. The robot learns by trying all the possible paths and then choosing the path which gives it the reward with the fewest hurdles. Each right step gives the robot a reward and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, the diamond.
Terms used in Reinforcement Learning
Agent: An entity that can perceive/explore the environment and act upon it.
Environment: A situation in which an agent is present or surrounded by. In RL, we assume a stochastic environment, which means it is random in nature.
Action: Actions are the moves taken by an agent within the environment.
State: A situation returned by the environment after each action taken by the agent.
Reward: Feedback returned to the agent from the environment to evaluate the action of the agent.
Policy: A strategy applied by the agent to decide the next action based on the current state.
Value: The expected long-term return with the discount factor, as opposed to the short-term reward.
Q-value: Mostly similar to the value, but it takes one additional parameter, the current action (a).

Types of Reinforcement: There are two types of Reinforcement:

Positive –
Positive reinforcement is defined as when an event that occurs due to a particular behavior increases the strength and the frequency of the behavior. In other words, it has a positive effect on behavior.
Advantages: maximizes performance; sustains change for a long period of time.
Drawback: too much reinforcement can lead to an overload of states, which can diminish the results.

Negative –
Negative reinforcement is defined as the strengthening of a behavior because a negative condition is stopped or avoided.
Advantages: increases behavior; provides defiance to a minimum standard of performance.
Drawback: it only provides enough to meet the minimum behavior.

How does Reinforcement Learning Work?


To understand the working process of the RL, we need to consider two main
things:
Environment: It can be anything such as a room, maze, football ground, etc.
Agent: An intelligent agent such as AI robot.
Let's take an example of a maze environment that the agent needs to explore.
Consider the below image:
In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block which is a wall, S8 which is a fire pit, and S4 which is a diamond block.
The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward point. It can take four actions: move up, move down, move left, and move right.
The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent considers the path S9-S5-S1-S2-S3; it will then get the +1 reward point.
The agent will try to remember the preceding steps that it has taken to reach the final step. To memorize the steps, it assigns a value of 1 to each previous step. Consider the below step:

Now, the agent has successfully stored the previous steps, assigning the value 1 to each previous block. But what will the agent do if it starts moving from a block which has a 1-valued block on both sides? Consider the below diagram:
It will be a difficult decision for the agent whether to go up or down, as each block has the same value. So the above approach is not suitable for the agent to reach the destination. Hence, to solve the problem we use the Bellman equation, which is the main concept behind reinforcement learning.

The Bellman Equation


The Bellman equation was introduced by the mathematician Richard Ernest Bellman in the year 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of previous states. For a deterministic environment it can be written as V(s) = max over a of [ R(s, a) + γ·V(s') ], where V(s) is the value of state s, R(s, a) is the reward for taking action a in state s, γ is the discount factor, and s' is the next state. It is a way of calculating the value functions in dynamic programming, and it leads to modern reinforcement learning.
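A sketch of the Bellman equation applied by value iteration on a tiny three-state chain; the states, rewards, and discount factor γ = 0.9 are all made up:

# V(s) = max over a of [ R(s, a) + gamma * V(s') ], with deterministic transitions
mdp = {
    "s1":   {"right": (0, "s2")},
    "s2":   {"left": (0, "s1"), "right": (1, "goal")},
    "goal": {},                                   # terminal state: no actions
}
gamma = 0.9
V = {s: 0.0 for s in mdp}

for _ in range(50):                               # iterate until the values converge
    for s, actions in mdp.items():
        if actions:
            V[s] = max(r + gamma * V[s2] for r, s2 in actions.values())

print(V)   # {'s1': 0.9, 's2': 1.0, 'goal': 0.0}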

Production systems
Production systems can be defined as a kind of cognitive (thinking) architecture in which knowledge is represented in the form of rules, so a system that uses this form of knowledge representation is called a production system. Simply put, production systems consist of rules and facts. Knowledge is usually encoded in a declarative form which comprises a set of rules.
The Major Components Of An AI Production System
A global database
A set of production rules
A control system
The global database is the central data structure used by an AI production system. The production rules operate on the global database. Each rule usually has a precondition that is either satisfied or not by the global database. If the precondition is satisfied, the rule can be applied. Application of the rule changes the database. The control system then chooses which applicable rule should be applied and ceases computation when a termination condition on the database is satisfied. If multiple rules are set to fire at the same time, the control system resolves the conflicts.
4 Major Features Of Production System
Simplicity
Modularity
Modifiability
Knowledge Intensive
Components of Production System
The major components of the Production System in Artificial Intelligence are:
Global Database: The global database is the central data structure used by the
production system in Artificial Intelligence.
Set of Production Rules: The production rules operate on the global database. Each rule usually has a precondition that is either satisfied or not by the global database. If the precondition is satisfied, the rule is applied, and the application of the rule changes the database.
Control system chooses which rule to apply.

What is testing of neural network?


The purpose of testing is to compare the outputs from the neural network
against targets in an independent set (the testing instances).

Forward chaining example:
"As per the law, it is a crime for an American to sell weapons to hostile nations.
Country A, an enemy of America, has some missiles, and all the missiles were sold
to it by Robert, who is an American citizen."
Prove that "Robert is criminal."
To solve the above problem, first, we will convert all the above facts into first-order
definite clauses, and then we will use a forward-chaining algorithm to reach the
goal.
Facts Conversion into FOL:
It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r
are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). This can be written as two definite clauses by using Existential Instantiation, introducing the new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
Missiles are weapons.
Missile(p) → Weapons (p) .......(5)
Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) ........(6)
Country A is an enemy of America.
Enemy (A, America) .........(7)
Robert is American
American(Robert). ..........(8)

Forward chaining proof:


Step-1:
In the first step we will start with the known facts and will choose the sentences
which do not have implications, such as: American(Robert), Enemy(A, America),
Owns(A, T1), and Missile(T1). All these facts will be represented as below.

Step-2:
At the second step, we will see those facts which infer from available facts and with
satisfied premises.
Rule-(1) does not satisfy premises, so it will not be added in the first iteration.
Rule-(2) and (3) are already added.
Rule-(4) satisfy with the substitution {p/T1}, so Sells (Robert, T1, A) is added, which
infers from the conjunction of Rule (2) and (3).
Rule-(6) is satisfied with the substitution(p/A), so Hostile(A) is added and which
infers from Rule-(7).

Step-3:
At step-3, as we can check Rule-(1) is satisfied with the substitution {p/Robert, q/T1,
r/A}, so we can add Criminal(Robert) which infers all the available facts. And hence
we reached our goal statement.
Hence it is proved that Robert is Criminal using forward chaining approach.

Example:backward

In backward-chaining, we will use the same above example, and will rewrite all the rules.

o American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)


o Owns(A, T1) ........(2)
o Missile(T1) .......(3)
o ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missile(p) → Weapons (p) .......(5)
o Enemy(p, America) →Hostile(p) ........(6)
o Enemy (A, America) .........(7)
o American(Robert). ..........(8)

Backward-Chaining proof:

In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then
infer further rules.

Step-1:

At the first step, we will take the goal fact. And from the goal fact, we will infer other facts, and
at last, we will prove those facts true. So our goal fact is "Robert is Criminal," so following is the
predicate of it.
Step-2:

At the second step, we will infer other facts from the goal fact which satisfy the rules. As we can see in Rule-(1), the goal predicate Criminal(Robert) is present with the substitution {Robert/p}. So we will add all the conjunctive facts below the first level and replace p with Robert.

Here we can see that American(Robert) is a fact, so it is proved here.

Step-3: At step-3, we will extract a further fact Missile(q), which infers from Weapon(q), as it satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 for q.

Step-4:

At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule-(4) with the substitution of A in place of r. So these two statements are proved here.
Step-5:

At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6). And hence all the statements are proved true using backward chaining.
