
Metaheuristic Algorithms

Course Outlines

 Preliminaries

 Concepts of metaheuristics: Optimization models, optimization methods, representation, objective function, constraint handling, parameter tuning, performance analyses of metaheuristics. Random walks and Lévy flights.

 Single solution-based metaheuristics

 Neighborhood, initial solution, fitness landscape analysis, local search. Population based
metaheuristics: Initial population, stopping criteria. Differential evolution, ant and bee
algorithms. Firefly algorithm, bat algorithm, cuckoo search.

 Metaheuristics for multi-objective optimization

 Hybrid metaheuristics: Design issues, implementation issues. Combining metaheuristics with mathematical programming. Hybrid metaheuristics with machine learning and data mining. Hybrid metaheuristics for multi-objective optimization.
Books
 Blum, C., Roli, A., & Sampels, M. (2008). Hybrid metaheuristics: An emerging approach to optimization. Springer.
 Bozorg-Haddad, O., Solgi, M., & Loáiciga, H. A. (2017). Meta-heuristic and evolutionary algorithms for engineering optimization. John Wiley & Sons, Inc.
Algorithms
 An algorithm is a set of instructions for solving a
problem or accomplishing a task.

 A common example of an algorithm is a recipe, which consists of specific instructions for preparing a dish or meal.

 Every computerized device uses algorithms to perform its functions in the form of hardware- or software-based routines.

Heuristic Algorithms

 A technique designed for solving a problem more quickly when classic methods are too slow,
 or
 finding an approximate solution when classic methods fail to find any exact solution.

 Heuristics are problem-dependent techniques.

Metaheuristic Algorithms

 Metaheuristics are global; the name "meta" means "one level above". In other words, a metaheuristic works at a level above a heuristic, guiding it.

 Metaheuristics are more generic and can be applied to a variety of problems with little modification.
Concept of optimization
 Optimization occurs in the minimization of time,
cost, and risk or the maximization of profit,
quality, and efficiency.
 For example:
 Airline scheduling problem
 Class scheduling problem
 Electric power and gas distribution
 Ways to design a network to optimize the cost and the
quality of service;
 Ways to schedule a production to optimize the time and so
on.
 Oxford Dictionary defines optimization as a process or a
method that can make something perfect and effective.

 Something here may include a design, system or decision which becomes better and better using the methodology called optimization.

 We can refer to many other standard definitions and find one thing in common, which is the improvement of some process, method, design or decision to make something more effective.
Mathematical problem

 Let us understand optimization problems with the help of an example. A system is defined by the following equations:

M=ax1+bx2
P=Mx1+c

 x1 and x2 are two input variables, a and b are constants, M is an intermediate output, c is one more constant and P is the final output or response of the system.
 Now, with a given set of conditions we are interested in
obtaining the optimum value of P, which can be
achieved in two different ways:
 Optimizing system performance by optimizing the
internal parameters and constants which are
characteristics of the system.

 Optimizing by adjusting or optimizing all individual variables or selected variables present in the system.
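To make the model concrete, here is a minimal sketch in Python (not part of the original text) that evaluates this system; the constant values chosen for a, b and c are hypothetical placeholders:

```python
def system_response(x1, x2, a=2.0, b=3.0, c=1.0):
    """Evaluate the example system: M = a*x1 + b*x2, then P = M*x1 + c.

    The values of a, b and c are hypothetical; in a real system they
    are fixed characteristics of that system.
    """
    M = a * x1 + b * x2   # intermediate output
    P = M * x1 + c        # final output / response
    return P

# Varying the input variables x1 and x2 changes the response P:
print(system_response(1.0, 1.0))  # M = 5.0, so P = 6.0
print(system_response(2.0, 1.0))  # M = 7.0, so P = 15.0
```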
Design Variables
 A design variable is a numerical input that is
allowed to change during the design optimization.
 Or
 Variables are design parameters used in any system which are actually responsible for obtaining a desired output or a product, such as:
 X = (x1, x2, …, xn)
 where x1, x2, …, xn are n variables which together constitute a single vector X.
 These variables are controlled, manipulated and
varied in the process of optimization.

 In some cases, varying or controlling only one variable is enough to improve performance and make the optimization work, but in other cases a greater number of variables may have to change in order to achieve the desired objective in the system design.
 Lower Limit:
 Specify the minimum value for the design variable.
 Upper Limit:
 Specify the maximum value for the design
variable.
Objective Function
 The objective function in a mathematical
optimization problem is the real-valued function
whose value is to be either minimized or
maximized over the set of feasible alternatives.
 Or
 An objective function means a goal to be achieved.
Every system performance improvement will have
a certain goal to achieve, maybe an error to
minimize at some specific level.
Lecture#2
 Constraints and bounds are some limitations and
conditions under which the optimization technique
or method works.
 The method successfully improves the system
performance provided a certain set of constraints
are satisfied.
 The limitations include some bounds as well,
which may be upper bounds, lower bounds or
intermediate.
 For example,
Y = minimize { e(x) }
under x2 < x0 and x1 > xp    (1)

 Here, the goal is to minimize the value of the error e(x). The minimum or optimum value has to be obtained as Y, and the optimization has to be achieved under the conditions given in equation (1).
 These conditions are bounds which are apparently
lower and upper bounds in the case of x2 and x1,
respectively.

 The error needs to be minimized between x1 and x2.
 In different types of optimization problems and
applications, various types of bounds and
constraints are used.

 Note:
 The optimization methods can be generalized but
the constraints are specific to the applications or
problems being addressed.
 Therefore, the constraints are subjective, whereas
the optimization methods are objective for the
applications.
 Maximizing the Area of a Garden

 A rectangular garden is to be constructed using a rock wall on one side of the garden and wire fencing on the other three sides (Figure). Given 100 ft of wire fencing, determine the dimensions that would create a garden of maximum area. What is the maximum area?
We want to determine the
measurements x and y that will create
a garden with maximum area
 Let x denote the length of the side of the garden
perpendicular to the rock wall and y denote the
length of the side parallel to the rock wall. Then
the area of the garden is
A=x⋅y
We want to find the maximum possible area
subject to the constraint that the total fencing
is 100ft. From Figure, the total amount of fencing
used will be 2x+y. Therefore, the constraint
equation is 2x+y=100
Solving this equation for y,
we have y=100−2x.
Thus, we can write the area as
A(x) = x⋅(100 − 2x) = 100x − 2x².
 Before trying to maximize the area function A(x) = 100x − 2x², we need to determine the domain under consideration.
 To construct a rectangular garden, we
certainly need the lengths of both sides to
be positive.
 Therefore, we need x>0 and y>0.

 Since y=100−2x, if y>0, then x<50.


 Therefore, we are trying to determine the maximum
value of A(x) for x over the open interval (0,50).
 We do not know that a function necessarily has a
maximum value over an open interval. However, we
do know that a continuous function has an absolute
maximum (and absolute minimum) over a closed
interval.
 Therefore, let’s consider the function A(x) = 100x − 2x² over the closed interval [0,50].
 That is, we consider the following problem:
 Maximize A(x) = 100x − 2x² over the interval [0,50].
 Differentiating the function A(x), we obtain A′(x) = 100 − 4x.
 Setting A′(x) = 0, the only critical point is x = 25. We conclude that the maximum area must occur when x = 25.
Then we have y = 100 − 2x = 100 − 2(25) = 50.

To maximize the area of the garden, x = 25 ft and y = 50 ft. So, the area of this garden is 1250 ft².
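The calculus result can be cross-checked numerically; the following Python sketch (illustrative, not part of the original example) scans integer values of x over the domain:

```python
def area(x):
    """Garden area A(x) = x * (100 - 2x), given the fencing constraint 2x + y = 100."""
    return x * (100 - 2 * x)

# Scan the closed interval [0, 50] and confirm the calculus result x = 25:
best_x = max(range(0, 51), key=area)
print(best_x, area(best_x))  # 25 1250
```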
As scientists, engineers, and managers, we
always have to make decisions.
Decision-making is everywhere. As the world
becomes more and more complex and
competitive, decision-making must be tackled
rationally and optimally.
Decision making consists of the following steps
(Fig. 1.1):
Formulate the problem
 In this first step, a decision problem is identified.
 An initial statement of the problem is made. This
formulation may be imprecise.
 The internal and external factors and the
objective(s) of the problem are outlined.
 Many decision-makers may be involved in
formulating the problem
Model the problem
 In this important step, an abstract mathematical
model is built for the problem.
 The modeler can be inspired by similar models in
the literature. This will reduce the problem to well-
studied optimization models.
 Usually, models we are solving are simplifications
of the reality.
Optimize the problem
Once the problem is modeled, the solving procedure generates a “good” solution for the problem.
The solution may be optimal or suboptimal.
Let us notice that we are finding a solution for an abstract model of the problem and not for the originally formulated real-life problem. Therefore, the obtained solution performances are indicative when the model is an accurate one.
The algorithm designer can reuse state-of-the-art algorithms on similar problems or integrate the knowledge of this specific application into the algorithm.
Implement a solution
The obtained solution is tested practically by the decision-maker and is implemented if it is “acceptable.”
Some practical knowledge may be introduced in the solution to be implemented.
If the solution is unacceptable, the model and/or the optimization algorithm have to be improved and the decision-making process is repeated.
 An open interval does not include its
endpoints, and is indicated with parentheses.
For example, (0,1) means greater than 0 and
less than 1. This means (0,1) = {x | 0 < x <
1}.
 A closed interval is an interval which includes
all its limit points, and is denoted with square
brackets. For example, [0,1] means greater
than or equal to 0 and less than or equal to 1. This means [0,1] = {x | 0 ≤ x ≤ 1}.
Complexity Theory

Metaheuristic Algorithms
Lecture#3
Complexity of Algorithms
An algorithm needs two important resources to solve
a problem:
 Time
 Space
Definition
The time complexity of an algorithm is the number of
steps required to solve a problem of size n.

The complexity is generally defined in terms of worst-case analysis.
Asymptotic Bound
• The goal in the determination of the
computational complexity of an algorithm
is not to obtain an exact step count but an
asymptotic bound on the step count.

• The Big-O notation makes use of asymptotic analysis. It is one of the most popular notations in the analysis of algorithms.
Big O

Big O notation (with a capital letter O, not a zero), also called Landau's symbol, is a symbolism used in complexity theory, computer science, and mathematics to describe the asymptotic behavior of functions. Basically, it tells you how fast a function grows or declines.

Landau's symbol comes from the name of the German number theoretician Edmund Landau who invented the notation. The letter O is used because the rate of growth of a function is also called its order.
• For example, when analyzing some algorithm, one might find that the time (or the number of steps) it takes to complete a problem of size n is given by
T(n) = 4n² − 2n + 2
• If we ignore constants (which makes sense because those depend on the particular hardware the program is run on) and slower growing terms, we could say
• "T(n) grows at the order of n²" and write: T(n) = O(n²).
For the formal definition,
Suppose f(x) and g(x) are two functions
defined on some subset of the real numbers.
We write f(x) = O(g(x))
(or f(x) = O(g(x)) for x -> ∞ to be more
precise)
if and only if there exist constants N and C
such that |f(x)| ≤ C |g(x)| for all x>N.
Intuitively, this means that f does not
grow faster than g.
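For the earlier example T(n) = 4n² − 2n + 2, the constants C = 4 and N = 1 witness the definition, since −2n + 2 ≤ 0 for n ≥ 1. A small Python check (illustrative):

```python
def T(n):
    """Step count from the earlier example: T(n) = 4n^2 - 2n + 2."""
    return 4 * n**2 - 2 * n + 2

# Witness constants for T(n) = O(n^2): with C = 4 and N = 1,
# |T(n)| <= C * n^2 holds for all n > N (checked here up to 10000).
C, N = 4, 1
assert all(abs(T(n)) <= C * n**2 for n in range(N + 1, 10_001))
print("T(n) = O(n^2) verified for n = 2 .. 10000")
```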
Notation    Name
O(1)        constant
O(log n)    logarithmic
O(n)        linear
O(n²)       quadratic
O(n^c)      polynomial
O(c^n)      exponential
O(1) — Constant
• O(1) means that the algorithm takes the same
number of steps to execute regardless of how
much data is passed in.
• OR
• If we have statements with basic operations like comparisons, assignments, or reading a variable, we can assume they each take constant time: O(1).
Example:
Let’s say we can compute the square sum of 3 numbers.

function squareSum(a, b, c) {
  const sa = a * a;
  const sb = b * b;
  const sc = c * c;
  const sum = sa + sb + sc;
  return sum;
}
• As you can see, each statement is a basic
operation (math and assignment). Each line
takes constant time O(1). If we add up all
statements’ time it will still be O(1).
• It doesn’t matter if the numbers
are 0 or 9,007,199,254,740,991, it will
perform the same number of operations.
O(N) — Linear
• An algorithm that is O(N) will take as many
steps as there are elements of data. So
when an array increases in size by one
element, an O(N) algorithm will increase
by one step.
• Traverse an array and print each
element.
• “To traverse an array means to access each
element (item) stored in the array so that the
data can be checked or used as part of a
process”.
• Here, we need to access all the elements one
by one, so the calculation time increases at
the same pace as the input.
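A minimal traversal sketch in Python (illustrative; returning the step count is only for demonstration):

```python
def traverse(items):
    """Visit and print every element once: one loop step per element, so O(N)."""
    steps = 0
    for item in items:
        print(item)   # check or use the element
        steps += 1
    return steps

# Doubling the input doubles the number of steps:
n1 = traverse([3, 1, 4])
n2 = traverse([3, 1, 4, 1, 5, 9])
print(n1, n2)  # 3 6
```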
• O(N) is a perfect diagonal line, as for
every additional piece of data, the
algorithm takes one additional step. This is
why it is also referred to as linear time.
• Let’s plot the O(1) and O(N) algorithms in the
same graph and let’s assume that
the O(1) algorithm constantly takes 50 steps.
What can we observe?
 When the input array has fewer than 50 elements, the O(N) algorithm is more efficient.
 At exactly 50 elements the two algorithms take the same number of steps.
 As the data increases, the O(N) algorithm takes more steps.
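These observations can be tabulated with a tiny Python sketch, using the hypothetical constant-time algorithm that always takes 50 steps:

```python
def const_steps(n):
    """Hypothetical O(1) algorithm: always 50 steps, regardless of n."""
    return 50

def linear_steps(n):
    """O(N) algorithm: one step per element."""
    return n

for n in (10, 50, 100):
    print(n, const_steps(n), linear_steps(n))
# 10 50 10   -> O(N) wins below 50 elements
# 50 50 50   -> tie at exactly 50
# 100 50 100 -> O(1) wins beyond 50
```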
Example #2
• Basic structure is:

for (i = 0; i < N; i++) {
  sequence of statements of O(1)
}

• The loop executes N times, so the total time is N*O(1), which is O(N).
O(N²) — Quadratic

Quadratic time complexity represents an algorithm whose performance is directly proportional to the squared size of the input data set (think of linear, but squared).

Within our programs, this time complexity will occur whenever we nest over multiple iterations within the data sets.
Examples
Worst case time complexity of Selection and
Insertion sort.
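As one concrete instance, here is a selection sort sketch in Python (not from the original text); the nested scan for the minimum is exactly what produces the quadratic worst case:

```python
def selection_sort(a):
    """Selection sort: for each position, scan the rest of the list for the
    minimum. The nested scan costs about N*(N-1)/2 comparisons: O(N^2)."""
    a = list(a)           # work on a copy
    n = len(a)
    for i in range(n):
        m = i
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a

print(selection_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```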
Nested loops
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
sequence of statements of O(1)
}
}
The outer loop executes N times and the inner loop executes M times, so the time complexity is O(N*M).
Example#2
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
sequence of statements of O(1)
}
}
Now the time complexity is O(N^2)
Complexity theory

Lecture # 4
Metaheuristic Algorithms
Logarithmic complexity
• If an algorithm has O(log n) running time, it means that
as the input size grows, the number of operations
grows very slowly.
• Example: binary search.
• An O(log n) complexity is far better than a linear complexity O(n).
• Even though O(n) linear time is already pretty good for an algorithm, log(n) time is much better as the size of the input increases.
Example
Algorithm A:
Start on the first page of the book and go word by
word until you find what you are looking for.
Algorithm B:
Open the book in the middle and check the first
word on it.
If the word you are looking for comes later alphabetically, then look in the right half. Otherwise, look in the left half.
Divide the remainder in half again, and repeat step #2 until you find the word you are looking for.
Which one is faster?
• The first algorithm goes word by word: O(n).
• However, algorithm B splits the problem in half on each iteration: O(log n).
• This 2nd algorithm is a binary search.
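A Python sketch of algorithm B, i.e. binary search over a sorted word list (illustrative):

```python
def binary_search(words, target):
    """Algorithm B: repeatedly halve the search range -> O(log n) steps."""
    lo, hi = 0, len(words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if words[mid] == target:
            return mid
        if target > words[mid]:   # target comes later alphabetically: go right
            lo = mid + 1
        else:                     # otherwise: go left
            hi = mid - 1
    return -1                     # not found

words = ["ant", "bat", "cat", "dog", "eel"]
print(binary_search(words, "dog"))  # 3
```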
Square root complexity
Square root time complexity means that the algorithm
requires O(N^(1/2)) evaluations where the size of input is N.
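A classic O(√N) example is primality testing by trial division up to √N; a short Python sketch (illustrative):

```python
import math

def is_prime(n):
    """Trial division up to sqrt(n): roughly N**0.5 divisions, i.e. O(sqrt(N))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(97), is_prime(100))  # True False
```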

Assignment # 1
You are requested to prepare an assignment on logarithmic and
square root complexities by discussing all the significant aspects
with the help of possible examples.

Assignment submission date: 9-12-2021


Maximum marks: 5
Sequence of statements
• statement 1;
• statement 2;
• ...
• statement k;

The total time is found by adding the times for all statements:

total time = time(statement 1) + time(statement 2) + ... + time(statement k)

If each statement is "simple" (only involves basic operations) then the time for each statement is constant and the total time is also constant: O(1).
If-then-else statements

if (condition){
sequence of statements 1
}
else {
sequence of statements 2
}

Here, either sequence 1 will execute, or sequence 2 will execute.
Therefore, the worst-case time is max(time(sequence 1), time(sequence 2)).

For example, if sequence 1 is O(N) and sequence 2 is O(1), the worst-case time for the whole if-then-else statement would be O(N).
for loops

for (i = 0; i < N; i++) {
  sequence of statements
}

The loop executes N times, so the sequence of statements also executes N times. Since we assume the statements are O(1), the total time for the for loop is N * O(1), which is O(N) overall.
Nested loops
First we'll consider loops where the number of iterations
of the inner loop is independent of the value of the
outer loop's index.

For example:
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
sequence of statements
}
}
• The outer loop executes N times. Every time the outer
loop executes, the inner loop executes M times.

• As a result, the statements in the inner loop execute a total of N * M times. Thus, the complexity is O(N * M).

• In a common special case where the stopping condition of the inner loop is j < N instead of j < M (i.e., the inner loop also executes N times), the total complexity for the two loops is O(N²).
Now let's consider nested loops where the number of
iterations of the inner loop depends on the value of the
outer loop's index. For example:

for (i = 0; i < N; i++) {
  for (j = i+1; j < N; j++) {
    sequence of statements
  }
}
• Now we can't just multiply the number of iterations of
the outer loop times the number of iterations of the
inner loop, because the inner loop has a different
number of iterations each time.

• So let's think about how many iterations that inner loop has. That information is given in the following table:
Value of i    Iterations of inner loop
0             N-1
1             N-2
2             N-3
...           ...
N-2           1
N-1           0

So we can see that the total number of times the sequence of statements executes is (N-1) + (N-2) + ... + 2 + 1. We've seen that formula before: the total is N(N-1)/2, which is O(N²).
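Counting the inner-loop executions directly confirms the quadratic order; a small Python check (illustrative):

```python
def count_inner_steps(N):
    """Count executions of the inner-loop body for the dependent nested loop
    'for i in 0..N-1: for j in i+1..N-1: ...'."""
    steps = 0
    for i in range(N):
        for j in range(i + 1, N):
            steps += 1
    return steps

# The count equals the triangular sum (N-1) + (N-2) + ... + 1 = N(N-1)/2,
# which grows as O(N^2):
for N in (4, 10, 100):
    assert count_inner_steps(N) == N * (N - 1) // 2
print(count_inner_steps(100))  # 4950
```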
Class Practice
What is the worst-case complexity of the each of
the following code fragments?
Case # 1
• Two loops in a row:
for (i = 0; i < N; i++) {
sequence of statements }
for (j = 0; j < M; j++) {
sequence of statements }
How would the complexity change if the second loop
went to N instead of M?
Case#2

A nested loop followed by a non-nested loop:

for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++) {
    sequence of statements
  }
}
for (k = 0; k < N; k++) {
  sequence of statements
}
Class practice

Find the time complexity of the following codes.

Fibonacci (code)

fib <- function(n) {
  if (n < 2) return(n)
  fib(n - 1) + fib(n - 2)
}

Explanation

The nth Fibonacci number is the sum of the (n-1)th and (n-2)th Fibonacci numbers; this function simply calls itself to compute the previous two numbers, then adds them up.

Its time complexity is __________

Hint:

[In each call there are two recursive branches, and the call tree grows as n increases.]
Code # 2

int a = 0;
for (i = 0; i < N; i++) {      O(N) complexity
  for (j = N; j > i; j--) {    O(N) complexity
    a = a + i + j;
  }
}                              overall ____________

Ans

i          Iterations of inner loop (N - i)
0          N
1          N - 1
2          N - 2
3          N - 3
4          N - 4
...        ...

Overall its time complexity is __________


Code # 3

# Create a vector
X <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Find the mean
mean.X <- mean(X)
print(mean.X)

Its complexity is _____________________

Code # 4

The factorial of a number is defined as:

f(n) = n * f(n-1)   for all n > 0
f(0) = 1            for n = 0

Factorial with recursive function

findfactorial <- function(n) {
  if (n == 0) return(1)                   # when n is equal to 0, return 1
  else return(n * findfactorial(n - 1))   # for n > 0, compute factorial
}
Explanation

2! = 2 * 1!
3! = 3 * 2!
4! = 4 * 3!
5! = 5 * 4!
...
n! = n * (n-1)!

The time complexity of this code is ___________


Code # 5
12/9/2021

Types of optimization models
Lecture # 6

Optimization models
• An important step in the optimization process is classifying your optimization model, since algorithms for solving optimization problems are tailored to a particular type of problem.
• Here, we provide some guidance to classify an optimization model.

Continuous Optimization versus Discrete Optimization
Some models only make sense if the variables take on values from a discrete set, often a subset of integers, whereas other models contain variables that can take on any real value.
Models with discrete variables are discrete optimization problems and models with continuous variables are continuous optimization problems.

Examples
Discrete search space → combinatorial optimization:
• Routing (e.g. vehicle routing)
• Planning (e.g. timetabling, task scheduling)
• Allocation (e.g. resource allocation)
• Selection (e.g. feature selection)
Continuous search space → continuous optimization:
• Parameter estimation
• Finding minimal energy configurations
• Training adaptive systems

Unconstrained Optimization versus Constrained Optimization
• Another important distinction is between problems in which there are no constraints on the variables and problems in which there are constraints on the variables.
• Unconstrained optimization problems arise directly in many practical applications; they also arise in the reformulation of constrained optimization problems in which the constraints are replaced by a penalty term in the objective function.
Constrained optimization problems arise from applications in which there are explicit constraints on the variables.
The constraints on the variables can vary widely from simple bounds to systems of equalities and inequalities that model complex relationships among the variables.
Constrained optimization problems can be further classified according to the nature of the constraints (e.g., linear, nonlinear, convex).

Example
Constraints in planning a timetable will be (the solution is feasible only if they are satisfied):
 Each event is scheduled exactly once
 Only one event is scheduled in a room at a given time moment
 The room satisfies the requirements of the event
 There are no simultaneous events which the same persons should attend

None, One or Many Objectives
Most optimization problems have a single objective function; however, there are interesting cases when optimization problems have no objective function or multiple objective functions.
Feasibility problems are problems in which the goal is to find values for the variables that satisfy the constraints of a model with no particular objective to optimize.

Multi objective optimization problems arise in many fields, such as engineering, economics, and logistics, when optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives.
For example, developing a new component might involve minimizing weight while maximizing strength, or choosing a portfolio might involve maximizing the expected return while minimizing the risk.
In practice, problems with multiple objectives often are reformulated as single objective problems by either forming a weighted combination of the different objectives or by replacing some of the objectives by constraints.

Selection
Feature/attribute selection
• Let us consider a dataset characterized by a large number of attributes. We are looking for a subset of features:
Single objective:
 Maximize the classification accuracy, or
 Minimize the cost of data processing
Multi objectives (in the case of two, it is bi-objective):
 Minimize the class I error and class II error

Deterministic Optimization versus Stochastic Optimization
In deterministic optimization, it is assumed that the data for the given problem are known accurately.
However, for many actual problems, the data cannot be known accurately for a variety of reasons. The first reason is simple measurement error. The second and more fundamental reason is that some data represent information about the future (e.g., product demand or price for a future time period) and simply cannot be known with certainty.

In stochastic optimization, the uncertainty is incorporated into the model.
Stochastic optimization refers to a collection of methods for minimizing or maximizing an objective function when randomness is present.
These models take advantage of the fact that probability distributions governing the data are known or can be estimated; the goal is to find some policy that is feasible for all (or almost all) the possible data instances and optimizes the expected performance of the model.
Examples:
• Simulated Annealing (SA)
• Particle Swarm Optimization (PSO)
• Game Theory-based optimization (GT)
• Evolutionary Algorithms (EA)

Optimization Methods/Algorithms
Depending on the complexity of the problem, it may be solved by an exact or an approximate method/algorithm.

Exact Methods/Algorithms
Exact methods/algorithms obtain optimal solutions and guarantee their optimality.
Or
Exact algorithms can find the optimum solution with precision.
For example:
• Dynamic programming
• The branch and bound algorithm, etc.

Disadvantages
 Exact methods can only be applied to small instances of difficult problems.
 These methods are not effective enough, especially when the problem search area is large and complex.

Approximate methods/algorithms
Approximate methods/algorithms generate high quality solutions in a reasonable time for practical use, but there is no guarantee of finding a global optimal solution.
Or
Approximate algorithms can find a near optimum solution.
An exact algorithm finds the solution to the problem asked. This is in contrast with an approximate algorithm, which only gets close to the solution.

Subclasses of Approximate Methods/Algorithms
In the class of approximate methods, two subclasses of algorithms may be distinguished:
 Approximation algorithms
 Heuristic algorithms

Approximation Algorithms
Unlike heuristics, which usually find reasonably "good" solutions in a reasonable time, approximation algorithms provide provable solution quality and provable run-time bounds.

Heuristic Algorithms
Heuristics find "good" solutions on large-size problem instances. They allow obtaining acceptable performance at acceptable costs in a wide range of problems.
In general, heuristics do not have an approximation guarantee on the obtained solutions.
They may be classified into two families:
 Specific heuristics
 Metaheuristics

Specific heuristics are tailored and designed to solve a specific problem and/or instance.

Metaheuristics
Metaheuristics are general-purpose algorithms that can be applied to solve almost any optimization problem. They may be viewed as upper-level general methodologies that can be used as a guiding strategy in designing underlying heuristics to solve specific optimization problems.
As we already discussed, unlike exact methods, metaheuristics allow us to tackle large-size problem instances by
 delivering satisfactory solutions in a reasonable time.
 There is no guarantee of finding global optimal solutions or even bounded solutions.
Their main aim is to provide techniques to solve hard problems.

Computationally hard problems
There are problems which are hard:
• both for humans and computers (e.g. large combinatorial optimization problems, multimodal/multiobjective optimization problems, etc.): computationally hard problems
• for computers but rather easy for humans (e.g. character recognition, face recognition, speech recognition, etc.): ill-posed problems

Examples of hard problems
Problems characterized by a large space of solutions for which there are no exact methods of polynomial complexity (so-called NP-hard problems) have a combinatorial search space of large size, which cannot be exhaustively searched.
Examples:
Satisfiability problem (SAT): find the values of n boolean variables for which a logical formula is true. For n variables the search space has size 2^n.
Travelling Salesman Problem (TSP): find a minimal cost tour which visits n towns. The search space size is (n-1)! (in the symmetric case, it is (n-1)!/2).
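The search-space sizes quoted for SAT and TSP can be computed directly; a small Python sketch (illustrative values of n):

```python
from math import factorial

def sat_space(n):
    """Number of boolean assignments for SAT with n variables: 2^n."""
    return 2 ** n

def tsp_space(n):
    """Number of distinct tours for symmetric TSP with n towns: (n-1)!/2."""
    return factorial(n - 1) // 2

for n in (5, 10, 15):
    print(n, sat_space(n), tsp_space(n))  # e.g. n = 5 -> 32 and 12
```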

Ill-posed problems
The particularity of problems which are easy for humans but hard for computers is that they are ill-posed, i.e. it is difficult to construct an abstract model which reflects all particularities of the problem.
Let us consider the following two problems:
– Classify the employees of a company in two categories: the first category will contain all of those who have an income larger than the average salary per company, and the second category will contain the other employees.
– Classify the employees of a company in two categories: the first category will contain all those who are good candidates for a bank loan, and the second category will contain the other employees.
In the case of the first problem it is easy to construct a rule-based classifier:
• IF income > average THEN Class 1 ELSE Class 2
In the case of the second problem it is not so easy to construct a classifier because there are a lot of other interrelated elements (health status, family, career evolution, etc.) to be taken into account in order to decide if a given employee is reliable for a bank loan. A bank expert relies on his experience (previous success and failure cases) when he makes a decision.

Ill-posed problems
The methods appropriate for ill-posed problems should be characterized by:
• Ability to extract models from examples (learning)
• Ability to deal with dynamic environments (adaptability)
• Ability to deal with noisy, incomplete or inconsistent data (robustness)
• Ability to provide the answer in a reasonable amount of time (efficiency)
Designing a system having all these characteristics usually leads to solving an optimization problem (= find the design variables which minimize an error, minimize a cost or maximize a quality criterion).


~~ Genetic algorithms(GA)
• In GA the crossover operator is used for exploring
the sample space so it is called exploration operator
while the mutation operator is used to exploit the
Lecture #7 promising locations found by the crossover operator so it
is called exploitation operator.

Criteria in designing a
Classification of Metaheuristics
metaheuristic
In designing a metaheuristic, two contradictory criteria must Many classification criteria may be used for metaheuristics:
be taken into account:
Nature inspired versus nonnature inspired
1. Exploration (Diversification) Many metaheuristics are inspired by natural processes:
2. Exploitation (Intensification) evolutionary algorithms and artificial immune systems from
biology; ants, bees colonies, and particle swarm
Exploration means that one must search over the whole optimization from swarm intelligence into different species
sample space (exploring the sample space) while (social sciences); and simulated annealing from physics.
exploitation means that you are exploiting the promising
areas found when one did the exploration.
Memory usage versus memoryless methods

Some metaheuristic algorithms are memoryless; that is, no information extracted dynamically is used during the search. Some representatives of this class are local search, GRASP (greedy randomized adaptive search procedure), and simulated annealing. Other metaheuristics use a memory that contains some information extracted online during the search, for instance, the short-term and long-term memories in tabu search.

Iterative versus greedy

In iterative algorithms, we start with a complete solution (or a population of solutions) and transform it at each iteration using some search operators. Greedy algorithms start from an empty solution, and at each step a decision variable of the problem is assigned until a complete solution is obtained. Most metaheuristics are iterative algorithms.

Crossover. The recombination of two parent chromosomes (solutions) by


exchanging
part of one chromosome with a corresponding part of another so as to
Population-based search versus single-solution based produce
search offsprings (new solutions).
• Mutation. The change of part of a chromosome (a bit or several bits) to
Single-solution based algorithms (e.g., local search, simulated generate
annealing) manipulate and transform a single solution during the
search while in population-based algorithms (e.g., particle swarm, new genetic characteristics. In binary encoding, mutation can be achieved
evolutionary algorithms) a whole population of solutions is evolved. simply
by flipping between 0 and 1. Mutation can occur at a single site or multiple
These two families have complementary characteristics: single- sites
solution based metaheuristics are exploitation oriented; they have simultaneously.
the power to intensify the search in local regions. Population-
based metaheuristics are exploration oriented; they allow a better • Selection. The survival of the fittest, which means the highest quality
diversification in the whole search space. chromosomes
and/characteristics will stay within the population. This often takes some
form of
elitism, and the simplest form is to let the best genes pass on to the next
generations
in the population.
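These operators can be illustrated with a short Python sketch (an illustration only; the function names and the 10% mutation rate are my own choices, not from the slides):

```python
import random

def one_point_crossover(parent1, parent2):
    """Exchange the tails of two parent bit-strings at a random cut point."""
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def bit_flip_mutation(chromosome, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if random.random() < rate else gene
            for gene in chromosome]

p1, p2 = [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]
c1, c2 = one_point_crossover(p1, p2)
print(c1, c2)  # complementary prefixes and suffixes, cut at a random point
print(bit_flip_mutation(c1))
```

With complementary parents like these, every cut point produces children whose genes are conserved overall, which makes the mixing role of crossover easy to see.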
12/16/2021
Lecture 8

Greedy Methods

Advantages
• It is quite easy to come up with a greedy algorithm for a problem.
• Analyzing the run time for greedy algorithms is much easier than for other techniques because there is no branching.

Disadvantages
• It does not always give the optimal solution.
• Proving that a greedy algorithm is correct is difficult.
Greedy Methods/Algorithms

A greedy algorithm picks the best immediate choice and never reconsiders its choices. In terms of optimizing a solution, this simply means that the greedy solution will try to find local optimum solutions, which can be many, and might miss out on a global optimum solution.

A greedy algorithm makes greedy choices at each step to ensure that the objective function is optimized. The greedy algorithm has only one shot to compute the optimal solution, so it never goes back and reverses a decision.

Example

Suppose you want to pass an exam which is going to start the day after tomorrow. You have a few possible ways to prepare for your exam in minimum time.

Objective: passing the exam in minimum time

Options
• Local author books
• Reference books
• 5-minute videos
• Short cuts (cheating)

Constraint: no cheating
Feasible solutions

The solutions that satisfy the constraint (excluding all possible ways of cheating):
• Local author books
• Reference books
• 5-minute videos

Key Points

A greedy algorithm always makes the choice that is best at that moment. This means that it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution.

How to decide which choice is optimal?
• Assume there is an objective function that needs to be optimized.
• A greedy algorithm makes greedy choices at each step to ensure that the objective function is optimized.
• The greedy algorithm has only one shot to compute the optimal solution, so it never goes back and reverses a decision.

Optimal solution Knapsack Problem


Will always be one which will get after optimizing the Knapsack problem is a name to a family of combinatorial
objective function optimization problems that have the following general
theme:
Reference and any local author book will take enough time
to prepare your exam. You need to fill a knapsack with the few provided
Therefore, 5 minute videos will the best solution to objects/items by considering your bag capacity and
prepare your exam in minimum time (a greedy optimal maximum profits/benefits.
solution)
Lecture 9

Random walk

Random walks and other stochastic components are an intrinsic part of nature-inspired metaheuristic algorithms. They often appear as random numbers and randomization techniques in metaheuristic algorithms, and the efficiency of a metaheuristic algorithm may implicitly depend on the appropriate use of such randomization.

Metaheuristic algorithms usually involve some form of non-deterministic, stochastic components, which often appear in terms of random walks. Such random walks can be surprisingly efficient when combined with deterministic components and elitism, as has been demonstrated in many modern metaheuristic algorithms such as particle swarm optimization (PSO), the bat algorithm (BA), the firefly algorithm (FA), and others. To understand the working mechanism of a stochastic algorithm, we have to analyze the key characteristics of random walks.
The good news is that there are extensive studies of random walks, with many results in the statistics literature. More generally, many theoretical results appear in the context of Markov chain models and/or Markov chain Monte Carlo methods. However, these results tend to be very theoretical and thus may not be easily accessible to the optimization community. Even though some theoretical results are relevant, we have to explain and translate them into the right context so that they are truly useful to the optimization community.
Random walk

Definition
A random walk is a random process which consists of taking a series of consecutive random steps.

Let S_N denote the sum of N consecutive random steps X_i; then S_N forms a random walk:

S_N = Σ_{i=1}^{N} X_i = X_1 + X_2 + … + X_N = Σ_{i=1}^{N−1} X_i + X_N = S_{N−1} + X_N,

where X_i is a random step drawn from a random distribution.

This relationship can also be considered as a recursive formula: the next state S_N depends only on the current state S_{N−1} and the motion or transition X_N from the current state to the next state. This is typically the main property of a Markov chain. Here, the step size or length in a random walk can be fixed or varying.
Random walks have many applications in physics, economics, statistics, computer science, environmental science, and engineering. Mathematically speaking, a random walk is given by the following equation:

S_t = S_{t−1} + w_t,

where S_t is the current location or state at time t, and w_t is a step or random variable with a known distribution. In addition, there is no reason why each step length should be fixed; in fact, the step size can also vary according to a known distribution.
• If the step length obeys the Gaussian distribution, the random walk becomes Brownian motion (a Wiener process) or a diffusion process.
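The recursion S_t = S_{t−1} + w_t is straightforward to simulate. The Python sketch below uses Gaussian steps, so it approximates Brownian motion, and estimates the variance of S_t over many runs (the sample sizes are arbitrary choices):

```python
import random

def random_walk(steps, step_dist=lambda: random.gauss(0.0, 1.0)):
    """Generate one realization of S_t = S_{t-1} + w_t, starting at S_0 = 0."""
    s = 0.0
    path = [s]
    for _ in range(steps):
        s += step_dist()          # draw w_t from the chosen distribution
        path.append(s)
    return path

random.seed(42)
t = 100
finals = [random_walk(t)[-1] for _ in range(5000)]
var = sum(x * x for x in finals) / len(finals)   # mean is ~0, so this estimates Var(S_t)
print(var)  # roughly t = 100 for unit-variance Gaussian steps
```

The estimated variance comes out close to t, illustrating the linear growth of the variance of Brownian random walks discussed next.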
Brownian motion is the random movement of microscopic particles suspended in liquids or gases, resulting from the impact of the molecules of the surrounding medium. As the mean of the particle locations is obviously zero, their variance will increase linearly with t. In general, in d-dimensional space, the variance of Brownian random walks can be written as

σ²(t) = |v0|² t² + (2dD) t,

where v0 is the drift velocity of the system and D = s²/(2τ) is the effective diffusion coefficient, which is related to the step length s over a short time interval τ during each jump. Therefore, the Brownian motion B(t) essentially obeys a Gaussian distribution with zero mean and time-dependent variance; that is,

B(t) ∼ N(0, σ²(t)),

where ∼ means that the random variable obeys the distribution on the right-hand side; that is, samples should be drawn from that distribution. A diffusion process can be viewed as a series of Brownian motions, which obey the Gaussian distribution. For this reason, standard diffusion is often referred to as Gaussian diffusion. If the motion at each step is not Gaussian, then the diffusion is called non-Gaussian diffusion.
Lévy flight/walk

If the step lengths obey other distributions, we have to deal with more generalized random walks. A very special case is when the step lengths obey the Lévy distribution; such a random walk is called a Lévy flight or Lévy walk.
Lévy Flights as a Search Mechanism

Apart from standard random walks, Lévy flights are another class of random walks whose step lengths are drawn from the so-called Lévy distribution. When the steps are large, the Lévy distribution can be approximated by a simple power law:

L(s) ∼ |s|^(−1−β), where 0 < β ≤ 2 is an index.

Mathematically speaking, the Lévy distribution should be defined in terms of the following Fourier transform:

F(k) = exp(−α |k|^β), 0 < β ≤ 2,

where α is a scale parameter.

Lévy flights are more efficient than Brownian random walks in exploring unknown, large-scale search spaces. There are many reasons for this efficiency, and one of them is that the variance of Lévy flights, σ²(t) ∼ t^(3−β) for 1 ≤ β ≤ 2, increases much faster than the linear relationship σ²(t) ∼ t of Brownian motion. Studies show that Lévy flights can maximize the efficiency of the resource search process in uncertain environments.
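In practice, Lévy-distributed step lengths are often generated with Mantegna's algorithm; a Python sketch (β = 1.5 is a common but arbitrary choice, and this only approximates a draw from a Lévy-stable distribution):

```python
import math
import random

def levy_step(beta=1.5):
    """One step from an approximate Levy-stable distribution via
    Mantegna's algorithm: step = u / |v|^(1/beta), where
    u ~ N(0, sigma_u^2) and v ~ N(0, 1)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = random.gauss(0.0, sigma_u)
    v = random.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

random.seed(1)
steps = [levy_step() for _ in range(10000)]
# Heavy tail: many small steps mixed with occasional very large jumps.
print(max(abs(s) for s in steps))
```

The occasional long jumps are what let a Lévy-flight search escape local regions and cover a large search space more efficiently than Gaussian steps.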
Simulated Annealing (SA)

Simulated annealing (SA) is a stochastic optimization algorithm. This means that it makes use of randomness as part of the search process, which makes the algorithm appropriate for nonlinear objective functions where other local search algorithms do not operate well.
• SA escapes from local optima by allowing worsening moves.
• SA is a memoryless algorithm: it does not use any information gathered during the search.
• SA is applied to both combinatorial and continuous optimization problems, and it is simple and easy to implement.
• SA is an optimization algorithm based on metallurgy and how metals cool down.
• In metallurgy, annealing is the process in which a metal cools down slowly to avoid imperfections. The annealing process is carried out by heating the metal and cooling it down slowly, thus permitting the atoms to arrange themselves into a minimal energy state.
Original paper introducing the concept:
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P., "Optimization by Simulated Annealing," Science, Vol. 220, No. 4598, 13 May 1983, pp. 671–680.
Processing of the algorithm

Mainly, the annealing process is done in two steps:
1. increasing the temperature of the heat bath until the metal melts, and
2. decreasing the temperature of the heat bath slowly, trying to let the atoms arrange themselves in an orderly way.

Steps
Step 1: Initialize – Start with a random initial placement. Initialize a very high "temperature".
Step 2: Move – Perturb the placement through a defined move.
Step 3: Calculate score – Calculate the change in the score due to the move made.
Step 4: Choose – Depending on the change in score, accept or reject the move. The probability of acceptance depends on the current "temperature".
Step 5: Update and repeat – Update the temperature value by lowering it. Go back to Step 2.
The process continues until the "freezing point" is reached.
Simulated annealing pseudocode

Set the objective function f(x), x = [x1, x2, …, xd]
Initialize the system configuration
Initialize the first solution x0
Initialize the temperature T with a large value
While t <= Max iterations do
    Repeat
        Apply a random walk to generate a new state
        Evaluate ΔE = f(new state) − f(current state)
        If ΔE < 0 then
            Keep the new state
        Else
            Accept the new state with probability p = exp(−ΔE / (k_B T))
            (Boltzmann distribution; k_B is the constant of this distribution and T is the temperature)
        End if
    Until the number of accepted iterations is below a threshold level
    Reduce the temperature T (cooling)
End while
Set the best solution as x
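The pseudocode can be sketched in Python as follows (minimizing the sphere function; the step size, cooling rate, and iteration budget are illustrative choices, and k_B is absorbed into T):

```python
import math
import random

def simulated_annealing(f, x0, lower, upper, T0=1.0, alpha=0.9, max_iter=500):
    """Minimize f by random-walk moves with Boltzmann acceptance."""
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    T = T0
    for _ in range(max_iter):
        # Random-walk move: small perturbation, clipped to the bounds.
        xnew = [min(max(xi + random.uniform(-0.1, 0.1), lo), hi)
                for xi, lo, hi in zip(x, lower, upper)]
        delta = f(xnew) - fx
        # Accept improvements always; worse moves with prob. exp(-delta/T).
        if delta < 0 or random.random() < math.exp(-delta / T):
            x, fx = xnew, fx + delta
        if fx < fbest:                     # track the best state visited
            best, fbest = list(x), fx
        T *= alpha                         # geometric cooling schedule
    return best, fbest

sphere = lambda x: sum(xi * xi for xi in x)
random.seed(0)
best, fbest = simulated_annealing(sphere, [3.0, -4.0], [-5.12, -5.12], [5.12, 5.12])
print(fbest)  # should improve substantially on sphere([3.0, -4.0]) = 25.0
```

Early on, T is large and worsening moves are accepted often (exploration); as T decays, the acceptance rule becomes nearly greedy (exploitation).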
Assignment

Explain the similarities and dissimilarities of hill climbing and simulated annealing. Support your discussion with practical examples if possible.

Submission date: 11-01-2022
Maximum marks: 5
Test Announcement

An MCQ-based test of the first five lectures will be held on 13-01-2022.

Venue: classroom
Maximum time: 30 minutes
Maximum marks: 10
Evolutionary Algorithms

Introduction

Evolutionary algorithm: a collective term for all variants of (probabilistic) optimization and approximation algorithms that are inspired by Darwinian evolution. Optimal states are approximated by successive improvements based on the variation–selection paradigm. Thereby, the variation operators produce genetic diversity and the selection directs the evolutionary search (Beyer et al., 2002).
Introduction

Evolutionary algorithms (EAs) are population-based stochastic direct search algorithms that, in some sense, mimic natural evolution. An EA uses mechanisms inspired by biological evolution such as reproduction, mutation, recombination, and selection. Points in the search space are considered as individuals (solution candidates), which form a population. Their fitness value is a number indicating their quality for the problem at hand.
Although different EAs may put different emphasis on the search operators mutation and recombination, their general effects are not in question. Mutation means neighborhood-based movement in the search space that includes the exploration of the 'outer space'. Recombination rearranges existing information and so focuses on the 'inner space'. Selection is meant to introduce a bias towards better fitness values. It can be applied at two stages:
• when parents are selected from the population to generate offspring (mating/reproducing selection), and
• after new solutions have been created and need to be inserted into the population, competing for survival (environmental selection or survival selection).
Types of EAs
• Genetic algorithms (GA)
• Evolution strategies (ES)
• Estimation of distribution algorithms (EDA)
• Differential evolution (DE)
Genetic Algorithm (GA)

Genetic algorithms are a type of optimization algorithm, meaning they are used to find the optimal solution(s) to a given computational problem that maximizes or minimizes a particular function. Genetic algorithms represent one branch of the field of study called evolutionary computation, in that they imitate the biological processes of reproduction and natural selection to solve for the 'fittest' solutions.
Mechanisms, Structure, & Terminology

Since genetic algorithms are designed to simulate a biological process, much of the relevant terminology is borrowed from biology. The basic components common to almost all genetic algorithms are:
• a fitness function for optimization
• a population of chromosomes
• selection of which chromosomes will reproduce
• crossover to produce the next generation of chromosomes
• random mutation of chromosomes in the new generation
Fitness function

The fitness function is the function that the algorithm is trying to optimize. The word "fitness" is taken from evolutionary theory. It is used here because the fitness function tests and quantifies how 'fit' each potential solution is.
Chromosome

The term chromosome refers to a numerical value or values that represent a candidate solution to the problem that the genetic algorithm is trying to solve. Each candidate solution is encoded as an array of parameter values, a process that is also found in other optimization algorithms.
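For example, a binary chromosome can encode a real-valued parameter by mapping its integer value onto the parameter's range (a small illustrative sketch; the 8-bit resolution and the bounds are arbitrary choices):

```python
def decode(chromosome, lo, hi):
    """Map a bit-list to a real value in [lo, hi]."""
    value = int("".join(map(str, chromosome)), 2)   # bits -> integer
    max_value = 2 ** len(chromosome) - 1
    return lo + (hi - lo) * value / max_value

# An 8-bit chromosome representing a candidate x in [-5.12, 5.12]
print(decode([0, 0, 0, 0, 0, 0, 0, 0], -5.12, 5.12))  # -5.12
print(decode([1, 1, 1, 1, 1, 1, 1, 1], -5.12, 5.12))  # 5.12
print(decode([1, 0, 0, 0, 0, 0, 0, 0], -5.12, 5.12))  # about 0.02
```

Longer chromosomes give finer resolution; real-coded GAs skip this decoding step and store the parameter values directly.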
Lecture #13

Revision of the last lecture

Main rules of GA

The genetic algorithm uses three main types of rules at each step to create the next generation from the current population:

1. Selection rules select the individuals, called parents, that contribute to the population at the next generation. Selection of the fittest, or elitism, is the use of the solutions with the highest fitness to pass on to the next generations, often carried out through some form of selection of the best solutions.

2. Crossover rules combine two parents to form children for the next generation, i.e., swapping parts of one solution with another in chromosomes or solution representations. The main role is to provide mixing of the solutions and convergence in a subspace.

3. Mutation rules apply random changes to individual parents to form children, i.e., the change of parts of one solution randomly, which increases the diversity of the population and provides a mechanism for escaping from a local optimum.

Reference: Xin-She Yang, "Genetic Algorithms," in Nature-Inspired Optimization Algorithms, 2014.
Pseudo Code of GA

How does GA work?

a) The algorithm begins by creating a random initial population.

%% Problem Statement
PopSize = 100;
VarMin = -5.12;
VarMax = 5.12;
DimNum = 30;
CostFuncName = @sph;

Pop = rand(PopSize,DimNum) * (VarMax - VarMin) + VarMin;
b) The algorithm then creates a sequence of new populations. At each step, the algorithm uses the individuals in the current generation to create the next population. To create the new population, the algorithm performs the following steps:

c) Scores each member of the current population by computing its fitness value. These values are called the raw fitness scores.

Cost = feval(CostFuncName,Pop);
d) Scales the raw fitness scores to convert them into a more usable range of values. These scaled values are called expectation values.

e) Selects members, called parents, based on their expectation.

f) Some of the individuals in the current population that have the best fitness values are chosen as elite. These elite individuals are passed on to the next population.

MaxIteration = 50;
CrossPercent = 65;
MutatPercent = 10;
ElitPercent = 100 - CrossPercent - MutatPercent;

CrossNum = round(CrossPercent/100*PopSize);
MutatNum = round(MutatPercent/100*PopSize);
ElitNum = PopSize - CrossNum - MutatNum;
g) Produces children from the parents. Children are produced either by making random changes to a single parent (mutation) or by combining the vector entries of a pair of parents (crossover).

for Iter = 1:MaxIteration
    % Elitism
    % Crossover
    % Mutation
    MutatPop = rand(MutatNum,DimNum) * (VarMax - VarMin) + VarMin;

h) Replaces the current population with the children to form the next generation.

    %% New Population
    Pop = [ElitPop ; CrossPop ; MutatPop];
    Cost = feval(CostFuncName,Pop);
    [Cost, Indx] = sort(Cost);
    Pop = Pop(Indx,:);
end

i) The algorithm stops when one of the stopping criteria is met.
Implementation in MATLAB will be practiced in class via the Optimization Toolbox.
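The steps (a)–(i) above can also be sketched as a self-contained Python program (a minimal real-coded GA for the sphere function; the population size, operator percentages, and bounds mirror the MATLAB fragments, but the operator details, such as the blend crossover, are my own choices):

```python
import random

def ga_sphere(pop_size=100, dim=30, var_min=-5.12, var_max=5.12,
              max_iter=50, cross_pct=65, mutat_pct=10):
    sphere = lambda x: sum(xi * xi for xi in x)
    rand_ind = lambda: [random.uniform(var_min, var_max) for _ in range(dim)]

    cross_num = round(cross_pct / 100 * pop_size)
    cross_num -= cross_num % 2            # crossover produces pairs of children
    mutat_num = round(mutat_pct / 100 * pop_size)
    elit_num = pop_size - cross_num - mutat_num

    # (a) random initial population, sorted by cost (lower is better)
    pop = sorted((rand_ind() for _ in range(pop_size)), key=sphere)

    for _ in range(max_iter):
        # (f) elitism: the best individuals pass on unchanged
        elite = pop[:elit_num]
        # (g) crossover children: arithmetic blend of two random good parents
        cross = []
        for _ in range(cross_num // 2):
            p1, p2 = random.sample(pop[:pop_size // 2], 2)
            a = random.random()
            cross.append([a * x + (1 - a) * y for x, y in zip(p1, p2)])
            cross.append([(1 - a) * x + a * y for x, y in zip(p1, p2)])
        # (g) mutation children: fresh random points, as in the MATLAB fragment
        mutat = [rand_ind() for _ in range(mutat_num)]
        # (h) replace the population and re-sort by cost
        pop = sorted(elite + cross + mutat, key=sphere)

    return pop[0], sphere(pop[0])  # (i) best solution after max_iter generations

random.seed(0)
best, cost = ga_sphere()
print(cost)  # far below the expected cost of a random 30-d point
```

Scaling raw scores into expectation values and roulette-style parent selection (steps d and e) are simplified away here; parents are simply drawn from the better half of the population.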
1/11/22 12:55 AM C:\Users\sana\Desktop\simul...\Untitled.m 1 of 1

[xfinal, xf] = simu_my2([-5.12 -5.12],[5.12 5.12],[0.01 0.01],500)
%[xfinal, xf] = simu_my2([-10 -10],[10 10],[0 0],500)                          % for cost2
%[xfinal, xf] = simu_my2([-32.768 -32.768],[32.768 32.768],[0.001 0.001],100)  % for ackley
1/11/22 12:31 AM C:\Users\sana\Desktop\simul...\simu_my2.m 1 of 3
function [xfinal, xf] = simu_my2(l, u, xsol, Maxiter)
% Simulated annealing for a continuous test function.
% l, u    : lower and upper bounds (row vectors)
% xsol    : initial solution, e.g. [0.01 0.01] for the 2-d sphere function
% Maxiter : maximum number of iterations

To = 1;                         % initial temperature
CostFunction = @cost1;          % cost function handle (sphere)

%% Initialization
T = To;
cost11 = CostFunction(xsol);    % cost of the current solution
xfinal = xsol;                  % best solution found so far
xf = cost11;                    % best cost found so far

%% SA Main Loop
for k = 1:Maxiter
    % Create and evaluate a new neighbouring solution (random walk)
    dx = 0.01*(l + rand(size(l)).*(u - l));   % step size * neighbour
    xnew = xsol + dx;
    newcost = CostFunction(xnew);
    diff = newcost - cost11;
    if diff < 0                     % new solution is better: accept it
        xsol = xnew;  cost11 = newcost;
    elseif rand < exp(-diff/T)      % worse: accept with Boltzmann probability
        xsol = xnew;  cost11 = newcost;
    end
    if cost11 < xf                  % track the best solution found
        xfinal = xsol;  xf = cost11;
    end
    T = To*(1 - 0.1)^k;             % geometric cooling schedule
end

%% Result
xf
end
function f = cost1(x)   % sphere function
f = sum(x.^2);
end

function f = cost2(x)   % six-hump camel function
f = (4 - 2.1*x(1).^2 + x(1).^4/3).*x(1).^2 + x(1).*x(2) + 4*(x(2).^2 - 1).*x(2).^2;
end
function f = ackley(xx, a, b, c)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% ACKLEY FUNCTION
%
% Authors: Sonja Surjanovic, Simon Fraser University
% Derek Bingham, Simon Fraser University
% Questions/Comments: Please email Derek Bingham at dbingham@[Link].
%
% Copyright 2013. Derek Bingham, Simon Fraser University.
%
% THERE IS NO WARRANTY, EXPRESS OR IMPLIED. WE DO NOT ASSUME ANY LIABILITY
% FOR THE USE OF THIS SOFTWARE. If software is modified to produce
% derivative works, such modified software should be clearly marked.
% Additionally, this program is free software; you can redistribute it
% and/or modify it under the terms of the GNU General Public License as
% published by the Free Software Foundation; version 2.0 of the License.
% Accordingly, this program is distributed in the hope that it will be
% useful, but WITHOUT ANY WARRANTY; without even the implied warranty
% of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% General Public License for more details.
%
% For function details and reference information, see:
% [Link]
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
d = length(xx);

if (nargin < 4)
c = 2*pi;
end
if (nargin < 3)
b = 0.2;
end
if (nargin < 2)
a = 20;
end
sum1 = 0;
sum2 = 0;
for ii = 1:d
xi = xx(ii);
sum1 = sum1 + xi^2;
sum2 = sum2 + cos(c*xi);
end

term1 = -a * exp(-b*sqrt(sum1/d));
term2 = -exp(sum2/d);

f = term1 + term2 + a + exp(1);

end