Stochastic search methods
Stochastic search methods, also known as stochastic optimization methods, are a class of optimization
techniques that use randomness or randomness-inspired mechanisms to find solutions to optimization problems.
They are particularly useful for complex and noisy optimization problems where traditional
deterministic approaches may be less effective.
They are widely used in many fields, including machine learning, engineering, operations
research, and economics.
Stochastic Optimization Algorithms
The use of randomness in these algorithms means that they are often referred to as
“heuristic search” methods: they follow a rough rule-of-thumb procedure that may or may not
find the optimum, rather than a precise, deterministic procedure.
- Many stochastic algorithms are inspired by a biological or natural process and
may be referred to as “metaheuristics”, higher-order procedures that provide the
conditions for a specific search of the objective function.
- They are also referred to as “black box” optimization algorithms. There are many
stochastic optimization algorithms; some common examples are listed below, with a minimal sketch of one of them after the lists.
Some examples of stochastic optimization algorithms include:
1. Iterated Local Search
2. Stochastic Hill Climbing
3. Stochastic Gradient Descent
4. Tabu Search
5. Greedy Randomized Adaptive Search Procedure
Some examples of stochastic optimization algorithms that are inspired by biological or
physical processes include:
1. Simulated Annealing
2. Evolution Strategies
3. Genetic Algorithm
4. Differential Evolution
5. Particle Swarm Optimization
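
To make the heuristic, randomness-driven search idea concrete, here is a minimal sketch of Stochastic Hill Climbing (item 2 in the first list) on a toy one-dimensional objective. The objective function, step size, and iteration count are illustrative assumptions, not part of the notes above.

```python
# Minimal sketch of Stochastic Hill Climbing on a toy objective.
import random


def objective(x):
    # Toy objective: a smooth bowl with its minimum at x = 2.0 (assumed example)
    return (x - 2.0) ** 2


def stochastic_hill_climbing(n_iterations=1000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    # Start from a random point in [-5, 5]
    best = rng.uniform(-5.0, 5.0)
    best_eval = objective(best)
    for _ in range(n_iterations):
        # Take a random step around the current best point
        candidate = best + rng.gauss(0.0, step_size)
        candidate_eval = objective(candidate)
        # Accept the candidate only if it improves the objective
        if candidate_eval < best_eval:
            best, best_eval = candidate, candidate_eval
    return best, best_eval


if __name__ == "__main__":
    x, fx = stochastic_hill_climbing()
    print(f"best x = {x:.4f}, f(x) = {fx:.6f}")
```

The random step is the “rough rule-of-thumb” part: there is no guarantee any given step improves the solution, but keeping only improvements tends to move the search toward an optimum.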
Stochastic Gradient Descent (SGD):
The word “stochastic” refers to a system or process governed by randomness.
Hence, in Stochastic Gradient Descent, a few samples are selected at random instead of the
whole dataset for each iteration.
- In Gradient Descent, the term “batch” denotes the number of samples from the dataset
that is used to calculate the gradient in each iteration.
- In typical Gradient Descent optimization, such as Batch Gradient Descent, the batch is
taken to be the whole dataset (see the sketch below). Using the whole dataset helps the search approach
the minimum in a less noisy and less random manner, but it becomes a problem
when the dataset is large.
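
As a minimal sketch of what “batch = whole dataset” means, the following full-batch gradient descent fits a least-squares model; the synthetic data, learning rate, and iteration count are illustrative assumptions.

```python
# Minimal sketch of full-batch Gradient Descent for least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # 1000 samples, 3 features (assumed toy data)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    # The gradient of the mean squared error is computed over the WHOLE dataset
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad

print(w)  # should be close to true_w
```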
Suppose you have a million samples in your dataset. With a typical Gradient Descent
optimization technique, you would have to use all one million samples to complete
a single iteration, and this has to be repeated for every iteration until the minimum is
reached. This becomes computationally very expensive. Stochastic Gradient Descent solves
this problem: SGD uses only a single sample, i.e., a batch size of one, for each iteration.
The dataset is randomly shuffled and one sample is selected for each update.
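
Here is a minimal SGD sketch on the same kind of least-squares problem: the sample order is shuffled each epoch and the weights are updated from one sample at a time (batch size of one). The data, learning rate, and epoch count are illustrative assumptions.

```python
# Minimal sketch of Stochastic Gradient Descent (batch size of one).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # assumed toy data, as in the batch sketch
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    # Shuffle the sample order at the start of each pass over the data
    order = rng.permutation(len(y))
    for i in order:
        xi, yi = X[i], y[i]
        # Gradient of the squared error for this single sample only
        grad = 2.0 * xi * (xi @ w - yi)
        w -= lr * grad

print(w)  # should be close to true_w
```

Each update now touches only one sample, so the cost per iteration no longer grows with the dataset size; the trade-off is a noisier path toward the minimum.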
Examples of stochastic processes
Stochastic processes are widely used as mathematical models of systems and phenomena
that appear to vary in a random manner. Examples include the growth of a bacterial
population, an electrical current fluctuating due to thermal noise, or the movement of a gas
molecule.
Advantages of stochastic modeling
Stochastic models are particularly useful in forecasting, in which the actuary produces
estimates of results in future years, not just a current year valuation.