Simulation Techniques in Statistics
Computer system users, administrators, and designers usually share the goal of obtaining the highest performance at the lowest cost. Modeling and simulation of system design trade-offs is good preparation for design and engineering decisions in real-world jobs.
On this Web site we study computer systems modeling and simulation. We need proper knowledge of both the techniques of simulation modeling and the simulated systems themselves.
The scenario described above is but one situation where computer simulation can be effectively used. In
addition to its use as a tool to better understand and optimize performance and/or reliability of systems,
simulation is also extensively used to verify the correctness of designs. Most if not all digital integrated
circuits manufactured today are first extensively simulated before they are manufactured to identify and
correct design errors. Simulation early in the design cycle is important because the cost to repair
mistakes increases dramatically the later in the product life cycle that the error is detected. Another
important application of simulation is in developing "virtual environments", e.g., for training.
Analogous to the holodeck in the popular science-fiction television program Star Trek, simulations
generate dynamic environments with which users can interact "as if they were really there." Such
simulations are used extensively today to train military personnel for battlefield situations, at a fraction
of the cost of running exercises involving real tanks, aircraft, etc.
Dynamic modeling in organizations is the collective ability to understand the implications of change over time. This skill lies at the heart of a successful strategic decision process. The availability of effective visual modeling and simulation enables the analyst and the decision-maker to improve their dynamic decisions by rehearsing strategy to avoid hidden pitfalls.
System simulation is the mimicking, on a computer, of the operation of a real system, such as the day-to-day operation of a bank, the value of a stock portfolio over a time period, the running of an assembly line in a factory, or the staff assignment of a hospital or a security company. Instead of relying on extensive mathematical models built by experts, readily available simulation software has made it possible for non-experts, who are managers but not programmers, to model and analyze the operation of a real system.
A simulation is the execution of a model, represented by a computer program, that gives information about the system being investigated. The simulation approach to analyzing a model stands in contrast to the analytical approach, where the method of analyzing the system is purely theoretical. Although the analytical approach is more rigorous, the simulation approach offers more flexibility and convenience. The activities of the model consist of events, which are activated at certain points in time and in this way affect the overall state of the system. The points in time at which an event is activated are randomized, so no input from outside the system is required. Events exist autonomously and they are discrete, so between the execution of two events nothing happens. SIMSCRIPT provides a process-based approach to writing a simulation program. With this approach, the components of the program consist of entities, which combine several related events into one process.
In the field of simulation, the concept of the "principle of computational equivalence" has beneficial implications for the decision-maker. Simulated experimentation accelerates and effectively replaces the "wait and see" anxieties in discovering new insights into, and explanations of, the future behavior of the real system.
Consider the following scenario. You are the designer of a new switch for asynchronous transfer mode
(ATM) networks, a new switching technology that has appeared on the marketplace in recent years. In
order to help ensure the success of your product in this highly competitive field, it is important that
you design the switch to yield the highest possible performance while maintaining a reasonable
manufacturing cost. How much memory should be built into the switch? Should the memory be
associated with incoming communication links to buffer messages as they arrive, or should it be
associated with outgoing links to hold messages competing to use the same link? Moreover, what is the
best organization of hardware components within the switch? These are but a few of the questions that
you must answer in coming up with a design.
With the integration of artificial intelligence, agents, and other modeling techniques, simulation has become an effective and appropriate decision support tool for managers. By combining the emerging science of complexity with newly popularized simulation technology, the PricewaterhouseCoopers Emergent Solutions Group builds software that allows senior management to safely play out "what if" scenarios in artificial worlds. For example, in a consumer retail environment it can be used to find out how the roles of consumers and employees can be simulated to achieve peak performance.
We concern ourselves with n realizations that are related to time, that is, having n correlated observations; the estimate of the mean is given by

mean = Σ Xi / n,

where the sum is over i = 1 to n. An estimate of the variance of this mean is

Var(mean) = [1 + 2A] S² / n,

where S² is the usual sample variance and A = Σ (1 − j/n) rj,x (the sum over j = 1 to the maximum lag) is a correction term built from the estimated lag-j autocorrelations rj,x.
As a good rule of thumb, the maximum lag for which autocorrelations are computed should be approximately 2% of the number of realizations n, although each rj,x could be tested to determine whether it is significantly different from zero.
Sample Size Determination: We can calculate the minimum sample size required by

n = [1 + 2A] S² t² / (d² mean²).

Application: A pilot run was made of a model; the observations numbered 150, the mean was 205.74 minutes, and the variance was S² = 101,921.54. Estimates of the lag correlations were computed as r1,x = 0.3301, r2,x = 0.2993, and r3,x = 0.1987. Calculate the minimum sample size needed to assure that the estimate lies within ±d = 10% of the true mean with α = 0.05.
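As a worked illustration, here is a small Python sketch of this calculation. The weighting A = Σ (1 − j/n) rj,x and the use of a t-value with n − 1 degrees of freedom follow the formulas above; scipy is assumed to be available.

# A minimal sketch of the minimum-sample-size calculation above.
# Assumption: A is the weighted sum of the estimated lag correlations,
# A = sum((1 - j/n) * r[j]), matching the variance formula given earlier.
from scipy import stats

n_pilot = 150
mean, s2 = 205.74, 101921.54
r = [0.3301, 0.2993, 0.1987]              # estimated lag correlations
d, alpha = 0.10, 0.05                     # relative precision and risk

A = sum((1 - (j + 1) / n_pilot) * rj for j, rj in enumerate(r))
t = stats.t.ppf(1 - alpha / 2, n_pilot - 1)

n_min = (1 + 2 * A) * s2 * t**2 / (d**2 * mean**2)
print(round(n_min))                       # on the order of 2,500 runs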
For practical purposes, the main idea of the central limit theorem (CLT) is that the average of a sample of observations drawn from some population with any shape of distribution is approximately distributed as a normal distribution if certain conditions are met. In theoretical statistics there are several versions of
the central limit theorem depending on how these conditions are specified. These are concerned with the
types of assumptions made about the distribution of the parent population (population from which the
sample is drawn) and the actual sampling procedure.
One of the simplest versions of the theorem says that if X1, X2, ..., Xn is a random sample of size n (say, n larger than 30) from an infinite population with finite standard deviation σ, then the standardized sample mean converges to a standard normal distribution or, equivalently, the sample mean approaches a normal distribution with mean equal to the population mean and standard deviation equal to the standard deviation of the population divided by the square root of the sample size n. In applications of the central limit theorem to
practical problems in statistical inference, however, statisticians are more interested in how closely the
approximate distribution of the sample mean follows a normal distribution for finite sample sizes, than
the limiting distribution itself. Sufficiently close agreement with a normal distribution allows
statisticians to use normal theory for making inferences about population parameters (such as the mean)
using the sample mean, irrespective of the actual form of the parent population.
It is well known that whatever the parent population is, the standardized variable will have a distribution
with a mean 0 and standard deviation 1 under random sampling. Moreover, if the parent population is
normal, then it is distributed exactly as a standard normal variable for any positive integer n. The central
limit theorem states the remarkable result that, even when the parent population is non-normal, the
standardized variable is approximately normal if the sample size is large enough (say, n > 30). It is
generally not possible to state conditions under which the approximation given by the central limit
theorem works and what sample sizes are needed before the approximation becomes good enough. As a
general guideline, statisticians have used the prescription that if the parent distribution is symmetric and
relatively short-tailed, then the sample mean reaches approximate normality for smaller samples than if
the parent population is skewed or long-tailed.
In this lesson, we will study the behavior of the mean of samples of different sizes drawn from a variety
of parent populations. Examining sampling distributions of sample means computed from samples of different sizes drawn from a variety of distributions allows us to gain some insight into the behavior of the sample mean under those specific conditions, as well as to examine the validity of the guidelines
mentioned above for using the central limit theorem in practice.
Under certain conditions, in large samples, the sampling distribution of the sample mean can be
approximated by a normal distribution. The sample size needed for the approximation to be adequate
depends strongly on the shape of the parent distribution. Symmetry (or lack thereof) is particularly
important. For a symmetric parent distribution, even if very different from the shape of a normal
distribution, an adequate approximation can be obtained with small samples (e.g., 10 or 12 for the
uniform distribution). For symmetric short-tailed parent distributions, the sample mean reaches
approximate normality for smaller samples than if the parent population is skewed and long-tailed. In
some extreme cases (e.g., binomial), sample sizes far exceeding the typical guidelines (e.g., 30) are
needed for an adequate approximation. For some distributions without first and second moments (e.g.,
Cauchy), the central limit theorem does not hold.
Preamble
  Define X and XBAR as real, 1-dimensional arrays
  Define I, J, M, N and COUNT as integer variables
  Define AVE, ST, LL and UL as real variables
End
Main
  Open 3 for output, Name = "CLT.OUT"
  Use 3 for output
  Let N = 50
  Let M = 1000
  Reserve X(*) as N
  Reserve XBAR(*) as M
  For J = 1 to M
  Do
    For I = 1 to N
    Do
      Let X(I) = uniform.f(0., 1., 1)
    Loop
    For I = 1 to N
      Compute XBAR(J) as the average of X(I)
  Loop
  For J = 1 to M
    Compute AVE as the average and ST as the std.dev of XBAR(J)
  Let LL = AVE - 1.96*ST/sqrt.f(real.f(M))
  Let UL = AVE + 1.96*ST/sqrt.f(real.f(M))
  Let COUNT = 0
  For J = 1 to M
  Do
    If XBAR(J) > LL and XBAR(J) < UL
      Let COUNT = COUNT + 1
    Always
  Loop
  Print 1 line with real.f(COUNT)/real.f(M) thus
The P-value is = *.****
End
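For readers who prefer a general-purpose language, here is a rough Python counterpart; it is a sketch added for illustration, not part of the original SIMSCRIPT listing. One deliberate difference: the interval below uses ±1.96 times the standard deviation of the M sample means themselves, so that about 95% of the means should fall inside it when the central limit theorem applies (the SIMSCRIPT listing scales by the square root of M instead).

# CLT experiment: M = 1000 means of N = 50 uniforms on (0, 1).
import random, statistics

N, M = 50, 1000
xbar = [statistics.fmean(random.random() for _ in range(N)) for _ in range(M)]

ave = statistics.fmean(xbar)
st = statistics.stdev(xbar)
ll, ul = ave - 1.96 * st, ave + 1.96 * st

count = sum(ll < x < ul for x in xbar)
print(count / M)            # close to 0.95 if the means are roughly normal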
Many problems in analyzing data involve describing how variables are related. The simplest of all models describing the relationship between two variables is a linear, or straight-line, model. The simplest method of fitting a linear model is to "eye-ball" a line through the data on a plot. A more elegant, and conventional, method is that of "least squares", which finds the line minimizing the sum of squared vertical distances between the observed points and the fitted line.
Realize that fitting the "best" line by eye is difficult, especially when there is a lot of residual variability in the data.
Know that there is a simple connection between the numerical coefficients in the regression equation and the slope and intercept of the regression line.
Know that a single summary statistic like a correlation coefficient does not tell the whole story. A scatter
plot is an essential complement to examining the relationship between the two variables.
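The least-squares coefficients have closed forms, as the following Python sketch shows on invented data: slope = Sxy/Sxx and intercept = ybar − slope·xbar.

# Least squares by hand: minimize the sum of squared vertical distances.
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up illustrative data
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
sxx = sum((xi - xm) ** 2 for xi in x)

slope = sxy / sxx                  # 1.96 for these data
intercept = ym - slope * xm        # 0.14 for these data
print(slope, intercept)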
The tests we have learned up to this point allow us to test hypotheses that examine the difference
between only two means. Analysis of Variance or ANOVA will allow us to test the difference between 2
or more means. ANOVA does this by examining the ratio of variability between two conditions and
variability within each condition. For example, say we give a drug that we believe will improve memory
to a group of people and give a placebo to another group of people. We might measure memory
performance by the number of words recalled from a list we ask everyone to memorize. A t-test would
compare the likelihood of observing the difference in the mean number of words recalled for each
group. An ANOVA test, on the other hand, would compare the variability that we observe between the
two conditions to the variability observed within each condition. Recall that we measure variability as the sum of the squared differences of each score from the mean. When we actually calculate an ANOVA we use a short-cut formula; the test statistic is the F ratio, the between-groups variance estimate divided by the within-groups variance estimate.
Thus, when the variability that we predict (between the two groups) is much greater than the variability
we don't predict (within each group) then we will conclude that our treatments produce different results.
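A brief Python sketch of that comparison, using invented word-recall scores; scipy's f_oneway returns the F ratio (between-groups over within-groups variability) and its p-value.

# One-way ANOVA on two invented groups of word-recall counts.
from scipy import stats

drug    = [12, 15, 14, 16, 13]     # recall scores, drug group
placebo = [10, 11, 12,  9, 13]     # recall scores, placebo group

f, p = stats.f_oneway(drug, placebo)
print(f, p)                        # a large F (small p) suggests a real difference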
An important class of decision problems under uncertainty concerns the time between events. For example, the chance that the length of time to the next breakdown of a machine does not exceed a certain time, such as the copying machine in your office not breaking down during this week.
The exponential distribution gives the distribution of time between independent events occurring at a constant rate. Its density function is:

f(t) = λ exp(−λt),

where λ is the average number of events per unit of time, which is a positive number.
The mean and the variance of the random variable t (time between events) are 1/λ and 1/λ², respectively.
Applications include probabilistic assessment of the time between arrival of patients to the emergency
room of a hospital, and arrival of ships to a particular port.
You may like using the following Lilliefors test for exponentiality to perform the goodness-of-fit test.
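As a small illustration of the model, the Python sketch below draws exponential variates by the inverse-transform method; the rate λ = 2 is chosen purely for this example.

# Exponential variates via t = -ln(U)/lambda, then check mean and variance.
import math, random, statistics

lam = 2.0                                   # events per unit time (assumed)
sample = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]

print(statistics.fmean(sample))             # close to 1/lambda = 0.5
print(statistics.pvariance(sample))         # close to 1/lambda**2 = 0.25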
Poisson Process
An important class of decision problems under uncertainty is characterized by the small chance of the occurrence of a particular event, such as an accident. The Poisson distribution gives the probability of exactly x independent occurrences during a given period of time if events take place independently and at a constant rate. It may also represent the number of occurrences over constant areas or volumes. The following statements describe the Poisson process:
1. The occurrences of the events are independent. The occurrence of an event in an interval of space or time has no effect on the probability of a second occurrence of the event in the same, or any other, interval.
2. Theoretically, an infinite number of occurrences of the event must be possible in the interval.
3. The probability of the single occurrence of the event in a given interval is proportional to the
length of the interval.
4. In any infinitesimally small portion of the interval, the probability of more than one occurrence
of the event is negligible.
Poisson processes are often used, for example, in quality control, reliability, insurance claims, counts of incoming telephone calls, and queuing theory.
An Application: One of the most useful applications of the Poisson Process is in the field of queuing
theory. In many situations where queues occur it has been shown that the number of people joining the
queue in a given time period follows the Poisson model. For example, if the rate of arrivals to an emergency room is λ per unit of time period (say 1 hr), then the mean and variance of the number of arrivals n are both λ. However, if the mean and variance of a random variable have equal numerical values, it does not necessarily follow that its distribution is Poisson.
Applications:

P(0 arrivals) = exp(−λ)
P(1 arrival) = λ exp(−λ) / 1!
P(2 arrivals) = λ² exp(−λ) / 2!
Replace the numerical example data with your up-to-14 pairs of observed values and their frequencies, and then click the Calculate button. Blank boxes are not included in the calculations. In entering your data, to move from cell to cell in the data matrix use the Tab key, not the arrow or Enter keys.

Observed Xi:    0   1   2   3   4   5
Frequency Fi:  35  33  20   6   1   0
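In case the interactive calculator is unavailable, the following Python sketch reproduces its calculation on the example data above. Pooling the tail into a "3 or more" cell and using scipy's chi-square distribution are choices of this sketch, not necessarily those of the applet.

# Chi-square goodness of fit of the observed counts to a Poisson model.
import math
from scipy import stats

values = [0, 1, 2, 3, 4, 5]
freqs  = [35, 33, 20, 6, 1, 0]
n = sum(freqs)

lam = sum(v * f for v, f in zip(values, freqs)) / n   # estimated rate = 1.0

def pois(k):                              # Poisson pmf with rate lam
    return math.exp(-lam) * lam**k / math.factorial(k)

obs = [35, 33, 20, n - 35 - 33 - 20]      # cells: 0, 1, 2, and "3 or more"
exp = [n * pois(0), n * pois(1), n * pois(2)]
exp.append(n - sum(exp))                  # pooled tail keeps expected counts usable

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
dof = len(obs) - 1 - 1                    # cells - 1 - one estimated parameter
print(lam, chi2, 1 - stats.chi2.cdf(chi2, dof))   # p near 0.77: Poisson fits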
Example: SPSS used to generate random numbers in sampling and Monte Carlo simulation.
DESCRIPTIVES X
/STATISTICS= 1 2
T-TEST GROUPS=GENDER(1,2)/VARIABLES=X
Plot x vs y:
PLOT FORMAT=REGRESSION/SYMBOLS='*'
/TITLE='PLOT OF Y ON X'
/VERTICAL='Y'
/HORIZONTAL='X'
/PLOT=Y WITH X
SPSS/OUTPUT=HW3.OUT
TITLE 'GENERATING FROM NORMAL 0,1'
INPUT PROGRAM
LOOP I=1 TO 50
COMPUTE X2=NORMAL(1)
END CASE
END LOOP
END FILE
END INPUT PROGRAM
VARIABLE LABELS
X2 'NORMAL VARIATE'
LIST CASE CASE=50/VARIABLE=ALL//
CONDESCRIPTIVE X2(ZX2)
STATISTICS ALL
FREQUENCIES VARIABLE=ZX2/FORMAT=NOTABLE/
HISTOGRAM MIN(-3.0) MAX(+3.0) INCREMENT(0.2)/
NPAR TESTS RUNS(MEAN)=ZX2/
NPAR TESTS K-S(NORMAL,0.0,1.0)=ZX2/
SAMPLE 10 FROM 50
LIST CASE CASE=10/VARIABLES=X2,ZX2/
FINISH
For more SPSS programs useful to simulation input/output analysis, visit Data Analysis Routines.
Classical uniform random number generators have some major defects, such as short period length and lack of higher-dimensional uniformity. Nowadays, however, there is a class of rather complex generators that are as efficient as the classical generators while enjoying the property of a much longer period and of higher-dimensional uniformity.
Computer programs that generate "random" numbers use an algorithm. That means if you know the
algorithm and the seed values, you can predict what numbers will result. Because you can predict the numbers, they are not truly random; they are pseudorandom. For statistical purposes, "good" pseudorandom number generators are good enough.
Below is a FORTRAN code for a generator of uniform random numbers on [0,1]. RANECU is a combined multiplicative linear congruential generator suitable for a 16-bit platform. It combines three simple generators, and has a period exceeding 8 × 10^12.
It is constructed for more efficient use by providing for a sequence of such numbers, LEN in total, to be returned in a single call. A set of three non-zero integer seeds can be supplied, failing which a default set is employed. If supplied, these three seeds, in order, should lie in the ranges [1,32362], [1,31726] and [1,31656] respectively.
      SUBROUTINE RANECU (RVEC,LEN)
C     Returns LEN uniform random numbers in RVEC.
C     Seeds are set with ENTRY RECUIN and read back with ENTRY RECUUT.
C     Multipliers 157, 146, 142 follow the published RANECU algorithm.
      DIMENSION RVEC(*)
      SAVE ISEED1,ISEED2,ISEED3
      DATA ISEED1,ISEED2,ISEED3/1234,5678,9876/
      DO 100 I = 1,LEN
      K=ISEED1/206
      ISEED1=157*(ISEED1-K*206)-K*21
      IF(ISEED1.LT.0) ISEED1=ISEED1+32363
      K=ISEED2/217
      ISEED2=146*(ISEED2-K*217)-K*45
      IF(ISEED2.LT.0) ISEED2=ISEED2+31727
      K=ISEED3/222
      ISEED3=142*(ISEED3-K*222)-K*133
      IF(ISEED3.LT.0) ISEED3=ISEED3+31657
      IZ=ISEED1-ISEED2
      IF(IZ.GT.706) IZ=IZ-32362
      IZ=IZ+ISEED3
      IF(IZ.LT.1) IZ=IZ+32362
      RVEC(I)=REAL(IZ)*3.0899E-5
  100 CONTINUE
      RETURN
      ENTRY RECUIN(IS1,IS2,IS3)
      ISEED1=IS1
      ISEED2=IS2
      ISEED3=IS3
      RETURN
      ENTRY RECUUT(IS1,IS2,IS3)
      IS1=ISEED1
      IS2=ISEED2
      IS3=ISEED3
      RETURN
      END
a:******************************** 32
b:*************************** 27
c:************************** 26
d:************ 12
e:*** 3
The idea is to cut the bars into pieces then reassemble them into a square histogram, all heights equal,
with each final bar having a lower part, as well as an upper part indicating where it came from. A single
uniform random variable U can then be used to choose one of the final bars and to indicate whether to
use the lower or upper part. There are many ways to do this cutting and reassembling; the simplest
seems to be the Robin Hood Algorithm: Take from richest to bring the poorest up to average.
a:******************************** 32
b:*************************** 27
c:************************** 26
d:************ 12
e:*** 3
Take 17 from strip 'a' to bring strip 'e' up to average. Record donor and use old 'poor' level to mark lower
part of donee:
a:*************** 15
b:*************************** 27
c:************************** 26
d:************ 12
e:***|*****************| 20 (a)
Then bring 'd' up to average with donor 'b'. Record donor and use old 'poor' level to mark lower part of
donee:
a:*************** 15
b:******************* 19
c:************************** 26
d:************|********| 20 (b)
e:***|*****************| 20 (a)
Then bring 'a' up to average with donor 'c'. Record donor and use old 'poor' level to mark lower part of
donee:
a:***************|*****| 20 (c)
b:******************* 19
c:********************* 21
d:************|********| 20 (b)
e:***|*****************| 20 (a)
Finally, bring 'b' up to average with donor 'c'. Record donor and use old 'poor' level to mark
lower part of donee:
a:***************|*****| 20 (c)
b:*******************|*| 20 (c)
c:******************** 20
d:************|********| 20 (b)
e:***|*****************| 20 (a)
We now have a "squared histogram", i.e., a rectangle with 5 strips of equal area, each strip with two regions. A single uniform variate U can be used to generate a, b, c, d, e with the required probabilities .32, .27, .26, .12, .03.
Setup: Make the tables T[j] (the cut points marked above) and K[j] (the recorded donors).
Generation Process:
Let j be the integer part of 1 + 5*U, with U uniform in (0,1). If U < T[j] return V[j], else return V[K[j]].
In many applications no V table is necessary: V[i] = i, and the generating procedure becomes: if U < T[j] return j, else return K[j].
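A minimal Python sketch of the setup and generation steps follows, assuming the probabilities .32, .27, .26, .12, .03 from the example (here V[i] = i, so no V table is kept). The thresholds T[j] are stored on the absolute U scale, matching the rule "if U < T[j] return j".

import random

def robin_hood_setup(p):
    # Take from the richest to bring the poorest up to average,
    # recording the donor K[j] and the cut point T[j] of each strip.
    n = len(p)
    avg = 1.0 / n
    work = list(p)
    K = list(range(n))
    T = [(j + 1) / n for j in range(n)]   # untouched strips own their whole strip
    for _ in range(n - 1):
        poor = min(range(n), key=lambda j: work[j])
        rich = max(range(n), key=lambda j: work[j])
        K[poor] = rich                    # record the donor
        T[poor] = poor / n + work[poor]   # old 'poor' level marks the lower part
        work[rich] -= avg - work[poor]    # take from the richest
        work[poor] = avg                  # the donee is now exactly average
    return T, K

T, K = robin_hood_setup([0.32, 0.27, 0.26, 0.12, 0.03])

def draw():
    u = random.random()
    j = int(len(T) * u)                   # integer part picks a strip
    return j if u < T[j] else K[j]        # lower part: j, upper part: donor

counts = [0] * 5
for _ in range(100_000):
    counts[draw()] += 1
print([c / 100_000 for c in counts])      # close to .32 .27 .26 .12 .03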
We need to test for both randomness and uniformity. The tests can be classified into two categories: empirical or statistical tests, and theoretical tests.
Theoretical tests deal with the properties of the generator used to create the realization with the desired distribution, and do not look at the numbers generated at all. For example, we would not use a generator with poor qualities to generate random numbers.
Statistical tests are based solely on the random observations produced.
Correlation tests: Do the random numbers exhibit discernible correlation? Compute the sample autocorrelation function.
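As an illustration, the following Python sketch computes the first few sample autocorrelations of a generated uniform stream; for a good generator they should all be near zero (within roughly ±2/√n).

# Lag-j sample autocorrelations of a uniform stream.
import random, statistics

n = 10_000
u = [random.random() for _ in range(n)]
mean = statistics.fmean(u)
denom = sum((x - mean) ** 2 for x in u)

def autocorr(lag):
    num = sum((u[i] - mean) * (u[i + lag] - mean) for i in range(n - lag))
    return num / denom

print([round(autocorr(j), 4) for j in range(1, 6)])
# values within about +/- 2/sqrt(n) = +/- 0.02 raise no alarm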
Simulation in general is to pretend that one deals with a real thing while really working with an
imitation. In operations research the imitation is a computer model of the simulated reality. A flight
simulator on a PC is also a computer model of some aspects of the flight: it shows on the screen the
controls and what the "pilot" (the youngster who operates it) is supposed to see from the "cockpit" (his
armchair).
Why use models? To fly a simulator is safer and cheaper than to fly the real airplane. For precisely this reason, models are used in industry, commerce, and the military: it is very costly, dangerous, and often impossible to make experiments with real systems. Provided that models are adequate descriptions of reality (that they are valid), experimenting with them can save money, suffering, and even time.
When to use simulations? Systems that change with time, such as a gas station where cars come and go (called dynamic systems), and that involve randomness (nobody can guess exactly when the next car will arrive at the station) are good candidates for simulation. Modeling complex dynamic systems theoretically requires too many simplifications, and the emerging models may therefore not be valid. Simulation does not require that many simplifying assumptions, making it the only tool even in the absence of randomness.
How to simulate? Suppose we are interested in a gas station. We may describe the behavior of this system graphically by plotting the number of cars in the station, the state of the system. Every time a car arrives the graph increases by one unit, while a departing car causes the graph to drop by one unit. This graph (called the sample path) could be obtained from observation of a real station, but it could also be artificially constructed. Such artificial construction, and the analysis of the resulting sample path (or of more sample paths in more complex cases), constitutes the simulation.
Types of simulations: Discrete event. The above sample path consisted of only horizontal and vertical lines, as car arrivals and departures occurred at distinct points of time, which we refer to as events. Between two consecutive events nothing happens: the graph is horizontal. When the number of events is finite, we call the simulation "discrete event."
In some systems the state changes all the time, not just at the time of some discrete events. For example,
the water level in a reservoir with given in and outflows may change all the time. In such cases
"continuous simulation" is more appropriate, although discrete event simulation can serve as an
approximation.
How is simulation performed? Simulations may be performed manually. Most often, however, the
system model is written either as a computer program (for an example click here) or as some kind of
input into simulator software.
System terminology:
State: A variable characterizing an attribute in the system such as level of stock in inventory or number
of jobs waiting for processing.
Event: An occurrence at a point in time which may change the state of the system, such as arrival of a
customer or start of work on a job.
Entity: An object that passes through the system, such as cars in an intersection or orders in a factory.
Often an event (e.g., arrival) is associated with an entity (e.g., customer).
Queue: A queue is not only a physical queue of people, it can also be a task list, a buffer of finished
goods waiting for transportation or any place where entities are waiting for something to happen for any
reason.
Creating: Creating is causing an arrival of a new entity to the system at some point in time.
Scheduling: Scheduling is the act of assigning a new future event to an existing entity.
Random variable: A random variable is a quantity that is uncertain, such as interarrival time between
two incoming flights or number of defective parts in a shipment.
Distribution: A distribution is the mathematical law which governs the probabilistic features of a random
variable.
A Simple Example: Building a simulation of a gas station with a single pump served by a single service man. Assume that arrivals of cars as well as their service times are random. First, identify the:
states: number of cars waiting for service and number of cars served at any moment
queue: the queue of cars in front of the pump, waiting for service
distributions: we shall assume exponential distributions for both the interarrival time and the service time.
Next, specify what to do at each event. The above example would look like this: At event of entity
arrival: Create next arrival. If the server is free, send entity for start of service. Otherwise it joins the
queue. At event of service start: Server becomes occupied. Schedule end of service for this entity. At
event of service end: Server becomes free. If any entities waiting in queue: remove first entity from the
queue; send it for start of service.
Some initiation is still required, for example, the creation of the first arrival. Lastly, the above is
translated into code. This is easy with an appropriate library which has subroutines for creation,
scheduling, proper timing of events, queue manipulations, random variate generation and statistics
collection.
Statistics collection: Besides the above, the program records the number of cars in the system before and after every change, together with the length of each event.
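The event logic above maps directly onto a small event-scheduling program. The Python sketch below is one such translation; the arrival and service rates, and the use of a heap as the future-event list, are illustrative choices rather than part of the original example.

# Single-pump gas station: exponential interarrival and service times.
import heapq, math, random

def expo(rate):                            # inverse-transform exponential
    return -math.log(1.0 - random.random()) / rate

ARRIVE, SERVE, HORIZON = 1.0, 1.25, 10_000.0   # assumed rates and run length
queue, busy, served, waited = [], False, 0, 0.0
events = [(expo(ARRIVE), "arrival")]       # initiation: create the first arrival

while events:
    clock, kind = heapq.heappop(events)
    if clock > HORIZON:
        break
    if kind == "arrival":
        heapq.heappush(events, (clock + expo(ARRIVE), "arrival"))  # next arrival
        if not busy:                       # free server: start service now
            busy = True
            heapq.heappush(events, (clock + expo(SERVE), "departure"))
        else:
            queue.append(clock)            # otherwise join the queue
    else:                                  # end of service
        served += 1
        if queue:
            waited += clock - queue.pop(0) # waiting time of the next car
            heapq.heappush(events, (clock + expo(SERVE), "departure"))
        else:
            busy = False                   # server becomes free

print(served, waited / max(served, 1))     # cars served and average wait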
Discrete event systems (DES) are dynamic systems which evolve in time by the occurrence of events at
possibly irregular time intervals. DES abound in real-world applications. Examples include traffic
systems, flexible manufacturing systems, computer-communications systems, production lines, coherent
lifetime systems, and flow networks. Most of these systems can be modeled in terms of discrete events
whose occurrence causes the system to change from one state to another. In designing, analyzing and
operating such complex systems, one is interested not only in performance evaluation but also in
sensitivity analysis and optimization.
A typical stochastic system has a large number of control parameters that can have a significant impact
on the performance of the system. To establish a basic knowledge of the behavior of a system under
variation of input parameter values and to estimate the relative importance of the input parameters,
sensitivity analysis applies small changes to the nominal values of input parameters. For systems
simulation, variations of the input parameter values cannot be made infinitely small. The sensitivity of the performance measure with respect to an input parameter is therefore defined as a (partial) derivative.
Sensitivity analysis is concerned with evaluating sensitivities (gradients, Hessian, etc.) of performance
measures with respect to parameters of interest. It provides guidance for design and operational
decisions and plays a pivotal role in identifying the most significant system parameters, as well as
bottleneck subsystems. I have carried out research in the fields of sensitivity analysis and stochastic
optimization of discrete event systems with an emphasis on computer simulation models. This part of the lecture is dedicated to the estimation of an entire response surface of complex discrete event systems
(DES) from a single sample path (simulation), such as the expected waiting time of a customer in a
queuing network, with respect to the controllable parameters of the system, such as service rates, buffer
sizes and routing probabilities. With the response surfaces at hand, we are able to perform sensitivity
analysis and optimization of a DES from a single simulation, that is, to find the optimal parameters of
the system and their sensitivities (derivatives), with respect to uncontrollable system parameters, such as
arrival rates in a queuing network. We identified three distinct processes. Descriptive Analysis includes:
Problem Identification & Formulation, Data Collection and Analysis, Computer Simulation Model
Development, Validation, Verification and Calibration, and finally Performance Evaluation. Prescriptive
Analysis: Optimization or Goal Seeking. These are necessary components for Post-prescriptive
Analysis: Sensitivity, and What-If Analysis. Prescriptive simulation attempts to use simulation to prescribe decisions required to obtain specified results. It is subdivided into two topics: Goal Seeking and Optimization. Recent developments in "single-run" algorithms for the needed sensitivities (i.e., gradient, Hessian, etc.) make prescriptive simulation feasible.
Data Collection and Analysis: Regardless of the method used to collect the data, the decision of how
much to collect is a trade-off between cost and accuracy.
Model Validation, Verification and Calibration: In general, verification focuses on the internal
consistency of a model, while validation is concerned with the correspondence between the model and
the reality. The term validation is applied to those processes which seek to determine whether or not a
simulation is correct with respect to the "real" system. More prosaically, validation is concerned with the
question "Are we building the right system?". Verification, on the other hand, seeks to answer the
question "Are we building the system right?" Verification checks that the implementation of the
simulation model (program) corresponds to the model. Validation checks that the model corresponds to
reality. Calibration checks that the data generated by the simulation matches real (observed) data.
Validation: The process of comparing the model's output with the behavior of the phenomenon. In other
words: comparing model execution to reality (physical or otherwise)
Verification: The process of comparing the computer code with the model to ensure that the code is a
correct implementation of the model.
Calibration: The process of parameter estimation for a model. Calibration is a tweaking/tuning of
existing parameters and usually does not involve the introduction of new ones, changing the model
structure. In the context of optimization, calibration is an optimization procedure involved in system
identification or during experimental design.
Input and Output Analysis: Discrete-event simulation models typically have stochastic components
that mimic the probabilistic nature of the system under consideration. Successful input modeling
requires a close match between the input model and the true underlying probabilistic mechanism
associated with the system. The input data analysis is to model an element (e.g., arrival process, service
times) in a discrete-event simulation given a data set collected on the element of interest. This stage
performs intensive error checking on the input data, including external, policy, random, and deterministic variables. The purpose of a system simulation experiment is to learn about the system's behavior. Careful planning, or designing, of simulation experiments is generally a great help, saving time and effort by providing efficient ways to estimate the effects of changes in the model's inputs on its outputs. Statistical experimental-design methods are mostly used in the context of simulation experiments.
Performance Evaluation and What-If Analysis: The `what-if' analysis is at the very heart of
simulation models.
Sensitivity Estimation: Users must be provided with affordable techniques for sensitivity analysis if
they are to understand which relationships are meaningful in complicated models.
Report Generating: Report generation is a critical link in the communication process between the
model and the end user.
A stochastic process is a probabilistic model of a system that evolves randomly in time and space. Formally, a stochastic process is a collection of random variables {X(t), t ∈ T} all defined on a common sample (probability) space. X(t) is the state, while (time) t is the index, a member of the set T.
Examples are the delay {D(i), i = 1, 2, ...} of the ith customer, and the number of customers {Q(t), t ≥ 0} in the queue at time t, in an M/M/1 queue. In the first example we have a discrete-time, continuous-state process, while in the second example the state is discrete and time is continuous.
The following table is a classification of various stochastic processes. Man-made systems have mostly discrete states. Monte Carlo simulation deals with discrete time, while in discrete event system simulation the time dimension is continuous, which is at the heart of this site.
                          State
  Time          Continuous               Discrete
  Continuous    Level of water           Number of
                behind a dam             customers in a bank
  Discrete      Weekdays' range          Sales at the
                of temperature           end of the day

A Classification of Stochastic Processes
To perform statistical analysis of the simulation output we need to establish some conditions, e.g., the output data must be a covariance stationary process (e.g., the data collected over n simulation runs).
Stationary Process (strictly stationary): A stationary stochastic process is a stochastic process {X(t), t ∈ T} with the property that the joint distribution of all vectors of dimension h remains the same under any shift in time, for any fixed h.
First Order Stationary: A stochastic process is first order stationary if the expected value of X(t) remains the same for all t.
For example, in an economic time series, a process is first order stationary when we remove any kind of trend by some mechanism such as differencing.
Second Order Stationary: A stochastic process is second order stationary if it is first order stationary and the covariance between X(t) and X(s) is a function of t − s only.
Again, in economic time series, a process is second order stationary when we also stabilize its variance by some kind of transformation, such as taking the square root.
Clearly, a stationary process with finite second moments is second order stationary; however, the reverse may not hold.
In simulation output statistical analysis we are satisfied if the output is covariance stationary.
Covariance Stationary: A covariance stationary process is a stochastic process {X(t), t ∈ T} having finite second moments (i.e., the expected value of [X(t)]² is finite), a constant mean, and a covariance that depends on t − s only.
Clearly, any stationary process with finite second moments is covariance stationary. A stationary process may have no finite moments whatsoever.
Since a Gaussian process needs a mean and covariance matrix only, it is stationary (strictly) if it is
covariance stationary.
Let Z be a single random variable with a known distribution function, and set Z0 = Z1 = ... = Z. Note that in a realization of this process the first element, Z0, may be random, but after that there is no randomness. The process {Zi, i = 0, 1, 2, ...} is covariance stationary if Z has a finite variance.
Output data in simulation fall between these two types of process. Simulation outputs are identically distributed but mildly correlated (how mildly depends, in a queueing system for example, on how large the traffic intensity ρ is). An example is the delay process of the customers in a queueing system.
Unlike queuing theory, where steady-state results for some models are easily obtainable, steady-state simulation is not an easy task. The opposite is true for obtaining results for the transient period (i.e., the warm-up period).
Gathering steady-state simulation output requires statistical assurance that the simulation model has reached the steady state. The main difficulty is to obtain independent simulation runs with the transient period excluded. The two techniques commonly used for steady-state simulation are the method of batch means and the method of independent replications.
Neither of these two methods is superior to the other in all cases. Their performance depends on the magnitude of the traffic intensity. The other available technique is the regenerative method, which is mostly used for its nice theoretical properties; however, it is rarely applied in actual simulation for obtaining steady-state output numerical results.
Suppose you have a regenerative simulation consisting of m cycles of size n1, n2, ..., nm, respectively. The cycle sum for cycle j is the sum of the nj observations within that cycle.
Method of Batch Means: This method involves only one very long simulation run, which is suitably subdivided into an initial transient period and n batches. Each batch is then treated as an independent run of the simulation experiment, while no observations are made during the transient period, which is treated as a warm-up interval. Choosing a large batch interval size would effectively lead to independent batches and hence independent runs of the simulation; however, since the number of batches is then small, one cannot invoke the central limit theorem to construct the needed confidence interval. On the other hand, choosing a small batch interval size would effectively lead to significant correlation between successive batches, so one cannot apply the results in constructing an accurate confidence interval.
Suppose you have n equal batches of m observations each. The mean of batch j is the average of its m observations, Xbar(j) = Σ Xi / m, the sum running over the observations in batch j.
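A small Python sketch of the batch-means procedure follows; the autocorrelated test series, the warm-up length, and the batch layout are illustrative assumptions, and scipy supplies the t quantile.

# Batch means: one long correlated run, warm-up deleted, n batches of m points.
import random, statistics
from scipy import stats

raw = [0.0]
for _ in range(110_000):                   # AR(1)-like correlated output
    raw.append(0.9 * raw[-1] + random.gauss(0.0, 1.0))
data = raw[10_000:]                        # discard the warm-up period

n, m = 20, 5_000
means = [statistics.fmean(data[j * m:(j + 1) * m]) for j in range(n)]

grand = statistics.fmean(means)
half = stats.t.ppf(0.975, n - 1) * statistics.stdev(means) / n ** 0.5
print(f"{grand:.3f} +/- {half:.3f}")       # 95% confidence interval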
Method of Independent Replications: This method is the most widely used for systems with a short transient period. It requires independent runs of the simulation experiment with different initial random seeds for the simulator's random number generator. For each independent replication of the simulation run, the transient period is removed. For the observation intervals after the transient period, the data are collected and processed for the point estimate of the performance measure and for its subsequent confidence interval.
Suppose you have n replications of m observations each. The mean of replication j is the average of its m observations; the n replication means are then treated as independent, identically distributed quantities.
Batch Means is a method of estimating the steady-state characteristic from a single-run simulation. The single run is partitioned into equal-size batches, large enough for the estimates obtained from different batches to be approximately independent. In the method of batch means, it is important to ensure that the bias due to initial conditions is removed, so as to achieve at least a covariance stationary waiting time process. An obvious remedy is to run the simulation for a period large enough to remove the effect of the initial bias. During this warm-up period, no attempt is made to record the output of the simulation; the results are thrown away. At the end of the warm-up period, the waiting times of customers are collected for analysis. The practical question is "How long should the warm-up period be?" Abate and Whitt provided a relatively simple and nice expression for the time (tp) required for an M/M/1 queue system (with traffic intensity ρ) starting at the origin (empty) to reach and remain within 100p% of the steady-state limit; this time grows rapidly as the traffic intensity ρ approaches one.
The two widely used methods for experimentation on simulation models are the method of batch means and the method of independent replications. Intuitively one may say that the method of independent replications is superior at producing a statistically "good" estimate of the system's performance measure. In fact, neither method is superior in all cases, and which performs better depends on the traffic intensity ρ.
After deciding which method is more suitable to apply, the main question is the determination of the number of runs. That is, at the planning stage of a simulation investigation, the question of the number of simulation runs (n) is critical.
The confidence level of simulation output drawn from a set of simulation runs depends on the size of the data set. The larger the number of runs, the higher is the associated confidence. However, more simulation runs also require more effort and resources for large systems. Thus, the main goal must be to find the smallest number of simulation runs that will provide the desirable confidence.
Pilot Studies: When the needed statistics for the number-of-simulation-runs calculation are not available from an existing database, a pilot simulation is needed.
For large pilot simulation runs (n), say over 30, the simplest number-of-runs determinate is:

n = (Zα/2)² S² / δ²,

where δ is the desirable margin of error (i.e., the absolute error), which is the half-length of the confidence interval with 100(1 − α)% confidence, and S² is the variance obtained from the pilot run.
One may use the following sample size determinate for a desirable relative error Δ in %, which requires an estimate of the coefficient of variation (C.V., in %) from a pilot run with n over 30:

n = (Zα/2)² (C.V.)² / Δ²
These sample size determinates could also be used for simulation output estimation of unimodal output populations, with discrete or continuous random variables, provided the pilot run size (n) is larger than (say) 30.
The aim of applying any one of the above number-of-runs determinates is to improve your pilot estimates at feasible costs.
You may like using the following applet for determination of the number of runs.
At the planning stage of a simulation modeling effort, the question of the number of simulation runs (n) is critical. The following Java applets compute the needed run size, based on the information currently available from a pilot simulation run, to achieve an acceptable accuracy and/or risk.
The aim of applying any one of the following number-of-simulation-runs determinates is to improve your pilot estimates at a feasible cost.
Notes: The normality condition might be relaxed for a number of simulation runs over, say, 30. Moreover, the determination of the number of simulation runs for the mean could also be used for other unimodal simulation output distributions, including those with discrete random variables, such as a proportion, provided the pilot run is sufficiently large (say, over 30).
The vast amount of simulation software available can be overwhelming for new users. The following are only a random sample of the software in the market today:
ACSL, APROS, ARTIFEX, Arena, AutoMod, C++SIM, CSIM, Call$im, FluidFlow, GPSS, Gepasi,
JavSim, MJX, MedModel, Mesquite, Multiverse, NETWORK, OPNET Modeler, POSES++, Simulat8,
Powersim, QUEST, REAL, SHIFT, SIMPLE++, SIMSCRIPT, SLAM, SMPL, SimBank, SimPlusPlus,
TIERRA, Witness, SIMNON, VISSIM, and javasim.
There are several things that make an ideal simulation package. Some are properties of the package, such as support, reactivity to bug notification, interface, etc. Some are properties of the user, such as their needs and their level of expertise. For these reasons, asking which package is "best" is a lapse of judgment. The first question to ask is: for what purpose do you need the software? Is it for education, teaching, student projects, or research?
The main question is: What are the important aspects to look for in a package? The answer depends on
specific applications. However some general criteria are: Input facilities, Processing that allows some
programming, Optimization capability, Output facilities, Environment including training and support
services, Input-output statistical data analysis capability, and certainly the Cost factor.
You must know which features are appropriate for your situation, although this is not based on a simple "yes" or "no" judgment.
Animation in systems simulation is a useful tool. Most graphically based software packages have default animation. This type of animation comes with little or no additional effort, gives the modeler additional insight into how the model works, and is quite useful for model debugging, validation, and verification; it augments the modeling tools available. More realistic animation presents qualities intended to be useful to the decision-maker in implementing the developed simulation model. There are also good model management tools; some tools have been developed which combine a database with simulation to store models, data, results, and animations. However, no one product provides all of those capabilities.
SIMSCRIPT II.5
Without a computer, one cannot perform any realistic dynamic systems simulation.
DESIGN:
A powerful worldview, consisting of Entities and Processes, provides a natural conceptual framework
with which to relate real objects to the model.
PROGRAMMING:
SIMSCRIPT II.5 is a modern, free-form language with structured programming constructs and all the
built-in facilities needed for model development. Model components can be programmed so they clearly
reflect the organization and logic of the modeled system. The amount of program needed to model a
system is typically 75% less than its FORTRAN or C counterpart.
DEBUGGER:
A well-designed package of program debugging facilities is provided. The required tools are available to detect errors in a complex computer program without resorting to memory dumps. Simulation status information is provided, and control is optionally transferred to a user program for additional analysis and output.
EVOLUTION:
This structure allows the model to evolve easily and naturally from simple to detailed formulation as
data becomes available. Many modifications, such as the choice of set disciplines and statistics are
simply specified in the Preamble.
DOCUMENTATION:
You get a powerful, English-like language supporting a modular implementation. Because each model
component is readable and self-contained, the model documentation is the model listing; it is never
obsolete or inaccurate.
Network Access/Utilities
Connect UBE
Username:
Password:
Get $ sign
$EDT PROG.SIM
$SIMSCRIPT
$SIMSCOMP PROG.SIM
$EDT PROG.LIST
$SIMLINK PROG
$RUN PROG
Then, submit
$Submit PROG.COM
An Example:
Preamble
Define V as a real 1-dimensional array
Define I, N as integer variables
End
Main
Open 3 for output, Name = "PROG.OUT"
Use 3 for output
LET N=50
Reserve V(*) as N
LET V(1)=1.
For I = 1 to N-1
DO
LET U=V(I)
LET PPRAM=-9./((1+U)**2) + 9./((2.+U)**2) + 1./U**2
LET V(I+1)=V(I)+PPRAM/I
LOOP
PRINT 1 LINE WITH V(N) AND PPRAM THUS
OPTIMAL RATE IS ****.*****, DERIVATIVE IS ***.*****
END
The output is written to the file PROG.OUT.
The modeling techniques used by system dynamics and discrete event simulations are often different at two levels: the modeler's way of representing systems might be different, and the underlying simulators' algorithms are also different. Each technique is well tuned to the purpose for which it is intended. However, one may use a discrete event approach to do system dynamics and vice versa.
Traditionally, the most important distinction is the purpose of the modeling. The discrete event approach is to find, e.g., how many resources the decision maker needs, such as how many trucks, and how to arrange the resources to avoid bottlenecks, i.e., excessive waiting lines, waiting times, or inventories. The system dynamics approach, by contrast, is to prescribe for the decision maker how, e.g., to respond in a timely way to any changes, and how to change the physical structure, e.g., the physical shipping delay time, so that inventories, sales, production, etc., behave as desired.
System dynamics is the rigorous study of problems in system behavior using the principles of feedback, dynamics and simulation. In more detail, system dynamics is characterized by:
Searching for useful solutions to real problems, especially in social systems (businesses, schools,
governments,...) and the environment.
Using computer simulation models to understand and improve such systems.
Basing the simulation models on mental models, qualitative knowledge and numerical
information.
Using methods and insights from feedback control engineering and other scientific disciplines to
assess and improve the quality of models.
Seeking improved ways to translate scientific results into achieved, implemented improvements.
Systems dynamics approach looks at systems at a very high level so is more suited to strategic
analysis. Discrete event approach may look at subsystems for a detailed analysis and is more
suited, e.g., to process re-engineering problems.
Systems dynamics is indicative, i.e., it helps us understand the direction and magnitude of effects (i.e., where in the system we need to make the changes), whereas the discrete event approach is predictive (i.e., how many resources do we need to achieve a certain goal of throughput).
Systems dynamics analysis is continuous in time and it uses mostly deterministic analysis,
whereas discrete event process deals with analysis in a specific time horizon and uses stochastic
analysis.
Some interesting and useful areas of system dynamics modeling approach are:
Short-term and long term forecasting of agricultural produce with special reference to field crops
and perennial fruits such as grapes, which have significant processing sectors of different
proportions of total output where both demand and supply side perspectives are being
considered.
Long term relationship between the financial statements of balance sheet, income statement and
cash flow statement balanced against scenarios of the stock market's need to seek a
stable/growing share price combined with a satisfactory dividend and related return on
shareholder funds policy.
Managerial applications include the development and evaluation of short-term and long-term
strategic plans, budget analysis and assessment, business audits and benchmarking.
A modeler must consider both as complementary tools to each other: use systems dynamics to look at the high-level problem and identify areas which need more detailed analysis; then use discrete event modeling tools to analyze (and predict) the specific areas of interest.
Social scientists have always constructed models of social phenomena. Simulation is an important
method for modeling social and economic processes. In particular, it provides a "middle way" between
the richness of discursive theorizing and rigorous but restrictive mathematical models. There are
different types of computer simulation and their application to social scientific problems.
Faster hardware and improved software have made building complex simulations easier. Computer
simulation methods can be effective for the development of theories as well as for prediction. For
example, macro-economic models have been used to simulate future changes in the economy; and
simulations have been used in psychology to study cognitive mechanisms.
The field of 'social simulation' seems to be following an interesting line of inquiry. As a general approach in the field, a 'world' is specified with much computational detail. Then the 'world' is simulated (using computers) to reveal some of the 'non-trivial' implications (or 'emergent properties') of the 'world'. When these 'non-trivial' implications are made known (fed back) to the world, they apparently constitute some 'added value'.
Artificial Life is an interdisciplinary study enterprise aimed at understanding life-as-it-is and life-as-it-
could-be, and at synthesizing life-like phenomena in chemical, electronic, software, and other artificial
media. Artificial Life redefines the concepts of artificial and natural, blurring the borders between
traditional disciplines and providing new media and new insights into the origin and principles of life.
Simulation allows the social scientist to experiment with 'artificial societies' and explore the implications of theories in ways not otherwise possible.
Web-based simulation is quickly emerging as an area of significant interest for both simulation researchers and simulation practitioners. This interest in web-based simulation is a natural outgrowth of the proliferation of the World-Wide Web and its attendant technologies, e.g., HTML, HTTP, CGI, etc., as well as of the surging popularity of, and reliance upon, computer simulation as a problem-solving and decision-support tool.
The appearance of the network-friendly programming language, Java, and of distributed object
technologies like the Common Object Request Broker Architecture (CORBA) and the Object Linking
and Embedding / Component Object Model (OLE/COM) have had particularly acute effects on the state
of simulation practice.
Currently, the researchers in the field of web-based simulation are interested in dealing with topics such
as methodologies for web-based model development, collaborative model development over the
Internet, Java-based modeling and simulation, distributed modeling and simulation using web
technologies, and new applications.
The increasing size of the systems and designs requires more efficient simulation strategies to accelerate
the simulation process. Parallel and distributed simulation seems to be a promising approach in this direction. Current topics under extensive research are:
System modeling for parallel simulation, specification, re-use of models/code, and parallelizing existing
simulations.
Language and implementation issues, models of parallel simulation, execution environments, and
libraries.
Theoretical and empirical studies, prediction and analysis, cost models, benchmarks, and comparative
studies.
Web based distributed simulation such as multimedia and real time applications, fault tolerance,
implementation issues, use of Java, and CORBA.
Simulation continues to be the primary method by which engineers and managers obtain information
about complex stochastic systems, such as telecommunication networks, health service, corporate
planning, financial modeling, production assembly lines, and flexible manufacturing systems. These
systems are driven by the occurrence of discrete events; and complex interactions within these discrete
events occur over time. For most discrete event systems (DES) no analytical methods are available, so
DES must be studied via simulation. DES are studied to understand their performance, and to determine
the best ways to improve their performance. In particular, one is often interested in how system
performance depends on the system's parameter v, which could be a vector.
DES system performance is often measured as an expected value. Consider a system with continuous parameter v ∈ V ⊆ R^n, where V is an open set. Let

J(v) = E[Z(Y)] = ∫ Z(y) f(y; v) dy

be the steady-state expected performance measure, where Y is a random vector with known probability density function (pdf) f(y; v) depending on v, and Z is the performance measure.
In discrete event systems, Monte Carlo simulation is usually needed to estimate J(v) for a given value v = v0. By the law of large numbers the sample average

Jn(v0) = (1/n) Σ Z(yi), the sum over i = 1 to n,

converges to the true value, where yi, i = 1, 2, ..., n are independent, identically distributed random vector realizations of Y from f(y; v0), and n is the number of independent replications.
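As a concrete illustration, the sketch below estimates J(v) by this sample average; the exponential form of f(y; v) and the quadratic performance measure Z are assumptions made for the example only.

# Monte Carlo estimate of J(v) = E[Z(Y)] with Y ~ exponential(rate v), Z(y) = y^2.
import math, random, statistics

def estimate_J(v, n=100_000):
    ys = (-math.log(1.0 - random.random()) / v for _ in range(n))
    return statistics.fmean(y ** 2 for y in ys)

print(estimate_J(2.0))        # E[Y^2] = 2/v^2 = 0.5 at v = 2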
There are a number of areas where sensitivity information (the gradient, Hessian, etc.) of a performance
measure J(v) or some estimate of it, is used for the purpose of analysis and control. In what follows, we
single out a few such areas and briefly discuss them.
Local information: An estimate of dJ/dv is a good local measure of the effect of v on performance. For example, simply knowing the sign of the derivative dJ/dv at some point v immediately gives us the direction in which v should be changed. The magnitude of dJ/dv also provides useful information in an initial design process: if dJ/dv is small, we conclude that J is not very sensitive to changes in v, and hence focusing attention on other parameters may improve performance.
Structural properties: Often sensitivity analysis provides not only a numerical value for the sample derivative, but also an expression which captures the nature of the dependence of a performance measure on the parameter v. The simplest case arises when dJ/dv can be seen to be always positive (or always negative) for any sample path; we can then tell that the value of J(v) is monotonically increasing (or decreasing) in v. This information is in itself very useful in design and analysis. More generally, the form of dJ/dv can reveal interesting structural properties of the DES (e.g., monotonicity, convexity). Such properties can be exploited in order to determine optimal operating policies for some systems.
Response surface generation: Often our ultimate goal is to obtain the function J(v), i.e., a curve
describing how the system responds to different values of v . Since J(v) is unknown, one alternative is to
obtain estimates of J(v) for as many values of v as possible. This is clearly a prohibitively difficult task.
Derivative information, however, may include not only first-order but also higher derivatives, which can
be used to approximate J(v). If such derivative information can be easily and accurately obtained, the
task of response surface generation may be accomplished as well.
Goal-seeking and What-if problems: Stochastic models typically depend upon various uncertain
parameters that must be estimated from existing data sets. The statistical question of how input-parameter
uncertainty propagates through the model into output-parameter uncertainty is the so-called "what-if"
analysis. A good answer to this question often requires sensitivity estimates. The ordinary simulation
output results are the solution of a direct problem: Given the underlying pdf with a particular parameter
value v , we may estimate the output function J(v). Now we pose the goal-seeking problem: given a
target output value J0 of the system and a parameterized pdf family, find an input value for the
parameter, which generates such an output. There are strong motivations for both problems. When v is
any controllable or uncontrollable parameter [the decision maker is, for example, interested in
estimating J(v) for a small change in v ], the so called "what-if" problem, which is a "direct problem"
and can be solved by incorporating sensitivity information in the Taylor's expansion of J(v) in the
neighborhood of v . However, when v is a controllable input, the decision maker may be interested in
the goal-seeking problem: what change in the input parameter will achieve a desired change in output
value J(v). Another application of goal-seeking arises when we want to adapt a model to satisfy a new
equality constraint (condition) for some stochastic function. The solution to the goal-seeking problem is
to estimate the derivative of the output function with respect to the input parameter for the nominal
system; use this estimate in a Taylor's expansion of the output function in the neighborhood of the
parameter; and finally, use a Robbins-Monro (R-M) type stochastic approximation algorithm to
estimate the necessary controllable input parameter value within the desired accuracy.
Optimization: Discrete-event simulation is the primary analysis tool for designing complex systems.
However, simulation must be linked with a mathematical optimization technique to be effectively used
for systems design. The sensitivity dJ/dv can be used in conjunction with various optimization
algorithms whose function is to gradually adjust v until a point is reached where J(v) is maximized (or
minimized). If no other constraints on v are imposed, we expect dJ/dv = 0 at this point.
Kiefer and Wolfowitz proposed a finite difference approximation to the derivative. One version of the
Kiefer-Wolfowitz technique uses two-sided finite differences. The first fact to notice about the K-W
estimate is that it requires 2N simulation runs, where N is the dimension of the vector parameter v: if the
decision maker is interested in gradient estimation with respect to each of the components of v, two
simulations must be run for each component. This is inefficient. The second fact is that it may have
a very poor variance, and it may result in numerical calculation difficulties.
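As a concrete illustration, here is a minimal Python sketch of the two-sided Kiefer-Wolfowitz gradient estimate; J_hat stands for a noisy simulation estimate of J(v), and the difference width c is a tuning parameter, both assumptions of this sketch.

import numpy as np

def kw_gradient(J_hat, v, c=0.1):
    # Two-sided finite differences: each of the N components of v requires
    # two simulation runs, hence 2N runs in total.
    v = np.asarray(v, dtype=float)
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = c
        g[i] = (J_hat(v + e) - J_hat(v - e)) / (2.0 * c)
    return g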
The simultaneous perturbation (SP) algorithm introduced by Dr. J. Spall has attracted considerable
attention. There has recently been much interest in recursive optimization algorithms that rely on
measurements of only the objective function to be optimized, not requiring direct measurements of the
gradient of the objective function. Such algorithms have the advantage of not requiring detailed
modeling information describing the relationship between the parameters to be optimized and the
objective function. For example, many systems involving complex simulations or human beings are
difficult to model, and could potentially benefit from such an optimization approach. The simultaneous
perturbation stochastic approximation (SPSA) algorithm operates in the same framework as the above
K-W methods, but has the strong advantage of requiring a much lower number of simulation runs to
obtain the same quality of result. The essential feature of SPSA, which accounts for its power and
relative ease of use in difficult multivariate optimization problems, is the underlying gradient
approximation that requires only TWO objective function measurements regardless of the dimension of
the optimization problem (one variation of basic SPSA uses only ONE objective function measurement
per iteration). The underlying theory for SPSA shows that the N-fold savings in simulation runs per
iteration (per gradient approximation) translates directly into an N-fold savings in the number of
simulations to achieve a given quality of solution to the optimization problem. In other words, the K-W
method and SPSA method take the same number of iterations to converge to the answer despite the N-
fold savings in objective function measurements (e.g., simulation runs) per iteration in SPSA.
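The following Python sketch shows the simultaneous perturbation gradient approximation; the Bernoulli +/-1 perturbation direction and the width c follow the basic SPSA recipe, while J_hat and the parameter values are placeholders.

import numpy as np

rng = np.random.default_rng(seed=2)

def spsa_gradient(J_hat, v, c=0.1):
    # All components of v are perturbed simultaneously by a random +/-1
    # direction, so only TWO objective function measurements are needed
    # regardless of the dimension of v.
    delta = rng.choice([-1.0, 1.0], size=len(v))
    y_plus = J_hat(v + c * delta)
    y_minus = J_hat(v - c * delta)
    return (y_plus - y_minus) / (2.0 * c * delta)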
Perturbation analysis
Perturbation analysis (PA) computes (roughly) what simulations would have produced, had v been
changed by a "small" amount without actually making this change. The intuitive idea behind PA is that a
sample path constructed using v is frequently structurally very similar to the sample path using the
perturbed v . There is a large amount of information that is the same for both of them. It is wasteful to
throw this information away and to start the simulation from scratch with the perturbed v . In PA,
moreover, we can let the change approach zero to get a derivative estimator without numerical
problems. We are interested in the effect of a parameter change on the performance measure. However,
we would like to realize this change by keeping the order of events exactly the same. The perturbations
will be so small that only the duration, not the order, of the states will be affected. This effect should be
observed in three successive stages:
Step 1: How does a change in the value of a parameter vary the sample duration related to that
parameter?
Step 2: How does the change in individual sample duration reflect itself as a change in a subsequent
particular sample realization?
Step 3: Finally, what is the relationship between the variation of the sample realization and its expected
value?
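As an illustration of these stages, the following Python sketch implements infinitesimal perturbation analysis for the mean waiting time in a single-server FIFO queue; the Lindley recursion, the exponential inputs, and the parameterization of service times as S_k = theta * X_k are assumptions of this sketch, not part of the text above.

import numpy as np

rng = np.random.default_rng(seed=3)

def ipa_mean_wait(theta, lam=0.5, n=100_000):
    # Single-server FIFO queue via the Lindley recursion. Service times are
    # S_k = theta * X_k (X_k ~ Exp(1)), so dS_k/dtheta = S_k / theta, and the
    # perturbation propagates only within a busy period; the order of events
    # is left unchanged, as PA requires.
    A = rng.exponential(1.0 / lam, size=n)    # interarrival times
    S = theta * rng.exponential(1.0, size=n)  # service times
    W = dW = 0.0
    W_sum = dW_sum = 0.0
    for k in range(n - 1):
        x = W + S[k] - A[k]
        if x > 0.0:
            W, dW = x, dW + S[k] / theta  # same busy period: accumulate
        else:
            W, dW = 0.0, 0.0              # idle period resets the perturbation
        W_sum += W
        dW_sum += dW
    return W_sum / n, dW_sum / n  # J(theta) and the IPA estimate of dJ/dtheta

Because the perturbation is accumulated along the observed sample path, the derivative estimate comes from the same single run that produces the performance estimate itself.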
Using the score function method, the gradient can be estimated simultaneously, at any number of
different parameter values, in a single-run simulation. The basic idea is that the gradient of the
performance measure function, J'( v ), is expressed as an expectation with respect to the same
distribution as the performance measure function itself. Therefore, the sensitivity information can be
obtained with little computational (not simulation) cost, while estimating the performance measure. It is
well-known that the crude form of the SF estimator suffers from the problem of linear growth in its
variance as the simulation run length increases. However, in steady-state simulation the variance can be
controlled by run length. Furthermore, information about the variance may be incorporated into the
simulation algorithm. A recent flurry of activity has attempted to improve the accuracy of the SF
estimates. Under regenerative conditions, the estimator can easily be modified to alleviate this problem,
yet the magnitude of the variance may be large for queueing systems with heavy traffic intensity. The
heuristic idea is to treat each component of the system (e.g., each queue) separately, which
implicitly assumes that individual components have "local" regenerative cycles. This approach is
promising since the estimator remains unbiased and efficient while the global regenerative cycle is very
long.
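A minimal Python sketch of the score function estimator, assuming (purely for illustration) an exponentially distributed input with rate v and the identity performance measure: since f(y; v) = v exp(-v y), the score is d ln f/dv = 1/v - y, and J'(v) = E[Z(Y) S(Y; v)] is estimated from the same run as J(v).

import numpy as np

rng = np.random.default_rng(seed=4)

def sf_estimates(v, n=100_000):
    y = rng.exponential(scale=1.0 / v, size=n)
    z = y                   # hypothetical performance measure Z(Y)
    score = 1.0 / v - y     # S(y; v) = d ln f(y; v) / dv for Exp(rate v)
    return z.mean(), (z * score).mean()  # estimates of J(v) and J'(v)

# Sanity check for this toy case: J(v) = 1/v, so J'(v) = -1/v**2.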
Now we look at the general (non-regenerative) case. In this case any simulation will give a biased
estimator of the gradient, as simulations are necessarily finite. If n (the length of the simulation) is large
enough, this bias is negligible. However, as noted earlier, the variance of the SF sensitivity estimator
increases with n, so a crude SF estimator is not even approximately consistent. There are a
number of ways to attack this problem. Most of the variation in the estimator comes from the score
function. The variation is especially high when all past inputs contribute to the performance and the
scores from all are included. When one uses batch means, the variation is reduced by keeping the length
of the batch small.
A second way is to reduce the variance of the score to such an extent that we can use simulations long
enough to effectively eliminate the bias. This is the most promising approach. The variance may be
reduced further by using the standard variance reduction techniques (VRT), such as importance
sampling. Finally, we can simply use a large number of iid replications of the simulation.
Harmonic analysis
Another strategy for estimating the gradient in simulation is based on the frequency domain method, which
differs from the time domain experiments in that the input parameters are deterministically varied in
sinusoidal patterns during the simulation run, as opposed to being kept fixed as in the time domain runs.
The range of possible values for each input factor should be identified. Then the values of each input
factor within its defined range should be changed during a run. In time series analysis, t is the time
index. In simulation, however, t is not necessarily the simulation clock time. Rather, t is a variable of the
model, which keeps track of certain statistics during each run. For example, to generate the inter-arrival
times in a queueing simulation, t might be the variable that counts customer arrivals.
Frequency domain simulation experiments identify the significant terms of the polynomial that
approximates the relationship between the simulation output and the inputs. Clearly, the number of
simulation runs required to identify the important terms by this approach is much smaller than those of
the competing alternatives, and the difference becomes even more conspicuous as the number of
parameters increases.
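The following Python sketch illustrates the idea on a toy model; the oscillation amplitude a, the frequency omega, and the noisy response function are all hypothetical choices made only for this example.

import numpy as np

rng = np.random.default_rng(seed=5)

def fd_sensitivity(sim, v0, a=0.05, omega=0.3, T=10_000):
    # Frequency-domain experiment: oscillate the input about v0 at a known
    # frequency and pick the response out of the output at that frequency.
    # For a locally linear response, the regression coefficient of y_t on
    # sin(omega t) estimates a * dJ/dv.
    t = np.arange(T)
    v_t = v0 + a * np.sin(omega * t)
    y_t = np.array([sim(v) for v in v_t])
    b = 2.0 * np.mean(y_t * np.sin(omega * t))  # amplitude at omega
    return b / a                                # estimate of dJ/dv

# Toy "simulation": noisy J(v) = 3/(4v); the estimate should be near -3 at v0 = 0.5.
est = fd_sensitivity(lambda v: 3 / (4 * v) + 0.1 * rng.standard_normal(), v0=0.5)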
PA and SF (or LR) can be unified. Further comparison of the PA and SF approaches reveals several
interesting differences. Both approaches require an interchange of expectation and differentiation.
However, the conditions for this interchange in PA depend heavily on the nature of the problem, and
must be verified for each application, which is not the case in SF. Therefore, in general, it is easier to
satisfy the SF unbiasedness conditions. PA assumes that the order of events in the perturbed path is the same as
the order in the nominal path, for a small enough change in v , allowing the computation of the
sensitivity of the sample performance for a particular simulation. For example, if the performance
measure is the mean number of customers in a busy period, the PA estimate of the gradient with respect
to any parameter is zero! The number of customers per busy period will not change if the order of events
does not change.
In terms of ease of implementation, PA estimators may require considerable analytical work on the part
of the algorithm developer, with some "customization" for each application, whereas SF has the advantage
of remaining a generally definable algorithm whenever it can be applied.
Perhaps the most important criterion for comparison lies in the question of accuracy of an estimator,
typically measured through its variance. If an estimator is strongly consistent, its variance is gradually
reduced over time and ultimately approaches zero. The speed with which this happens may be
extremely important. Since in practice, decisions normally have to be made in a limited time, an
estimator whose variance decreases fast is highly desirable. In general, when PA does provide unbiased
estimators, the variance of these estimators is small. PA fully exploits the structure of DES and their
state dynamics by extracting the needed information from the observed sample path, whereas SF
requires no knowledge of the system other than the inputs and the outputs. Therefore when using SF
methods, variance reduction is necessary. The question is whether or not the variance can be reduced
enough to make the SF estimator useful in all situations to which it can be applied. The answer is
certainly yes. Using the standard variance reduction techniques can help, but the most dramatic variance
reduction occurs using new methods of VR such as conditioning, which is shown numerically to have a
mean squared error that is essentially the same as that of PA.
Discrete event simulation is the primary analysis tool for designing complex systems. Simulation,
however, must be linked with optimization techniques to be effectively used for systems design. We
present several optimization techniques involving both continuous and discrete controllable input
parameters subject to a variety of constraints. The aim is to determine the techniques most promising for
a given simulation model.
Many man-made systems can be modeled as Discrete Event Systems (DES); examples are computer
systems, communication networks, flexible manufacturing systems, production assembly lines, and
traffic transportation systems. DES evolve with the occurrence of discrete events, such as the arrival of a
job or the completion of a task, in contrast with continuously variable dynamic processes such as
aerospace vehicles, which are primarily governed by differential equations. Owing to the complex
dynamics resulting from stochastic interactions of such discrete events over time, the performance
analysis and optimization of DES can be difficult tasks. At the same time, since such systems are
becoming more widespread as a result of modern technological advances, it is important to have tools
for analyzing and optimizing the parameters of these systems.
Analyzing complex DES often requires computer simulation. In these systems, the objective function
may not be expressible as an explicit function of the input parameters; rather, it involves some
performance measures of the system whose values can be found only by running the simulation model
or by observing the actual system. On the other hand, due to the increasingly large size and inherent
complexity of most man-made systems, purely analytical means are often insufficient for optimization.
In these cases, one must resort to simulation, with its chief advantage being its generality, and its
primary disadvantage being its cost in terms of time and money. Even though, in principle, some
systems are analytically tractable, the analytical effort required to evaluate the solution may be so
formidable that computer simulation becomes attractive. While the price of computing resources
continues to decrease dramatically, one can nevertheless still obtain only a statistical estimate, as opposed
to an exact solution. For practical purposes, this is quite sufficient.
These man-made DES are costly, and therefore it is important to operate them as efficiently as possible.
The high cost makes it necessary to find more efficient means of conducting simulation and optimizing
its output. We consider optimizing an objective function with respect to a set of continuous and/or
discrete controllable parameters subject to some constraints.
The above figure illustrates the feedback-loop concept. Although feedback is a systemic concept rather
than a simulation concept, whatever paradigm we use, one can always incorporate feedback. For
example, consider a discrete event system (DES) model that employs resources to achieve certain
tasks/processes; by incorporating decision rules regarding how to manage the stocks, and hence how the
resources will be deployed depending on the stock level, the system structure clearly contains
feedback loops.
When modelers choose a DES approach, they often model the system as an open-loop or nearly
open-loop system, making the system behave as if there were no superior agent controlling the whole
production/service process. Closing the loops should be an elementary task that the simulation modeler
takes care of; even if the project scope does not require it, there must be awareness of the system's
behavior, particularly if the system is known to be under human decision-making
processes/activities.
In almost all simulation models, an expected value can express the system's performance. Consider a
system with continuous parameter v ∈ V, where V is the feasible region. Let
J(v) = EY | v [Z (Y)]
be the steady-state expected performance measure, where Y is a random vector whose known probability
density function (pdf) f(y; v) depends on v, and Z is the performance measure.
In discrete event systems, Monte Carlo simulation is usually needed to estimate J(v) for a given value
v = v0. By the law of large numbers, the sample mean

Jn(v0) = Σi Z(yi) / n

converges to the true value, where yi, i = 1, 2, ..., n are independent, identically distributed random
vector realizations of Y from f (y; v 0), and n is the number of independent replications.
The aim is to optimize J(v) with respect to v .
We shall group the optimization techniques for simulation into seven broad categories; namely,
Deterministic Search, Pattern Search, Probabilistic Search, Evolutionary Techniques, Stochastic
Approximation, Gradient Surface, and some mixtures of these techniques.
A common characteristic of deterministic search techniques is that they are basically borrowed from
deterministic optimization techniques. The deterministic objective function value required in the
technique is now replaced with an estimate obtained from simulation. By having a reasonably accurate
estimate, one hopes that the technique will perform well.
Deterministic search techniques include heuristic search, complete enumeration, and random search
techniques.
The heuristic search technique is probably most commonly used in optimizing response surfaces. It is
also the least sophisticated scheme mathematically, and it can be thought of as an intuitive and
experimental approach. The analyst determines the starting point and stopping rule based on previous
experience with the system. After setting the input parameters (factors) to levels that appear reasonable,
the analyst makes a simulation run with the factors set at those levels and computes the value of the
response function. If it appears to be a maximum (minimum) to the analyst, the experiment is stopped.
Otherwise the analyst changes parameter settings and makes another run. This process continues until
the analyst believes that the output has been optimized. Suffice it to say that, if the analyst is not
intimately familiar with the process being simulated, this procedure can turn into a blind search and can
expend an inordinate amount of time and computer resources without producing results commensurate
with input. The heuristic search can be ineffective and inefficient in the hands of a novice.
The complete enumeration technique is not applicable to continuous cases, but in discrete space v it does
yield the optimal value of the response variable. All factors ( v ) must assume a finite number of values
for this technique to be applicable. Then, a complete factorial experiment is run. The analyst can
attribute some degree of confidence to the determined optimal point when using this procedure.
Although the complete enumeration technique yields the optimal point, it has a serious drawback. If the
number of factors or levels per factor is large, the number of simulation runs required to find the optimal
point can be exceedingly large. For example, suppose that an experiment is conducted with three factors
having three, four, and five levels, respectively. Also suppose that five replications are desired to
provide the proper degree of confidence. Then 300 runs of the simulator are required to find the optimal
point. Hence, this technique should be used only when the number of unique treatment combinations is
relatively small or a run takes little time.
The random search technique resembles the complete enumeration technique except that one selects a
set of inputs at random. The input set that yields the maximum (minimum) estimated
value of the response function is taken to be the optimal point. This procedure reduces the number of
simulation runs required to yield an 'optimal' result; however, there is no guarantee that the point found
is actually the optimal point. Of course, the more points selected, the more likely the analyst is to
achieve the true optimum. Note that the requirement that each factor assumes only a finite number of
values is not a requirement in this scheme. Replications can be made on the treatment combinations
selected, to increase the confidence in the optimal point. Which strategy is better, replicating a few
points or looking at a single observation on more points, depends on the problem.
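A minimal Python sketch of this random search scheme over a box-shaped region; the number of sampled points, the replication count, and the minimization convention are arbitrary choices made for illustration.

import numpy as np

rng = np.random.default_rng(seed=6)

def random_search(J_hat, bounds, n_points=200, n_reps=5):
    # Sample candidate inputs uniformly over the search region, replicate
    # each simulation n_reps times, and keep the best averaged response.
    lo, hi = np.asarray(bounds[0]), np.asarray(bounds[1])
    best_v, best_J = None, np.inf
    for _ in range(n_points):
        v = lo + (hi - lo) * rng.random(lo.size)
        J_bar = np.mean([J_hat(v) for _ in range(n_reps)])
        if J_bar < best_J:          # minimization; flip for maximization
            best_v, best_J = v, J_bar
    return best_v, best_J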
Although it is desirable for search procedures to be efficient over a wide range of response surfaces, no
current procedure can effectively overcome non-unimodality (surfaces having more than one local
maximum or minimum). An obvious way to find the global optimal would be to evaluate all the local
optima. One technique that is used when non-unimodality is known to exist, is called the "Las Vegas"
technique. This search procedure estimates the distribution of the local optima by plotting the estimated
J( v ) for each local search against its corresponding search number. Those local searches that produce a
response greater than any previous response are then identified and a curve is fitted to the data. This
curve is then used to project the "estimated incremental" response that will be achieved by one more
search. The search continues until the value of the estimated improvement in the search is less than the
cost of completing one additional search.
It should be noted that a well-designed experiment requires a sufficient number of replications so that
the average response can be treated as a deterministic number for search comparisons. Otherwise, since
replications are expensive, it becomes necessary to effectively utilize the number of simulation runs.
Although each simulation is at a different setting of the controllable variables, one can use smoothing
techniques such as exponential smoothing to reduce the required number of replications.
Pattern search techniques assume that any successful set of moves used in searching for an
approximated optimum is worth repeating. These techniques start with small steps; then, if these are
successful, the step size increases. Alternatively, when a sequence of steps fails to improve the objective
function, this indicates that shorter steps are appropriate so we may not overlook any promising
direction. These techniques start by initially selecting a set of incremental values for each factor.
Starting at an initial base point, they check if any incremental changes in the first variable yield an
improvement. The resulting improved setting becomes the new intermediate base point. One repeats the
process for each of the inputs until one obtains a new setting where the intermediate base points act as
the initial base point for the first variable. The technique then moves to the new setting. This procedure
is repeated, until further changes cannot be made with the given incremental values. Then, the
incremental values are decreased, and the procedure is repeated from the beginning. When the
incremental values reach a pre-specified tolerance, the procedure terminates; the most recent factor
settings are reported as the solution.
The conjugate direction search requires no derivative estimation, yet it finds the optimum of an
N-dimensional quadratic surface after at most N iterations. The procedure redefines the n dimensions so that a single variable
search can be used successively. Single variable procedures can be used whenever dimensions can be
treated independently. The optimization along each dimension leads to the optimization of the entire
surface.
Two directions are defined to be conjugate whenever the cross-product terms are all zero. The conjugate
direction technique tries to find a set of n dimensions that describes the surface such that each direction
is conjugate to all others.
Using the above result, the technique attempts to find two search optima and replace the nth dimension of
the quadratic surface by the direction specified by the two optimal points. Successively replacing the
original dimension yields a new set of n dimensions in which, if the original surface is quadratic, all
directions are conjugate to each other and appropriate for n single variable searches. While this search
procedure appears to be very simple, we should point out that the selection of appropriate step sizes is
most critical. The step size selection is more critical for this search technique because - during axis
rotation - the step size does not remain invariant in all dimensions. As the rotation takes place, the best
step size changes, and becomes difficult to estimate.
Although quadratic functions are sometimes used, one assumes that performance is linearly related to
the change in the controllable variables for small changes. Assume that a good approximation is a linear
form. The basis of the linear steepest ascent is that each controllable variable is changed in proportion to
the magnitude of its slope. When each controllable variable is changed by a small amount, it is
analogous to determining the gradient at a point. For a surface containing N controllable variables, this
requires N points around the point of interest. When the problem is not an n-dimensional elliptical
surface, the parallel-tangent points are extracted from bitangents and inflection points of occluding
contours. Parallel tangent points are points on the occluding contour where the tangent is parallel to a
given bitangent or the tangent at an inflection point.
An effective technique to overcome local optimality for discrete optimization is the Tabu Search
technique. It explores the search space by moving from a solution to its best neighbor, even if this results
in a deterioration of the performance measure value. This approach increases the likelihood of moving
out of local optima. To avoid cycling, solutions that were recently examined are declared tabu (Taboo)
for a certain number of iterations. Applying intensification procedures can accentuate the search in a
promising region of the solution space. In contrast, diversification can be used to broaden the search to a
less explored region. Much remains to be discovered about the range of problems for which the tabu
search is best suited.
The Hooke and Jeeves pattern search uses two kinds of moves; namely, an exploratory and a pattern
move. The exploratory move is accomplished by doing a coordinate search in one pass through all the
variables. This gives a new "base point" from which a pattern move is made. A pattern move is a jump
in the pattern direction determined by subtracting the current base point from the previous base point.
After the pattern move, another exploratory move is carried out at the point reached. If the estimate of
J(v) is improved at the final point after the second exploratory move, it becomes the new base point. If it
fails to show improvement, an exploratory move is carried out at the last base point with a smaller step
in the coordinate search. The process stops when the step gets "small" enough.
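A minimal Python sketch of the Hooke and Jeeves exploratory/pattern moves; it assumes J_hat is an averaged (nearly deterministic) simulation response, in line with the earlier remark that search comparisons require well-replicated estimates.

import numpy as np

def hooke_jeeves(J_hat, v0, step=0.5, tol=1e-3, shrink=0.5):
    # Exploratory move: a coordinate search in one pass through all variables.
    def explore(v, s):
        v = v.copy()
        for i in range(v.size):
            for d in (+s, -s):
                trial = v.copy()
                trial[i] += d
                if J_hat(trial) < J_hat(v):
                    v = trial
                    break
        return v

    base = np.asarray(v0, dtype=float)
    while step > tol:
        new = explore(base, step)
        if J_hat(new) < J_hat(base):
            # Pattern move: jump along the direction (new - base), then explore.
            pattern = explore(new + (new - base), step)
            base = pattern if J_hat(pattern) < J_hat(new) else new
        else:
            step *= shrink  # no improvement: shorten the step
    return base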
Simplex-based techniques
The simplex-based technique performs simulation runs first at the vertices of the initial simplex; i.e., a
polyhedron in the v-space having N+1 vertices. A subsequent simplex (moving towards the optimum)
is formed by three operations performed on the current simplex: reflection, contraction, and expansion.
At each stage of the search process, the point with the highest J(v) is replaced with a new point found via
reflection through the centroid of the simplex. Depending on the value of J(v) at this new point, the
simplex is either expanded, contracted, or unchanged. The simplex technique starts with a set of N+1
factor settings. These N+1 points are all the same distance from the current point. Moreover, the
distance between any two points of these N+1 points is the same. Then, by comparing their response
values, the technique eliminates the factor setting with the worst functional value and replaces it with a
new factor setting, determined by the centroid of the N remaining factor settings and the eliminated
factor setting. The resulting simplex either grows or shrinks, depending on the response value at the new
factor settings. One repeats the procedure until no more improvement can be made by eliminating a
point, and the resulting final simplex is small. While this technique will generally perform well for
unconstrained problems, it may collapse to a point on a boundary of a feasible region, thereby causing
the search to come to a premature halt. This technique is effective if the response surface is generally
bowl-shaped, even with some local optimal points.
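Since the Nelder-Mead simplex algorithm is the best-known implementation of these reflection/contraction/expansion rules, one can sketch its use on an averaged simulation response with SciPy; the toy quadratic response and noise level below are invented purely for illustration.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=7)

def J_hat(v, n_reps=50):
    # Averaged simulation response at setting v (toy noisy bowl here).
    noise = rng.standard_normal(n_reps).mean()
    return (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2 + 0.1 * noise

result = minimize(J_hat, x0=np.zeros(2), method="Nelder-Mead",
                  options={"xatol": 1e-2, "fatol": 1e-2})
print(result.x)  # should land near (1, -2)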
All probabilistic search techniques select trial points governed by a scan distribution, which is the main
source of randomness. These search techniques include random search, pure adaptive techniques,
simulated annealing, and genetic methods.
Random search
A simple, but very popular, approach is the random search, which centers a symmetric probability
density function (pdf), e.g., the normal distribution, about the current best location. The standard
normal N(0, 1) is a popular choice, although the uniform distribution U[-1, 1] is also common.
A variation of the random search technique determines the maximum of the objective function by
analyzing the distribution of J(v) in the bounded sub-region. In this variation, the random data are fitted
to an asymptotic extreme-value distribution, and J* is estimated with a confidence statement.
Unfortunately, these techniques cannot determine the location of J* , which can be as important as the J
value itself. Some techniques calculate the mean value and the standard deviation of J(v) from the
random data as they are collected. Assuming that J is distributed normally in the feasible region, the
first trial that yields a J-value within two standard deviations of the mean value is taken as a
near-optimum solution.
Various pure adaptive search techniques have been suggested for optimization in simulation. Essentially,
these techniques move from the current solution to the next solution that is sampled uniformly from the
set of all better feasible solutions.
Evolutionary Techniques
Nature is a robust optimizer. By analyzing nature's optimization mechanism we may find acceptable
solution techniques to intractable problems. Two concepts that show the most promise are simulated
annealing and the genetic techniques.
Simulated annealing
Simulated annealing (SA) borrows its basic ideas from statistical mechanics. As a metal cools, the
electrons align themselves in an optimal pattern for the transfer of energy. In general, a slowly cooling
system, left to itself, eventually finds the arrangement of atoms which has the lowest energy. This is the
behavior that motivates the method of optimization by SA. In SA we construct a model of a system
and slowly decrease the "temperature" of this theoretical system, until the system assumes a minimal
energy structure. The problem is how to map our particular problem to such an optimizing scheme.
SA as an optimization technique was first introduced to solve problems in discrete optimization, mainly
combinatorial optimization. Subsequently, this technique has been successfully applied to solve
optimization problems over the space of continuous decision variables. SA is a simulation optimization
technique that allows random ascent moves in order to escape the local minima, but a price is paid in
terms of a large increase in the computational time required. It can be proven that the technique will find
an approximated optimum. The annealing schedule might require a long time to reach a true optimum.
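A minimal Python sketch of SA with a Metropolis acceptance rule and geometric cooling; the neighborhood size, initial temperature, and cooling rate are illustrative choices, and J_hat is a placeholder for the (estimated) objective.

import numpy as np

rng = np.random.default_rng(seed=8)

def simulated_annealing(J_hat, v0, T0=1.0, cooling=0.995, n_iter=5_000):
    v = np.asarray(v0, dtype=float)
    J_v = J_hat(v)
    best_v, best_J, T = v, J_v, T0
    for _ in range(n_iter):
        cand = v + 0.1 * rng.standard_normal(v.size)  # random neighbor
        J_c = J_hat(cand)
        # Accept downhill moves always; accept uphill (random ascent) moves
        # with the Metropolis probability exp(-(J_c - J_v)/T), which is how
        # SA escapes local minima.
        if J_c < J_v or rng.random() < np.exp(-(J_c - J_v) / T):
            v, J_v = cand, J_c
            if J_v < best_J:
                best_v, best_J = v, J_v
        T *= cooling  # the annealing schedule: slowly decrease temperature
    return best_v, best_J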
Genetic techniques
Genetic techniques (GT) are optimizers that use the ideas of evolution to optimize a system that is too
difficult for traditional optimization techniques. Organisms are known to optimize themselves to adapt
to their environment.
GT differ from traditional optimization procedures in that GT work with a coding of the decision
parameter set, not the parameters themselves; GT search a population of points, not a single point; GT
use objective function information, not derivatives or other auxiliary knowledge; and finally, GT use
probabilistic transition rules, not deterministic rules. GT are probabilistic search optimizing techniques
that do not require mathematical knowledge of the response surface of the system, which they are
optimizing. They borrow the paradigms of genetic evolution, specifically selection, crossover, and
mutation, which are defined next and sketched in code below.
Selection: The current points in the space are ranked in terms of their fitness by their respective response
values. A probability is assigned to each point that is proportional to its fitness, and parents (a mating
pair) are randomly selected.
Crossover: The new point, or offspring, is chosen, based on some combination of the genetics of the two
parents.
Mutation: The location of an offspring is also susceptible to mutation, a process, occurring with
probability p, by which an offspring is randomly replaced by a new offspring location.
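Below is a minimal Python sketch of one generation of these three operators; the fitness-proportional selection, arithmetic crossover, and Gaussian mutation shown here are common textbook choices, not the only possibilities.

import numpy as np

rng = np.random.default_rng(seed=9)

def genetic_step(pop, J_hat, p_mut=0.1, sigma=0.1):
    # One generation: fitness-proportional selection, arithmetic crossover,
    # and Gaussian mutation. Minimization, so fitness = shifted negative response.
    J = np.array([J_hat(v) for v in pop])
    fit = J.max() - J + 1e-9
    prob = fit / fit.sum()
    new_pop = []
    for _ in range(len(pop)):
        i, j = rng.choice(len(pop), size=2, p=prob)  # select a mating pair
        w = rng.random()
        child = w * pop[i] + (1 - w) * pop[j]        # crossover
        if rng.random() < p_mut:                     # mutation
            child = child + sigma * rng.standard_normal(child.size)
        new_pop.append(child)
    return np.array(new_pop)                         # parents are all replaced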
A generalized GT generates p new offspring at once and kills off all of the parents. This modification is
important in the simulation environment. GT are well suited for qualitative or policy decision
optimization such as selecting the best queuing disciplines or network topologies. They can be used to
help determine the design of the system and its operation. GT have been applied, for example, to
inventory systems, job-shop scheduling, and computer time-sharing problems. GT do not have certain
shortcomings of other optimization techniques, and they will usually result in better calculated optima
than those found with traditional techniques. They can search a response surface with many local optima and find (with a
high probability) the approximate global optimum. One may use GT to find an area of potential interest,
and then resort to other techniques to find the optimum. Recently, several classical GT principles have
been challenged.
Differential Evolution: Differential Evolution (DE) is a genetic type of algorithm for solving continuous
stochastic function optimization. The basic idea is to use vector differences for perturbing the vector
population. DE adds the weighted difference between two population vectors to a third vector. This way,
no separate probability distribution has to be used, which makes the scheme completely self-organizing.
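A minimal Python sketch of one DE generation; the weight F, crossover rate CR, and greedy selection are the standard ingredients of basic DE, with values chosen arbitrarily here.

import numpy as np

rng = np.random.default_rng(seed=10)

def de_step(pop, J_hat, F=0.8, CR=0.9):
    # One DE generation: for each target vector, add the weighted difference
    # of two population vectors to a third to form a mutant, crossover, and
    # keep the better of target and trial (greedy selection).
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        a, b, c = rng.choice([k for k in range(n) if k != i], 3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True        # ensure at least one component crosses
        trial = np.where(cross, mutant, pop[i])
        if J_hat(trial) <= J_hat(pop[i]):
            new_pop[i] = trial
    return new_pop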
A short comparison
When performing search techniques in general, and simulated annealing or genetic techniques
specifically, the question of how to generate the initial solution arises. Should it be based on a heuristic
rule or on a randomly generated one? Theoretically, it should not matter, but in practice this may depend
on the problem. In some cases, a pure random solution systematically produces better final results. On
the other hand, a good initial solution may lead to lower overall run times. This can be important, for
example, in cases where each iteration takes a relatively long time; therefore, one has to use some clever
termination rule. Simulation time is a crucial bottleneck in an optimization process. In many cases, a
simulation is run several times with different initial solutions. Such a technique is most robust, but it
requires the maximum number of replications compared with all other techniques. The pattern search
technique applied to small problems with no constraints or qualitative input parameters requires fewer
replications than the GT. GT, however, can easily handle constraints, and have lower computational
complexity. Finally, simulated annealing can be embedded within the Tabu search to construct a
probabilistic technique for global optimization.
Two related stochastic approximation techniques have been proposed, one by Robbins and Monro and
one by Kiefer and Wolfowitz. The first technique was not useful for optimization until an unbiased
estimator for the gradient was found. Kiefer and Wolfowitz developed a procedure for optimization
using finite differences. Both techniques are useful in the optimization of noisy functions, but they did
not receive much attention in the simulation field until recently. Generalizations and refinements of
stochastic approximation procedures give rise to weighted-average and stochastic quasi-gradient
methods. These deal with constraints, non-differentiable functions, and some classes of non-convex
functions, among other things.
Kiefer and Wolfowitz proposed a finite difference approximation to the derivative. One version of the
Kiefer-Wolfowitz technique uses two-sided finite differences. The first fact to notice about the K-W
estimate is that it requires 2N simulation runs, where N is the dimension of the vector parameter v: if the
decision maker is interested in gradient estimation with respect to each of the components of v, two
simulations must be run for each component. This is inefficient. The second fact is that it may have
a very poor variance, and it may result in numerical calculation difficulties.
The original Robbins-Monro (R-M) technique is not an optimization scheme, but rather a root finding
procedure for functions whose exact values are not known but are observed with noise. Its application to
optimization is immediate: use the procedure to find the root of the gradient of the objective function.
Interest was renewed in the R-M technique as a means of optimization, with the development of the
perturbation analysis, score function (known also as likelihood ratio method), and frequency domain
estimates of derivatives. Optimization for simulated systems based on the R-M technique is known as a
"single-run" technique. These procedures optimize a simulation model in a single run simulation with a
run length comparable to that required for a single iteration step in the other methods. This is achieved
essentially by observing the sample values of the objective function and, based on these observations,
updating the values of the controllable parameters while the simulation is running, that is, without
restarting the simulation. This observing-updating sequence is done repeatedly, leading to an estimate of
the optimum at the end of a single-run simulation. Besides having the potential of large computational
savings, this technique can be a powerful tool in real-time optimization and control, where observations
are taken as the system is evolving in time.
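A minimal Python sketch of such a single-run recursion; grad_hat stands for any single-run gradient estimate (PA, SF, or SPSA, as sketched earlier), and the gain sequence a/k is one standard choice satisfying the usual step-size conditions.

import numpy as np

def rm_optimize(grad_hat, v0, n_iter=500, a=1.0):
    # Robbins-Monro recursion for root-finding on the gradient:
    #   v_{k+1} = v_k - a_k * g_k,  with a_k = a / k (sum a_k diverges,
    # sum a_k**2 converges), where g_k is a single-run gradient estimate
    # obtained while the simulation is running, without restarting it.
    v = np.asarray(v0, dtype=float)
    for k in range(1, n_iter + 1):
        g = grad_hat(v)
        v = v - (a / k) * g   # projection onto constraints, if any, is omitted
    return v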
Gradient surface method
One may combine the gradient-based techniques with the response surface methods (RSM) for
optimization purposes. One constructs a response surface with the aid of n response points and the
components of their gradients.
The gradient surface method (GSM) combines the virtue of RSM with that of the single- run, gradient
estimation techniques such as Perturbation Analysis, and Score Function techniques. A single simulation
experiment with little extra work yields N + 1 pieces of information; i.e., one response point and N
components of the gradient. This is in contrast to crude simulation, where only one piece of information,
the response value, is obtained per experiment. Thus, by taking advantage of the computational
efficiency of single-run gradient estimators, roughly N-fold fewer experiments will be needed to fit a
global surface compared to the RSM. At each step, instead of using Robbins-Monro techniques to locate
the next point locally, we determine a candidate for the next point globally, based on the current global
fit to the performance surface.
The GSM approach has the following advantages. The technique can quickly get to the vicinity of the
optimal solution because its orientation is global [23, 39]; thus, it produces satisfying solutions quickly.
Like RSM, it uses all accumulated information. In addition, it uses gradient surface fitting, rather
than direct performance response-surface fitting, via single-run gradient estimators; this significantly
reduces the computational effort compared with RSM. Similar to RSM, GSM is less sensitive to
estimation error and local optimality. Finally, it is an on-line technique: the technique may be
implemented while the system is running.
A typical optimization scheme involves two phases: a Search Phase and an Iteration Phase. Most results
in analytic computational complexity assume that good initial approximations are available, and deal
with the iteration phase only. If enough time is spent in the initial search phase, we can reduce the time
needed in the iteration phase. The literature contains papers giving conditions for the convergence of a
process; a process has to be more than convergent in order to be computationally interesting. It is
essential that we be able to limit the cost of computation. In this sense, GSM can be thought of as
helping the search phase and as an aid to limit the cost of computation. One can adopt standard or simple
devices for issues such as stopping rules.
For on-line optimization, one may use a new design in GSM called 'single direction' design. Since for
on-line optimization it may not be advisable or feasible to disturb the system, random design usually is
not suitable.
Post-solution analysis
Stochastic models typically depend upon various uncertain and uncontrollable input parameters that
must be estimated from existing data sets. We focus on the statistical question of how input-parameter
uncertainty propagates through the model into output-parameter uncertainty. The sequential stages are
descriptive, prescriptive and post-prescriptive analysis.
Large deviations can be used to estimate the probability of rare events, such as buffer overflow, in
queueing networks. It is simple enough to be applied to very general traffic models, and sophisticated
enough to give insight into complex behavior.
Simulation has numerous advantages over other approaches to performance and dependability
evaluation; most notably, its modelling power and flexibility. For some models, however, a potential
problem is the excessive simulation effort (time) required to achieve the desired accuracy. In particular,
simulation of models involving rare events, such as those used for the evaluation of communications and
highly-dependable systems, is often not feasible using standard techniques. In recent years, there have
been significant theoretical and practical advances towards the development of efficient simulation
techniques for the evaluation of these systems.
Methodologies include techniques based on importance sampling, the "restart" method, and hybrid
analytic/simulation techniques, among newly devised approaches.
With the growing incidence of computer modeling and simulation, the scope of simulation domain must
be extended to include much more than traditional optimization techniques. Optimization techniques for
simulation must also account specifically for the randomness inherent in estimating the performance
measure and satisfying the constraints of stochastic systems. We described the most widely used
optimization techniques that can be effectively integrated with a simulation model. We also described
techniques for post-solution analysis with the aim of theoretical unification of the existing techniques.
All techniques were presented in step-by-step format to facilitate implementation in a variety of
operating systems and computers, thus improving portability.
General comparisons among different techniques in terms of bias, variance, and computational
complexity are not possible. However, a few studies rely on real computer simulations to compare
different techniques in terms of accuracy and number of iterations. The total computational effort for
reduction in both the bias and variance of the estimate depends on the computational budget allocated for
a simulation optimization. No single technique works effectively and/or efficiently in all cases.
The simplest technique is the random selection of some points in the search region for estimating the
performance measure. In this technique, one usually fixes the number of simulation runs and takes the
smallest (or largest) estimated performance measure as the optimum. This technique is useful in
combination with other techniques to create a multi-start technique for global optimization. The most
effective technique to overcome local optimality for discrete optimization is the Tabu Search technique.
In general, the probabilistic search techniques, as a class, offer several advantages over other
optimization techniques based on gradients. In the random search technique, the objective function can
be non-smooth or even have discontinuities. The search program is simple to implement on a computer,
and it often shows good convergence characteristics in noisy environments. More importantly, it can
offer the global solution in a multi-modal problem, if the technique is employed in the global sense.
Convergence proofs under various conditions are given in the literature.
The Hooke-Jeeves search technique works well for unconstrained problems with less than 20 variables;
pattern search techniques are more effective for constrained problems. Genetic techniques are most
robust and can produce near-best solutions for larger problems. The pattern search technique is most
suitable for small size problems with no constraint, and it requires fewer iterations than the genetic
techniques. The most promising techniques are the stochastic approximation, simultaneous perturbation,
and the gradient surface methods. Stochastic approximation techniques using perturbation analysis,
score function, or simultaneous perturbation gradient estimators, optimize a simulation model in a single
simulation run. They do so by observing the sample values of the objective function, and based on these
observations, the stochastic approximation techniques update the values of the controllable parameters
while the simulation is running and without restarting the simulation. This observing-updating sequence,
done repeatedly, leads to an estimate of the optimum at the end of a single-run simulation. Besides
having the potential of large savings in computational effort in the simulation environment, this
technique can be a powerful tool in real-time optimization and control, where observations are taken as
the system is evolving over time.
Response surface methods have a slow convergence rate, which makes them expensive. The gradient
surface method combines the advantages of the response surface methods (RSM) and efficiency of the
gradient estimation techniques, such as infinitesimal perturbation analysis, score function, simultaneous
perturbation analysis, and frequency domain technique. In the gradient surface method (GSM) the
gradient is estimated, and the performance gradient surface is estimated from observations at various
points, similar to the RSM. Zero points of the successively approximating gradient surface are then
taken as the estimates of the optimal solution. GSM is characterized by several attractive features: it is a
single-run technique and more efficient than RSM; at each iteration step, it uses the information from all
of the data points rather than just the local gradient; and it tries to capture the global features of the
gradient surface and thereby quickly arrive in the vicinity of the optimal solution. Close to the optimum,
however, gradient-based techniques take many iterations to converge to stationary points. Search
techniques are therefore more suitable as a
second phase. The main interest is to figure out how to allocate the total available computational budget
across the successive iterations.
When the decision variable is qualitative, such as finding the best system configuration, a random or
permutation test is proposed. This technique starts with the selection of an appropriate test statistic, such
as the absolute difference between the mean responses under two scenarios. The test value is computed
for the original data set. The data are shuffled (using a different seed); the test statistic is computed for
the shuffled data; and the value is compared to the value of the test statistic for the original, un-shuffled
data. If the statistics for the shuffled data are greater than or equal to the actual statistic for the original
data, then a counter c, is incremented by 1. The process is repeated for any desired m number of times.
The final step is to compute (c+1)/(m+1), which is the significance level of the test. The null hypothesis is
rejected if this significance level is less than or equal to the specified rejection level for the test. There
are several important aspects to this nonparametric test. First, it enables the user to select the statistic.
Second, assumptions such as normality or equality of variances made for the t-test, ranking-and-
selection, and multiple-comparison procedures, are no longer needed. A generalization is the well-
known bootstrap technique.
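A minimal Python sketch of this permutation test for two scenarios; the absolute difference of the mean responses is used as the test statistic, as in the description above.

import numpy as np

rng = np.random.default_rng(seed=11)

def permutation_test(x, y, m=999):
    # Nonparametric test of H0: no difference between scenarios x and y.
    t_obs = abs(x.mean() - y.mean())        # statistic on the original data
    pooled = np.concatenate([x, y])
    c = 0
    for _ in range(m):
        rng.shuffle(pooled)                 # shuffle, then recompute the statistic
        t = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        if t >= t_obs:
            c += 1
    return (c + 1) / (m + 1)                # the significance level of the test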
Metamodeling
The simulation models, although simpler than the real-world system, are still a very complex way of
relating input (v) to output J(v). Sometimes a simpler analytic model may be used as an auxiliary to the
simulation model. This auxiliary model is often referred to as a metamodel. There are several techniques
available for metamodeling including: design of experiments, response surface methodology, Taguchi
methods, neural networks, inductive learning, and kriging. Metamodeling may have different purposes:
model simplification and interpretation, optimization, what-if analysis, and generalization to models of
the same type. The following polynomial model can be used as an auxiliary model:

J(v) = J(v0) + Δv·J'(v0) + (Δv)² J''(v0) / 2 + ...,

where Δv = v − v0 and the primes denote derivatives. This metamodel approximates J(v) for small Δv. To
estimate J(v) in the neighborhood of v0 by a linear function, we need to estimate the nominal J(v) and its
first derivative. Traditionally, this derivative is estimated by crude Monte Carlo; i.e., finite difference
which requires rerunning the simulation model. Methods which yield enhanced efficiency and accuracy
in estimating, at little additional computational (not simulation) cost, are presented in this site. The
Score Function method of estimating the first derivative is:

J'(v0) = Σi Z(yi) S(yi; v0) / n,

where S(y; v) = f'(y; v) / f(y; v) = d Ln f(y; v) / dv is the Score function and differentiation is with
respect to v, provided that f'(y; v) exists and f(y; v) is positive for all v in V.
The Score function approach can be extended to estimate the second and higher-order derivatives.
For example, an estimate for the second derivative based on the Score Function method is:

J''(v0) = Σi Z(yi) H(yi; v0) / n,

where S and H = S' + S² are the score and information functions, respectively, widely used in the
statistics literature, such as in the construction of Cramer-Rao bounds. With the gradient and Hessian
at our disposal, we are able to construct a second-order local metamodel using Taylor's series.
An Illustrative Numerical Example: For most complex reliability systems, the performance measures
such as mean time to failure (MTTF) are not available in analytical form. We resort to Monte Carlo
Simulation (MCS) to estimate MTTF function from a family of single-parameter density functions of the
components life with specific value for the parameter. The purpose of this section is to solve the inverse
problem, which deals with the calculation of the components' life parameters (such as MTTF) of a
homogeneous subsystem, given a desired target MTTF for the system. A stochastic approximation
algorithm is used to estimate the necessary controllable input parameter within a desired range of
accuracy. The potential effectiveness is demonstrated by simulating a reliability system with a known
analytical solution.
Consider the coherent reliability sub-system with four components: components 1 and 2 are in series,
components 3 and 4 are also in series, and these two series subsystems are in parallel, as illustrated
in the following Figure. All components work independently and are homogeneous; i.e.,
manufactured by an identical process, components having independent random lifetimes Y1, Y2, Y3,
and Y4, which are distributed exponentially with rates v = v0 = 0.5.
It can readily be shown that the theoretical expected lifetime of this sub-system is

J(v) = 3 / (4v),

and its derivatives are

J'(v) = -3 / (4v²) and J''(v) = 3 / (2v³).
The estimated average lifetime and its derivatives for the nominal system with v = v0 = 0.5 are:

J(v0) = Σj max[min(Y3,j, Y4,j), min(Y1,j, Y2,j)] / n,

J'(v0) = Σj max[min(Y3,j, Y4,j), min(Y1,j, Y2,j)] · Sj / n,

and

J''(v0) = Σj max[min(Y3,j, Y4,j), min(Y1,j, Y2,j)] · Hj / n,

respectively, where Yi,j is the jth observation for the ith component (i = 1, 2, 3, 4), the sums run over
j = 1, 2, ..., n, and Sj and Hj are the score and information functions of the jth replication, combining
the scores of its four component lifetimes. We have performed a
Monte Carlo experiment for this system by generating n = 10000 independent replications using
SIMSCRIPT II.5 random number streams 1 through 4 to generate exponential random variables Y1, Y2,
Y3, Y4 , respectively, on a VAX system. The estimated performance is J(0.5) = 1.5024, with a standard
error of 0.0348. The first and second derivatives estimates are -3.0933 and 12.1177 with standard errors
of 0.1126 and 1.3321, respectively.
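The following Python sketch reproduces this experiment with numpy in place of SIMSCRIPT II.5; the seeding and stream handling differ, so the numbers will match only to within sampling error.

import numpy as np

rng = np.random.default_rng(seed=12)

n, v0 = 10_000, 0.5
Y = rng.exponential(scale=1.0 / v0, size=(n, 4))  # lifetimes Y1..Y4 per replication

# System lifetime: components 1-2 in series, 3-4 in series, the two in parallel.
Z = np.maximum(np.minimum(Y[:, 0], Y[:, 1]), np.minimum(Y[:, 2], Y[:, 3]))

# Per-replication score and information for four i.i.d. Exp(v) lifetimes:
# S_j = sum_i (1/v - Y_ij),  H_j = S_j' + S_j**2  with  S_j' = -4/v**2.
S = 4.0 / v0 - Y.sum(axis=1)
H = -4.0 / v0**2 + S**2

print("J   ~", Z.mean())        # about 1.50 (exact: 3/(4v) = 1.5)
print("J'  ~", (Z * S).mean())  # about -3.0 (exact: -3/(4v^2) = -3)
print("J'' ~", (Z * H).mean())  # about 12.0 (exact: 3/(2v^3) = 12)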
A numerical comparison based on exact and the approximation by this metamodel reveals that the
largest absolute error is only 0.33% for any v in the range of [0.40, 0.60]. This error could be reduced by
either more accurate estimates of the derivatives and/or using a higher order Taylor expansion. A
comparison of the errors indicates that the errors are smaller and more stable in the direction of
increasing v. This behavior is partly due to the fact that lifetimes are exponentially distributed with
variance 1/v. Therefore, increasing v causes less variance than the nominal system (with v = 0.50).
Goal seeking problem
In many systems modeling and simulation applications, the decision maker may not be interested in
optimization but wishes to achieve a certain value for J(v), say J0. This is the goal-seeking problem:
given a target output value J0 of the performance and a parameterized pdf family, one must find an input
value for the parameter, which generates such an output. When v is a controllable input, the decision
maker may be interested in the goal-seeking problem: namely, what change of the input parameter will
achieve a desired change in the output value. Another application of the goal-seeking problem arises
when we want to adapt a model to satisfy a new equality constraint with some stochastic functions. We
may apply the search techniques, but the goal-seeking problem can be considered as an interpolation
based on a meta-model. In this approach, one generates a response surface function for J(v). Finally, one
uses the fitted function to interpolate for the unknown parameter. This approach is tedious, time-
consuming, and costly; moreover, in a random environment, the fitted model might have unstable
coefficients.
For a given target J(v), the estimated v, using the first-order approximation, is:

v = v0 + [J(v) − J(v0)] / J'(v0),

provided that the denominator J'(v0) does not vanish for all v0 in set V.
The Goal-seeker Module: The goal-seeking problem can be solved as a simulation problem. By this
approach, we are able to apply variance reduction techniques (VRT) used in the simulation literature.
Specifically, the solution to the goal-seeking problem is the unique solution of the stochastic equation
J(v) - J0 = 0. The problem is to solve this stochastic equation by a suitable experimental design, to ensure
convergence. The following is a Robbins-Monro (R-M) type technique:
vj+1 = vj + dj [J0 - J(vj)],
where dj is any divergent sequence of positive numbers (that is, Σ dj = ∞ while dj → 0). Under these conditions, Δvj = J0 - J(vj) converges to zero while dampening the effect of the simulation random errors. These conditions are satisfied, for example, by the harmonic sequence dj = 1/j. With this choice, however, the step size shrinks very rapidly at first and may become too small before we approach the root. Therefore, a better choice is, for example, dj = 9 / (9 + j). This technique involves placing experiment j+1 according to the outcome of experiment j immediately preceding it, as is depicted in the following Figure:
Under these not unreasonable conditions, this algorithm converges in mean square; moreover, it converges almost surely. Finally, as in Newton's root-finding method, it is impossible to assert that the method converges for just any initial value v = v0, even though J'(v) may satisfy the Lipschitz condition over
set V. Indeed, if the initial value v0 is sufficiently close to the solution, which is usually the case, then
this algorithm requires only a few iterations to obtain a solution with very high accuracy.
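A minimal sketch of this R-M recursion, under an assumed toy performance measure J(v) = E[Y] with Y ~ Exponential(mean v), so that the exact root of J(v) - J0 = 0 is v = J0 (the measure and all values here are illustrative, not from the text):

    import numpy as np

    rng = np.random.default_rng(0)
    J0, v, m = 2.0, 1.0, 200   # target output, initial guess, replications per step

    def J_hat(v, m):
        # Noisy single-stage estimate of J(v) = E[Y], Y ~ Exp(mean v)
        return rng.exponential(scale=v, size=m).mean()

    for j in range(1, 201):
        d_j = 9.0 / (9.0 + j)              # slowly decreasing gain, as suggested above
        v = v + d_j * (J0 - J_hat(v, m))   # R-M update toward J(v) = J0

    print(f"estimated root v = {v:.3f} (exact root is {J0})")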
An application of the goal-seeker module arises when we want to adapt a model to satisfy a new equality
constraint (condition) for some stochastic function. The proposed technique can also be used to solve
integral equations by embedding the Importance Sampling techniques within a Monte Carlo sampling.
One may extend the proposed methodology to inverse problems with two or more unknown parameters by considering two or more relevant outputs to ensure uniqueness. By this
generalization we could construct a linear (or even nonlinear) system of stochastic equations to be
solved simultaneously by a multidimensional version of the proposed algorithm. The simulation design
is more involved for problems with more than a few parameters.
Simulation models are often subject to errors caused by the estimated parameter(s) of the underlying input distribution function. "What-if" analysis is needed to establish confidence with respect to small changes in the parameters of the input distributions. However, the direct approach to "what-if" analysis requires a separate simulation run for each input value. Since this is often prohibitively costly, what practitioners typically do instead is plot the available results and use simple linear interpolation/extrapolation. This section presents some simulation-based techniques that utilize the current information to estimate the performance function under several scenarios without any additional simulation runs.
Simulation continues to be the primary method by which system analysts obtain information about
the analysis of complex stochastic systems. In almost all simulation models, an expected value can express the system's performance. Consider a system with continuous parameter v ∈ V, where V is the feasible region. Let
J(v) = EY|v [Z(Y)]
be the steady-state expected performance measure, where Y is a random vector with known probability density function (pdf) f(y; v), which depends on v, and Z is the performance measure.
In discrete event systems, Monte Carlo simulation is usually needed to estimate J(v) for a given value v. By the law of large numbers, the sample-mean estimator
J(v) = Σi Z(yi) / n,
where yi, i = 1, 2,..., n, are independent, identically distributed random vector realizations of Y from f(y; v) and n is the number of independent replications, is unbiased for J(v) and converges to J(v) as n grows.
There are strong motivations for estimating the expected performance measure J(v) for a small change in v to v + Δv, that is, for solving the so-called "what-if" problem.
The simulationist must meet managerial demands to consider model validation and cope with
uncertainty in the estimation of v. Adaptation of a model to new environments also requires an
adjustment in v.
An obvious solution to the "what-if" problem is the Crude Monte Carlo (CMC) method, which estimates J(v + Δv) for each Δv separately by rerunning the system at v + Δv. The cost in CPU time can therefore be prohibitive: extensive simulation would be needed to estimate the performance measures for all the changes in the input parameters of interest. As noted above, the usual alternative in practice is to plot the results of a few simulation runs and use simple linear interpolation/extrapolation.
In this section we consider the "What-if" analysis problem by extending the information obtained from a
single run at the nominal value of parameter v to a closed neighborhood of it. We also present the use of
results from runs at two or more points over the intervening interval. We refer to the former as
extrapolation and to the latter as interpolation by simulation. The results are obtained at some computational cost, as opposed to additional simulation cost. The proposed techniques therefore estimate a performance measure at multiple settings from a simulation at a single nominal value.
A model based on the Radon-Nikodym theorem estimates J(v + Δv) for stochastic systems in a single run, as follows:
J(v + Δv) = EY|v+Δv [Z(Y)] = EY|v [Z(Y) · W],
where the likelihood ratio W = f(y; v + Δv) / f(y; v) adjusts the sample path, provided f(y; v) does not vanish. Notice that by this change of probability space, we are using the same realizations as for J(v).
The generated random vector y is representative of Y drawn from f(y; v). Each of these random observations could also, hypothetically, have come from f(y; v + Δv); W weights the observations according to this phenomenon.
Therefore, the "What-if" estimate is:
which is based on only one sample path of the system with parameter v and the simulation for the
system with v+ v is not required.
Unfortunately, the LR estimator produces a larger variance than CMC. However, since E(W) = 1, the following variance reduction techniques (VRT) may improve the estimate.
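As a hedged sketch of this single-run LR estimator, reusing the reliability example above (four exponential lifetimes, nominal rate v = 0.5; the variable names are mine):

    import numpy as np

    rng = np.random.default_rng(1)
    v, dv, n = 0.5, 0.05, 10_000

    Y = rng.exponential(scale=1.0 / v, size=(4, n))   # one run at the nominal v
    Z = np.maximum(np.minimum(Y[2], Y[3]), np.minimum(Y[0], Y[1]))

    # W = f(y; v+dv) / f(y; v) for four independent Exp(v) lifetimes
    W = ((v + dv) / v) ** 4 * np.exp(-dv * Y.sum(axis=0))

    print("LR estimate of J(v+dv):", (Z * W).mean())  # exact: 3/(4*0.55) = 1.3636
    print("mean of W (should be near 1):", W.mean())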
In the statistical literature the efficient score function is defined to be the gradient
S(y) = d Ln f(y; v) / dv
We consider the exponential (approximation) metamodel for J(v + Δv) in a first-derivative neighborhood of v:
J(v + Δv) = Σi Ai / Σi Bi,
where
Ai = Z(yi) · exp[Δv · S(yi)]
and
Bi = exp[Δv · S(yi)].
This metamodel approximates J(v + Δv) for small Δv. For this estimate, we need to estimate the nominal J(v) and its first derivative. Traditionally, this derivative is
estimated by crude Monte Carlo; i.e., finite difference, which requires rerunning the simulation model.
Methods that yield enhanced efficiency and accuracy in this estimation, at little additional cost, are of great value.
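The exponential metamodel above admits an equally short sketch; it reuses the same single run and the same score S as in the previous examples (again an illustration under my assumed exponential forms):

    import numpy as np

    rng = np.random.default_rng(1)
    v, dv, n = 0.5, 0.05, 10_000
    Y = rng.exponential(scale=1.0 / v, size=(4, n))
    Z = np.maximum(np.minimum(Y[2], Y[3]), np.minimum(Y[0], Y[1]))

    S = 4.0 / v - Y.sum(axis=0)   # efficient score of the joint density
    A = Z * np.exp(dv * S)        # Ai = Z(yi) * exp(dv * S(yi))
    B = np.exp(dv * S)            # Bi = exp(dv * S(yi))
    print("metamodel estimate of J(v+dv):", A.sum() / B.sum())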
There are a few ways to obtain the derivatives of the output with respect to an input parameter efficiently, as presented earlier on this site. The most straightforward method is the Score Function (SF). The SF
approach is the major method for estimating the performance measure and its derivative, while
observing only a single sample path from the underlying system. The basic idea of SF is that the
derivative of the performance function, J'(v), is expressed as expectation with respect to the same
distribution as the performance measure itself.
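To make this concrete, differentiating under the integral sign (assuming this interchange is valid) gives the SF identity:
J'(v) = d/dv ∫ Z(y) f(y; v) dy = ∫ Z(y) S(y) f(y; v) dy = EY|v [Z(Y) · S(Y)],
so J'(v) can be estimated as Σi Z(yi) · S(yi) / n from the same replications used for J(v).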
Therefore, for example, using the estimated values of J(v) and its derivative J'(v), the estimated J(v + Δv) is:
J(v + Δv) = J(v) + Δv · J'(v),
with variance
Var[J(v + Δv)] = Var[J(v)] + (Δv)² · Var[J'(v)] + 2Δv · Cov[J(v), J'(v)].
This variance is needed for constructing a confidence interval for the perturbed estimate.
Interpolation Techniques
Given two points, v1 and v2 (scalars only), that are "sufficiently" close, one may simulate at these two points and then interpolate for any desired point in between. Assume the given v1 and v2 are sufficiently close, and look for the "best" linear interpolation in the sense of minimum error on the interval. Clearly,
J(v) = EY|v [Z(Y)] = α · EY|v1 [Z(Y) · W1] + (1 - α) · EY|v2 [Z(Y) · W2],
where the likelihood ratios W1 and W2 are W1 = f(y; v) / f(y; v1) and W2 = f(y; v) / f(y; v2), respectively.
One obvious choice is α = f(y; v1) / [f(y; v1) + f(y; v2)]. This method can easily be extended to k-point interpolation.
For 2-point interpolation, if we let α be a constant within the interval [0, 1], then the linearly interpolated "what-if" estimate is:
J(v) = α · J1(v) + (1 - α) · J2(v),
where the two estimates on the RHS are two independent Likelihood Ratio extrapolations using the two end-points.
We define α* as the α in this convex combination with the minimum error in the estimate. That is, it minimizes
Var[α · J1(v) + (1 - α) · J2(v)] = α² · Var[J1(v)] + (1 - α)² · Var[J2(v)],
whose solution is α* = Var[J2(v)] / (Var[J1(v)] + Var[J2(v)]). Thus, the "best linear" interpolation for any point in the interval [v1, v2] is:
J(v) = α* · J1(v) + (1 - α*) · J2(v).
Estimating system performance for several scenarios via simulation generally requires a separate simulation run for each scenario. In some very special cases, such as the exponential density f(y; v) = v·e^(-vy), one can obtain the perturbed estimate using Perturbation Analysis directly, as follows. Clearly, one can generate the random variate Y by using the inverse transformation:
Yi = (1/v) Ln(1/Ui),
where Ln is the natural logarithm and Ui is a random number distributed uniformly on [0, 1]. In the case of perturbed v, the counterpart realization using the same Ui is
Yi = [1/(v + Δv)] Ln(1/Ui).
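A minimal sketch of this common-random-numbers idea for the reliability example: the same uniforms Ui generate both the nominal and the perturbed lifetimes through the inverse transform (values are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    v, dv, n = 0.5, 0.05, 10_000

    U = rng.uniform(size=(4, n))                  # one stream of uniforms, reused
    Y_nom = (1.0 / v) * np.log(1.0 / U)           # Yi = (1/v) Ln(1/Ui)
    Y_pert = (1.0 / (v + dv)) * np.log(1.0 / U)   # same Ui, perturbed rate

    def J_est(Y):
        return np.maximum(np.minimum(Y[2], Y[3]), np.minimum(Y[0], Y[1])).mean()

    print("J(v)    =", J_est(Y_nom))              # exact 1.5
    print("J(v+dv) =", J_est(Y_pert))             # exact 3/(4*0.55) = 1.3636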
Clearly, this single run approach is limited, since the inverse transformation is not always available in
closed form. The following Figure illustrates the Perturbation Analysis Method:
Because the Perturbation Analysis approach has this serious limitation, we have presented techniques for estimating performance under several scenarios using a single sample path, such as the Likelihood Ratio method, which is illustrated in the following Figure. Directions for further research include:
i) to introduce efficient variance reduction and bias reduction techniques with a view to improving the accuracy of the existing and the proposed methods;
ii) to incorporate the results of this study in a random search optimization technique. In this approach one generates a number of points uniformly distributed on the surface of a hyper-sphere with a specified radius centered at a starting point; at each stage, the value of the performance measure is estimated at the center (as a nominal value). Perturbation analysis is used to estimate the performance measure at the sequence of points on the hyper-sphere. The best point (depending on whether the problem is a max or a min) is used as the center of a smaller hyper-sphere. Iterating in this fashion, one can capture the optimal solution within a hyper-sphere with a specified, small enough radius. Clearly, this approach can be considered a sequential self-adaptive optimization technique.
iii) to estimate the sensitivities, i.e., the gradient, Hessian, etc., of J(v); these can be approximated using finite differences. For example, the first derivative can be obtained in a single run using the Likelihood Ratio method as
J'(v) ≈ [J(v + Δv) - J(v)] / Δv,
or, by a central difference,
J'(v) ≈ [J(v + Δv) - J(v - Δv)] / (2Δv),
where both perturbed estimates come from the single nominal run via likelihood ratios.
Other items for further research include experimentation on large and complex systems, such as a large Jackson network with routing that includes feedback loops, in order to study the efficiency of the presented techniques.