3 The principles of geostatistical analysis

IN THIS CHAPTER
Understanding deterministic methods
Understanding geostatistical methods
Working through a problem
Basic principles behind geostatistical methods
Modeling a semivariogram
Predicting unknown values with kriging
The Geostatistical Analyst extension

The Geostatistical Analyst uses sample points taken at different locations in a landscape and creates (interpolates) a continuous surface. The sample points are measurements of some phenomenon such as radiation leaking from a nuclear power plant, an oil spill, or elevation heights. The Geostatistical Analyst derives a surface using the values from the measured locations to predict values for each location in the landscape.

The Geostatistical Analyst provides two groups of interpolation techniques: deterministic and geostatistical. All methods rely on the similarity of nearby sample points to create the surface. Deterministic techniques use mathematical functions for interpolation. Geostatistics relies on both statistical and mathematical methods, which can be used to create surfaces and assess the uncertainty of the predictions.

The Geostatistical Analyst, in addition to providing various interpolation techniques, also provides many supporting tools. These tools allow you to explore and gain a better understanding of the data so that you create the best surfaces based on the available information.

This chapter will provide an overview of the theory behind deterministic and
geostatistical interpolation techniques. The first part of the chapter will
introduce you to the deterministic interpolation methods. You will then be
exposed to geostatistical methods through an example, and then you will
read about the principles, concepts, and assumptions that provide the
foundation for geostatistics.
BasicConcepts.p65 49 03/06/2001, 11:58 AM
Understanding deterministic methods

Generating a continuous surface used to represent a particular measure is a key capability required in most GIS applications. Perhaps the most commonly used surface type is a digital elevation model of terrain. These datasets are readily available at small scales for various parts of the world. However, as you have read earlier, just about any measure taken at locations across a landscape, subsurface, or atmosphere can be used to generate a continuous surface. A major challenge facing most GIS modelers is to generate the most accurate possible surface from existing sample data as well as to characterize the error and variability of the predicted surface. Newly generated surfaces are used in further GIS modeling and analysis as well as in 3D visualization. Understanding the quality of this data can greatly improve the utility and purpose of GIS modeling. This is the role of the Geostatistical Analyst.

Analyzing the surface properties of nearby locations

Generally speaking, things that are closer together tend to be more alike than things that are farther apart. This is a fundamental geographic principle (Tobler, 1970). Suppose you are a town planner, and you need to build a scenic park in your town. You have several candidate sites, and you may want to model the viewshed at each location. This will require a more detailed elevation surface dataset for your study area. Suppose you have preexisting elevation data for 1,000 locations throughout the town. You can use this to build a new elevation surface.

When trying to build the elevation surface, you can assume that the sample values closest to the prediction location will be similar. But how many sample locations should you consider? And should all of the sample values be considered equally?

As you move farther away from the prediction location, the influence of the points will decrease. Considering a point too far away may actually be detrimental because the point may be located in an area that is dramatically different from the prediction location.

One solution is to consider enough points to give a good sample but few enough to be practical. The number will vary with the amount and distribution of the sample points and the character of the surface. If the elevation samples are relatively evenly distributed and the surface characteristics do not change across your landscape, you can predict surface values from nearby points with reasonable accuracy. To account for the distance relationship, the values of closer points are weighted more heavily than those farther away.

This is the basis for the Inverse Distance Weighting (IDW) interpolation technique. As its name implies, the weight of a value decreases as the distance increases from the prediction location.
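
The weighting idea behind IDW can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample coordinates and elevations are made up for the example, the function name idw_predict is hypothetical, and the weight is taken to be 1 / distance^power with power = 2, a common choice that the text above does not prescribe.

```python
import math

# Illustrative sample elevations, keyed by (x, y) location (assumed values).
samples = {(1, 5): 100, (3, 4): 105, (1, 3): 105, (4, 5): 100, (5, 1): 115}


def idw_predict(samples, target, power=2.0):
    """Predict the value at `target` as an inverse-distance-weighted average."""
    weights = {}
    for location, value in samples.items():
        distance = math.dist(location, target)
        if distance == 0:
            return float(value)  # the target coincides with a measured point
        weights[location] = 1.0 / distance ** power  # closer points weigh more
    total = sum(weights.values())
    return sum(w * samples[loc] for loc, w in weights.items()) / total


prediction = idw_predict(samples, (1, 4))
print(round(prediction, 3))
```

Because the weights are normalized to sum to one, the prediction always lies between the smallest and largest sample values. Raising the power makes the nearest points dominate more strongly.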
50 USING ARCGIS GEOSTATISTICAL ANALYST
Visualizing global polynomial interpolation
There are other solutions for predicting the values for unmea-
sured locations. Another proposed site for the observation area
is on the face of a gently sloping hill. The face of the hill is a
sloping plane. However, the locations of the samples are in slight
depressions or on small mounds (local variation). Using the local
neighbors to predict a location may over or underestimate
because of the influence of depressions and mounds. Further,
you may pick up the local variation and may not capture the
overall sloping plane (referred to as the trend). The ability to
identify and model local structures and surface trends can
increase the accuracy of your predicted surface.
To base your prediction on the overriding trend, you can fit a plane between the sample points. A plane is a special case of a family of mathematical formulas called polynomials. You then determine the unknown height from the value on the plane for the prediction location. The plane may be above certain points and below others. The goal for interpolation is to minimize error. You can measure the error by subtracting each measured point from its predicted value on the plane, squaring it, and adding the results together. Choosing the plane that minimizes this sum is referred to as a least-squares fit. This process is the theoretical basis for the first-order global polynomial interpolation.

But what if you were trying to fit the plane to a landscape that is a valley? You will have a difficult task obtaining a good surface from a plane. However, if you are allowed one bend in the plane (see image below), you may be able to obtain a better fit (get closer to more values). To allow one bend is the basis for second-order global polynomial interpolation. Two bends in the plane would be a third-order polynomial, and so forth. The bends can occur in both directions, possibly resulting in a bowl-shaped surface.
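
The first-order (plane) least-squares fit described above can be sketched as follows. This is a minimal illustration, not the Geostatistical Analyst's implementation: the sample points are invented, and solve, fit_plane, and sum_squared_error are hypothetical helper names. The plane is written as z = c0 + c1*x + c2*y and its coefficients are found from the normal equations.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_plane(points):
    """Least-squares fit of z = c0 + c1*x + c2*y to (x, y, z) points."""
    rows = [(1.0, x, y) for x, y, _ in points]
    z = [p[2] for p in points]
    # Normal equations: (A^T A) c = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve(ata, atz)


def sum_squared_error(points, coef):
    """Sum of squared differences between the plane and the measured values."""
    return sum((coef[0] + coef[1] * x + coef[2] * y - z) ** 2 for x, y, z in points)


# Assumed samples on a gentle slope with small mounds and depressions.
hill = [(0, 0, 10.2), (4, 0, 12.1), (0, 4, 9.1), (4, 4, 10.8), (2, 2, 10.9), (1, 3, 9.9)]
coef = fit_plane(hill)
height_at_3_1 = coef[0] + coef[1] * 3 + coef[2] * 1  # read the trend at (3, 1)
print(coef, height_at_3_1, sum_squared_error(hill, coef))
```

A second-order fit would follow the same pattern with the extra columns x*x, x*y, and y*y in each row.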
Visualizing local polynomial interpolation

Now what happens if the area slopes, levels off, and then slopes again? Asking you to fit a flat plane through this study site would give poor predictions for the unmeasured values. However, if you are permitted to fit many smaller overlapping planes, and then use the center of each plane as the prediction for each location in the study area, the resulting surface will be more flexible and perhaps more accurate. This is the conceptual basis for local polynomial interpolation.

Visualizing radial basis functions

Radial basis functions enable you to create a surface that captures global trends and picks up the local variation. This helps in cases where fitting a plane to the sample values will not accurately represent the surface.

To create the surface, suppose you have the ability to bend and stretch the predicted surface so that it passes through all of the measured values. There are many ways you can predict the shape of the surface between the measured points. For example, you can force the surface to form nice curves (thin-plate spline), or you can control how tightly you pull on the edges of the surface (spline with tension). This is the conceptual framework for interpolators based on radial basis functions.
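
The defining property above, a surface that passes through every measured value, can be illustrated with a small sketch. It makes several assumptions: it uses a multiquadric basis function, sqrt(r^2 + shape^2), chosen only because it is short to write (it is not the thin-plate spline or spline with tension named above), the sample values are invented, and solve and fit_rbf are hypothetical helper names.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_rbf(samples, shape=1.0):
    """Return a predictor that interpolates `samples` exactly (multiquadric basis)."""
    locations = list(samples)
    values = [samples[p] for p in locations]

    def basis(r):
        return math.sqrt(r * r + shape * shape)

    # One basis function is centered on each measured location.
    matrix = [[basis(math.dist(p, q)) for q in locations] for p in locations]
    coefficients = solve(matrix, values)

    def predict(target):
        return sum(c * basis(math.dist(target, p))
                   for c, p in zip(coefficients, locations))

    return predict


samples = {(1, 5): 100, (3, 4): 105, (1, 3): 105, (4, 5): 100, (5, 1): 115}
predict = fit_rbf(samples)
print(predict((1, 5)), predict((2, 4)))
```

Changing the basis function or the shape value changes how the surface behaves between the measured points, while the surface still honors the measurements themselves.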
Understanding geostatistical methods
Geostatistical solutions

So far, the techniques that we have discussed are referred to as deterministic interpolation methods because they are directly based on the surrounding measured values or on specified mathematical formulas that determine the smoothness of the resulting surface. A second family of interpolation methods consists of geostatistical methods that are based on statistical models that include autocorrelation (statistical relationships among the measured points). Not only do these techniques have the capability of producing a prediction surface, but they can also provide some measure of the certainty or accuracy of the predictions.

The following example will guide you through the basic steps of geostatistics using ordinary kriging.

Kriging is similar to IDW in that it weights the surrounding measured values to derive a prediction for each location. However, the weights are based not only on the distance between the measured points and the prediction location but also on the overall spatial arrangement among the measured points. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified.

To solve the geostatistical example, you will walk through a series of steps.

Calculate the empirical semivariogram: Kriging, like most interpolation techniques, is built on the assumption that things that are close to one another are more alike than those farther away (quantified here as spatial autocorrelation). The empirical semivariogram is a means to explore this relationship. Pairs that are close in distance should have a smaller measurement difference than those farther away from one another. The extent to which this assumption is true can be examined in the empirical semivariogram.

Fit a model: This is done by defining a line that provides the best fit through the points in the empirical semivariogram cloud graph. That is, you need to find a line such that the (weighted) squared difference between each point and the line is as small as possible. This is referred to as the (weighted) least-squares fit. This line is considered a model that quantifies the spatial autocorrelation in your data.

Create the matrices: The equations for ordinary kriging are contained in matrices and vectors that depend on the spatial autocorrelation among the measured sample locations and the prediction location. The autocorrelation values come from the semivariogram model described above. The matrices and vectors determine the kriging weights that are assigned to each measured value.

Make a prediction: From the kriging weights for the measured values, you can calculate a prediction for the location with the unknown value.
Working through a problem
Suppose you have gone out and collected five elevation points in your landscape. The configuration of the points is displayed in orange on the map below. Beside each point, the spatial coordinates are given as (X,Y).

[Figure: map of the five sample points, with X and Y axes running from 1 to 5.]

Values:
at (1,5) observe = 100
at (3,4) observe = 105
at (1,3) observe = 105
at (4,5) observe = 100
at (5,1) observe = 115

The kriging equations

You will use ordinary kriging to predict a value for location X = 1 and Y = 4, coordinate (1,4), which is called the prediction location (yellow point on the map). The ordinary kriging model is

$$Z(s) = \mu + \varepsilon(s)$$

where $s = (X,Y)$ is a location; one of the sample locations is $s = (1,5)$, and $Z(s)$ is the value at that location; for example, $Z(1,5) = 100$. The model is based on a constant mean $\mu$ for the data (no trend) and random errors $\varepsilon(s)$ with spatial dependence. Assume that the random process $\varepsilon(s)$ is intrinsically stationary. These assumptions are discussed in the next sections. The predictor is formed as a weighted sum of the data,

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$$

where

$Z(s_i)$ is the measured value at the ith location, for example, $Z(1,5) = 100$;

$\lambda_i$ is an unknown weight for the measured value at the ith location;

$s_0$ is the prediction location, for example, (1,4); and

$N = 5$ for the five measured values.

This is the same type of predictor as for IDW interpolation. However, in IDW, the weight, $\lambda_i$, depends solely on the distance to the prediction location. In ordinary kriging, the weight, $\lambda_i$, depends on the semivariogram, the distance to the prediction location, and the spatial relationships among the measured values around the prediction location.

When making predictions for several locations, expect some of the predictions to be above the actual values and some below. On average, the difference between the predictions and the actual values should be zero. This is referred to as making the prediction unbiased. To ensure the predictor is unbiased for the unknown measurement, the sum of the weights $\lambda_i$ must equal one. Using this constraint, make sure the difference between the true value, $Z(s_0)$, and the predictor, $\sum_i \lambda_i Z(s_i)$, is as small as possible. That is, minimize the statistical expectation of the following formula,

$$\left[ Z(s_0) - \sum_{i=1}^{N} \lambda_i Z(s_i) \right]^2$$
from which the kriging equations were obtained. By minimizing its expectation, on average, the kriging predictor is as close as possible to the unknown value. The solution to the minimization, constrained by unbiasedness, gives the kriging equations,

$$\Gamma \lambda = g$$

or

$$\begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1N} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma_{N1} & \cdots & \gamma_{NN} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ m \end{pmatrix} = \begin{pmatrix} \gamma_{10} \\ \vdots \\ \gamma_{N0} \\ 1 \end{pmatrix}$$

These equations will also become more understandable when the values are filled in for the matrix and vectors in the following section. Remember, the goal is to solve the equations for all of the $\lambda_i$ (the weights), so the predictor can be formed by using $\sum_i \lambda_i Z(s_i)$.

Most of the elements can be filled in if you know the semivariogram. In the next few sections, you will see how to calculate the semivariogram values. The gamma matrix $\Gamma$ contains the modeled semivariogram values between all pairs of sample locations, where $\gamma_{ij}$ denotes the modeled semivariogram value based on the distance between the two samples identified as the ith and jth locations. The vector $g$ contains the modeled semivariogram values between each measured location and the prediction location, where $\gamma_{i0}$ denotes the modeled semivariogram value based on the distance between the ith sample location and the prediction location. The unknown $m$ in the vector $\lambda$ is also estimated, and it arises because of the unbiasedness constraint.

Calculating the empirical semivariogram

To compute the values for the $\Gamma$ matrix, we must examine the structure of the data by creating the empirical semivariogram. In a semivariogram, half the difference squared between the pairs of locations (the y-axis) is plotted relative to the distance that separates them (the x-axis).

The first step in creating the empirical semivariogram is to calculate the distance and squared difference between each pair of locations. The distance between two locations is calculated by using the Euclidean distance:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

The empirical semivariance is 0.5 times the difference squared:

0.5 * average[(value at location i - value at location j)^2].

Locations      Distance calculation         Distance   Difference^2   Semivariance
(1,5),(3,4)    sqrt[(1-3)^2 + (5-4)^2]      2.236      25             12.5
(1,5),(1,3)    sqrt[0^2 + 2^2]              2          25             12.5
(1,5),(4,5)    sqrt[3^2 + 0^2]              3          0              0
(1,5),(5,1)    sqrt[4^2 + 4^2]              5.657      225            112.5
(3,4),(1,3)    sqrt[2^2 + 1^2]              2.236      0              0
(3,4),(4,5)    sqrt[1^2 + 1^2]              1.414      25             12.5
(3,4),(5,1)    sqrt[2^2 + 3^2]              3.606      100            50
(1,3),(4,5)    sqrt[3^2 + 2^2]              3.606      25             12.5
(1,3),(5,1)    sqrt[4^2 + 2^2]              4.472      100            50
(4,5),(5,1)    sqrt[1^2 + 4^2]              4.123      225            112.5

As you can see, with larger datasets (more measured samples) the number of pairs of locations will increase rapidly and will quickly become unmanageable. Therefore, you can group the pairs of locations, which is referred to as binning. In this example, a bin is a specified range of distances. That is, all points that are within 0 to 1 meter apart are grouped into the first bin, those that are
more than 1 and up to 2 meters apart are grouped into the second bin, and so forth. For each bin, the average empirical semivariance of all its pairs of points is taken. In the following example, the data is placed into five bins.

Binning the Empirical Semivariogram

Lag distance   Pairs distance        Av. distance   Semivariance   Average
1+ to 2        1.414, 2              1.707          12.5, 12.5     12.5
2+ to 3        2.236, 2.236, 3       2.491          12.5, 0, 0     4.167
3+ to 4        3.606, 3.606          3.606          50, 12.5       31.25
4+ to 5        4.472, 4.123          4.298          50, 112.5      81.25
5+             5.657                 5.657          112.5          112.5

Fitting a model

Now you can plot the average semivariance versus average distance of the bins onto a graph: the empirical semivariogram. But the empirical semivariogram values cannot be used directly in the $\Gamma$ matrix because you might get negative standard errors for the predictions; instead, you must fit a model to the empirical semivariogram. Once the model is fit, you will use the fitted model when determining semivariogram values for various distances.

[Figure: empirical semivariogram values and the fitted line, with variance (30 to 150) on the y-axis and distance (1 to 6) on the x-axis.]

For simplicity, the model that you will fit is a least-squares regression line, and you will force it to have a positive slope and pass through zero. In the Geostatistical Analyst, there are many more models that could be fit.

The formula to determine the semivariance at any given distance in this example is:

Semivariance = Slope * Distance

Slope is the slope of the fitted model. Distance is the distance between pairs of locations and is symbolized as h. In the example, the semivariance for any distance can be determined by:

Semivariance = 13.5 * h

Now create the $\Gamma$ matrix. For example, $\gamma_{12}$ for the locations (1,5) and (3,4) in the equation is:

Semivariance = 13.5 * 2.236 = 30.19

Gamma matrix
          (1,5)   (3,4)   (1,3)   (4,5)   (5,1)
(1,5)     0       30.19   27.0    40.5    76.37   1
(3,4)     30.19   0       30.19   19.09   48.67   1
(1,3)     27.0    30.19   0       48.67   60.37   1
(4,5)     40.5    19.09   48.67   0       55.66   1
(5,1)     76.37   48.67   60.37   55.66   0       1
          1       1       1       1       1       0

In the example above, for pair (1,5) and (3,4), the lag distance was calculated using the distance between the two locations (see the previous table). The semivariogram value is found by multiplying the slope 13.5 times the distance. The 1s and 0 in the bottom row and the rightmost column arise due to unbiasedness constraints.
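
The pair table, the binning, and the through-the-origin line can be reproduced with a short Python sketch. The text does not say which points the line was fitted to; in this sketch the line is fitted to the ten unbinned pairs, which is an assumption that happens to give a slope close to the 13.5 used in the text. The names pairs, binned, slope, and fitted_semivariance are hypothetical.

```python
import math
from itertools import combinations

# The five measured locations and values from the worked example.
samples = {(1, 5): 100, (3, 4): 105, (1, 3): 105, (4, 5): 100, (5, 1): 115}

# Distance and semivariance (half the squared difference) for every pair.
pairs = [
    (math.dist(p, q), 0.5 * (samples[p] - samples[q]) ** 2)
    for p, q in combinations(samples, 2)
]

# Bin by distance: a pair exactly 2 apart falls in the bin "1+ to 2",
# so each bin is keyed by its upper bound, ceil(distance).
bins = {}
for h, semi in pairs:
    bins.setdefault(math.ceil(h), []).append((h, semi))
binned = {
    upper: (
        sum(h for h, _ in members) / len(members),  # average distance
        sum(s for _, s in members) / len(members),  # average semivariance
    )
    for upper, members in sorted(bins.items())
}

# Least-squares line through the origin: slope = sum(h * semi) / sum(h^2).
slope = sum(h * s for h, s in pairs) / sum(h * h for h, _ in pairs)


def fitted_semivariance(h, slope=13.5):
    """Modeled semivariance at distance h, using the rounded slope from the text."""
    return slope * h


print(binned)
print(slope, fitted_semivariance(2.236))
```

The bin keyed 6 corresponds to the "5+" row of the binning table, since the only pair in it is 5.657 apart.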
The matrix formula for ordinary kriging is:

$$\Gamma \lambda = g$$

Now the $\Gamma$ matrix has been produced, but it is necessary to solve for $\lambda$, which contains the weights to assign to the measured values surrounding the prediction location. Thus, perform simple matrix algebra and get the following formula:

$$\lambda = \Gamma^{-1} g$$

where $\Gamma^{-1}$ is the inverse matrix of $\Gamma$. By performing basic linear algebra, the inverse of $\Gamma$ is obtained.

Inverse of Matrix (Gamma)
-0.02575    0.00704    0.0151     0.00664   -0.00303     0.3424
 0.00704   -0.04584    0.01085    0.02275    0.0052     -0.22768
 0.0151     0.01085   -0.02646   -0.00471    0.00522     0.17869
 0.00664    0.02275   -0.00471   -0.02902    0.00433     0.28471
-0.00303    0.0052     0.00522    0.00433   -0.01173     0.42189
 0.3424    -0.22768    0.17869    0.28471    0.42189   -41.701

Next, the $g$ vector is created for the unmeasured location that we wish to predict. For example, use location (1,4). Calculate the distance from (1,4) to each of the measured points (1,5), (3,4), (1,3), (4,5), and (5,1). From these distances, determine the fitted semivariance using the formula Semivariance = 13.5 * h, which was derived earlier. The $g$ vector for (1,4) is given in the following table.

Point    Distance   g vector for (1,4)
(1,5)    1          13.5
(3,4)    2          27.0
(1,3)    1          13.5
(4,5)    3.162      42.69
(5,1)    5          67.5
                    1

[Figure: map of the five sample points and the prediction location (1,4).]

Now that the $\Gamma$ matrix and the $g$ vector have been created, solve for the kriging weights vector: $\lambda = \Gamma^{-1} g$. Use linear algebra to do so. The weights are given in the table below.

Making a prediction

Now that you have the weights, multiply the weight for each measured value times the value. Add the products together and, finally, you have the prediction for location (1,4).

Weights     Values   Product
 0.46757    100       46.757
 0.09834    105       10.3257
 0.46982    105       49.3311
-0.02113    100       -2.113
-0.0146     115       -1.679
-0.18281    (m)
                     102.6218   Kriging Predictor

Next, examine the results. The following figure shows the weights (in parentheses) of the measured locations for predicting the unmeasured location (1,4).
[Figure: map of the measured locations labeled with their values and kriging weights: (1,5) = 100 (0.46757), (3,4) = 105 (0.09834), (1,3) = 105 (0.46982), (4,5) = 100 (-0.02113), (5,1) = 115 (-0.0146).]

As expected, the weights decrease with distance but are more refined than a straight distance weighting since they account for the spatial arrangement of the data. The prediction appears to be reasonable.

Kriging variance

One of the strengths of using a statistical approach is that it is possible to also calculate a statistical measure of uncertainty for the prediction. To do so, multiply each entry in the $\lambda$ vector by the corresponding entry in the $g$ vector and add the products together to obtain what is known as the predicted kriging variance. The square root of the kriging variance is called the kriging standard error.

g vector   Weights ($\lambda$)   g vector times weights
13.5        0.46757               6.312195
27.0        0.09834               2.65518
13.5        0.46982               6.34257
42.69      -0.02113              -0.90204
67.5       -0.0146               -0.9855
1          -0.18281              -0.18281

Kriging variance     13.2396
Kriging std error     3.6386

In this case, the kriging standard error value is 3.6386. If it is assumed that the errors are normally distributed, 95 percent prediction intervals can be obtained in the following way:

Kriging Predictor ± 1.96 * sqrt(kriging variance)

The value 1.96 comes from the standard normal distribution where 95 percent of the probability is contained from -1.96 to 1.96. The prediction interval can be interpreted as follows. If predictions are made again and again from the same model, in the long run 95 percent of the time the prediction interval will contain the value at the prediction location. In the example, the prediction interval ranges from 95.49 to 109.75 (102.62 ± 1.96 * 3.64).
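
The whole worked example, from the $\Gamma$ matrix to the prediction interval, can be checked with a short sketch. It is an illustration under stated assumptions rather than the Geostatistical Analyst's code: the semivariogram model is the line 13.5 * h from the text, solve is a hypothetical helper that solves the linear system directly instead of forming the inverse, and distances are not rounded, so the results should be close to the tabulated values but need not match every printed digit.

```python
import math

points = [(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)]
values = [100, 105, 105, 100, 115]
target = (1, 4)
SLOPE = 13.5  # fitted semivariogram model from the text: semivariance = 13.5 * h


def gamma(p, q):
    """Modeled semivariance between two locations."""
    return SLOPE * math.dist(p, q)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


n = len(points)
# Gamma matrix bordered by the 1s and 0 of the unbiasedness constraint.
big_gamma = [[gamma(p, q) for q in points] + [1.0] for p in points]
big_gamma.append([1.0] * n + [0.0])
g = [gamma(p, target) for p in points] + [1.0]

solution = solve(big_gamma, g)       # (lambda_1, ..., lambda_N, m)
weights, m = solution[:n], solution[n]

prediction = sum(w * z for w, z in zip(weights, values))
variance = sum(s * gi for s, gi in zip(solution, g))  # lambda . g, including m * 1
std_error = math.sqrt(variance)
interval = (prediction - 1.96 * std_error, prediction + 1.96 * std_error)

print(weights, m)
print(prediction, variance, std_error, interval)
```

One design note: multiplying the slope by a constant rescales $m$ and the kriging variance but leaves the weights unchanged, because the same factor appears on both sides of the first N equations.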
Basic principles behind geostatistical methods
Random processes with dependence

Unlike the deterministic interpolation approaches, geostatistics assumes that all values in your study area are the result of a random process. A random process does not mean that all events are independent as with each flip of a coin. Geostatistics is based on random processes with dependence. For example, flip three coins and determine if they are heads or tails. The fourth coin will not be flipped. The rule to determine how to lay the fourth coin is: if the second and third coins are heads, then lay the fourth coin the same as the first; otherwise, lay the fourth coin opposite to the first.

In a spatial or temporal context, such dependence is called autocorrelation.

Prediction for random processes with dependence

How does this relate to geostatistics and predicting unmeasured values? In the coin example, the dependence rules were given. In reality, the dependency rules are unknown. In geostatistics there are two key tasks: (1) to uncover the dependency rules and (2) to make predictions. As you can see from the example, the predictions come from first knowing the dependency rules.

Kriging is based on these same two tasks: (1) semivariogram and covariance functions (spatial autocorrelation) and (2) prediction of unknown values. Because of these two distinct tasks, it has been said that geostatistics uses the data twice: first to estimate the spatial autocorrelation and second to make the predictions.

Understanding stationarity

Consider again the coin example. There is a unique dependence rule among the coins. With only one set of measured values, there is no hope of knowing the dependence rules without being told what they are. However, through continued observations of numerous samples, dependencies become apparent. In general, statistics relies on some notion of replication, where it is believed estimates can be derived and the variation and uncertainty of the estimate can be understood from repeated observations.

In a spatial setting, the idea of stationarity is used to obtain the necessary replication. Stationarity is an assumption that is often reasonable for spatial data. There are two types of stationarity. One is called mean stationarity. Here it is assumed that the mean is constant between samples and is independent of location.

The second type of stationarity is called second-order stationarity for covariance and intrinsic stationarity for semivariograms. Second-order stationarity is the assumption that the covariance is the same between any two points that are at the same distance and direction apart no matter which two points you choose. The covariance is dependent on the distance between any two values and not on their locations. For semivariograms, intrinsic stationarity is the assumption that the variance of the difference is the same between any two points that are at the same distance and direction apart no matter which two points you choose.

Second-order and intrinsic stationarity are assumptions needed to obtain the necessary replication to estimate the dependence
rules, which allows us to make predictions and assess uncertainty
in the predictions. Notice that it is the spatial information (similar
distance between any two points) that provides the replication.
The coin example is dependent (the first and second coins are
independent, but the first and fourth are dependent), so this
random process does not have second-order stationarity.
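
The dependence in the coin example can be made concrete by listing every outcome. The sketch below assumes three fair, independent flips (the text does not state the fairness assumption explicitly) and applies the stated rule for the fourth coin; fourth_coin is a hypothetical function name.

```python
from itertools import product


def fourth_coin(first, second, third):
    """Lay the fourth coin according to the rule in the coin example."""
    if second == "H" and third == "H":
        return first                       # same as the first coin
    return "T" if first == "H" else "H"    # opposite to the first coin


# All eight equally likely outcomes of three fair flips, plus the laid coin.
outcomes = [
    (c1, c2, c3, fourth_coin(c1, c2, c3))
    for c1, c2, c3 in product("HT", repeat=3)
]

# Probability that two coins match; 0.5 would be expected under independence.
p_first_matches_fourth = sum(o[0] == o[3] for o in outcomes) / len(outcomes)
p_first_matches_second = sum(o[0] == o[1] for o in outcomes) / len(outcomes)

print(p_first_matches_fourth, p_first_matches_second)
```

The first and fourth coins match in only a quarter of the outcomes, while the first and second match half of the time, so the strength of the dependence differs from one pair of coins to another.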
Modeling a semivariogram
The following sections will further discuss how a semivariogram is created. Assuming stationarity, the autocorrelation can be examined and quantified. In geostatistics this is called spatial modeling, also known as structural analysis or variography. In spatial modeling of the semivariogram, begin with a graph of the empirical semivariogram, computed as,

Semivariogram(distance h) = 0.5 * average[(value at location i - value at location j)^2]

for all pairs of locations separated by distance h. The formula involves calculating half the difference squared between the values of the paired locations. To plot all pairs quickly becomes unmanageable. Instead of plotting each pair, the pairs are grouped into lag bins. For example, compute the average semivariance for all pairs of points that are greater than 40 meters but less than 50 meters apart. The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and distance (or lag) on the x-axis (see diagram below).

Again, note that it is the intrinsic stationarity assumption that allows replication. Thus it is possible to use averaging in the semivariogram formula above.

Once you've created the empirical semivariogram, you can fit a line to the points forming the empirical semivariogram model. The modeling of a semivariogram is similar to fitting a least-squares line in regression analysis. Some function is selected that serves as the model, for example, a spherical type that rises at first and then levels off for larger distances beyond a certain range.

The basic goal is to calculate the parameters of the curve to minimize the deviations from the points according to some criterion. There are a lot of different semivariogram models to choose from. See Chapter 7 for more details and recommendations on how to choose a model. Now you will go through each of these steps in detail.

Creating the empirical semivariogram

To create an empirical semivariogram, determine the squared difference between the values for all pairs of locations. When these are plotted, with half the squared difference on the y-axis and the distance that separates the locations on the x-axis, it is called the semivariogram cloud. The scene below shows the pairings of one location (the red point) with 11 other locations.
One of the main goals of variography is to explore and quantify the spatial dependence, also called the spatial autocorrelation. Spatial autocorrelation quantifies the assumption that things that are closer are more alike than things farther apart. Thus, pairs of locations that are closer (far left on the x-axis of the semivariogram cloud) would have more similar values (low on the y-axis of the semivariogram cloud). As pairs of locations become farther apart (moving to the right on the x-axis of the semivariogram cloud), they should become more dissimilar and have a higher squared difference (move up on the y-axis of the semivariogram cloud).

Binning the empirical semivariogram

As you can see from the landscape of locations on the previous page and the semivariogram cloud above, plotting each pair of locations quickly becomes unmanageable. There are so many points that the plot becomes congested, and little can be interpreted from it. To reduce the number of points in the empirical semivariogram, the pairs of locations will be grouped based on their distance from one another. This grouping process is known as binning.

Binning is a two-stage process. First, form pairs of points, and second, group the pairs so that they have a common distance and direction. In the landscape scene of 12 locations, you can see the pairing of all the locations with one location, the red point. Similar colors for the links between pairs indicate similar bin distances.

This process continues for all possible pairs. You can see that in the pairing process the number of pairs increases rapidly with the addition of each location. This is why, for each bin, only the average distance and semivariance for all the pairs in that bin are plotted as a single point on the empirical semivariogram cloud graph.
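
Distance-only binning of the semivariogram cloud can be sketched as below. The 12 locations and their values are invented for illustration, the function name empirical_semivariogram and its parameters lag_size and n_lags are hypothetical, and direction is ignored here, so this covers only the distance part of the grouping.

```python
import math
from itertools import combinations

# Twelve hypothetical measured locations (x, y) and values.
samples = {
    (0, 0): 10, (2, 1): 12, (5, 0): 15, (7, 2): 18,
    (1, 4): 11, (3, 3): 13, (6, 5): 17, (8, 7): 21,
    (0, 8): 12, (4, 7): 16, (9, 1): 20, (2, 9): 14,
}


def empirical_semivariogram(samples, lag_size, n_lags):
    """Return {bin index: (average distance, average semivariance, pair count)}.

    Bin k holds the pairs whose distance h satisfies
    k * lag_size <= h < (k + 1) * lag_size.
    """
    sums = {}
    for (p, zp), (q, zq) in combinations(samples.items(), 2):
        h = math.dist(p, q)
        k = int(h // lag_size)
        if k >= n_lags:
            continue  # pair is farther apart than the last bin
        dist_sum, semi_sum, count = sums.get(k, (0.0, 0.0, 0))
        sums[k] = (dist_sum + h, semi_sum + 0.5 * (zp - zq) ** 2, count + 1)
    return {k: (d / c, s / c, c) for k, (d, s, c) in sorted(sums.items())}


n_pairs = len(list(combinations(samples, 2)))  # n * (n - 1) / 2 pairs
binned = empirical_semivariogram(samples, lag_size=2.0, n_lags=7)

print(n_pairs)
for k, (avg_h, avg_semi, count) in binned.items():
    print(k, round(avg_h, 3), round(avg_semi, 3), count)
```

With 12 locations there are already 66 pairs, which illustrates why each bin is summarized by a single averaged point.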
In the second stage of the binning process, pairs are grouped based on common distances and directions. Imagine a graph so each point has a common origin. This property makes the empirical semivariogram symmetric. Always put the links to the right of the vertical axis.

[Figure: links labeled 1 to 4.]

The figure below shows all possible pairwise links among all 12 locations. The points are rotated to orient north to the top.

[Figure: links labeled 1 to 4.]

Now, you can see that links 1 and 2 have a fairly similar distance and direction. Each cell in the grid forms a bin. Links 1 and 2 fall into the same bin, which is colored yellow. For link 1 form the squared difference from the values at the two locations that are linked, and do likewise for link 2. Then these are averaged and multiplied by 0.5 to give one empirical semivariogram value for the bin.

Perform the same process for another bin, the one colored green, with links 3 and 4. To keep things simple, only four links are shown, but of course there are many, many more.
For each bin, form the squared difference from the values for all
pairs of locations that are linked, and these are then averaged and
multiplied by 0.5 to give one empirical semivariogram value per
bin. In the Geostatistical Analyst, you can control the lag size and
number of bins. The empirical semivariogram value in each bin is
color coded and is called the semivariogram surface.
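
Grouping links by both distance and direction can be sketched by placing each link in a grid cell according to its offset from a common origin. This is an illustration of the idea only and is not claimed to match the Geostatistical Analyst's binning; the sample data, the function name semivariogram_surface, and the rounding rule for assigning cells are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical measured locations (x, y) and values.
samples = {
    (0, 0): 10, (2, 1): 12, (5, 0): 15, (7, 2): 18,
    (1, 4): 11, (3, 3): 13, (6, 5): 17, (8, 7): 21,
}


def semivariogram_surface(samples, lag_size):
    """Return {(column, row): empirical semivariogram value} for each grid cell.

    A link from p to q is moved to a common origin and placed in the cell
    that contains its offset (dx, dy). The same link read from q to p lands
    in the mirrored cell, which makes the surface symmetric about the origin.
    """
    cells = defaultdict(list)
    for (p, zp), (q, zq) in combinations(samples.items(), 2):
        half_squared_difference = 0.5 * (zp - zq) ** 2
        column = round((q[0] - p[0]) / lag_size)
        row = round((q[1] - p[1]) / lag_size)
        for cell in {(column, row), (-column, -row)}:
            cells[cell].append(half_squared_difference)
    # Averaging the half squared differences equals 0.5 * average squared difference.
    return {cell: sum(v) / len(v) for cell, v in cells.items()}


surface = semivariogram_surface(samples, lag_size=2.0)
for cell in sorted(surface):
    print(cell, round(surface[cell], 3))
```

A smaller lag size gives more cells with fewer links in each, so the values become noisier; a larger lag size smooths the surface but hides detail.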
[Figure: semivariogram surface, with one bin and the center of the semivariogram surface labeled.]

In the figure above, there are seven bins horizontally and vertically from the center of the semivariogram surface. For the bins, the cool colors (blue and green) are lower values, and the warm colors (red and orange) are higher values. As you can see, in general, the empirical semivariogram values increase as the bins get farther away from the origin. This indicates that values are more dissimilar with increasing distance. Also notice the symmetry that we described earlier.

The Geostatistical Analyst also gives a plot of the empirical semivariogram.

In the graph above, the empirical semivariogram value for each bin for each direction is plotted as a red dot, where the y-axis is the empirical semivariogram value and the x-axis is the distance from the center of the bin to the origin (center of semivariogram surface). The color bar on the right matches the colors on the semivariogram surface. By binning and averaging the semivariogram cloud values, it is much more obvious that dissimilarity increases with distance. The yellow line in the figure above is a fitted semivariogram model, which will be discussed shortly.
An alternative method that is often used for grouping the pairs into bins is based on radial sectors (see the figure below). The Geostatistical Analyst does not use this method.

[Figure: radial sector binning, with links 1 to 4]

Empirical semivariograms for different directions

Sometimes the values for the measured locations will contain a directional influence that can be statistically quantified but perhaps cannot be explained by any known identifiable process. This directional influence is known as anisotropy. The angle of tolerance will determine the angle in which close points will be included or excluded until it reaches the bandwidth. The bandwidth specifies how wide the search should be when determining which pairs of points will be plotted in the semivariogram. The points in the bins are pairs of locations that are within certain distances and directions apart. You can conceptually view directional binning either by limiting the pairs of points that will be graphed in the grouping process or by graphing all pairs and considering only the portion of the graph representing a certain direction. The scene below depicts a directional binning of 90 degrees, a bandwidth of five meters, an angle tolerance of 45 degrees, and a lag distance of five meters from a single sample point (in blue).

[Figure: directional search from a single sample point, with the bandwidth and lag distance labeled]

The directional search continues for each sample point and direction on the surface.
The scene below shows the directional binning of three points.
Notice that fewer pairs of locations will be included in the
grouping process than with the omnidirectional semivariogram in
the previous example.
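One way to picture the directional search is as a test applied to each link between two points. The sketch below is an illustration under stated assumptions, not the software's algorithm: the function name `in_directional_search` is hypothetical, the direction is assumed to be measured in degrees clockwise from north, and a link is accepted when it lies within the angle tolerance of the search direction and its perpendicular offset does not exceed the bandwidth.

```python
import math


def in_directional_search(dx, dy, direction_deg, tolerance_deg, bandwidth):
    """Return True if the link (dx, dy) falls inside the directional search.

    direction_deg is assumed to be measured clockwise from north.
    """
    theta = math.radians(direction_deg)
    ux, uy = math.sin(theta), math.cos(theta)  # unit vector of the direction
    along = dx * ux + dy * uy                  # component along the direction
    across = abs(dx * uy - dy * ux)            # perpendicular offset
    # A link and its reverse describe the same pair, so fold them together.
    along = abs(along)
    angle = math.degrees(math.atan2(across, along))
    return angle <= tolerance_deg and across <= bandwidth


# Settings from the scene described above: direction 90 degrees,
# angle tolerance 45 degrees, bandwidth 5 (meters).
checks = {
    (10, 2): in_directional_search(10, 2, 90, 45, 5),
    (10, 8): in_directional_search(10, 8, 90, 45, 5),
    (1, 3): in_directional_search(1, 3, 90, 45, 5),
    (-10, 2): in_directional_search(-10, 2, 90, 45, 5),
}
print(checks)
```

The link (10, 8) is within the angle tolerance but is rejected because its perpendicular offset exceeds the bandwidth, which is the role the bandwidth plays in the description above.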
The pairs are then binned according to common distances and directions, the bins are averaged, and the average of the pairs for each bin is plotted on the semivariogram.

Alternatively, in the grid method of binning described earlier, all of the pairs can be binned, and you can make directional subsets as illustrated below. The bin will be plotted on the semivariogram cloud graph if the center of the cell on the semivariogram surface is included in the search direction.

Choosing the lag size

The selection of a lag size has important effects on the empirical semivariogram. For example, if the lag size is too large, short-range autocorrelation may be masked. If the lag size is too small, there may be many empty bins, and sample sizes within bins will be too small to get representative averages for bins.

When samples are located on a sampling grid, the grid spacing is usually a good indicator for lag size. However, if the data is acquired using an irregular or random sampling scheme, the selection of a suitable lag size is not so straightforward. A rule of thumb is that the lag size multiplied by the number of lags should be about half of the largest distance among all points. Also, if the range of the fitted semivariogram model is very small relative to the extent of the empirical semivariogram, then you can decrease the lag size. Conversely, if the range of the fitted semivariogram model is large relative to the extent of the empirical semivariogram, you can increase the lag size. Semivariogram models are discussed next.
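The rule of thumb for the lag size can be turned into a small calculation. The sketch below assumes you have already chosen the number of lags; the function name `suggest_lag_size` and the sample coordinates are hypothetical, and the result is only a starting point to be adjusted as described above.

```python
import math
from itertools import combinations


def suggest_lag_size(coords, number_of_lags):
    """Lag size such that lag size * number of lags is about half of the
    largest distance among all points (rule of thumb only)."""
    largest = max(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in combinations(coords, 2)
    )
    return 0.5 * largest / number_of_lags


# Hypothetical coordinates: the largest distance among them is 10.
coords = [(0, 0), (3, 4), (6, 8)]
lag_size = suggest_lag_size(coords, number_of_lags=5)
print(lag_size)
```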
Fitting a model to the empirical semivariogram

Semivariogram/Covariance modeling is a key step between spatial description and spatial prediction. Earlier, it was described how to fit a semivariogram model and how it is used in the kriging equations (gamma matrix, Γ, and g vector). The main application of geostatistics is the prediction of attribute values at unsampled locations (kriging).

So far, you've read how the empirical semivariogram and covariance provide information on the spatial autocorrelation of datasets. However, they do not provide information for all possible directions and distances. For this reason and to ensure that kriging predictions have positive kriging variances, it is necessary to fit a model (i.e., a continuous function or curve) to the empirical semivariogram/covariance.

Abstractly, this is similar to regression analysis, where a continuous line or a curve of various types is fit.

Different types of semivariogram models

The Geostatistical Analyst provides the following functions to choose from to model the empirical semivariogram: Circular, Spherical, Tetraspherical, Pentaspherical, Exponential, Gaussian, Rational Quadratic, Hole Effect, K-Bessel, J-Bessel, and Stable. The selected model influences the prediction of the unknown values, particularly when the shape of the curve near the origin differs significantly. The steeper the curve near the origin, the more influence the closest neighbors will have on the prediction. As a result, the output surface will be less smooth. Each model is designed to fit different types of phenomena more accurately.

The diagrams below show two common models and identify how the functions differ:

The Spherical model

This model shows a progressive decrease of spatial autocorrelation (equivalently, an increase of semivariance) until some distance, beyond which autocorrelation is zero. The spherical model is one of the most commonly used models.

The Exponential model

This model is applied when spatial autocorrelation decreases exponentially with increasing distance, disappearing completely only at an infinite distance. The exponential model is also commonly used.
Understanding a semivariogram: the range, sill, and nugget

As previously discussed, the semivariogram depicts the spatial autocorrelation of the measured sample points. Once each pair of locations is plotted (after binning), a model is fit through them. There are certain characteristics that are commonly used to describe these models.

The range and sill

When you look at the model of a semivariogram, you will notice that at a certain distance the model levels out. The distance where the model first flattens out is known as the range. Sample locations separated by distances closer than the range are spatially autocorrelated, whereas locations farther apart than the range are not.

The value that the semivariogram model attains at the range (the value on the y-axis) is called the sill. The partial sill is the sill minus the nugget.

[Figure: semivariogram model with the sill, nugget, and range labeled]

The nugget

Theoretically, at zero separation distance (i.e., lag = 0), the semivariogram value should be zero. However, at an infinitesimally small separation distance, the difference between measurements often does not tend to zero. This is called the nugget effect. For example, if the semivariogram model intercepts the y-axis at 2, then the nugget is 2.

The nugget effect can be attributed to measurement errors or spatial sources of variation at distances smaller than the sampling interval (or both). Measurement error occurs because of the error inherent in measuring devices. Natural phenomena can vary spatially over a range of scales. Variation at micro scales smaller than the sampling distances will appear as part of the nugget value. Before collecting data, it is important to gain some understanding of the scales of spatial variation that interest you.
Accounting for directional influences: trend and anisotropy

There are two types of directional components that can affect the predictions in your output surface: global trends and directional influences on the semivariogram/covariance (known as anisotropy). A global trend is an overriding process that affects all measurements in a deterministic manner. The global trend can be represented by a mathematical formula (e.g., a polynomial) and removed from the analysis of the measured points but added back in before predictions are made. This process is referred to as detrending (see Chapter 7, "Using analytical tools when generating surfaces").

An example of a global trend can be seen in the effects of the prevailing winds on a smoke stack at a factory (right). In the image, the higher concentrations of pollution are depicted in the warm colors (reds and yellows) and the lower concentrations in the cool colors (greens and blues). Notice that the values of the pollutant change more slowly in the east-west direction than in the north-south direction. This is because east-west is aligned with the wind while north-south is perpendicular to the wind.
The shape of the semivariogram/covariance curve may also vary
with direction (anisotropy) after the global trend is removed or if
no trend exists. Anisotropy differs from the global trend dis-
cussed above because the global trend can be described by a
physical process (in the example above, the prevailing winds) and
modeled by a mathematical formula. The cause of the anisotropy
(directional influence) in the semivariogram is not usually known,
so it is modeled as random error. Even without knowing the
cause, anisotropic influences can be quantified and accounted
for.
Anisotropy is usually not a deterministic process that can be described by a single mathematical formula. It does not have a single source or influence that predictably affects all measured points. Anisotropy is a characteristic of a random process that shows higher autocorrelation in one direction than in another.

The following image shows conceptually how the process might look. Once again, the higher concentrations of pollution are depicted in the warm colors (reds and yellows) and the lower concentrations in the cool colors (greens and blues). The random process shows undulations that are shorter in one direction than another. These undulations could be the result of some unknown or unmeasurable physical process but are modeled as a random process with directional autocorrelation.

In this example, because of anisotropy, when the empirical semivariogram for the measured points is plotted, you can see that the spatial relationship is different for two directions. In the north-south direction the shape of the semivariogram curve increases more rapidly before leveling out.

[Figure: empirical semivariograms of Variable A in the east-west and north-south directions; x-axis: Lag (m); y-axis: Semivariance]

For anisotropy, the shape of the semivariogram may vary with direction. Isotropy exists when the semivariogram does not vary according to direction.
Combining variogram models

Many times there are two or more processes that will dictate the spatial distribution of some phenomenon. For instance, the quantity of vegetation (the biomass) may be related to elevation and soil moisture. If this relationship is known, it is possible to use cokriging to predict biomass. You could use the measured values of biomass as dataset one, elevation as dataset two, and soil moisture as dataset three (see Chapter 6, "Creating a surface with geostatistical techniques"). You might fit different variogram models to each dataset because each exhibits different spatial structure. That is, the spherical model might fit elevation best, the exponential model might fit soil moisture best, and a combination of the models might fit biomass best. The models can then be combined in a way that best fits the structure of the data.

However, sometimes you do not know the causal relationships of the factors that are determining the spatial structure in some phenomenon. Using the same example of biomass above, you may only have the sample points measuring the biomass. When you examine the variogram, you notice distinct inflection points.

[Figure: empirical semivariogram with distinct inflection points; x-axis: Lag (m), 0 to 10; y-axis: Semivariance, 0 to 0.18]

The points go up, straighten out, and then bend again to level off to the sill. You suppose that there are two distinct structures in the data and a single model will not capture it. You may model the semivariogram with two separate models (e.g., spherical and exponential) and combine them into a single model.

Representing multiple distinct random processes through a single variogram is discouraged, and it is best to separate the spatial processes whenever possible. However, the causal relationships are not always understood. The choice of multiple models adds more parameters to estimate and is a subjective exercise that you perform by eye and then quantify by cross-validation and validation statistics (see Chapter 7, "Using analytical tools when generating surfaces").

The Geostatistical Analyst allows you to select up to three models in addition to a nugget effect model. In the example above, the model consists of three components: a nugget effect model and two spherical models with different ranges.
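A combined model of the kind described above (a nugget effect plus two spherical models with different ranges) can be sketched as a sum of components. The parameter values below are hypothetical and are not read from the figure; the spherical form is the standard one given later in this chapter, and the function names are made up for the example.

```python
def spherical(h, partial_sill, model_range):
    """Spherical structure: rises until model_range, then stays flat."""
    if h >= model_range:
        return partial_sill
    ratio = h / model_range
    return partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


def nested_model(h, nugget, structures):
    """Nugget effect plus a sum of spherical structures.

    structures is a list of (partial_sill, range) pairs.
    The semivariogram is zero at exactly zero separation distance.
    """
    if h == 0:
        return 0.0
    return nugget + sum(spherical(h, s, r) for s, r in structures)


# Hypothetical parameters: a short-range and a long-range structure.
nugget = 0.02
structures = [(0.06, 2.0), (0.08, 8.0)]
for lag in (0, 1, 2, 4, 8, 10):
    print(lag, round(nested_model(lag, nugget, structures), 4))
```

With two ranges, the curve bends once where the short-range structure levels off and again where the long-range structure reaches the total sill, which is the kind of shape described for the biomass example.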
Using the Geostatistical Analyst to fit a model to a semivariogram

The example that was presented earlier in this chapter was simplified to make it easier to understand. To demonstrate the concept of modeling a semivariogram, more sample points will be used.

Ten measured sample points have been taken of elevation.

Point Number   X-Coordinate   Y-Coordinate   Value
 1             1              3              105
 2             1              5              100
 3             1              6               95
 4             3              4              105
 5             3              6              105
 6             4              5              100
 7             5              1              115
 8             6              3              120
 9             6              6              110
10             7              1              120

In this example, a value for the location x=2.75, y=2.75 will be predicted where the value is currently unknown (the yellow point in the image below).

The spatial configuration of the measured points, their values, and the prediction location are displayed in ArcMap below.

In the first two panels of the Geostatistical Wizard, you specify the dataset, the prediction field, and the kriging method (in this case, Ordinary Kriging). The third panel contains the semivariogram modeling dialog box. Here our goal is to fit a semivariogram model to the empirical semivariogram. You can see the list of available models below. In the previous example, a simple straight line is fitted, but you can see there are many more choices. Each model is slightly different so that you can fit the best one possible. Chapter 7, "Using analytical tools when generating surfaces", provides more information on the models.
In this example, the spherical model will be fitted to the empirical semivariogram. The formula for the spherical model is given here. As you can see, the formula is more involved than the simple line used in the previous example in this chapter. However, the two serve the same purpose with differing results.

$$
\gamma(\mathbf{h}) =
\begin{cases}
\theta_s \left[ \dfrac{3}{2} \dfrac{\lVert \mathbf{h} \rVert}{\theta_r} - \dfrac{1}{2} \left( \dfrac{\lVert \mathbf{h} \rVert}{\theta_r} \right)^{3} \right] & \text{for } 0 \le \lVert \mathbf{h} \rVert \le \theta_r \\
\theta_s & \text{for } \theta_r < \lVert \mathbf{h} \rVert
\end{cases}
$$

where
$\theta_s$ is the sill value,
$\mathbf{h}$ is the lag vector, and $\lVert \mathbf{h} \rVert$ is the length of $\mathbf{h}$ (distance between 2 locations),
$\theta_r$ is the range of the model.

Note the parameters of the spherical model are in blue lettering at the bottom left of the dialog box. This indicates a spherical model is being used with a sill value of 86.1, a range of 6.96, and zero nugget. Therefore, the calculated semivariogram values using the selected spherical model will be:

γ(h) = 86.1 * (1.5 * (h/6.96) - 0.5 * (h/6.96)^3), for all lag values ≤ 6.96

and

γ(h) = 86.1, for all lag values > 6.96

This is similar to finding the semivariogram value for a given distance, h, on the fitted line in our previous example. Once the line was fitted, the values for the matrix and vectors were determined for the ordinary kriging equation from the line. Here the same can be done using the fitted spherical model.
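The fitted spherical model is easy to evaluate directly. The short sketch below uses the sill (86.1) and range (6.96) quoted from the dialog box, with zero nugget; the function name `fitted_gamma` is made up for this example.

```python
import math

SILL = 86.1    # sill value shown in the dialog box
RANGE = 6.96   # range shown in the dialog box (zero nugget)


def fitted_gamma(h):
    """Semivariogram value of the fitted spherical model at distance h."""
    if h > RANGE:
        return SILL
    ratio = h / RANGE
    return SILL * (1.5 * ratio - 0.5 * ratio ** 3)


# Two points that are 1 unit apart in x and 2 units apart in y are
# sqrt(5), about 2.236, apart.
h = math.sqrt(5)
print(round(h, 3), round(fitted_gamma(h), 3))
```

For h of about 2.236 this gives a semivariance of roughly 40.065, and any distance beyond the range returns the sill.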
Kriging

Like IDW interpolation, kriging forms weights from surrounding measured values to predict values at unmeasured locations. As with IDW interpolation, the closest measured values have the most influence. However, the kriging weights for the surrounding measured points are more sophisticated than those of IDW. IDW uses a simple algorithm based on distance, but kriging weights come from a semivariogram that was developed by looking at the spatial structure of the data. To create a continuous surface or map of the phenomenon, predictions are made for locations in the study area based on the semivariogram and the spatial arrangement of measured values that are nearby.

Searching neighborhood

It can be assumed that as the locations get farther from the prediction location, the measured values will have less spatial autocorrelation with the prediction location. Thus, it is possible to eliminate locations that are farther away that demonstrate little influence using search neighborhoods. Not only is there less relationship with locations that are farther away, but it is possible that the locations that are farther away may have a detrimental influence if they are located in an area much different than the prediction location. Another reason to use search neighborhoods is for computational speed. Recall from the first example that a 5-by-5 matrix was inverted. If you had 2,000 data locations, the matrix would be too large to invert. The smaller the search neighborhood, the faster the predictions can be made. As a result, it is common practice to limit the number of points used in a prediction by specifying a search neighborhood.

The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the locations that will be used within that shape.

The shape of the neighborhood is influenced by the input data and the surface that you are trying to create. If there are no directional influences on the spatial autocorrelation of your data, then you will want to consider points equally in all directions. To do so, you will probably want the shape of your neighborhood to be a circle. However, if there is directional autocorrelation in your data, then you may want the shape of your neighborhood to be an ellipse with the major axis parallel with the direction of long-range autocorrelation.

The searching neighborhood can be specified in Step 3 of the Geostatistical Wizard. Once a neighborhood shape is specified, you can also restrict which locations within the shape should be used. You can define the maximum and minimum number of neighbors to include. You can also divide the neighborhood into sectors to ensure you get values from all directions. If you divide the neighborhood into sectors, the specified maximum and minimum number of neighbors will be applied to each sector.
There are several different sector types that can be used (below).

[Figure: sector types: one sector, ellipse with four sectors, eight sectors]

Using the data configuration within the specified neighborhood, in conjunction with the fitted semivariogram model, the weights for the measured locations can be determined. From the weights and the values, a prediction can be made for the unknown value for the prediction location. This process is performed for each spatial location to create a model of the continuous surface.

Creating a prediction surface using neighborhood searching

As the datasets get larger and cover more area, you will want to limit which measured points you consider when predicting. If you consider points too far away, they may be in areas much different than the prediction location. You will want to include enough points in your calculations for a good sampling, but you do not want to include those that are too far away from the prediction location because either they contribute very little or they come from an area unlike the area in which you are predicting (below). In the dialog box below a circular neighborhood with a radius of 3 is specified, and the maximum number of neighbors to include is 5.

[Figure: searching neighborhood dialog box. Callouts: neighbors to include = 5; search strategy: circle with four quadrants; radius = 3; coordinates of test point (x=2.75, y=2.75); points to be used; weights; prediction = 107.59]

The locations that are used to predict the unknown value for the desired location (2.75, 2.75) are highlighted and color coded (according to percentage size of coefficients λi) in the dialog view box. The points in the neighborhood are:

Neighborhood     Original
Point Number     Point Number   x-coordinate   y-coordinate   Value
1                1              1              3              105
2                2              1              5              100
3                4              3              4              105
4                6              4              5              100
5                7              5              1              115
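The neighborhood search in this example can be reproduced with a simple distance filter over the ten sample points listed earlier. This is a simplified sketch: it applies only the circular radius and the maximum number of neighbors and ignores sectors, and the function name `search_neighborhood` is hypothetical.

```python
import math

# Point number -> (x, y, value), from the table of ten sample points.
samples = {
    1: (1, 3, 105), 2: (1, 5, 100), 3: (1, 6, 95), 4: (3, 4, 105),
    5: (3, 6, 105), 6: (4, 5, 100), 7: (5, 1, 115), 8: (6, 3, 120),
    9: (6, 6, 110), 10: (7, 1, 120),
}


def search_neighborhood(samples, target, radius, max_neighbors):
    """Return the point numbers of the nearest samples inside the circle."""
    tx, ty = target
    candidates = []
    for number, (x, y, _value) in samples.items():
        distance = math.hypot(x - tx, y - ty)
        if distance <= radius:
            candidates.append((distance, number))
    candidates.sort()  # nearest first
    return [number for _distance, number in candidates[:max_neighbors]]


neighbors = search_neighborhood(samples, (2.75, 2.75), radius=3, max_neighbors=5)
print(sorted(neighbors))
```

Only original points 1, 2, 4, 6, and 7 lie within a distance of 3 of the prediction location, which matches the neighborhood table above.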
The predicted value for the desired location (x=2.75, y=2.75) is 107.59. The Geostatistical Analyst predicted the value by solving the ordinary kriging equations,

Γ * λ = g

and solving for the weights

λ = Γ^-1 * g

With the spherical semivariogram model and with the trimmed down set of measured values identified through the neighborhood search, it is possible to solve for λ in the equation above. First, create the Γ matrix. This is done by calculating the distances between the pairs of points and substituting them into the fitted spherical model,

γ(h) = 86.1 * (1.5 * (h/6.96) - 0.5 * (h/6.96)^3), for 0 < h < 6.96

The distances between the measured points are:

Points   Distance     Points   Distance
1,2      2.000        2,4      3.000
1,3      2.236        2,5      5.657
1,4      3.606        3,4      1.414
1,5      4.472        3,5      3.606
2,3      2.236        4,5      4.123

If the distance (h) between points 1 and 3 is substituted, where h=2.236, the semivariogram value is:

γ(h) = 86.1 * (1.5 * (2.236/6.96) - 0.5 * (2.236/6.96)^3) = 40.065

Repeat this process for each pair of points to produce the Γ matrix. To keep the notation clear in the matrices, the points have been renumbered as shown in the top left.

i     1        2        3        4        5        6
1     0.000    36.091   40.065   60.920   71.564   1.000
2     36.091   0.000    40.065   52.221   81.855   1.000
3     40.065   40.065   0.000    25.881   60.920   1.000
4     60.920   52.221   25.881   0.000    67.559   1.000
5     71.564   81.855   60.920   67.559   0.000    1.000
6     1.000    1.000    1.000    1.000    1.000    0.000

Next find the inverse Γ^-1.

i     1         2         3         4         5         6
1     -0.0191   0.01005   0.00776   -0.0021   0.00336   0.2114
2     0.01005   -0.0187   0.00472   0.00402   -0.0001   0.24891
3     0.00776   0.00472   -0.0317   0.01619   0.00304   -0.1038
4     -0.0021   0.00402   0.01619   -0.0214   0.00324   0.27739
5     0.00336   -0.0001   0.00304   0.00324   -0.0095   0.36607
6     0.2114    0.24891   -0.1038   0.27739   0.36607   -47.922

Now it is necessary to create the g vector to solve the ordinary kriging equation, λ = Γ^-1 * g. To do so, calculate the distance of each of the five measured locations in our neighborhood to the prediction location (2.75, 2.75). The distances are:

From x=2.75, y=2.75
Points   Distance
1        1.768
2        2.850
3        1.275
4        2.574
5        2.850
The g vector is created by substituting each of the distances into the fitted spherical model.

From x=2.75, y=2.75
Points   Fitted Semivariance
1        32.097
2        49.936
3        23.390
4        45.584
5        49.936
6        1.000

The extra row in the g vector (and the extra row and column in the Γ matrix) has been added to ensure the weights sum to 1 (i.e., using the Lagrange multiplier explained further in Appendix A).

Now solve for the weights of the λ vector. An example of solving for the weight of point 1 is:

λ1 = (-0.0191 * 32.097 + 0.01005 * 49.936 + 0.00776 * 23.390 - 0.0021 * 45.584 + 0.00336 * 49.936 + 0.2114 * 1.000)
   = 0.355

The weights for all of the points and the Lagrange multiplier (entry number 6) are:

Points   λi
1        0.355
2        -0.073
3        0.529
4        -0.022
5        0.211
6        -0.210

Finally, predict the value of the location (2.75, 2.75) by multiplying the weights of the measured points (dropping entry number 6) by their values and then adding them together.

Prediction = 0.355 * 105 - 0.073 * 100 + 0.529 * 105 - 0.022 * 100 + 0.211 * 115

Prediction = 107.59

i    λi       Value i
1    0.355    105
2    -0.073   100
3    0.529    105
4    -0.022   100
5    0.211    115

Repeating this for many prediction locations and mapping the results produces the prediction surface shown below.

Output surfaces can be created with the Geostatistical Analyst in a number of formats. These include a shapefile of contour lines, a shapefile of filled contour polygons, and a grid representing a continuous surface and hillshade.
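The whole worked example can be checked with a short script. The sketch below is an independent illustration, not the Geostatistical Analyst's code: it builds the Γ matrix and g vector from the five neighborhood points with the fitted spherical model (sill 86.1, range 6.96, zero nugget), solves the system by plain Gaussian elimination instead of forming the inverse, and sums the weighted values. The helper names are made up for the example.

```python
import math

SILL, RANGE = 86.1, 6.96
# Neighborhood points 1 to 5 as (x, y, value), in the renumbered order.
neighbors = [(1, 3, 105), (1, 5, 100), (3, 4, 105), (4, 5, 100), (5, 1, 115)]
target = (2.75, 2.75)


def spherical_gamma(h):
    """Fitted spherical semivariogram with zero nugget."""
    if h > RANGE:
        return SILL
    ratio = h / RANGE
    return SILL * (1.5 * ratio - 0.5 * ratio ** 3)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


n = len(neighbors)
# Gamma matrix: semivariances between neighbors, plus the extra row and
# column of ones (and a zero corner) that force the weights to sum to 1.
big_gamma = [
    [spherical_gamma(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in neighbors] + [1.0]
    for xi, yi, _ in neighbors
]
big_gamma.append([1.0] * n + [0.0])
# g vector: semivariances between each neighbor and the prediction location.
g = [spherical_gamma(math.hypot(x - target[0], y - target[1])) for x, y, _ in neighbors]
g.append(1.0)

solution = solve(big_gamma, g)
weights = solution[:n]  # the last entry is the Lagrange multiplier
prediction = sum(w * value for w, (_x, _y, value) in zip(weights, neighbors))
print([round(w, 3) for w in weights], round(prediction, 2))
```

Because the script works from unrounded distances and semivariances, the weights may differ from the tabulated ones in the last digit, but the prediction should agree with 107.59 to within rounding.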
A guide to the Geostatistical Analyst extension

In this last section, you will learn more about the Geostatistical Analyst extension to ArcMap.

The software is accessed via the Geostatistical Analyst dropdown menu on the ArcMap toolbar. There are three main components to the Geostatistical Analyst: (1) Explore Data, (2) Geostatistical Wizard, and (3) Create Subsets.

Explore data

Before using the interpolation techniques, you can explore your data with these tools. ESDA tools allow you to gain insight into your data, enabling you to select the appropriate parameters for the interpolation model. For example, when using ordinary kriging to produce a quantile map, you should examine the distribution of the data because it is assumed that the data is normally distributed. Alternatively, you may explore for a trend in your data with the ESDA tools, and you may wish to remove it in the prediction process.

The following tools are provided:

• Histogram: Explore the univariate distribution of a dataset.
• Voronoi Map: Analyze stationarity and spatial variability of a dataset.
• Normal QQPlot: Check for normality of a dataset.
• Trend Analysis: Identify global trends in a dataset.
• Semivariogram/Covariance Cloud: Analyze the spatial dependencies in a dataset.
• General QQPlot: Explore whether two datasets have the same distributions.
• Crosscovariance Cloud: Understand crosscovariance between two datasets.

Geostatistical Wizard

The Geostatistical Analyst provides a number of interpolation techniques that use sample points to produce surfaces of the phenomena of interest. The interpolation techniques in the Geostatistical Analyst are divided into two main types: deterministic and geostatistical.

Deterministic

Deterministic techniques are based on parameters that control either (i) the extent of similarity (e.g., Inverse Distance Weighted) of the values or (ii) the degree of smoothing (e.g., radial basis functions) in the surface. These techniques do not use a model of random spatial processes.

Geostatistics

Geostatistics assume that at least some of the spatial variation of natural phenomena can be modeled by random processes with spatial autocorrelation.

Geostatistical techniques can be used to:

• Describe and model spatial patterns: variography.
• Predict values at unmeasured locations: kriging.
• Assess the uncertainty associated with a predicted value at the unmeasured locations: kriging.

Kriging can be used to produce the following surfaces:

• Maps of kriging predicted values
• Maps of kriging standard errors associated with predicted values
• Maps of probability, indicating whether or not a predefined critical level was exceeded
• Maps of quantiles for a predetermined probability level
Create subsets

The most rigorous way to assess the quality of an output surface is to compare the predicted values with those measured in the field. It is often not possible to go back to the study area to collect an independent validation dataset. One solution is to divide the original dataset into two parts. One part can be used to model the spatial structure and produce a surface. The other part can be used to compare and validate the quality of the predictions. The Create Subsets dialog box enables you to produce both test and training datasets.

Processing data

The software includes many tools for analyzing data and producing a variety of output surfaces.

While the aim of the investigation may vary, you are encouraged to adopt the following approach when analyzing/mapping spatial processes:

Represent the data: Add layers and display in ArcMap.
Explore the data: Investigate the statistical and spatial properties of your data.
Fit a model: Choose a model to create a surface. The Geostatistical Wizard is used in the definition and refinement of an appropriate model.
Perform diagnostics: Assess the quality of the output surface using Cross-Validation and Validation tools. This will help you understand how well the model predicts the values at unmeasured locations.
Compare the models: More than one surface can be produced. The surfaces can be compared using cross-validation statistics.
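The idea behind creating subsets, dividing the original dataset into a training part and a test part, can be sketched as a random split. This is only an illustration of the concept, not the behavior of the Create Subsets dialog box; the function name `create_subsets`, the 80 percent training fraction, and the fixed seed are assumptions.

```python
import random


def create_subsets(records, training_fraction=0.8, seed=0):
    """Randomly split records into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: the point numbers of ten measured samples.
records = list(range(1, 11))
training, test = create_subsets(records)
print(len(training), len(test))
```

The training part would be used to model the spatial structure and produce a surface, and the test part to compare predictions against measured values.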