STOCHASTIC AND STATISTICAL METHODS
IN HYDROLOGY
AND ENVIRONMENTAL ENGINEERING
VOLUME 3
Series Editor:
V. P. Singh, Louisiana State University,
Baton Rouge, U.S.A.
The titles published in this series are listed at the end of this volume.
STOCHASTIC AND STATISTICAL METHODS
IN HYDROLOGY
AND ENVIRONMENTAL ENGINEERING
Volume 3
KEITH W. HIPEL
Departments of Systems Design Engineering and Statistics and Actuarial Science,
University of Waterloo, Waterloo, Ontario, Canada
A. IAN McLEOD
Department of Statistical and Actuarial Sciences,
The University of Western Ontario, London, Ontario, Canada
and Department of Systems Design Engineering, University of Waterloo,
Waterloo, Ontario, Canada
U. S. PANU
Department of Civil Engineering,
Lakehead University, Thunder Bay, Ontario, Canada
VIJAY P. SINGH
Department of Civil Engineering,
Louisiana State University, Baton Rouge, Louisiana, U.S.A.
ISBN 978-90-481-4379-5
PREFACE xi
AN INTERNATIONAL CELEBRATION xv
ACKNOWLEDGEMENTS xix
Objectives
Contents
As listed in the Table of Contents, the book is divided into the following main parts:
An important topic of widespread public concern in which time series analysis has a
crucial role to play is the systematic study of climatic change. In Part I, significant
contributions to climatic change research are described in an interesting set of papers. For
instance, the first paper in this part is a keynote paper by Dr. D. P. Lettenmaier
that focuses upon time series or stochastic models of precipitation that account for
climatic driving variables. These models furnish a mechanism for transcending the
spatial scales between general circulation models and the much smaller spatial scale
at which water resources effects have to be studied and interpreted.
Within Part III, new developments in entropy are described and entropy concepts
are applied to problems in hydraulics, water quality monitoring, discontinuance of
hydrologic measurement stations, treatment plant efficiency and estimating missing
monthly streamflow data. Neural networks are employed in Part IV for forecasting
runoff and water demand.
In Part VI, nonparametric and parametric approaches to spatial analysis are described
and applied to practical hydrological problems. Next, some unique findings in spectral
analysis are given in Part VII. Finally, Part VIII is concerned with a variety of
interesting topics in streamflow modelling.
Audience
This book should be of direct interest to anyone who is concerned with the latest
developments in time series modelling and analysis. Accordingly, the types of
professionals who may wish to use this book include:
Within each professional group, the book should provide useful information for:
Researchers
Teachers
Students
Practitioners and Consultants
When utilized for teaching purposes, the book could serve as a complementary text
at the upper undergraduate and graduate levels. The recent environmetrics book
by K. W. Hipel and A. I. McLeod entitled Time Series Modelling of Water
Resources and Environmental Systems (published by Elsevier, Amsterdam, 1994,
ISBN 0-444-89270-2) contains an extensive list of time series analysis books (see Sec-
tion 1.6.3) that could be used in combination with this current volume in university
courses. Researchers should obtain guidance and background material for carrying
out worthwhile research projects in time series analysis in hydrology and environmen-
tal engineering. Consultants who wish to keep their companies at the leading edge
of activities in time series analysis and thereby serve their clients in the best possible
ways will find this book to be an indispensable resource.
AN INTERNATIONAL CELEBRATION
Dedication
The papers contained in this book were originally presented at the international
conference on Stochastic and Statistical Methods in Hydrology and Envi-
ronmental Engineering that took place at the University of Waterloo, Waterloo,
Ontario, Canada, from June 21 to 23, 1993. This international gathering was held in
honour and memory of the late Professor T.E. Unny in order to celebrate his life-
long accomplishments in many of the important environmental topics falling within
the overall conference theme. When he passed away in late December, 1991, Professor
T.E. Unny was Professor of Systems Design Engineering at the University of Water-
loo and Editor-in-Chief of the international journal entitled Stochastic Hydrology and
Hydraulics.
About 250 scientists from around the world attended the Waterloo conference in
June, 1993. At the conference, each participant was given a Pre-Conference Pro-
ceedings, published by the University of Waterloo and edited by K.W. Hipel. This
584 page volume contains the detailed conference program as well as the refereed
extended abstracts for the 234 papers presented at the conference. Subsequent to the
conference, full length papers submitted for publication by presenters were mailed
to international experts who kindly carried out thorough reviews. Accepted papers
were returned to authors for revisions and the final manuscripts were then published
by Kluwer according to topics in the following four volumes:
The Editors of the volumes as well as Professor Unny's many friends and colleagues
from around the globe who wrote excellent research papers for publication in these
four volumes, would like to dedicate their work as a lasting memorial to Professor
T. E. Unny. In addition to his intellectual accomplishments, Professor Unny will be
fondly remembered for his warmth, humour and thoughtful consideration of others.
Conference Organization and Sponsorships
The many colleagues and sponsors who took part in the planning and execution of the
international conference on Stochastic and Statistical Methods in Hydrology
and Environmental Engineering are given below.
Organizing Committee
K. W. Hipel (Chairman) A. I. McLeod
U. S. Panu V. P. Singh
A. Bogobowicz T. Hollands
S. Brown J. D. Kalbfleisch
D. Burns E. LeDrew
C. Dufournaud E. A. McBean
L. Fang K. Ponnambalam
G. Farquhar E. Sudicky
Financial Support
Sponsors
University of Waterloo
President James Downey, Opening and Banquet Addresses
D. Bartholomew, Graphic Services
Danny Lee, Catering and Bar Services Manager
D. E. Reynolds, Manager, Village 2 Conference Centre
T. Schmidt, Engineering Photographic
Audio Visual Centre
Food Services
Graduate Students in Systems Design Engineering
Technical Assistance
Mrs. Sharon Bolender
Mr. Steve Fletcher
Mr. Kei Fukuyama
Ms. Hong Gao
Ms. Wendy Stoneman
Mr. Roy Unny
ACKNOWLEDGEMENTS
The Editors would like to sincerely thank the authors for writing such excellent papers
for publication in this as well as the other three volumes. The thoughtful reviews
of the many anonymous referees are also gratefully acknowledged. Moreover, the
Editors appreciate the fine contributions by everyone who attended the Waterloo
conference in June, 1993, and actively took part in the many interesting discussions
at the paper presentations.
Additionally, the Editors would like to say merci beaucoup to the committee members
and sponsors of the Waterloo conference listed in the previous section. Dr. Roman
Krzysztofowicz, University of Virginia, and Dr. Sidney Yakowitz, University of Ari-
zona, kindly assisted in organizing interesting sessions at the Waterloo conference
for papers contained in this volume. Furthermore, Dr. R. M. Phatarfod, Monash
University in Australia, and Dr. K. Ponnambalam, University of Waterloo, were
particularly helpful in suggesting reviewers as well as carrying out reviews for papers
published in this book. Finally, the Editors sincerely appreciate all the thoughtful personnel
at Kluwer who assisted in the publication of the volumes, especially Dr. Petra D.
Van Steenbergen, the Acquisition Editor.
April, 1994
PART I
CLIMATIC CHANGE
APPLICATIONS OF STOCHASTIC MODELING IN CLIMATE CHANGE
IMPACT ASSESSMENT
DENNIS P. LETTENMAIER
Department of Civil Engineering FX-10
University of Washington
Seattle, WA 98195
INTRODUCTION
Stochastic models of the precipitation arrival process were originally developed to
address practical problems of data simulation, particularly for water resource
systems design and management in data-scarce situations, and to aid in
understanding the probabilistic structure of precipitation. Early attempts to
model the stochastic structure of the precipitation arrival process (wet/dry
occurrences) were based on first-order homogeneous Markov chains (e.g., Gabriel
and Neumann 1957; 1962). Various extensions of Markov models have since been
variables are usually better than those of surface fluxes. They therefore attempt
to relate local
[Figure 1: Monthly precipitation (mm) by month (J through D) for the 1 x CO2, 2 x CO2, and historical cases.]
Weather classification schemes have been the mechanism used by several authors
to summarize large-area meteorological information. The general concept of
weather classification schemes (see, for example, Kalkstein, et al., 1987) is to
characterize large-area atmospheric conditions by a single summary index.
Externally forced stochastic precipitation models can be grouped according to
whether the weather classification scheme is subjective or objective, and whether
it is unconditional or conditional on the local conditions (e.g., precipitation
occurrence).
Subjective classification procedures include the scheme of Baur (1944), from
which a daily sequence of weather classes dating from 1881 to present has been
constructed by the German Federal Weather Service (Bardossy and Caspary,
1990), and the scheme of Lamb (1972), which has formed the basis for
construction of a daily sequence of weather classes for the British Isles dating to
1861. These subjective schemes are primarily based on large-scale features in the
surface pressure distribution, such as the location of semipermanent pressure
centers, the position and paths of frontal zones, and the existence of cyclonic and
anticyclonic circulation types (Bardossy and Caspary, 1990).
Objective classification procedures utilize statistical methods, such as principal
components, cluster analysis, and other multivariate methods to develop rules for
classification of multivariate spatial data. For instance, McCabe, et al. (1990)
utilized a combination of principal components and cluster analysis to form daily
weather classes.
[Figure 2: Daily precipitation (mm) versus exceedance probability (percent) by season (Seasons 1-4) and weather class (Classes 1-4).]
Hughes, et al. (1993) used an alternative approach that selected the weather
classes so as to maximize the discrimination of local precipitation, in terms of
joint precipitation occurrences (presence/absence of precipitation at four widely
separated stations throughout a region of dimensions about 1000 km). The
procedure used was CART (Breiman, et al., 1984), or Classification and
Regression Trees. The large-area information was principal components of sea
level pressure. Figure 3 shows the discrimination of the daily precipitation
distribution at one of the stations modeled.
[Figure 3: Daily precipitation distribution versus exceedance probability (percent) for weather states 1-6.]
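The CART step can be sketched with standard tools. The following is a minimal illustration, not the authors' code: it substitutes scikit-learn's decision tree for CART proper and synthetic arrays for the real pressure fields and station records, so every data value and parameter below is a hypothetical placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins (the real inputs would be gridded daily sea level
# pressure and wet/dry flags at four widely separated index stations)
n_days, n_grid = 2000, 60
slp = rng.normal(size=(n_days, n_grid))            # daily SLP fields (fake)
wet = (rng.random((n_days, 4)) < 0.3).astype(int)  # wet/dry at 4 stations (fake)

# Large-area information: leading principal components of sea level pressure
pcs = PCA(n_components=5).fit_transform(slp)

# Joint occurrence pattern (one of 2^4 = 16 classes) as the tree's target,
# so that its leaves define weather states that best discriminate local
# precipitation occurrence
joint = wet @ (2 ** np.arange(4))

tree = DecisionTreeClassifier(max_leaf_nodes=6, random_state=0).fit(pcs, joint)
states = tree.apply(pcs)   # leaf index = weather state assigned to each day
counts = np.bincount(states)
print("days per state:", counts[counts > 0])
```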
Hay, et al. (1991) used a classification method based on wind direction and cloud
cover (McCabe, 1990) which was coupled with a semi-Markov model to simulate
temporal sequences of weather types at Philadelphia. Semi-Markov models (Cox
and Lewis, 1978), with seasonal transition probabilities and parameters of the
sojourn time distribution, were used to simulate the evolution of the weather
states. This step is not strictly necessary if a lengthy sequence of variables
defining the large-area weather states (the classification used required daily wind
direction and cloud cover data) is available. Where such sequences are not
available, which is sometimes the case for GCM simulations, fitting a stochastic
model to the weather states has the advantage that it decouples the simulation of
precipitation, and other local variables, from a particular GCM simulation
sequence. The method of simulating daily precipitation conditional on the
weather state used by Hay, et al. (1991) was as follows. For each weather state
and each of 11 weather stations in the region, the unconditional probability of
precipitation was estimated from the historic record. Then, conditional on the
weather state (but unconditional on precipitation occurrence and amount at the
other stations and previous time) the precipitation state was selected based on the
unconditional precipitation occurrence probability. Precipitation amounts were
drawn from the product of a uniform and exponential distribution. Retrospective
analysis of the model showed that those variables explicitly utilized for parameter
estimation (conditional precipitation occurrence probabilities, mean precipitation
amounts) were reproduced by the model. An analysis of dry period lengths
suggested that the length of extreme dry periods was somewhat underestimated.
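The simulation scheme just described can be sketched in a few lines. This is a hedged illustration of the general approach, not Hay et al.'s implementation: the state count, transition probabilities, sojourn means and precipitation parameters are invented placeholders, and the geometric sojourn distribution stands in for whatever sojourn-time family was actually fitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Hypothetical parameters (Hay et al. (1991) estimated seasonal values
# from the station record; these numbers are for illustration only) ---
n_states = 3                      # number of weather states (assumed)
P = np.array([[0.0, 0.6, 0.4],    # semi-Markov jump probabilities
              [0.5, 0.0, 0.5],    # (no self-transitions; sojourn times
              [0.7, 0.3, 0.0]])   #  are drawn separately)
mean_sojourn = np.array([2.0, 3.0, 1.5])   # mean sojourn time (days), assumed
p_wet = np.array([0.2, 0.7, 0.4])          # P(precip) given each state, assumed
mean_amount = 5.0                          # exponential mean (mm), assumed

def simulate(n_days):
    """Daily precipitation at one station, conditional on a semi-Markov
    sequence of weather states (sketch of the scheme described above)."""
    days, state = [], int(rng.integers(n_states))
    while len(days) < n_days:
        # sojourn time in the current state (geometric here for simplicity)
        sojourn = rng.geometric(1.0 / mean_sojourn[state])
        for _ in range(min(sojourn, n_days - len(days))):
            wet = rng.random() < p_wet[state]
            # amounts: product of a uniform and an exponential variate,
            # as described in the text
            amount = rng.random() * rng.exponential(mean_amount) if wet else 0.0
            days.append(amount)
        state = rng.choice(n_states, p=P[state])   # jump to a new state
    return np.array(days)

precip = simulate(365)
print(f"wet-day fraction: {np.mean(precip > 0):.2f}")
```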
Bardossy and Plate (1991) also used a semi-Markov model to describe the
structure of the daily circulation patterns over Europe, with circulation types
based on synoptic classification. They developed a model of the corresponding
rainfall occurrence process that was Markovian within a weather state (circulation
type), but independent when the weather state changed. Precipitation
occurrences were assumed spatially independent. Bardossy and Plate (1991)
applied the model to simulate the precipitation occurrences at Essen, Germany.
For this station, they found that the persistence parameter in the occurrence
model was quite small, so that the model was almost conditionally independent
(that is, virtually all of the persistence in the rainfall occurrence process was due
to persistence in the weather states). The model reproduced the autocorrelations
of the rainfall occurrences, as well as the distributions of dry and wet days,
reasonably well. This is somewhat surprising, since other investigators (e.g.,
Hughes, et al., 1993) have found that conditionally independent models tend to
underestimate the tail of the dry period duration distribution. However, this
finding is likely to depend on both the structure of the weather state process, and
the precipitation occurrence process, which is regionally and site-specific.
Bardossy and Plate (1992) extended the model of Bardossy and Plate (1991) to
incorporate spatial persistence in the rainfall occurrences, and to model
precipitation amounts explicitly. The weather state classification procedure was
the same as in Bardossy and Plate (1991), and they retained the assumption of
conditional independence under changes in the weather state. Rather than
modeling the occurrence process explicitly, they modeled a multivariate normal
random variable W. Negative values of W corresponded to the dry state, and (a
transform of) positive values were the precipitation amounts. Within a run of a
(1993) linked the selection of the weather states with observed precipitation
occurrence information at a set of index stations using the CART procedure
described above. Precipitation occurrences and amounts were initially modeled
assuming conditional independence, by simply resampling at random from the
historical observations of precipitation at a set of target stations, given the
weather states. They found that this model tended to underestimate the
persistence of wet and dry periods. Model performance was improved by
resampling precipitation amounts conditional on the present day's weather state
and the previous day's rain state. Unlike the models of Bardossy and Plate (1991;
1992) the Markovian persistence was retained regardless of shifts in the weather
state. Inclusion of the previous rain state reduced the problem with simulation of
wet and dry period persistence. However, by incorporating information about the
previous day's rain state the number of parameters grows rapidly with the number
of stations.
A somewhat different approach is the hidden Markov model (HMM), initially
investigated for modeling rainfall occurrences by Zucchini and Guttorp (1991).
The hidden Markov model is of the form
where s_t denotes the particular value taken by S_t. In this model, if there are m hidden
states and w atmospheric variables (that is, X_t is w-dimensional), the logistic
model has m(m-1)(w+1) free parameters. Note that the model for the evolution of
S_t conditioned on S_{t-1} and X_t is effectively a regional model, and does not depend
on the precipitation stations.
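The equation itself is illegible in the source. One standard form for the transition component, consistent with the parameter count just stated and with nonhomogeneous hidden Markov models of this kind, would be the multinomial logistic model below; this is a hedged reconstruction, not necessarily the exact equation used.

```latex
% Hedged reconstruction: a multinomial logistic transition model for the
% hidden state S_t, conditioned on S_{t-1} and the w-dimensional
% atmospheric vector X_t.  With one reference category per row
% (lambda_ii = 0, beta_ii = 0), it has m(m-1)(w+1) free parameters,
% matching the count given in the text.
\[
  \Pr\left(S_t = j \mid S_{t-1} = i,\; X_t = x_t\right)
    = \frac{\exp\!\left(\lambda_{ij} + \beta_{ij}^{\top} x_t\right)}
           {\sum_{k=1}^{m} \exp\!\left(\lambda_{ik} + \beta_{ik}^{\top} x_t\right)},
  \qquad \lambda_{ii} = 0,\ \beta_{ii} = 0 .
\]
```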
model:
Some complications
One of the motivations for development of stochastic models that couple large-
area atmospheric variables with local variables, such as precipitation, is to provide
a means of downscaling simulations of alternative climates for effects assessments.
However, as noted in the previous sections, most of the applications to date have
been to historic data. For instance, local precipitation has been simulated using
either an historic sequence of weather states (e.g., Hughes, et al., 1993) or via a
stochastic model of the historic weather states (e.g., Hay, et al., 1991, Bardossy
and Plate, 1992; Wilson, et al., 1992).
For most of the models reviewed, it should be straightforward to produce a
sequence of weather states corresponding to an alternative climate scenario (e.g.,
from a lengthy GCM simulation). There are, nonetheless, certain complications,
the most obvious of which is biases in current climate (baseline) GCM
simulations. For instance, Zorita, et al. (1993) found it necessary to use a weather
state classification scheme based on sea level pressure anomalies to filter out
biases in the mean GCM pressure fields. Otherwise the stochastic structure of the
weather state sequences formed from the baseline GCM simulation was much
different from that derived from the historic observations.
Selection of the variables to use in the weather state classification is
problematic. Wilson, et al. (1991) classified weather states using sea level
pressure and 850 mb temperature. However, if this scheme is used with an
alternative, warmer climate, the temperature change dominates the classification,
resulting in a major change in the stochastic structure of the weather class
sequence that may not be physically realistic. Although this problem is resolved
by use of variables, such as sea level pressure, that more directly reflect large-area
circulation patterns, elimination of temperature from consideration as a classifying
variable is troublingly arbitrary. A related problem is the effect of the strength of
the linkage between the weather states and the local variables. In a sense, the
problem is analogous to multiple regression. If the regression is weak, i.e., it
does not explain much of the variance in the dependent variable (e.g., local
precipitation), then changes in the independent variables (e.g., weather states)
will not be evidenced in predictions of the local variable. Therefore, one might
erroneously conclude that changes in, for instance, precipitation would be small,
merely because of the absence of strong linkages between the large-scale and local
conditions (see, for example, Zorita, et al., 1993).
Application of all of the models for alternative climate simulation requires that
certain assumptions be made about what aspect of the model structure will be
preserved under an alternative climate. All of the models have parameters that
link the large-area weather states with the probability of occurrence, or amount
of, local precipitation. For instance, in the model of Wilson, et al. (1992) there
are parameters that control the probability of precipitation for each combination
of weather state and the precipitation state at the higher order stations. In the
model of Bardossy and Plate (1991) there is a Markov parameter that describes
the persistence of precipitation occurrences given the weather state. These
parameters, once estimated using historical data, must then be presumed to hold
under a different sequence of weather states corresponding, for instance, to a GCM
simulation. Likewise, many of the models (e.g., Bardossy and Plate, 1992;
Hughes, 1993) have spatial covariances that are conditioned on the weather state.
The historical values of these parameters likewise must be assumed to hold under
an alternative climate.
Essentially, the assumption required for application of the models to
alternative climate simulation is that all of the nonstationarity is accounted for by
the weather classes. One opportunity that has not been exploited is use of
historical data to validate this assumption. For instance, Bardossy and Caspary
(1990) have demonstrated that long-term changes have occurred in the
probabilities of some European weather states. It should be possible, by
partitioning the historical record, to determine whether conditional simulations
properly preserve associated shifts in local precipitation, such as wet and dry spell
lengths, and precipitation amounts.
Another complication in application of these models to alternative climate
simulation is comparability of the GCM predictions with the historic observations.
For instance, Hay, et al. (1991) used a weather classification scheme based on
surface wind direction and cloud cover. The resulting weather classes were shown
to be well related to precipitation at a set of stations in the Delaware River basin.
Unfortunately, however, GCM predictions of cloud cover and wind direction for
current climate are often quite biased as compared to historic observations, and
these biases will be reflected in the stochastic structure of the weather class
sequence.
Finally, most of the models have been limited to simulation of local
precipitation, although other variables, such as temperature, humidity, and wind
are usually required for hydrological simulations. Hughes, et al. (1993) developed,
along with the daily precipitation model described earlier, a model of the daily
mean temperature and temperature range. They conditioned these variables on
the present and previous days' rain state. For simulation of a CO2-doubled
scenario, they incremented the conditional mean temperatures by the difference
between the 1 x CO2 and 2 x CO2 850 mb temperature. Bogardi, et al. (1993a;b)
coupled a weather state-driven precipitation model similar to that of Bardossy and
Plate (1992) with a model of daily temperature conditioned on weather state, and,
via nonparametric regression, the 500 mb pressure height. The 500 mb pressure
height was found to provide a reasonable index to within-year surface temperature
variations for several stations in Nebraska. Both of these models have
disadvantages. The method used by Hughes, et al. (1993) to infer temperature
under CO2-doubled conditions makes an arbitrary assumption that the change in
the mean station temperature would be the same as the regional mean
temperature change at 850 mb. The regression-based approach of Bogardi, et al.
(1993a) essentially assumes that the transfer function relating within-year
variations in the 500 mb pressure height to station temperature variations applies
to differences between climate scenarios as well. That this may not be realistic is
suggested by the fact that the simulated changes in winter station temperatures
for CO2 doubling are much larger than those in summer, even though most GCMs
simulate large summer surface air temperature changes in the Great Plains.
CONCLUSIONS
The coupling of weather state classification procedures, either explicitly or
implicitly, with stochastic precipitation generation schemes is a promising
approach for transferring large-area climate model simulations to the local scale.
Most of the work reported to date has focused on the simulation of daily
precipitation, conditioned in various ways on weather classes extracted from large-
area atmospheric features. The approach has been shown to perform adequately
in most of the studies, although there remain questions as to how best to
determine the weather states. Further, no useful means has yet been proposed to
determine the strength of the relationship between large-area weather classes and
local precipitation, and to ensure that weak relationships do not result in spurious
downward biases in inferred changes in precipitation at the local level. This
is an important concern, since at least one of the studies reviewed (Zorita, et al.,
1993) found conditions under which weather classes well-related to local
precipitation could not be identified.
There have been relatively few demonstrated applications of these procedures
for climate effects interpretations. One of the major difficulties is accounting for
biases in the GCM present climate, or "base" runs. In addition, few of the models
reviewed presently simulate variables other than precipitation needed for
hydrological studies. Temperature simulation is especially important for many
hydrological modeling applications, but methods of preserving stochastic
consistency between local and large-scale simulations are presently lacking.
ACKNOWLEDGMENTS
The assistance of James P. Hughes and Larry L. Wilson in assembling the review
materials is greatly appreciated.
REFERENCES
Bardossy, A., and H. J. Caspary (1990) "Detection of climate change in Europe by
analyzing European circulation patterns from 1881 to 1989", Theor. Appl.
Climatol., 42(3), 155-167.
Bardossy, A., and E. J. Plate (1991) "Modeling daily rainfall using a semi-Markov
representation of circulation patterns", Journal of Hydrology 122, 33-47.

CIRCULATION PATTERNS FOR STOCHASTIC PRECIPITATION MODELING

A. BARDOSSY ET AL.
INTRODUCTION
The main purpose of this paper is to develop a methodology based on fuzzy rules
(FR) to reproduce the precipitation generation features of an existing subjective
classification of daily atmospheric circulation patterns (CPs) over Europe. This is a
novel approach both from the methodological and application viewpoint: FR have
been used extensively in control problems but not in modeling. One of the first
applications of fuzzy rule-based modeling in hydrology (groundwater) is found in
Bardossy and Disse (1993), but so far no surface hydrology or hydrometeorologic
examples could be found in the open literature.
Like FR, artificial neural networks (NN) provide a non-linear numerical mapping
of inputs into outputs. Neither approach needs a mathematical formulation
of how the output depends on the input. Unlike NN, FR requires the formulation of rules,
which may be difficult even for experts, but does not necessarily require a training
data set. On the other hand, unlike FR, NN may be applied to ill-defined problems,
but after training the NN is a pure unstructured black box and the knowledge gained by the
training algorithm cannot be decoded. Because the two approaches complement each
other, it seems interesting to compare their performance.
The present study belongs to a long-term collaborative project directed at developing
a new methodology for the disaggregation of global hydrometeorological input
into regional or local hydrological systems and the prediction of climate change effects
on these systems. In the first part of this approach, the CPs are classified, so that
their occurrence, persistence and transition probabilities may be investigated. The
CPs may be based on observational data or on the output of GCMs, for example in a changed
climate (2 x CO2) scenario case. The reason for selecting large-scale
CPs as input into local or regional hydrologic systems is that long records of reliable
pressure measurements are available. Furthermore, the GCMs are based on weather
forecasting models, which can predict future pressure conditions with much higher
accuracy than other parameters. Results obtained in West Germany, Nebraska and
Central Arizona indicate the existence of a strong relationship between daily CP
types and local hydrologic observations, such as precipitation, temperature, wind or
floods (Bardossy and Caspary, 1990; Bardossy and Plate, 1991; Bogardi et al., 1992;
Matyasovszky et al., 1992; Duckstein et al., 1993). This relationship was essentially
described under the form of, say, daily precipitation at a given site conditioned upon
the event that the type of CP over the region was i = 1, ..., I. A fundamental
element of this approach is thus a phenomenologically valid classification of the CP that
can generate a simulated series of precipitation events with high information content.
On the other hand, CPs are elements of complex dynamic large-scale atmospheric
phenomena, so that any classification scheme is fraught with uncertainty and
imprecision (Yarnal and White, 1987, 1988; Bardossy and Caspary, 1990). Here we apply
an FR- and an NN-based approach to account explicitly for the imprecision in the
subjective classification of European CPs by Baur et al. (1944). These two approaches
should be able to reproduce the classification quite well with respect to the prediction
of climate change effects on local hydrological systems.
The paper is organized as follows: in the next section, background information
on existing CP classification schemes is presented. The following section provides
a description of the fuzzy rule-based approach to modeling geophysical phenomena
and describes briefly the NN approach. Then the two approaches are applied to a
European case study and the results are evaluated. The final section consists of a
discussion and conclusions.
1. The location of sea level semipermanent pressure centers, such as the Azores high or the Iceland low.
Classification approaches
CP classification techniques may be grouped into subjective and objective procedures.
The former group has been in existence for over a century in Europe, as described in
Baur et al. (1944), and about half a century in the USA. The latter type has emerged
as a result of the development of high-speed computers and availability of statistical
software packages, mostly principal component analysis and clustering.
Subjective or manual techniques, such as the one used as a case study herein
and described below, depend on the hydrometeorologist's interpretation of the major
features and persistence of a given pattern.
Objective techniques, in contrast, are based on statistical approaches such as
hierarchical methods (Johnson, 1967), k-means methods (MacQueen, 1967), and cor-
relation methods (Bradley et al., 1982; Yarnal, 1984). For example, in our Eastern
Nebraska study, 9 types of CPs have been identified after having performed a prin-
cipal components analysis coupled with the k-means method (Matyasovszky et al.,
1992).
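An objective classification of this PCA-plus-k-means kind is straightforward to sketch with standard libraries. The example below is illustrative only: it uses synthetic pressure fields in place of the real gridded data, and the component and cluster counts echo the Eastern Nebraska setup (9 CP types) rather than a fitted choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Synthetic stand-in for daily gridded pressure fields
# (days x grid points; real data would replace this)
fields = rng.normal(size=(3650, 51))

pcs = PCA(n_components=6).fit_transform(fields)      # principal components
km = KMeans(n_clusters=9, n_init=10, random_state=0).fit(pcs)

cp_type = km.labels_          # objective CP type assigned to each day
print("days per CP type:", np.bincount(cp_type))
```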
Between these two groups, FR and NN are knowledge-based classifications. They
are subjective and phenomenological: the subjectivity is taken into account either
explicitly by rules or implicitly by a training phase that uses expert knowledge. They
are also objective because, after model construction, the same inputs are always classified in
the same way.
Circulations of the major type "West" (W) are classified as zonal circulations. As
an illustration, Figure 1 shows the 500 hPa contour map for the circulation subtype
"West, cyclonic" (Wz), which persisted for 8 days after November 11, 1987 (Deutscher
Wetterdienst, 1948-1990). According to the location of the pressure centers
and the resulting main flow directions to Central Europe, the major types "North"
(N), "South" (S) and "East" (E) can be distinguished. In addition, all trough types
with a north to south axis are classified as meridional circulations. The major types
"Northeast" (NE) and "Southeast" (SE) are also included with the meridional circulation
group because they normally coincide with blocking North and Eastern European
highs.
Further illustrations of these three types of CP are found in Bardossy and Cas-
pary (1990). The fuzzy rule-based approach will describe CPs by assigning fuzzy
quantifiers to the normalized value of pressure at grid points or pixels. Linguistic at-
tributes such as "The pressure centers are located above the eastern Atlantic Ocean"
could also have been used, but certain properties of CPs would be very difficult to
express even by fuzzy sets.
For each day t, the measured 700 hPa heights h(u_k, t) at the grid points u_k (1) are normalized, e.g.

g(u_k, t) = \frac{h(u_k, t) - \min_l h(u_l, t)}{\max_l h(u_l, t) - \min_l h(u_l, t)}    (2)

This way, for each day, the 700 hPa surface is mapped onto the interval [0, 1].
Training the NN first requires a properly defined input/output data set. The input
data set is encoded as follows: for each "subjectively" defined CP i,
the mean and the standard deviation of the corresponding normalized 700 hPa daily
values are calculated over the n_i days T_i assigned to CP i:

\bar{m}_i(u_k) = \frac{1}{n_i} \sum_{t \in T_i} g(u_k, t)    (3)

s_i(u_k) = \left[ \frac{1}{n_i} \sum_{t \in T_i} \left( g(u_k, t) - \bar{m}_i(u_k) \right)^2 \right]^{1/2}    (4)
Training data were then obtained using a normally distributed random variable with mean \bar{m}_i(u_k)
and standard deviation s_i(u_k). The heights h(u_k, t) are normed by (2) and used in
this form as input for the NN. The output was an activation of the output neuron i
corresponding to CP i.
Finally, the four class values D_{i1}, D_{i2}, D_{i3}, D_{i4} are combined into the DOF of rule i,
D_i, as:

D_i = \prod_{v=1}^{4} D_{iv}    (11)
The FR classification is numerically very simple. November 29, 1986, has been
selected to illustrate the methodology. The CP type that persisted November 26-29,
1986 is "BM" with i = 2. The membership functions of the four classes are given as
above.
Normalized pixel values, membership grades for the selected day and rule defining
the CP type BM are shown in Table 1. For example, in the case v = 4 ("very high"),
\gamma_4 = 0.7, F_O = 1.00, F_A = 0.6375, and D_{24} = 0.7 \cdot 1.00 + 0.3 \cdot 0.6375 = 0.8912. The
overall fulfillment grade is D_2 = 0.3944, which is the maximum value. The DOF of the
other CP types is less than 0.3; that of Wz is 0, and that of type HM is found to be
0.1247.
This example shows that, even when one pixel does not fulfill the
prescribed pressure level, as is the case for pixel 1 with v = 2, the fuzzy rules can still
assign the day to the selected CP type.
In (12), the input to neuron j is inp_j = \sum_i w_{ij} o_i + \theta_j, where the w_{ij} are weights
between neuron i and neuron j, chosen properly by a learning algorithm described below,
and \theta_j is the bias of neuron j.
The output of neuron j, o_j, equals the activation of neuron j, a_j, which is given
by applying a sigmoid transformation function to inp_j:

a_j = \frac{1}{1 + \exp(-inp_j)}    (13)
The reason for employing the sigmoid function is that it is differentiable, which is an
essential condition for back-propagation.
The NN utilized herein consists of a set of structured neurons with three different
types of layers:
• The input layer: these are the neurons which are activated by the input signal
coming from outside.
• Two hidden layers: these are the neurons which are supposed to perform the
transformation of the input to an output.
• The output layer: these are the neurons which provide signals for the outside.
Each neuron of a layer is connected to each neuron of the adjacent layer. This means
that a signal is sent to the next layer (feedforward). Figure 2 shows the four layers
of the NN.
The interconnecting weights w_{ij} have to be determined with the help of a back-propagation
supervised learning procedure. For this purpose a training set consisting
of measured input data and corresponding desired output data is used. Weights
which minimize the squared difference between the known output of the training set
and the calculated output of the NN have to be found stepwise from the output layer
to the input layer by a gradient search method.
APPLICATION
CP classification by fuzzy rules and Neural Networks
The two procedures described above were used to classify the CPs over Europe. As
stated earlier, the basis of the classification is the subjective Baur classes given in
Hess and Brezowsky (1969). For each day the measured 700 hPa surface is taken at
51 selected points u_k. For both procedures, the data base for model building and
validation, respectively, was presented above.
The FR were defined by encoding the expert knowledge of Hess and Brezowsky (1969) as
fuzzy rules as follows: a few (2 to 4) points are selected for each class v = 1, ..., 4
(very high to very low). The \gamma_v values are selected depending on the class v, with
\gamma_{1,4} = 0.7 and \gamma_{2,3} = 0.1. Classes v = 1, 4 carry higher uncertainty - thus a convex
combination with a higher "OR" component and a value of \gamma_v closer to 1 is needed.
Classes v = 2, 3 are more restrictive - thus a convex combination with a lower "OR"
component and a value of \gamma_v closer to 0 is needed. The proper selection of \gamma was done
by trial and error; results turned out not to be very sensitive to the choice of \gamma - but
it is necessary to use some mix of AND and OR rules because a pure AND rule may
be too weak (one zero element makes the DOF equal to zero) and an OR rule too
strong (DOF too large).
The architecture of the NN used consists of an input layer of 51 neurons corresponding
to the 51 data sites; the output layer consists of 29 neurons, corresponding to the 29
CPs subjectively classified by Hess and Brezowsky (1969). The first hidden layer consists of 45
neurons, the second hidden layer of 40 neurons. Concerning the procedure for
building the NN architecture (number of hidden layers, number of neurons), there
exist some heuristic rules, but nevertheless the answer has been found by trial and
error. With an increasing number of neurons the estimation error in the training phase
decreases, but the model becomes overdetermined.
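The forward pass of this 51-45-40-29 network is a few lines of linear algebra. The sketch below uses random placeholder weights where the paper's weights come from back-propagation training; it shows the structure of eqs. (12) and (13) only.

```python
import numpy as np

def sigmoid(x):
    # eq. (13): a_j = 1 / (1 + exp(-inp_j))
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes from the text: 51 input sites, two hidden layers (45 and 40
# neurons), 29 output neurons (one per Hess-Brezowsky CP type)
sizes = [51, 45, 40, 29]
rng = np.random.default_rng(1)

# Random weights/biases as placeholders; in the paper these are found by
# back-propagation (gradient search on the squared output error)
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def classify(h_norm):
    """Forward pass: normalized 700 hPa heights at 51 grid points in,
    activations of 29 CP-type output neurons out (feedforward only)."""
    a = h_norm
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)   # inp_j = sum_i w_ij o_i + theta_j, eq. (12)
    return int(np.argmax(a))     # most activated CP type

print("predicted CP type index:", classify(rng.random(51)))
```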
Both classification schemes were applied to a measured sequence of daily 700 hPa
elevations for the 10 year period 1977 to 1986. The methods were unable to reproduce
the subjective series exactly, but the stated goal of the classification was to develop
a semi-objective classification method which resembles the subjective one and whose
quality is measured by the difference between generated and measured precipitation
values, as in the next section.
II = (r 1 "
_ 1 L.J(PAt -
2 1
p) )2' (14)
t
with
(15)
Thus the maximum value of II depends on p. In the present application, the possible
maximum of II is about 0.5.
The second information measure describes the squared difference between conditional
and unconditional mean precipitation:

I_2 = \left[ \frac{1}{T} \sum_{t=1}^{T} \left( m_{A_t} - m \right)^2 \right]^{1/2}    (16)

with m the unconditional mean daily precipitation amount at a given site and
m_{A_t} the mean daily precipitation conditioned on the CP being of type A_t (17).
The third information measure also depends on the mean precipitation and measures
the relative deviation from the unconditional mean:

I_3 = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{m_{A_t}}{m} - 1 \right|    (18)

with

0 \le h \le 1.    (19)
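With the measures written as above, their computation from a daily precipitation series and a CP-type series is direct. The sketch below groups days by CP type, which is equivalent to summing over t in (14), (16) and (18); the synthetic data and the exact weighting follow the reconstruction above and are therefore assumptions.

```python
import numpy as np

def information_measures(precip, cp_types):
    """Information content of a CP classification, following eqs. (14)-(18)
    as reconstructed above.  `precip` is a daily series (mm), `cp_types`
    the CP type of each day."""
    p = np.mean(precip > 0)      # unconditional wet-day probability
    m = np.mean(precip)          # unconditional mean daily amount
    types = np.unique(cp_types)
    p_a = np.array([np.mean(precip[cp_types == i] > 0) for i in types])
    m_a = np.array([np.mean(precip[cp_types == i]) for i in types])
    # frequency weights: each day t contributes through its CP type A_t
    w = np.array([np.mean(cp_types == i) for i in types])
    i1 = np.sqrt(np.sum(w * (p_a - p) ** 2))   # eq. (14)
    i2 = np.sqrt(np.sum(w * (m_a - m) ** 2))   # eq. (16)
    i3 = np.sum(w * np.abs(m_a / m - 1.0))     # eq. (18)
    return i1, i2, i3

rng = np.random.default_rng(3)
cp = rng.integers(0, 5, size=3650)             # synthetic daily CP types
rain = np.where(rng.random(3650) < 0.2 + 0.1 * cp, rng.exponential(4, 3650), 0)
print("I1, I2, I3 = %.3f, %.3f, %.3f" % information_measures(rain, cp))
```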
In order to compare the subjective and knowledge-based classifications, the mean
information content of 25 stations in the Ruhr catchment was calculated for the
subjective classification of Hess and Brezowsky (1969) and then for the FR and NN
classifications. Tables 2 and 3 show the results for the summer and winter seasons.
From these tables it is clear that the subjective classification delivers the best results.
However, none of the three measures of information content, I_1, I_2, I_3, varies much
between the three approaches, the difference between the two seasons being larger
than that between the approaches.
For the fuzzy rule-based classification the information loss compared to the subjective
classification is less than 20%. The NN classification does not perform as well as
the FR. Given the simplicity of the fuzzy rule-based approach, we would recommend
it for future use.
In contrast to NN, the FR classifier has been built by encoding expert knowledge
without using a time series of subjectively classified daily CPs. Thus FR (but not NN)
can be applied in regions where no time series of CPs exists, but both approaches need
expert knowledge of CPs for these regions.
For further research, the performance of an objective phenomenological classifier,
taking into account explicit local hydrological parameters (precipitation, temperature,
winds, floods), seems to be of interest. For this, the FR classifier may have to be
modified using some features of NN. The two approaches complement each other in a
very evident way, as demonstrated recently in research by Takagi and Hayashi
(1991), Kosko (1992) and Goodman et al. (1992).
To conclude, the fuzzy rule-based methodology appears to perform better than
the neural network and almost as well as the Baur-type subjective classification. It
appears to be a usable approach for constructing a time series of classifications where
none is available.
ACKNOWLEDGMENTS
Research presented in this paper has been partially supported by the US National
Science Foundation under grants #BCS 9016462/9016556, EAR 9217818/9205717
and a grant from the German Science Foundation (DFG).
REFERENCES
Bardossy, A. and Caspary, H. (1990) "Detection of climate change in Europe by an-
alyzing European Atmospheric Circulation Patterns from 1881 to 1989", The-
oretical and Applied Climatology 42, 155-167.
Bardossy, A. and Disse, M. (1993) "Fuzzy rule-based models for infiltration", Water
Resources Research 29, 2, 373-382.
Bardossy, A. and Plate, E.J. (1991) "Modeling daily rainfall using a semi-Markov
representation of circulation patterns" , Journal of Hydrology 122, 33-47.
Bardossy, A. and Plate, E.J. (1992) "Space-time model for daily rainfall using at-
mospheric circulation patterns", Water Resources Research 28, 5,1247-1260.
Baur, F., Hess, P. and Nagel, H. (1944) Kalender der Großwetterlagen Europas 1881-1939,
Bad Homburg, FRG.
Bogardi, I., Duckstein, L., Matyasovszky, I. and Bardossy, A. (1992) Estimating
space-time local hydrological quantities under climate change, Proceedings,
Fifth International Conference on Statistical Climatology, Toronto.
Duckstein, L., Bardossy, A. and Bogardi, I. (1993) "Linkage between the occurrence
of daily atmospheric circulation patterns and floods: an Arizona case study" ,
Journal of Hydrology, to appear.
Goodman, R.M., Higgins, C.M., Miller, J.W. (1992) "Rule-based neural networks
for classification and probability estimation", Neural Computation 4, 781-804.
McCord Nelson, M. and Illingworth, W.T. (1991) A practical guide to neural nets,
Addison-Wesley, Reading.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) "Learning representations
by back-propagating errors", Nature 323, 533-536.
Yarnal, B. (1984) "A procedure for the classification of synoptic weather maps from
gridded atmospheric pressure surface data", Computers and Geosciences 10,
397-410.
GREY THEORY APPROACH TO QUANTIFYING RISKS

B. BASS ET AL.

ABSTRACT

Assessing the risk to water resources facilities under climate change is difficult
because the uncertainty associated with 2 x CO2 climate scenarios cannot be readily
quantified. Grey systems theory is used to develop a grey prediction model (GPM)
that provides an interval of uncertainty. The GPM is used to extrapolate a
numerical interval around the decadal averages of precipitation and temperature
through the year 2010 for a site in northwestern Canada. The extrapolation is
calibrated on 20 years of data and validated against observations for the 1980's.
The values in the 1990's correspond to observed trends in the area. The
temperature and precipitation values are used to develop a grey water balance
model. The grey intervals for annual potential evapotranspiration, deficit and
surplus are used to evaluate the reliability of a transient and three equilibrium
climate change scenarios. The grey intervals are not coincident with the
transient output, but they are trending towards the equilibrium scenario values.
This suggests that this particular transient scenario is inadequate for risk
assessment, and although the equilibrium scenarios appear to be within the grey
interval, they represent years beyond a reliable GPM extrapolation.
INTRODUCTION
The possibility of global climate change, and the subsequent changes to climate at
the local level, may alter the viability of new and existing water resource
structures. One decision-making tool that has been gaining acceptance in water
resource management and hydrology is risk assessment - a series of techniques that
are used to evaluate decisions when the future cannot be forecast with certainty.
Risk is defined as a combination of the probability of an event occurrence and the
consequences associated with that event. It is partly a function of the quality of
information used to define a climatological event, and the uncertainty in the
observed or predicted data is strongly linked to the level of risk in a decision.
Generally, the larger the uncertainty, the higher the risk in making a decision.
probability distribution, but the grey interval may be more appropriate since it is
difficult to make assumptions regarding the future variability of temperature and
precipitation. The analysis illustrates that grey theory may be an effective means
of using transient GCM scenarios, and of evaluating the 2 x CO2 scenarios of other
GCMs. The analysis also suggests that the water balance may provide a better
means of detecting climate change, although this is only a preliminary result and
bears further investigation.
Consider a data series X^(0) = (x^(0)(1), x^(0)(2), ..., x^(0)(n)), where x^(0)(i) is the ith
element corresponding to period i. The problem under consideration, for the grey
prediction model (GPM), is the prediction of x^(0)(i) for i > n when standard
statistical approaches are not applicable. A GPM is introduced which can effectively
approximate the uncertainty existing in X^(0). First, there are some requisite
definitions related to the GPM.
The operations for grey vectors and matrices are defined to be analogous to those
for real vectors and matrices.
The concepts of accumulated generating operation (AGO) and inverse
accumulated generating operation (IAGO) are required for the GPM. The first
AGO of X^(0) is the series X^(1) with elements

x^(1)(k) = \sum_{i=1}^{k} x^(0)(i),  k \in [1, n],

and the rth AGO is defined recursively with \Omega = (r, r-1, ..., 1) and
\omega = (r-1, r-2, ..., 0). The rth IAGO of X^(1), \alpha^(r)(X^(1)), is defined
in a similar manner (Huang, 1988). The AGO and IAGO are defined from X^(1) to X^(r).
A grey number \otimes(t) = [k, k+1] has a corresponding support \otimes(x), defined as in (8).
Converting the grey differential equation of the GPM to a linear system, let

Y^(0) = a S[X^(1)] + b E = [S[X^(1)], E] [a, b]^T = [S[X^(1)], E] C.    (13)

Letting

B = [S[X^(1)], E],    (14)

the least-squares solution for the parameter vector is

C = (B^T B)^{-1} B^T Y^(0).    (15)

Hence x^(1)(k+1), for all k, can be obtained by solving (9). Thus, from the definition of
the IAGO, x^(0)(k+1), for all k, can be obtained from x^(1)(k+1), for all k. Obviously, when k >
n-1, the obtained x^(0)(k+1) provides a prediction of the x value in a future period
k+1.
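The GPM described above is closely related to the standard GM(1,1) grey model, which can be sketched compactly: first-order AGO, least-squares estimation of the two parameters as in (13)-(15), and IAGO back-transformation. The sketch below is the crisp core only; the authors' interval (grey) extension, for example running the fit separately on lower- and upper-bound series, is an assumption, as is the illustrative input series.

```python
import numpy as np

def gm11_forecast(x0, n_ahead):
    """Standard GM(1,1) grey prediction: first-order AGO, least-squares
    estimate of (a, b) in x0(k) + a*z1(k) = b, exponential solution, IAGO."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                           # AGO: x1(k) = sum of x0
    z1 = 0.5 * (x1[:-1] + x1[1:])                # background values
    B = np.column_stack([-z1, np.ones(n - 1)])   # design matrix, cf. (14)
    y = x0[1:]                                   # Y(0)
    a, b = np.linalg.lstsq(B, y, rcond=None)[0]  # C = (B^T B)^-1 B^T Y(0), (15)
    k = np.arange(n + n_ahead)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.diff(x1_hat, prepend=0.0)        # IAGO back to x0
    return x0_hat[n:]                            # predictions beyond period n

# Illustration on a made-up series; a grey interval could be produced by
# applying the same fit to lower- and upper-bound series.
series = [0.2, 0.5, 0.9, 1.1, 1.6]
print(np.round(gm11_forecast(series, 2), 2))
```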
STUDY LOCATION
The analysis was carried out at Cold Lake, Alberta. This location was chosen
because it is adjacent to the Saskatchewan River sub-basin that Cohen (1991)
examined using two GCM scenarios and the Thornthwaite water balance model.
Being relatively close to the Mackenzie Basin, which is the focus of a major
climate impact assessment, the results may also have bearing on future research in
that area. Cold Lake is also situated half-way between two GISS-GCM grid
points, which are used as a geographic "grey interval" in the water balance model.
Although the Thornthwaite model may not be appropriate for locations at these
latitudes (50°N to 58°N is at the northerly edge of acceptability for the
Thornthwaite model), this study is only an exploratory evaluation of this technique.
DATA
The GPM is computed with 20 years of monthly temperature and precipitation data
(1966-1985) for Cold Lake, Alta. The grey temperature model is validated against
four years of data (1986-1989), and the precipitation grey model is tested against
six years of data (1986-1991). The water budget model is run with the GPM and
input from two grid points from the GISS transient GCM (GISST). The two grid
points are almost equidistant, directly north (110°W, 58°N) and south
(110°W, 50°N) of Cold Lake, Alta., which is situated at 110°W, 54°N. The
temperature and precipitation output from the GISST are only available as decadal
averages, and for this study, the first four decades (1970-2010) are compared to
the GPM for the same time period. Three water budget components are compared
to the GISST and the three equilibrium scenarios for the decade of the 2040's. The
RESULTS
The GPM was developed for monthly temperature and precipitation data (1965-
1985). The interval is validated against observations for the years 1986-1989
(Figure 1) for temperature and 1986-1991 for precipitation (Figure 2).
[Figure 1: Observed monthly temperature (°C), 1986-1989, compared with the grey interval (Grey-Max, Grey-Min).]
[Figure 2: Observed monthly precipitation (mm), 1986-1991, compared with the grey interval (Grey-Max, Grey-Min).]
[Figure 3: Decadal monthly temperature (°C) and the grey temperature interval.]
The GISST precipitation is greater than both the observed and the GPM
precipitation for the four decades, except during the summer months in the first
decade of the twenty-first century (Figure 4).
[Figure 4: Decadal monthly precipitation (mm) for the 1970's through 2000's: GISST compared with observed and GPM values.]
Three variables are extracted from the water balance modelling: the annual
potential evapotranspiration (PE), the deficit (D) and the surplus (S). The water
budget model is run using decadal averages derived from observations (1970's and
1980's), the GPM and the GISST-GCM (Table 1). During the 1970's and 1980's
the observed PE falls within the grey interval; for the 1980's the observed
deficit also falls within the grey interval, while it is smaller than the lower grey bound
for the 1970's. This is most likely due to the higher monthly average rainfall
between April and August in the 1970's. During the same period the water budget
derived from the GISST temperature and precipitation produces the opposite result.
During the next two decades the grey deficits decrease (95.9 - 25.9 mm) and a
small surplus is evident (20.1 - 93.1 mm). The PE is slightly larger throughout the
four decades at the high end of the grey interval (530.6 - 595.6 mm) while it
remains almost constant at the lower end of the interval. This result is due to the
fact that the grey temperature interval is quite small (Figure 3), and the grey
precipitation interval was much larger (Figure 4).
Table 1. Annual water budget components (mm).

                 PE      D        S
1970  OBS        522.1   -63.1    0.0
      GREY MAX   530.6   -95.9    0.0
      GREY MIN   513.0   -148.9   0.0
      GISST 50N  467.8   -1.7     396.4
      GISST 58N  402.8   -0.2     347.9
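The deficit and surplus values in Table 1 come from Thornthwaite-type monthly bookkeeping. A minimal sketch of such bookkeeping is given below, assuming a fixed soil-moisture capacity and given monthly PE; the paper's actual Thornthwaite implementation, parameter choices and input values are not reproduced, and the numbers shown are invented.

```python
def water_balance(precip, pe, capacity=100.0):
    """Minimal monthly water-balance bookkeeping (Thornthwaite-style):
    soil moisture is drawn down when PE exceeds precipitation and refilled
    otherwise; demand unmet after storage is exhausted accumulates as
    deficit, and recharge beyond capacity as surplus.  `capacity` (mm)
    is an assumed soil-moisture holding capacity."""
    storage, deficit, surplus = capacity, 0.0, 0.0
    for p, e in zip(precip, pe):
        if p >= e:
            storage += p - e                 # wet month: recharge
            if storage > capacity:
                surplus += storage - capacity
                storage = capacity
        else:
            draw = min(e - p, storage)       # dry month: draw on storage
            storage -= draw
            deficit += (e - p) - draw        # unmet demand
    return deficit, surplus

# Hypothetical monthly values (mm) loosely shaped like a cold continental site
precip = [15, 12, 18, 25, 45, 70, 75, 60, 40, 25, 20, 18]
pe     = [0, 0, 5, 30, 80, 110, 125, 100, 55, 20, 2, 0]
d, s = water_balance(precip, pe)
print(f"annual deficit {d:.1f} mm, surplus {s:.1f} mm")
```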
The GISST water budgets also exhibit patterns similar to the 1970's and
1980's over the next two decades. At 110W, 50N the surplus increases from the
1980's through the twenty-first century due to an increase in spring and summer
precipitation (Figure 4). At 110W, 58N, the surplus decreases for the same period
due to smaller levels of summer precipitation. Since the GPM appears to be a
valid description of the monthly temperature and precipitation at Cold Lake during
the 1970's and 1980's, it is used to adjust the GISST temperature upward and the
GISST precipitation downward (Appendix 1). At both grid points (GISSTA) the
PE increased, and at both grid points the deficits and surplus are now within the
respective grey intervals. The GISSTA water budget corresponds more closely
than the unadjusted GISST to the GISS, GFDL87, and OSU 2 x CO2 equilibrium
scenarios for the Cold Lake location (Cohen, 1989; 1991).
Bass et al. (1994) present a method for evaluating data quality for weather-
dependent decisions. This framework presents data uncertainty as a numerical
interval which a decision maker interprets as encompassing the actual or "true"
value for a decision. How much risk a user is willing to accept has to do with the
importance of a decision and the size of the interval that is acceptable.
The annual grey deficit, surplus and PE, both the GISST and GISSTA
scenarios, and the three equilibrium scenarios are plotted in Figures 5-7. The
GFDL87, GISS and OSU scenarios are plotted for the 2040's, since their climates
are supposed to be representative of some future equilibrium. In addition, the
GISST values for the 2040's are also plotted. For each water budget component,
the GISST scenario is outside of the grey interval. In each figure the grey interval
is projected toward the three equilibrium scenarios, although there are obvious limits in
projecting the grey interval to the 2040's.
In Figure 5, the grey interval is probably too large for an effective decision
(small level of risk) for the 1990's. Nevertheless, it demonstrates that the GISSTA
deficit is within the grey interval although the high end is very close to the GISST
deficit as well. Figure 6 provides a more reasonable decision interval,
encompassing the GISSTA surplus, for the first decade of the twenty-first century
that is clearly separated from the GISST surplus. In addition the grey water budget
points towards the general direction of the three equilibrium scenarios. Figure 7
provides a clear evaluation of the quality of both the GISST and the GISSTA PE.
[Figure 5: Annual deficit (mm), 1970's-2040's: grey interval (Grey-Max, Grey-Min), GISST and GISST(A) at 110W,50N and 110W,58N, and the GISS, GFDL87 and OSU equilibrium scenarios.]
[Figure 6: Annual surplus (mm), same scenarios as Figure 5.]
[Figure 7: Annual potential evapotranspiration PE (mm), same scenarios as Figure 5.]
In this case, most of the scenarios, all the GISST and two of the GISSTA,
fall outside of the grey interval and would most likely be rejected. The grey water
budget interval also indicates that beyond the first decade of the twenty-first
century, neither the GISST nor the GISSTA appears to be valid. However, the grey
water budget points in the direction of the three equilibrium scenarios, and this
includes the GISST for 110W, 50N.
CONCLUSIONS
The results of this analysis suggest that the grey prediction model may be an
appropriate tool for evaluating the risks associated with a GCM scenario. The
GPM adequately represented the decadal averages for temperature and precipitation
for the 1970's and the 1980's. Preliminary analysis of temperature trends in north-
western Canada in the early 1990's suggests that spring temperatures have been
anomalously warm, which is also reflected in the GPM. The grey water budget
components also enclosed the water budgets based on observations for the same
period. Assuming that the GPM is valid for the 1990's and the first decade of the
twenty-first century, it provides a means of evaluating scenarios of the PE and the
surplus. However, the grey deficit interval is most likely too large to provide a
useful evaluation of deficit scenarios in the twenty-first century. In addition, the
grey PE and surplus intervals also point in the general direction of the three
equilibrium 2 x CO2 scenarios, although it would be premature to suggest that these
scenarios will remain valid for Cold Lake, Alberta in the 2040's. While the GPM
appears to be valid for the monthly temperature and precipitation data at Cold
Lake, further testing at other sites and through the 1990's is required in order to
generalize these results.
APPENDIX I
The GISST outputs were adjusted using the grey white mid value (WMV)
and the half-width between the two GISST grid points. The WMV is analogous to
a grey mean and was defined for each decade. A mean value was defined for
each month,
(16)
(17)
Similarly, a mean GISST value was defined for each decade. Each monthly mean
GCM precipitation value was adjusted by subtracting the difference between
the GCM mean and the WMV. For the GCM temperature values, this difference
was added to each monthly mean. The values for each grid point were recreated
by adding the GCM half-width to the adjusted monthly mean for 110W, 50N and
subtracting this value from the adjusted monthly mean for 110W, 58N. The
half-width is defined as half of the difference between the GISST values at the
two grid points.
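As an illustration of this adjustment, a minimal sketch follows; it assumes the WMV is the average of the two GISST grid-point series, and all names are ours rather than the paper's.

```python
import numpy as np

def adjust_gcm(gcm_monthly, gisst_50n, gisst_58n, temperature=False):
    """Adjust monthly GCM values toward the grey white mid value (WMV).

    Inputs are 1-D arrays of monthly values for one decade.
    """
    wmv = 0.5 * (gisst_50n + gisst_58n)               # assumed WMV: mean of the grid points
    half_width = 0.5 * np.abs(gisst_50n - gisst_58n).mean()
    diff = gcm_monthly.mean() - wmv.mean()            # GCM decadal mean minus WMV mean
    # Per Appendix I: subtract the difference for precipitation, add it for temperature.
    adjusted = gcm_monthly + diff if temperature else gcm_monthly - diff
    # Recreate the grid-point series: add the half-width for 110W,50N,
    # subtract it for 110W,58N.
    return adjusted + half_width, adjusted - half_width
```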
A NONPARAMETRIC RENEWAL MODEL FOR MODELING DAILY
PRECIPITATION
ABSTRACT
INTRODUCTION
Stochastic models for precipitation occurrence at a site have a long, rich history in
hydrology. The description of precipitation occurrence is a challenging problem since
precipitation is an intermittent stochastic process that is usually nonstationary and can
exhibit clustering, scale dependence, and persistence in time and space. Our particular interest is in
developing a representation for daily precipitation in mountainous regions in the western
United States.
Webb et al. (1992) note that a mixture of markedly different mechanisms leads to the
precipitation process in the western United States over the year and even within a given
season. A rigorous attack on the problem would perhaps need to consider the classification
of different precipitation regimes at different time scales, the identification of such classes
from available data, and the specification of a stochastic model that can properly reproduce
these at a variety of time scales. Our focus is on developing appropriate tools to analyze the
raw daily data without deconvolution of the mixture based on synoptic weather
classification.
In most traditional stochastic models, probability distributions are assumed for the
length of wet or dry spells and also for the precipitation amounts. While such distributions
may fit the data reasonably well in some situations and for some data sets, it is rather
disquieting to adopt them by fiat. It is our belief that hydrologic models should (a) show
(rather than obscure) the interesting features of the data; (b) provide statistically consistent
estimators; and (c) be robust. Consistency implies that the estimates converge in probability
to the correct behaviour. The standard practice of assuming a distribution and then
calibrating the model to it clearly obscures features of the data and may not lead to a
consistent estimator from site to site. This is particularly relevant where the underlying
process is represented by a mixture of generating processes and is inhomogeneous. The
issue of interest is not the best fit of a model but the ability to represent a heterogeneous
process in a reasonable manner.
This motivates the need for a stochastic model for the generation of synthetic precipitation
sequences that is conceptually simple, theoretically consistent, allows the data to determine
its structure as far as possible, and accounts for clustering of precipitation events and
process heterogeneity.
Here we present results from a nonparametric seasonal wet/dry spell model that is
capable of considering an arbitrary mixture of generating mechanisms for daily precipitation
and is data adaptive.
The model yields stochastic realizations of daily precipitation sequences for different
seasons at a site that effectively represent a smoothed bootstrap of the data and are thus
equivalent in a probabilistic sense to the single realization observed at the site. The
nonparametric (kernel) probability density estimators considered in the model do not
assume the form of the underlying probability density, rather they are data driven and
automatic. The model is illustrated through application to data collected at Woodruff, Utah.
MODEL FORMULATION
The random variables of interest are the wet spell length, w days, dry spell length, d
days, daily precipitation, p inches, and the wet spell precipitation amount, Pw inches.
Variables w and d are defined on the set of integers greater than 1 (and less than the
season length), and p and Pw are defined as continuous, positive random variables. The
year is divided into four seasons, viz., Season I (January-March), Season II (April-June),
Season III (July-September), and Season IV (October-December). The
precipitation process is assumed to be stationary within these seasons. Precipitation
measurements are usually rounded to measurement precision (e.g., 0.01 inch increments).
We do not expect the effect of such quantization of the data to be significant relative to the
scale of the precipitation process, and treat precipitation as a continuous random variable. A
mixed set of discrete and continuous random variables is thus considered. The precipitation
process over the year is shown in Figure 1.
Figure 1. Precipitation process over the year.
The key feature of the model is the nonparametric estimation of the probability density
function (using kernel density estimators) for the variables of interest, rather than fitting
parametric probability densities. The reader is referred to Silverman (1986) for a pragmatic
treatment of kernel density estimation and examples of applications to a number of areas.
The model is applied to daily precipitation for each season. The pdfs estimated for
each season are f(w), the pdf of wet spell length; f(d), the pdf of dry spell length; and
f(p), the pdf of daily precipitation amount. Kernel density estimators are used to
estimate the pdfs of interest from the data set.
Synthetic precipitation sequences are generated continuously from season to season,
following the strategy indicated in Figure 2. A dry spell is first generated using f(d);
a wet spell is next generated using f(w). Precipitation for each of the w wet days is then
generated from f(p). The process is repeated with the generation of another dry spell. If a
season boundary is crossed, the pdfs used for generation are switched to those for the new
season.
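A minimal sketch of this alternating generation scheme follows; the sampling helpers (sample_dry, sample_wet, sample_precip) are hypothetical stand-ins for draws from the estimated seasonal f(d), f(w) and f(p).

```python
def generate_precip(n_days, season_of, sample_dry, sample_wet, sample_precip):
    """Generate a daily precipitation series by alternating dry and wet spells."""
    series, day = [], 0
    while day < n_days:
        for _ in range(sample_dry(season_of(day))):       # dry spell length from f(d)
            series.append(0.0); day += 1
            if day >= n_days: return series
        for _ in range(sample_wet(season_of(day))):       # wet spell length from f(w)
            series.append(sample_precip(season_of(day)))  # daily amount from f(p)
            day += 1
            if day >= n_days: return series
    return series
```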
For the univariate continuous case, the random variable of interest (p) is generated from
the kernel density estimate following a two-step procedure given by Devroye (1986, p.
765) and also in Silverman (1986). The discrete variables (w and d) are generated from
the cumulative mass function.
The above procedure neglects correlation between sequential wet and dry spell lengths
and correlation between daily rainfall amounts within a wet spell. These correlations can be
incorporated through the use of conditional pdf's and the disaggregation of total wet spell
precipitation into daily amounts (Lall and Rajagopalan, in preparation).
For the data sets analysed here, all the correlations mentioned above were found to be
insignificant. Consequently we did not use conditional pdfs and disaggregation here.
Figure 2. Structure of the renewal model for daily precipitation: a dry spell of d days is generated from f(d), a wet spell of w days from f(w), and w independent daily precipitation amounts from f(p).

The kernel density estimate of f(p) is

    f_n(p) = (1/(nh)) \sum_{i=1}^{n} K((p - p_i)/h)    (2.1)

This estimates the probability density f(p) based on n observations p_i. K(.) is a kernel
function defined to be positive and symmetric, to have unit integral, and to have finite
variance. These requirements ensure that the resulting kernel density estimate is a valid
density. The symmetry condition is not essential, but is used to avoid bias. The subscript
'n' emphasizes that this is an estimate based on n data points. The bandwidth parameter h
controls the amount of smoothing of the data in our density estimate. An estimator with
constant bandwidth h is called a fixed kernel estimator. Commonly used kernels are given
in Equations 2.2a-c.
One can see from Equation 2.1 that the kernel estimator is a convolution estimator. This
is illustrated in Figure 3. The kernel density estimate can also be viewed as a smoothing of
the derivative of the empirical distribution function of the data.
Figure 3. Example of a kernel pdf using 5 equally spaced data points (black dots) with the bisquare kernel and a fixed bandwidth (h = 4); x is treated as a continuous variable.
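As a small illustration of Equation (2.1), the sketch below implements the fixed-bandwidth estimator with the bisquare kernel in the spirit of Figure 3; the five data points are illustrative, not taken from the paper.

```python
import numpy as np

def bisquare(u):
    """Bisquare kernel: (15/16)(1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def kde(x, data, h):
    """f_n(x) = (1/(nh)) * sum_i K((x - x_i)/h)  -- Equation (2.1)."""
    u = (np.asarray(x)[:, None] - np.asarray(data)[None, :]) / h
    return bisquare(u).sum(axis=1) / (len(data) * h)

# Example in the spirit of Figure 3: five equally spaced points, h = 4.
grid = np.linspace(0, 20, 201)
density = kde(grid, data=[2.5, 6.25, 10.0, 13.75, 17.5], h=4.0)
```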
The choice of the bandwidth and kernel can be optimized through an analysis of the
asymptotic mean square error, MSE = E[(f(p) - f_n(p))^2], or the mean integrated square
error (MISE), the integral of the MSE over the domain. Under the requirements that the
kernel be positive and symmetric, having unit integral and finite variance, Silverman (1986, p. 41)
shows that the optimal kernel in terms of minimizing MISE is the Epanechnikov kernel.
However it is only marginally better than others listed above. Silverman (1986, Eqn. 3.21)
shows that the optimal bandwidth, hopt, is a function of the unknown density f(p). In
practice a certain distribution is assumed for f(p) and the MISE is minimized to obtain
optimal bandwidth hopt with reference to the assumed distribution. Kernel probability
density estimation can also be improved by taking h to be variable, so that the smoothing is
larger in the tails where data is sparse, but less where the data is dense.
A number of bandwidth selection methods have historically been used, such as the
cross-validation methods (maximum likelihood and least squares cross-validation; see
Silverman (1986), Sec. 3.4). These methods are prone to undersmoothing (Silverman,
1986). This is pronounced when the data are concentrated near a boundary, as is the case
with precipitation, where there is a finite lower bound (precipitation > 0) to the domain.
Symmetric kernels near the boundary can violate this. One approach is to relax the
symmetry constraint and use boundary kernels such as suggested by Muller (1991). Here,
however, we chose to avoid the issue by working in a log-transformed variable space. A
fixed bandwidth kernel density estimate (Eqn. 2.1) is applied to ln(p) and the resulting
probability density is back-transformed, to get:
    f_n(p) = (1/p) f_n,log(ln p)    (2.3)

where f_n,log denotes the fixed-bandwidth kernel estimate fitted to ln(p). The bandwidth
h was chosen as optimal with reference to the normal distribution in the log space, and
Epanechnikov kernels were used. The optimal bandwidth (using Silverman 1986, Eqn. 3.1)
is proportional to sigma * n^(-1/5), where sigma is the standard deviation of the
log-transformed data. This method provides adaptability of bandwidth and also gets around
the boundary issue. Figure 4(a) shows this
method applied to precipitation data collected at Woodruff, Utah over the years 1948-1989
for Season 1 (Jan.-Mar.). Note that the estimate follows the data, as reflected by the histogram, well.
There are differences between the kernel estimate and a fitted exponential distribution, but
from Figure 4(a) it is hard to see which is better. Figure 4(b) shows the cumulative
distribution functions obtained by integrating the probability density function. Both kernel
and exponential cdf estimates are compared to the empirical cdf using the Weibull plotting
position (i/(n+1)), with 95% confidence limits set up around the empirical cdf,
calculated using Kendall and Stuart (1979, p. 481).
It can be seen from Figure 4(b) that the cdf from the exponential distribution leaves the 95%
confidence interval, while that from the kernel estimator lies entirely within it. This suggests
that the true density function of the data is different from the exponential.
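A hedged sketch of the log-space estimator follows; for brevity it uses a Gaussian kernel with the common normal-reference constant, so the constants (though not the idea) are our own.

```python
import numpy as np

def kde_log_space(p_grid, precip, c=1.06):
    """Estimate f(p) by a fixed-bandwidth KDE on ln(p), back-transformed (Eq. 2.3)."""
    y = np.log(precip)                           # work in log space
    h = c * y.std(ddof=1) * len(y) ** (-0.2)     # normal-reference bandwidth (assumed)
    u = (np.log(p_grid)[:, None] - y[None, :]) / h
    f_log = np.exp(-0.5 * u**2).sum(axis=1) / (len(y) * h * np.sqrt(2 * np.pi))
    return f_log / p_grid                        # back-transform: f(p) = f_Y(ln p)/p
```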
For the discrete variables w and d, we require estimates that reproduce the relative
frequencies in the historical data set. One nonparametric estimator of the discrete
probability distribution of w or d is the maximum likelihood estimator that yields directly
the relative frequencies (e.g., n_wi/n for the ith wet spell length w_i in a sample of size n).
The kernel method is better, because (a) it allows interpolation and extrapolation of
probabilities to spell lengths that were unobserved in the sample, and (b) it has higher
MSE efficiency.
Wang and Van Ryzin (1981) developed geometric kernels for discrete variables.
Simonoff (1983) has developed a maximum penalized likelihood (MPLE) estimator for
discrete data. Hall and Titterington (1987) show how discrete kernel estimators can be
formed from continuous kernels evaluated at discrete points and renormalized. These three
methods were compared and we found that the Hall and Titterington (1987) approach
worked best. The geometric kernel tended to undersmooth, while the MPLE oversmoothed.
The Hall and Titterington (1987) estimator is similar to the estimator in Equation (2.1),
but with an adjustment for use with discrete data. It is given as

    f(w) = (h/n) \sum_{i=1}^{n} K_h(h(w - w_i))    (2.6)

where h is the bandwidth with h in (0,1], and K_h(h(w - w_i)), evaluated only at integer
values of w - w_i, is defined through the kernel K(.) and the scale factor s(h) that
rescales the discrete kernel to sum to unity.
The bandwidth is chosen by cross-validation, in which f(w) is estimated using Equation
(2.6) and the leave-one-out estimate is obtained from Equation (2.6) with the point w_i
dropped. Hall and Titterington (1987) proved that this cross-validation automatically
adapts the estimator to an extreme range of sparseness types. If the data are only slightly
sparse, cross-validation will produce an estimator which is virtually the same as the
cell-proportion estimator. As sparseness increases, cross-validation will automatically
supply more and more smoothing, to a degree which is asymptotically optimal. To
alleviate the boundary problem, Dong and Simonoff (1992) have developed
boundary kernels for most of the commonly used kernels. Dong and Simonoff (1993) have
successfully tested the boundary kernels on data sets that are similar to the ones we have.
We have used the bisquare kernel (as defined in Equation 2.2c) and the corresponding
boundary kernels for our analysis.
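The sketch below illustrates the discrete estimator of Equation (2.6); the renormalization is our reading of the scale factor s(h), and the boundary-kernel refinement is omitted.

```python
import numpy as np

def bisquare(u):
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def discrete_kde(w_grid, spells, h):
    """Estimate P(W = w) on integer w_grid from observed spell lengths."""
    lags = np.arange(-int(1 / h) - 1, int(1 / h) + 2)   # integer support of K(h*lag)
    s = bisquare(h * lags).sum()                        # scale factor s(h), our reading
    K = bisquare(h * (np.asarray(w_grid)[:, None] - np.asarray(spells)[None, :])) / s
    return K.sum(axis=1) / len(spells)                  # sums to one over the integers
```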
Figures 4(c) and (d) illustrate this approach applied to wet and dry spells.

Figure 4(a). PDF of daily precipitation for Season 1 (histogram of historical data, kernel estimated PDF and fitted exponential distribution; 718 data points).
Figure 4(b). CDF of daily precipitation for Season 1 (kernel CDF, exponential CDF and 95% confidence band around the empirical cdf).
Figures 4(c) and 4(d). Probabilities of wet and dry spell lengths, respectively (observed proportions, kernel estimates and fitted geometric distributions).

In Figure 4(c) there is no appreciable difference between the kernel estimate and the
fitted geometric distribution. In Figure 4(d) the kernel estimate is seen to be a better
smoother of the observed proportions than the fitted geometric distribution.
The above results indicate that the kernel estimators provide a flexible or adaptive
representation of the underlying structure, under weaker assumptions (e.g., continuity,
smoothness) about the density than classical parametric methods.
Simulation Results
The above pdf and pmf estimates were used in the simulation model described earlier,
applied to the Woodruff, Utah data. In order to test the synthetic generation of the model,
a set of statistics was computed for comparison with the historical record.
Twenty-five simulations were made and the statistics were calculated for the three
variables. They are plotted along with the historical statistics as boxplots. The box in the
boxplots indicates the interquartile range of the statistic computed from the twenty-five
simulations, while the lines extending outward from the boxes extend to the 95% range of
the statistic. The dots are the values of the statistic that fall outside the 95% range. The
black dot joined by solid lines is the statistic of the historical record. The boxplots show
the range of variation in the statistics from the simulations and also show the capability of
the simulations to reproduce the historical statistics.
Figures 5, 6 and 7 show the boxplots of the various statistics, for each season, for the
three variables: daily precipitation, wet spell length and dry spell length, respectively. It
can be seen from these figures that the simulation procedure reproduces the
characteristics well. More wet spells and dry spells are simulated than appear in the
historic record. The reason is that the historic data contain many missing values, which
results in fewer wet and dry spells, while simulations are made for the entire length of the
record. This introduces a small bias, as a result of which the historical statistics tend to
fall outside the boxes in the boxplots. This can be observed from Figures 6(c) and 7(a).
Thus, the model provides a promising alternative to the parametric approach. The
assumption-free, data-adaptive nature of the nonparametric estimators makes the model
more robust to distributional assumptions.
Further work is required by way of analysing more data sets and comparing with
traditional stochastic models such as the Markov chain and the Markov renewal model.
The model is being generalized by Lall and Rajagopalan (in preparation) to handle
situations where the correlations between the variables are significant.
Figure 5. Boxplots, by season, of daily precipitation statistics: (a) mean, (b) standard deviation, (c) percentage of yearly precipitation, (d) maximum.
Figure 6. Boxplots, by season, of wet spell length statistics: (a) mean, (b) standard deviation, (c) maximum.
Figure 7. Boxplots, by season, of dry spell length statistics.
Acknowledgements
Partial support of this work by the U.S. Forest Service under contract notes INT-
915550-RJVA and INT-92660-RJVA, Amend #1 is acknowledged. The principal
investigator of the project is D.S. Bowles. We are grateful to J. Simonoff, J. Dong, H.G.
Muller, M.C. Jones, M. Wand and S.J. Sheather for stimulating discussions, provision of
computer programs and relevant manuscripts. The work reported here was supported in
part by the USGS through their funding of the second author's 1992-93 sabbatical leave,
when he worked with BSA, WRD, USGS, National Center, Reston, VA.
REFERENCES
Devroye, L. (1986) Non-uniform Random Variate Generation, Springer-Verlag, New York.
Dong, J. and Simonoff, J.S. (1992) "On improving convergence rates for ordinal contingency table cell probability estimation", unpublished report.
Dong, J. and Simonoff, J.S. (1991) "The construction and properties of boundary kernels for sparse multinomials", unpublished report.
Hall, P. and Titterington, D.M. (1987) "On smoothing sparse multinomial data", Australian Journal of Statistics 29(1), 19-37.
Kendall, Sir M. and Stuart, A. (1979) The Advanced Theory of Statistics, Vol. 2, Macmillan Publishing Co., New York.
Lall, U. and Rajagopalan, B. "A nonparametric wet/dry spell model for daily precipitation", in preparation for submission to Water Resources Research.
Muller, H.G. (1991) "Smooth optimum kernel estimators near endpoints", Biometrika 78(3), 521-530.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.
Simonoff, J.S. (1983) "A penalty function approach to smoothing large sparse contingency tables", The Annals of Statistics 11(1), 208-218.
Wang, M.C. and Van Ryzin, J. (1981) "A class of smooth estimators for discrete distributions", Biometrika 68(1), 301-309.
Webb, R.H. and Betancourt, J.L. (1992) "Climatic variability and flood frequency of the Santa Cruz River, Pima County, Arizona", USGS Water-Supply Paper 2379.
PART II
FORECASTING
FORECASTING B.C. HYDRO'S OPERATION OF WILLISTON LAKE -
HOW MUCH UNCERTAINTY IS ENOUGH
D.J. DRUCE
Resource Planning, B.C. Hydro
Burnaby Mountain System Control Centre
c/o Podium B, 6911 Southpoint Drive
Burnaby, B.C., Canada V3N 4X8
For the past several years, the British Columbia Hydro and Power Authority has made
available to the forest industry and others, probabilistic forecasts of month-end elevations
for its largest storage reservoir, Williston Lake. These forecasts consist of the median,
lower decile and upper decile values for each month over a 24 month period and are
updated on a monthly basis. They are generated by a stochastic dynamic programming
model, in combination with a simulation model. The SDP model derives a monthly
operating policy for Williston Lake, the adjacent 2416 MW G.M. Shrum generating
station and the 700 MW Peace Canyon run-of-river hydroelectric project located
downstream on the Peace River. The operating policy provides releases for each month
that are conditional on the reservoir storage state and on a randomized historical weather
state. The sample of month-end reservoir levels calculated by the simulation model is
easily manipulated to directly elicit the percentiles for the forecast. Analyses of the
forecasts issued since April 1989 indicate that while the median forecasts have been
relatively accurate, the decile forecasts with lead times of less than a year have
underrepresented the uncertainty in the reservoir levels. Furthermore, preliminary results
suggest that an upgraded version of the SDP model, that includes a stochastic
export/import market, will not add sufficient uncertainty for the shorter lead times. A
source of forecast error that deserves more attention has, however, been identified.
INTRODUCTION
Williston Lake, with a live storage capacity of 40,000 x 10^6 m^3, is the largest reservoir
operated by the British Columbia Hydro and Power Authority (B.C. Hydro). Williston
Lake was created when the W.A.C. Bennett Dam was constructed on the Peace River in
the 1960's. It filled for the first time in 1972. Adjacent to the W.A.C. Bennett Dam is the
2416 megawatt (MW) G.M. Shrum (GMS) hydroelectric power plant and just
downstream is the 700 MW Peace Canyon (PCN) run-of-river project. This hydroelectric
complex is located in northeastern British Columbia, as shown in Figure 1.
Traditionally, forecasts of Williston Lake levels were produced only for in-house use,
i.e., for operations planning. Then, in the late 1980's, it became apparent to B.C. Hydro
that other users of Williston Lake, primarily the forest industry, were keenly interested in
future water levels. B.C. Hydro responded by distributing the forecasts to the forest
companies and to area managers who deal more directly with the local interests. At the
same time, B.C. Hydro began relying more on economic criteria and less on water level
Figure 1. Location of the hydroelectric complex in northeastern British Columbia.
For this paper, the 24 month water level forecasts issued since April 1989 have been
analyzed to obtain performance statistics for the median forecasts and to determine
whether the methodology produces upper and lower decile forecasts that are credible.
B.C. Hydro supplies electricity to most of the province from an integrated system of
power plants that are predominantly hydroelectric. The nameplate generating capacity of
the system is 10,390 MW, with the 30 hydroelectric plants contributing 9332 MW.
Based on their operating regime, each of the hydroelectric plants can be placed into one
of three groups, namely, the Peace River plants, the Columbia River plants and the rest of
the system. The Peace River plants, GMS and PCN, are unique in that they provide most
of the monthly and annual operating flexibility in the system. The Columbia River
projects are characterized by large power plants with some storage reservoirs, but
relatively little control over monthly or annual generation levels. This lack of control is
partly the result of the Columbia River Treaty signed with the United States in 1961.
Mica Dam, on the main stem of the Columbia River is one of three dams built in Canada
under terms of the Columbia River Treaty. The reservoir created by Mica Dam,
Kinbasket Lake, has a live storage capacity of 14,800 x 10^6 m^3 of which approximately
8630 x 10^6 m^3 are operated in accordance with Treaty requirements. Monthly releases
from Treaty storage or month-end storage targets for the upcoming year are
predetermined from studies jointly prepared by the United States and Canadian Entities
and, for the most part, are independent of runoff conditions. Storage targets rather than
releases are specified for the summer months to facilitate refill of Treaty storage over a
wide range of monthly inflow volumes. As a result, there is greater uncertainty over the
monthly releases and generation from the 1736 MW Mica power plant at that time of the
year. The 1843 MW Revelstoke project, located downstream of Mica Dam, has
considerable local inflow and 1560 x 10^6 m^3 of live storage capacity, but is generally
operated as a run-of-river plant. Consequently, the Columbia River Treaty obligations
play a dominant role in the monthly operation of both projects. There is however some
short-term operating flexibility available through the use of non-Treaty storage. B.C.
Hydro has an additional 1137 MW of generating capacity at two power plants, Kootenay
Canal and Seven Mile, located on tributaries of the Columbia River, but has little or no
control over the operation of the respective upstream storage reservoirs. The variation in
the summer generation from Columbia River plants is usually accommodated by
adjusting the generation from Peace River plants and is reflected in the Williston Lake
levels. The rest of the hydroelectric system is comprised of many small to moderate-
sized projects where the operation is dictated more by the hydrologic regime and storage
limitations than by system requirements. The percentage of the total system generation
provided by each of the three groups is shown in Figure 2. Thermal projects have not
contributed much generation in recent years, but the 912 MW Burrard thermal plant,
located near Vancouver, is available for system support as required.
By design, B.C. Hydro generally has more energy than it needs to meet the domestic
demand, plus any other firm obligations, except under prolonged low water conditions.
Rather than have this surplus energy accumulate (as water) in the system reservoirs, until
it eventually spills, B.C. Hydro attempts to market the energy to utilities in the Pacific
Figure 2. Percentage of total system generation by group: thermal approximately 2%, Peace 31% and Columbia 46%, with the remainder from the rest of the hydroelectric system.
In the past, the quantity of surplus energy available for the current year was estimated
from deterministic energy studies that assumed average water conditions over a four year
planning period. Energy beyond what was required to refill the reservoirs in the first year
could be exported without concern about the effect on the longer term reliability of the
system. However, for these studies, only the energy that could be generated by the
hydroelectric projects was considered. In the event that the reservoir inflows were less
than expected, thermal power plants could then be operated to refill the system. As a
policy, thermal power plants were not operated concurrently with exports. Since B.C.
Hydro is only a marginal supplier to the large Pacific Northwest/Southwest market,
export prices were typically set just below the decremental cost of the importing utility,
after allowing for transmission costs. The incremental cost of actually supplying energy
from the hydroelectric system was generally unknown, but was thought to be small, and
had no influence on either the price or quantity of the exports.
By the late 1980's, B.C. Hydro had started to move away from the notion that
hydroelectric energy had only a nominal production cost and toward marginal pricing,
based on opportunity costs. In the short-run, B.C. Hydro's opportunity costs are directly
affected by the forecasted markets for exports and for alternate resources and by the risk
of a system spill. For estimates of its system marginal cost, B.C. Hydro relies on the SDP
model developed for the operations planning of Williston Lake. An outline of how that
model is used for forecasting both the Williston Lake levels and the system marginal
cost is included in the next section.
FORECASTING METHODOLOGY
Most stochastic dynamic programming models that are used for reservoir management
Under the assumption that Williston Lake is operated to balance system loads and
resources, it is necessary to establish how much of the load can be served by the other
resources. The GMS/PCN plants can then be operated to supply the residual firm load
and to accommodate any interruptible sales or purchases. The schematic, presented as
Figure 3, shows the components included in the derivation of the GMS/PCN load, as
well as other input and output for what has come to be known internally as the "marginal
cost" model.
Figure 3. Components included in the derivation of the GMS/PCN load, including the generation from the Columbia River plants, for the marginal cost model, whose outputs are the marginal cost forecast and the Williston Lake level forecast.
Uncertainty for both loads and resources is based on the effects of randomized
monthly weather sequences from the years 1973 to 1992, i.e. the year before the current
year. The monthly domestic load forecast is adjusted by 20 gigawatt-hours (GW.h) per
degree Celsius whenever the mean temperature for an historical month at a weather
station near the load centre deviates from the long term mean temperature for the
corresponding month. However, this empirical relationship typically alters the monthly
load by less than three per cent. Uncertainty is added to the resource side by linking the
weather months and the energy historically generated by the rest of the hydroelectric
system, excluding the four large plants in the Columbia River group. As mentioned
earlier, the majority of these smaller projects have limited storage capacity and their
operation is related to the local hydrology rather than the system load. The year-to-year
variability in the small hydro generation for a given month is usually less than about five
percent of the system load. Examination of Figure 4 will help put this source of
uncertainty into perspective.
Figure 4. Annual energy (GW.h) of the total system, the GMS/PCN plants and the small hydro plants, 1973 to 1992.
Uncertainty in the water supply to Williston Lake is based on the historical inflows. In
each case, the inflow sequences corresponding to the weather years of 1973 to 1992 are
divided into monthly values to produce the monthly inflow and weather pairings. Monthly
weather sequences are explicitly assumed to be independent in the SDP formulation,
which implies that the monthly inflow values should not be highly autocorrelated. The
statistical summary of historical inflows to Williston Lake, included as Table 1, shows
that the monthly autocorrelation is low to moderate. The range in the annual water supply
for Williston Lake, when converted to energy, amounts to about 18 per cent of the annual
system load. For the current year, however, that level of uncertainty can be reduced by
hydrologic forecasting, as illustrated by Figure 5.
TABLE 1. Summary of inflow volumes to Williston Lake for 1973 to 1992.
For the past four years, the observed annual inflows have generally fallen within the
forecasted ranges. But, individual monthly values were often outside the forecasted
monthly range and that has resulted in substantial short-term errors in the water level
forecasts. The occurrence of such errors will tend to reduce over time as more weather
years are added to the data base.
The interruptible export and alternate resource markets are also major sources of
uncertainty for B.C. Hydro. In an upgraded version of the SDP model, these markets are
treated as stochastic variables. However, for the model that has been used for
forecasting, deterministic forecasts of the monthly quantities and prices for both markets
are input.
Figure 5. Reduction in the uncertainty of the annual inflow to Williston Lake (naive versus hydrologic forecasting models, forecast dates 1 January to 1 August).
Given all the information described above, the SDP model selects, for each reservoir
state and weather state, the monthly release from Williston Lake that will maximize the
net revenue to B.C. Hydro. That operating policy is then passed on to a companion
model that simulates the operation of the hydroelectric complex for each of the historical
weather years, from a known initial reservoir level. The sample of possible month-end
reservoir levels calculated by the simulation model is manipulated to obtain the
percentiles that are presented graphically, as shown in Figure 6. Based on the water level
forecast and the marginal value of water stored in Williston Lake, established by the SDP
model for each storage level and month over the planning horizon, a marginal cost
forecast is also routinely produced for internal use. It provides decision support for
interruptible sale, purchase and storage transactions. Other applications for the marginal
cost model have been previously described by Druce (1989).
Upper and lower decile values are plotted along with the median values to provide
some indication of the uncertainty associated with the forecasts. This is common practice
with the seasonal water supply forecasts available in British Columbia, Alberta and the
western United States. Moreover, an economic benefit can be expected whenever a
probabilistic forecast is used instead of a categorical forecast for decision making, and the
gain tends to increase as the predictability decreases (Alexandridis and Krzysztofowicz,
1985). In the following section, the median and the decile water level forecasts for
Williston Lake are analyzed in an attempt to establish forecast credibility and to learn
where the forecasting methodology can be improved.
The water level forecasts were evaluated for accuracy by comparing the median
forecasted and the observed month-end values over each rolling 24 month period since
April 1989. Forecast statistics are displayed, as a function of lead time, in Figure 7.
Figure 6. Forecasted month-end elevations of Williston Lake (m), April 1993 to March 1995, for exceedence probabilities of 0.10, 0.50 and 0.90.
Figure 7. Bias, mean absolute error and RMSE of the median water level forecasts as a function of lead time (months).
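A sketch of how such error statistics might be assembled is given below; the array layout is assumed, not taken from B.C. Hydro's procedure.

```python
import numpy as np

def error_by_lead(forecasts, observed):
    """forecasts, observed: (n_issues, 24) arrays of median forecasts and outcomes."""
    err = forecasts - observed                       # (issues x lead times)
    bias = np.nanmean(err, axis=0)                   # signed error per lead time
    mean_abs = np.nanmean(np.abs(err), axis=0)       # mean |error| per lead time
    rmse = np.sqrt(np.nanmean(err**2, axis=0))       # RMSE per lead time
    return bias, mean_abs, rmse
```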
From these results, it appears that the forecasting methodology is very slightly
positively biased, but reasonably accurate. It was surprising to find, however, that the
greatest errors, on average, have occurred in the first 12 months of the forecast period.
The same forecast statistics were calculated for a naive model, i.e. using the average
observed month-end water levels available prior to each forecast year. Those results,
plotted in Figure 8, also reveal greater errors in the first 12 months of the forecast period.
It was therefore concluded that the perverse error pattern was not due to the simulation
modelling, but due to the weather and market conditions that prevailed in the four year
forecasting period. The modelling actually compensated quite well.
Figure 8. Bias, mean absolute error and RMSE of the naive model forecasts as a function of lead time (months).
The reliability of the upper and lower decile forecasts was tested by calculating how
frequently they contained the observed values. If the forecasting methodology accurately
accounted for uncertainty, then the observed values should fall between the deciles 80 per
cent of the time. Once
again, as shown in Figure 9, results are poorer for the forecasts with the shorter lead
times. However, this was no surprise. In fact, work on upgrading the SDP model to
include more uncertainty has been underway for some time. B.C. Hydro has not had
much success in subjectively forecasting export markets and it was decided that the
uncertainty in the export market should be acknowledged explicitly in the modelling.
Also, by adding a stochastic export market to the SDP model it was anticipated that the
reliability of the decile forecasts of Williston Lake levels would improve, for the shorter
lead times.
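The reliability check itself is simple to express; the sketch below assumes aligned arrays of decile forecasts and observations.

```python
import numpy as np

def decile_coverage(lower, upper, observed):
    """Per-lead-time frequency (%) that observations fell between the deciles."""
    inside = (observed >= lower) & (observed <= upper)   # (issues x lead times)
    return 100.0 * np.nanmean(inside, axis=0)            # expected value: 80%
```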
The upgraded SDP model has two additional state variables, to separately account for
the monthly export quantity and export price. The size of the export market available to
B.C. Hydro is determined, for the most part, by water conditions for the United States
Pacific Northwest. When the forecasted or observed water supply for the Columbia River
at The Dalles is 111,000 x 10^6 m^3 or less, B.C. Hydro has, on average, had access to a
larger export market. The revised SDP model, therefore, has a state variable for water
conditions in the Pacific Northwest that can take on one of two values. The export
Figure 9. Frequency that decile forecasts contained observed values.
market quantity varies monthly for each water supply state, but over a year amounts to
7900 GW.h versus 4700 GW.h. For simulation purposes, the future water supply states
are modelled as a lag-one Markov process with the monthly state transition matrices
based on the actual forecasts issued over the period 1970 to 1992.
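A sketch of simulating such a two-valued lag-one Markov state follows; the transition probabilities shown are placeholders, not the matrices estimated from the 1970 to 1992 forecasts.

```python
import numpy as np

rng = np.random.default_rng(0)
# transition[m][i, j] = P(state j next month | state i in month m); 0 = low, 1 = high.
# Placeholder values only; one matrix per calendar month.
transition = [np.array([[0.8, 0.2], [0.3, 0.7]]) for _ in range(12)]

def simulate_states(start_state, start_month, n_months):
    """Simulate the water-supply state sequence as a lag-one Markov process."""
    state, states = start_state, []
    for k in range(n_months):
        m = (start_month + k) % 12
        state = rng.choice(2, p=transition[m][state])
        states.append(state)
    return states
```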
The expected monthly price for energy in the interruptible export market is forecasted
using an empirical relationship with the spot price of natural gas in the United States, at
Henry Hub, Louisiana. The New York Mercantile Exchange futures market provides the
forecast for the spot price of natural gas at Henry Hub for the first 18 months, then
various other sources of information are used to extend the forecast over the six year
planning horizon. However, the empirical relationship is not particularly strong, with an
R^2 value just over 0.50. The SDP model has therefore been reformulated to consider
deviations from the expected price as a state variable. Three possible deviations are
included, roughly -5, 0 and +5 mills per kilowatt-hour from the expected price. The
exact values are calculated from the residuals of the regression equation and are updated
as new data on prices become available. They are the mean deviations for three class
intervals chosen to have equal frequencies over the period of record. Again, the export
price deviation states are modelled as a lag-one Markov process with the monthly state
transition matrices calculated from the residual pattern. The alternate resource or import
market has some links with the export market through share-the-profit pricing formulas.
Other sources of uncertainty for the alternate resources have yet to be considered.
For February, March and April of 1993, the original and the upgraded versions of the
SDP and simulation models were operated in parallel. From a comparison of their
respective water level forecasts, it appears that the addition of the stochastic
export/import market will increase the reliability of the decile forecasts, but only for lead
times of four months or more. However, this observation is based on a very small sample
of forecasts and is subject to change as more forecasts are produced.
Another area of concern has been the deterministic forecast of the generation from the
large Columbia River power plants, since that group supplies such a large proportion of
the system generation. Perhaps through more reliable modelling of the operation of these
plants, the accuracy of the Williston Lake level forecasts could improve sufficiently, over
the shorter lead times, that it would not be necessary to add even more complexity to the
SDP model. This hypothesis was investigated by calculating error statistics for the
Columbia River generation forecasts for durations of one to eight months. The error
statistics were computed for three combinations of plants - the main stem plants Mica and
Revelstoke, the tributary plants Kootenay Canal and Seven Mile and all four plants
together. The results for the Mica and Revelstoke plants are shown in Figure 10, in terms
of Williston Lake storage. The monthly generation patterns for the main stem plants and
the tributary plants are negatively correlated. Consequently, the errors for the Mica and
Revelstoke generation forecasts are, in most cases, worse than those for all four plants
combined. It is apparent from a comparison of the error statistics presented in Figures 7
and 10 that much of the error in the Williston Lake level forecasts, for the shorter lead
times, can be attributed to the forecasts of the generation supplied by the Mica and
Revelstoke plants. These results are quite encouraging because they point to greater
effort in modelling those Columbia River plants that B.C. Hydro has more control over
and for which probabilistic seasonal inflow forecasts already exist.
Figure 10. Bias, mean absolute error and RMSE of the Mica and Revelstoke generation forecasts, expressed in terms of Williston Lake storage, for durations of one to eight months.
CONCLUSIONS
The SDP and simulation models used for the operations planning of Williston Lake
produce forecasts of month-end water levels that are relatively accurate, over a planning
horizon of 24 months. However, upper and lower decile forecasts, based on the
uncertainty of weather effects, are not credible for lead times of less than one year.
Preliminary results from upgraded versions of the SDP and simulation models, which
include the extra uncertainty of stochastic export/import markets, indicate that the
reliability of the decile forecasts will likely increase only for lead times of four months or
more. The credibility of decile forecasts, with shorter lead times, may improve with
better modelling of the effects of the operation of the Mica and Revelstoke power plants
on the Columbia River.
REFERENCES
Alexandridis, M.J. and Krzysztofowicz, R. (1985) "Decision models for categorical and probabilistic weather forecasts", Applied Mathematics and Computation 17, 241-266.
Druce, D.J. (1984) "Seasonal inflow forecasts by a conceptual hydrologic model for Mica Dam, British Columbia", in J.J. Cassidy and D.P. Lettenmaier (eds.), A Critical Assessment of Forecasting in Western Water Resources Management, American Water Resources Association, Bethesda, Md., pp. 85-91.
Druce, D.J. (1989) "Decision support for short term export sales from a hydroelectric system", in J.W. Labadie, L.E. Brazil, I. Corbu and L.E. Johnson (eds.), Computerized Decision Support Systems for Water Managers, American Society of Civil Engineers, New York, N.Y., pp. 490-497.
Druce, D.J. (1990) "Incorporating daily flood control objectives into a monthly stochastic dynamic programming model for a hydroelectric complex", Water Resources Research 26(1), 5-11.
Yakowitz, S. (1982) "Dynamic programming applications in water resources", Water Resources Research 19(4), 673-696.
Yeh, W. W-G., Becker, L., Hua, S-Q., Wen, D-P., and Liu, D-P. (1992) "Optimization of real-time hydrothermal system operation", Water Resources Planning and Management Division, ASCE 118(6), 636-653.
EVALUATION OF STREAMFLOW FORECASTING MODELS
INTRODUCTION
Being able to forecast future streamflows is of major interest to operators of water
resource systems. Many different approaches have been tried: repeating historic inflow
series, using mean values of historic inflow series, constructing stochastic models based
on the statistical analysis of historic inflow series and developing physically based
conceptual models. Unfortunately, a streamflow forecasting model which is a success
in one application could be a failure in another. Different forecasting models may have
to be selected for different applications. Historically, the evaluation and selection are
often conducted by comparing the forecasted and the observed streamflows through
numeric and/or graphic criteria (WMO 1986). Little consideration is given to the
particular application where the forecast is required. However, an operator could be
more concerned with which model can be used to achieve a better management of the
water resource system rather than how good the forecasted values are in comparison
with the actual future streamflows. In this study, different forecasting models are
evaluated through simulated real-time applications, in addition to using numeric criteria
(WMO 1986), to see which one can best improve the performance of a reservoir system.
In this study, eleven models are created for one-month ahead inflow forecasting. The
forecasted inflow for the coming month and the mean inflows for future months (leading
to infinity) constitute a future inflow sequence. The inflow sequence is then used in
deciding the optimal release policy for the coming month in the simulated real-time
monthly operation of a two-reservoir system. In the following sections, eleven
forecasting models and five numeric evaluation criteria (WMO 1986) are first detailed.
The real-time operation of a reservoir system and its simulation are then described.
Finally, a case study involving the simulated real-time operation of a system of two
reservoirs in parallel under different scenarios is given.
STREAMFLOW FORECASTING
1. Monthly mean values based on the historic series.
Let Q_i(m,yr) denote the historic monthly flows, with monthly means

    Qbar_i(m) = (1/Y) \sum_{yr=1}^{Y} Q_i(m,yr)    (1)

where i (i=1,2,...,n), m (m=1,2,...,12) and yr (yr=1,2,...,Y) represent the generating
station, month and year, respectively; n is the number of generating stations and Y is the
length of the series in years. The forecast based on the monthly mean values is

    Qf_i(m,yr) = Qbar_i(m),  yr = Y+1, Y+2, ...    (2)
2. Periodic, first order Markov process based on historic series, spatial correlation
assumed.
Streamflows usually repeat the same pattern annually. The periodic, first order Markov
process is constructed by finding a set of parameters for every two consecutive months
instead of estimating one set of parameters for the whole series. The forecast is given by

    Qf_i(m,yr) = a_i(m) + \sum_{j=1}^{n} b_ij(m) Q_j(m-1,yr),  yr = Y+1, Y+2, ...    (3)

The parameters a_i(.) and b_ij(.) are estimated using the least squares estimation technique
based on Y years of data.
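A sketch of fitting Model #2 by least squares is given below; the array layout and helper names are ours.

```python
import numpy as np

def fit_periodic_markov(Q):
    """Q: array (Y years, 12 months, n stations) of historic flows.

    Returns one (intercepts a_i(m), coefficients b_ij(m)) set per month.
    """
    params = []
    for m in range(12):
        if m == 0:
            y_cur, x_prev = Q[1:, 0, :], Q[:-1, 11, :]   # January on prior December
        else:
            y_cur, x_prev = Q[:, m, :], Q[:, m - 1, :]
        X = np.hstack([np.ones((len(x_prev), 1)), x_prev])  # first column = intercept
        coef, *_ = np.linalg.lstsq(X, y_cur, rcond=None)    # shape (n+1, n)
        params.append(coef)
    return params

def forecast_next(params, m, q_prev):
    """One-month-ahead forecast for month m given previous-month flows q_prev."""
    coef = params[m]
    return coef[0] + q_prev @ coef[1:]
```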
3. Periodic, first order Markov process based on historic series, independence
assumed.
The forecast is given by Eq. (3) except that b_ij equals zero for i != j.
4. Second order Markov process based on deseasonalized historic series, spatial
correlation assumed.
The deseasonalization is in fact a process of removing the trend from a stochastic series,
which is equivalent to deducting the monthly mean values from the historical series. Let
the deseasonalized series be q_i[(yr-1)*12+m]; then
The forecast is given by Eq. (5) except that b_ij equals zero for i != j.
The streamflow series usually follows a log-normal distribution. Let the new series, after
taking the natural logarithm of the original historical series, be W_i(m,yr). Then

    W_i(m,yr) = ln[Q_i(m,yr)],  yr = 1, 2, ..., Y    (6)

The model building and parameter estimation are the same as for Model #4 except that
Q_i(m,yr) is now replaced by W_i(m,yr). However, the noise term is no longer additive
as far as the forecasts are concerned. The final forecast is obtained as

    Qf_i(m,yr) = exp[Wf_i(m,yr)]    (7)
The next four forecasting models are based on a first order Markov process. They are the
same as Models #4 to #7 except that b_ij(t) equals zero for t = 2.
10. First order Markov process based on log-transformed and deseasonalized historic
series, spatial correlation assumed.
11. First order Markov process based on log-transformed and deseasonalized historic
series, independence assumed.
Models #2 and #3 have twelve sets of parameters for twelve months. Models #4 to #11
have only one set of parameters for all twelve months. The models are used only for
one-month ahead inflow forecasting. The future inflow sequence used in finding the
optimal release policy is generated by combining the forecast for the coming month and
the mean values for all other months leading to infinity.
The numeric evaluation criteria (WMO 1986) used here are given below:
b. Ratio of the sum of squares of the monthly residuals to the centred sum of
squares of the monthly observed streamflows:

    \sum_{t=1}^{N} [y_o(t) - y_f(t)]^2 / \sum_{t=1}^{N} [y_o(t) - ybar_o]^2    (9)

where y_o and y_f represent the observed and forecasted streamflows, ybar_o is the mean
of the observed streamflows, and N is the total number of months involved.
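This criterion is a one-line computation, sketched below for clarity.

```python
import numpy as np

def residual_ratio(y_obs, y_fcst):
    """Criterion (b), Eq. (9): residual sum of squares over centred sum of squares.

    A perfect forecast gives 0; a forecast no better than the mean gives about 1.
    """
    y_obs, y_fcst = np.asarray(y_obs), np.asarray(y_fcst)
    return ((y_obs - y_fcst) ** 2).sum() / ((y_obs - y_obs.mean()) ** 2).sum()
```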
period t; t0 and tf represent the beginning and the end of a specific operating horizon,
and g indicates some physical or non-physical quantity for evaluation.
The performance index for decision-making or real-time operation is derived from the
performance index for evaluation and is expressed as

    J = \sum_{k=t}^{infinity} g[S(k),R(k),k]    (14)

where t is the coming decision period. It should be noted that in this study the
optimization horizon extends to infinity.
The storage continuity relation takes the form S(t+1) = S(t) + Q(t) - F R(t) - L(t), as
implied by Eq. (22) below, where Q(t) is an n x 1 uncontrollable inflow vector, L(t) is an
n x 1 loss vector and F is an n x n system matrix. The diagonal elements f_ii of F are
always equal to 1. The off-diagonal element f_ij is -1 if reservoir i receives release from
reservoir j, and 0 otherwise. The uncontrollable inflow is defined as the part of the total
inflow to a reservoir after subtraction of releases from upstream reservoirs.
The formulated problem requires the use of nonlinear optimization. Most optimization
techniques search for the optimum of a non-linear system through time consuming
iterative processes. The selected technique may be feasible in real-time operations.
However, the time required for simulating this process many times over may not be
acceptable. The eleven forecasting models are tested indirectly by finding an
approximately equivalent quadratic performance index and using linear quadratic control,
where the analytic solution is available.
If such a quadratic performance index exists, the problem can now be rewritten as,
where St(k) is an n x 1 target storage vector; Rt(k) is an n x 1 target release vector; A and
B are n x n weighting matrices; P is the solution to the Riccati equation, Eq. (19), and is
an n x n matrix with all elements constant; and p(k) is the solution to the tracking
equation, Eq. (20), and is an n x 1 vector. A is symmetric, positive semi-definite, and B
and P are symmetric, positive definite.
The solution process starts with finding the target storages and releases to formulate an
approximately equivalent performance index. The targets are obtained by optimizing the
original performance index with respect to mean monthly values of historic inflows for
several years. The optimization is subject to all system constraints. Since the future
inflows are made of monthly mean values from the second month on and the inflows are
periodic, the targets are periodic as well. No matter what the initial storages are, the
future storages and releases will be the same as the target storages and releases after
some months. It can be observed from Eq. (18) that p(k) can be predetermined as

    p(k) = -P St(k)    (21)

The optimal release policy for the coming month can then be obtained as follows:

    R(t) = Rt(t) + C {[St(t+1) + F Rt(t)] - [S(t) + Q(t) - L(t)]}    (22)

where

    C = -B^{-1} F^T (P - A) = B^{-1} F^T [(P - A) F B^{-1} F^T P - P]    (23)

If the reservoir releases required by the release policy of Eq. (22) are not within the
release constraints, they are first set to meet the release constraints. In the process of
implementing the release policy, the releases are further adjusted to ensure that the
storage constraints are met, even if that means violating the release constraints. This is
consistent with reality.
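A hedged sketch of applying this release policy follows; the gain matrix C is taken as given, the continuity relation is the one assumed above, and the storage adjustment ignores downstream coupling for brevity.

```python
import numpy as np

def release_policy(C, F, S, Q, L, St_next, Rt, R_min, R_max, S_min, S_max):
    """Apply Eq. (22), then enforce constraints as described in the text."""
    R = Rt + C @ ((St_next + F @ Rt) - (S + Q - L))   # Eq. (22)
    R = np.clip(R, R_min, R_max)                      # honour release bounds first
    S_next = S + Q - F @ R - L                        # assumed continuity relation
    # If a storage bound is violated, adjust that reservoir's own release
    # (ignoring downstream coupling in this sketch), even if the release
    # bounds are then violated, as the paper notes.
    R = R + np.clip(S_next - S_max, 0.0, None) - np.clip(S_min - S_next, 0.0, None)
    S_next = np.clip(S_next, S_min, S_max)
    return R, S_next
```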
The East River watershed is located in the Province of Guangdong, Southern China.
There are two generating stations on the East River Watershed: the Harvest Dam and
the Maple Dam (Figure 1). The Maple Dam is upstream in the East River, and the
Harvest Dam is downstream of the tributary, the Harvest River, which joins the East
River below the Maple Dam. The system parameters are listed in Table 1.
Figure 1. The East River watershed in southern China, showing the Harvest Dam on the Harvest River and the Maple Dam on the East River.
Thirty-four years of monthly historic inflow series are available. The data serve two
purposes: one is parameter estimation for the forecasting models; the other is simulating
the real-time operation, where they are used as future "actual" inflows.
The system performance index used to decide the optimal release policy for the coming
month is
'" 2 2
J=:L
t-t
[:L C -:L Pi(k)]2
i-l
i
i-l
1=1,2, ... ,408 (24)
The system is optimized over seven years or eighty-four months based on monthly mean
values of inflows and subject to constraints given in Eqs.(15) and (27) to obtain the
target storages and releases. February is assumed having 28.25 days.
The results from the numerical evaluation of eleven forecasting models are presented in
Table 2. The values in the brackets in the first row indicate what should be expected
from a perfect forecast. It can be observed that the second model is generally the most
favoured.
Table 2 Numerical evaluation of forecasting models
Table 3 shows the results of the simulated system operations. The first eleven rows
represent the simulated real-time operations using linear quadratic control with forecasted
inflows from the eleven different forecasting models for the coming month. The twelfth is
the same as the first eleven except that the 'actual' inflows are substituted for the
forecasted inflows for the coming month. There is an expected improvement over the first
eleven. This indicates the merits of improved inflow forecasting.
Table 3 System performance of simulated real-time operation
Since there is no iteration involved in the linear quadratic control, it takes less than one
minute to complete the computation for the first twelve scenarios on a Compaq 386/33L.
However, it takes much more time to find the targets.
The last scenario, called the 'Ideal case', serves as a reference and can never be
reached. It is assumed that the future inflows are perfectly known beforehand. The
system goes through a one-shot optimization over all 408 months. It should be noted that
the simulated operation using the forecasting model #6 produces only 3.4% less total
power than the ideal case.
The interesting finding is that forecasting model #2, which is favoured by the numeric
criteria, does not outperform the other forecasting models in terms of maximizing power
generation or minimizing the performance index.
SUMMARY
The results shown above do not warrant any specific conclusion as to which method
should be used in the evaluation and selection of a streamflow forecasting model.
However, the study demonstrates that a streamflow forecasting model can be evaluated
by a method other than the usual numeric and/or graphic criteria. The method is to
recognize the application for which the model is intended and to assess its merits based
on the final results, such as increase of power generation, reduction of flood damage,
etc.
ACKNOWLEDGEMENTS
The writers would like to thank Professor Xi-can Shi of Tsinghua University, Beijing,
China for providing the data used in this study.
REFERENCES
Pindyck, R.S. (1972) "An application of the linear quadratic tracking problem to
economic stabilization policy", IEEE Trans. Auto. Control AC-17(3), 287-300.
INTRODUCTION
Floods are one of the most destructive acts of nature. Real-time flood forecasting has
recently been developed for flood protection and warning systems. Depending upon the response
time of a basin, a mathematical model used for real-time forecasting may consist of some
of the following three basic elements: (1) a rainfall forecasting model, (2) a rainfall-runoff
forecasting model, and (3) a flood routing model (Reed, 1984). Because many catchments
respond quickly to rainfall input, a rainfall forecasting model is desirable, acting in
unison with a rainfall-runoff model to extend the forecast lead time. Normally,
rainfall forecasting models are subject to significant forecasting error even if the forecast
lead time is short (Einfalt, 1991). An alternative method to extend the forecast lead
time is discussed in this paper.
MATHEMATICAL MODEL
A wide range of rainfall-runoff forecasting models has been developed recently, including:
(1) the unit hydrograph and other methods using the S-curve, the discrete linear cascade
reservoir model (Chander and Shanker, 1984; Bobinski and Mierkiewicz, 1986; Corradini
and Melone, 1987; Corradini et al., 1986), (2) conceptual models, (3) non-linear storage
models, and (4) transfer function models. O'Connell and Clark (1981) and Reed (1984)
have reviewed some of these models. In this paper, the transfer function model is used to
simulate the rainfall-runoff and storage-runoff processes for flood forecasting separately.
Powell (1985), Owens (1986), and Cluckie and Owens (1987) have demonstrated that
the rainfall-runoff process can be satisfactorily simulated by the transfer function model,

Q(t) = a1 Q(t-1) + a2 Q(t-2) + ... + ap Q(t-p) + b1 R(t-1) + b2 R(t-2) + ... + bq R(t-q) + e(t),   (1)

where Q(t) is the discharge, R(t) is the rainfall, e(t) is the error term, p and q are the
model orders, and a1, ..., ap, b1, ..., bq are the model parameters.

When (1) is applied for runoff forecasting, a rainfall forecasting model is required to
forecast the future rainfall. The model used to forecast rainfall in this paper is:

(2)
In the conceptual rainfall-runoff models based on the storage approach, the runoff at
the outlet of a catchment is commonly simulated as a function of storage, S (for example, Q
= KS). Based on this hydrologic phenomenon, a storage-runoff forecasting model is
developed, in which the runoff at the present time is assumed to be a function of the previous
runoff and the catchment storage. The major difference from the rainfall-runoff forecasting
model described in the previous section is that storage replaces rainfall as the model
input. The storage-runoff forecasting model is
Q(t) = a1 Q(t-1) + a2 Q(t-2) + ... + ap Q(t-p) + b1 S(t-1) + b2 S(t-2) + ... + bq S(t-q) + e(t)   (3)

(4)
As the storage over the catchment area cannot be directly measured, the storage at the present
time is computed from the mass balance between the rainfall input and the discharge
output,

S(t) = S(t-1) + R(t) - Q(t)   (5)
PARAMETER ESTIMATION
O(t) = a1 O(t-1) + a2 O(t-2) + ... + ap O(t-p) + b1 I(t-1) + b2 I(t-2) + ... + bq I(t-q) + e(t),   (6)

in which O(t), O(t-1), ..., O(t-p) are the system outputs (i.e. discharge); I(t-1), ...,
I(t-q) are the system inputs, which are the rainfall in the rainfall-runoff forecasting
model or the storage in the storage-runoff forecasting model; p and q are the model orders;
and a1, ..., ap, b1, ..., bq are the model parameters, which are calibrated by using historical
input and output data. If the historical data have N observations and m is the larger of p
and q, (6) can be written as:
[O(m+1)]   [O(m)    O(m-1)  ...  O(m-p+1)   I(m)    I(m-1)  ...  I(m-q+1)]   [a1]   [e(m+1)]
[O(m+2)] = [O(m+1)  O(m)    ...  O(m-p+2)   I(m+1)  I(m)    ...  I(m-q+2)] x [..] + [e(m+2)]
[  ...  ]   [  ...    ...          ...        ...     ...          ...   ]   [ap]   [  ... ]
[O(N)  ]   [O(N-1)  O(N-2)  ...  O(N-p)     I(N-1)  I(N-2)  ...  I(N-q)  ]   [b1]   [e(N)  ]
                                                                             [..]
                                                                             [bq]          (7)

i.e.,

O = X β + e   (8)
Based on the minimum square error between the observed and simulated output data, the
optimum parameters are determined according to the following equation:

β̂ = (X' X)^(-1) X' O   (9)
The model order can be decided based on the minimum value of SBC (Schwarz's
Bayesian Criterion) (Schwarz, 1978),

SBC = N ln(σ̂e²) + (p + q) ln(N)   (10)

where σ̂e² is the variance of the residuals (the differences between the observed discharge, O(t),
and the simulated discharge, Ô(t)),

σ̂e² = (O - Ô)' (O - Ô) / [N - (p + q)]   (11)
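As an illustration of Eqs. (6)-(11), the following sketch builds the lagged regression of Eq. (7), solves the least-squares problem of Eq. (9), and scores the candidate order; the SBC line uses the standard Schwarz form assumed in Eq. (10), and the function name is hypothetical.

```python
# A sketch of Eqs. (6)-(11): build the lagged design matrix of Eq. (7),
# solve the least-squares problem of Eq. (9), and score the order (p, q)
# with the residual variance of Eq. (11) and the SBC of Eq. (10).
import numpy as np

def fit_tf_model(O, I, p, q):
    """O: observed discharge series; I: input (rainfall or storage) series."""
    m = max(p, q)
    X = np.column_stack(
        [O[m - j - 1:len(O) - j - 1] for j in range(p)] +   # lagged outputs
        [I[m - j - 1:len(I) - j - 1] for j in range(q)])    # lagged inputs
    y = O[m:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)            # Eq. (9)
    resid = y - X @ beta
    N = len(O)
    var_e = resid @ resid / (N - (p + q))                   # Eq. (11)
    sbc = N * np.log(var_e) + (p + q) * np.log(N)           # Eq. (10), assumed form
    return beta, var_e, sbc
```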
CASE STUDY
Sixteen storm events over the Fei-Tsui reservoir catchment in northern Taiwan were
collected for a case study. Eight of the storm events are used for calibration to
determine the model orders and parameters. The other eight storm events are used to
verify the model performance based on eight kinds of criteria (Habaieb et al., 1991; Wang
et al., 1991; Abraham and Ledolter, 1983), which are divided into two groups: statistical
and hydrologic indexes.
Statistical index
MAD = (1/N) Σ_{t=1}^{N} |Q(t) - Q̂(t)|   (12)

where
N = number of observations,
Q(t) = observed flow,
Q̂(t) = forecasted flow.
MSE = (1/N) Σ_{t=1}^{N} [Q̂(t) - Q(t)]²   (13)

RTIC = { Σ_{t=1}^{N} [Q̂(t) - Q(t)]² / Σ_{t=1}^{N} [Q(t)]² }^(1/2)   (14)
EV = Σ_{t=1}^{N} [Q(t) - Q̂(t)] / Σ_{t=1}^{N} Q(t)   (19)
RESULTS
Based on the SBC criterion in (10), the optimal models calibrated by using the eight storm
events are:

Both the rainfall-runoff forecasting model and the storage-runoff forecasting model are
applied to the other eight storm events to verify the model performance. One- to six-hour-
ahead forecast hydrographs are compared with the observed hydrographs, as shown in
Figure 1 and Figure 2, in which only two of the eight storm events are shown. It is found
that the forecast hydrographs have a timing problem in both models: the r-hour-ahead
forecast hydrograph is shifted by nearly r hours relative to the observed hydrograph.
Therefore a backward shift operator is applied to correct the timing problem,

(24)

where r is the lead time. Figure 3 and Figure 4 show the forecast hydrographs corrected by
Eq. (24). It seems that the timing problem in the forecast hydrographs is significantly reduced.
Figure 5 presents the comparison of model performance based on the eight kinds of criteria.
From the analytical results of the eight storm events one may conclude that the storage-runoff
forecasting model performs better than the rainfall-runoff forecasting model.
CONCLUSIONS
A rainfall-runoff forecasting model and a storage-runoff forecasting model are developed and
compared by using 16 storm events over the Fei-Tsui reservoir catchment in northern Taiwan.
It is found that the forecast hydrographs have a timing problem in both models. After both
models are corrected by a backward shift operator, the comparison between the forecast
hydrographs (one to six hours ahead) based on the eight kinds of criteria shows that the
storage-runoff forecasting model performs better than the rainfall-runoff forecasting
model. More research is required to confirm this conclusion.
REFERENCES
Abraham, B. and Ledolter, J. (1983) Statistical Methods for Forecasting, John Wiley &
Sons, Inc., New York.
Bobinski, E. and Mierkiewicz, M. (1986) "Recent developments in simple adaptive flow
forecasting models in Poland", Hyd. Sci. J. 31, 263-270.
Chander, S. and Shanker, H. (1984) "Unit hydrograph based forecast model", Hyd. Sci. J.
31, 287-320.
Cluckie, I. D. and Owens, M. D. (1987) "Real-time rainfall-runoff models and use of
weather radar information", Chap. 12 in Weather Radar and Flood Forecasting, ed. by V.
K. Collinge and C. Kirby, Wiley.
Corradini, C., Melone, F. and Ubertini, L. (1986) "A semi-distributed model for real-time
flood forecasting", Water Resources Bulletin 22(6), 1031-1038.
Corradini, C. and Melone, F. (1987) "On the structure of a semi-distributed adaptive
model for flood forecasting", Hyd. Sci. J. 32(2), 227-242.
Einfalt, T. (1991) "Inaccurate rainfall forecasts: hydrologically valuable or useless?", New
Technologies in Urban Drainage UDT '91, ed. by C. Maksimovic, Elsevier Applied
Science, London.
Habaieb, H., Troch, P. A. and De Troch, F. P. (1991) "A coupled rainfall-runoff and
runoff-routing model for adaptive real-time flood forecasting", Water Resources Manage-
ment 5, 47-61.
O'Connell, P. E. and Clark, R. T. (1981) "Adaptive hydrological forecasting - a review",
Hyd. Sci. Bulletin 26(2), 179-205.
Owens, M. (1986) Real-Time Flood Forecasting Using Weather Radar Data, Ph.D.
Thesis, University of Birmingham, Department of Civil Engineering, U.K.
Powell, S. M. (1985) River Basin Models for Operational Forecasting of Flow in Real-
Time, Ph.D. Thesis, University of Birmingham, Department of Civil Engineering, U.K.
Reed, D. W. (1984) A Review of British Flood Forecasting Practice, Institute of
Hydrology, Report No 90.
Schwarz, G. (1978) "Estimating the dimension of a model", Annals of Statistics 6, 461-
464.
Wang, R. Y. and Tan, C. H. (1991) "Study on the modified tank model and its application
to the runoff prediction of a river basin", Taiwan Water Conservancy 39(3), 1-23 (in
Chinese).
Wei, W. W. S. (1990) Time Series Analysis: Univariate and Multivariate Methods,
Addison-Wesley Publishing Company, Inc., New York.
Figure 1. Using the rainfall-runoff forecasting model to estimate one- to six-hour-ahead
forecast hydrographs.
Figure 2. Using the storage-runoff forecasting model to estimate one- to six-hour-ahead
forecast hydrographs (storm event 7916).
Figure 3. Using the corrected rainfall-runoff forecasting model to estimate one- to six-hour-
ahead forecast hydrographs.
Figure 4. Using the corrected storage-runoff forecasting model to estimate one- to six-hour-
ahead forecast hydrographs (storm events 7021 and 7916).
"
MAD "
"
MSE
RTIC
CC
CE
:~I1JjIiIiIJ'
.,
EV :~~.n-"'Ii'l
"j
-0.1
-0.115
r
. .
:: .Jl.Pi'fii
''
.0.15
.0.26
Figure 5. Comparsion of model performance based on eight kind of criteria with corrected
rainfall-runoff and storage-runoff forecasting model.
SEEKING USER INPUT IN INFLOW FORECASTING
Traditionally, inflow forecasting models were used like black boxes. Users prepared the
inputs at one end and received the outputs at the other end. In such an environment, the
inflow forecasting models were aimed at generating inflow forecasts without any user
intervention. This paper describes anew, user friendly, approach to inflow forecasting.
It allows users to input their preferences, interactively, during the execution of the
model and to generate inflow forecasts with which they feel comfortable.
INTRODUCTION
One of the requirements for integrating the water management and operation of Ontario
Hydro's hydroelectric facilities into its Grid is to generate daily inflow forecasts for up
to 732 days for those facilities and their control reservoirs. It is recognized that longer-term
inflow forecasts can hardly be indicative of what will occur, owing to the
random processes upon which they are based. In order to assess the risk associated with
the use of inflow forecasts, it is necessary to have a variety of equally likely inflow
forecasts. The model described in this paper provides such a capability, allowing users
to generate inflow forecasts with any desired exceedance probabilities of volume.
Each generated inflow series has three parts. The first part comprises inflows for the
first four days. They are provided interactively by a user-selected inflow forecasting
model. The second part includes inflows from day 5 to day N-l. These are heuristically
modified historic inflows which precede those in the third part. They ensure a smooth
transition from the first part to the third part. The third part contains inflows from day
N, which ranges from 15 to 60, to day 732. They are part of historic inflow records.
The process of generating such local inflow series is accomplished through three
modules. They are called expected forecast, heuristic forecast and probabilistic forecast
respectively. Users play an active role in deciding what will be produced in each module
(Tao 1993). The paper describes how such user interfaces are achieved.
EXPECTED FORECAST
The expected forecast generates the inflows for the first four days of the two-year time
horizon. Preliminary comparative studies have been conducted to retain the best of
several conceptual and stochastic models (Tao and Lai, 1991) for the hydrologic
conditions in Ontario, Canada. Three stochastic models have been selected as a result
of these studies. They are ARMAX(3,0,2), ARIMA(1,1,1) and ARIMA(1,2,2). Each
of these models generates a different four-day forecast. The exogenous input in
ARMAX(3,0,2) reflects the contribution of the precipitation of the previous and current
days to the forecasted inflows. Natural logarithms of the inflows are used in formulating
these three models. This ensures that the forecasted inflows always take non-negative
values.
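A minimal sketch of this log-transform device (an illustration using the statsmodels interface, not Ontario Hydro's implementation):

```python
# A sketch of forecasting on the log scale so that back-transformed
# inflow forecasts are always positive, as described above.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def forecast_inflows(inflows, order=(1, 1, 1), steps=4):
    """inflows: 1-D array of past daily local inflows (all > 0)."""
    log_q = np.log(inflows)                   # model the log-inflows
    result = ARIMA(log_q, order=order).fit()  # e.g. ARIMA(1,1,1) or ARIMA(1,2,2)
    log_fc = result.forecast(steps=steps)     # four-day forecast on the log scale
    return np.exp(log_fc)                     # back-transform: non-negative by construction
```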
The forecasts are performed one watershed at a time. For each run of the model, users
are required to provide the past three days of observed local inflows, the temperature
and precipitation for the previous day, and forecasted temperature and precipitation for
the next four days at the indicated site. The temperature and precipitation data represent
maximum daily temperature and total daily precipitation respectively (see the upper half
of Figure 1). The maximum daily temperature is used to decide whether or not the
precipitation should be considered in ARMAX(3,0,2). If the maximum temperature is
equal to or lower than zero degrees Celsius, the precipitation is assumed not to contribute
to the runoff. Once all required inputs have been provided, three different four-day
forecasts are instantly displayed on the screen (see the lower half of Figure 1). Users
can select one of them or issue their own forecast based on experience with the river
system, knowledge of the meteorological forecast and the output of the three stochastic
forecasting models.
Figure 1. Input screen (upper half) and the three four-day forecasts (lower half):
ARMAX(3,0,2):  24.88  22.77  20.67  18.91
ARIMA(1,1,1):  26.51  26.77  26.91  26.98
ARIMA(1,2,2):  27.05  28.14  29.27  30.45
Note: Inflow data refer to the station Mississagi River at Rocky Island Lake;
precipitation and temperature data refer to the station Mississagi OH.
HEURISTIC FORECAST
The heuristic forecast provides a smooth transition between the single four-day forecast
retained by the user, which ends on day 4, and the multiple series of daily historic
SEEKING USER INPUT IN INFLOW FORECASTING 101
inflows, which start from day N. Basically, it eliminates the discontinuity that occurs on
day 4 between the expected forecast and the historical series of inflows (Figure 2). The
module extends the inflow forecast from the fourth day to the Nth day. From the Nth
day on, the forecast is represented by series of historic inflows. The value of N is
selected by the user and must be between 15 and 60. When selecting the value of N,
the user exercises judgement on the current state of the watershed and on how far apart it
is from its historical average at the time of the forecast. The rule for a smooth transition
from day 4 to day N is to reduce the difference identified on day 4 between the
expected forecast and each historic series at a constant daily rate until the
difference on day N-1 is only 1% of the difference on the fourth day. The reduction rate
"r" can be determined by solving equation (1):
rN-s=.OI (1)
where QE 4 and QA 4 are the expected inflow and the actual historic inflow on the fourth
day respeCtively. Figure 2 shows an example where N=60.
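A minimal sketch of this transition rule, assuming 1-based day indexing and hypothetical names:

```python
# A sketch (not the authors' code) of the smooth transition: the day-4 gap
# between the expected forecast and a historic series is reduced at a
# constant daily rate r chosen so that r**(N-5) = 0.01, i.e. only 1% of
# the gap remains on day N-1.
import numpy as np

def heuristic_transition(qe4, qa, N):
    """qe4: expected inflow on day 4; qa: array with qa[d] = historic inflow
    on day d (index 0 unused); N: user-selected day, 15 <= N <= 60."""
    r = 0.01 ** (1.0 / (N - 5))          # solve r**(N-5) = 0.01 for r
    days = np.arange(5, N)               # transition days 5 .. N-1
    gap4 = qe4 - qa[4]                   # difference identified on day 4
    return qa[days] + gap4 * r ** (days - 4)
```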
Figure 2. Elimination of the discontinuity on day 4 between the expected forecast and the
historic records by the heuristic forecast (example with N = 60).
The heuristic forecast also takes care of the occurrence of peak inflow during the spring
freshet. It is assumed that the peak inflow caused by snowmelt during the spring freshet
can only happen once a year. Users are asked to indicate if the peak inflow for the
102 T. TAO ET AL.
current spring freshet has passed. If that is the case, the heuristic forecast will phase out
those peak inflows of historic series which lag behind. The process is demonstrated in
Figure 3 for one of the historic series. It first shifts the peak inflow of the historic series
to day one. The daily inflows after the peak are successively moved to day 3, day 5,
and so on. This process is repeated n_d times, where n_d represents the number of days
between day one and the day when the peak of the historical inflow series occurred (see
Figure 3). The inflows on even days are set to the averages of the two neighbouring
inflows on odd days. The shifted inflows are then modified using equations (1) and (2)
to achieve a smooth transition from the fourth day to the Nth day.
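A minimal sketch of this shifting step, under the description above (the function name and array conventions are assumptions):

```python
# A sketch (assumed) of phasing out a lagging spring-freshet peak: the
# historic peak is moved to day one, the post-peak inflows are placed on
# odd days, and even days are filled with the average of their odd-day
# neighbours.
import numpy as np

def phase_out_peak(q):
    """q: historic daily inflow series whose peak lags behind."""
    nd = int(np.argmax(q))               # n_d: days from day one to the peak
    post = q[nd:]                        # the peak and everything after it
    out = np.empty(2 * len(post) - 1)
    out[0::2] = post                     # peak on day 1, then days 3, 5, ...
    out[1::2] = 0.5 * (post[:-1] + post[1:])  # even days = neighbour averages
    return out
```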
Figure 3. Phasing out the lagging spring-freshet peak of a historic inflow series: the peak
is shifted to day one, n_d being the number of days between day one and the historic peak
(legend: historic records, shifted records, heuristic forecast, expected forecast).
PROBABILISTIC FORECAST
The probabilistic forecast sorts out representative series based on user-specified
exceedance probabilities of volume. The user has a choice of defining two such
probabilities. The first one is based on the biannual volume. The second one is based
on the volume corresponding to a user-specified time period, which ranges from 45 days
to 366 days. The process starts with finding two sets of volumes: one set covers the
inflow volumes of each series from day one until the end of a user-specified time period.
Another set covers the total volume of all inflows of each series. The volumes in each
set are then ranked in descending order and exceedance probabilities are calculated.
Finally, each inflow series is associated with its exceedance probability. Figure 4
presents four forecasted local inflow series with representative exceedance probabilities
based on annual volumes, corresponding to a user-specified time period of 366 days, at
SEEKING USER INPUT IN INFLOW FORECASTING 103
Rocky Island Lake of the Mississagi River. Forty years of inflow data were used.
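A minimal sketch of the ranking step, assuming a Weibull plotting position for the exceedance probabilities (the paper does not state which formula is used):

```python
# A sketch of the probabilistic module: compute each series' volume over a
# user-specified period, rank the volumes in descending order, attach
# exceedance probabilities, and return the series closest to the requested
# probability.
import numpy as np

def select_series(series_list, period_days, target_prob):
    """series_list: list of 1-D arrays of daily inflows (one per historic year)."""
    volumes = np.array([s[:period_days].sum() for s in series_list])
    order = np.argsort(-volumes)                 # descending volume ranks
    n = len(series_list)
    exceed = np.arange(1, n + 1) / (n + 1.0)     # assumed plotting position
    ranked = {p: series_list[i] for p, i in zip(exceed, order)}
    best = min(ranked, key=lambda p: abs(p - target_prob))
    return best, ranked[best]
```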
Figure 4. Four forecasted local inflow series with exceedance probabilities of 5%, 50%,
75% and 95%; inset: the first 60 days.
SUMMARIES
The inflow forecasting approach introduced in this paper provides users with more than
one optional scenario at each step, allowing them to make decisions interactively during
the forecasting process. Every decision made by users has an effect on the final forecast.
The major assumption of the new approach is that users are experienced practitioners.
The new approach is designed to enhance their capability of making an acceptable
forecast, not to relieve them of making the forecast. Figure 4 can, in fact, be viewed
on screen at the end of the forecast. Users can go back and try different inputs until
they are comfortable with the inflow forecast they generate.
REFERENCES
Tao, T. and L. Lai (1991) Design Specifications of Short Term Inflow Forecaster,
Technical Report, Ontario Hydro.
Tao, T. (1993) Short Term Inflow Forecaster: User's Manual, Technical Report, Ontario
Hydro.
LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY
Linear procedures for time series modeling, recently introduced in the Brazilian
electrical sector by ELETROBRAS through the Coordinating Group for the
Interconnected Operation of the Power System (GCOI), are presented for the following
model sub-classes: Univariate Autoregressive Moving Average (ARMA), ARMA with
Exogenous input or Transfer Function (ARMAX or TF), Seemingly Unrelated or
Contemporaneous ARMA (SURARMA or CARMA), Multivariate or Vectorial ARMA
(MARMA or VARMA) and MARMA with Exogenous input (MARMAX). The methodology and
the algorithms proposed here have as a cornerstone the work of Professor Hannan,
developed alone or with other researchers after 1980, and constitute a real application to
inflow forecasting, which plays a very important role in Brazilian electrical operation
planning.
INTRODUCTION
A great number of univariate and multivariate time series models have recently been
proposed in hydrology, and they can be classified according to the dependency and
relationship previously mentioned.
where
i) φp(B) and θq(B) are respectively the autoregressive and moving average
polynomial matrices in the backward shift operator B. It is assumed that all the
roots of the determinantal equations |φp(B)| = 0 and |θq(B)| = 0 are outside the
unit circle.
ii) Zt is a suitable transformation of the time series Wt. In our applications we first
apply the Box-Cox transformation and then standardize by the monthly means and
standard deviations, since both are periodic.
As concerns the complete multivariate model expressed by (4), the following
remarks can be made:
(i) during the model building process the degrees of the operators φij(B) and θij(B)
can be adjusted so that the ARMA models for the individual time series accurately
describe the behavior of each series;
(ii) the SURARMA model is the result of the φij(B) and θij(B) coefficients being null for
i ≠ j, that is, the parameter matrices are diagonal;
(iii) the ARMAX model is obtained when the coefficients φij(B) and θij(B) are null for
i < j or, in other words, when the parameter matrices are triangular;
(iv) the MARMAX model is obtained when one or more rows are deleted from the
autoregressive parameter matrix, φp(B), and one or more columns from the moving
average parameter matrix, θq(B).
Basically, the procedures for the linear algorithms consist in linearizing the estimates
of the innovations a_t in β0i (i = 1, ..., k), where β0i is an initial estimate of the
parameter vector βi (i = 1, ..., k), obtained previously in the Initial Identification and
Estimation Stage. Thus, the βi estimator expression can be written as
(2)

where ∂a_t/∂βi is the partial derivative of a_t with respect to βi.

(3)

(4)

If the model does not have moving average terms, a_t is linear in β and the solution
is simply given by

(5)

Otherwise, a new parameter vector given by (5) is used in place of β0 and the
linearization process is repeated until final convergence.
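As a concrete illustration of this class of linear procedures, the sketch below implements the two-stage Hannan-Rissanen idea cited in the references for the univariate case; it is an assumption, not the GCOI production code.

```python
# A sketch of linear ARMA(p,q) estimation: Stage 1 approximates the
# innovations a_t by a long autoregression; Stage 2 regresses z_t on its
# own lags and on the lagged innovation estimates.
import numpy as np

def hannan_rissanen(z, p, q, n_ar=20):
    # Stage 1: long AR fit, residuals approximate the innovations a_t
    Z = np.column_stack([z[n_ar - j - 1:len(z) - j - 1] for j in range(n_ar)])
    y = z[n_ar:]
    phi_long, *_ = np.linalg.lstsq(Z, y, rcond=None)
    a = np.concatenate([np.zeros(n_ar), y - Z @ phi_long])
    # Stage 2: least squares on lagged z and lagged innovation estimates
    m = max(p, q)
    X = np.column_stack(
        [z[m - j - 1:len(z) - j - 1] for j in range(p)] +
        [a[m - j - 1:len(a) - j - 1] for j in range(q)])
    beta, *_ = np.linalg.lstsq(X, z[m:], rcond=None)
    return beta[:p], beta[p:]      # AR and MA coefficient estimates
```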
APPLICATIONS
In order to obtain the forecasts, the proposed algorithms were applied to eight series of
natural monthly average flow rates of the reservoirs of Furnas on the Grande River, Itumbiara
on the Paranaiba River, Ilha Solteira on the Parana River, Barra Bonita on the Tiete River,
Jurumirim on the Paranapanema River, Tres Marias and Sobradinho on the Sao Francisco
River, as well as the incremental series of Sobradinho. Each one of the hydrological time
series analyzed has 648 observations. The data cover the period from January 1931 to
December 1984 and were obtained from Centrais Eletricas Brasileiras S.A. -
ELETROBRAS, Brazil (see Figure 1).
Application of the Box-Cox transformation (Table 1) showed that all the selected
transformations were of the natural logarithmic type.
TABLE 1. Box-Cox transformations selected for the monthly natural average
flow rates of the site developments of Furnas, Itumbiara, Ilha Solteira,
Tres Marias, Sobradinho and the Intermediate Basin

SERIES               λ1     λ2
FURNAS                0    -179
ITUMBIARA             0    -179
I. SOLTEIRA           0   -1238
B. BONITA             0     -35
JURUMIRIM             0     -45
T. MARIAS             0     -90
SOBRADINHO            0    -529
INTERMEDIATE BASIN    0    -468
From the initial estimates of the parameters of the ARMA(1,1) model identified in the
previous stages, φ̂1 = 0.8426, θ̂1 = -0.2278 and σ̂a² = 0.4351, we moved to the final
stage of the proposed algorithm. The final estimates shown in Table 2 were obtained
after ten iterations with an accuracy of 1 × 10^-4.

TABLE 2. Final estimates of the parameters φ1, θ1 and σa² of the ARMA(1,1)
model fitted to the transformed series of the Furnas site development
Forecasts for one year ahead with two standard error intervals are shown in Figure 2.
Figure 2. Forecasts for 1985 with two standard error limits. ARMA model - Furnas.
TABLE 3. Final estimates of the parameters of the ARMAX model fitted to the
Tres Marias, Sobradinho and Intermediate Basin series
Ex-ante forecasts for one year ahead with two standard error intervals are shown in
Figure 3.
Figure 3. Ex-ante forecasts for 1985 with two standard error limits. ARMAX model -
Sobradinho, with Tres Marias as input.
SURARMA(p,q) model for the Ilha Solteira, Barra Bonita and Jurumirim series

The algorithm considered, in an iterative way, the estimates of the Ω matrix in order to
obtain the vector β estimates and the corresponding standard errors.
Table 4 summarizes the results of the convergence of the algorithm after four iterations
with an accuracy of 1 × 10^-4.
Forecasts of one year ahead for the three series with two standard error intervals are
shown in Figure 4.
Figure 4a. Forecasts for 1985 with two standard error limits. SURARMA model - Ilha
Solteira.
Figure 4b. Forecasts for 1985 with two standard error limits. SURARMA model - B. Bonita.
Figure 4c. Forecasts for 1985 with two standard error limits. SURARMA model - Jurumirim.
Using the residuals of the three series which were obtained previously, several
multivariate MARMA (p,q) models were estimated. With the BIC (p,q) criterion the
MARMA (1,1) model was identified.
A careful analysis of the results permitted the maximal use of the proposed algorithm
with the Furnas, Itumbiara and Tres Marias series.
First, the iterative process of the final stage considered the complete multivariate
model, that is, with no restriction imposed on its parameters. The final estimates were
obtained after five iterations with an accuracy of 1 × 10^-4.
After this, restrictions were imposed on the parameters of the MARMA(1,1) model. In
other words, the hypothesis that not all parameters of the model differed significantly from
zero was considered consistent. In fact, the SURARMA model seems suitable here, but for
illustration of the MARMA algorithm we deleted only the parameters within one standard
error of zero.
The final estimates of the parameters of the restricted MARMA(1,1) model were
obtained after four iterations in the final stage of the proposed algorithm with an
accuracy of 1 × 10^-4. Table 5 summarizes the principal results of the final convergence
process. Standard errors of the estimates are shown in parentheses.
Forecasts for the three series for one year ahead with two standard error intervals are
shown in Figure 5.
Figure 5a. Forecasts for 1985 with two standard error limits. MARMA model - Furnas.
Figure 5b. Forecasts for 1985 with two standard error limits. MARMA model - Itumbiara.
Figure 5c. Forecasts for 1985 with two standard error limits. MARMA model - Tres Marias.
FINAL COMMENTS
Some general comments can now be made.
i) Variances in dry periods are smaller than in wet periods in all graphs of 12-steps-
ahead forecasts; this becomes apparent when we transform back to the original
variables.
ii) Since SURARMA models are, from the physical point of view, the most sensible in
most applications, we obtained smaller standard errors for the parameters and residual
variances than with ARMA models (e.g., Tables 2 and 5 for Furnas). Whenever
ARMAX models were convenient, the residual variances were smaller than for ARMA
models.
iii) Currently, at ELETROBRAS, forecast comparisons are being made between the
automatic methodology of this paper and the Box-Jenkins methodology. The
results seem promising for the automatic methodology.
iv) The computer times on an IBM 4381 R14 were respectively 3.06 sec. for the
ARMA, 10.52 sec. for the ARMAX, (4.67 + 3 × 3.06) sec. for the SURARMA and
19.78 sec. for the MARMA application. Currently we are working on a
microcomputer version with more efficient numerical algorithms.
v) Theoretical properties of the identification and estimation procedures given in this
paper are presented in Hannan and Deistler (1988) and references therein.
Simulation results and applications of this and related work are given in Newbold
and Hotopp (1986), Hannan and McDougall (1988), Poskitt (1989), Koreisha
and Pukkila (1989, 1990a,b) and Pukkila et al. (1990).
ACKNOWLEDGMENT
The authors are grateful to the late Professor E. J. Hannan for his encouragement and
for making available many of his, at the time, unpublished papers, and to an anonymous
referee for his useful suggestions.
REFERENCES
Hannan, EJ. and McDougall, AJ. (1988) "Regression procedures for ARMA
estimation", Journal of the American Statistical Association, Theory and Methods, 83,
490-498.
Hannan, E.J. and Rissanen, J. (1983) "Recursive estimation of mixed autoregressive-
moving average order", Biometrika, 69, 81-94. Correction, Biometrika, 70, 303.
Hannan, E.J. and Deistler, M. (1988) The Statistical Theory of Linear Systems, John
Wiley & Sons, New York.
Koreisha, S. and Pukkila, T. (1989) "Fast linear estimation methods for vector
autoregressive moving-average models", J. of Time Series An. 10, 325-329.
Koreisha, S. and Pukkila, T. (1990a) "Linear methods for estimating ARMA and
regression models with serial correlation", Commun. Statist. - Simula. 19, 71-102.
Koreisha, S. and Pukkila, T. (1990b) "A generalized least-squares approach for
estimation of autoregressive moving-average models", J. of Time Series An. 11, 139-
151.
Newbold, P. and Hotopp, S.M. (1986) "Testing causality using efficiently parametrized
Vector ARMA models", Applied Mathematics and Computation, 20, 329-348.
Poskitt, D.S. (1989) "A method for the estimation and identification of transfer function
models", J. Royal Statist. Soc. B, 51,29-46.
Pukkila, T., Koreisha, S. and Kallinen, A. (1990) "The identification of ARMA models",
Biometrika, 73, 537-548.
Salas, J.D., Delleur, J.W., Yevjevich, V. and Lane, W.L. (1980) Applied Modeling of
Hydrologic Time Series, Water Resources Publication.
Sales, P.R.H. (1989) "Linear procedures for identification and parameters estimation of
models for uni and multivariate time series", D. Sc. Thesis, COPPE / UFRJ (In
Portuguese) .
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1986) "Inflows forecasting in the
operation planning of the Brazilian Hydroelectric System", Annals of the II Lusitanian-
Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal, 217-226
(In Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1987) "Linear procedures for
identification and estimation of ARMA models for hydrological time series", Annals of
the VII Brazilian Symposium of Hydrology and Water Resources, Salvador, Bahia,
605-615 (In Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989a) "A linear procedure for
identification of transfer function models for hydrological time series", Annals of the IV
Luzitanian-Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal,
321-336 (In Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989b) "A linear procedure for
identification and estimation of SURARMA models applied to multivariate hydrological
time series", Annals of the IV Luzitanian-Brazilian Symposium of Hydraulics and
Water Resources, Lisbon, Portugal, 283-248 (In Portuguese).
Terry, L.A., Pereira, M.V.F., Araripe Neto, T.A., Silva, L.F.C.A. and Sales, P.R.H.
(1986) "Coordinating the energy generation of the Brazilian national hydrothermal
electrical generating system", Interfaces, 16, 16-38.
PART III
ENTROPY
APPLICATION OF PROBABILITY AND ENTROPY CONCEPTS
IN HYDRAULICS
CHAO-LIN CHIU
Department of Civil Engineering
University of Pittsburgh
Pittsburgh, PA 15261
USA
p(u) = exp( Σ_{i=0}^{N} a_i u^i )   (2)

∫_0^{umax} p(u) du = 1   (4)

∫_0^{umax} u p(u) du = ū = Q/A   (5)
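As an illustration of Eqs. (2), (4) and (5), the following sketch solves numerically for the coefficients of the maximum-entropy velocity density in the simplest case N = 1 (an assumption for illustration only; the function name is hypothetical):

```python
# A sketch: find a0, a1 so that p(u) = exp(a0 + a1*u) satisfies the
# constraints of Eqs. (4) and (5) on [0, umax] with mean velocity Q/A.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def maxent_velocity_pdf(umax, ubar):
    """ubar = Q/A, the cross-sectional mean velocity (0 < ubar < umax)."""
    def mean_for(a1):
        z = quad(lambda u: np.exp(a1 * u), 0.0, umax)[0]        # normalizer
        return quad(lambda u: u * np.exp(a1 * u), 0.0, umax)[0] / z
    a1 = brentq(lambda a: mean_for(a) - ubar, -50.0 / umax, 50.0 / umax)
    a0 = -np.log(quad(lambda u: np.exp(a1 * u), 0.0, umax)[0])  # enforce Eq. (4)
    return lambda u: np.exp(a0 + a1 * u)                        # Eq. (2) with N = 1
```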
Figure. Dimensionless velocity distributions, panels (a) and (b) (abscissa: u/umax).
becomes

(du/dr)_{r=R} = - 2 umax (e^M - 1) / (M R)   (18)

h_f = [32 (e^M - 1)/M] (U/umax)^(-1) (D U/ν)^(-1) (L/D) (U²/2g)   (19)

(20)

in which

(21)

becomes

(22)
Figure. Variation with M (logarithmic ordinate).

Figure. Comparison of Eq. 3 (M = 6.55) and the "universal" law with measured data
(Nikuradse, 1932; Re = 105,000).

Figure. Velocity gradient predicted by the "universal" law and by Eq. 3.
OPEN-CHANNEL FLOW
in which

(25)

(26)

ξ = [y / (D - h)] exp(1 - y / (D - h))   (27)
Figure. Velocity distribution in open-channel flow compared with measured data (1987).
With respect to design of water quality monitoring networks, the entropy principle
can be effectively used to develop design criteria on the basis of quantitatively
expressed information expectations and information availability. Investigations on
the application of the entropy method in monitoring network design have revealed
promising results, particularly in the selection of technical design features such as
monitoring sites, time frequencies, variables to be sampled, and sampling duration.
Yet, there are still certain problems that need to be overcome so that the method
can gain wide acceptance among practitioners. The presented study discusses the
advantages as well as the limitations of the entropy method as applied to the design
of water quality monitoring networks.
INTRODUCTION
Despite all the efforts and investment made on monitoring of water quality, the
current status of existing networks shows that the accruing benefits are low (Sanders
et al., 1983). That is, most monitoring practices do not fulfill what is expected of
monitoring. Thus, the issue still remains controversial among practitioners and
researchers for a number of reasons. First, there are difficulties in the selection of
temporal and spatial sampling frequencies, the variables to be monitored, and the
sampling duration. Second, benefits of monitoring cannot be defined in quantitative
terms for reliable benefit/cost analyses. There are no definite criteria yet
established to solve these two problems. The entropy principle can be effectively
used to develop such criteria on the basis of quantitatively expressed information
expectations and information availability. This approach is justified in the sense that
a monitoring network is basically an information system. In fact, investigations on
135
K. W. Hipel et al. (eds.),
Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 135-148.
© 1994 Kluwer Academic Publishers.
136 N. B. HARMANCIOGLU ET AL.
application of the entropy principle in water quality monitoring network design have
revealed promising results, particularly in the selection of technical design features
such as monitoring sites, time frequencies, variables to be sampled, and sampling
duration.
There are still certain difficulties that need to be overcome so that the method
can gain wide acceptance among practitioners. Some of these difficulties stem from
the mathematical structure of the concept. For example, entropy, as a measure of
the uncertainty of random processes, has not yet been precisely defined for
continuous variables. The derivation of mathematical expressions for multivariate
distributions other than normal and lognormal is highly complicated. Other
difficulties encountered in the application of the method are those that are valid for
any other statistical procedure. As such, the entropy principle requires sufficient
data on the processes monitored to produce sound results. However, it is difficult,
particularly in the case of messy water quality data, to determine when a data record
can be considered sufficient. Another difficulty occurs in assessing monitoring
frequencies higher than the already selected ones.
The presented study addresses the above difficulties as well as the merits of
using the entropy principle in the design of water quality monitoring networks. The
discussions are supported by case studies relating to specific design problems such
as the selection of monitoring sites, sampling frequencies, and variables.
In recent years, the adequacy of collected water quality data and the performance
of existing monitoring networks have been seriously evaluated for two basic reasons.
First, an efficient information system is required to satisfy the needs of water quality
management plans. Second, this system has to be realized under the constraints of
limited financial resources, sampling and analysis facilities, and manpower. Problems
observed in available data and shortcomings of current networks have led
researchers to focus more critically on the design procedures used.
The early and even some current water quality monitoring practices were often
restricted to "problem areas" or "potential sites for pollution", covering limited
periods of time and limited numbers of variables to be observed. Recently, water
quality related problems and the need for monitoring have intensified so that the
information expectations to assess water quality have also increased. This pressure
has resulted in an expansion of monitoring activities to include more observational
sites and larger number of variables to be sampled at smaller time intervals. While
these efforts have produced plenty of data, they have also raised the question
whether one "really" needs "all" these data to meet information requirements.
Therefore, a more systematic approach to monitoring is required. As a result,
various network design procedures have been proposed and used to either set up
new networks or to evaluate and revise existing ones.
The studies carried out so far show that the method works quite well for the
assessment of an existing network. It appears as a potential technique when applied
to cases where a decision must be made to remove existing observation sites, and/or
reduce the frequency of observations, and/or terminate a sampling program. The method
may also be used to select the numbers and locations of new sampling stations as
well as to reduce the number of variables to be sampled (Harmancioglu and
Alpaslan, 1992).
On the other hand, the entropy method cannot be employed to initiate a
network; that is, it cannot be used for design purposes unless a priori collected data
are available. This is true for any other statistical technique that is used to design
and evaluate a monitoring network. In fact, the design process is an iterative
procedure initiated by the selection of preliminary sampling sites and frequencies.
This selection has to be made essentially by nonstatistical approaches. After a
certain amount of data is collected, initial decisions are evaluated and revised by
statistical methods. It is throughout this iterative process of modifying decisions that
the entropy principle works well. Its major advantage is that such iterations are
realized by quantifying the network efficiency and cost-effectiveness parameters for
each decision made.
One of the valuable aspects of the entropy concept as used in network design is its
ability to provide a precise definition of "information" in tangible terms. This
definition expresses information in specific units (i.e., napiers, decibels, or bits) so
that it constitutes a completely quantitative measure. At this point, it is important
to note again the distinction between the two terms "data" and "information". The
term "data" represents a series of numerical figures which constitute a means of
communication with nature; what these data actually communicate to us is
"information". This distinction means that availability of data is not a sufficient
condition for availability of information unless those data have utility, and the term
"information" describes this utility or usefulness of data. Among the various
definitions of information proposed to date, the entropy measure appears to be the
only one that gives credence to the relevance or utility of data.
The value of data can also be expressed in quantitative terms since it is
measured by the amount of information the data convey. This observation implies
that monitoring benefits may eventually be assessed on the basis of quantitative
measures rather than indirect descriptions of information. In comparison with the
current methods, the entropy method develops a clearer and more meaningful
picture of data utility versus cost (or information versus cost) tradeoffs. This
advantage occurs because both the information and the costs can be measured in
terms of quantitative units. For example, if cost considerations require data to be
collected less frequently, the entropy measure describes quantitatively how much
information would be risked by increasing the sampling intervals (Harmancioglu,
1984; Harmancioglu and Alpaslan, 1992). By such an approach, it is possible to
express how many bits of information would be lost against a certain decrease in
costs (or in monetary measures). Similarly, it is possible to define unit costs of
monitoring in terms of the amount of dollars per bit of information.
particularly if the monitoring objectives have been changed or revised. The entropy
method may again be used to assess the data collected to determine how much
information is conveyed by the network under present conditions. If revisions and
modifications are made, their contribution to an increase in information can be
measured by the same method. Within this respect, the entropy theory also serves
to maintain flexibility in the network since each decision regarding the technical
features can be assessed on objective grounds.
Cost-effectiveness
A major difficulty underlying both the design and the evaluation of monitoring
systems is the lack of an objective criterion to assess cost-effectiveness of the
network. In this assessment, costs are relatively easy to estimate, but benefits are
often described indirectly in terms of other parameters, using optimization
techniques, Bayesian decision theory or regression methods (Schilperoort et al., 1982).
The entropy method measures the information content of available data (extraction
of information) and assesses the goodness of information transfer between temporal
or spatial data points (transfer of information). These two functions constitute the
basis of the solution to the design problems of what, where, when, and how long to
observe. Such a solution is based on the maximization of information transfer
between variables, space points, and time points, respectively. The amount of
information transfer used in such an analysis can be measured by entropy in specific
units. The selection of each technical design factor can be evaluated again by means
of entropy to define the amount of information conveyed by the data collected by
each of the selected monitoring procedures. These evaluations may eventually
provide the ability to make quantitatively based rational decisions on how long a
gauging station should be operated (Alpaslan et al., 1992).
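Under a multivariate normal assumption, these two operations can be sketched as follows (an illustration, not the computations of the papers cited); entropies are in napiers:

```python
# A sketch: joint entropy (information content) of a set of stations and
# the transinformation (information transfer) between two of them, under
# a multivariate normal assumption.
import numpy as np

def joint_entropy(cov):
    """H(X) = 0.5 * ln((2*pi*e)^k * det(cov)), in napiers."""
    k = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** k * np.linalg.det(cov))

def transinformation(cov, i, j):
    """T(Xi; Xj) = H(Xi) + H(Xj) - H(Xi, Xj)."""
    hi = joint_entropy(cov[np.ix_([i], [i])])
    hj = joint_entropy(cov[np.ix_([j], [j])])
    hij = joint_entropy(cov[np.ix_([i, j], [i, j])])
    return hi + hj - hij
```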
Harmancioglu and Alpaslan (1992) demonstrated the applicability of the entropy
method in assessing the efficiency and the benefits of an existing water quality
monitoring network with respect to temporal, spatial and combined temporal/spatial
design features. They described the effect of each feature upon network efficiency
and cost-effectiveness by entropy-based measures. For example, the effect of
extending the sampling interval from monthly to bimonthly measurements for three
variables investigated leads to a loss of information in the order of 20.4% for DO,
32.8% for Cl, and 68.7% for EC, as shown in Fig. 1. Here, the selection of an
appropriate sampling interval is made by assessing how much information the
decision-maker would risk versus given costs of monitoring. A similar evaluation can
be made with respect to the number and locations of required sampling sites as in
Fig.2, where changes in rates of information gain are investigated with respect to the
number of stations in the network. Harmancioglu and Alpaslan (1992) further
combined both spatial and temporal frequencies as in Fig.3 to assess the variation
of information with respect to both space and time dimensions. The results of these
analyses have shown the applicability of the entropy concept in network assessment.

Figure 1. Effects of sampling frequency upon information gain about three water
quality variables (Harmancioglu and Alpaslan, 1992).
Figure 2. Rates of information gain versus the number of stations in the network
(variables DO and EC).

Figure 3. Variation of information with combined spatial and temporal sampling
frequencies (Δt in months).
When the determinant of the matrix is too small, entropy measures cannot
be determined reliably since the matrix becomes ill-conditioned. This often occurs
when the available sample sizes are very small.
On the other hand, the question with respect to data availability is "how many
data would be considered sufficient". For example, Goulter and Kusmulyono (1993)
claim that the entropy principle can be used to make "sensible inferences about
water quality conditions" but that sufficient data are not available for a reliable
assessment. The major difficulty here arises from the nature of water quality data,
which are often sporadically observed for short periods of time. With such "messy"
data, application of the entropy method poses problems both in numerical
computations and in evaluation of the results. Particularly, it is difficult to determine
when a data record can be considered sufficient.
With respect to the temporal design problem, all evaluations are based on the
temporal frequencies of available data so that, again, the method inevitably appears
to be data dependent. At present, it appears to be difficult to assess smaller time
intervals than what is available. However, the problem of decreasing the sampling
intervals may also be investigated by the entropy concept provided that the available
monthly data are reliably disaggregated into short interval series. This aspect of
entropy applications has to be investigated in future research.
Another important point in entropy applications is that the method requires the
assumption of a valid distribution-type. The major difficulty occurs here when
different values of the entropy function are obtained for different probability
distribution functions assumed for the same variable. On the other hand, the
entropy method works quite well with multivariate normal and lognormal
distributions. The mathematical definition of entropy is easily developed for other
skewed distributions in bivariate cases. However, the computational procedure
becomes much more difficult when their multivariate distributions are considered.
When such distributions are transformed to normal, then uncertainties in parameters
need to be assessed.
Another problem that has to be considered in future research is the
mathematical definition of entropy concepts for continuous variables. Shannon's
basic definition of entropy is developed for a discrete random variable, and the
extension of this definition to the continuous case entails the problem of selecting
the discretizing class intervals Δx to approximate probabilities with class frequencies.
Different measures of entropy vary with Δx such that each selected Δx constitutes a
different base level or scale for measuring uncertainty. Consequently, the same
variable investigated assumes different values of entropy for each selected Δx. It may
even take on negative values which contradict the positivity property of the entropy
function in theory.
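This dependence on Δx can be demonstrated with a small numerical sketch (illustrative only):

```python
# A sketch: estimates of the entropy of the same continuous sample change
# with the class-interval width dx used to approximate probabilities.
import numpy as np

def entropy_estimate(sample, dx):
    """Approximate (differential) entropy in napiers using bins of width dx."""
    edges = np.arange(sample.min(), sample.max() + dx, dx)
    counts, _ = np.histogram(sample, bins=edges)
    p = counts[counts > 0] / len(sample)
    return -(p * np.log(p)).sum() + np.log(dx)   # discrete entropy + ln(dx)

x = np.random.default_rng(1).normal(size=10_000)
for dx in (2.0, 1.0, 0.5, 0.1):
    print(dx, entropy_estimate(x, dx))           # a different value for each dx
```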
One last problem that needs to be investigated in future research is the
development of a quantifiable relationship between monitoring objectives and
technical design features in terms of the entropy function. As stated earlier, an
information-based design strategy requires the delineation of data needs or
information expectations. To ensure network efficiency, "information supplied" and
"information expected" must be expressed in quantifiable terms by the entropy
ASSESSMENT OF THE ENTROPY PRINCIPLE 147
concept. At the current level of research, if one considers that the most significant
objective of monitoring is the determination of changes in water quality, then the
entropy principle does show such changes with respect to time and space. However,
future research has to focus on the quantification of information needs for specific
objectives (e.g., trend detection, compliance, etc.) by means of entropy measures.
CONCLUSION
REFERENCES
Alpaslan, N.; Harmancioglu, N.B.; Singh, V.P. (1992) "The role of the entropy
concept in design and evaluation of water quality monitoring networks", in: V.P.
Singh & M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources,
Dordrecht, Kluwer Academic Publishers, Water Science and Technology Library,
pp.261-282.
Dawdy, D.R. (1979) "The worth of hydrologic data", Water Resources Research,
15(6), 1726-1732.
Goulter, I. and Kusmulyono, A. (1993) "Entropy theory to identify water quality
violators in environmental management", in: R.Chowdhury and M. Sivakumar (eds.),
Geo-Water and Engineering Aspects, Balkema Press, Rotterdam, pp.149-154.
Harmancioglu, N. (1981) "Measuring the information content of hydrological
processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil
Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N. (1984) "Entropy concept as used in determination of optimum
sampling intervals", Proceedings of Hydrosoft '84, International Conference on
Hydraulic Engineering Software, Portoroz, Yugoslavia, pp.6-99 and 6-110.
Harmancioglu, N.B., Yevjevich, V., Obeysekera, J.T.B. (1986) "Measures of
Four methods of statistical inference are discussed. These include the two well known non-entropy
methods due to Fisher and Bayes and two entropic methods based on the principles of maximum
entropy and minimum cross-entropy. The spheres of application of these methods are elucidated in
order to give a comparative understanding. The discussion is interspersed with illustrative examples.
INTRODUCTION
Maximum entropy and minimum cross-entropy principles provide methods distinct from the classical
methods of statistical inference. In this context the following questions naturally arise:
• What is statistical inference?
• What are the classical methods of statistical inference?
• How do these methods compare with entropic methods of statistical inference?
• When should one use entropic rather than non-entropic methods?
The answers to these questions are related to the age old controversy arising from the two methods
of non-entropic statistical inference: (1) Bayesian and (2) non-Bayesian. There are strong arguments
for and against both of these methods of inference. The object of the present paper is to shed some
light on these fundamental questions, from the vantage point of the entropic methods of inference.
different approaches can, however, supplement one another to enlarge the scope of investigation. It
is therefore essential that users of statistical inference, irrespective of the discipline they belong to,
understand the differences and similarities between the various types of statistical inference, without
being overly concerned about the doctrinaire controversies that beset the different groups.
Over the course of its evolution the discipline of statistics has been divided into two views: the
Bayesian, and the non-Bayesian or frequentist view. More recently, however, statistical inference
has been enriched within the framework of the principles of maximum entropy or minimum cross-
entropy. These principles provide a more unified foundation for the problem specification and the
criteria for obtaining a meaningful solution. Since the Bayesian and the frequentist schools are
well established, the newer methodologies tend to be classified with respect to these two, either as
Bayesian or as non-Bayesian. However, entropic methods represent a distinct class of statistical
reasoning. Nevertheless, the followers of the entropic methods feel closer to the Bayesians despite
their claim for a separate identity.
In the next sections, the similarities and differences of the following four types of statistical
inference are discussed:
(i) classical, traditional, or orthodox statistical inference;
(ii) Bayesian inference;
(iii) inference based on maximum entropy principle (MaxEnt); and
(iv) inference based on minimum cross-entropy principle (MinxEnt).
Furthermore, the conditions under which each is most appropriate, and the circumstances under
which these can supplement one another are considered.
Bayesian inference
This approach differs from the classical approach in the sense that, in addition to specifying a popula-
tion density f(x, θ), where x and θ may be scalars or vectors, and having observations x1, x2, ..., xn,
it also assumes some a priori distribution for the parameter θ. It assumes that there is a specific
a priori distribution which the parameter follows. This method then proceeds to use the observa-
tions x1, x2, ..., xn to update the knowledge of the a priori distribution, resulting in an a posteriori
distribution which incorporates the knowledge of the data. If more independent observations are
obtained, the a posteriori distribution, obtained as a result of the first set of observations, may be
treated as an a priori distribution for the next set of observations, resulting in a second a posteriori
distribution.
The assumption of an a priori distribution of θ and the continuous updating of this distribution in the light of independent observations are essential features of Bayesian inference. The traditionalists assert that θ is a fixed number and, consequently, its probability distribution is an inadmissible consideration. The Bayesians counter the objection by saying that the probability distribution of θ is not to be understood in the relative frequency sense; in fact, it is not the true value of θ that is being discussed, but rather our perception of this value. This perception changes as our knowledge based on observations increases. The probability distribution of θ depends on the observations and it is our objective to find this distribution.

In fact, not assuming any probability distribution for θ is very often equivalent to assuming that all values of θ are equally likely, or that θ has a uniform, regular or degenerate, distribution. By not making a statement about the a priori distribution, we may in fact be making a statement about it. In this process, we may restrict our choice drastically. If our knowledge of the situation, prior to our getting the data, warrants an a priori distribution for θ, it should be used. In some sense, we may say that, as the amount of Bayesian a priori information decreases, the Bayesian inference methodology approaches the classical inference methodology. From this point of view, the classical inference may be regarded as a limiting form of Bayesian inference. Bayesian and non-Bayesian methods of statistical inference are conceptually quite different, although in many situations, they may give the same results.
There are many situations in which Bayesian methods are better geared to provide answers than
the classical methods. Bayesian inference uses informative priors, while classical inference uses non-
informative priors. The uniform distribution gives a non-informative prior, but other non-informative priors also exist.
For Problem 1, let the mean m of an N(m, σ²) population have the a priori distribution N(m₀, σ₀²), and let x₁, x₂, …, xₙ be the observations. The a posteriori density of m is then proportional to

\[
C \exp\left[-\frac{(m-m_0)^2}{2\sigma_0^2} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - m)^2\right]
= C' \exp\left[-\frac{m^2}{2}\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right) + m\left(\frac{m_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right)\right], \tag{2}
\]

so that the a posteriori probability distribution is N(m₁, σ₁²), where

\[
m_1 = \left(\frac{m_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right)\bigg/\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)
\qquad\text{and}\qquad
\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}. \tag{3}
\]
Remarks:

• As σ₀ → ∞, i.e., as the a priori distribution tends to the degenerate uniform distribution, the influence of the a priori distribution tends to disappear and m₁ → x̄ and σ₁² → σ²/n, which are the classical results. This illustrates the limiting case of minimum cross-entropy when the a priori distribution is the uniform distribution.

• On the other hand, if σ₀ is small, then the a priori distribution dominates and the data make only relatively small contributions to m₁ and σ₁². In fact, it is this fear of 'dominance of the a priori' which deters many people from using Bayesian methods.
• If a further random sample, y₁, y₂, …, y_r, is obtained from the same distribution, then the final a posteriori distribution has

\[
m_2 = \left(\frac{m_0}{\sigma_0^2} + \frac{n\bar{x} + r\bar{y}}{\sigma^2}\right)\bigg/\left(\frac{1}{\sigma_0^2} + \frac{n+r}{\sigma^2}\right) \tag{4}
\]

and

\[
\frac{1}{\sigma_2^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} + \frac{r}{\sigma^2}. \tag{5}
\]

Thus, the final distribution is the same as that obtained from N(m₀, σ₀²) with the single random sample x₁, x₂, …, xₙ; y₁, y₂, …, y_r. This is an important feature of Bayesian inference, viz., that the final result is independent of the number of stages in which updating is done, so long as the total information in all stages combined is the same (see the numerical sketch below).
Problem 2: Given that the parameter p of a binomial distribution is a B(a, b) variate and n independent Bernoulli trials have resulted in r successes, find the a posteriori probability distribution for p.
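By the standard conjugate argument, the a posteriori distribution in Problem 2 is the beta distribution B(a + r, b + n − r). A short numerical check of this result, assuming NumPy and SciPy are available (parameter values illustrative):

```python
import numpy as np
from scipy import stats

a, b, n, r = 2.0, 3.0, 20, 7

# Unnormalized posterior: beta prior density times the binomial likelihood
p = np.linspace(1e-6, 1 - 1e-6, 2001)
post = stats.beta.pdf(p, a, b) * p**r * (1 - p) ** (n - r)
post /= np.trapz(post, p)                    # normalize numerically

# Conjugate result: B(a + r, b + n - r)
print(np.allclose(post, stats.beta.pdf(p, a + r, b + n - r), atol=1e-3))  # True
```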
where

\[
\nu = m - n \tag{20}
\]

and

\[
y = Ax. \tag{21}
\]
Problem 4: Given that a random variate has an a priori distribution N(m₀, σ₀²) and that its mean is m, find its a posteriori distribution. Now, given in addition that its variance is also known and is σ², find its new a posteriori distribution. Given further that E[(x − m)⁴] = b₄, find the third a posteriori distribution and calculate entropies and cross-entropies.
Solution 4: Here

\[
g_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{(x-m_0)^2}{2\sigma_0^2}\right]. \tag{22}
\]

Minimizing

\[
\int_{-\infty}^{\infty} f(x)\ln\frac{f(x)}{g_0(x)}\,dx, \tag{23}
\]

subject to

\[
\int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad\text{and}\qquad \int_{-\infty}^{\infty} x f(x)\,dx = m, \tag{24}
\]

results in

\[
f(x) = g_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{(x-m)^2}{2\sigma_0^2}\right]. \tag{25}
\]
Minimizing

\[
\int_{-\infty}^{\infty} f(x)\ln\frac{f(x)}{g_1(x)}\,dx, \tag{26}
\]

subject to

\[
\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad \int_{-\infty}^{\infty} x f(x)\,dx = m, \qquad\text{and}\qquad \int_{-\infty}^{\infty} (x-m)^2 f(x)\,dx = \sigma^2, \tag{27}
\]

results in

\[
f(x) = g_2(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-m)^2}{2\sigma^2}\right] \tag{28}
\]
so that the second a posteriori distribution is N(m, σ²). It can be observed that this distribution is independent of both m₀ and σ₀². Again, minimizing

\[
\int_{-\infty}^{\infty} f(x)\ln\frac{f(x)}{g_2(x)}\,dx, \tag{29}
\]

subject to the constraints of (27) together with

\[
\int_{-\infty}^{\infty} (x-m)^4 f(x)\,dx = b_4, \tag{30}
\]

we obtain

\[
f(x) = A\exp\left[-\frac{(x-m)^2}{2\sigma^2} - \lambda(x-m)^4\right], \tag{31}
\]

where A and λ are determined by equations (30). This distribution will be different from the normal distribution unless b₄ = 3σ⁴.
Remarks:

• In example 1, the a priori distribution of m was given; here, the a priori distribution of x is given.

• In example 1, Bayes's theorem was used; here, the principle of minimum cross-entropy is used.

• In example 1, information was supplied in terms of random values x₁, x₂, …, xₙ; here information is supplied in terms of knowledge of the mean, variance, or other moments.

• Continuous updating is done in both cases.

• Example 4 shows that, as more and more observations are obtained, the uncertainty about the mean of the population decreases. In fact, even with a single set of observations, as n → ∞, σ₁ → 0 and the entropy goes to its minimum value of −∞. It also demonstrates that if information is given in the form of different values for the moments already given in the a priori distribution, then the uncertainty may decrease, or may in fact increase. However, if the information is about different moments than those involved in the a priori distribution, the uncertainty is likely to decrease and will certainly not increase.
• Some cross-entropies of interest are

\[
\int_{-\infty}^{\infty} g_1(x)\ln\frac{g_1(x)}{g_0(x)}\,dx = \frac{(m-m_0)^2}{2\sigma_0^2} \tag{32}
\]

\[
\int_{-\infty}^{\infty} g_2(x)\ln\frac{g_2(x)}{g_0(x)}\,dx = \frac{(m-m_0)^2}{2\sigma_0^2} + \frac{\sigma^2 - \sigma_0^2}{2\sigma_0^2} + \ln\frac{\sigma_0}{\sigma}. \tag{33}
\]
In spite of the additional information, the second cross-entropy may be either smaller or larger than the first. Some entropies of interest are

\[
-\int_{-\infty}^{\infty} g_0(x)\ln g_0(x)\,dx = \frac{1}{2}\ln\left(2\pi e\,\sigma_0^2\right) \tag{34}
\]

\[
-\int_{-\infty}^{\infty} g_1(x)\ln g_1(x)\,dx = \frac{1}{2}\ln\left(2\pi e\,\sigma_0^2\right) \tag{35}
\]

\[
-\int_{-\infty}^{\infty} g_2(x)\ln g_2(x)\,dx = \frac{1}{2}\ln\left(2\pi e\,\sigma^2\right). \tag{36}
\]
Thus the entropy does not change when the mean alone is changed, but it does change when the variance is changed, although it may either increase or decrease depending on whether σ is greater than or less than σ₀. On the other hand, entropies of the distributions obtained in example 1 are

\[
\frac{1}{2}\ln\left(2\pi e\,\sigma_1^2\right) \qquad\text{and}\qquad \frac{1}{2}\ln\left(2\pi e\,\sigma_2^2\right), \tag{37}
\]

and

\[
\sigma_2^2 < \sigma_1^2 < \sigma_0^2, \tag{38}
\]

so that, in this case, the entropy goes on decreasing as more and more information becomes available.
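The closed forms in (32)-(36) are easy to verify numerically. A minimal sketch assuming NumPy, with illustrative parameter values; kl_normal is the standard cross-entropy between two normal densities:

```python
import numpy as np

def kl_normal(m, s, m0, s0):
    # Kullback-Leibler cross-entropy of N(m, s^2) from N(m0, s0^2),
    # the closed form behind Eqs. (32)-(33).
    return np.log(s0 / s) + (s**2 + (m - m0) ** 2) / (2 * s0**2) - 0.5

m, s, m0, s0 = 1.0, 0.8, 0.0, 2.0
x = np.linspace(-15.0, 15.0, 20001)
g2 = np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)
g0 = np.exp(-0.5 * ((x - m0) / s0) ** 2) / (np.sqrt(2 * np.pi) * s0)

numeric = np.trapz(g2 * np.log(g2 / g0), x)            # direct integration
print(np.isclose(numeric, kl_normal(m, s, m0, s0)))    # True
print(0.5 * np.log(2 * np.pi * np.e * s**2))           # entropy of g2, Eq. (36)
```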
Problem 5: Let x be an N(m₀, Σ₀) variate and let it be given that E[Ax] = y and that the covariance matrix of Ax is R. Find the minimum cross-entropy distribution for x.

Solution 5: Minimize

\[
\int f(x)\ln\frac{f(x)}{g_0(x)}\,dx, \tag{39}
\]

subject to

\[
\int f(x)\,dx = 1, \tag{40}
\]

\[
\int Ax\,f(x)\,dx = y, \tag{41}
\]

and

\[
\int (Ax-y)(Ax-y)^T f(x)\,dx = R. \tag{42}
\]

This gives

\[
f(x) = C\,g_0(x)\exp\left[-\lambda^T(Ax-y) - \tfrac{1}{2}(Ax-y)^T D^{-1}(Ax-y)\right], \tag{43}
\]

where g₀(x) is the N(m₀, Σ₀) density and the multipliers C, λ, and D are determined by the constraints (40)-(42).
Remarks:
• In examples 3 and 5, though the problems look similar, these are not exactly the same. The
methods of attack are different and the solutions are different.
• In example 3, it is required that the standardized likelihood satisfies the given constraints.
• In example 5, on the other hand, we want the a posteriori distribution to satisfy these con-
straints.
• The standardized likelihood is a conditional probability distribution while the a posteriori
distribution is not.
• The constraints give a unique likelihood and a unique Bayes's a posteriori distribution. How-
ever, the constraints alone do not give a unique a posteriori distribution and the minimum
cross-entropy principle must be used in order to obtain a unique probability distribution.
Problem 6: A box contains n + 1 balls which can be either white or black, but there is at least one ball of each color. As such, there can be n possible hypotheses, H₁, H₂, …, Hₙ, where Hᵢ is the hypothesis that there are i white balls and n + 1 − i black balls. The a priori probabilities of these hypotheses being true are given by the probability distribution q = (q₁, q₂, …, qₙ). Now, let a ball be drawn from the box and let it be white. Find the a posteriori probabilities of the n hypotheses, {P(Hᵢ | E), i = 1, 2, …, n}, where E is the event that the ball drawn is white.
Now,

\[
P(H_i) = q_i \qquad\text{and}\qquad P(E\mid H_i) = \frac{i}{n+1}, \tag{45}
\]

so that

\[
P(H_i\mid E) = \frac{i\,q_i}{\sum_{j=1}^{n} j\,q_j}. \tag{46}
\]
Remarks:

• We may consider a parameter, θ, which takes values 1, 2, …, n according to which of the hypotheses 1, 2, …, n is true. Given the a priori probability distribution of θ, its a posteriori probability distribution is found.

• Here, the information given was sufficient to determine the conditional probabilities and the a posteriori probabilities.
Problem 7: In the previous example, it is given that the a priori probabilities are q₁, q₂, …, qₙ, and that the mean number of white balls observed is m. Find the a posteriori probability distribution.

Solution 7: If p = (p₁, p₂, …, pₙ) is the a posteriori probability distribution, then it is given that

\[
\sum_{i=1}^{n} p_i = 1 \qquad\text{and}\qquad \sum_{i=1}^{n} i\,p_i = m. \tag{47}
\]
Unlike the situation in example 6, this information is not sufficient to determine p uniquely. We therefore appeal to the principle of minimum cross-entropy and minimize a suitable measure of cross-entropy subject to equations (47).
Minimizing the Kullback-Leibler measure, we get

\[
p_i = a\,q_i\,b^i, \qquad i = 1, 2, \ldots, n, \tag{48}
\]

where a and b are determined by the equations

\[
a\sum_{i=1}^{n} q_i b^i = 1 \qquad\text{and}\qquad a\sum_{i=1}^{n} i\,q_i b^i = m. \tag{49}
\]
This result gives an a posteriori distribution different from equation (46). However, minimizing the Havrda-Charvat measure of cross-entropy of second order,

\[
\sum_{i=1}^{n}\frac{p_i^2}{q_i} - 1, \tag{50}
\]

subject to equations (47), gives

\[
p_i = q_i(c + d\,i), \tag{51}
\]

where c and d are determined by

\[
c + d\sum_{i=1}^{n} i\,q_i = 1 \qquad\text{and}\qquad c\sum_{i=1}^{n} i\,q_i + d\sum_{i=1}^{n} i^2 q_i = m. \tag{52}
\]

If

\[
m = \frac{\sum_{i=1}^{n} i^2 q_i}{\sum_{i=1}^{n} i\,q_i}, \tag{53}
\]

then

\[
c = 0 \qquad\text{and}\qquad d = \left(\sum_{i=1}^{n} i\,q_i\right)^{-1}, \tag{54}
\]

so that

\[
p_i = \frac{i\,q_i}{\sum_{i=1}^{n} i\,q_i}, \tag{55}
\]

which is the same as equation (46).
Remarks:

• In example 6, there is enough information to find p uniquely.

• In example 7, the knowledge of the mean alone does not give a unique a posteriori distribution. However, using the minimum cross-entropy principle, the same a posteriori distribution as that given by Bayes's theorem can be obtained, provided we
1. take the prescribed value of the mean to be the value given by Bayes's theorem, and
2. use the Havrda-Charvat measure of cross-entropy of second order.

• In general, using the generalized minimum cross-entropy principle, we can say that we can always find an appropriate measure of cross-entropy and an appropriate set of constraints such that when this measure is minimized subject to these constraints, the same a posteriori distribution is obtained as is given by Bayes's theorem.

• If the Kullback-Leibler measure is required, the geometric mean can be prescribed, so that the given information is

\[
\sum_{i=1}^{n} p_i = 1 \qquad\text{and}\qquad \sum_{i=1}^{n} p_i\ln i = \ln G, \tag{56}
\]

which leads to

\[
p_i = a\,q_i\,i^{\,b}, \tag{57}
\]
so that the distribution of equation (65) is obtained, where the μ's are determined by the constraints (61), (62), and (64). Next, we shall start with equation (65) as the a priori distribution and use constraints (61) and (64) to get the a posteriori distribution (66), where the λ's and the μ's are obtained using constraints (61), (62), and (64). Since equations (65) and (66) are of the same form, and since the multipliers are determined by the same constraints, the final probability distribution is the same in both cases.
Also, the entropy of the final distribution, (p₁, p₂, …, pₙ), is less than or equal to the entropy of the intermediate distribution, (P₁, P₂, …, Pₙ), since (P₁, P₂, …, Pₙ) has greater entropy than any other distribution which satisfies these constraints, and since (p₁, p₂, …, pₙ) is one such distribution.

As such, the principle of gain in information continues to hold for both maximum entropy and minimum cross-entropy principles. However, there will be information gain when old constraints continue to hold and additional constraints, linearly independent of the earlier ones, are imposed. Under these conditions, there will be a positive information gain and the uncertainty will decrease. If additional independent constraints are not imposed, and we only give information changing the values of the moments given earlier, the entropy can, in fact, increase.
CONCLUSIONS
A Bayesian approach to statistical inference implies an initial "opinion" in the form of an a priori
probability distribution. Then, it uses the available "evidence" in the form of knowledge of a
random sample or of some moments to obtain a final "opinion" in the form of a posterior probability
distribution. In this sense, all our first three methods follow the Bayesian approach.
In Bayesian inference we are given a density function f(x, θ). We start with an a priori distribution for θ, use the values of the random sample to construct a likelihood function, and then use Bayes's theorem to obtain the a posteriori probability distribution for θ.
In the minimum cross-entropy inference, we are given the a priori distribution for the random
variates. We use the evidence in the form of values of some moments to get the a posteriori prob-
ability distribution via Kullback's minimum cross-entropy principle [Kullback and Leibler, 1951,
Kullback, 1959].
In the maximum entropy inference, we are not given any initial opinion about the probability
distribution. We use Laplace's principle of insufficient reason and assume that the a priori probability
distribution is uniform. Then, we proceed as in the minimum cross-entropy approach.
In the classical approach, only the density function f(x, θ) is considered and no other prior opinion. The evidence is in the form of a random sample from the population and the final opinion is in the form of a Dirac delta function for θ. Thus, there is a great deal of commonality between the four methods.
The principles of maximum entropy and minimum cross-entropy have been explored in detail in [Kapur, 1989; Kapur and Kesavan, 1989; Kapur and Kesavan, 1990; Kapur and Kesavan, 1992; Kesavan and Kapur, 1989; Kesavan and Kapur, 1990a; Kesavan and Kapur, 1990b], where some other aspects of statistical estimation are also given.
Methods of estimating non-parametric density functions using the maximum entropy principle have been discussed by Theil and Fiebig [1984] for both the univariate and multivariate cases. Earlier, the discussion of Campbell [1970] on the equivalence of Gauss's principle and minimum discrimination estimation of probabilities illustrated the interaction between entropic and non-entropic methods of inference.
The principle of maximum entropy can also be used to derive maximum entropy priors for use in Bayesian estimation. Given a density function f(x, θ), the maximum entropy prior is that prior density for θ for which the entropy is maximum. The principle is closely related to the maximal data information priors discussed by Zellner [1977].
ACKNOWLEDGEMENTS
This work was possible due to the financial support in the form of grants from the Natural Sciences
and Engineering Research Council of Canada and the Province of Ontario's Centres of Excellence
Programme.
REFERENCES

Burg, J. (1972). "The Relationship between Maximum Entropy Spectra and Maximum Likelihood Spectra". In Childers, D., editor, Modern Spectral Analysis, pages 130-131. M.S.A.

Campbell, L. L. (1970). "Equivalence of Gauss's Principle and Minimum Discrimination Estimation of Probabilities". Ann. Math. Stat., 41, 1011-1013.

Cramer, H. (1957). "Mathematical Methods of Statistics". Princeton University Press.

Fisher, R. (1921). "On the Mathematical Foundations of Theoretical Statistics". Phil. Trans. Roy. Soc., 222(A), 309-368.

Fougere, P., editor (1990). "Maximum Entropy and Bayesian Methods: Proceedings of the 9th MaxEnt Workshop". Kluwer Academic Publishers, New York.

Goel, P. K. and Zellner, A., editors (1986). "Bayesian Inference and Decision Techniques". North-Holland, Amsterdam.

Grandy, W. T., Jr. and Schick, L. H., editors (1991). "Maximum Entropy and Bayesian Methods". Kluwer Academic Publishers, Dordrecht.

Havrda, J. and Charvat, F. (1967). "Quantification Methods of Classification Processes: Concept of Structural a-Entropy". Kybernetika, 3, 30-35.

Kesavan, H. K. and Kapur, J. N. (1989). "The Generalized Maximum Entropy Principle". IEEE Trans. Syst. Man Cyb., 19, 1042-1052.

Kesavan, H. K. and Kapur, J. N. (1990a). "Maximum Entropy and Minimum Cross Entropy Principles: Need for a Broader Perspective". In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 419-432. Kluwer Academic Publishers.

Kesavan, H. K. and Kapur, J. N. (1990b). "On the Family of Solutions of Generalized Maximum and Minimum Cross-Entropy Models". Int. Jour. Gen. Systems, 16, 199-219.

Jaynes, E. (1957). "Information Theory and Statistical Mechanics". Physical Review, 106, 620-630.

Erickson, G. J. and Smith, C. R., editors (1988). "Maximum Entropy and Bayesian Methods in Science and Engineering, vol. 1 (Foundations), vol. 2 (Applications)". Kluwer Academic Publishers, New York.

Justice, J. H., editor (1986). "Maximum Entropy and Bayesian Methods in Applied Statistics". Cambridge University Press.

Kapur, J. N. and Kesavan, H. K. (1990). "Inverse MaxEnt and MinxEnt Principles and their Applications". In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 433-450. Kluwer Academic Publishers.

Kapur, J. N. and Seth, A. (1990). "A Comparative Assessment of Entropic and Non-Entropic Methods of Estimation". In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 451-462. Kluwer Academic Publishers.

Skilling, J., editor (1989). "Maximum Entropy and Bayesian Methods". Kluwer Academic Publishers, Dordrecht.

Kapur, J. (1989). "Maximum Entropy Models in Science and Engineering". Wiley Eastern, New Delhi.

Kapur, J. and Kumar, V. (1987). "A Measure of Mutual Divergence among a Number of Probability Distributions". Int. Jour. of Maths. and Math. Sci., 10(3), 597-608.

Kapur, J. N. and Kesavan, H. K. (1989). "Generalized Maximum Entropy Principle (with Applications)". Sandford Educational Press, University of Waterloo.

Kapur, J. N. and Kesavan, H. K. (1992). "Entropy Optimization Principles and their Applications". Academic Press, San Diego.

Kullback, S. (1959). "Information Theory and Statistics". John Wiley, New York.

Kullback, S. and Leibler, R. (1951). "On Information and Sufficiency". Ann. Math. Stat., 22, 79-86.

Rao, C. R. (1989). "Statistical Data Analysis and Inference". North-Holland, Amsterdam.

Renyi, A. (1961). "On Measures of Entropy and Information". Proc. 4th Berkeley Symp. Maths. Stat. Prob., I, 547-561.

Seth, A. K. (1989). "Prof. J. N. Kapur's Views on Entropy Optimization Principles". Bull. Math. Ass. Ind., 21-22, 1-38, 1-42.

Shannon, C. E. (1948). "Mathematical Theory of Communication". Bell System Tech. Journal, 27, 379-423, 623-656.

Smith, C. R. and Grandy, W. T., Jr., editors (1985). "Maximum-Entropy and Bayesian Methods in Inverse Problems". D. Reidel, Dordrecht, Holland.

Theil, H. and Fiebig, D. (1984). "Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions". Ballinger, Cambridge.

Tribus, M. (1966). "Rational Descriptions, Decisions and Designs". Pergamon Press, Oxford.

Wilks, S. S. (1963). "Mathematical Statistics". John Wiley, New York.

Zellner, A. (1977). "Maximal Data Information Prior Distributions". In Aykac, A. and Brumat, C., editors, New Developments in the Applications of Bayesian Methods. North-Holland, Amsterdam.
AN ENTROPY-BASED APPROACH TO STATION DISCONTINUANCE
N. B. HARMANCIOGLU
Dokuz Eylul University
Faculty of Engineering
Bornova 35100 Izmir, Turkey
In the design and operation of hydrologic data collection networks, the question of
how long data gathering should be continued is a problem that is not often
addressed. The current situation is that no definite criteria have been established
to decide upon when and where to terminate data collection. Entropy-based
measures of information, as used in this study, present convenient and objective
means of assessing the status of an existing station with respect to information
gathering. Such an assessment can be realized by evaluating the redundancy of
information in both the time and the space domains. Accordingly, a particular site
that either repeats the information provided by neighboring stations or produces the
same information by successive measurements can be discontinued. The presented study shows that the entropy concept can be effectively employed to evaluate the
spatial and temporal redundancy of information produced by a hydrologic network.
The application of the method is demonstrated on case studies comprising water
quality and quantity monitoring networks in selected Turkish river basins.
INTRODUCTION
There are four basic issues to be considered in the design and operation of
hydrologic data collection networks: what to measure, where, when, and for how
long. Among these, the last issue of how long data collection should be continued
is not often addressed even though monitoring agencies, often due to budgetary
constraints, would like to know whether continuous data collection practices are
required or not. The current situation is that no definite criteria have been
established to decide upon when and where to terminate data collection.
Maddock (1974) was among the first to address the problem; he considered
station discontinuance based on correlation links with other stations in the network.
Wood (1979) used the sequential probability ratio test, where the decision whether
to discontinue a station is dependent on statistical considerations that include the
If the M variables X₁, X₂, …, X_M are stochastically independent, the total entropy is

\[
H(X_1, X_2, \ldots, X_M) = \sum_{m=1}^{M} H(X_m) \tag{1}
\]

where H(X_m) represents the marginal entropy of each variable X_m in the form of:

\[
H(X_m) = K\sum_{n=1}^{N} p(x_n)\log\left[1/p(x_n)\right] \tag{2}
\]

with K = 1 if H(X_m) is expressed in napiers for logarithms to the base e. Eq. (2) defines the entropy of a discrete random variable X_m with N elementary events of probability pₙ = p(xₙ) (n = 1, …, N) (Shannon and Weaver, 1949). For continuous density functions, p(xₙ) is approximated as [f(xₙ)·Δx] for small Δx, where f(xₙ) is the relative class frequency and Δx the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(x) is:

\[
H(X; \Delta x) = \int_{-\infty}^{+\infty} f(x)\log\left[1/f(x)\right]dx + \log\left(1/\Delta x\right). \tag{3}
\]
In the above, the selection of Δx becomes a crucial decision as it affects the values of entropy (Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1986).
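A minimal sketch of Eqs. (2)-(3) in discrete form for a single record, assuming NumPy; the lognormal "flows" are synthetic and purely illustrative:

```python
import numpy as np

def marginal_entropy(series, dx):
    # Marginal entropy (in napiers) of a series discretized into class
    # intervals of width dx, following Eqs. (2)-(3).
    edges = np.arange(series.min(), series.max() + dx, dx)
    counts, _ = np.histogram(series, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
flows = rng.lognormal(mean=3.0, sigma=0.6, size=365)   # one year of daily flows

# The entropy value is relative to the chosen class width dx
for dx in (1.0, 5.0, 10.0):
    print(dx, round(marginal_entropy(flows, dx), 3))
```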
If significant stochastic dependence occurs between the M variables, the total entropy has to be expressed in terms of conditional entropies H(X_m | X₁, …, X_{m−1}) added to the marginal entropy of one of the variables (Harmancioglu, 1981; Topsoe, 1974):
\[
H(X_1, X_2, \ldots, X_M) = H(X_1) + \sum_{m=2}^{M} H(X_m\mid X_1, \ldots, X_{m-1}) \tag{4}
\]

Each conditional entropy can be computed as the difference between two joint entropies. For example, the conditional entropy of variable X with respect to two other variables Y and Z can be determined as:

\[
H(X\mid Y, Z) = H(X, Y, Z) - H(Y, Z).
\]
Variable        DO        EC
2             4.658     2.510
3             9.326     5.050
4            14.025     7.626
5            18.724    10.207
6            23.424    12.812
The analysis of station discontinuance in the time domain is based on the assessment
of temporal information transfer between successive measurements. By using the
entropy principle, a monitoring site is evaluated for redundancy of information with
respect to two factors: length of data records and sampling frequency. Investigation
on the basis of the first factor focuses on the change of information conveyed by a
station with respect to record length N. Here, one may decide to terminate a
station if, after a certain duration of monitoring, no new information is obtained by
additional measurements. The entropy method can be used to delineate the change
of information conveyed by data on the basis of the record length N.
The definition given in Eq. (3) for the marginal entropy of X involves the term "log(1/Δx)", which essentially describes a reference level of uncertainty according to which the entropy (uncertainty or information conveyed) of the process is evaluated. In this case, the entropy function assumes values relative to the reference level described by "log(1/Δx)" (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1992). On the other hand, some researchers propose the use of a function m(x) such that the marginal entropy of a continuous variable is expressed as:

\[
H(X) = -\int_{-\infty}^{+\infty} f(x)\log\frac{f(x)}{m(x)}\,dx \tag{10}
\]

or, with m(x) taken as the constant N:

\[
H(X) = \int_{-\infty}^{+\infty} f(x)\log\left[1/f(x)\right]dx + \log N, \tag{11}
\]

where "log N" becomes the reference level of uncertainty. Another property of this term is that, for a record of N observations, "log N" represents the upper limit of the entropy function (Harmancioglu, 1981; Topsoe, 1974). Using this property and rearranging Eq. (11), one may write:

\[
H'(X) = H(X) - \log N, \tag{12}
\]
where H'(X) now describes the change in information conveyed by data compared
to the reference level of "log N". Here, the absolute value of the H'(X) function has
to be considered as it defines the difference between the upper limit "log N" and the
information provided by data of N records. Since "log N" represents the maximum
amount of uncertainty, H'(X) measures the amount of information gained by making
observations. This amount, or the rate of change of information conveyed, changes
as N is increased. The particular point where H'(X) stabilizes or remains constant
with respect to N indicates that no new information is gained by additional
observations. If this point is reached within the available record of observations,
then one may decide to discontinue sampling, as further observations will not bring
further information.
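A rough illustration of this record-length analysis, assuming NumPy; the synthetic daily series and the 20-class discretization are assumptions for the sketch, not the paper's data:

```python
import numpy as np

def info_gain(series):
    # |H'(X)| = log N - H(X): information gained by N observations,
    # measured against the upper limit log N (cf. Eq. 12).
    n = series.size
    counts, _ = np.histogram(series, bins=20)
    p = counts[counts > 0] / n
    h = -np.sum(p * np.log(p))
    return np.log(n) - h

rng = np.random.default_rng(0)
daily = rng.lognormal(3.0, 0.6, size=9 * 365)    # nine synthetic years

# Recompute with records of 1, 2, ..., 9 years, as in Figure 1
for years in range(1, 10):
    print(years, round(info_gain(daily[: years * 365]), 3))
```

A station would become a candidate for discontinuance once these values stabilize with growing N.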
The approach described above is applied to nine years of daily runoff data of
the three stations in Aras basin, which were analyzed in the previous section for
station discontinuance in the space domain. Figure 1 shows the H'(X) values for
each station in terms of the rates or changes of information gain with respect to
record length N. These values are computed with daily data of each station for N
years, starting with one year (365 observations) and consecutively increasing the
length of record to 9 years by 365 x N. The curves obtained show that the rate of
information gain is high for the first three years. The rate of change decreases after
the fourth year, indicating a relatively smaller amount of information conveyed by
data beyond this period. However, none of the stations have yet reached the point
where this change is negligible; that is, the H'(X) values have not yet become
constant with respect to N in any one of the stations. Thus, one may infer that none
of the stations have reached the point of termination.
If a monitoring station is found not to have reached the point of discontinuance, one may investigate further whether temporal sampling frequencies may be decreased or not. Again, entropy measures can be used to analyze this problem by evaluating the serial dependence between successive measurements.
Figure 1. The change of information gain with respect to length of data records: (a) Kagizman, (b) Cobandede, (c) Mescitli.
To analyze the effect of serial correlation upon marginal entropy, the variable X can be considered to be made up of series Xₜ, Xₜ₋₁, …, Xₜ₋ₖ, each of which represents the sample series for time lags k = 0, 1, …, K and which obey the same probability distribution function. Then conditional entropies such as H(Xₜ | Xₜ₋₁) and H(Xₜ | Xₜ₋₁, Xₜ₋₂, …, Xₜ₋ₖ) can be calculated. If the Xₜ₋ₖ (k = 1, …, K) are considered as different variables, the problem turns out to be one of the analysis of K + 1 dependent multi-variables; thus, formulas similar to Eq. (6) can be used to compute the necessary conditional entropies (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):
\[
H(X_t\mid X_{t-1}, \ldots, X_{t-k}) = -\int_{-\infty}^{+\infty}\!\cdots\!\int_{-\infty}^{+\infty} f(x_t, \ldots, x_{t-k})\log f(x_t\mid x_{t-1}, \ldots, x_{t-k})\,dx_t\cdots dx_{t-k} \tag{14}
\]

These conditional entropies will equal the marginal entropy H(Xₜ) if no serial dependence exists between the variables Xₜ₋ₖ (k = 0, …, K). Thus, as the degree of serial dependence increases, the marginal entropy of the process will decrease until the condition:

\[
H(X_t\mid X_{t-1}, \ldots, X_{t-(k-1)}) - H(X_t\mid X_{t-1}, \ldots, X_{t-k}) \le \varepsilon \tag{15}
\]

is met for an infinitesimally small value of ε. It is expected that the lag k where the above condition occurs indicates the degree of serial dependence within the analyzed process (Schultze, 1969; Harmancioglu, 1981).
The above approach is applied to the same three series of nine years of daily
runoff data analyzed in the previous section. Figure 2 shows the change of their
marginal entropies with respect to time lag k. It is observed here that, for all
stations, only the first time lag, or the first order serial dependence, is effective in
reducing the uncertainty of each station. The following lags do not contribute
significantly to this reduction so that nonnegligible uncertainty still remains in the
processes at time lags beyond k= 1. This result indicates that successive
measurements do not repeat the same information in the time domain. Thus, the
stations should not be discontinued. The fact that the first lag in each station produces the highest reduction of uncertainty raises the question whether the temporal frequencies may be extended from Δt = 1 day to Δt = 2 days or to larger time intervals. This problem is investigated by evaluating the transinformations (common information) between successive measurements for different Δt sampling intervals.
Figure 2. The change of marginal entropies with respect to time lag k at the three stations.
CONCLUSION
The study presented addresses the problem of station discontinuance on the basis
of the information provided by a gage in both the space and the time domains. The
entropy measures of information are used to quantitatively describe the contribution
of a sampling site to the reduction of uncertainty about the processes observed. The
approach used considers a particular gage as part of a gaging network, the purpose
of which is not to serve particular design objectives but to gather information about
a hydrologic process.
Within this context, the application of the entropy principle to cases of observed water quality and quantity data shows that the method can be effectively used to evaluate the spatial and temporal redundancy of information produced by a hydrologic network. Certain remaining problems, such as the relativity of entropy measures (Harmancioglu et al., 1993), will have to be solved as part of future research so that the entropy principle can be used more effectively in assessment of station discontinuance.
ASSESSMENT OF TREATMENT PLANT EFFICIENCIES BY THE ENTROPY PRINCIPLE
N. ALPASLAN
Dokuz Eylul University
Faculty of Engineering
Bornova 35100 Izmir, Turkey
The inputs and outputs of water and wastewater treatment plants (TP) are
significantly variable so that the design and operation of TP require an assessment
of such random fluctuations. In practice, however, this variability is often
inadequately accounted for, and mean values (or maximum values) of the input and
output processes are used in either designing the TP or evaluating their operational
performance. The study presented introduces the use of the entropy concept in
assessing the uncertainty of input and output processes of TP within an informational
context. In particular, the entropy measures of information are employed to define
a "dynamic efficiency index" (DEI) as the rate of reduction in input uncertainty or
entropy to arrive at a minimum amount of uncertainty in the outputs. Besides
describing the performance of TP, the definition of such an index has further
advantages, the most significant one being in sensitivity analyses of process
parameters. The approach described is demonstrated in the case of an existing TP
of a paper factory, for which input and output data on BOD, COD, and TSS
concentrations are available.
INTRODUCTION
The inputs and outputs of water and wastewater treatment systems fluctuate
significantly with time so that the proper design and operation of treatment plants
(TP) require an assessment of such variability. Inputs to a water treatment plant are
provided by sources such as surface waters, groundwater, or reservoirs whose output
processes are basically random variables. The outputs from the TP also have a
variable character depending upon the operation of the plant, which essentially
produces such variability (Weber and Juanico, 1990). Inputs to wastewater TP show
more fluctuations as they are constituted by domestic and industrial wastewaters.
Again, the operation of the TP inevitably produces variable outputs.
It is not easy to accurately quantify these random fluctuations in both the inputs
and the outputs of TP. This is often due to poor knowledge of the various sources
of variability that can affect such processes. Consequently, in practice, the random
nature of inputs and outputs are inadequately accounted for, and their mean or
maximum values are used in either designing the TP or in evaluating its operational
performance.
With respect to design of TP, the most important factor that affects the selection
of design parameters is the variability and thereby the uncertainty of the input
processes. Use of mean or maximum values, which is often the case in practice, to
describe the inputs may lead to overdesign or underdesign. At this point, one needs
to properly assess the variability of the inputs so that the design parameters can be
selected accordingly.
With respect to the operation of TP, the performance of the treatment system
has to be evaluated. This is often realized by defining the "operational efficiency"
of the TP in terms of both the inputs and the outputs. Efficiency is simply described as:

\[
E = \frac{S_i - S_e}{S_i} \tag{1}
\]

where E is the efficiency, and Sᵢ and Sₑ are the time-variant random input and output (effluent) concentrations, respectively. In practice, the TP system is assumed
to reach a level of steady state conditions. That is, no matter how variable the
inputs are, the TP is expected to produce an effluent of uniform and stable
characteristics. This implies a reduction in input variability to obtain smooth effluent
characteristics. The assumption of steady state conditions is made for the sake of
simplicity both for design purposes and for assessment of operation. In that case,
the above definition of efficiency is given either for a single time point or by using
the means of Si and Se data within the total period of observation. In reality, the
operation of a TP reflects a dynamic character; yet, due to the assumptions of steady
state conditions, there exists no definition of efficiency that accounts for the dynamic
or random character of the inputs and outputs of the TP.
There are only a few studies that focus on the variability of input/output
processes of TP in evaluating its performance. Weber and Juanico (1990) describe
in statistical terms that the effluents from a TP may be as variable as the input raw
water or sewage. They claim that the coefficient of variation is a better indicator
of input/output variability than the standard deviation as a representative statistic.
Ersoy (1992) and Tas (1993) arrived at similar results in comparing various statistical
parameters to describe random fluctuations of these processes. They also
investigated the relationships between input and output variables to express
operational efficiency so as to account for the dynamic character of TP. However,
none of the above studies have arrived at a precise relationship between the
variability of input/output processes and the efficiency of the TP. Tai and Goda
(1980 and 1985) define "thermodynamic efficiency" on the basis of thermodynamic
entropy and show that the efficiency of a water treatment system can be described
by the rate of decrease in the entropy of polluted water. They also relate the
reduction in thermodynamic entropy to entropy of discrete information conveyed by
the output process of a TP.
The study presented considers the use of the informational entropy concept as
a measure of uncertainty in evaluating the variability of both the input and the
output processes of a TP. Furthermore, the entropy concept is also proposed here
to define a "dynamic efficiency index" (DEI) as the rate of reduction in input
uncertainty or entropy to arrive at a minimum amount of uncertainty in the outputs.
In the ideal case, the TP is expected to produce outputs that comply with an effluent
standard, the value of which is generally constant. In terms of entropy, this constant
indicates a value of zero entropy. In practice, though, the outputs will fluctuate
around the standard, and the TP is expected to reduce the variability of the output
(effluent) so that such fluctuations are kept below the standard value. In entropy
terms again, this indicates that the uncertainty of the effluents should be minimum
or that it should approach zero. Then the performance of the TP can be evaluated
by means of the DEI which measures the rate of reduction in uncertainty (entropy)
of the inputs so that it approaches zero, or in practical terms, a minimum value for
the entropy of the outputs. The term "dynamic" is used here to indicate that
efficiency is expressed on the basis of variability of input/output processes by using
entropy measures of uncertainty.
The approach described above is demonstrated in the case of an existing TP of
a paper factory, for which input and output data on BOD, COD, and TSS
concentrations are available. The results of the application indicate the need for
further investigations on the subject, particularly in relating the informational
entropy concept to the dynamic processes of treatment.
APPLIED METHODOLOGY
The marginal entropy of a discrete random variable X with N elementary events of probability p(xₙ) is:

\[
H(X) = K\sum_{n=1}^{N} p(x_n)\log\left[1/p(x_n)\right] \tag{2}
\]

with K = 1 if H(X) is expressed in napiers for logarithms to the base e. H(X) gives a single value for the information content of X and is called the "marginal entropy" of X, which always assumes positive values within the limits 0 and log N.
When two random processes X and Y occur at the same time, stochastically independent of each other, the total amount of uncertainty they impart or the total amount of information they may convey is the sum of their marginal entropies (Harmancioglu, 1981):

\[
H(X, Y) = H(X) + H(Y) \tag{3}
\]

The conditional entropies of the two processes are:

\[
H(X\mid Y) = -K\sum_{n=1}^{N}\sum_{n=1}^{N} p(x_n, y_n)\log p(x_n\mid y_n) \tag{4}
\]

\[
H(Y\mid X) = -K\sum_{n=1}^{N}\sum_{n=1}^{N} p(x_n, y_n)\log p(y_n\mid x_n) \tag{5}
\]

where p(xₙ, yₙ) (n = 1, …, N) define the joint probabilities and p(xₙ | yₙ) or p(yₙ | xₙ) the conditional probabilities of the values xₙ and yₙ. The conditional entropy H(X | Y) defines the amount of uncertainty that still remains in X, even if Y is known, and the same amount of information can still be gained by observing X.
If the variables X and Y are stochastically dependent, the total entropy is expressed as (Schultze, 1969; Harmancioglu, 1981):

\[
H(X, Y) = H(X) + H(Y\mid X) \tag{6}
\]

\[
H(X, Y) = H(Y) + H(X\mid Y) \tag{7}
\]

The total entropy H(X, Y) of dependent X and Y will be less than the total entropy if the processes were independent:

\[
H(X, Y) < H(X) + H(Y) \tag{8}
\]

In this case, H(X, Y) represents the joint entropy of X and Y and is a function of their joint probabilities:

\[
H(X, Y) = K\sum_{n=1}^{N}\sum_{n=1}^{N} p(x_n, y_n)\log\left[1/p(x_n, y_n)\right] \tag{9}
\]
The difference between the total and the joint entropy is equal to another concept of entropy called "transinformation":

\[
T(X, Y) = H(X) + H(Y) - H(X, Y) \tag{10}
\]

Transinformation, like the other concepts of entropy, always assumes nonnegative values and equals 0 when the two processes are independent of each other.
For continuous density functions, p(xₙ) is approximated as [f(xₙ)·Δx] for small Δx, where f(xₙ) is the relative class frequency and Δx the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(x) is:

\[
H(X; \Delta x) = \int_{-\infty}^{+\infty} f(x)\log\left[1/f(x)\right]dx + \log\left[1/\Delta x\right] \tag{13}
\]

the joint entropy for a given bivariate density function f(x, y) is:

\[
H(X, Y; \Delta x) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\log\left[1/f(x, y)\right]dx\,dy + 2\log\left[1/\Delta x\right] \tag{14}
\]

and the conditional entropy is:

\[
H(X\mid Y; \Delta x) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\log\left[1/f(x\mid y)\right]dx\,dy + \log\left[1/\Delta x\right] \tag{15}
\]
The variability and, therefore, the uncertainty of input and output processes of a TP
can be measured by the entropy method in quantitative terms. This can be realized
by computing the marginal entropy of each process by Eq.(13) under the assumption
of a particular probability density function. Representing the input process by Xᵢ and the output process by Xₑ, marginal entropies H(Xᵢ) and H(Xₑ) can be expressed in specific units to describe the uncertainty prevailing in such processes. It must be noted here that the level of uncertainty obtained is relative to the selected discretizing interval Δx of Eq. (13). Such relativity in entropy measures may appear to be inconvenient in assessment of uncertainty (Harmancioglu et al., 1993). However, since the objective here is to compare the uncertainty of inputs with that of the outputs, the problem is sufficiently handled when the same Δx is used for both processes. In this case, uncertainty of both processes is rated with respect to the same reference level of uncertainty defined by log(1/Δx). Likewise, marginal entropies of different water quality variables can be compared with each other by keeping the reference uncertainty level constant for all variables.
As stated earlier, a treatment system is expected to reduce the variability of
inputs to produce an effluent with stable characteristics. This implies that the most
efficient operation of a TP is realized when maximum reduction in the uncertainty
or variability of the inputs is achieved to arrive at a minimum amount of uncertainty
in the outputs. In entropy terms, the uncertainty of the effluents Xₑ, or H(Xₑ), should be minimized. Accordingly, the DEI can be defined as the rate of reduction in uncertainty (entropy) of the inputs, H(Xᵢ), so that it approaches a minimum value for the entropy of the outputs, H(Xₑ). Such a measure can be expressed in (%) as:

\[
DEI = \frac{H(X_i) - H(X_e)}{H(X_i)}\times 100. \tag{16}
\]
In essence, the above definition involves the first requirement for efficient treatment,
namely that the difference between input/output variability must be maximized.
This difference is actually an indicator of treatment capacity such that if it reflects
low efficiency, then the process or design parameters of the TP may have to be
changed to increase the DEI. Furthermore, calculation of the DEI for different
values of different process parameters can help to identify those parameters which
significantly affect the treatment system. The TP may be considered sensitive to
those values of particular parameters which lead to maximum reductions in H(Xᵢ).
Then, such parameters will need to be more strictly observed throughout monitoring
procedures for both the design and the operation of the TP.
The above approach to assessment of efficiency appears to comply with the
thermodynamic efficiency definition given by Tai and Goda (1980 and 1985). As
mentioned earlier, their description of efficiency refers to the decrease in the
thermodynamic entropy of polluted water, where the treated media moves from a
state of disorder to order. The terms "order" and "disorder" are analogical in both
the thermodynamic and the informational system considered. In the former,
"disorder" refers to thermodynamic disorder or pollution, which can be measured by
the thermodynamic entropy of the system. In the latter, "disorder" indicates high
variability in the system, again quantified by entropy measures albeit in an
informational context. Accordingly, the two efficiency definitions, one given by Tai
and Goda (1980 and 1985) and the other presented here, are similar in concept; the
major difference between them is that the former is given in a thermodynamic
framework, whereas the latter is presented on a probabilistic basis.
The second requirement for effective treatment is recognized as the insensitivity
of effluents with respect to the inputs. That is, the correlation between the inputs
and the outputs is required to be a minimum for a reliable treatment system. Such
a requirement may also be considered as an indicator of TP efficiency. Entropy
measures can again be employed to investigate the relationship between
input/output processes. In this case, conditional entropies in the form of H(Xₑ | Xᵢ) have to be computed as in Eq. (15). The condition H(Xₑ | Xᵢ) = H(Xₑ) indicates that the outputs are independent of the inputs and, consequently, that the TP is effective in processing the inputs. Otherwise, if inputs and outputs are found to be correlated with H(Xₑ | Xᵢ) < H(Xₑ), this implies that the effluents are sensitive to the inputs and that the treatment system fails to effectively transform the inputs.
Another entropy measure of correlation between the input and the output
processes is transinformation T(Xi,Xe). If transinformation between the two
processes is zero, this indicates that they are independent of each other. In the case
of complete dependence, T(Xi,Xe) will be as high as the marginal entropy of one
of the processes.
APPLICATION
The above described methodology is applied to the case of Seka Dalaman Paper
Factory treatment plant in Turkey, for which input and output data on BOD
(biochemical oxygen demand), COD (chemical oxygen demand), and TSS (total
suspended solids) concentrations are available. The Seka treatment plant, with its
physical and biological treatment units, is designed to process wastewaters from the
factory at a capacity of 4500 m3/day. Data on daily BOD, COD, and TSS
concentrations were obtained for the period between January 1989 and October
1991. The input variables were monitored at the main entrance canal to the
treatment plant, and the output concentrations were observed at the outlet of the
biological treatment lagoon. The available data sets have a few missing values as
monitoring was not performed on days of plant repair and maintenance.
First, input/output processes of the three variables are analyzed for their
statistical properties. Table 1 shows these characteristics together with the classical
efficiency parameter computed by Eq. (1) using the mean values of the inputs and
the outputs. According to these computations, the TP appears to process TSS more
efficiently than the other two variables; in fact, the efficiency drops down to 75% for
COD. It is also interesting to note in Table 1 that both the input and the output
processes of TSS reflect more uncertainty than BOD and COD if the coefficient of
variation, Cv, is considered as an indicator of variability. Furthermore, for high
levels of efficiency, the Cv of outputs is higher than that of the inputs as in the case
of BOD and TSS.
Next, the same data are investigated by means of entropy measures to assess
their variability and the efficiency of the TP. Table 2 shows the marginal entropies H(Xᵢ) and H(Xₑ) of the input/output processes, the conditional entropies H(Xₑ | Xᵢ) of the effluents with respect to the inputs, transinformations T(Xᵢ, Xₑ), joint entropies H(Xᵢ, Xₑ), and finally the DEI of Eq. (16) for each variable. These entropy measures are computed by Eqs. (13) through (15), assuming a normal probability density function for each variable. It may be observed from this table that the input TSS has the highest variability (uncertainty or entropy), followed by COD and BOD. With respect to the outputs, COD still shows high variability whereas the uncertainty (entropy) of BOD and TSS is significantly reduced. Likewise, the joint entropy of inputs and outputs is the highest for COD. These results show that the treatment processes applied result in TSS having the highest reduction in input uncertainty and COD having the lowest reduction. This feature is also reflected by the dynamic efficiency index, which reaches the highest value for TSS and the lowest for COD.
It is interesting to note that, while the classical efficiency measure of Eq. (1) gives values in the order of 96%, 91%, and 75% respectively for TSS, BOD, and
COD, the efficiencies defined by entropy measures on a probabilistic basis result in
the respective values 50%, 36%, and 25%. Although the two types of efficiencies
described do not achieve similar values, their relative values for each variable are
in the same order. That is, both types of efficiencies are the highest for TSS and
the lowest for COD with BOD in between. On the other hand, the DEI values
comply with the thermodynamic efficiency definition given by Tai and Goda (1980
and 1985). The DEI rates obtained here reflect the highest reductions in input
uncertainty or entropy under the prevailing operational procedures. The only
difference between the DEI presented here and the thermodynamic efficiency of Tai
and Goda (1980 and 1985) is that the DEI are expressed on the basis of a
probabilistic measure of entropy rather than on a thermodynamic basis.
The entropy measures shown in Table 2 also reveal the relationship between inputs and outputs for each variable. For TSS, the conditional entropy of the outputs H(Xₑ | Xᵢ) is equal to the marginal entropy H(Xₑ), which indicates that the outputs are independent of the inputs.
CONCLUSION
The input and output processes of a TP fluctuate significantly with time. This
variability is often insufficiently recognized in both the design and operation of
treatment systems. The study presented proposes the use of entropy measures of
information in the assessment of input/output uncertainty of TP. Such measures
help to identify how variable or how steady the inputs or the outputs are so that
both the design parameters and the operational efficiency of a TP can be evaluated.
According to the approach applied, the operational efficiency of TP is assessed
by means of the "dynamic efficiency index" (DEI) which represents the rate of
reduction in the uncertainty of inputs to produce a minimum amount of entropy or
uncertainty in the outputs. In essence, two requirements are foreseen for an
effective and reliable treatment system: (a) the highest reduction in input uncertainty
must be obtained; (b) the outputs must be insensitive to the inputs; that is, the
condition of independence must be satisfied. The entropy measures, as proposed
here, can be effectively used to assess whether these two requirements are met by
the operation of the TP.
The definition of TP efficiency on the basis of entropy measures has further
advantages, the most significant one being in sensitivity analyses of process
parameters. Such parameters in biological treatment, for instance, may be maximum
specific growth rate, decay rate, and yield coefficient, the values of which must be
selected for the design of TP. The TP system is either sensitive or insensitive to
these design parameters so that its efficiency is eventually affected by them. The
effects of these parameters are already recognized; however, the degree of
uncertainty they convey to the system are not well quantified. Values of these
parameters for design purposes are either taken from literature or determined by
laboratory analyses. Outputs from a simulation model of a TP may be observed with respect to different parameter values and used in calculating the DEI for the system in each case. The TP system may be considered sensitive to those values of parameters which lead to maximum reductions in H(Xᵢ). Then those parameters
will need to be more strictly observed throughout the data collection procedures for
the design as well as for the operation of TP.
As discussed earlier, the DEI definition proposed in the study is parallel to the thermodynamic efficiency description of Tai and Goda (1980 and 1985), who express efficiency as the rate of decrease in the thermodynamic entropy of polluted water.
REFERENCES
Tai, S. and Goda, T. (1980) "Water quality assessment using the theory of entropy", in: M.J. Stiff (ed.), River Pollution Control, Ellis Horwood Publishers, ch. 21, pp. 319-330.

Tai, S. and Goda, T. (1985) "Entropy analysis of water and wastewater treatment processes", International Journal of Environmental Studies, Gordon and Breach Science Publishers, vol. 25, pp. 13-21.

Tas, F. (1993) Definition of Dynamic Efficiency in Wastewater Treatment Plants (in Turkish), Dokuz Eylul University, Izmir, Graduation Project in Environmental Engineering (dir. by N. Alpaslan).

Weber, B. and Juanico, M. (1990) "Variability of effluent quality in a multi-step complex for wastewater treatment and storage", Water Research, vol. 24, no. 6, pp. 765-771.
INFILLING MISSING MONTHLY STREAMFLOW DATA USING A MULTIVARIATE APPROACH

C. GOODIER AND U. PANU
Water resources planners and managers use historic monthly streamflow data for a
variety of purposes. Often, the data set is not complete and gaps may exist due to various
reasons. This paper develops and tests two computer models for infilling the missing
values of a segment. The first model utilizes data only from the series with a segment
of missing values, whereas the second model utilizes data from the series with a segment
of missing values as well as from other concurrent series without a segment of missing
values. These models are respectively referred to as the Auto-Series (A~) model and the
Cross-Series (CS) model. Both models utilize the concepts of seasonal segmentation and
cluster analysis in estimation of the missing values of a segment in a set of monthly
streamflows. The models are evaluated based on comparison of percent differences
between the estimated and the observed values as well as on entropic measures.
Results indicate that the AS model provides adequate predictions for missing values in the normal range of flows, but is less reliable in the extreme (high) range of flows. The results from the CS model, however, indicate that the use of another concurrent set of streamflow data enhances predictions for all ranges of flows.
INTRODUCTION
In the past, various approaches have been used for infilling missing values in monthly
streamflow data [Panu (1992)], and among them, the most commonly used are the
regression approach and the multivariate approach. One multivariate approach
incorporating the concept of segmentation of data series into seasonal segments (or
groups) was suggested by Panu and Unny (1980) and later developed by Afza and
Panu (1991). This approach utilizing the characteristics of segmented data for infilling
missing values of a segment has a distinct advantage over the regression approach. The
latter approach treats each data point as an individual value, while the former approach
utilizes the group characteristics of similar data values. Based on such consideration of
values in groups, the missing values can be infilled as a whole group rather than as
individual values. The multivariate approach and the regression approach are
conceptually presented in Figure 1.
Figure 1. Data Infilling Approaches: (a) Multivariate Approach and (b) Regression
Approach.
One problem with the regression approach is the diverging confidence limits for
subsequent estimates as more reliance is given to the most recent estimate of a previously
INFILLING MISSING MONTHLY STREAMFLOW DATA 193
unknown value. On the other hand, the multivariate approach has constant confidence
limits for the segment. The model development based on the multivariate approach
follows.
MODEL DEVELOPMENT
The development of the AS and the CS models is summarized in the form of a flow chart
in Figure 2.
The first step in developing the models is the determination of seasonal segments in the
data series. These seasonal segments, as described by Panu (1992), form the pattern
vectors. The correlogram and the periodogram are used to infer seasonality for both the
models. The AS model requires segmentation of the data series with missing values, whereas the CS model requires segmentation of both the concurrent data and the data series with missing values.
The next step involves testing for multivariate normality of pattern vectors. In
order to test for multivariate normality, the ranked Mahalanobis Distance is plotted
against the theoretical χ² values corresponding to different probabilities. If the pattern
vectors exhibit signs of non-normality, transformations are applied to the vectors until
approximate normality is achieved.
AS Model: This model assumes lag-one Markovian structure for the inter-pattern
relationships and in turn estimates the missing values based on the pattern vector
occurring immediately prior to the gap [Figure 3].
The missing segment in the sequence of streamflows is designated as $S_k$. The pattern
vector in segment $S_{k-1}$ is used to estimate the missing values in segment $S_k$. The inter-
pattern structure takes into account the Markovian transitions from other segments,
similar to $S_{k-1}$ and $S_k$, which were identified by the clustering technique.
Johnson and Wichern (1988) suggest that the conditional mean and covariance of
the segment $S_k$ can be determined given that the segment $S_{k-1}$ has occurred. In this paper,
the conditional mean and covariance are considered sufficient statistics to describe the
missing pattern vector for $S_k$, and their formulation is explained as follows.
Let $S = [S_k \mid S_{k-1}]^T$ be distributed as multivariate normal and denoted by $N_d(\mu, \Sigma)$,
for $d \geq 2$, where
$$\mu = \begin{bmatrix} \mu_k \\ \mu_{k-1} \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{k,k} & \Sigma_{k,k-1} \\ \Sigma_{k-1,k} & \Sigma_{k-1,k-1} \end{bmatrix}$$
The determinant of the partitioned section $\Sigma_{k-1,k-1}$ must be greater than zero, i.e.,
$|\Sigma_{k-1,k-1}| > 0$. Given that the segment $S_{k-1}$ has occurred, the conditional mean and
covariance of the missing segment $S_k$ are given by:
$$\mu_{k \mid k-1} = \mu_k + \Sigma_{k,k-1}\,\Sigma_{k-1,k-1}^{-1}\,(s_{k-1} - \mu_{k-1})$$
$$\Sigma_{k \mid k-1} = \Sigma_{k,k} - \Sigma_{k,k-1}\,\Sigma_{k-1,k-1}^{-1}\,\Sigma_{k-1,k}$$
The mean of $S_k$, as given above, is considered an adequate vector to represent the missing
segment $S_k$.
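As a concrete illustration of this estimator, the following sketch (ours, not the authors' code; the data and six-month segment length are hypothetical) computes the conditional mean and covariance with NumPy:

```python
import numpy as np

def conditional_segment(mu_k, mu_p, S_kk, S_kp, S_pp, s_p_obs):
    """Conditional mean and covariance of the missing segment S_k given the
    observed preceding segment s_p_obs, under multivariate normality."""
    inv = np.linalg.inv(S_pp)                    # requires |S_pp| > 0
    mu_c = mu_k + S_kp @ inv @ (s_p_obs - mu_p)  # conditional mean
    cov_c = S_kk - S_kp @ inv @ S_kp.T           # conditional covariance
    return mu_c, cov_c

# Hypothetical log-transformed 12-month pattern vectors [S_k | S_{k-1}]:
rng = np.random.default_rng(0)
X = np.log(rng.lognormal(4.0, 0.5, size=(40, 12)))
mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
mu_c, cov_c = conditional_segment(mu[:6], mu[6:], S[:6, :6], S[:6, 6:],
                                  S[6:, 6:], X[-1, 6:])
infilled = np.exp(mu_c)   # back-transform; the conditional mean infills the gap
```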
The utility of this model is limited [Panu (1992)] due to its dependence solely on
the information contained in the data set with the missing values. The development of the
CS model overcomes this difficulty, as presented below, by using additional information
contained in the concurrent data set.
CS Model: This model assumes cross-correlation between the data set with missing
values and the concurrent data set. Both these data sets are respectively referred to as the
subject river and the base river. It is noted that the base river could be any other data set
(precipitation, streamflow, etc.), but it is simply referred to as the base river. The CS
model is conceptually presented in Figure 4.
The missing values in segment $S_k$ from the subject river are infilled based on the
observed pattern vector in segment $S_{bk}$ from the base river. The cross-pattern
relationships in the CS model take into account the transitions from segments with similar
characteristics (identified by the clustering technique) from $S_{bk}$ to segment $S_k$. The
conditional mean and covariance of $S_k$ can be determined based on the considerations
outlined above and the observed pattern vector in the segment $S_{bk}$ [Johnson and Wichern
(1988)].
The formulation of the conditional mean and covariance for the CS model is similar
to that of the AS model, with the following exceptions. The pattern vector for the
segment $S_{bk}$ is substituted for $S_{k-1}$, and all subsequent occurrences of terms with
subscripts $(k-1)$ are replaced with the corresponding terms from the base river.
MODEL APPLICATION
The streamflow gauging station (05QA001) of English River at Sioux Lookout, Ontario,
was considered to be the site with missing values. An uninterrupted set of unregulated
streamflows is available for this site from 1922 to 1981. Another streamflow gauging
station (05QA002) of English River at Umfreville is located upstream of the Sioux Lookout
station. Flow values for this station are available from 1922 to 1990. This station is used
as the base river in the CS model. Precipitation data at Sioux Lookout Airport is
available for the period of 1963 to 1990. The precipitation data is used as another base
river in the CS model. Since concurrent data from 1963 to 1981 are available for all three
data sources, these 18 years of data are used in the application.
For the application of the AS and the CS models to the monthly streamflow data
of the English River at Sioux Lookout, the seasonality was inferred, from the
correlogram and the periodogram, to be two six-month seasons or one twelve-month
season. The starting and ending months, respectively, were determined for the six-month
dry season to be November and April, and for the wet season to be May and October.
On one hand, the presence of a single twelve-month season was inferred for precipitation
data at Sioux Lookout Airport. On the other hand, the presence of two six-month seasons
or one twelve-month season was assessed for streamflows of the English River at
Umfreville. Experimental runs were conducted for both models using two six-month
seasons or one twelve-month season.
The multivariate normality of the pattern vectors was best achieved by using
natural log transformation. Clustering of the segmented data was performed based on the
assumption that there were two clusters in each season. Results incorporating the
clustering technique showed only minor deviations from those obtained without using
sub-clusters. For brevity, only the results without using sub-clusters are presented in this
paper. Both models are applied to infill a missing segment based on the assumption
that such a segment occurs sequentially over the entire length of the data series.
Three methods of analysis were used to examine the results: graphical, statistical, and
entropic. As well, a comparison was made with the results obtained by infilling the missing
values using the mean, minimum or maximum value for each month. Plots of the results
for both models are presented in Figure 5.
[Figure 5: observed flow versus infilled flow for the AS and CS models.]
Graphical Analysis: An examination of the results from the AS model indicates that the
estimated values closely follow the observed values for most years, but entail larger error
in case of extreme flows. This would be expected since the estimated values are based
only on the flow values that have occurred during other years in the data series. To
overcome this difficulty, one could use a multivariate random number generator to
estimate the missing values such that the conditional mean and covariance of the
estimated values are the same as those of the data series.
The estimated values from the CS model, using precipitation as the concurrent
data, show deviations from the observed flow in many cases. This could be due to the
effects of snowfall and spring runoff. During winter months, precipitation falls as snow
and has no effect on streamflows until the spring thaw. This lag period is
variable and influences the structure of the covariance matrix in the CS model.
Future development of the model could include a procedure to account for the variability
in snowmelt phenomenon.
Further examination of the results from the CS model, using another streamflow
series as the concurrent data, indicates that the estimated values follow very closely the
observed values for the entire range of flows. This would be expected, since both sets
of data are in the same watershed, and the outcome of hydrologic events could be similar
at both gauging stations.
Statistical Analysis: The percent differences (positive or negative) between the estimated
and the observed values are given in Table 1. These results were obtained separately for
two cases: with and without the use of sub-clusters in each season. The results with the
use of sub-clusters exhibit only minor deviations from the results obtained without using
sub-clusters. Accordingly, only the results without using sub-clusters are
presented in this paper. Similar statistics are also included in the table for infilling
missing values by using the mean, minimum or maximum value for each month.
In Table 1, the least error occurs when a concurrent set of streamflow data is used in the
CS model. However, the error is much larger for this model when other concurrent data
such as precipitation is used. On the other hand, in such cases, use of the AS model
entails smaller error and would be the obvious choice for data infilling.
Other currently popular methods of replacing the missing values by such ad hoc values
as mean, minimum, or maximum value entail too large an error [Table 1]. The results
of our analyses indicate that one should avoid [Panu and McLarty (1991)] the use of such
ad hoc methods of data infilling.
Both of the models are further assessed for their effectiveness in reducing the
uncertainty associated with infilled data values. Entropic measures, as explained in the
Appendix, are used in such an assessment.
Entropic Analysis: The results obtained for various entropic measures related to both
the models are summarized in Table 2. From the above results, it is apparent that the
maximum reduction in entropy (58%) occurs for the CS model, when using streamflows
of English River at Sioux Lookout, and at Umfreville. A large reduction in entropy also
results, when seasonal segments are grouped into sub-clusters. This reduction in
uncertainty is gained due to exclusion of certain clusters, once the occurrence of a
particular season is known.
CONCLUSIONS
The AS model is found satisfactory in estimation of the missing values in normal range
of streamflows but performs inadequately in case of extreme (high) flows. Statistical
results show that the error in the estimated values could range from -53% to +84%.
Entropic analysis indicates a small reduction (three percent) in entropy, when considering
the system as Markovian as opposed to random. In other words, the assumption for inter-
pattern structure to be of Markovian type does not appear valid for infilling purposes.
The CS model using precipitation as the concurrent data provides widely
varying estimates of missing values. In terms of percent error, the variability has been
found to range from -59% to +198%. Entropic analysis for such data sets indicates that
only a small reduction in entropy (eight percent) is achieved. Such a small reduction in
entropy is an indicator that there exists a poor cross-correlation between the streamflow
data and the precipitation data.
The CS model is found to perform adequately in estimation of the missing values
with the use of concurrent streamflow data from a nearby station. The estimates of
missing values have been found satisfactory in average flow range as well as extreme
(high) flow range. The percent error in the estimated values ranges from -29% to +29%.
Entropic analysis indicates that a reduction of 58% in entropy is achieved. In other
words, the use of concurrent data exhibiting high cross-correlation with the streamflow
data having missing values provides satisfactory estimates of the missing values.
ACKNOWLEDGEMENT
The financial support provided by the Natural Sciences and Engineering Research
Council of Canada for various aspects of this investigation is gratefully acknowledged.
The computational efforts by C. Goodier are especially appreciated.
REFERENCES
Afza, N. and U.S. Panu (1991) Infilling of Missing Data Values in Monthly
Streamflows, An Unpublished Technical Report, Dept. of Civil Engineering, Lakehead
University, Thunder Bay, Ontario.
Domenico, P. (1972) Concepts and Models in Groundwater Hydrology,
McGraw-Hill, San Francisco, U.S.A.
Hartigan, J. and M. Wong (1979) "Algorithm AS 136: A K-Means Clustering
Algorithm", Applied Statistics, 28, 100-108.
Johnson, R.A. and D.W. Wichern (1988) Applied Multivariate Statistical Analysis,
Prentice Hall, New Jersey.
Khinchin, A.I. (1957) Mathematical Foundations of Information Theory, Dover
Publications Inc., New York.
Panu, U.S. and T.E. Unny (1980) "Stochastic Synthesis of Hydrologic Data Based on
Concepts of Pattern Recognition", Journal of Hydrology, 46, 5-34, 197-217, 219-237.
Panu, U.S. and B. McLarty (1991) Evaluations of Quick Data Infilling Methods in
Streamflows, An Unpublished Technical Report, Dept. of Civil Engineering, Lakehead
University, Thunder Bay, Ontario.
Panu, U.S. (1992) "Application of Some Entropic Measures in Hydrologic Data Filling
Procedures", Entropy and Energy Dissipation in Water Resources, Kluwer Academic
Publishers, Netherlands, 175-192.
Shannon, C.E. (1948) "The Mathematical Theory of Communication", Bell System
Technical Journal, 27, 379-428; 623-656.
APPENDIX
The entropy of a system is a measure of its degree of disorder. Shannon (1948) first
applied the concept of entropy to measure the information content of a system. Khinchin
(1957) reports on Shannon's entropy in dealing with a finite scheme, which is applicable
to a hydrologic data series. Entropy ($H$) as a measure of uncertainty of a system is defined
as follows:
$$H(p_1, p_2, \ldots, p_n) = -\sum_{k=1}^{n} p_k \ln(p_k)$$
where $n$ is the number of states and $p_k$ is the probability of the $k$-th state in a finite
scheme.
The maximum entropy occurs when all outcomes are equally likely. For a series
of $n$ equally likely events, the probability of each event is $1/n$, and the maximum entropy
of the system is obtained as follows:
$$H_{max} = \ln(n)$$
While grouping the segmented data into clusters, an adjustment must be made to the
definition of entropy ($H$) to account for the clusters. The entropy for clustering, $H_c$, can
be computed as given below:
$$H_c = -\sum_{k=1}^{w} p(s_k) \sum_{c=1}^{n_k} p(c_k) \ln[p(c_k)]$$
where $w$ is the total number of seasons per year, $n_k$ is the total number of clusters in
any season $k$, $p(c_k)$ is the probability of cluster $c$ in season $k$, and $p(s_k)$ is the probability
of season $k$.
The value of entropy for clustering does not take into account the ordering of
clusters. That is, the clusters could have occurred in any order, and still the value of
entropy would be the same. To further examine the effect of ordering (i.e., the dependence
among clusters), the entropy of a Markov chain as applicable to the AS model is
examined.
Entropy of a Markov Chain (AS Model): Domenico (1972) describes the entropy of
a Markov chain ($H_m$) as the average of all the individual entropies of the transitions ($H_i$),
weighted in accordance with the probability of occurrence of the individual states. It is
noted that there are as many states as there are clusters. The Markovian entropy can be
expressed as follows:
$$H_m = \sum_{i=1}^{n} p_i H_i$$
where $n$ is the number of states and $p_i$ is the probability of occurrence of the state $i$.
A measure of reduction in uncertainty can be obtained by taking the difference
between $H_c$ and $H_m$. In other words, the clustered data (i.e., the dependence among
clusters) is treated as Markovian rather than random. For the CS model, the entropy of
a combined system consisting of two related systems must be used rather than the
Markovian entropy.
Entropy of a Combined System [CS Model]: Domenico (1972) suggests that entropic
measures can be applied to situations where two related systems are observed. The
applicability of such an entropic measure to the CS model is apparent. That is, two sets
of observed data are available for the CS model, namely, streamflow data with a missing
segment, and concurrent data with no missing segment.
The measure of entropy of one system (say, system X, the streamflow data
with the missing segment), given the knowledge of the observations in the other system
(say, system Y, the concurrent data without the missing segment), is obtained as
follows:
$$H(X \mid Y) = -\sum_{i=1}^{n} \sum_{j=1}^{n} p(x_i, y_j) \ln[p(x_i \mid y_j)]$$
where $p(x_i \mid y_j)$ is the conditional probability of system X being in state $x_i$, given that
system Y is observed to be in state $y_j$, and $p(x_i, y_j)$ is the joint probability of $x_i$ and $y_j$.
The original measure of entropy of system X, $H(X)$, can be computed from $H_c$
for a clustered system. Thus, the measure of uncertainty reduced in X, after observing
Y, is obtained as follows:
$$H(X;Y) = H_c - H(X \mid Y)$$
where $H(X;Y)$ represents the decrease in uncertainty in X, after observing Y.
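A small sketch of these measures (our illustration; the joint probability table is hypothetical, and the marginal entropy of X stands in for the clustered entropy $H_c$):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p_k ln(p_k); zero-probability states drop out."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def conditional_entropy(joint):
    """H(X|Y) = -sum_ij p(x_i, y_j) ln p(x_i | y_j); joint[i, j] = p(x_i, y_j)."""
    p_y = joint.sum(axis=0)            # marginal probabilities of Y
    mask = joint > 0
    return -np.sum(joint[mask] * np.log((joint / p_y)[mask]))

joint = np.array([[0.30, 0.05],        # hypothetical cluster co-occurrence table
                  [0.05, 0.30],
                  [0.10, 0.20]])
h_x = entropy(joint.sum(axis=1))                 # entropy of X alone
reduction = h_x - conditional_entropy(joint)     # H(X;Y) = H - H(X|Y)
```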
PART IV
NEURAL NETWORKS
APPLICATION OF NEURAL NETWORKS TO RUNOFF PREDICTION
In this paper, a new method to forecast runoff using neural networks (NNs) is
proposed and compared with the fuzzy inference method suggested previously by the
authors (Fujita and Zhu, 1992). We first develop a NN for off-line runoff prediction.
The results predicted by the NN depend on the characteristics of training sets. Next,
we develop a NN for on-line runoff prediction. The applicability of the NN to runoff
prediction is assessed by making 1-hr, 2-hr and 3-hr lead-time forecasts of runoff
in Butternut Creek, NY. The results indicate that using neural networks to forecast
runoff is rather promising. Finally, we employ an interval runoff prediction model
where the upper and lower bounds are determined using two neural networks. The
observed hydrograph lies well between the NN forecasts of upper and lower bounds.
INTRODUCTION
Neural network architectures are models of neurons; they are not deterministic
programs, but instead learn from examples. Through the learning of training sets, which consist of
pairs of inputs and target outputs, neural networks iteratively adjust internal parameters
to the point where the networks can produce a meaningful answer in response to each
input. After the learning procedure is complete, information about the relationship between
inputs and outputs, which may be non-linear and extremely complicated, is encoded
in the network.
As we know, the relationship between rainfall and runoff is non-linear and quite
complex due to many related factors, such as field moisture capacity and evaporation
rate. A mathematical definition of this kind of relationship is difficult; thus it
is attractive to try the neural network approach, which accommodates this kind
of problem.
The runoff prediction can be classified into several cases based on accessibility of
hydrological data. Table 1 shows the two cases considered in this paper, where the
question mark "?" denotes the unknown future runoff that we are about to forecast.
If runoff information at every moment up to the current time for the present flood is
available, the authors call it the on-line case. On the contrary, if this information is
not available for the present flood, the authors call it the off-line case.
To forecast runoff for the present flood, neural networks need first to learn about
previous flood events. The learning procedure is conducted using the back-propagation
algorithm (Rumelhart et al., 1986).
Figure 1. A fully interconnected, three-layer neural network.
Forward-propagation step:
This step calculates the output from each processing unit of the neural network by
starting from the input layer and propagating forward through the hidden layer to the
output layer. Each processing unit except the units in the input layer takes a weighted
sum of its inputs and applies a sigmoid function to compute its output. Specifically,
given an input vector $[x_1, x_2, \ldots, x_m]$, the outputs from each layer are calculated in
this way:
First, the input layer is a special case. Each unit in this layer just sends the input
values as they are along all the output interconnections to the units in the next layer.
Hidden layer:
$$s_{1j} = \sum_{i=1}^{m} \omega_{1ji}\,x_i + \theta_{1j}, \qquad o_{1j} = f(s_{1j}) \quad (3)$$
Output layer:
$$s_{2k} = \sum_{j=1}^{n} o_{1j}\,\omega_{2kj} + \theta_{2k}, \qquad k = 1, 2, \ldots, p \quad (4)$$
$$o_{2k} = f(s_{2k}) \quad (5)$$
where $o_{1j}$, $o_{2k}$ are the outputs from unit $j$ in the hidden layer and from unit $k$ in the
output layer respectively, $\omega_{1ji}$ is the interconnection weight between the $i$-th unit in the
input layer and the $j$-th unit in the hidden layer, and $\omega_{2kj}$ is the interconnection weight
between the $j$-th unit in the hidden layer and unit $k$ in the output layer. $\theta_{1j}$, $\theta_{2k}$ are the
biases of unit $j$ and unit $k$ respectively. Function $f$, a sigmoid curve, is expressed by
(6), where $x$ is defined over $(-\infty, +\infty)$ so that the function values fall in the
interval $(0, 1)$:
$$f(x) = \frac{1}{1 + e^{-x}} \quad (6)$$
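The forward-propagation step of equations (3) through (6) can be sketched as follows (our illustration, not the authors' code; the layer sizes and weights are hypothetical):

```python
import numpy as np

def sigmoid(x):
    """Equation (6): maps (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Forward propagation through a three-layer network.
    W1: (n_hidden, n_input) weights, b1: hidden biases,
    W2: (n_output, n_hidden) weights, b2: output biases."""
    s1 = W1 @ x + b1          # weighted sums into the hidden units
    o1 = sigmoid(s1)          # hidden-layer outputs, equation (3)
    s2 = W2 @ o1 + b2         # equation (4)
    o2 = sigmoid(s2)          # output-layer outputs, equation (5)
    return s1, o1, s2, o2

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(6, 4)), np.zeros(6)   # 4 inputs, 6 hidden units
W2, b2 = rng.normal(size=(1, 6)), np.zeros(1)   # 1 output unit
_, _, _, y = forward(rng.random(4), W1, b1, W2, b2)
```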
Backward-propagation step
The back-propagation step is an error-correction step which takes place after the
forward-propagation step is completed. The calculation begins at the output layer and
progresses backward through the hidden layer to the input layer. Specifically, the
output value from each processing unit in the output layer is compared to the target
output specified in the training set. Based on the difference between output and target
output, an error value is computed for each unit in the output layer, then the weights
are adjusted for all of the interconnections that go into the output layer. Next, an
error value is calculated for all of the units in the hidden layer and the weights are
adjusted for all of the interconnections that go into the hidden layer. The following
equations indicate this error-correction step explicitly.
For unit k in the output layer, its error value is computed as:
ak=(tk -02k)*!'(s2J (7)
where tk=the target value of unit k in the output layer
/(s2 k) =the derivative of sigmoid function for S2k
The interconnection weights going into unit $k$ in the output layer are then adjusted
based on the $\delta_k$ value as follows:
$$\omega_{2kj}^{(new)} = \omega_{2kj}^{(old)} + \eta\,\delta_k\,o_{1j} \quad (8)$$
where $\eta$ is the learning rate. The bias $\theta_{2k}$ for unit $k$ in the output layer is
corrected as follows:
$$\theta_{2k}^{(new)} = \theta_{2k}^{(old)} + \eta\,\delta_k \quad (9)$$
For unit $j$ in the hidden layer, its error value is computed as:
$$\delta_j = \left[\sum_{k=1}^{p} \delta_k\,\omega_{2kj}\right] f'(s_{1j}) \quad (10)$$
The interconnection weight $\omega_{1ji}$, which goes to unit $j$ in the hidden layer from unit
$i$ in the input layer, is then corrected as follows:
$$\omega_{1ji}^{(new)} = \omega_{1ji}^{(old)} + \eta\,\delta_j\,x_i \quad (11)$$
and the bias for unit $j$ in the hidden layer is corrected as:
$$\theta_{1j}^{(new)} = \theta_{1j}^{(old)} + \eta\,\delta_j \quad (12)$$
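One full error-correction step, following equations (7) to (12), can be sketched as below (our illustration; it reuses the `forward` function and NumPy import from the previous sketch, and the learning rate is arbitrary):

```python
def backprop_step(x, t, W1, b1, W2, b2, eta=0.25):
    """One backward-propagation (error-correction) step for a single pattern."""
    s1, o1, s2, o2 = forward(x, W1, b1, W2, b2)
    # For the sigmoid, f'(s) = f(s)(1 - f(s)), so reuse the computed outputs:
    d2 = (t - o2) * o2 * (1.0 - o2)        # equation (7)
    W2 += eta * np.outer(d2, o1)           # equation (8)
    b2 += eta * d2                         # equation (9)
    d1 = (W2.T @ d2) * o1 * (1.0 - o1)     # equation (10)
    W1 += eta * np.outer(d1, x)            # equation (11)
    b1 += eta * d1                         # equation (12)
    return W1, b1, W2, b2
```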
In the case of off-line runoff prediction, the runoff data for the present flood is
inaccessible. This limitation determines that the NN should be developed to infer the
future runoff based on only rainfall data. The runoff system equation may be
expressed as:
$$Q(t) = f\{R(t-l), R(t-l-1), \ldots, R(t-d)\} \quad (13)$$
where $R$, $Q$, $t$ denote rainfall, runoff and time respectively, and $l$, $d$ are two
parameters reflecting the hydrological characteristics of the basin.
As mentioned previously, when a NN is used to address a complex non-linear
problem, such as the relationship between rainfall and runoff, it must first learn from
a training set so that a desired output will be produced. The trained NN is generalized
to deal with inputs which may not be included in the training set. The validity of the
APPLICATION OF NEURAL NETWORKS TO RUNOFF PREDICTION 209
trained NN naturally depends on the characteristic of the selected training set. In order
to make accurate runoff forecasts, it is important to understand the dependence of the
NN on the training set. Since it is not easy to obtain various kinds of flood data from
observations to examine this dependence, we employed the storage function method
to simulate flood data arbitrarily. Equation (14) shows the basic equation of the
storage function method, where $R$, $Q$, $L$ denote rainfall, runoff and lag-time
respectively, and $K$, $P$ denote the storage coefficient and storage exponent
respectively:
$$S = K\,Q(t)^P, \qquad \frac{dS}{dt} = R(t - L) - Q(t) \quad (14)$$
[Figures 3-11: rainfall (mm/hr) and runoff hydrographs versus time (hr); the flood events are characterized by rainfall duration Tr, peak rainfall intensity rp and time to peak Tp, e.g. Tr = 15 hr, rp = 10 mm/hr, Tp = 8 hr.]
Figure 3. The two floods as training data. Figure 4. The first flood for validation. Figure 5. The second flood for validation.
Figure 6. The four floods as training data. Figure 7. The first flood for validation. Figure 8. The second flood for validation.
Figure 9. The third flood for validation. Figure 10. The fourth flood for validation. Figure 11. The fifth flood for validation.
The trained NN works well for these five floods. Although the rainfall input profiles in the validation
data were completely different from those used in the training data, their durations of
rainfall Tr's, their peak rainfall intensities rp's as well as their time to peak rainfall
intensity Tp's could be seen to fall within the ranges of rainfall inputs used in the
training data. This characteristic is here defined as interpolation, while the contrary
is defined as extrapolation. By comparing Figure 5 with Figure 8, we can see that the
In the case of on-line runoff prediction, current runoff data is accessible for the
present flood. Therefore, we may express the runoff system equation as:
$$\Delta Q(t) = f\{R(t-1), \ldots, R(t-m), \Delta Q(t-1), \ldots, \Delta Q(t-n)\} \quad (15)$$
magnitude under a positive input of $\Delta Q(t)$ when flow turns from an increasing to a
decreasing stage. Furthermore, a larger positive prediction error may be caused in the
longer lead-time forecasts, since the predicted values are reused as indicated in the
prediction algorithm in (19) and (20).
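Equations (19) and (20) did not survive in this copy, so the sketch below only illustrates the general recursive scheme the text describes, under our own assumptions about the input layout (`nn_predict` is a hypothetical trained-network wrapper):

```python
def multi_step_forecast(nn_predict, rain_lags, dq_hist, lead=3):
    """Recursive lead-time forecasting: each predicted runoff increment is fed
    back as an input when forecasting the next hour, which is why positive
    errors can grow with the lead time, as noted in the text."""
    dq = list(dq_hist)                               # observed increments up to time t
    preds = []
    for h in range(lead):
        x = list(rain_lags[h]) + dq[-len(dq_hist):]  # rainfall lags + recent dq
        dq_next = nn_predict(x)
        preds.append(dq_next)
        dq.append(dq_next)                           # prediction reused at longer leads
    return preds
```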
Figure 14. 1-hr lead-time forecast results. Figure 15. 2-hr lead-time forecast results. Figure 16. 3-hr lead-time forecast results.
The authors have previously studied the forecasting of runoff using the fuzzy
inference method. The method first establishes the fuzzy relation between rainfall and
runoff based on previous flood data; then it forecasts the future runoff for the present
flood through making a fuzzy reasoning based on the above obtained fuzzy relation.
The method was applied to forecast the same flood events in Butternut Creek. Figures
17, 18, 19, as examples, show the 1-hr, 2-hr and 3-hr lead-time forecast results for
the flood of Oct. 20, 1976, respectively. Table 3 presents the evaluation results for
these forecasts based on the same criteria as shown in Table 2.
~~"r-~~~~O
51
-e-
Lead llme=2hr
10
Figure 17. 1-hr lead- Figure 18. 2-hr lead-time Figure 19. 3-hr lead-
time forecast results. forecast results. time forecast results.
Comparing Table 2 with Table 3, and Figures 14, 15 and 16 with Figures 17, 18 and 19,
we can see that the NN method shows slightly better prediction accuracy
than the fuzzy inference method. Besides, the calculation time of
the NN method for forecasting runoff is extremely short, since the time-consuming
job of tuning the internal parameters of the NN can be executed in advance. However,
when we try to make a long lead-time forecast of runoff, predicted information for
rainfall is needed. At the present level of technology, this information is provided
Interval prediction
In principle, the more training sets a NN learns from, the more accurate output it will
yield. These training sets should consist of various types of floods. On the other hand,
such various types of floods as training sets make it difficult for a NN to converge
to a desired state. To avoid this difficulty, we may employ a modified learning
algorithm proposed by Ishibuchi which was originally developed for determining the
upper and lower bounds of a nonlinear interval model (Ishibuchi and Tanaka, 1991).
The modified learning algorithm is the same as the back-propagation learning
algorithm, except that it introduces a coefficient $C$ into (7):
$$\delta_k = C \cdot (t_k - o_{2k}) \cdot f'(s_{2k}) \quad (21)$$
First, in the learning procedure, if the output value $o_{2k}$ from the NN is greater than or
equal to the target value $t_k$, let $C$ take a very small positive value $\alpha$; thus the
internal parameters of the NN, such as the interconnection weights and the biases, whose
corrections are based on the magnitude of the error value $\delta_k$, will be changed only
slightly. On the other hand, if $o_{2k}$ is less than $t_k$, let $C$ take 1, and thus the internal
parameters will be corrected substantially. As a result, through presenting the training
set to the NN iteratively, the outputs from the NN will finally be expected to be all larger
than or equal to the target values. The NN trained in this way is defined as NN+.
Second, let $C$ take its value contrary to the above case. Thus, after the
learning is completed, the outputs from the NN will be expected to be all smaller than
or equal to the target values. The NN trained in this way is defined as NN-.
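A sketch of the asymmetric error term of equation (21) (our illustration; the variable names are hypothetical):

```python
ALPHA = 0.01  # the small positive value used in the calculations below

def delta_output(t_k, o_2k, f_prime_s2k, upper=True):
    """Error term of equation (21). With upper=True the network is penalized
    mainly when its output falls below the target, so it learns an upper
    bound (NN+); with upper=False the roles reverse and it learns a lower
    bound (NN-)."""
    if upper:
        c = ALPHA if o_2k >= t_k else 1.0
    else:
        c = 1.0 if o_2k >= t_k else ALPHA
    return c * (t_k - o_2k) * f_prime_s2k
```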
We consider that this modified learning algorithm may be applied to make an
interval prediction of runoff where the lower and upper bounds of the interval are
provided by NN- and NN+ respectively. In particular, when a NN fails to converge
while learning various types of floods simultaneously, applications of
NN- and NN+ may be meaningful, since NN- and NN+ may automatically focus on
small runoff cases and large runoff cases respectively.
Again, we chose Butternut Creek for calculation, and adopted the same runoff
system equation and the same structure of neural network as stated in subsection (2).
Using the above modified learning algorithm, we obtained two different neural
networks NN- and NN+ through learning the same training set, which consists of the
first two floods in this basin shown in Figure 13. In the calculation, $\alpha$ was set at
0.01. The trained NN- and NN+ then were used to carry out the forecast task for the
CONCLUSIONS
Several neural networks have been developed to forecast runoff in three modes: an
off-line prediction, an on-line prediction and an interval prediction. The dependence
of the NNs' performance on the training set for the off-line prediction was discussed,
and this work helped us understand how to construct an adequate training set.
However, for the off-line prediction, further study of the application to actual
basins is still needed. On the other hand, the method for the on-line prediction
appeared very applicable. The interval runoff prediction model, which consists of two
neural networks, provides a way to estimate the upper and lower bounds of a flood.
REFERENCES
Fujita, M. and Zhu, M.-L. (1992) "An Application of Fuzzy Theory to Runoff
Prediction", Procs. of the Sixth IAHR International Symposium on Stochastic
Hydraulics, 727-734.
Rumelhart, McClelland and the PDP Research Group (1986) Parallel Distributed
Processing, MIT Press, Cambridge, Vol.1, 318-362.
Zhu, M.-L. and Fujita, M. (1993) "A Comparison of Fuzzy Inference Method and
Neural Network Method for Runoff Prediction", Proceedings of Hydraulic
Engineering, JSCE, Vol. 37, 75-80.
Zhu, M.-L. and Fujita, M. (1994) "Long Lead Time Forecast of Runoff Using Fuzzy
Reasoning Method", Journal of Japan Society of Hydrology & Water Resources, Vol. 7,
No. 2.
Ishibuchi, H. and Tanaka, H. (1991) "Determination of Fuzzy Regression Model by
Neural Networks", Fuzzy Engineering toward Human Friendly Systems, Procs. of
the International Fuzzy Engineering Symposium '91, Vol. 1, 523-534.
PREDICTION OF DAILY WATER DEMANDS
BY NEURAL NETWORKS
In this paper a new approach based on the artificial neural network model is pro-
posed for the prediction of daily water demands. The approach is compared with the
conventional ones and the results show that the new approach is more reliable and
more effective. The fluctuation analysis of daily water demands and the sensitivity
analysis of exogenous variables have also been carried out by taking advantage of the
ability of neural network models to handle non-linear problems.
INTRODUCTION
[Figure: a three-layer neural network with input layer $i$, hidden layer $j$ and output layer $k$; each unit forms a weighted sum such as $X = w_1 x_1 + w_2 x_2 + w_3 x_3 + \theta$, and the network output is compared with the teach signal.]
When an input $\{I_i,\ i = 1, 2, \ldots, N\}$ is given to the units of the input layer,
the inputs and outputs of the hidden layer and output layer units are represented by
equations (1) to (5).
The inputs to the neural network model for the daily water demand prediction prob-
lem are the exogenous variables which cause the fluctuation of daily water demands.
According to Zhang, et al. (1992) and Yamada et al. (1992) the main factors are the
following five exogenous variables:
1) last day's delivery
2) daily high temperature
3) weather
4) precipitation
5) day's characteristics.
The forecasts of daily high temperature, weather and precipitation are available
from the weather report presented in the morning or the previous evening by the
meteorological observatory. For the model user's convenience, and considering the
accuracy of the weather report, we treated weather and precipitation as discrete variables;
i.e., weather is classified as sunny, cloudy and rainy, and precipitation is classified
into three ranks, i.e., [0 mm/day, 1 mm/day), [1 mm/day, 5 mm/day) and [5 mm/day, ∞).
For the day's characteristics, weekdays and Sundays (including national holidays) are
distinguished.
The variable values as inputs to the neural network model are defined as follows.
1). The last day's delivery is transformed into a variable which belongs to $(0, 1)$ by
$$I_1 = \frac{1}{1 + e^{-\alpha(Q - \bar{Q})}} \quad (6)$$
where $I_1$ is the transformed last day's delivery, $Q$ the actual last day's delivery, $\bar{Q}$ the
mean of the delivery records, and $\alpha$ a parameter for guaranteeing that the transformed
daily delivery record series $\{I_1^{(t)},\ t = 1, 2, \ldots, T\}$ is approximately uniformly distributed,
which is very important for improving the accuracy of prediction.
2). The daily high temperature is transformed in the same way as the last
day's delivery, that is,
$$I_2 = \frac{1}{1 + e^{-\beta(T - \bar{T})}} \quad (7)$$
where $I_2$ is the transformed high temperature, $T$ the actual daily high temperature,
$\bar{T}$ the mean of the daily high temperature records, and $\beta$ a parameter for guaranteeing
that the transformed daily high temperature record series $\{I_2^{(t)},\ t = 1, 2, \ldots, T\}$ is
approximately uniformly distributed.
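Both transformations (6) and (7) have the same logistic form; a minimal sketch (the records and parameter value are hypothetical):

```python
import numpy as np

def logistic_transform(value, records, scale):
    """Equations (6)/(7): map a record into (0, 1) around the record mean.
    `scale` plays the role of alpha (deliveries) or beta (temperatures) and is
    tuned so the transformed series is roughly uniform on (0, 1)."""
    return 1.0 / (1.0 + np.exp(-scale * (value - np.mean(records))))

deliveries = np.array([355.0, 372.0, 390.0, 368.0, 381.0])  # hypothetical, 10^3 m^3
i1 = logistic_transform(372.0, deliveries, scale=0.05)
```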
3). Weather is quantified as follows.
Identifying the composition of the neural network model means deciding the number
of layers in the network and the number of units in each layer. In this study, the input layer has five units
corresponding to the five exogenous variables, and the output layer has only one unit
corresponding to the prediction of daily water demand.
The number of layers has been set to 3, and the number of units in the only hidden
layer has been determined to be 17 by the following procedure, which is based on the
philosophy that a simpler structure is a better structure (Zhang et al., 1992).
Step 1: Set the unit number j = 1.
Step 2: Train the neural network with the learning procedure (see the next section)
until the difference of the outputs between successive iterations is within a specified
error.
Step 3: Calculate the mean relative error of the outputs.
Step 4: Set j = j + 1 and repeat the above steps until the mean relative error is less
than a specified expectation of prediction relative error. This sizing loop is sketched below.
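A sketch of Steps 1-4 (our illustration; `train_and_score` stands for the training-plus-evaluation routine and is hypothetical):

```python
def select_hidden_units(train_and_score, target_mre, max_units=50):
    """Grow the hidden layer one unit at a time until the mean relative
    error of the trained network falls below the target."""
    j = 1                                  # Step 1
    while j <= max_units:
        mre = train_and_score(j)           # Steps 2 and 3: train a 5-j-1 network
        if mre < target_mre:               # Step 4: stopping test
            return j, mre
        j += 1
    raise RuntimeError("no hidden-layer size met the target MRE")
```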
for the connection weights and threshold values, respectively. Then the outputs
corresponding to the inputs of the teacher data $\{I_1^{(t)}, \ldots, I_5^{(t)},\ t = 1, 2, \ldots, T\}$ can be
obtained from (1) to (5). Let the outputs be $\{U^{(t)},\ t = 1, 2, \ldots, T\}$. It is easy to
understand that $\{U^{(t)}\}$ are different from the outputs of the teacher
data $\{O^{(t)},\ t = 1, 2, \ldots, T\}$, and an error function can be defined as follows:
$$R(\theta) = \sum_{t=1}^{T} \left(O^{(t)} - U^{(t)}\right)^2 \quad (11)$$
It is clear that $R(\theta)$ is a function of the connection weights and threshold values. The
Error Back Propagation algorithm seeks the connection weights and threshold
values that minimize the error function $R(\theta)$; a nonlinear programming method as
well as an iteration process are applied to solve the optimization problem and obtain
the optimal (sometimes suboptimal) connection weights and threshold values. In this
study the steepest descent method is used. The final iteration (learning) procedures
are as follows:
$$w_j^{(k+1)} = w_j^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)}\,\gamma_j^{(t)} \quad (12)$$
$$\theta^{(k+1)} = \theta^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)} \quad (13)$$
$$w_{ij}^{(k+1)} = w_{ij}^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)}\,w_j^{(k+1)}\,\gamma_j^{(t)}\,I_i^{(t)} \quad (14)$$
where $\eta$ is the learning rate, and $\delta^{(t)}$ and $\gamma_j^{(t)}$ denote the output-unit error term and
the output of hidden unit $j$ for pattern $t$.
With the learned (identified) neural network model, the daily water demands from
April 1990 to March 1991 have been predicted and compared with the records (Figure
2). The relative error distribution of the predictions is shown in Figure 3. It can be
seen that the relative error is less than 5% for 339 days of the year. The five days
when the relative error is greater than 10% are a special holiday, New Year's Day,
and three typhoon-striking days. In general, the predictions by the neural network
model are in excellent agreement with the records.
Figure 2. Predictions of daily water demands.
[Figure 3: histogram of the relative error (%) of the predictions.]
The predictions by the neural network model have also been compared with those
by the multiple regression model, the ARIMA model and the model based on the Kalman
Filtering Theory (for details about the compared models, see Zhang et al., 1992). The
results are shown in Table 1, where the following three indexes are applied to evaluate the predictions:
1. Mean Relative Error (MRE). The smaller the mean relative error is, the better
the predictions are.
2. Correlation Coefficient between predictions and records (CC). The larger the
correlation coefficient is, the better the predictions are.
3. Relative Root Mean Square Error (RRMSE). RRMSE=O for the perfect predic-
tions, and RRMSE=l if the predictions are equal to the mean of the records.
It can be seen that all of the indexes show that the neural network model gives the
best predictions of the four tested models. From the above results we can say that
the neural network model is more reliable and more suitable for practical purposes.
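The three indexes can be computed as follows (a sketch; `pred` and `obs` are assumed to be NumPy arrays of predictions and records):

```python
import numpy as np

def mre(pred, obs):
    """Mean Relative Error: smaller is better."""
    return np.mean(np.abs(pred - obs) / obs)

def cc(pred, obs):
    """Correlation Coefficient between predictions and records: larger is better."""
    return np.corrcoef(pred, obs)[0, 1]

def rrmse(pred, obs):
    """Relative Root Mean Square Error: 0 for perfect predictions and 1 when
    every prediction equals the mean of the records."""
    return np.sqrt(np.sum((pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))
```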
Although the nonlinear relations between daily water demands and exogenous variables
have been recognized, most of the published methods describe the relations with
a linear expression, as pointed out in Section 1. It is clear that these methods cannot
give us correct knowledge about the fluctuation structure of daily water demands.
In this sense the neural network model greatly differs from the conventional
models; that is, the neural network model has a nonlinear structure and can simulate
any complex relation between input and output through learning.
Figure 4 shows some simulation results by the learned (identified) neural network
model. From these results we can say that the neural network model is indeed
able to handle a nonlinear problem, and that a model with nonlinear structure
is necessary to describe correctly the relations between daily water demands and
exogenous variables. However, to identify the nonlinear relations between daily water
demands and exogenous variables, more simulations are needed, which is a meaningful
problem to be studied.
[Figure 4: simulated daily water demands (10³ m³) versus daily high temperature (°C) under three conditions with $I_1 = 372 \times 10^3$ m³: (a) sunny, weekday versus Sunday & holiday; (b) rain (15 mm/day); (c) Sunday & holiday, rainy versus sunny versus cloudy.]
The sensitivity of the model output to the inputs is computed as
$$\frac{\partial O}{\partial I} = f(Z)\,[1 - f(Z)] \cdot A \cdot B \quad (18)$$
where
$\partial O/\partial I = \{\partial O/\partial I_1, \ldots, \partial O/\partial I_N\}^T$, an $N \times 1$ vector,
$A = \{w_{ij}\}$, an $N \times M$ matrix, and
$B = \{w_j \cdot f(X_j)\,[1 - f(X_j)]\}$, an $M \times 1$ vector.
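A sketch of equation (18) for an N-M-1 sigmoid network (our illustration; the weight layout is an assumption consistent with the definitions of A and B):

```python
import numpy as np

def sensitivity(I, W_in, w_out, b_hidden, b_out):
    """dO/dI of equation (18). W_in is the N x M input-to-hidden weight
    matrix {w_ij} (the matrix A); w_out holds the M hidden-to-output
    weights w_j."""
    f = lambda x: 1.0 / (1.0 + np.exp(-x))
    X = W_in.T @ I + b_hidden          # hidden-unit inputs X_j
    h = f(X)
    Z = w_out @ h + b_out              # output-unit input
    B = w_out * h * (1.0 - h)          # the M x 1 vector B
    return f(Z) * (1.0 - f(Z)) * (W_in @ B)   # N x 1 sensitivity vector
```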
Figure 5 shows the changing sensitivity coefficients of the five exogenous variables
over one year; these are the monthly average values of the coefficients calculated with the 8 years'
weather and daily water demand records by (18). The results can be summarized as
follows.
1. The daily high temperature is a main factor of the fluctuation of daily water
demands in one year except in the winter season.
2. The influences of the last day's delivery on daily water demand are also very
great, especially in the fall and winter seasons, but its range of variation
with time is smaller than that of the daily high temperature.
3. The influences of the other exogenous variables are far less than those of the
daily high temperature and last day's delivery.
Figure 5. Sensitivity coefficients of exogenous variables.
SUMMARY
In this study we applied a neural network model to the prediction of daily water
demands and verified that the model is very efficient and reliable. The neural network
model differs greatly from most statistical models in its ability to handle
nonlinear problems. The results of the fluctuation structure analysis of daily water
demands and the sensitivity analysis of exogenous variables suggest that a
neural network model can be used to identify the demand structure, owing to the self-
organization ability of the model.
REFERENCES
Jinno, K., Kawamura, T. and Ueda, T. (1986) "On the dynamic properties and predictions
of daily deliveries of the purification stations in Fukuoka City", Technology
Reports of the Kyushu University 59(4), 495-502.
Koizumi, A., Inakazu, T., Chida, K. and Kawaguchi, S. (1988) "Forecasting of daily water
consumption by multiple ARIMA model", J. Japan Water Works Association
57(12), 13-20.
Tsunoi, M. (1985) "An estimate of water supply based on weighted regression analysis
using a personal computer", J. Japan Water Works Association 54(3), 2-6.
Zhang, S.P., Watanabe, H. and Yamada, R. (1992) "Comparison of daily water demand
prediction models", Annual Reports of NSC, Vol. 18, No. 1.
Asou, H. (1988) The Information Processing by Neural Network Models, Sangyo Publishers,
Tokyo.
Yamada, R., Zhang, S.P. and Konda, T. (1992) "An Application of Multiple ARIMA
Model to Daily Water Demand Forecasting", Annual Reports of NSC, Vol. 18, No. 1.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) "Learning Representations
by Back-Propagating Errors", Nature, 323, 533-536.
BACKPROPAGATION IN HYDROLOGICAL TIME SERIES FORECASTING
One of the major constraints on the use of backpropagation neural networks as a practical
forecasting tool is the number of training patterns needed. We propose a methodology
that reduces the data requirements. The general idea is to use the Box-Jenkins models in
an exploratory phase to identify the "lag components" of the series, to determine a
compact network structure with one input unit for each lag, and then apply the validation
procedure. This process minimizes the size of the network and consequently the data
required to train the network. The results obtained in four studies show the potential of
the new methodology as an alternative to the traditional time series models.
INTRODUCTION
Most of the available techniques used in time-series analysis, such as the Box-Jenkins
methods (Box-Jenkins, 1976), assume a linear relationship among variables. In practice
this assumption can make it difficult to analyze and predict accurately the real processes
that are represented by these time series. Tong (1983) described some drawbacks of linear
modelling for time series. In the last decade several nonlinear time series models have
been studied, such as the threshold autoregressive models developed by Tong & Lim
(1980). These are 'model-driven approaches' (Chakraborty et al., 1992) in which we first
identify the type of relation among the variables (model selection) and afterwards estimate
the selected model parameters.
More recently, neural networks have been studied as an alternative to these
nonlinear model-driven approaches. Because of their characteristics, neural networks
belong to the data-driven approaches, i.e. the analysis depends on the available data, with
little a priori rationalization about relationships between variables and about the models.
The process of constructing the relationships between the input and output variables is
addressed by certain general purpose 'learning' algorithms (Chakraborty et al., 1992).
Some drawbacks to the practical use of neural nets are the long time consumed
in the modelling process and the large amount of data required by the present neural
network methodologies. Present methodologies, depending on the problem, can take
several hours or even days in the neural network calibration process, and they require
a large number of training patterns.
RELEVANT BACKGROUND
Neural networks are composed of two primitive elements: units (processing elements)
and connections ('weights') between units. In essence, a set of inputs are applied to a unit
that, based on them, fires an output. Each input has its own influence on the total output.
In other words, each input has its own weight in the total output. The connections of
several units (artificial neurons), arranged in one or more layers, make a neural network.
Many network structures are arranged in layers of units that can be classified as
Input, Output or Hidden layers. According to the type of connections between units,
neural networks can be classified as feedforward or recurrent. In feedforward networks,
the units at a layer i may only influence the activity of units at higher levels (closer to
the system's output). Also, in a pure feedforward system, the set of input units to one
layer cannot include units from two or more different layers. The recurrent models are
characterized by feedback connections. This type of connection links cells at higher levels
to cells at lower ones. In this study we used pure fully connected feedforward neural
networks.
Backpropagation learning procedure (Rumelhart et al., 1986) is a gradient descent
method that establishes the values of the neural network parameters (weights and biases)
to minimize the output errors, based on a set of examples (patterns). This learning
procedure was used in this study.
The resolution of the conflicting conclusions, reported in the literature, may be
connected to the fact that a neural network model's performance is highly dependent on
its structure, activation function of the units, type of connections, backpropagation
stopping criteria, data normalization factors and on the overfitting problem, among other
things. Furthermore, no definitive established methodology exists to deal with the neural
network modelling problem and no unique comparison method is used in all studies.
Lapedes & Farber (1987, 1988) applied neural networks to forecast two chaotic
time series, i.e. they are generated by a deterministic nonlinear process, but look like
"random" time series. They concluded that the backpropagation procedure may be thought
of as a particular, nonlinear, least squares method. Their results indicated that neural
networks allow solution of the nonlinear system modelling problem, with excellent
prediction properties, on such "random" time series when compared to traditional
methods. Unfortunately, as they pointed out, their study did not include the effects of
noisy real time series, and the related overfitting problem.
Tang et al. (1991) compared neural networks and Box-Jenkins models, using
international airline passenger traffic, domestic car sales and foreign car sales in the U.S.
They studied the influence on the forecasting performance of the amount of data used,
the number of periods for the forecast and the number of input variables. They concluded
that the Box-Jenkins models outperformed the neural net models in short term forecasting.
On the other hand the neural net models outperformed the Box-Jenkins in the long term.
Unfortunately, the stopping criteria and the overfitting problems were not investigated.
Therefore, wrong conclusions may have been reached since the networks studied are very
large, compared with the number of training patterns, and the network could have been
overtrained.
Recently, a large number of studies have tried to apply neural networks to short
term electric load forecasting (El-Sharkawi et al., 1991; Srinivasan et al., 1991; Hwang
& Moon, 1991, among others). Most of the studies used feedforward neural networks
with all the units using nonlinear activation functions (sigmoid or hyperbolic tangent). All
these studies required a large amount of data and had large network structures. The input
232 O. LACHTERMACHER AND J. D. FULLER
data included several types of variables such as weather variables, dummy variables to
represent the day of the week and variables to represent the historic pattern, among
others. Almost all the studies did not discuss the overfitting problem, the relation between
the size of the network and the number of patterns used in the training procedure, the
stopping criteria or how the best structure was found. Most of them just presented the
results without fully explaining how they were obtained. Most concluded that neural
networks performed as well as or better than the traditional methods.
Important work has been done by Weigend (1991) and Weigend et al. (1990,
1991). They introduced the weight-elimination backpropagation learning procedure to deal
with the overfitting problem. They also presented all the relevant information that
characterized the models used in the forecasting of the sunspots and exchange rate time
series. They also discussed the stopping criteria in the validation procedure and compared
the results with traditional time series models.
They concluded that the neural network model performed as well as the TAR
model (Tong, 1983,1990 and Tong & Lim, 1980) in the case of one-step-ahead
prediction. In the case of multi-step prediction the neural net models outperformed the
TAR model. The drawback of the weight-elimination procedure is that it increases the training
time by the inclusion of the penalty term and another dynamic parameter, λ.
Nowlan & Hinton (1992), used an alternative approach called Soft Weight-Sharing. The
results obtained are slightly better than both compared models. Besides the drawback of
increasing the complexity of the modelling process, compared to the weight-elimination
procedure, the authors concluded that the effectiveness of the technique is likely to be
somewhat problem dependent However, 1he aulhors claim 1he advantage of a more sophisticated
model is its ability to better adapt to individual problems.
METHODOLOGY
In this section we describe the hybrid methodology developed for the practical application
of neural networks in time series analysis, the performance analysis used to compare the
neural network models to other types of time series models, and the software and
hardware used in the study. In order to investigate the benefits of the methodology, four
distinct stationary time series were used.
Hybrid Methodology
The neural networks' main drawbacks of large data and time requirements are related to
the fact that they have been data-driven approaches, because no definitive procedure is
available for their use in time series modelling. The general idea of our new hybrid
methodology is to use the available Box-Jenkins methodology as an exploratory procedure
which identifies some important relationships in the data series. Based on the information
produced in the exploratory step, we define an initial structure whose small size decreases
the neural network's modelling time and reduces the number of estimated parameters and
the amount of data required. This overcomes the most important practical drawbacks to
the application of neural networks as a forecasting tool. Furthermore, because the Box-
Jenkins models are linear, we believe that the nonlinearities included in the neural
networks models will help to improve the forecasting performance of the final model.
Modelling Procedure:
The modelling procedure consists basically of two steps. The first is called the
exploratory phase and the second is called the modelling phase. In the exploratory phase,
the general idea is to observe the series in order to identify its basic characteristics. In the
modelling phase, the general idea is to use the information obtained in the first step to
aid the design of the neural network structure and then perform the neural network
modelling. In the following sections we will describe each of the methodology's phases.
Exploratory Phase:
In this research, our exploratory phase (Tukey, 1977; Hipel & McLeod, 1993) consists
of two parts. In the first one, we use the plot of the time series and of the autocorrelation
function to try to identify trends in the mean and in the variance, seasonalities and
outliers of the data series. Based on this information the second part consists of the Box-
Jenkins modelling process. This process consists of the identification, estimation and
diagnostic checking of the appropriate ARMA model (Hipel & McLeod, 1993) for each
time series. The decision of the best model to represent each time series is based on the
Akaike Information Criterion (AIC) developed by Akaike (1974).
Modelling Phase:
An important feature gathered at the exploratory phase is what we call the lag
components of the data series, i.e. the lag entries that are important to forecast a new
element of a given time series in a linear context, as suggested by the autoregressive
parameters of the calibrated Box-Jenkins model. A neural network structure that uses
these lag entries as its input variables should easily be trained to perform the linear
transformation done by the Box-Jenkins models. Furthermore, we expect that part of the
"randomness" term of the linear model is in fact nonlinearity, and it can be learned and
incorporated in the new model by the implicit nonlinearity of the neural network models.
Learning Procedure:
In all the studies performed, we used the backpropagation learning procedure, to train the
networks. The additional momentum term, that on average speeds the process (Hertz et
al. 1991), was also used in most of the studies. There exist several other methods that
speed the training procedure. In order to make it clear that any gains in the speed of
modelling are due to the hybrid methodology, we purposely used only the well known
technique of backpropagation, with a momentum term, and we avoided the use of any
other speeding technique.
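For reference, the momentum variant simply re-applies a fraction of the previous weight change (a sketch; the parameter values are typical defaults, not those of this study):

```python
def momentum_update(w, grad, velocity, eta=0.1, mu=0.9):
    """Gradient-descent step with a momentum term: the previous change is
    reused, scaled by mu, which on average speeds the training process."""
    velocity = mu * velocity - eta * grad
    return w + velocity, velocity
```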
Training/Validation Procedure:
As mentioned before, an important issue in the application of neural networks is the
relation of the size of the network (and so the number of free parameters) to the number
of training examples. Like other methods of function approximation, such as polynomials,
a large number of parameters will allow the network to fit the training data closely.
However, this will not necessarily lead to optimal generalization properties (i.e. good
forecasting ability, in the present context).
Weigend et al. (1990), suggested two methods to deal with the overfitting
problem. The first one, the weight-elimination backpropagation procedure, was mentioned
in the last section. The second one involves providing a large structure which contains
a large number of modifiable weights but stopping the training before the network has
made use of all of its degrees of freedom. As pointed out by Weigend et al. (1990, 1991),
the problem with this procedure is to determine when the network has extracted all useful
information and is starting to incorporate the noise in the model parameters. They
suggested that part of the available data should be separated to serve as a validation set.
The performance on this set should be monitored and the training should be stopped when
the error on the validation set starts to decrease slowly (almost constant) or to increase.
Three problems of the latter procedure were pointed out by Weigend et al. (1990).
The first one is that part of the time series available cannot be used directly in the
training process. In some cases, the available series do not contain more than 40 to 50
elements, so it is impractical to separate one part of the series to be used as a validation
set, because all available data are needed for training and performance evaluation. The
second problem is related to the pair of training set and validation set chosen. The authors
found out, when studying the sunspot series, that the results depend strongly on the
specific pair chosen. The last problem is related to the stopping point. They point out that
is not always clear where the network is starting to learn the noise of the time series.
They proposed the weight-elimination procedure to overcome these problems. However,
the drawback of weight-elimination is the increase of the training time.
Our training/validation procedure is based on Weigend's validation methodology.
However, it deals with the overfitting and learning time problems in a different way. The
general idea is to use the information obtained in the exploratory phase of the
methodology to establish a small network (and so, few parameters) that would learn on
the lag components of the time series. To avoid the need to use real data for validation,
we generate a synthetic time series that possesses the same statistical properties as the original time series, using the Waterloo simulation procedures (Hipel & McLeod, 1993),
to be used as a validation set. The forecasting performance on this validation set is then
monitored as the training proceeds, as suggested by Weigend et al. (1990,1991), and the
stopping decision is based on this performance.
Network Structure:
The initial network structure is sized according to the number of training patterns. As said before, the relation of the number of weights and the
number of patterns is important to guarantee a good generalization performance of the
neural network model. Therefore, given the fact that the validation procedure (Weigend
et al., 1990, 1991) suggests an initial large structure, the number of hidden units is
determined in order to make the number of weights W follow the heuristic rule given by

1.1 (p/10) ≤ W ≤ 3 (p/10),

where
p is the total number of patterns used to train the network,
H is the number of hidden units used in the structure, and
I is the number of input units used in the structure;
for a single-output network with bias terms, W = H(I + 2) + 1 counts the adjustable weights.
This heuristic is based on the relation of one weight to ten patterns suggested in
the literature (Weigend et al., 1991) in order to obtain good generalization properties. The
idea is to have a structure that is 1.1 to 3 times bigger than the relation of one to ten
suggested in the literature.
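A small sketch of this sizing rule (assuming, as in this paper, a single output unit and bias weights, so that a network with I inputs and H hidden units carries H(I + 2) + 1 adjustable weights):

def hidden_units(p, I, factor=2.0):
    """Return H giving about factor * p / 10 weights, with factor in [1.1, 3]."""
    target_weights = factor * p / 10.0
    return max(1, round((target_weights - 1) / (I + 2)))

print(hidden_units(p=60, I=2))   # 60 patterns, 2 lag inputs -> H = 3 (13 weights)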
Parameters' Initialization:
In this paper all the initial sets of weights and biases were randomly generated from a
uniform distribution in the range from -0.25 to 0.25. The same range was used by
Weigend et al. (1990,1991) in their studies of neural network modelling in order to avoid
bad solutions.
Performance Criterion:
Several performance criteria have been used in neural network time series modelling.
Weigend et al. (1990,1991), Weigend (1991) and Nowlan & Hinton (1992) used the
Average Relative Variance (ARV). Gorr et al. (1992) used the Mean Error (ME), the
Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). Coakley et al.
(1992) studied several alternative performance measures such as the Mean Squared Error
(MSE), the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error
(MAPE). In this study we used MSE as the cost function for backpropagation, and RMSE
to determine the stopping point of training and to evaluate forecasting performance.
Patterns Normalization:
Because of the output range of the sigmoid unit (zero to one), the patterns used to train the neural network should be normalized. Several normalization procedures are described in the literature. The most common one is to divide all the values by a number that is larger than the largest value present in the studied time series. In this paper, this number is referred to as the normalization factor (NF). This procedure was used by Weigend et al. (1991) in their studies of the sunspot time series.
In the case of stationary series, sometimes we want to forecast values which might
be outside the historical range. Therefore we choose a normalization factor that ranges
from 30% to 100% larger than the maximum value in the historical training patterns. The
value varies according to visual inspection of the plot of the series (its variance), which was done during the exploratory phase of the methodology.
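For illustration, the normalization step might be coded as follows (a sketch; the 30% margin is one of the values discussed above):

import numpy as np

def normalize(z, margin=0.3):
    """Scale z into (0, 1) using NF = (1 + margin) * max(z), margin in [0.3, 1.0]."""
    nf = (1.0 + margin) * np.max(z)
    return z / nf, nf

z = np.array([12.0, 35.0, 20.0, 28.0])
z_norm, nf = normalize(z)
print(nf, z_norm)     # forecasts are mapped back by multiplying by NF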
Stopping Criterion:
The stopping criterion used in our methodology is similar to Weigend et al.'s (1991)
validation procedure. During the training process we monitor the performance of the
validation set (synthetic time series) using the RMSE criterion. Gorr et al. (1992) used the RMSE in their validation process, but the authors used part of the original series as the validation set. These measurements are made in the original range of the time series. The general idea is that when the RMSE starts to increase, or to decrease very slowly, the process should be stopped in order to avoid overlearning and deterioration of the generalization properties of the resultant model. Furthermore, we observed during our study that a better criterion is to observe the behaviour of the validation and training sets together. We noticed that when the training is starting to overlearn, the trends of the
validation RMSE and training RMSE plots are in opposite directions. This was observed
in most of the problems studied.
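Schematically, the stopping rule can be expressed on stored RMSE curves as follows (toy numbers, purely for illustration):

def stopping_epoch(trn_curve, val_curve):
    """Epoch at which validation RMSE turns up while training RMSE still falls."""
    for e in range(1, len(val_curve)):
        if val_curve[e] > val_curve[e - 1] and trn_curve[e] < trn_curve[e - 1]:
            return e
    return len(val_curve)

trn_curve = [1.0 / (e + 1) for e in range(100)]             # keeps improving
val_curve = [abs(e - 30) / 30.0 + 0.5 for e in range(100)]  # turns up at epoch 30
print("stop training at epoch", stopping_epoch(trn_curve, val_curve))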
Sensitivity Analysis:
In order to check whether the neural network structure is adequate, an additional input
unit is included in the initial structure suggested by the calibrated Box-Jenkins models.
If the results do not present significant improvement, the first model is maintained. If the
model with the extra input unit is preferred, then a new unit is added to the input layer,
and the process is repeated. Additional hidden units are also tested. To be consistent with
our aim to minimize data requirements, we compare "forecast" performance by RMSEs on the synthetic data. This kind of procedure has been used by Hipel & McLeod (1993) in order to test the ARMA model in relation to the overfitting problem.
Performance Analysis
In this section we describe the types of time series studied, and the types of prediction
performed in each case.
Types of Time Series:
The time series studied are four river flow series, each containing a relatively small number of elements. These series were chosen because they were previously studied by Hipel &
McLeod (1993) in a comparison study among the ARMA model and several other time
series models. In order to obtain a fair comparison of the neural network models
developed and the results obtained by Hipel & McLeod (1993), the same conditions
(modelling and prediction data sets) were used to train the neural networks.
Types of Prediction:
The forecasting abilities of the neural network models were tested over a period of up to 30 years, depending on the size of the studied time series. Two types of prediction
were performed, the one step ahead and the multi-step ahead.
In one step ahead prediction, only the actual observed time series is utilized as the
input of the models (neural networks and Box-Jenkins) to forecast the next entry of the
series. Each new forecasted entry is thus independent of the previous forecasts.
Two types of multi-step ahead prediction are reported in the neural network
literature. The first type is called iterated multi-step prediction. In this type, in order to
obtain several steps ahead of prediction, the results obtained in the model are used in the
subsequent forecasts. In this case, the network has just one unit in the output layer, which
represents the next forecast entry of the time series. This forecasting technique is similar
to the one used by the Box-Jenkins models.
In the second type of multi-step ahead prediction, the neural network contains
several units at the output layer of the structure, each representing one step to be
forecasted. However, this type of structure requires a larger time series than the first type
of multi-step prediction in order to avoid generalization problems. Since the goal of this
research is to develop a methodology that uses small real world time series, this technique
was not used. Instead, we used iterated multi-step predictions.
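A sketch of the iterated scheme (predict_one stands in for the trained single-output network; here it is a toy one-step rule):

def iterated_forecast(history, predict_one, steps):
    """Forecast `steps` values, feeding each forecast back in as an input."""
    window = list(history)
    out = []
    for _ in range(steps):
        z_next = predict_one(window)
        out.append(z_next)
        window.append(z_next)   # the forecast becomes the newest "observation"
    return out

predict_one = lambda w: 0.6 * w[-1]          # hypothetical one-step model
print(iterated_forecast([1.0, 2.0], predict_one, steps=4))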
Computational Resources
In the case of the neural networks models, we used the Hybrid Backpropagation (HBP)
software developed by Lachtermacher (1991, 1993). The software was rewritten in Turbo-Pascal® utilizing Turbo Vision® (an OOP library) to increase the processing speed of the software. The training was done on a 50 MHz 486 PC compatible. The training time of
a model ranged from 30 minutes to 3 hours depending on the number of training patterns
and the structure used.
RESULTS
Hipel & McLeod (1993) compared the performance of the ARMA models against several
stationary models, using the river flow time series. We expand this study with the
inclusion of the results of the calibrated neural network models, using the same modelling
and prediction sets. All the time series models are well described in Hipel & McLeod
(1993). Table I summarizes the results obtained.
Table II. One step ahead prediction: ranking summary of the time series models' RMSE comparison (Table I).

Model            Rank: 1   2   3   4   5   6   Rank Sum
ARMA                   0   2   0   1   0   1      14
FGN                    0   0   2   1   1   0      15
FDIFF                  0   0   0   0   2   2      22
Markov                 1   1   0   0   1   1      14
Nonparametric          1   1   1   1   0   0      10
Neural Nets            2   0   1   1   0   0       9
Table II presents a ranking summary of the results described in Table I. The rank
sum is simply the sum of the product of the rank and the number of times the model
received that rank. Therefore, models with low rank sums forecast better overall than
models with higher rank sums (Hipel & McLeod, 1993).
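For instance, the rank sums in Table II can be reproduced as:

counts = {"ARMA": [0, 2, 0, 1, 0, 1], "Neural Nets": [2, 0, 1, 1, 0, 0]}
for model, c in counts.items():
    rank_sum = sum(rank * n for rank, n in zip(range(1, 7), c))
    print(model, rank_sum)     # ARMA 14, Neural Nets 9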
As can be seen from Table II, the calibrated neural network models presented the
best overall performance in the one step ahead predictions, considering the four river flow
studies. Furthermore, in all studies performed using stationary series, at least one neural
network model outperformed the corresponding ARMA model in the one step ahead
predictions.
We should note that the differences in the performances between the calibrated
neural network models and the corresponding ARMA models were very small and may
not justify the additional work to achieve them.
The hybrid methodology proved to be a good approach to forecast stationary series
well, reducing the time used to calibrate the neural network model to about two hours,
typically. Moreover, the use of the synthetic time series as the validation set proved to
be an efficient way to decrease the neural network data requirements in this type of time
series.
An important observation is that the performance of the neural networks' one step ahead predictions deteriorates with the inclusion of additional hidden and/or input units.
The deterioration is more influenced by the inclusion of an unnecessary input unit than
by a hidden one. Therefore, more care should be taken in the determination of the
relevant number of input units than in setting the number of hidden units used in the
network structure.
In the case of the multi-step prediction (Table III), both models have almost the
same performance. Another important fact was noted in this type of prediction. We
observed that the performance of the neural networks improves with the increase of the
number of estimated parameters of the model (by increasing the number of hidden or
input units) in all studies. However, there is not yet an exact explanation of these facts.
CONCLUSIONS
The hybrid methodology proposed in this paper consists of two phases: Exploratory and Modelling. In the exploratory phase we identify the 'lag components' of the series
using the traditional Box-Jenkins methods. Based on these results a small structure is
suggested and then the training is performed. In the following paragraphs we describe the
conclusions reached in this study.
Synthetic Data
In most of the cases, the synthetic data seems to mimic the original data sufficiently well
to suggest an appropriate stopping point, to avoid overfitting. Furthermore, the use of
synthetic data in the sensitivity analysis procedure usually pointed to the model with the
best overall performance in both types of prediction tested.
Future Research
Further research should be conducted to tailor the hybrid methodology to other types of time series, such as seasonal and nonstationary time series. In this, some attention should be given to the use of traditional methods, such as moving averages and deseasonalization factors. Furthermore, the application of the hybrid methodology to multivariate time series should also be studied.
In addition, the methodology should be tested on more time series in order to
verify the initial results obtained for the stationary time series. Moreover, a special study
should be done to establish a procedure to better identify the adequate normalization
factor(s) to be used in the forecasting process.
REFERENCES
Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions
on Automatic Control, AC-19, 6, 716-723.
Box, G. E. P., Jenkins, G. M. (1976) Time series analysis: forecasting and control,
Holden-Day, Inc. Oakland, California.
Chakraborty, K., Mehrotra, K., Mohan, C. K., Ranka, S. (1992) Forecasting the behavior of multivariate time series using neural networks, Neural Networks, 5, 961-970.
Coakley, J. R., McFarlane, D. D., Perley, W. G. (1992) Alternative criteria for evaluating artificial neural network performance, presented at TIMS/ORSA Joint National Meeting, April.
El-Sharkawi, M. A., Oh, S., Marks, R. J., Damborg, M. J., Brace, C. M. (1991) Short term electric load forecasting using an adaptively trained layered perceptron, in Proceedings of the First Forum on Application of Neural Networks to Power Systems, 3-6, Seattle, Washington.
Gorr, W., Nagin, D., Szczypula, J. (1992) The relevance of artificial neural networks to managerial forecasting: an analysis and empirical study, Technical Report 93-1, Heinz School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, PA, USA.
Hertz, John, Krogh, Anders, Palmer, Richard G. (1991) Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Co., Don Mills, Ontario, 1-8 and 89-156.
Hipel, K.W., McLeod, A.I. (1993) Time series modelling of water resources and environmental systems, to be published by Elsevier, Amsterdam, The Netherlands.
Lachtermacher, G. (1991) A fast heuristic for backpropagation in neural networks,
Master's Thesis, Department of Management Sciences, University of Waterloo,
Waterloo, Ontario, Canada.
Lachtermacher, G. (1993) Backpropagation in Time Series Analysis, Ph.D Thesis,
Department of Management Sciences, University of Waterloo, Waterloo, Ontario,
Canada.
Lapedes, A., Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modelling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory.
Lapedes, A., Farber, R. (1988) How neural nets work, in Neural Information Processing Systems, ed. Dana Z. Anderson, 442-456, American Institute of Physics, New York.
Nowlan, S. J., Hinton, G. E. (1992) Simplifying neural networks by soft weight-sharing,
Neural Computation, 4, 473-493.
Rumelhart, David E., McClelland, James L. and The PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, Massachusetts, USA.
Srinivasan, D., Liew, A.C., Chen, J. S. P. (1991) Short term forecasting using neural
network approach, in Proceedings of the First Forum on Application of Neural
Networks to Power Systems, 12-16, Seattle, Washington.
Tang, Z., Almeida, C., Fishwick, P.A. (1991) Time series forecasting using neural networks vs. Box-Jenkins methodology, Simulation, 303-310, Simulation Councils, Inc., November.
Tong, H., Lim, K. S. (1980) Threshold autoregression, limit cycles and cyclical data, Journal of the Royal Statistical Society, Series B, 42, 3, 245-292.
Tong, H. (1983) Threshold Models in Non-linear Time Series Analysis, Lecture Notes in Statistics, eds. D. Brillinger, S. Fienberg, J. Gani, J. Hartigan and K. Krickeberg, Springer-Verlag, New York, N.Y., USA.
Tukey, J. W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading, Massachusetts, USA.
Weigend, A. S. (1991) Connectionist architectures for time series prediction of dynamical
systems, PhD Thesis, Department of Physics, Stanford University, University
Microfilms International, Ann Arbor, Michigan.
Weigend, A. S., Rumelhart, D. E., Huberman, B.A. (1990) Predicting the future: a
connectionist approach, International Journal of Neural Systems, 1, 3, 193-209.
Weigend, A. S., Rumelhart, D. E., Huberman, B.A. (1991) Back-propagation, weight-
elimination and time series prediction, in Connectionist Models - Proceedings of
the 1990 Summer School, Edited by D.S.Touretzky, J.L.Elman, T.J.Sejnowski,
G.E.Hinton, Morgan Kaufmann Publishers, Inc.
PART V
TREND ASSESSMENT
TESTS FOR MONOTONIC TREND
The monotonic trend tests proposed by Mann (1945), Abelson and Tukey (1963) and
Brillinger (1989) are reviewed and evaluated. Simulation experiments using a very large number of replications are carried out for comparing the Abelson-Tukey and
Mann-Kendall tests. The advantages and disadvantages of each test are discussed
and the practical implementation and usefulness of these test procedures are clearly
demonstrated with some applications to environmental data.
INTRODUCTION
Recently, Brillinger (1989) proposed a new test for monotonic trend. The important
advantage of this test over the Mann-Kendall test (Mann, 1945) is its validity in
the presence of an autocorrelated error component. Brillinger demonstrated that in
this case his test would have asymptotic power equal to one whereas other tests,
which do not take into account the autocorrelation of the error component, would by
comparison have an asymptotic relative efficiency of zero.
For the situation where the errors from the monotonic trend can be assumed to
be statistically independent, the method of Brillinger (1989) may be replaced with a
test originally developed by Abelson and Tukey (1963). It is of interest to compare
the power of the Mann-Kendall and Abelson-Tukey tests since there are important
examples where the errors appear to be independent and identically distributed white
noise. However, it should be noted that if it is the case that the error component is
significantly correlated, then both of these tests would be expected to perform very
poorly relative to Brillinger's trend test.
In the next section, the Brillinger trend test and its practical implementation
details are outlined. Then, in the subsequent two sections the Abelson-Tukey and
Mann-Kendall tests are briefly summarized. Power comparisons of the Abelson-Tukey
and Mann-Kendall tests demonstrate the usefulness of these tests.
Figure 1. Plot of coefficient Ct in (3) against t for the Brillinger trend test.
Z_B = Σ c_t z_t / est.sd.(Σ c_t z_t),  (2)

where

c_t = √( t(1 − t/n) ) − √( (t + 1)(1 − (t + 1)/n) ),   t = 0, 1, ..., n − 1,  (3)
and n is the length of the series. Under the null hypothesis of no trend, the statistic
ZB is asymptotically normally distributed with mean 0 and variance 1. Large values
of IZBI indicate the null hypothesis is untenable and hence there is the possibility
of a trend in the series. The trend is increasing or decreasing according as ZB is
> 0 or < 0, respectively. A plot of c_t versus t for n = 100 is shown in Figure 1. Notice how the function c_t contrasts the values between the two ends of the series, so values near the beginning are given weights close to −1 while those near the other end are given weights near +1. The function c_t was originally derived by Abelson and Tukey (1963) for testing the case where the error component η_t is assumed to be independently normally distributed with mean zero and constant variance. This will
be referred to as the Gaussian white noise case. It can be shown that
var(Σ c_t z_t) ≈ 2π f_η(0) Σ c_t²,  (4)

where f_η(0) denotes the spectral density function of the error component evaluated at 0.
In order to estimate f_η(0), it is first necessary to estimate η_t. Assuming there are no outliers, an estimate of the trend component is given by the running average of order V defined as

S_t = (1/(2V + 1)) Σ_{i=−V}^{V} z_{t+i}.  (5)
The estimated error component is then η̂_t = z_t − S_t for t = V + 1, ..., n − V, and the spectral density of the error component at frequency zero is estimated from the smoothed periodogram

f̂_η(0) = (1/L) Σ_{j=1}^{L} |ε_j|² / ( 2πn (1 − a_j)² ),  (6)

where

ε_j = Σ_{t=V+1}^{n−V} η̂_t exp{ −2πijt/n },  (7)

with i = √(−1), and

a_j = sin{ b_j(2V + 1) } / ( (2V + 1) sin(b_j) ),   b_j = πj/n.  (8)
The parameter L determines the degree of smoothing of the periodogram component. A plot of the periodogram of the estimated autocorrelated error component, η̂_t, showing the bandwidth corresponding to L is suggested by Brillinger (1989) to aid in choosing L. As with the choice of V, a suitable selection of L is essential to obtain a reasonably good estimate of f_η(0).
Finally,

est.sd.(Σ c_t z_t) = √( 2π f̂_η(0) Σ c_t² ).  (9)
In practice, the Fourier transform ε_j may either be computed using the Discrete Fourier Transform (DFT) or the Fast Fourier Transform (FFT). If the FFT is employed, the series is padded with zeros at both ends until it is of length n′ = 2^p, where p = [log₂(n)] + 1 and [·] denotes the integer part. To avoid leakage, especially when the FFT is used, Tukey (1967) and Brillinger (1981, p. 54) recommend that data tapering be used. Tukey's split cosine bell data taper (Bloomfield, 1976, p. 84) involves multiplying the series η̂_t by the cosine tapering function u_t, where

u_t = ½ ( 1 − cos( π(t − ½)/ℓ ) ),   for t = 1, ..., ℓ,  (10)

with a corresponding taper applied at the other end of the series, to form the tapered series η̂*_t = η̂_t u_t. The Fourier transform for the tapered series is then evaluated. The percentage of data tapered, say r, is then r = 200ℓ/n′. Tukey recommends choosing r = 10 or 20. Hurvich (1988) suggests a data based method for choosing the amount of tapering to be done.
The choice of the parameters V, L and r is very important in the application of Brillinger's test, since a poor selection of these parameters may result in a completely meaningless test result. We have found it helpful to practice with simulated time series data in order to develop a better feel for how these parameters should be chosen.
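Purely to illustrate the mechanics, a Python sketch of the whole procedure under the reconstructions above is given below; V, L and the taper fraction are illustrative choices, and the small filter correction (1 − a_j)² of equation (6) is omitted for brevity:

import numpy as np

def brillinger_zb(z, V=8, L=5, taper_frac=0.10):
    n = len(z)
    t = np.arange(n)
    # Contrast coefficients of equation (3).
    c = np.sqrt(t * (1 - t / n)) - np.sqrt((t + 1) * (1 - (t + 1) / n))
    # Running average of order V, equation (5); residuals estimate the error.
    trend = np.convolve(z, np.ones(2 * V + 1) / (2 * V + 1), mode="same")
    resid = (z - trend)[V:n - V]               # drop the edge values
    # Split cosine bell taper over a fraction of each end, equation (10).
    m = len(resid)
    ell = max(1, int(taper_frac * m / 2))
    u = np.ones(m)
    ramp = 0.5 * (1 - np.cos(np.pi * (np.arange(1, ell + 1) - 0.5) / ell))
    u[:ell], u[-ell:] = ramp, ramp[::-1]
    # Smoothed periodogram near frequency zero, cf. equations (6)-(7).
    eps = np.fft.fft(resid * u)
    f0 = np.mean(np.abs(eps[1:L + 1]) ** 2 / (2 * np.pi * m))
    # Standardized statistic, equations (2) and (9).
    return np.sum(c * z) / np.sqrt(2 * np.pi * f0 * np.sum(c ** 2))

rng = np.random.default_rng(1)
series = 0.02 * np.arange(100) + rng.standard_normal(100)  # trend + white noise
print("Z_B =", round(brillinger_zb(series), 2))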
ABELSON-TUKEY TEST
In this case, under the null hypothesis of no trend, the test statistic may be written as

Z_A = Σ c_t z_t / √( (Σ c_t²)(Σ (z_t − z̄)²)/n ),  (11)

where z̄ = Σ z_t / n. Under the null hypothesis of no trend, the statistic Z_A is asymptotically normally distributed with mean 0 and variance 1. Large values of |Z_A| indicate the null hypothesis is untenable and hence there is the possibility of a trend in the series. The trend is increasing or decreasing according as Z_A is > 0 or < 0, respectively.
THE MANN-KENDALL TREND TEST
The Mann-Kendall trend test is derived by computing the Kendall rank correlation
(Kendall, 1975) between z_t and t (Mann, 1945). The Mann-Kendall trend test as-
sumes that under the null hypothesis of no trend, the time series is independent and
identically distributed. Since the rank correlation is a measure of the monotonic
relationship between z_t and t, the Mann-Kendall trend test would be expected to
have good power properties in many situations. Unlike the Brillinger test, one is not
restricted to having consecutive equi-spaced observations. Thus, the observed series
may be measured at irregularly spaced points in time. However, one can assume the
previous notation, where z_t is interpreted as the t-th observation in the data series and t = 1, ..., n.
In the general case where there may be multiple observations at the same time point, producing ties in t, and there may also be ties in the observations z_t, the Mann-Kendall score is given by

S = Σ_{s<t} sign( (z_t − z_s)(t − s) ).  (12)
This situation arises in water quality data due to repeated uncorrelated samples taken at the same time and the limited accuracy of the observations. Under the null hypothesis of no trend, the expected value of S is zero, while increasing or decreasing monotonic trends are indicated when S > 0 or S < 0, respectively. Valz et al. (1994a) present improved approximations to the null distribution of S in the case of ties in both rankings, as well as an exact algorithm to compute its significance levels (Valz et al., 1994b). In the case of no ties, the score may be written as

S = 2P − n(n − 1)/2,  (13)

where P is the number of times that z_{t₂} > z_{t₁} for all t₁, t₂ = 1, ..., n such that t₂ > t₁.
Under the null hypothesis all pairs are equally likely, so Kendall's rank correlation coefficient, which is defined in the case of no ties as

τ = S / ( n(n − 1)/2 ),  (14)

can be written as

τ = 2π_c − 1,  (15)

where π_c is the relative frequency of positive concordance (i.e., the proportion of time for which z_{t₂} > z_{t₁} when t₂ > t₁).
In the case where there are no ties in either ranking, it is known (Kendall, 1975, p. 51) that under the null hypothesis, the distribution of S may be well approximated by a normal distribution with mean zero and variance

var(S) = n(n − 1)(2n + 5) / 18.  (16)
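A compact Python sketch of the score S and its normal approximation for the no-ties case:

import numpy as np

def mann_kendall(z):
    """Score S of equation (12) and its standardization via equation (16)."""
    n = len(z)
    d = np.sign(z[:, None] - z[None, :])
    s = np.sum(np.tril(d, k=-1))        # sum of sign(z_t - z_s) over t > s
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    return s, s / np.sqrt(var_s)

rng = np.random.default_rng(3)
z = 0.05 * np.arange(50) + rng.standard_normal(50)
s, z_stat = mann_kendall(z)
print("S =", int(s), "normal statistic =", round(z_stat, 2))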
POWER COMPARISONS
The power functions at a 5% level of significance, denoted by β_MK and β_AT for the Mann-Kendall and Abelson-Tukey tests, respectively, are estimated for various forms of the basic trend model

z_t = f(t) + a_t,  (17)

where z_t is the observed time series, f(t) represents a monotonic trend component and a_t represents an error component which is independent and identically distributed. Series of length n = 10, 20, 50 and 100 are generated one million times for each of a variety of trend models and error component distributions. The proportion of times that the null hypothesis of no trend is rejected gives estimates for β_MK and β_AT. Thus, the maximum standard error in the estimated power functions is 10⁻³/2 = 0.0005. Consequently, one may expect that the power probabilities may differ by at most one digit in the third decimal from the true exact result most of the time.
Three models for trend are examined. The first trend model is a linear trend, so f(t) = λt. In this case, it is known that the Mann-Kendall trend test is nearly optimal, since when a_t is Gaussian white noise the Mann-Kendall trend test has 98% asymptotic relative efficiency with respect to the optimal estimator, which is linear regression (Kendall and Stuart, 1968, §45.25). In the second model, f(t) is taken to
be the step function
f(t) = 0, if t ≤ n/2,
     = λ, if t > n/2.  (18)
Step functions such as this are often used in intervention analysis modelling (see Hipel and McLeod (1994, Ch. 19) for detailed descriptions and applications of various types of intervention models). A priori, it would be hoped that both the Abelson-Tukey and Mann-Kendall trend tests should perform well for step functions. In the third model, f(t) = λc_t, where c_t is defined in equation (3). For this model, the Abelson-Tukey procedure is optimal when a_t is Gaussian white noise. The value of the parameter λ in these trend models is set to λ = a√(10/n), where a = 0.01, 0.04, 0.07, 1.0.
Two models for the error component distribution are used. The first is the normal distribution with mean zero and variance σ², while the second is a scaled contaminated normal distribution, φ_c(z),

φ_c(z) = (1 − p) φ(z/σ₀)/σ₀ + p φ(z/(σ_c σ₀))/(σ_c σ₀),   σ₀ = σ / √(1 − p + p σ_c²),  (19)

where σ_c = 3 and p = 0.1. The scaling ensures that the distribution has variance equal to σ². These particular parameters are suggested by Tukey (1960) and have been previously used in many simulation studies. The reason for σ_c = 3 is that Tukey (1960) found that there are many datasets occurring in practice where this choice was suggested. We choose p = 0.1 since for this choice the variance contribution from both distributions is equal and so the contamination effect is largest. In previous simulation studies (see Tukey (1960)), it is found that this choice produces the greatest effect when a non-robust estimator is compared to a robust one. We take σ = 0.5, 1, 2, 4 in both the normal and contaminated normal cases.
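A condensed Python sketch of the power estimation loop for the Mann-Kendall test under a linear trend with contaminated normal errors (2000 replications here, rather than the one million used in the study):

import numpy as np

def contaminated_normal(rng, size, sigma, p=0.1, sigma_c=3.0):
    """Draw from the scaled contaminated normal of (19); variance equals sigma^2."""
    s0 = sigma / np.sqrt(1 - p + p * sigma_c ** 2)
    scale = np.where(rng.random(size) < p, sigma_c * s0, s0)
    return scale * rng.standard_normal(size)

def mk_power(lam, sigma, n=50, reps=2000, crit=1.96, seed=0):
    """Estimate beta_MK for the linear trend f(t) = lam * t."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    rejections = 0
    for _ in range(reps):
        z = lam * t + contaminated_normal(rng, n, sigma)
        d = np.sign(z[:, None] - z[None, :])
        s = np.sum(np.tril(d, k=-1))
        rejections += abs(s) / np.sqrt(var_s) > crit
    return rejections / reps

print("estimated power:", mk_power(lam=0.05, sigma=1.0))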
Data are generated by applying the Box-Muller transformation (Box and Muller, 1958) to uniform (0,1) pseudo-random variables generated by Superduper (Marsaglia, 1976). The tests are applied to the same data series, but a different set of random numbers is used for every model and parameter setting.
The simulation results are presented in Tables 1.1a to 3.2b. As previously noted, due to the very large number of simulations, all results presented in these tables are essentially exact to the number of decimal places given. Tables 1.1a and 1.1b show the results for a simple linear trend with Gaussian white noise. For comparison of β_MK and β_AT, one can look at their ratios, β_MK/β_AT, as well as their absolute magnitudes. These ratios vary from 0.93 to 1.56. As might be expected, in no case is the Abelson-Tukey test substantially better than the Mann-Kendall test, whereas there are many instances, especially for series length 100, where the Mann-Kendall test is much better. This conclusion also applies to Tables 2.1a and 2.2b. In Tables 1.1a through 2.2b, the only situations where the Abelson-Tukey test has larger power is when the null hypothesis is either true or very nearly true, so the power function is really just reflecting the probability of a Type I error.
For step functions, the results are shown in Tables 2.1a through 2.2b. For longer series lengths, shown in Tables 2.1b and 2.2b, the Mann-Kendall test dominates, since the only times where the Abelson-Tukey test has larger power is when the null hypothesis is true, and even in these cases the Mann-Kendall test is better since its probability of Type I error is closer to the nominal 5% level. For smaller samples, shown in Tables 2.1a
n     a     σ     β_MK     β_AT     β_MK/β_AT
10 0.00 0.50 0.060 0.056 1.07
10 0.00 1.00 0.060 0.056 1.07
10 0.00 2.00 0.060 0.056 1.07
10 0.00 4.00 0.060 0.055 1.08
10 0.50 0.50 0.184 0.143 1.28
10 0.50 1.00 0.086 0.081 1.06
10 0.50 2.00 0.062 0.062 1.00
10 0.50 4.00 0.058 0.058 1.01
10 1.00 0.50 0.468 0.307 1.52
10 1.00 1.00 0.184 0.143 1.29
10 1.00 2.00 0.086 0.080 1.07
10 1.00 4.00 0.062 0.062 1.00
10 2.00 0.50 0.686 0.563 1.22
10 2.00 1.00 0.469 0.307 1.53
10 2.00 2.00 0.184 0.143 1.29
10 2.00 4.00 0.086 0.080 1.07
10 5.00 0.50 0.694 0.907 0.77
10 5.00 1.00 0.693 0.653 1.06
10 5.00 2.00 0.575 0.383 1.50
10 5.00 4.00 0.253 0.182 1.39
20 0.00 0.50 0.050 0.053 0.95
20 0.00 1.00 0.050 0.053 0.95
20 0.00 2.00 0.050 0.053 0.94
20 0.00 4.00 0.050 0.053 0.95
20 0.50 0.50 0.221 0.148 1.49
20 0.50 1.00 0.090 0.078 1.16
20 0.50 2.00 0.060 0.059 1.00
20 0.50 4.00 0.052 0.054 0.96
20 1.00 0.50 0.635 0.376 1.69
20 1.00 1.00 0.221 0.148 1.49
20 1.00 2.00 0.090 0.078 1.16
20 1.00 4.00 0.059 0.059 1.00
20 2.00 0.50 0.978 0.797 1.23
20 2.00 1.00 0.634 0.376 1.69
20 2.00 2.00 0.221 0.149 1.49
20 2.00 4.00 0.090 0.078 1.16
20 5.00 0.50 0.994 1.000 0.99
20 5.00 1.00 0.991 0.906 1.09
20 5.00 2.00 0.802 0.501 1.60
20 5.00 4.00 0.315 0.197 1.60
256 A. I. MCLEOD AND K. W. HlPEL
n     a     σ     β_MK     β_AT     β_MK/β_AT
10 0.00 0.50 0.062 0.073 0.85
10 0.00 1.00 0.071 0.059 1.20
10 0.00 2.00 0.056 0.068 0.82
10 0.00 4.00 0.059 0.060 0.98
10 0.50 0.50 0.236 0.147 1.61
10 0.50 1.00 0.109 0.105 1.04
10 0.50 2.00 0.083 0.084 0.99
10 0.50 4.00 0.042 0.062 0.68
10 1.00 0.50 0.519 0.330 1.57
10 1.00 1.00 0.224 0.153 1.46
10 1.00 2.00 0.103 0.098 1.05
10 1.00 4.00 0.065 0.084 0.77
10 2.00 0.50 0.701 0.614 1.14
10 2.00 1.00 0.505 0.309 1.63
10 2.00 2.00 0.219 0.167 1.31
10 2.00 4.00 0.104 0.091 1.14
10 5.00 0.50 0.689 0.901 0.76
10 5.00 1.00 0.695 0.685 1.01
10 5.00 2.00 0.588 0.395 1.49
10 5.00 4.00 0.314 0.211 1.49
20 0.00 0.50 0.041 0.049 0.84
20 0.00 1.00 0.049 0.068 0.72
20 0.00 2.00 0.054 0.061 0.89
20 0.00 4.00 0.054 0.067 0.81
20 0.50 0.50 0.311 0.173 1.80
20 0.50 1.00 0.094 0.084 1.12
20 0.50 2.00 0.066 0.074 0.89
20 0.50 4.00 0.058 0.057 1.02
20 1.00 0.50 0.721 0.386 1.87
20 1.00 1.00 0.285 0.164 1.74
20 1.00 2.00 0.111 0.087 1.28
20 1.00 4.00 0.068 0.073 0.93
20 2.00 0.50 0.966 0.811 1.19
20 2.00 1.00 0.748 0.409 1.83
20 2.00 2.00 0.287 0.158 1.82
20 2.00 4.00 0.103 0.089 1.16
20 5.00 0.50 0.992 0.994 1.00
20 5.00 1.00 0.974 0.907 1.07
20 5.00 2.00 0.865 0.533 1.62
20 5.00 4.00 0.412 0.206 2.00
258 A. I. MCLEOD AND K. W. HlPEL
n     a     σ     β_MK     β_AT     β_MK/β_AT
50 0.00 0.50 0.044 0.053 0.83
50 0.00 1.00 0.046 0.065 0.71
50 0.00 2.00 0.055 0.058 0.95
50 0.00 4.00 0.057 0.063 0.90
50 0.50 0.50 0.350 0.161 2.17
50 0.50 1.00 0.124 0.080 1.55
50 0.50 2.00 0.062 0.081 0.77
50 0.50 4.00 0.065 0.055 1.18
50 1.00 0.50 0.852 0.418 2.04
50 1.00 1.00 0.356 0.153 2.33
50 1.00 2.00 0.113 0.074 1.53
50 1.00 4.00 0.080 0.076 1.05
50 2.00 0.50 1.000 0.897 1.11
50 2.00 1.00 0.807 0.365 2.21
50 2.00 2.00 0.311 0.126 2.47
50 2.00 4.00 0.123 0.076 1.62
50 5.00 0.50 1.000 1.000 1.00
50 5.00 1.00 1.000 0.968 1.03
50 5.00 2.00 0.958 0.537 1.78
50 5.00 4.00 0.496 0.220 2.25
100 0.00 0.50 0.049 0.050 0.98
100 0.00 1.00 0.054 0.048 1.12
100 0.00 2.00 0.052 0.060 0.87
100 0.00 4.00 0.044 0.072 0.61
100 0.50 0.50 0.365 0.147 2.48
100 0.50 1.00 0.114 0.073 1.56
100 0.50 2.00 0.051 0.067 0.76
100 0.50 4.00 0.054 0.067 0.81
100 1.00 0.50 0.880 0.387 2.27
100 1.00 1.00 0.350 0.123 2.85
100 1.00 2.00 0.121 0.083 1.46
100 1.00 4.00 0.068 0.062 1.10
100 2.00 0.50 1.000 0.899 1.11
100 2.00 1.00 0.869 0.357 2.43
100 2.00 2.00 0.363 0.135 2.69
100 2.00 4.00 0.122 0.068 1.79
100 5.00 0.50 1.000 1.000 1.00
100 5.00 1.00 1.000 0.976 1.02
100 5.00 2.00 0.970 0.570 1.70
100 5.00 4.00 0.508 0.193 2.63
and 2.2a, it generally holds true that the Mann-Kendall test outperforms the Abelson-Tukey test, although one curious exception occurs. In particular, when n = 10, a = 5 and σ = 0.5, we have β_MK = 0.694 and β_AT = 0.907 in the Gaussian white noise case and β_MK = 0.689 and β_AT = 0.901 in the contaminated Gaussian white noise case.
Finally, for the trend function based on the Abelson-Tukey contrast, one can see that the Abelson-Tukey method is better in almost all cases, as should be expected.
The difference in power is roughly comparable to the differences one can see in Ta-
bles 1.1a through 2.1b. More generally, in situations when it is not known where the
monotonic trend commences, the Abelson-Tukey contrast may be expected to out-
perform the Mann-Kendall test. On the basis of these simulations, one can conclude
that both tests seem to perform reasonably well. In actual applications, it would be
reasonable to use either or both tests.
ILLUSTRATIVE APPLICATIONS
All datasets discussed in this section are available by e-mail from the statlib archive by sending the following e-mail message: send hipel-mcleod from datasets to statlib@temper.stat.cmu.edu. Additionally, the graphical output and analytical results are generated using the decision support system called the McLeod-Hipel Time Series (MHTS) package (McLeod and Hipel, 1994a, b).
A time series plot of the estimated total annual precipitation for 1900-1986 for the Great Lakes is shown in Figure 2, along with Cleveland's robust LOESS smooth curve (Cleveland, 1979). As shown by this smooth, there is an apparent upward trend. An autocorrelation plot, cumulative periodogram analysis and a normal probability plot of the residuals from the trend curve shown in Figure 2 are shown in Figures 3, 4 and 5, respectively. These figures suggest that the data could be modelled as Gaussian white noise. One can consult Hipel and McLeod (1994, Ch. 7) for details on the interpretation of these plots. In order to test the statistical significance of the apparent upward trend, one may use either the Mann-Kendall or the Abelson-Tukey methods. These methods yield test statistics τ = 0.2646 and Z_A = 3.22, with two-sided significance levels of 2.9 × 10⁻⁴ and 3.3 × 10⁻³, respectively. Therefore, the null hypothesis of no trend is strongly rejected.
As a matter of interest, neither the autocorrelation plot nor the cumulative pe-
riodogram for the original data detect any statistically significant non-whiteness or
autocorrelation. Hipel et al. (1983, 1994, Section 23.4) demonstrate that for mono-
tonic trends in Gaussian white noise, the Mann-Kendall trend test clearly outperforms
autocorrelation tests. This example also nicely illustrates this fact.
Figure 2. Annual precipitation in inches for the Great Lakes (1900-1986).
Figure 3. Autocorrelation function (ACF) plot of the Great Lakes residual
precipitation data (1900-1986).
Figure 4. Cumulative periodogram graph of the Great Lakes residual precipitation
data (1900-1986).
Figure 5. Normal probability plot of the residuals of the Great Lakes residual
precipitation data (1900-1986).
Hipel and McLeod (1994, Section 19.2.4) show how the effect of the Aswan dam on
the average annual riverflows of the Nile River measured just below the dam can
be modelled using intervention analysis. They also demonstrate using intervention
analysis that there is in fact a significant decrease in the mean annual flow after the
dam went into operation in 1903. It is of interest to find out whether the trend tests
are able to confirm this trend hypothesis by rejecting the null hypothesis of no trend.
Figure 6 displays a time series plot of the mean annual Nile River flow for 1870-1945 (m³/s) with a superimposed Cleveland robust LOESS trend smooth. Figures 7 and
8 show autocorrelation and cumulative periodogram plots of the error component
or residuals from the trend. These plots indicate that the error component is an
autocorrelated time series. Thus, Brillinger's trend test can be applied. We chose
a running-average smooth with V = 8 in (5). The smoothed curve and original
data are shown in Figure 9. Next a smoothing parameter for the spectral density
estimation at zero is set to L = 5 in (6) and a 10% cosine-bell taper is used (see Figure
The resulting test statistic is Z_B = −4.83 with a two-sided significance level of 1.4 × 10⁻⁶. For comparison, the Mann-Kendall and the Abelson-Tukey methods yield test statistics τ = −0.4508 and Z_A = −4.52, with two-sided significance levels < 10⁻¹⁰ and 6 × 10⁻⁶, respectively.
Brillinger (1989) demonstrates the usefulness of his method on a very long time
series (n > 3 × 10⁴) of daily river heights. Our example above shows the usefulness
of Brillinger's test even for comparatively short series (n = 75).
CONCLUSION
Figure 6. Average annual flows (water year) of the Nile River (m³/s) (1870-1945).
Figure 7. Autocorrelation function (ACF) of the error component from the trend of
the average annual Nile River flows.
Figure 8. Cumulative periodogram plots of the error component from the trend of
the average annual Nile River flows .
Figure 9. The smoothed curve and original annual data of the Nile River.
Figure 10. Log periodogram of the error component from the trend of the average annual Nile River flows.
REFERENCES
Hurvich, C. M. (1988), "A mean squared error criterion for time series data windows",
Biometrika 75,485-490.
Kendall, M. G. and Stuart, A. (1968). The Advanced Theory of Statistics, Volume
3, Hafner, New York.
Kendall, M. G. (1973). Time Series, Griffin, London.
Kendall, M. G. (1975). Rank Correlation Methods (4th ed), Griffin, London.
Mann, H. B. (1945), Nonparametric tests against trend, Econometrica 13, 245-259.
Marsaglia, G. (1976), "Random Number Generation", in Encyclopedia of Computer
Science, ed. A. Ralson, pp. 1192-1197, Petrocelli and Charter, New York.
McLeod, A. I. and Hipel, K. W. (1994a) The McLeod-Hipel Time Series (MHTS) Package, copyright owned by A. I. McLeod and K. W. Hipel, McLeod-Hipel Research, 121 Longwood Drive, Waterloo, Ontario, Canada N2L 4B6, Tel: (519) 884-2089.
McLeod, A. I. and Hipel, K. W. (1994b) The McLeod-Hipel Time Series (MHTS) Package Manual, McLeod-Hipel Research, 121 Longwood Drive, Waterloo, Ontario, Canada N2L 4B6, Tel: (519) 884-2089.
Tukey, J. W. (1960), "A survey of sampling from contaminated distributions" in Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, edited by I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow and H. B. Mann, Stanford University Press, Stanford.
Tukey, J. W. (1967), "An introduction to the calculations of numerical spectrum
analysis" in Advanced Seminar on Spectral Analysis of Time Series, edited by B.
Harris, pp.25-46, Wiley, New York.
Valz, P., McLeod, A. I. and Thompson, M. E., (1994a, to appear) "Cumulant gen-
erating function and tail probability approximations for Kendall's score with tied
rankings", Annals of Statistics.
Valz, P., McLeod, A. I. and Thompson, M. E. (1994b, to appear) "Efficient algorithms for the exact computation of significance levels for Kendall's and Spearman's scores", Journal of Computational and Graphical Statistics.
ANALYSIS OF WATER QUALITY TIME SERIES
OBTAINED FOR MASS DISCHARGE ESTIMATION
Methods are proposed for quantifying long term mean annual riverine load reductions of the nutrient phosphorus [P] and other agricultural pollutants anticipated
in southwestern Ontario Great Lakes tributaries due to farm scale nonpoint source
[NPS] remediation measures implemented in the headwater catchments. Riverine
delivery of NPS pollutants is a stochastic process driven by episodic hydromete-
orologic events; thus, progress towards tributary load reduction targets must be
interpreted as the expected mean annual reduction achieved over a suitably long,
representative hydrologic sequence. Trend assessment studies reveal that runoff
event biased water quality monitoring records are conceptualized adequately by the additive model X_i = X̄_i + C_i + S_i + T_i + e_i, where X_i is sample concentration, X̄_i is 'global' central tendency, C_i is discharge effect, S_i is seasonality, T_i is trend (local central tendency) and e_i is residual noise. As the watershed's systematic hydrochemical response embodied in components C_i and S_i has remained stable in the presence of gradual concentration trends, the expected mean annual load reduc-
tions may be inferred by the difference between the mean annual loads estimated by
adjusting the water quality series to (1) pre-remediation and (2) current mean con-
centration levels where concentrations on unsampled days are simulated by Monte
Carlo methods. Fitting components by robust nonparametric smoothing filters in the context of generalized additive models, and jointly fitting interactive discharge and seasonal effects as a two dimensional field C ⊗ S, are considered.
INTRODUCTION
Diffuse or nonpoint source [NPS] water pollution by agricultural runoff has long
been recognized as a significant issue in southern Ontario. During the 1970s under
the aegis of the International Joint Commission [IJC], Canada and the U.S. un-
dertook to define NPS impacts on the Great Lakes with the PLUARG [Pollution
from Land Use Activities Reference Group] studies that documented the extent of
water quality impairment by agriculture in southern Ontario (Coote et al., 1982;
Wall et al., 1982; Miller et al., 1982; Nielsen et al., 1982; Frank et al., 1982).
A 1983 review by the IJC (1983) noted that Ontario had yet to implement any
comprehensive NPS remediation policies in response to PLUARG recommendations
tabled in 1978. In 1987 Canada and the U.S. finally set specific remediation targets
under Annex 3 of the amendments to the 1978 Canada-U.S. Great Lakes Water
Quality Agreement (IJC, 1987) which call for a 300 tonne per annum reduction of
phosphorus [P]loading to Lake Erie from the Canadian side. Presumably much of
the reduction was to be achieved by NPS remediation which involves implementation
of various farm scale land management practices including vegetated buffer strips
along streambanks, low tillage cultivation, restricted livestock access to streams,
solid and liquid waste management, and the retirement of erosion prone land from
cultivation.
Inherently stochastic NPS contaminant delivery mechanisms driven by ran-
domly episodic hydrometeorological processes militate against attempts to estab-
lish the quantitative impact of abatement practices. While short term pilot studies
at the plot and small watershed scale can demonstrate the general effect of a partic-
ular management practice, they often fail to provide a reliable basis for broad scale
extrapolation because the hydrologic regime of the pre-treatment monitoring phase
differs appreciably from that during treatment. Linking farm scale NPS abatement
measures implemented in small headwater catchments to progress towards specific
Great Lakes tributary nutrient load reduction targets poses a formidable challenge
as very subtle trends must be detected against appreciable background variation
imparted by stochastic NPS contaminant delivery mechanisms.
Over the past decade, federal and provincial agencies began implementing NPS
remediation policies in Ontario ostensibly to mitigate inland surface water quality
problems from agricultural runoff, and hopefully, to reduce Canadian tributary
nutrient loads to the lower Great Lakes. In the short term, miscellaneous farm
scale abatement initiatives will not discernibly influence water quality in the main
channels of larger rivers draining southwestern Ontario. However, the long term
cumulative impact of many small scale improvements should ultimately manifest in
downstream waters. It is presently unclear what has been achieved on a grander
scale by the patchwork of Ontario agricultural pollution remediation programs.
Some have been in progress since the mid 1980s and a comprehensive review of
water quality trends across southwestern Ontario would be timely. This paper
explores extensions of time series methods developed for assessing long term water
quality concentration trends to the problem of estimating long term trends in mass
delivery of nutrients and other NPS pollutants to the Great Lakes.
MASS LOAD ESTIMATION
During PLUARG, tributary loads were determined by a method informally known as 'stratified Beale ratio estimation', by which the annual mass load is derived as the standard ratio estimate from sampling survey theory
(Cochran, 1977) adjusted by the bias correction factor of E.M.L. Beale (Tin, 1965).
To improve estimates, data were retrospectively blocked into two or more strata or
flow classes of relatively homogeneous concentrations. Two way blocking by flow
and time categories was also applied to treat both flow and seasonal variation.
Beyond estimation technique, the quality of annual mass load estimates de-
pends on the quality of the river monitoring record. At the outset of PLUARG,
vaguely understood mass delivery phenomena and the formidable logistical burdens
posed by runoff event sampling inevitably lead to unsampled high delivery periods
and oversampled low delivery periods. Realizing that a watershed's fundamental
hydrophysicochemical response had not changed appreciably from one year to the
next, Ontario tributary loads were determined by pooling 1975 and 1976 survey
data in order to improve the respective annual and monthly estimates (Bodo and
Unny, 1983).
Following PLUARG, Environment Ontario (MOE) implemented the 'Enhanced
Tributary Monitoring' [ETM] program at 17 major tributary outlets to the Great
Lakes where higher frequency sampling was to be conducted in order to more reli-
ably estimate mass delivery for a limited suite of variables. Annual P loads were to
be reported to the IJC for inclusion in the biennial reports of the Great Lakes Water
Quality Board. Due to minimal budgets and staffing, the execution of the ETM
program was somewhat haphazard from the outset. Local 'observers' resident in
the vicinity of the ETM sites were retained to collect and ship samples. Observers
were instructed to sample more frequently at high flows with neither quantitative
prescription of river stage for judging precisely what constituted high flows nor any
regular supervision. Accordingly, sampling performance has varied erratically and
critical high flow periods have gone unsampled. Initially from 20-120 and more
recently from 20-60 samples per annum were obtained from which annual P loads
were determined by two strata Beale ratio estimation with a subjectively deter-
mined flow boundary separating low and high flow classes. Each year was treated
independently and flow boundaries were manipulated subjectively to force data into
two classes. Consequently, annual P mass load estimates for Ontario tributaries re-
flect the vagaries of sampling practice as much or more so than legitimate trends
and hydroclimatic variations that determine the true mass transport.
Figure 1 shows the location of the 3 southwestern Ontario ETM sites most
suitable for studying mass-discharge trends. The Grand and Saugeen Rivers which
were the Ontario PLUARG pilot watersheds now have lengthy water quality records
spanning 1975-1993. The ETM site in the Thames River basin, the most intensely
agricultural watershed of southwestern Ontario, has low frequency data from 1966-
1975 and higher frequency records from late 1979. Figure 1 also illustrates a prob-
lem common to ETM sites that are not co-located with flow gauges where flows
estimated by areal proration from gauges in upstream or adjacent watersheds are
employed to generate annual P loads. Errors in mean daily flow estimates derived
for the Grand and Thames ETM sites are small as only 15% and 10% of respective
upstream areas are ungauged.
While studies (Dolan et al., 1981; Richards and Holloway, 1987; Young and
DePinto, 1988; Preston et al., 1989) have shown stratified Beale ratio estimation
to perform at least as well as other techniques for estimating annual loads in larger
Figure 1. Locations of the southwestern Ontario ETM sites.

The annual mass load may be estimated directly as

L = Σ_i x_i Q_i δ_i,  (1)

where x_i is the concentration of the ith sample collected at time t_i, and Q_i is the mean discharge over δ_i = (t_{i+1} − t_{i−1})/2, the time interval represented by sample i. This
method produces acceptable results when the sampling is conducted at a frequency appropriate to the characteristic hydrochemical response of the stream. The rivers we consider herein are large enough that instantaneous flow on any given day does not differ appreciably from the mean flow for that day and that water quality concentrations do not vary significantly during the day. For these systems, daily sampling would produce very good annual load estimates. The quality of estimates would decline as sampling rates fell below the characteristic duration time of a runoff event. Technique (1) was employed by Baker (1988) to estimate sediment, nutrient and pesticide loads to Lake Erie from U.S. tributaries. In contrast to Ontario monitoring activity, the programs supervised by Baker and colleagues have obtained from 400-500 samples per year at the main sites, which are co-located with flow gauges.
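A small Python sketch of estimator (1) for an irregularly sampled record (the sample times, concentrations and flows below are hypothetical):

import numpy as np

t = np.array([1.0, 3.0, 4.0, 8.0, 9.0])         # sample times, days
x = np.array([0.08, 0.20, 0.15, 0.05, 0.07])    # concentrations, mg/L (= g/m^3)
q = np.array([40.0, 180.0, 120.0, 30.0, 35.0])  # mean discharges, m^3/s

delta = np.empty_like(t)                        # days represented by each sample
delta[1:-1] = (t[2:] - t[:-2]) / 2.0
delta[0], delta[-1] = t[1] - t[0], t[-1] - t[-2]   # simple end conventions
load_g = np.sum(x * q * delta * 86400.0)        # g/m^3 x m^3/s x s -> grams
print("estimated load:", load_g / 1e6, "tonnes")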
In a trend assessment study (McLeod et al., 1991), methods were developed for water quality time series like those generated at ETM sites. Lengthy concentration series
from outlet sites of the two main Ontario PLUARG watersheds, the Grand and
Saugeen Rivers, served as the development data set. Beyond statistical develop-
ments and some indications of modest trends, the results clearly demonstrated that
these larger southwestern Ontario watersheds exhibit generally stable hydrochemi-
cal response over the long term. Fundamental concentration-flow relationships and
seasonal cycles remain largely unchanged in the presence of negligible to modest
concentration trends. Thus, to good first approximation, the water quality concen-
tration process may be conceptualized as the additive process

X_t = X̄_t + C_t + S_t + T_t + e_t,  (2)

where X_t is constituent concentration, X̄_t the central tendency of X_t, C_t is covariate effect, S_t is seasonal effect, T_t is chronological time trend and e_t is residual noise. Here C_t, S_t, and T_t represent relative departures from the process mean X̄_t. Covariate effect C_t is defined by a functional relation C_t = f(Q_t) with stream discharge. Trend T_t is the temporally local level of the constituent in the chronological time dimension t at which the process evolves. Seasonal effect S_t represents stable annually recurrent variation in the circular dimension of seasonal time τ, defined here on the interval [0,1] as the fraction of time from the beginning of the year. In decimal format, chronological time t is the sum of the year plus the seasonal fraction τ.
For watersheds with stable hydrochemical response, these times series models
of water quality concentrations afford a means of estimating reductions in riverine
contaminant mass delivery attributable to changes in upstream land practice. Sup-
pose that model (2) is successfully fit to the Grand River P series which extends
from 1972-1992 and that we wish to evaluate net changes in annual P delivery that
may have occurred since the PLUARG reference years 1975-76. Diagnostic graph-
ics and trend tests reveal that P concentrations have declined gradually through
the 1980s. After adjustments for flow and seasonal effects, we determine that the
respective 1975-76 and 1990-92 mean levels were X̄₁ and X̄₂. Next we construct the two hypothetical series

X_t⁽¹⁾ = X̄₁ + C_t + S_t + e_t,  (3a)
X_t⁽²⁾ = X̄₂ + C_t + S_t + e_t,  (3b)

which amounts to centring the entire data series about the two respective reference levels. For each series we determine annual P loads by a consistent method for each year from 1975-1992 and average the results to obtain L̄₁ and L̄₂, the mean annual P loads as if mean levels X̄₁ and X̄₂ respectively had prevailed over the entire period.
The difference ΔL = L̄₂ − L̄₁ gives an estimate of the mean annual reduction in P mass delivery that could be expected for the hydrologic sequence observed from
1975-1992. On consideration of the intrinsic stochasticity of mass delivery, the
binational P reduction targets for Lake Erie can be meaningfully interpreted only
as mean annual estimates determined over a suitably long period of representative
hydrologic record of 10 years or more.
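A toy Python sketch of this calculation (synthetic daily data; for brevity the adjustment is reduced to simple re-centring about the two reference levels, as in (3a)-(3b)):

import numpy as np

rng = np.random.default_rng(5)
days = 365
q = np.exp(rng.normal(3.0, 0.8, size=days))     # hypothetical daily flows, m^3/s
x = 0.05 + 0.02 * rng.random(days)              # hypothetical daily P conc., mg/L
delta = np.full(days, 86400.0)                  # seconds represented by each day

def annual_load_t(x, q, delta):
    """Equation (1): g/m^3 x m^3/s x s gives grams; convert to tonnes."""
    return np.sum(x * q * delta) / 1e6

x_bar = x.mean()
x1, x2 = 0.09, 0.06        # stand-ins for the 1975-76 and 1990-92 mean levels
L1 = annual_load_t(x - x_bar + x1, q, delta)    # series centred as in (3a)
L2 = annual_load_t(x - x_bar + x2, q, delta)    # series centred as in (3b)
print("estimated change in annual P load (t):", round(L2 - L1, 1))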
Recently, Baker (1992) proposed essentially this method for estimating the ex-
tent of P load reductions from U.S. tributaries to Lake Erie, and gave preliminary
results suggesting that significant progress towards the U.S. P reduction commitment of 1700 t per annum (IJC, 1987) had been achieved by NPS remediation measures applied in the upper reaches of Ohio's Maumee River in the 1980s. Recent analyses (Richards and Baker, 1993) support the early results. Though Ohio
monitoring data are superior, there is no good reason preventing the application of
the same approach to Ontario's ETM records. Bodo (1991) applied a similar time
series adjustment technique at the Thames ETM site to contrast differences in the
expected seasonal risks of exceeding Canada's water quality guideline for the her-
bicide atrazine between years of high and low applications. The main requirement
of the technique is a good time series model fit, demonstrated for the Grand and
Saugeen lliver sites in the trend assessment study (McLeod et al., 1991). Due to
haphazard ETM records and difficulties with stratified Beale ratio estimation, it is
proposed to simulate concentrations on unsampled days according to model (3) and
then determine annual loads by the technique of equation (1).
The model components are fit iteratively by a backfitting algorithm in the spirit of generalized additive models, each systematic term being estimated as a smooth function h(G_j), where G_j are C_t, S_t, and T_t for model (2) and h(·) are arbitrary smooth functions. Iteration reduces the influence of potentially significant sampling biases embedded in the initial fits of systematic terms.
Figure 2. Unadjusted P concentration versus flow (cubic m/s), log-log scale, with LOWESS smooth.

For the Grand and Saugeen River data
these include: (1) a chronological bias toward heavily sampled periods, particularly
the PLUARG years 1975-1977, (2) a seasonal bias towards spring runoff months, and
(3) a covariate bias towards higher flows. To varying degrees, these three sampling
biases overlap; the seasonal and covariate biases are largely synonymous. In Figure
2, the unadjusted P concentration-flow relationship fit with the LOWESS smoother
is biased towards the PLUARG years, the spring period, and high flows. On second
and higher iterations of the backfitting algorithm, chronological and seasonal bias
are reduced as the relation is determined on data that have been de-trended and
de-seasonalized.
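Since the backfitting cycle is central to what follows, a schematic Python sketch may help. The three smoother arguments are hypothetical callables standing in for the LOWESS trend, seasonal and covariate filters discussed here, each taking (predictor, partial residuals) and returning fitted values.

```python
import numpy as np

def backfit(z, t, tau, q, smooth_trend, smooth_seasonal, smooth_flow, n_iter=5):
    """One realization of the backfitting cycle for Z = T(t) + S(tau) + C(q) + noise.

    Each component is refit to the partial residuals left by the other two, so the
    chronological and seasonal biases embedded in the initial flow fit shrink with
    every pass, as described in the text.
    """
    T, S = np.zeros_like(z), np.zeros_like(z)
    C = smooth_flow(q, z - z.mean())     # deliberately crude initial covariate fit
    for _ in range(n_iter):
        T = smooth_trend(t, z - S - C)   # trend on de-seasonalized, flow-adjusted data
        S = smooth_seasonal(tau, z - T - C)
        S = S - S.mean()                 # keep the seasonal component zero-centred
        C = smooth_flow(q, z - T - S)
    return T, S, C
```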
To optimize the model fit, it is necessary to understand clearly the respective
roles of the model components. In seasonal and chronological time respectively,
seasonal S_t and trend T_t are expected to represent temporally local concentration
norms, where norms are interpreted as concentrations that prevail most of the time.
Consequently, the model estimates Ŝ_t and T̂_t should be close to the median concentrations
expected at time t. In contrast, the modelled covariate effect C_t must
represent as precisely as possible the contribution of high flows that are generally
atypical of seasonal flow norms; hence, maintaining a bias towards high flows is
desirable. Tuning the fitting procedures for Ŝ_t and T̂_t to best achieve their objectives
will contribute to attaining the best possible fit of Ĉ_t.
Iteration does not necessarily eliminate all the potential distortions introduced
by sampling biases, but additional measures can be applied to optimize the fit of
specific components. Heavy chronological and seasonal sampling biases introduce
autocorrelation effects that play havoc with the LOWESS filter currently used to fit
the trend component T_t. LOWESS smoothing is controlled by a nearest neighbour
specification. For time series smoothing, it performs best when data density is relatively
uniform over the time horizon of the series. More uniform chronological data density
was achieved in previous work by reducing the data to a series of monthly mean
concentrations before applying the LOWESS smoother. The procedure could be improved
in the following ways. Because T_t should represent temporally local concentration
norms, replacing the arithmetic mean with a median or a bi-weight mean (Mosteller
and Tukey, 1977) would provide resistance to abnormal concentrations. Further
improvement can be obtained by introducing sample weights w_i^t ∝ δ_i = (t_{i+1} - t_{i-1})/2
that restrict the influence of closely spaced samples. Component T_t is now estimated
as the LOWESS smooth of a reduced series of monthly weighted means. Experience
with a wide variety of low frequency water quality series (Bodo, 1991) suggests that
T_t is most often best fit by tuning LOWESS to approximate smoothing at a fixed
temporal bandwidth of about one year, by specifying the number of nearest neighbours
m as 1-2 times the typical annual sample density. Individual estimates T̂_i at
the original sample times may be derived by interpolating the smoothed output of
the reduced series.
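A small sketch of the spacing weights and the monthly weighted-mean reduction (sample times t are assumed to be in decimal years; function names are illustrative, and a median could replace np.average for still more resistance):

```python
import numpy as np

def spacing_weights(t):
    """Weights w_i proportional to delta_i = (t[i+1] - t[i-1]) / 2.

    Closely spaced samples receive small weights, damping heavily sampled episodes.
    """
    t = np.asarray(t, dtype=float)
    delta = np.empty_like(t)
    delta[1:-1] = (t[2:] - t[:-2]) / 2.0
    delta[0], delta[-1] = t[1] - t[0], t[-1] - t[-2]
    return delta / delta.sum()

def monthly_weighted_means(t, z):
    """Reduce an irregular series to monthly weighted means prior to LOWESS smoothing."""
    t, z = np.asarray(t, float), np.asarray(z, float)
    w = spacing_weights(t)
    month = np.floor(t * 12.0)          # month bin index on the decimal-year axis
    keys = np.unique(month)
    tm = np.array([t[month == k].mean() for k in keys])
    zm = np.array([np.average(z[month == k], weights=w[month == k]) for k in keys])
    return tm, zm
```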
Like the trend term T_t, S_t should represent seasonal variation in concentration
norms; hence, a robust central tendency statistic is again preferred over an arithmetic
mean. Approximating S_t as a smooth function by replacing the monthly
mean sequence - essentially a bin filter with a mid-month target point - with the
output of a fixed bandwidth running bin filter, applied on a uniform grid of target
points distributed throughout the year, will, by at least a small amount, better the
fit of recurrent seasonal variation by eliminating end-of-month step discontinuities.
Generally, a bandwidth of one month, or for simplicity 1/12 year, eliminates problems
with high spring sampling densities and provides a sharp seasonal fit that is
rarely improved by smaller bandwidths even with very high frequency water quality
data. A current implementation, the seasonal running median [SRM] smoother
(Bodo, 1989, 1991), employs 100 target points spaced every 3.65 days in non-leap
years. This may be excessive, but the added computational burden is small.
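The SRM filter reduces to a running median on circular seasonal time. A simplified sketch follows (not the TRENDS implementation itself), assuming de-trended residuals resid and seasonal times tau in [0, 1):

```python
import numpy as np

def seasonal_running_median(tau, resid, bandwidth=1.0 / 12.0, n_targets=100):
    """Fixed-bandwidth running median over the circular seasonal dimension."""
    tau, resid = np.asarray(tau, float), np.asarray(resid, float)
    targets = np.arange(n_targets) / n_targets
    s = np.full(n_targets, np.nan)
    for k, c in enumerate(targets):
        d = np.abs(tau - c)
        d = np.minimum(d, 1.0 - d)           # wrap distances around the year end
        in_bin = d <= bandwidth / 2.0
        if in_bin.any():
            s[k] = np.median(resid[in_bin])
    return targets, s - np.nanmedian(s)      # zero-centred seasonal function
```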
For the ETM river data, the robust SRM filter may yet generate output biased
to heavily sampled years and to flows higher than the expected seasonal norms. If the
median is replaced by a weighted mean as the measure employed to summarize the
observations within bins, weights can be introduced to minimize the influence of
(a) heavily sampled years, and (b) sample flows remote from the expected seasonal
mean daily flow norms. The weights w_i^t defined previously can be applied to treat the
chronological bias. Weights w_i^q to minimize flow bias for sample i at chronological
time t_i could be proportioned as the probability of occurrence of the sample flow
Q_i conditional on the seasonally local distribution of mean daily flows encountered
at the corresponding seasonal target point τ_i, within a bin of the same bandwidth
employed for seasonal smoothing. Alternatively, the flow weight might be made
inversely proportional to the distance of sample flow Q_i from the seasonal median
of mean daily flows at time τ_i. Some experimentation will be necessary to obtain
acceptable weights. The influence of both chronological and flow bias on the definition
of the seasonal S_t may ultimately prove inconsequential; nonetheless, the
effects are presently unknown and some investigation is required.
The covariate term C_t may also be influenced by chronological and seasonal
sampling biases. Applying the chronological weights w_i^t should approximately
equalize the influence of all years on the fitted C_t. It is reasonable to ignore the
seasonal sampling bias, which is largely equivalent to the bias towards higher flows.
A joint covariate-seasonal formulation

$$Z_t = T_t + CS_t + \varepsilon_t \qquad (5)$$

may also be entertained, where CS_t denotes a single two-dimensional covariate-seasonal
component. This requires smoothing concentrations over the two dimensional cylindrical field
defined by the circular seasonal time dimension and the longitudinal covariate dimension.
The resulting response surface inherently includes covariate-seasonal interactions.
LOESS (Cleveland and Devlin, 1988; Cleveland and Grosse, 1991), the multivariate
extension of the LOWESS filter, has been used to fit CS_t within a backfitting
algorithm, but other procedures such as two dimensional spline smoothers
could be considered.
Experimental results (Figure 3) confirm suspicions that the shape of the concentration
response to flow can vary through the course of the year. For the Grand
and Saugeen Rivers, variations are generally confined to low flows, which suggests
that potential bias in the additive covariate-seasonal fit C_t + S_t of model (2) is
small and may not unduly affect annual mass-discharge estimates. Though promising,
model (5) currently remains best suited as a conceptualization aid. The main
difficulties lie in optimizing the degree of smoothing obtained by the LOESS filter.
Irregular water quality data distributions over the CS_t field may confound
the usual automatic smoothing control selection procedures such as cross validation
(Hastie and Tibshirani, 1990), which perform best with relatively uniformly
distributed data. Currently, the best approach is by graphical comparison of the
respective two dimensional and unidimensional filter outputs on thin strips of the
CS_t field. For example, for each month, concentration-discharge can be plotted
with the LOWESS fit for that month's data and the LOESS fit at the mid point
of that month. Similarly, for a series of discrete flow ranges, seasonal plots may be
developed with overlays of the respective SRM filter output and the LOESS fit at
the range mid point. With experimentation, it is expected that objective rules can
be formulated for adapting LOESS and other two dimensional smoothers to jointly
fitting the covariate and seasonal effect as the two dimensional field CS_t.
AUTOCORRELATION EFFECTS
While autocorrelation effects can be eliminated during fitting of the systematic
model components, to simulate daily water quality concentrations it is necessary to
develop a model for the autocorrelation structure that is evident in the residual noise
ε_i of model (2), particularly during periods of high flow. To a first approximation,
simple autoregressive or ARMA error structures will probably suffice. Practically,
[Figure 3. Seasonally varying concentration-flow response fit by LOESS (log scales).]
error models can be fit only to the series of daily concentrations derived by reducing
multiple sample concentrations obtained on certain runoff event days to a single
value. Presuming that a suitable noise model can be fit, concentrations can then
be estimated for missing days by Monte Carlo methods. As the task involves filling
gaps in the concentration series, closure rules are required to assure that concentration
trajectories simulated forward from the left end of the gap close within
plausible proximity of the next observed concentration at the right end of the gap.
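For AR(1) errors, one workable closure rule is the usual Gaussian conditioning device: simulate forward from the residual at the left end of the gap, then correct the whole path so that it closes exactly on the residual observed at the right end. The sketch below assumes the AR coefficient phi and innovation standard deviation sigma have already been estimated; it is one possible construction, not a prescription from the text.

```python
import numpy as np

def fill_gap_ar1(eps_left, eps_right, gap_len, phi, sigma, rng=None):
    """Simulate AR(1) residuals across a gap, closing on the right-end observation.

    Forward-simulates unconditionally, then adds the Gaussian conditional-mean
    correction so the trajectory meets the next observed residual (needs |phi| < 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    G = gap_len + 1                            # step index of the next observation
    e, prev = np.empty(G), eps_left
    for j in range(G):                         # unconditional forward simulation
        prev = phi * prev + rng.normal(0.0, sigma)
        e[j] = prev
    steps = np.arange(1, G + 1)
    var = 1.0 - phi ** (2 * steps)             # Var(e_j) given the left end, up to a common factor
    c = phi ** (G - steps) * var / var[-1]     # Cov(e_j, e_G) / Var(e_G)
    return (e + c * (eps_right - e[-1]))[:-1]  # residuals for the gap days only
```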
HYSTERESIS EFFECTS
Hysteresis is a secondary effect that may merit attention for some water quality
variables. The hysteresis effect is observed over runoff events when the concentra-
tion peak precedes the discharge peak with the net result that for the same flow,
higher concentrations are observed on the rising limb of the hydrograph than on the
recession limb. Sediment and sediment associated variables like P have exhibited
hysteresis in both the Grand and Saugeen River monitoring records. As a result, the
covariate response function underestimates concentration on rising flows and over-
estimates on receding flows. The net effects on cumulative annual mass-discharge
estimates are likely small; nonetheless, some investigation is necessary to verify this
hypothesis.
The stability of the systematic components across reference periods can be examined by fitting the model separately to two multi-year segments of the record, i.e.,

$$Z_t^{(1)} = \bar{X}_1 + C_t^{(1)} + S_t^{(1)} + \varepsilon_t \qquad (6a)$$
$$Z_t^{(2)} = \bar{X}_2 + C_t^{(2)} + S_t^{(2)} + \varepsilon_t \qquad (6b)$$

for the two respective reference periods, where the systematic components C_t^(j) and
S_t^(j) have been fit separately to the two segments. Preliminary
analysis suggests that nutrient P appears relatively unaffected, but further
confirmatory analysis is warranted.
In contrast to P, the nutrient species total nitrate (NO₃⁻, here taken as NO₃⁻ + NO₂⁻) is a
dissolved ionic constituent with unique biochemical response. As the Saugeen River
plot (Figure 4) shows, a well defined seasonal cycle dominates the series. Discharge
variation is a virtually negligible determinant of seasonal NO₃⁻ concentration variation,
which is driven by the annual growth-decay cycle of terrestrial and aquatic
biomass. In agricultural watersheds, year-to-year shifts in mean level reflect mainly
variations in annual fertilizer application rates. For the Saugeen River, the annual
amplitude increases as the annual mean level increases. Also, the amplitude is affected
asymmetrically as the late winter maxima increase disproportionately relative to
the late summer minima, which vary only slightly from one year to the next.
While the fixed amplitude seasonality models of (6a,b) are adequate to evaluate
changes in mean annual NO₃⁻ mass delivery between two reference periods with
stable mean levels, introducing flexible models that permit year-to-year variation in
seasonal response would be advantageous for NO₃⁻ series. Specifically, a variable
amplitude Fourier harmonic representation of the form

$$S_t = \sum_{k=1}^{K}\left[a_k(t)\cos(2\pi k \tau_t) + b_k(t)\sin(2\pi k \tau_t)\right]$$

with slowly varying coefficients a_k(t) and b_k(t) could be adopted.

[Figure 4. Saugeen River total nitrate concentration (mg/L), 1974-1990.]
SUMMARY
Nonpoint source agricultural pollution of waterways is a stochastic process driven
by episodic hydrometeorological events. Thus, riverine NPS mass delivery reduction
targets must be estimated as mean annual loads over a suitably long, representative
hydrological sequence. It is proposed to estimate the cumulative gradual impacts of
farm scale NPS remediation measures implemented in the headwater catchments of
southwestern Ontario Great Lakes tributaries by modelling and simulation of water
quality concentration time series. Because the crucial systematic model components
characterizing the river basin's hydrochemical response, namely discharge and sea-
sonal effects, have remained largely stable over the horizon of the available data
sets, probable mean annual NPS mass delivery reductions may be estimated using
water quality time series simulated to represent (1) pre-remediation, and (2) post-
treatment scenarios. It is expected that good results can be obtained by treating
water quality series as generalized additive models and fitting the systematic model
components by nonparametric smoothing filters. Preliminary trials with arbitrarily
chosen smoothers have been able to capture 50-75% of the variability in available
data series for the major nutrients phosphorus and total nitrates. Detailed inves-
tigations of sampling biases, secondary systematic effects and correlated residual
error structure embedded in the available data are required to obtain an optimal
model fit and ultimately to assure that good quality estimates of NPS mass delivery
reductions are available to provide policy makers with a means to assess progress
towards the Lake Erie phosphorus reduction targets and, generally, the success of
agricultural NPS remedial measures being applied in southwestern Ontario.
ACKNOWLEDGEMENT
The first author's efforts were in part supported by a Natural Sciences and Engi-
neering Research Council of Canada grant.
REFERENCES
Baker, D.B. (1988) Sediment, nutrient and pesticide transport in selected lower
Great Lakes tributaries, EPA-905/4-88-001, U.S. Environmental Protection Agency,
Great Lakes National Program Office, Chicago.
Baker, D.B. (1992) "Roles of long-term chemical transport in river basin manage-
ment", In Abstracts, 13th Annual Meeting, November, 1992. Society of Environ-
mental Toxicology and Chemistry.
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly
spaced water quality time series", Environ. Monitoring Assessment, 12, 407-428.
Bodo, B.A. (1991a) "Trend analysis and mass-discharge estimation of atrazine
in southwestern Ontario Great Lakes tributaries: 1981-1989", Environ. Toxicol.
Chem., 10, 1105-1121.
Bodo, B.A. (1991b) TRENDS: PC software, user's guide and documentation for
robust graphical time series analysis of long term surface water quality records.
Environment Ontario, Toronto.
Bodo, B.A., and Unny, T.E. (1983) "Sampling strategies for mass-discharge estima-
tion", ASCE J. Environ. Eng. Div., 109, 812-829, 1984; "Errata and discussion",
110, 867-871.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter-
plots", J. Am. Stat. Assoc., 74, 829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: An approach
to regression analysis by local fitting", J. Am. Stat. Assoc., 83(403), 596-610.
Cleveland, W.S., and Grosse, E. (1991) "Computational methods for local regres-
sion", Statistical Computing, 1, 47-62.
Cleveland, W.S., Devlin, S.J., and Terpenning, I.J. (1979) "SABL: A resistant seasonal
adjustment procedure with graphical methods for interpretation and diagnosis",
In A. Zellner [ed.] Seasonal Adjustment of Economic Time Series, U.S. Dept. of
Commerce, Bureau of the Census, Washington, D.C.
Cochran, W.G. (1977) Sampling Techniques, 2nd ed., Wiley, New York, NY.
Coote, D.R., MacDonald, E.M., Dickinson, W.T., Ostry, R.C., and Frank, R. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: I. Representa-
tive agricultural watersheds", J. Environ. Qual., 11(3), 473-481.
Dolan, D.M., Yui, A.K., and Geist, R.D. (1981) "Evaluation of river load estimation
methods for total phosphorus", J. Great Lakes Res., 7, 207-214.
El-Shaarawi, A.H., Esterby, S.R., and Kuntz, K.W. (1983) "A statistical evaluation
of trends in water quality of the Niagara River", J. Great Lakes Res., 9(2), 234-240.
Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression, Dekker,
New York.
Frank, R., Braun, H.E., Holdrinet, M.V.H., Sirons, G.J., and Ripley, B.D. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: V. Pesticide
use in the 11 agricultural watersheds and presence in stream water, 1975-1977",
J. Environ. Qual., 11(3), 497-505.
Hastie, T.J., and Tibshirani, R.J. (1990) Generalized Additive Models, Chapman
and Hall, London.
IJC (1983) Nonpoint source pollution abatement in the Great Lakes Basin: an
overview of post-PLUARG developments, International Joint Commission, Great
Lakes Regional Office, Windsor, Ontario.
IJC (1987) Revised Great Lakes Water Quality Agreement of 1978 as amended by
Protocol signed November 18, 1987, International Joint Commission, Great Lakes
Regional Office, Windsor, Ontario.
Miller, M.H., Robinson, J.B., Coote, D.R., Spires, A.C., and Draper, D.W. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: III. Phosphorus",
J. Environ. Qual., 11(3), 487-493.
Neilsen, G.H., Culley, J.L.B., and Cameron, D.R. (1982) "Agriculture and water
quality in the Canadian Great Lakes Basin: IV. Nitrogen", J. Environ. Qual., 11(3),
493-497.
McLeod, A.I., Hipel, K.W., and Camacho, F. (1983) "Trend assessment of water
quality time series", Water Resour. Bull., 19, 537-547.
McLeod, A.I., Hipel, K.W., and Bodo, B.A. (1991) "Trend analysis methodology
for water quality time series" , Environmetrics, 2(2), 169-200.
Mosteller, F., and Tukey, J.W. (1977) Data Analysis and Regression: A Second
Course in Statistics, Addison-Wesley, Reading, MA.
Preston, D.S., Bierman Jr., V.J., and Silliman, S.E. (1989) "An evaluation of methods
for the estimation of tributary mass loads", Water Resour. Res., 25, 1379-1389.
Richards, R.P., and Holloway, J. (1987) "Monte Carlo studies of sampling strategies
for estimating tributary loads", Water Resour. Res., 23, 1939-1948.
Richards, R.P., and Baker, D.B. (1993) "Trends in nutrient and suspended sediment
concentrations in Lake Erie tributaries, 1975-1990", J. Great Lakes Res., 19(2),
200-211.
Tin, M. (1965) "Comparison of some ratio estimators", J. Am. Stat. Assoc., 60,
294-307.
Wall, G.J., Dickinson, W.T., and Van Vliet, L.P.J. (1982) "Agriculture and wa-
ter quality in the Canadian Great Lakes Basin: II. Fluvial sediments", J. Envi-
ron. Qual., 11(3), 482-486.
Young, T.C., and DePinto, J.V. (1988) "Factors affecting the efficiency of some
estimators of fluvial total phosphorus load", Water Resour. Res., 24, 1535-1540.
DE-ACIDIFICATION TRENDS IN CLEARWATER LAKE NEAR
SUDBURY, ONTARIO 1973-1992

B. A. BODO and P. J. DILLON
Historically, Clearwater Lake on the Precambrian Shield near Sudbury, Ontario, has
received significant acid deposition from local smelters and remote sources. From
1.46×10⁶ tonnes (t) in 1973, local SO₂ emissions fell to 0.64×10⁶ t in 1991. To assess
lake response, temporal trends were examined for 26 water quality variables with
records dating from 1973-1981. But for brief drought-induced reversals, aqueous
SO₄²⁻ fell steadily from 590 µeq/L in 1973 to under 320 µeq/L in 1991-92, while pH
rose from 4.3 in 1973 to exceed 5 in May 1992 for the first time on record. Disproportionate
lake response to local SO₂ emission reductions suggests that remote-source
acid deposition is an important determinant of Clearwater Lake status. Chloride
adjusted base cation, Al and Si trends mirror SO₄²⁻ trends, indicating that geochemical
weathering is decelerating as acid deposition declines. Lake levels of the toxic metals
Cu and Ni derived from local smelter emissions appear to have fallen appreciably in
recent years, and there has been a small surge in biological activity that may have
declined abruptly in 1991. With its unique long term record, continued sampling in
Clearwater Lake is advised to monitor the success of local and remote SO₂ emission
reduction commitments in the U.S. and Canada.
INTRODUCTION
For nearly a century terrestrial and aquatic ecosystems in the Sudbury area of
northeastern Ontario have suffered the consequences of mining and smelting of
nickel and copper bearing sulphide ores. Reports in the 1960s of soils and acidic
surface waters (Gorham and Gordon, 1960a,b) with elevated levels of metals Cu
and Ni (Johnson and Owen, 1966) prompted comprehensive limnological studies of
Sudbury area lakes begun in 1973 (Dillon, 1984; Jeffries, 1984; Jeffries et al., 1984).
Clearwater Lake, in a small headwater catchment 13 km south of the large nickel
smelter in Copper Cliff (Figure 1) was maintained as an unmanipulated control for
neutralization and fertilization experiments conducted in downstream lakes. The
past two decades have seen substantial reductions in local smelter emissions of the acid
precursor sulphur dioxide (SO₂) and the metals Cu and Ni. Pre-1972 mean annual
SO₂ emissions of 2.2×10⁶ tonnes (t) fell to a mean of 1.41×10⁶ t/yr over 1973-1977
(Figure 2). Low emissions over 1978/79 and 1982/83 reflect extended smelter
[Figure 1. Sudbury area smelter sites (Coniston inactive) and Clearwater Lake, 13 km south of the Copper Cliff smelter; scale 0-10 km.]
shutdowns. SO₂ emissions have declined gradually from 0.767×10⁶ t/yr in 1984 to
0.642×10⁶ t/yr in 1991, an average yearly decrease of 17,900 t; at this rate, target
emissions of 0.47×10⁶ t/yr, or half the 1980 level, should be reached in about
10 years.
The Sudbury area also receives acid deposition from long range southerly air
flows. Though quantitative estimates vary, eastern North American sulphur emis-
sions peaked about 1970, declined significantly to about 1982 and then stabilized
through 1988 (Dillon et al., 1988; Husar et al., 1991). Bulk deposition data south of
Sudbury in Muskoka-Haliburton for 1975-1988 (Dillon et al., 1988; OMOE, 1992)
and west of Sudbury in the Algoma highlands for 1976-1985 (Kelso and Jeffries,
1988) reflect the general eastern North American emission trends. As local and
remote SO₂ emissions have declined, water quality has improved in the Sudbury
region's acid-stressed lakes (Dillon et al., 1986; Gunn and Keller, 1990; Keller and
Pitblado, 1986; Keller et al., 1986; Hutchinson and Havas, 1986); however, some
lakes remain beset by chronic acidity and heavy metal concentrations that would
likely prohibit the recovery of desirable aquatic biota. Recently, Keller and Pitblado
(1992) noted that ongoing de-acidification trends in Sudbury area lakes reversed
following drought in 1986/87. In addition to dry acid deposition that accumulates
steadily throughout dry periods, drying and exposure to air induces re-oxidation
of reduced sulphur retained in normally saturated zones of the catchment. Thus,
when it resumes, normal precipitation can generate substantial acid loads. Keller
and Pitblado present data for Swan Lake, in the watershed neighbouring Clearwater
Lake, showing that drought related acidification began as the drought ended in late
1987, was most prominent in 1988 and had begun to subside in 1989.
Spanning nearly 20 years, Clearwater Lake records offer a unique, ongoing
chronicle of de-acidification processes. The primary objectives of the present work
were to rigorously review the entire historical data base from 1973, to update the
chemical status of Clearwater Lake, last reported by Dillon et al. (1986) and to
examine trophic indicators for signs of incipient biological recovery.
DATA VALIDATION
Ionic species data were examined for consistency with (a) carbonate equilibria,
(b) electroneutrality, and (c) equivalent conductance. Validation exercises were
generalized as tests of agreement between two independently determined quantities
within the framework of robust regression analysis, specifically, Rousseeuw and
Leroy's (1987) 'high breakdown' Least Median of Squares [LMS] regression which
can withstand contamination by up to 50% outliers. Resistance to outliers imparts
greater confidence to the results and provides a means to assess the frequency of
measurement errors - an important aspect of retrospective data analysis.
Carbonate Equilibria
At the low pHs ranging from 4 to 5 in Clearwater Lake over 1973-1992, the definition
of Acid Neutralizing Capacity (ANC) in fresh surface waters simplifies to

$$ANC \approx -[\mathrm{H}^+] \qquad (1)$$

where concentrations are expressed in equivalents. Virtual 1:1 correspondence between
concurrent Gran alkalinity (ALKG) and negative hydrogen ion data (Figure 4)
confirms that ALKG accurately represents Clearwater ANC. Scatter in the lower
left quadrant is due to the earliest ALKG measurements of 1980/81. Robust regression
between total alkalinity ALKT and -[H⁺] yields similar results. The relation
[Figure 2. Sudbury smelter SO₂ annual emissions, 1960-1991. Figure 3. Annual precipitation deviations from norms, Sudbury airport, 1956-1990.]
between the two alkalinity measurements is less precise than their respective relations
with hydrogen ion; however, with ALKT as the independent variable, the intercept
and slope coefficients [-31.2, 0.731] are remarkably close to the theoretical approximation
[-31.1, 1.0] of Harvey et al. (1981, Appendix V) for acidified Precambrian
Shield waters.
Electroneutrality

Charge balance was checked by comparing the total cation and anion charge equivalents,
C⁺ and C⁻ (2a,b), where all [·] are analytical total concentrations in µeq/L except [Al^m+],
which should be the total monomeric Al concentration. Available data are for total aluminum;
however, supplementary speciation data suggest that most Al in Clearwater Lake samples is
inorganic monomeric. In (2a), the net charge m of the monomeric Al species is unspecified.
According to speciation models (e.g., Schecher and Driscoll, 1987), below pH 4 most
monomeric Al exists as Al³⁺. As pH rises from 4 to 5 - the change in Clearwater
Lake from 1977-1992 - the Al³⁺ fraction gives way to complexes that yield a net
Al charge approaching 2. A compromise value of m = 2.5 was selected.
Data quality was investigated via the relationships between C⁺ and C⁻, and
between ANCE and titrated ALKG. A set of 172 'complete' ionic records for 1977-1992
was prepared by substituting time series model estimates (see Trend Analysis
section) for missing observations if no more than two of the major species (Ca²⁺,
Mg²⁺, Na⁺, Cl⁻, SO₄²⁻) were absent. Before 1983, fluoride was estimated as [F⁻] =
1.57 + 0.0181[SO₄²⁻], where [·] are µeq/L. Given the numerous potential sources
of measurement error, the Clearwater charge balance for 1977-1991 shows good
agreement with only a few outliers. Because ANCE estimates at Clearwater Lake pH
comprise largely the cumulative measurement error of 13 ionic species, the ANCE-ALKG
relationship was not especially insightful.
Equivalent conductance (Laxen, 1977) was calculated for 167 'complete' 1977-1992
ionic records with concurrent specific conductance measurements. Figure 5,
the plot of equivalent conductance against independently measured conductance
with the robust LMS regression line, shows good agreement but for several, mostly
positive, outliers occurring mainly in 1977 when pHs fell to about 4, the lowest recorded
observations. Were the true pH 4.2, an incorrect measurement of 4 would overestimate
[H⁺] equivalent conductance by 13 µS/cm, which would explain the conductance
discrepancies of 1977. Regression of 111 records for 1981-1992 reveals that since 1981,
about 3.6% of ionic records contain potentially erroneous measurements; however,
in terms of total measurements performed, the error frequency is well below 1%.
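Rousseeuw and Leroy's LMS line admits a compact approximation by random elemental subsets. The following Python sketch (trial count arbitrary) conveys the idea behind these two-variable validation fits:

```python
import numpy as np

def lms_line(x, y, n_trials=2000, rng=None):
    """Approximate Least Median of Squares fit of y = a + b*x.

    Minimizes the median squared residual over random two-point candidate lines,
    so up to ~50% outlying records cannot drag the fit (unlike least squares).
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x, float), np.asarray(y, float)
    best_crit, best_a, best_b = np.inf, 0.0, 0.0
    for _ in range(n_trials):
        i, j = rng.choice(x.size, size=2, replace=False)
        if x[i] == x[j]:
            continue                     # vertical candidate line; skip
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)
        if crit < best_crit:
            best_crit, best_a, best_b = crit, a, b
    return best_a, best_b                # intercept, slope
```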
TREND ANALYSIS

Trends were assessed with time series models of the form

$$Z_i = T_i + \varepsilon_i \qquad (3a)$$
$$Z_i = T_i + S_i + \varepsilon_i \qquad (3b)$$

where index i corresponds to sample time t_i and Z_i is the data series comprising
trend T_i representing the temporally local level of the constituent, optionally seasonal
S_i representing stable annually recurrent phenomena, and residual noise ε_i.
Z_i may be either the raw concentrations C_i, or an appropriate skew reducing transform
of the raw series. Time trends, in the sense of changes in T_i over time, and the significance
of the seasonal cycle S_i are judged by visual inspection of graphics and statistical
diagnostics including the seasonal Mann-Kendall test (Hirsch and Slack, 1984).
Often, it is most expedient to initially assume seasonal adjustment model (3b),
which is solved iteratively for T_i and S_i, and to revert to the simple model (3a) if
seasonal effects are negligible. With model (3b), plots are generated of (a) trend T_i
superimposed on the seasonally adjusted series Z_i - S_i, and (b) the relative seasonal
S_i against the de-trended series Z_i - T_i. Numerically, T_i is fit by modified LOWESS
smoothing filters (Cleveland, 1979) controlled to approximate short term year-to-year
trends. Better results for a few variables (DOC, Secchi depth, chlorophyll a)
were obtained with a companion algorithm that fits T_i as the calendar year
medians of de-seasonalized data, in which case T_i plots as a step function. Seasonal
S_i is a zero centred function fit by a fixed bandwidth running median filter (Bodo,
1989) applied to the de-trended series reordered by date within the year. The filter
design minimizes distortions introduced by seasonally variable sampling density.
Trend Results: Major Ions

The sum of base cations adjusted by subtracting chloride equivalence, i.e.,
C_B = [Ca²⁺] + [Mg²⁺] + [Na⁺] + [K⁺] - [Cl⁻] (Figure 7), mirrors the SO₄²⁻ trend
(Figure 6) including the 1987/88 drought-induced reversal. Similar trends are
strongly evident in Al and Si, and weakly in Mn and F. Collective ionic trends
between 1977/78 and 1990/91 are summarized in Figure 8. Substantial decreases
in H⁺, Al^m+ and SO₄²⁻ charge equivalence are compensated by the road salt
constituents Ca²⁺, Na⁺ and Cl⁻, so that ionic strength is virtually identical for
the two periods.
Trend Results: Heavy Metals
Heavy metals Cu, Fe, Ni and Zn have been significant components of Sudbury
smelter emissions (Hutchinson and Whitby, 1977; Chan et al., 1984). While Cu, Fe
and Ni are emitted as coarse particulates that deposit near the source, Zn emissions
are fine particulates prone to wider dispersal. Concern focuses mainly on the 'toxic
metals' Cu and Ni, which are present at significantly elevated levels in precipitation
near the Sudbury smelters (Jeffries and Snyder, 1981; Chan et al., 1984). Cu and
Ni trends (Figure 9) generally parallel the declining SO₄²⁻ levels except for the
1987/88 drought-induced reversal. Recent 1990/91 data show substantial decreases
in lake concentrations to 15-20 µg/L Cu and 70 µg/L Ni; however, since 1989,
measurements are too few to consider these encouraging results conclusive. Zn
shows a similar but less dramatic decline, as levels have fallen from near 50 µg/L in
1976 to 10-15 µg/L in 1991. Other than a late 1970s decline that may be related to
reduced smelter emissions, Fe shows neither appreciable trend nor correlation with
any other time series.
Trend Results: N, P, DOC, Chlorophyll and Secchi depth
Trends for total N [TN = TKN + NO₃⁻], organic N [ON = TKN - NH₄⁺], and inorganic
N [IN = NO₃⁻ + NH₄⁺] are overlaid on Figure 10. Over 1973-1985, TN fell from 250 to
100 µg/L, driven by declining IN as NO₃⁻ dropped from 120 to 25 µg/L and NH₄⁺
fell from 50 to 15 µg/L. The cause is unclear, as N has not been associated with smelter
emissions (Chan et al., 1984) and no significant long range emission or deposition
trends have been noted since the early 1970s (Dillon et al., 1988; Husar et al.,
1991; OMOE, 1992). From 1984, TN climbed back to over 150 µg/L, mostly due
to a rise in ON to 100 µg/L over 1989/90, indicating increased biological fixation
that declined somewhat in 1991. Organic N correlates strongly with DOC, whose
1981-1987 median level of 0.4 mg/L rose to 0.75 mg/L over 1988-1991, a small but
statistically significant increase (Figure 11).
Phosphorus has shown little significant trend except for a decline from a pre-1979
mean of 4.5 µg/L to a mean of 2.7 µg/L since then, which is likely the artefact
of improved measurement technology. Echoing recent organic N trends, after the
1986/87 drought P rose slightly in mean to 3.3 µg/L over 1989/90 and declined
to 1.7 µg/L in 1991. Except for a 1973 high of 1.25 µg/L and a 1991 record low
of 0.2 µg/L, annual median chlorophyll a concentrations have varied from 0.5-0.85
µg/L about the long term mean of 0.66 µg/L with no appreciable trend. Historically,
annual median Secchi depth varied in the range 7-10 m about the long term mean of 8.6
m without perceptible trend; however, the 1991 annual median rises to 11.1 m,
matching a previously recorded high in 1973. Collectively, this suite of variables
suggests that, in the late 1980s, there was a modest increase in primary biological
productivity that may have fallen off abruptly in 1991.
[Figure 6. pH and SO₄²⁻ (dashed line) trends, 1973-92. Figure 7. Chloride adjusted base cation and aluminum (dashed line) trends. Figure 8. Ionic composition (µeq/L), 1977-78 versus 1990-91.]
[Figure 10. Organic, inorganic and total nitrogen trends, 1973-92. Figure 11. DOC trends, 1981-92. Figure 12. Total chlorophyll a trends, 1973-92. Figure 13. Inorganic N seasonal plot (seasonal date as fraction of year).]
SEASONAL CYCLES
For most Clearwater time series, strong chronological trends overwhelm seasonality;
nonetheless, examination of S_i provides some useful insights into seasonal
biogeochemical cycling. Figure 13 shows the strong, distinctive seasonal function
for inorganic N that peaks between mid March and mid April (0.2 yr) just before
ice break-up, drops quickly to the end of May (0.4 yr) just before stratification,
declines more gradually to its annual minimum near the end of August (0.7 yr), and
then rises steadily to the late winter peak. The declining phase reflects biomass
assimilation during the growing season, and the rising phase reflects IN release by
decaying organic debris. The rapid loss of IN between ice break-up and stratification
is mainly as NH₄⁺, which then remains relatively stable until autumn overturn.
In contrast, NO₃⁻ declines gradually from ice break-up to late August. Similarly,
NH₄⁺ was preferentially consumed over NO₃⁻ early in the growing season during
fertilization experiments (Yan and Lafrance, 1984) in downstream lakes. The net
IN seasonal amplitude for 1973-1992 is 60 µg/L, to which NH₄⁺ and NO₃⁻ contribute
equally. The seasonal amplitude of IN species is most likely level dependent. Analysis
of the log transformed IN series suggests that at recent IN levels of 50 µg/L the
expected seasonal amplitude is about 35 µg/L.
Other variables also show perceptible seasonal variation. The mineral ions
(Ca²⁺, Mg²⁺, Na⁺, K⁺, Mn²⁺, SO₄²⁻, Cl⁻, F⁻) and conductance exhibit a unimodal
pattern with a broad peak extending from ice formation (mid December) to
break-up (mid March to mid April), followed by a rapid decrease to the annual low
about mid May that is likely the result of dilution by low ionic strength snowmelt
water. Though the net amplitude is only 0.4 mg/L, silica has a strong cycle with a
broad peak from ice break-up to stratification followed by a decline to a late summer
low. Aluminum has shown a generally similar pattern. Fe has weak bimodal
seasonality with a first, higher peak that occurs after ice break-up and persists into
stratification before falling to a mid summer low, followed by a smaller secondary
peak at autumn turn-over.
are poorly predicted due to drought effects, and (4) 1991, for which pH is underpredicted.
Forecasts of future steady state lake response under constant annual target
Sudbury smelter SO₂ emissions of 0.47×10⁶ t degenerate to implausible predictions
2-6 years forward of the 1990-91 Clearwater Lake median concentrations used as
initial values.
At 19 years, the annual lake concentration series are too brief to reliably fit a
forecasting model; however, some tentative conclusions emerge. To varying degrees,
statistical model residuals and plots of lake concentrations against SO₂ emissions
suggest that the long term decline in Clearwater H⁺ and SO₄²⁻ levels has been greater
than expected if lake status were governed by a simple linear dynamical response to
local SO₂ emissions. Aerometric and bulk deposition data of the late 1970s (Scheider
et al., 1981; Chan et al., 1984) suggested that beyond the immediate vicinity of the
smelters (>5 km, Jeffries, 1984) acid deposition was dominated by remote sources.
Thus the regression results suggest that much of the drop in Clearwater SO₄²⁻ and
H⁺ over 1973-1985 was attributable to the general decline in North American SO₂
emissions over 1970-1982. Reasons for the continuing recent decline in Clearwater
SO₄²⁻ and H⁺ cannot be assessed until concurrent bulk deposition figures become
available at sites remote from the Sudbury smelters.
SUMMARY
Clearwater Lake continues to respond favourably to declining acid deposition from
local and remote sources, and declining heavy metal emissions from local smelters.
In May 1992, a pH reading above 5 was observed for the first time in recorded
water quality history and the lake is poised to experience significant biological re-
covery as further emission controls are implemented. Declining levels of chloride
adjusted base cations, aluminum, silica, manganese, and fluoride confirm that min-
eral weathering rates have decelerated. Concentrations of toxic metals Cu and Ni
have fallen appreciably over 1990/91. Since 1988, a small surge in biological activ-
ity occurred that appears to have declined abruptly in 1991 as indicated by DOC,
organic N, P, chlorophyll and Secchi depth data. Though droughts of 1975/76 and
1986/87 induced brief reversals of de-acidification trends, Clearwater Lake is rela-
tively drought resistant. Comparable data for neighbouring Swan Lake (Keller and
Pitblado, 1992) reveal that some Sudbury area waters may remain at serious risk
from episodic drought induced re-acidification and metal toxicity for some time after
acid emission targets are achieved. Clearwater Lake acid-base status has improved
disproportionately relative to local smelter S02 emission reductions supporting in-
dications that remote source acid deposition is an important determinant of surface
water status in the Sudbury area and that further improvements depend on re-
duced acid deposition from both local and remote sources. Maintaining adequate
surface water monitoring in an era of severe fiscal restraint presents an immediate
challenge. Metal analyses for 1990/91 are so sparse that the ability to characterize
ambient levels and time trends within practical time horizons is severely jeopardized.
With its unique long term record of de-acidification processes in an unmanipulated
headwater catchment, Clearwater Lake ranks foremost among Sudbury area sites
for continued surveillance to judge the success of remedial actions implemented in
Canada and the U.S. through the forthcoming decade.
ACKNOWLEDGEMENT
The first author's efforts were supported by a research grant from Limnology Sec-
tion, Water Resources Branch, Environment Ontario.
REFERENCES
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly
spaced water quality time series", Environ. Monitoring Assessment, 12, 407-428.
Bodo, B.A. (1991) TRENDS: PC-software, user's guide and documentation for ro-
bust graphical time series analysis of long term surface water quality records, On-
tario Ministry of the Environment, Toronto.
Chan, W.H., Vet, R.J., Ro, C., Tang, A.J., and Lusis, M.A. (1984) "Impact of Inco
smelter emissions on wet and dry deposition in the Sudbury area", Atmos. Environ.,
18(5), 1001-1008.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter-
plots", J. Am. Stat. Assoc., 74(368), 829-836.
Dillon, P.J. (1983) "Chemical alterations of surface waters by acidic deposition in
Canada", p. 275-286, In Ecological Effects of Acid Deposition, National Swedish
Environment Protection Board, Report PM 1636.
Dillon, P.J. (1984) "The use of mass balance models for quantification of the ef-
fects of anthropogenic activities on lakes near Sudbury, Ontario", p. 283-347, In
J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.
Dillon, P.J., Reid, R.A., and Girard, R. (1986) "Changes in the chemistry of lakes
near Sudbury, Ontario following reductions of S02 emissions", Water Air Soil Pol-
lut., 31, 59-65.
Dillon, P.J., Lusis, M., Reid, R., and Yap, D. (1988) "Ten-year trends in sulphate,
nitrate and hydrogen ion deposition in central Ontario", Atmos. Environ., 22,901-
905.
Gorham, E., and Gordon, A.G. (1960a) "Some effects of smelter pollution northeast
of Falconbridge, Ontario", Can. J. Bot., 38, 307-312.
Gorham, E., and Gordon, A.G. (1960b) "The influence of smelter fumes upon the
chemical composition of lake waters near Sudbury, Ontario, and upon the surround-
ing vegetation", Can. J. Bot., 38, 477-487.
Gunn, J.M., and Keller, W. (1990) "Biological recovery of an acid lake after reductions
in industrial emissions of sulphur", Nature, 345, 431-433.
Harvey, H., Pierce, R.C., Dillon, P.J., Kramer, J.R., and Whelpdale, D.M. (1981)
Acidification in the Canadian aquatic environment, Pub. NRCC No. 18475 of the
Environmental Secretariat, National Research Council of Canada, Ottawa.
Hirsch, R.M., and Slack, J.R. (1984) "A nonparametric trend test for seasonal data
with serial dependence", Water Resour. Res., 20, 727-732.
Husar, R.B., Sullivan, T.J., and Charles, D.F. (1991) "Historical trends in atmo-
spheric sulfur deposition and methods for assessing long-term trends in surface water
chemistry", p. 65-82, In D.F. Charles [ed.] Acid Deposition and Aquatic Systems,
Regional Case Studies. Springer-Verlag, New York.
Hutchinson, T.C., and Havas, M. (1986) "Recovery of previously acidified lakes
near Coniston, Canada following reductions in atmospheric sulphur and metal emis-
sions", Water Air Soil Pollut., 29, 319-333.
Hutchinson, T.C., and Whitby, L.M. (1977) "The effects of acid rainfall and heavy
metal particulates on a boreal forest ecosystem near the Sudbury smelting region
of Canada", Water Air Soil Pollut., 7, 123-132.
Jeffries, D.S. (1984) "Atmospheric deposition of pollutants in the Sudbury area",
p. 117-154, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New
York.
Jeffries, D.S., Scheider, W.A., and Snyder, W.R. (1984) "Geochemical interactions
of watersheds with precipitation in areas affected by smelter emissions near Sudbury,
Ontario", p. 195-241, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley,
New York.
Jeffries, D.S., and Snyder, W.R. (1981) "Atmospheric deposition of heavy metals
in central Ontario", Water Air Soil Pollut., 15, 127-152.
Johnson, M.G., and Owen, G.E. (1966) Report on the biological survey of streams
and lakes in the Sudbury area, 1965. Ontario Water Resources Commission,
46 pp.
Keller, W., Pitblado, J.R., and Carbone, J. (1992) "Chemical responses of acidic
lakes in the Sudbury, Ontario, area to reduced smelter emissions, 1981-89", Can.
J. Fish. Aquat. Sci., 49(Suppl. 1), 25-32.
Keller, W., and Pitblado, J.R. (1986) "Water quality changes in Sudbury area
lakes: a comparison of synoptic surveys in 1974-76 and 1981-83", Water Air Soil
Pollut., 29, 285-296.
Keller, W., Pitblado, J.R., and Conroy, N.I. (1986) "Water quality improvements
in the Sudbury, Ontario, Canada area related to reduced smelter emissions", Water
Air Soil Pollut., 31, 765-774.
Kelso, J.R.M., and Jeffries, D.S. (1988) "Response of headwater lakes to varying at-
mospheric deposition in north central Ontario, 1979-1985" , Can. J. Fish Aquat. Sci.,
45, 1905-1911.
Laxen, D.P.H. (1977) "A specific conductance method for quality control in water
analysis", Water Res., 11, 91-94.
Locke, B.A. and Scott, L.D. (1986) Studies of Lakes and Watersheds in Muskoka-
Haliburton, Ontario: Methodology (1976-1985), Ontario Ministry of the Environ-
ment, Data Rep. DR-86/4, Dorset, Ontario, Canada.
OMOE (1982) Studies of lakes and watersheds near Sudbury Ontario: final limnological
report, supplementary volume 10. Sudbury Environmental Study, Ontario
Ministry of the Environment, Toronto.
OMOE (1992) Summary: some results from the APIOS atmospheric deposition
monitoring program (1981-1988). Environment Ontario, Toronto.
Rousseeuw, P.J., and Leroy, A.M. (1987) Robust Regression and Outlier Detection.
Wiley, New York.
Schecher, W.D., and Driscoll, C.T. (1987) "An evaluation of uncertainty associated
with aluminum equilibrium calculations", Water Resour. Res., 23(4),525-534.
Scheider, W.A., Jeffries, D.S., and Dillon, P.J. (1981) "Bulk deposition in the Sud-
bury and Muskoka-Haliburton areas of Ontario during the shutdown of Inco Ltd in
Sudbury" , Atmos. Environ., 15, 945-956.
Yan, N.D., and Lafrance, C. (1984) "Responses of acidic and neutralized lakes
near Sudbury, Ontario, to nutrient enrichment", p. 457-521, In J. Nriagu [ed.]
Environmental Impacts of Smelters, Wiley, New York.
PART VI
SPATIAL ANALYSIS
MULTIVARIATE KERNEL ESTIMATION OF FUNCTIONS OF SPACE
AND TIME HYDROLOGIC DATA
U. LALL, Utah Water Research Laboratory, Utah State Univ., Logan, UT 84322-8200.
K. BOSWORTH, Dept. of Mathematics, Idaho State Univ., Pocatello, ID 83209-8500.
INTRODUCTION
METHODOLOGY
The pointwise bias and variance of f̂(u) depend on the sample size, the underlying density,
the scale parameter or bandwidth h, and the kernel function. If the weight or kernel
function K(·) is a valid p.d.f., f̂(u) is also a valid p.d.f. Typical choices for K(·) are
symmetric Beta p.d.f.'s and the Gaussian p.d.f. In terms of asymptotic Mean Square Error
(MSE) of approximation, it has been shown that there is little to choose between the typically
used kernels. The critical parameter is then h, since it governs the degree of averaging.
Increasing h reduces the variance of f̂(u) at the expense of increasing bias. Objective, data
based methods (based on minimization of a fitting metric with respect to the class of smooth
p.d.f.'s and data based measures of their attributes, or with respect to a parametric p.d.f.
considered as a viable choice) for choosing h are available (see Scott (1992)). Minima of
MSE based criteria with respect to h tend to be broad rather than sharp. Thus, even when
using an objective method for choosing h, it is desirable to examine the estimate at a range
of h values in the neighborhood of the optimum.

[Figure 1. Relationships between GSL data (solid line is a cubic smoothing spline): precipitation and net precipitation panels.]
In the multivariate setting, two popular choices for K(·) are:

"The Product Kernel":
$$\hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d}\frac{1}{h_j}K_j\!\left(\frac{u_j - u_{ij}}{h_j}\right)$$

with independent bandwidths h_j and univariate kernels K_j(·) located at u_{ij}; and

"Sphering":
$$\hat{f}(u) = \frac{\det(S)^{-1/2}}{n\,h^{d}}\sum_{i=1}^{n}K\!\left(h^{-2}(u - u_i)^{T}S^{-1}(u - u_i)\right)$$

where S is a d×d data covariance matrix, and $K(v) = (2\pi)^{-d/2}e^{-v/2}$, i.e. a Gaussian function.
The motivation for these choices is that the attributes of the p.d.f. are likely to vary by j, and
that a radially symmetric multivariate kernel (i.e. h invariant with j) would be deficient, in
the sense that it may simply average over empty space in some of the directions j.
Consequently, at least variation of h by j is desirable. The product kernel has the
disadvantage that it calls for the specification of d bandwidths by the user. Where a Gaussian
kernel is used, the product kernel is equivalent to the sphering kernel with an identity
covariance matrix. If the variables of interest are correlated, sphering is attractive since it
aligns the kernel with the principal axes of variation of the data. The advantage of this is
striking if the rank of the data covariance matrix S is r ≪ d, i.e., the data cloud in d
dimensions can be resolved completely in r linearly independent directions. Further, the
bias of f̂(u) is proportional to the Hessian matrix of f(u). The matrix S is proportional to an
approximation to the Hessian. Thus sphering can help reduce the bias in f̂(u) by adjusting
the relative bandwidths. Wand and Jones (1993) show that while sphering offers significant
gains in some situations, it can be detrimental where the underlying p.d.f. exhibits a high
degree of curvature, and/or it has multiple modes that have different orientations.
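Both estimators translate directly into code. A short Python sketch with Gaussian kernels follows; `data` is an n × d array and the names are ours, not the authors':

```python
import numpy as np

def product_kernel_kde(u, data, h):
    """Product Gaussian kernel density at point u; h is a length-d bandwidth vector."""
    z = (u - data) / h                               # (n, d) standardized offsets
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return np.mean(np.prod(k / h, axis=1))

def sphering_kde(u, data, h):
    """Sphering density at u: one Gaussian kernel aligned with the data covariance S."""
    n, d = data.shape
    S = np.cov(data, rowvar=False)
    diff = u - data
    v = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff) / h ** 2
    dens = (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * v)
    return dens.sum() / (n * h ** d * np.sqrt(np.linalg.det(S)))
```

With S set to the identity matrix and all h_j equal to h, the two functions return identical values, matching the equivalence noted above for Gaussian kernels.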
It is clearly worth exploring locally adaptive bandwidth variation. For example,
suppose the underlying p.d.f. has two distinct modes that have different directions for their
principal axes, and perhaps involve different subsets r1 and r2 of the d variables. A
reasonable adaptation of the bandwidth would be achieved by partitioning the raw data into
the two appropriate subsets, and using matrices S1 and S2 that are aligned with the principal
axes of the two modes. This is indeed the strategy followed here. Our k.d.e. algorithm is
outlined in Table 1. A manuscript with a more formal presentation is in preparation.
First, it is desirable to scale the data so that all variables are compatible. This can be done by
mapping each coordinate to the interval [0,1], by "normalizing", i.e. subtracting the mean and
then dividing by the standard deviation, or through an appropriate logarithmic or other
transform.
The scaled data are then recursively partitioned into a k-d tree (Friedman, 1979). An
illustration of this process with the q, p data is shown in Figure 2. The first partition was
for p, at 0.34. The two resulting partitions were then split along q, as shown. Another
iteration takes us to the eight partitions shown, each with 17 points. Note that requiring
each partition
TABLE 1
K.D.E. ALGORITHM

1. Scale the data so that each column has the same scale.
2. Partition the data u_i, i = 1..n, into a k-d tree (npart partitions):
   - Split each partition at the median of the coordinate with the greatest spread.
   - Stop splitting partitions if the resulting number of points in the partition < mink, where mink = max(d+2, √n).
   - Define the indicator function I_ij = 1 if u_i is, or 0 if it isn't, in partition j.
3. Compute a (d×d) robust covariance matrix C_j for each partition j (see Appendix).
4. Perform a singular value decomposition (SVD) of C_j = E_j Λ_j E_j^T, where E_j is an eigenvector matrix (E_j^T E_j = I) and Λ_j is an ordered (descending) diagonal eigenvalue matrix corresponding to E_j. All matrices are (d×d).
5. Retain the r_j leading components of the decomposition, such that
   $$\sum_{l=1}^{r_j}\lambda_{lj} \Big/ \sum_{l=1}^{d}\lambda_{lj} \geq crit$$
   where crit is 0.95 or 0.99.
6. Form S_j = E_j Λ_j E_j^T, where E_j is (d×r_j), Λ_j is (r_j×r_j), and E_j^T is (r_j×d).
8. For a partition of u defined as (y, x), where the dimension of y is q and of x is (d-q), the conditional p.d.f. f(y|x) is estimated as
   $$\hat{f}(y \mid x) = \left(\sum_{i=1}^{n}\sum_{j=1}^{npart} I_{ij}\,G_{ij}(y)\,\hat{f}(x_i)\right)\Big/\sum_{i=1}^{n}\hat{f}(x_i)$$
   where $\hat{f}(x_i)$ is the p.d.f. of x evaluated at x_i, and G_ij(y) is a q-variate Gaussian p.d.f. with mean $\bar{y}_{ij} = y_i + S_{YX,j}S_{XX,j}^{-1}(x - x_i)$ and covariance $S_{YY\cdot X,j} = S_{YY,j} - S_{YX,j}S_{XX,j}^{-1}S_{XY,j}$.
[Figure 2. k-d tree partitioning of the scaled q, p data; the emphasized number in each box is the robust correlation between q and p within that partition.]
to have the same number of points leads to partitions that are large in data sparse regions.
The partition variance is larger, leading to a larger effective bandwidth. A natural adaptation
of the bandwidth to tails and modes of the data is thus effected. The emphasized numbers in
each box report the robust correlation between q and p values in the box. We find the
clustering of data shown by a k-d tree to be a useful data exploratory tool as well. Here,
note that the partition corresponding to the highest p, q values has the highest p, q
correlation; correlation varies with partition; and most partitions have p, q correlations
that are higher than the full sample value. As the number of points in a partition approaches
the dimension of the space, d, the matrix C_j becomes singular. This brings up the need to
decide on an optimum data partitioning.
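A compact Python sketch of the Table 1 estimator follows. The plain sample covariance stands in for the robust estimator of step 3, and the small diagonal jitter that keeps the truncated S_j invertible is our own device, not part of the paper's algorithm.

```python
import numpy as np

def kd_partition(X, mink):
    """Recursive median splits along the widest coordinate (a simple k-d tree)."""
    if len(X) < 2 * mink:
        return [X]
    j = int(np.argmax(X.max(axis=0) - X.min(axis=0)))  # coordinate with greatest spread
    med = np.median(X[:, j])
    lo, hi = X[X[:, j] <= med], X[X[:, j] > med]
    if min(len(lo), len(hi)) < mink:                   # ties can unbalance the split
        return [X]
    return kd_partition(lo, mink) + kd_partition(hi, mink)

def truncated_cov(X, crit=0.95):
    """SVD-truncated covariance S_j keeping components explaining >= crit variance."""
    C = np.cov(X, rowvar=False)        # plain sample covariance; robust in the paper
    lam, E = np.linalg.eigh(C)
    lam, E = lam[::-1], E[:, ::-1]     # descending eigenvalues
    r = int(np.searchsorted(np.cumsum(lam) / lam.sum(), crit)) + 1
    S = (E[:, :r] * lam[:r]) @ E[:, :r].T
    return S + 1e-9 * np.trace(C) * np.eye(C.shape[0])  # jitter keeps S_j invertible

def adaptive_kde(u, X, h, mink):
    """Density at u as a sum of Gaussian kernels with per-partition covariance h^2 S_j."""
    total, (n, d) = 0.0, X.shape
    for part in kd_partition(X, mink):
        S = truncated_cov(part) * h ** 2
        diff = u - part
        v = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
        total += (np.exp(-0.5 * v) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))).sum()
    return total / n
```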
Our approach is exploratory. The number of partitions is a smoothing parameter - bias
decreases and variance increases as the number of partitions increases. Arguments for
parsimony suggest using a minimum number of partitions. Our experiments (also
NSSS, 1992) with a variety of synthetic data suggest that a value of mink greater than d,
and somewhere between √n and n^{4/(4+d)}, works well. Since the variation in mink is by
factors of 2, consistent results upon partitioning are significant. So the strategy is to form a
sequence of estimates, starting with the full sample.
A robust covariance matrix is computed for each partition, and a truncated singular value
decomposition of the matrix is performed. The resulting matrix S_j is then used for
multivariate k.d.e. as outlined. Robustness is important, since the effect of an outlier is
pronounced as the sample size shrinks upon partitioning. It can force the covariance matrix
to be near singular, and lead to the specification of a wild eigenvector sequence, and hence
of kernel orientation.
One can choose the bandwidth h "automatically" by optimizing a criterion such as maximum
likelihood or mean integrated square error (MISE) through cross validation. However, it is
known that such estimators have very slow convergence rates (O(n^{-1/10}) for d = 1), high
variance (induced by fine scale structure), and are prone to undersmoothing. The recent
trend in the statistical literature (Scott, 1992) is to develop data based methods for choosing
h that use recursive estimation of the terms (involving derivatives of f(u)) needed in a
Taylor series expansion of MISE, and thereby develop an estimate of the optimal h. In
the univariate case, this can get close to the theoretically optimal convergence rate of O(n^{-1/2}).
Similar developments in the multivariate case are forthcoming. In the data exploration
context, choosing h by reference to a known parametric p.d.f. is reasonable. It has
Bayesian connotations - correct choice under proper specification, and adaptation under
mild mis-specification. Since we are interested in discerning structure in the data, it is
desirable to oversmooth to avoid focusing on spurious structure. Choosing h by reference
to a Gaussian p.d.f. is attractive in this context, since for a given variance one would
oversmooth relative to a mixture, and the multivariate framework is well known. In our
context, Silverman (1986) gives the MISE optimal h with reference to a multivariate
Gaussian as:

$$h_{opt} = \left\{\frac{4}{d+2}\right\}^{1/(d+4)} n^{-1/(d+4)}$$
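For instance, a trivial helper (using n = 136, the eight 17-point partitions of Figure 2):

```python
def gaussian_reference_h(n, d):
    """MISE-optimal bandwidth against a multivariate Gaussian reference (Silverman, 1986)."""
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

print(gaussian_reference_h(136, 2))  # ~0.44, the order of the h = 0.42 quoted in Figures 3-4
```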
It was pointed out earlier that the number of partitions is a smoothing parameter as well. We
have not yet determined the optimal h taking that into account. The use of mink instead of n
in the above has been used with some success. NSSS (1992) uses a strategy similar to our
k.d.e., and simply takes h = 1, regarding the estimator as a convolution of Gaussians with
locally estimated covariance structure. At this point, recall the insensitivity of MISE to h,
and note that increasing h, or reducing variance, smooths out structure and vice versa,
without affecting MISE very much. If the goal is data exploration, varying h and noting
data structures resistant to h variations is desirable. We shall illustrate the effect of such
choices by example in the next section.
The conditional p.d.f. estimator (item 8, Table 1) is based on a weighted convolution of conditional p.d.f.'s centered at each observation, with weights proportional to the estimated density at the point. One can also estimate this as f(u)/f(x), or equivalently by taking the appropriate slice out of an estimate f(u) and normalizing it.
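A minimal sketch of the slice-and-normalize route, using scipy's stock Gaussian k.d.e. in place of the partitioned, SVD-oriented estimator described above; the synthetic sample and all names are ours.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    u = 0.8 * x + 0.6 * rng.normal(size=500)
    kde = gaussian_kde(np.vstack([x, u]))            # joint density estimate f(x, u)

    # Slice the joint estimate at x = x0 and renormalize over u.
    x0, ug = 0.5, np.linspace(-4, 4, 201)
    joint_slice = kde(np.vstack([np.full_like(ug, x0), ug]))
    cond = joint_slice / np.trapz(joint_slice, ug)   # estimate of f(u | x0)
    print(round(ug[np.argmax(cond)], 2))             # mode near 0.8 * x0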
The regression estimator (item 9, Table 1) is presented in Owosina et al. (1992), and compared with other nonparametric regression schemes for spatial estimation. NKERNEL by NSSS (1992) has the same framework as described here, except for data partitioning and treatment of the covariance matrix (no SVD, no robustness).
APPLICATIONS
Selected k.d.e.'s for the data set introduced earlier are presented in figures 3 through 8. In each case, the variable referenced first is on the x-axis, and the second on the y-axis. The k.d.e. of the p and e p.d.f. (fig. 3), with 1 partition and h=0.4 (Gaussian reference), appears consistent with a bivariate Gaussian density with a correlation coefficient of -0.8. The k.d.e. of the q and p p.d.f. (fig. 4), constructed with 8 partitions (as in fig. 2) and h=1 (oversmoothed), clarifies the features apparent from the clusters in figure 2.
Figure 3. Joint p.d.f. of p and e, npart=1, h=0.42 (x-axis: precipitation; y-axis: evaporation).
Figure 4. Joint p.d.f. of q and p, npart=8, h=0.42 (x-axis: inflow; y-axis: precipitation).
K.d.e.'s of the p.d.f. of q, p and e were also constructed and evaluated along a slice defined through the line segment between (0,0.72) and (0.72,0) on the p,e axes (i.e., scaled p-e = -0.72). This line segment corresponds approximately to the principal axis of the p,e p.d.f. in figure 3. Figure 5 was constructed with 1 partition and an h chosen by reference to a Gaussian p.d.f. The kernel orientation and the bandwidth are globally prescribed. We see two weakly separated modes in the conditional p.d.f., in contours that suggest a skewed p.d.f. with principal axes consistent with the eigenvalues and eigenvectors of the (q,p,e) covariance matrix. In figure 6, we worked with four partitions and h=1. With this modification we see one mode in the p.d.f., but more complex structure in the joint p.d.f. than in figure 5. Finally, in Figure 7, we worked with 4 partitions, but with h=0.4 (Gaussian reference). The structure of figure 6 has now sharpened into 2 modes with major principal axes that are nearly orthogonal to each other. This is consistent with figure 1, where we saw low dependence between q and (p-e) in the middle range of the data. The antimode between them may reflect an instability in the q,(p-e) relationship, speculated about in figure 1.
Correlation between q and the other variables is weak, and n is small for estimating a trivariate k.d.e., let alone its conditionals. The correlation between p and e is relatively high. One would expect k.d.e. to perform poorly for conditionals of q on the other variables. A direct k.d.e. of q and (p-e) (figure 8) is similar to the p.d.f. for the slice from (q,p,e) for npart=1 and 4 (figures 5 and 7), with h by Gaussian reference, but with d=2 instead of 3.
The purpose of this application has been to illustrate k.d.e.'s potential for revealing the underlying structure in hydrologic processes, as an aid to better understanding. We find the nonparametric estimates (regression in fig. 1, scatterplot organization in fig. 2 and p.d.f.'s in the others) to be very useful tools for data exploration. The application shows the effect of varying the bandwidth and the number of partitions on the resulting k.d.e. As expected, partitioning affects the orientation of the kernels, as well as the degree of local smoothing and hence the resulting p.d.f., while the bandwidth controls the degree of smoothing and the ability to see modes. Varying both clarifies the underlying structure.
SUMMARY
The utility of multivariate k.d.e. for using data to improve our understanding of hydrologic processes is obvious. Model specification, as well as estimation of hydrologic variables, can be improved. Pursuit of k.d.e. to generate vitriolic debates on the multimodality of a data set, or to justify one's favourite parametric p.d.f., is counterproductive. Choices of models and estimation procedures are always subjective at some level. The spirit of k.d.e. is to highlight features of the data set at the expense of estimation efficiency (in the classical parametric sense). Clearly, artifacts of the particular realization at hand are likely to be highlighted as well as genuine underlying features. Implementations of multivariate k.d.e. need to be carefully designed to balance such a trade-off.
The k.d.e. algorithm presented here was shown to be effective in circumventing the detrimental effect of sphering with heterogeneous data, as shown by Wand and Jones (1993). The p.d.f.'s in figures 5 through 8 appear to be precisely of the type that would be obscured by global sphering. We noted that this was the case, and that the structure was resolved upon partitioning. Further work on improving parameter specification is needed and is in progress.
Figure 5. Joint p.d.f. of q and slice of p,e along (0,0.72) to (0.72,0), npart=1, h=0.42.
Figure 6. Joint p.d.f. of q and slice of p,e along (0,0.72) to (0.72,0), npart=4, h=1.
Figure 7. Joint p.d.f. of q and slice of p,e along (0,0.72) to (0.72,0), npart=4, h=0.45.
Figure 8. Joint p.d.f. of q and (p-e), (a) npart=1, h=0.42, (b) npart=4, h=0.42.
APPENDIX
Robust estimators of scale and covariance as suggested by Huber [1981] are used. The pairwise covariance C_ij is computed as r_ij t_i t_j, where t_i is a robust estimator of standard deviation obtained as 1.25 (Mean Absolute Deviation of u_i), and r_ij is a robust estimator for correlation, given as:
r_ij = { t[a u_i + b u_j]² - t[a u_i - b u_j]² } / { t[a u_i + b u_j]² + t[a u_i - b u_j]² }
where a = 1/t_i and b = 1/t_j. Huber indicates that this estimator has a breakdown point of 1/3, i.e., up to 1/3 of the data can be contaminated without serious degradation of the estimate.
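A small sketch of the appendix estimator, assuming the mean absolute deviation is taken about the sample mean (the centering is not specified in the text):

    import numpy as np

    def robust_scale(u):
        # t_i: 1.25 * (mean absolute deviation of u)
        return 1.25 * np.mean(np.abs(u - np.mean(u)))

    def robust_corr(ui, uj):
        # Huber's robust correlation with breakdown point 1/3
        a, b = 1.0 / robust_scale(ui), 1.0 / robust_scale(uj)
        s_plus = robust_scale(a * ui + b * uj)
        s_minus = robust_scale(a * ui - b * uj)
        return (s_plus**2 - s_minus**2) / (s_plus**2 + s_minus**2)

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = -0.8 * x + 0.6 * rng.normal(size=200)
    print(round(robust_corr(x, y), 2))   # close to the true correlation of -0.8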
ACKNOWLEDGEMENTS
The work reported here was supported in part by the U.S. Geological Survey through Grant No. 14-08-0001-G1738, and in part through the first author's 1992-3 assignment with the Branch of Systems Analysis, WRD, USGS, National Center, Reston, VA, while on sabbatical leave.
COMPARING SPATIAL ESTIMATION TECHNIQUES FOR PRECIPITATION ANALYSIS

J. SATAGOPAN AND B. RAJAGOPALAN

Precipitation data from the Columbia River Basin was analyzed using different spatial esti-
mation techniques. Kriging, Locally weighted regression (lowess) and Smoothing Spline
ANOVA (SS-ANOVA) were used to analyze the data. Log(precipitation) was considered as
a function of easting, northing and elevation. Analysis by kriging considered precipitation
only as a function of easting and northing. Various quantitative measures of comparison were considered, such as maximum absolute deviation, residual sum of squares and scaled variance of deviation. The analyses suggested that SS-ANOVA and lowess performed better than kriging. Residual plots showed that the distribution of residuals was tighter for SS-ANOVA than for lowess and kriging. Precipitation seemed to have an increasing trend with elevation but seemed to stabilize after a certain elevation. Analysis was also done for the Willamette River Basin data. Similar results were observed.
INTRODUCTION
Spatial estimation of precipitation is of fundamental importance and a challenging task in
hydrology. It has significant application in flood frequency analysis and regionalization of
precipitation parameters for various watershed models.
The irregularity of sampling in space, and the fact that precipitation exhibits substantial variability with topography (i.e., nonstationarity), make the spatial estimation task more difficult. Kriging is the most popular geostatistical technique used by hydrologists for spatial estimation. It assumes a priori specification of the functional form of the underlying function that describes the spatial variation of the parameter of interest. Most often, this assumption is not satisfied in nonstationary situations, resulting in possible errors in the estimates. Akin (1992) has extensively compared kriging with other nonparametric techniques on a large number of data sets, and found that kriging was inferior to all the other methods. Yakowitz and Szidarovszky (1985) compared the theoretical properties of kriging and kernel function estimation and gave comparative results from Monte Carlo simulations for one and two dimensional situations. The kernel estimator was superior in their theoretical and applied analyses. These serve as a motivation for our exploratory data analysis.
In this paper, we present results from a preliminary analysis of precipitation data from a mountainous region in the Columbia River Basin, examining the relative performance of three methods for spatial interpolation. The methods considered are kriging, locally weighted regression (lowess) and smoothing spline analysis of variance (SS-ANOVA). The rest of the paper is organized as follows. A brief discussion of kriging, SS-ANOVA and lowess is presented first, followed by a note on the study area, data set and statistical models. Comparative results and conclusions are presented last.
KRIGING
Kriging is a parametric regression procedure due to Krige (1951) and Journel (1977). It has become synonymous with geostatistics over the last decade and represents the state of the art for spatial analysis problems. Isaaks and Srivastava (1989) present a comprehensive and applied treatment of kriging, while Cressie (1991) provides a comprehensive treatment that covers much of the recent statistical research on the subject. Most of the work has been largely focused on ordinary kriging. The model considered is

y = f(x) + ε    (1)

where the function f(x) is assumed to have a constant but unknown mean and stationary covariance, y is the observed vector and ε is the vector of i.i.d. noise. Most often the assumptions for the function f are not satisfied, especially in the case of mountainous precipitation. In our data analysis we have looked at ordinary kriging only. Cressie (1991), Journel (1989) and de Marsily (1986) have detailed discussions on the various types of kriging and their estimation procedures as applied to different situations.
Kriging is an exact interpolator at the points of observation, and at other points it attempts to find the best linear unbiased estimator (BLUE) for the underlying function and its mean square error (MSE). The underlying function f(x) is assumed to be a random function; f(x) and f(x + h) are dependent random variables, leading to ergodicity and stationarity assumptions. The kriging estimate f_k of f is formed as a weighted linear combination of the observations as

f_k(x_0) = Σ_{i=1}^{n} λ_0i y_i    (2)
where the subscript k stands for the kriging estimate. The weights are determined through a procedure that seeks to be optimal in a mean square error sense. The weights relate to the distance between the point at which the estimate is desired and the observation points, and to the degree of covariance between the observations as a function of distance, as specified by the variogram γ(h). The variogram is given as

2γ(h) = E[(f(x + h) - f(x))²]    (3)

and the weights λ_0i and the multiplier μ are obtained from

Σ_{i=1}^{n} λ_0i γ(x_i - x_j) + μ = γ(x_0 - x_j),  j = 1, ..., n    (4)

Σ_{i=1}^{n} λ_0i = 1    (5)

where μ can be interpreted as a Lagrange multiplier for satisfying the constraint that the weights sum to unity, in an optimization problem formed for minimizing the mean square error of estimation. The λ_0i's are obtained by solving the above two equations. The ideas of Gaussian linear estimation are thus implicit in the kriging process.
The MSE of the estimator f_k is given by Cressie (1991) as

MSE(f_k(x_0)) = Σ_{i=1}^{n} λ_0i γ(x_0 - x_i) + μ    (6)
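For illustration, a minimal numpy sketch of solving the ordinary kriging system (4)-(5) in one spatial dimension, under an assumed exponential variogram; the variogram model actually fitted to the precipitation data is not reproduced here.

    import numpy as np

    def gamma(h, sill=1.0, rang=50.0):
        # an assumed exponential variogram gamma(h)
        return sill * (1.0 - np.exp(-h / rang))

    def ok_weights(xs, x0):
        # n kriging equations plus the constraint sum(lambda) = 1, with multiplier mu
        n = len(xs)
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = gamma(np.abs(xs[:, None] - xs[None, :]))
        A[n, n] = 0.0
        b = np.append(gamma(np.abs(xs - x0)), 1.0)
        sol = np.linalg.solve(A, b)
        return sol[:n], sol[n]                 # weights lambda_0i and multiplier mu

    xs = np.array([0.0, 10.0, 35.0, 60.0])
    lam, mu = ok_weights(xs, 20.0)
    print(lam.round(3), round(lam.sum(), 6))   # weights sum to 1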
SS-ANOVA

The model considered is

y_i = f(x_i) + ε_i,  i = 1, ..., n    (7)

where y_1, y_2, ..., y_n are observations, f is the function to be estimated, x_1, x_2, ..., x_k are variables such that the jth variable x_j ∈ X_j, some measurable space, and ε_1, ..., ε_n are i.i.d. with ε_i ~ N(0, σ²), σ² unknown. Usually the space considered is X_j = [0,1]. Whenever the
variables are not in the range [0,1], we can rescale them to lie in this range. Wahba (1990),
Gu (1989) and Gu and Wahba (1992) give an overview of the SS-ANOVA models. They
discuss applications to polynomial splines, tensor product splines and thin plate splines.
The SS-ANOVA model is described briefly in what follows.
The assumption in this model is f ∈ H, where H is a Hilbert space. The function f is required to be smooth in its domain, with f, f^(1) absolutely continuous, f^(2) ∈ L₂, where f^(i) denotes the ith derivative of f, and ∫₀¹ f(t) dt = 0. The space H is uniquely decomposed into a tensor sum as

H = [1] ⊕ H₁ ⊕ H₂ ⊕ (H₁ ⊗ H₂) ⊕ ...    (11)

so that f decomposes into a constant, main effects f_j, two-factor interactions f_{j,k},
and so on. These are similar to the side conditions in any analysis of variance model. The
SS-ANOVA procedure obtains f_λ as an estimate of f which minimizes the quantity

(1/n) Σ_{i=1}^{n} (y_i - f(x_i))² + λ Σ_β θ_β^(-1) J_β(f_β)

where λ, θ_i, θ_{i,j}, ... are smoothing parameters and the J's are smoothness penalty functionals. The polynomial space does not have any penalty. Kimeldorf and Wahba (1971) and Wahba (1990) discuss different penalty functionals. Gu and Wahba (1991) discuss the idea of thin plate splines and Bayesian confidence intervals for the main effect and interaction terms for a thin plate spline. For further details, we refer the reader to Gu and Wahba (1991).
The publicly available code RKPACK by Gu (1989) enables one to fit tensor product
splines, polynomial splines and thin plate splines for SS-ANOVA models. We have used the
thin plate spline program in RKPACK for our data analysis.
LOWESS

Lowess estimates the underlying function at a point x through a locally weighted polynomial fit of the form

f_l(x) = D(x) β    (14)

where D is a design matrix and the coefficients β are determined as the solution to a weighted least squares problem defined as

min_β Σ_{i ∈ k(x)} w_i(x) (y_i - D(x_i) β)²    (15)
where k(x) is the index set of the x_i that are the k nearest neighbors of x, and w_i(x) is the weight function defined as

w_i(x) = W( ρ(x, x_i) / d_k(x) )    (16)

where ρ(·) is the Euclidean distance function and d_k(x) is the Euclidean distance from x to the kth nearest neighbor. W(·) is usually taken as the tricube weight function.
The number of nearest neighbors, k, acts as a smoothing parameter. As k increases, bias increases, but variance decreases. The fit determined by lowess depends on the choice of the number, k, of nearest neighbors and the order, r, of the fit. Cleveland et al. (1988) propose a graphical method, based on analyzing an M-plot, for the selection of both these parameters. For further details we refer the reader to Cleveland et al. (1988).
For the present analysis, the span (fraction of sample used to compute the local regression) and degree (order of local approximation, where 1 represents linear and 2 represents quadratic) were chosen by using the F-statistic to compare alternate values of span and degree. A lowess surface is fit for each span and degree and the residual sum of squares is computed. The F-statistic, using the proper number of degrees of freedom in each case, is used to compare the residual sums of squares (RSS) at a significance level of 0.01. If the residual sums of squares are significantly different (at the 0.01 level), the span/degree with the lowest RSS is selected. Otherwise the span/degree with the lower number of equivalent parameters (higher degrees of freedom) is selected.
Cleveland (1979) suggests that local linear fitting produces good results, especially in
the boundary region. Akin (1992) has shown that the scheme works well even for a small
number of independent variables and, from comparisons of lowess with kriging on a large
number of known data sets, found that lowess performed very well in reproducing the known
functions.
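A sketch of the local fit of equations (14)-(16) at a single point, restricted to the local linear case; the function names and synthetic data are ours, not the software used in the paper.

    import numpy as np

    def tricube(t):
        # W(t): tricube weight, zero for t >= 1
        return np.where(t < 1.0, (1.0 - t**3) ** 3, 0.0)

    def lowess_point(X, y, x0, k=30):
        dist = np.linalg.norm(X - x0, axis=1)          # rho(x, x_i)
        dk = np.sort(dist)[k - 1]                      # d_k(x): kth nearest distance
        w = tricube(dist / dk)                         # eq (16)
        D = np.column_stack([np.ones(len(X)), X - x0]) # local linear design, eq (14)
        beta = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * y))  # eq (15)
        return beta[0]                                 # fitted value at x0

    rng = np.random.default_rng(2)
    X = rng.uniform(size=(200, 2))                     # e.g. scaled (easting, northing)
    y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=200)
    print(round(lowess_point(X, y, X[0], k=40), 3))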
Model
We have used log(precipitation) throughout our analysis. This is a very common transformation. One of its advantages is that it stabilizes the variance. A basic requirement in any analysis is the assumption of constant error variance. This assumption is usually violated when the response y follows a probability distribution in which the variance is functionally related to the mean. Since the inclusion of elevation as a third variable in the model only makes estimation of the variogram more complicated, we have considered precipitation as a function of easting and northing only for kriging. For lowess and SS-ANOVA the response was considered as a function of easting, northing and elevation. The model considered for SS-ANOVA treated log(precipitation) as a function of (easting, northing) and elevation. Rather than modelling the effects of easting and northing separately, we have looked at them as a two dimensional
variable (easting,northing) and used a thin plate smoothing spline (Gu and Wahba (1991))
approach. Gu and Wahba (1991) show simulation results and suggest that for geographical
data, such an approach would be appropriate.
• SVD = var(y) / var(y - ŷ)

where the y's denote the response (log(precipitation)) and ŷ denotes the estimated values.
MAD signifies how far away the estimate is from the true underlying function at the point where the fit is poorest, but does not ensure the best fit over the entire range of the function. MNAD represents the average nearness of the fit to the true function over the entire range of the function. SRSS1 and SRSS2 measure the nearness of the fit to the true function over the entire range of the function and ensure that there are no regions of relatively poor fit. SVD would return the squared signal to noise ratio if the estimate matches the true function; it also tells how well the technique discriminates the noise in the function. Akin (1992) has a detailed discussion of these measures.
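For concreteness, a sketch of the recoverable measures, assuming MAD is the maximum absolute deviation, MNAD the mean absolute deviation, and SVD the variance ratio var(y)/var(y - ŷ) implied by the text; SRSS1 and SRSS2 involve scalings not reproduced here.

    import numpy as np

    def measures(y, yhat):
        r = y - yhat
        return {
            "MAD": np.max(np.abs(r)),       # worst-point deviation
            "MNAD": np.mean(np.abs(r)),     # average nearness of fit
            "RSS": np.sum(r**2),            # residual sum of squares
            "SVD": np.var(y) / np.var(r),   # scaled variance of deviation
        }

    y = np.array([2.1, 2.9, 3.8, 5.2, 6.1])
    yhat = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
    print(measures(y, yhat))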
Table 1 gives the various measures for comparing the three spatial techniques for both the Columbia and Willamette River Basins. SS-ANOVA and lowess seem to do better function estimation than kriging for these data. Kriging is comparable to lowess on the Columbia data. This substantiates the observation that kriging performs better as a global estimator, while on the local scale (Willamette River Basin) it performs poorly compared to the other methods. Lowess and SS-ANOVA are less sensitive than kriging to heterogeneity and discontinuity in the data.
Table 1. Measures for comparing the three spatial estimation techniques.

                     Columbia                        Willamette
Measure    SS-ANOVA    Lowess   Kriging    SS-ANOVA    Lowess   Kriging
MAD          0.1772    1.6769    1.2823      0.1412    0.2861    0.6777
MNAD         0.0402    0.2256    0.2657      0.0366    0.0657    0.1532
RSS          1.3691   55.5073   59.7683      0.1251    0.4133    2.1886
SRSS1        0.0060    0.2444    0.2632      0.0280    0.0925    0.4899
SRSS2        0.0002    0.0063    0.0429      0.0001    0.0003    0.0017
SVD        160.9792    3.8041    2.8387     31.8066   10.1950    1.2468
Figure 2 gives the histogram of residuals for the three techniques for the Columbia River Basin. The residual distribution for the SS-ANOVA method is tighter than for lowess and kriging. For kriging, precipitation was considered only as a function of easting and northing, whereas for SS-ANOVA and lowess, precipitation was a function of easting, northing and elevation.
Figure 3 gives the histogram of residuals for the Willamette River Basin. The fit for the Willamette River Basin was not as good as for the Columbia River Basin. The distribution of residuals for the SS-ANOVA method was again tighter than for lowess and kriging. Even though the figures in Table 1 suggest that lowess would be a better approach than kriging for precipitation data, the distribution of residuals for kriging was tighter than for lowess in the case of the Columbia River Basin.
The contour plots of the estimated function obtained from the three methods for the Columbia River Basin and Willamette River Basin are given in figures 4 and 5 respectively. Though precipitation was obtained as a function of easting, northing and elevation for SS-ANOVA and lowess, we have shown precipitation against easting and northing only in the plots for these two methods, for ease of comparison with kriging. For the Columbia River Basin data, SS-ANOVA and lowess were found to handle the boundary points better than kriging. Also, SS-ANOVA and lowess did more smoothing than kriging for these data.
Figures 6 and 7 give plots of the effect of elevation on precipitation. The effect of elevation was obtained from SS-ANOVA. The plots also give a 95% (Bayesian) confidence interval for the effect of elevation. An increasing trend in precipitation with elevation can be observed. From the plot corresponding to the Columbia River Basin it can be seen that though precipitation has an increasing trend with elevation, it seems to level off beyond a certain elevation.
Figure 2. Histogram of residuals of the three methods for Columbia River Basin.
Figure 3. Histogram of residuals of the three methods for Willamette River Basin.
Figure 4. Contour plot of the estimated function for Columbia River Basin.
Figure 5. Contour plot of the estimated function for Willamette River Basin.
Figure 6. Elevation effect for Columbia River Basin obtained from SS-ANOVA.
Figure 7. Elevation effect for Willamette River Basin obtained from SS-ANOVA.
CONCLUSIONS
So far we have looked at an exploratory data analysis which used three spatial estimation techniques for precipitation analysis. Kriging has so far been the most widely used geostatistical technique among hydrologists; as mentioned earlier, it has become synonymous with geostatistics over the last decade. Results and research by authors such as Yakowitz and Szidarovszky (1985) and Akin (1992) motivated us to look at other spatial estimation techniques, and we found that the other techniques, SS-ANOVA and lowess, estimated the precipitation function better than kriging. These methods suggest a possible alternative for spatial interpolation of precipitation data, especially when the process is non-stationary. Future work is required in terms of more data sets and other nonparametric techniques.
ACKNOWLEDGEMENT
We would like to thank Dr. Upmanu Lall for valuable suggestions and for providing us with
relevant manuscripts. We also thank Mr. D.L. Phillips, Mr. J. Dolph and Mr. D. Marks
of USEPA, Corvallis, OR, USA for providing us with the precipitation data sets and also
the results of their analysis which motivated our study. We would also like to thank our
numerous friends for conversations and discussions on spatial estimation techniques.
REFERENCES
Akin, O.O. (1992) "A comparative study of nonparametric regression and kriging for ana-
lyzing groundwater contamination data", M.S. Thesis, Utah State University, Logan, Utah.
Bowles, D.S., Bingham, G.E., Lall, U., Tarboton, D.G., Al-Adhami, M., Jensen, D.T., Mc-
Curdy, G.D., and Jayyousi, E.F. (1991) "Development of mountain climate generator and
snow pack model for erosion predictions in the western United States using WEPP", Project
report - III.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter plots",
Journal of American Statistical Association 74, 368, 829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: an approach to
regression analysis by local fitting", Journal of American Statistical Association 83, 403,
596-610.
Cleveland, W.S., Devlin, S.J., and Grosse, E. (1988) "Regression by local fitting", Journal of Econometrics 37, 88-114.
Cressie, N. (1991) Statistics for Spatial Data, John Wiley and Sons, New York.
de Marsily, G. (1986) Quantitative Hydrogeology, Groundwater Hydrology for Engineers,
Academic Press, California.
Gu, C. (1989) "Rkpack and its applications: fitting smoothing spline models", Technical
Report 857, University of Wisconsin - Madison.
Gu, C. and Wahba, G. (1992) "Smoothing spline ANOVA with component-wise Bayesian
confidence intervals", Technical report 881 (rev.), University of Wisconsin - Madison.
Isaaks, E.H., and Srivastava, R.M. (1989) An Introduction to Applied Geostatistics, Oxford
University Press, New York.
Journel, A.G. (1977) "Kriging in terms of projections", Journal of Mathematical Geology 9, 6, 563-586.
Journel, A.G. (1989) Fundamentals of Geostatistics in Five Lessons, American Geophysical
Union, Washington, D. C.
Kimeldorf, G., and Wahba, G. (1971) "Some results on Tchebycheffian spline functions",
Journal of Mathematical Analysis and Applications 33, 82-95.
Krige, D.G. (1951) "A statistical approach to some mine valuations and allied problems in the Witwatersrand", unpublished Master's Thesis, University of the Witwatersrand, South Africa.
Muller, H.G. (1987) "Weighted local regression and kernel methods for nonparametric curve
fitting", Journal of American Statistical Association 82, 397, 231-238.
Phillips, D.L., Dolph, J., and Marks, D. (1991) "Evaluation of geostatistical procedures for
spatial analysis of precipitation", USEPA Report, Corvallis, Oregon.
Wahba, G. (1990) Spline Models for Observational Data, SIAM series in Applied Mathe-
matics, Pennsylvania.
Yakowitz, S., and Szidarovszky, F. (1985) "A comparison of kriging with nonparametric regression methods", Journal of Multivariate Analysis 16, 1, 21-53.
PART VII
SPECTRAL ANALYSIS
EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES
ANDRZEJ LEWANDOWSKI
Wayne State University
Detroit, MI 48202
U.S.A.
INTRODUCTION
Over 20 years have passed since the publication of a book by Box and Jenkins (Box and
Jenkins, 1970) where the principles of time series analysis and modeling were formu-
lated. During this period the Box-Jenkins methodology has been successfully applied
to numerous practical problems and has become the standard procedure for time se-
ries modeling and analysis (Pankratz, 1983). Most commercial statistical computer
packages support this methodology.
According to the Box-Jenkins approach, the process of model building consists of 3 steps: model identification, parameter estimation, and model validation.
The model building procedure is interactive and iterative in nature. The procedure is repeated until the model attains the desired form and accuracy.
Although several methods and algorithms for model estimation, constituting the 2nd stage of the above procedure, have been developed, the identification and validation stages are somewhat more diffuse in nature.
The original identification method proposed by Box and Jenkins is based on the visual inspection of the autocorrelation and the partial autocorrelation functions. This approach is very exploratory in nature but rather difficult to apply. Although in several cases the structure of the model generating the time series can be relatively easily deduced from the shape of the autocorrelation function, in many other cases it is difficult or not possible at all. Moreover, since the dependence between values of the autocorrelation function and parameters of the model generating the time series is rather complicated, values of model coefficients cannot be easily determined without performing complicated numerical calculations. Several attempts have been made to classify possible patterns of the autocorrelation function and to build a catalog of such patterns.
The transformation

X(z) = Σ_{t=-∞}^{+∞} x_t z^t    (2)

assigns to the series {x_t} a complex function X(z), which is denoted by

X(z) ↔ {x_t}    (3)

Under certain conditions this transformation is invertible and X(z) uniquely characterizes the series {x_t} (Eliott, 1987).
Let two time series {x_t} and {u_t} be connected by a linear relationship (operator) G:

{x_t} = G({u_t})    (4)

The complex function

G(z) = X(z) / U(z)    (5)
will be called the transfer function of the operator G. It is easy to calculate the transfer function of an ARMA model. Taking (2) and multiplying both sides by z, the following relationship can be obtained

z X(z) = Σ_{t=-∞}^{+∞} x_t z^(t+1) = Σ_{t=-∞}^{+∞} x_{t-1} z^t    (6)
Thus, if

X(z) ↔ {x_t}    (7)

then

z X(z) ↔ {x_{t-1}}    (8)
It follows from the above that the complex variable z can be formally interpreted as the shift operator B used by Box and Jenkins. Therefore, in order to obtain the transfer function for this model it is sufficient to replace the operator B by the complex variable z.
The term spectral or frequency transfer function will be used to describe the following formula

G(jω) = P(e^(-jω)) / Q(e^(-jω))    (9)
where P and Q are respectively the numerator and denominator of the operator transfer function (from now on we will consider only rational operator transfer functions). The spectrum of the output signal x generated by the model (which is actually the time series being analyzed) is equal to the modulus of the transfer function

f(ω) = |G(jω)|    (10)
Formula (10) shows where most of the basic difficulties in interpreting the spectrum arise. Although the structure of the transfer function itself is rather simple, the spectrum is a highly nonlinear function of the frequency ω, since it is the modulus of the transfer function evaluated on the unit circle and therefore includes trigonometric functions.
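For example, formula (10) can be evaluated directly; the sketch below does this for an assumed ARMA(1,1) model with first-order polynomials P and Q.

    import numpy as np

    def arma11_spectrum(theta, phi, omega):
        # f(w) = |G(jw)| with G evaluated at z = exp(-jw), eqs (9)-(10);
        # P(z) = 1 - theta*z (MA part), Q(z) = 1 - phi*z (AR part)
        z = np.exp(-1j * omega)
        return np.abs((1.0 - theta * z) / (1.0 - phi * z))

    omega = np.linspace(0.01, np.pi, 5)
    print(arma11_spectrum(0.5, 0.9, omega).round(3))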
This leads to an important question: does the modulus of a complex function evaluated on the unit circle characterize this function uniquely? The answer is generally no. Instead of working on the unit circle directly, the bilinear transformation

λ = (1 - z) / (1 + z)    (13)

is applied.
This transformation has the following properties:
1. It is invertible:

z = (1 - λ) / (1 + λ)    (14)

2. It transforms the region outside the unit circle in the complex plane z into the left half of the complex plane λ, and the unit circle into the imaginary axis (Silverman, 1975),
3. If this transformation is applied to a rational transfer function, then the resulting function will also be rational. If the ARMA model is stable and invertible, then all poles and zeros of the transfer function of the model generating the time series will be located outside the unit circle. After applying the λ transform to this transfer function, all the poles and zeros of the transformed transfer function will be located in the left half of the λ complex plane. This means that the resulting transfer function has the minimum phase property (Robinson, 1981).
The minimum phase property is important since it allows one to work with the
modulus of the function rather than with the function itself.
The rational function

g(λ) = G((1-λ)/(1+λ)) = P((1-λ)/(1+λ)) / Q((1-λ)/(1+λ)) = p(λ) / q(λ)    (15)

will be called the linearized transfer function. Similarly,

g(jω) = p(jω) / q(jω) = G((1-jω)/(1+jω))    (16)
After applying the λ transformation, the transfer function of the MA(1) model, G(z) = 1 - θz, takes the following form

g(λ) = 1 - θ (1-λ)/(1+λ) = (1-θ)(1 + θ'λ) / (1 + λ)    (22)

or

g(jω) = K (1 + jθ'ω) / (1 + jω) = p(jω) / q(jω)    (23)

where

K = 1 - θ    (24)

and θ' = (1+θ)/(1-θ).
It follows from the above formula that for low frequencies, such that θ'ω ≪ 1,

log |p(jω)| ≈ 0    (27)

It follows that if suitable axes are chosen, this function can be approximated by 2 asymptotes: a line of zero slope for ω < ω̂ and a line of slope +1 for ω > ω̂, where

ω̂ = 1/θ' = (1-θ)/(1+θ)    (29)
Figure 1. Bode plot for the MA(1) model (asymptotic and exact); the plot always shows a zero.
The Bode plot is one of the basic tools in electronics and control engineering (Sage, 1981). The Bode plot for the MA(1) model can be easily constructed using the asymptotic representation presented above, by writing the linearized transfer function of the MA(1) model in logarithmic form.
For the AR(1) model, G(z) = 1/(1 - θz), applying the λ transformation yields the following linearized transfer function

g(λ) = 1 / (1 - θ(1-λ)/(1+λ)) = (1/(1-θ)) (1 + λ) / (1 + θ'λ)    (33)

and, analogously,

g(jω) = K (1 + jω) / (1 + jθ'ω)    (34)

where

K = 1/(1-θ)    (35)

θ' = (1+θ)/(1-θ)    (36)
The methodology presented in the previous section can be used to construct a Bode plot for the AR(1) model. The only difference is that the transfer function of the AR(1) model always has a zero at ω = 1 and a pole at

ω̂ = (1-θ)/(1+θ)    (37)

The Bode plot for the AR(1) model is presented in Figure 2.
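A short numerical sketch of equations (34)-(37), with parameter values of our choosing:

    import numpy as np

    def ar1_linearized(theta, omega):
        # |g(jw)| for the AR(1) model, eqs (34)-(36)
        K = 1.0 / (1.0 - theta)
        tp = (1.0 + theta) / (1.0 - theta)
        return np.abs(K * (1.0 + 1j * omega) / (1.0 + 1j * tp * omega))

    theta = 0.9
    corner = (1.0 - theta) / (1.0 + theta)    # pole location, eq (37)
    omega = np.logspace(-2, 1, 4)
    print(round(corner, 3), np.log10(ar1_linearized(theta, omega)).round(2))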
For the ARMA(1,1) model

G(z) = (1 - θz) / (1 - φz)    (38)

the following linearized transfer function will be obtained

g(λ) = ((1-θ)/(1-φ)) (1 + ((1+θ)/(1-θ))λ) / (1 + ((1+φ)/(1-φ))λ)    (39)

This function has one pole and one zero. The corresponding Bode plot can be easily constructed by applying a procedure similar to that used in building the Bode plots for the MA(1) and AR(1) models (Figure 3).
Figure 2. Bode plot for the AR(1) model (asymptotic and exact); always a pole, at ω̂ = (1-θ)/(1+θ).
For the AR(2) model, G(z) = 1/(1 - θ₁z - θ₂z²), applying the λ transformation reduces the transfer function to the following form

g(λ) = K (1 + λ)² / (1 + θ₁'λ + θ₂'λ²)    (41)

where

K = 1 / (1 - θ₁ - θ₂)    (42)

θ₁' = 2(1 + θ₂) / (1 - θ₁ - θ₂)    (43)

θ₂' = (1 + θ₁ - θ₂) / (1 - θ₁ - θ₂)    (44)
It is more convenient to use the canonical form of the denominator of (41):

1 + θ₁'λ + θ₂'λ² = 1 + 2ζ(λ/ω_y) + (λ/ω_y)²    (45)
Figure 3. Bode plot for the ARMA(1,1) model (asymptotic and exact), log|f(jω)| against log(ω); zero at ω = (1-θ)/(1+θ), and a pole generated by the AR(1) part at ω = (1-φ)/(1+φ).
The value

ω_y = 1/√(θ₂')    (46)

is the resonance frequency, and

ζ = θ₁' / (2√(θ₂'))    (47)
is the damping factor. These two factors determine the resonance properties of the transfer function. If ζ > 1 then the quadratic polynomial has real roots and can be represented as the product of two first order factors. For 0 < ζ < 1 the roots are complex and more careful analysis is required, since for small values of ζ (low damping) and for frequencies close to the resonance frequency, the asymptotic approximation will not be very accurate. However, experience has shown that asymptotic analysis can be useful even in these cases, and therefore the asymptotic behavior of (41) must be investigated.
Making the substitution

λ = jω    (48)

and considering the canonical form (45), the following expression will be obtained

g(jω) = K (1 + jω)² / (1 + 2ζ jω/ω_y - (ω/ω_y)²)    (49)
The logarithm of the modulus of this function is as follows

log|g(jω)| = log K + log(1 + ω²) - (1/2) log[(1 - (ω/ω_y)²)² + 4ζ²(ω/ω_y)²]    (50)

The first two terms of this formula are similar to those for the MA(1) model, and their asymptotic approximation has been discussed in previous sections. The results obtained previously can be used in this case. The only difference between this and the MA(1) model is that in this case the slope of the asymptote of the second term is +2. Analysis of the third term is also straightforward. For low frequencies ω ≪ ω_y

(1 - (ω/ω_y)²)² + 4ζ²(ω/ω_y)² ≈ 1    (51)

while for high frequencies ω ≫ ω_y

(1 - (ω/ω_y)²)² + 4ζ²(ω/ω_y)² ≈ (ω/ω_y)⁴    (52)

Comparing this result with (50) we observe that the third term of (50) has two asymptotes: for low frequencies the slope is equal to zero, while for high frequencies the slope is -2. The asymptotes cross at the vertex corresponding to the resonance frequency ω = ω_y.
The following component of the frequency response (50)

-(1/2) log[(1 - (ω/ω_y)²)² + 4ζ²(ω/ω_y)²]    (53)

generates a peak for small values of ζ. The amplitude of this peak is given by the following expression

M = log( 1 / (2ζ√(1 - ζ²)) )    (54)
It is now not difficult to construct a Bode plot for (49) (see Figure 4). The only difference between the asymptotic frequency response presented in Figure 4 and the frequency response of the AR(1) model is that the slope of the asymptote is equal to -2. The AR(2) model with complex roots always has a double zero at ω = 1 and a double pole at ω = ω_y.
Figure 4. Bode plot for the AR(2) model with complex roots (asymptotic, and exact for ζ = 0.9 and ζ = 0.3); always a double pole.
For the example time series analyzed, the asymptotic Bode plot was constructed as follows:
1. A horizontal line is plotted. This line is the asymptote to the linearized spectrum for ω = 0. This asymptote determines the amplification factor K of the transfer function,
2. A line with slope -1 is plotted to approximate the spectrum for frequencies close to ω = 0.06. When the exact frequency response is calculated and plotted, it becomes clear that a line with slope -2 must be used to approximate the linearized spectrum for ω > 0.5,
3. A horizontal line must be plotted to approximate the linearized spectrum for ω > 1.
Therefore, the asymptotic Bode plot consists of 4 line segments (Figure 5). The linearized transfer function has poles for ω = 0.06 and ω = 0.6, and 2 zeros for ω = 1. Comparing these results with the standard patterns of the AR(1) model, it is possible to conclude that the model generating this time series must be AR(2) with real roots. Since there is a one-one relationship between the frequency ω and the values of the poles and zeros of the linearized transfer function, it is possible to determine the parameters of the model directly from the plot. These values for the model under study are 0.53 and 0.94. Therefore, the transfer function of the model generating this time series is

G(z) = 1 / ((1 - 0.53z)(1 - 0.94z)) = 1 / (1 - 1.47z + 0.498z²)    (57)

Parameters of the ARMA(2,0) model have also been estimated using the statistical package MINITAB. The transfer function of the model generating the time series, obtained from MINITAB, is as follows

G(z) = 1 / (1 - 1.54z + 0.593z²)    (58)
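The expansion in (57) can be checked mechanically, e.g.:

    import numpy as np

    # (1 - 0.53z)(1 - 0.94z); numpy orders coefficients from the highest power down
    print(np.polymul([-0.53, 1.0], [-0.94, 1.0]))   # [ 0.4982 -1.47  1. ]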
REFERENCES
Bishop, A.B. (1975) Introduction to Discrete Linear Controls: Theory and Application. Academic Press.
Box, G.E.P. and G.M. Jenkins (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Choi, B.S. (1992) ARMA Model Identification. Springer-Verlag, New York.
De Gooijer, J.G. and R.M.J. Heuts (1981) "The corner method: an investigation of an order discrimination procedure for general ARMA processes". University of Amsterdam, Faculty of Actuarial Science and Econometrics, Report AE 9/81.
Eliott, D.F. (1987) "Transforms and Transform Properties" in D.F. Elliot (ed.) Digital Signal Processing: Engineering Applications, Academic Press, New York.
Hannan, E.J. (1970) Multiple Time Series. John Wiley and Sons, New York.
Jones, R.H. (1978) "Multivariate autoregression estimation using residuals", in D.F. Findley (ed.), Applied Time Series Analysis, Academic Press.
Lewandowski, A. (1983) "Spectral methods in the identification of time series", International Institute for Applied Systems Analysis, WP-83-97, Laxenburg, Austria.
Lewandowski, A. (1993) "EXSPECT - computer program for exploratory spectral analysis of time series", to be published.
Pankratz, A. (1983) Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, J. Wiley, New York.
Polasek, W. (1980) "ACF patterns in seasonal MA processes", in O.D. Anderson (ed.), Time Series, North-Holland Publ. Co., Amsterdam.
Robinson, E.A. (1981) "Realizability and minimum delay aspects of multichannel models", in E.A. Robinson (ed.), Time Series Analysis and Applications. Goose Pond Press.
Sage, A.P. (1981) Linear System Control. Pitman Press.
Silverman, H. (1975) Complex Variables. Houghton-Mifflin.
Tukey, J.W. (1977) Exploratory Data Analysis. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts.
ON THE SIMULATION OF RAINFALL BASED ON THE CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

U. MATSUBAYASHI ET AL.
INTRODUCTION
Design rainfall is the most basic quantity in the planning of flood control projects. This design rainfall is based on the frequency analysis of historical records of point rainfall, i.e., the T-year return period. Because rainfall varies in time and space, the design rainfall is sometimes reproduced from historical rainfall data by adjusting the total amount of rain to be equal to the T-year rainfall depth. This method, however, gives unrealistic results when the magnitude of the rainfall referred to is different from the design rainfall depth. To improve the determination of the design rainfall, a stochastic simulation procedure is needed wherein the design rainfall is distributed in time and space and corresponds to a certain return period.
As for the simulation of rainfall distribution in time and space, there are several procedures which can be used (i.e., Amorocho and Wu 1977, Corotis 1976, Mejia and Rodriguez-Iturbe 1974, Bras et al. 1986, Kavvas et al. 1987 and Matsubayashi 1988). However, these methods do not succeed in introducing the return period. From this point of view, this paper utilizes the Fourier series to express the rainfall field subjected to a given magnitude (i.e., the areal averaged rainfall).
THEORETICAL DISCUSSION
The rainfall field is a typical random field which originates from fluctuations in the meteorological properties of the atmosphere such as vapour pressure, temperature, phase change of matter and the wind field. However, it is also recognized that the rain field consists of several scales of phenomena, individually known as the extratropical cyclone, rain band, cell cluster and convective cells, where a number of smaller scale phenomena are included in a larger phenomenon (Austin and Houze 1972). This hierarchical structure is a kind of regularity in the randomness found in turbulent flow. These characteristics can be expressed by a Fourier series which consists of various scales of waves expressing each scale of rainfall phenomena stated above.
The Fourier series for a two-dimensional field can be expressed by equation 1:

r(x,y) = a_00 + Σ_{i=1}^{m} Σ_{j=1}^{n} A_ij sin(θ_xij + k_xi x) sin(θ_yij + k_yj y)    (1)

k_xi = 2πi / L_x,  k_yj = 2πj / L_y
In equation 1, a_0 is the areally averaged rainfall in a region of length L, and A_i and θ_i are the amplitude and phase angle of the sinusoidal function with wave number k_i. Among these parameters, randomness is included in a_0, A_i and θ_i. As mentioned above, the rainfall field is not simply a random process, but also a deterministic process, shown in the hierarchical structure of the rain band, the convective cell and others. Therefore, to simulate the rainfall field, these variables should be properly evaluated based on their stochastic and deterministic properties.
Among the three variables, the amplitude A_i, or the Fourier spectrum A_i², has two aspects for treatment in the analysis. One is from knowledge about turbulence. Kolmogorov derived a relationship between the energy spectrum E_k of turbulence and the wave number k:

E_k ∝ k^(-5/3)    (3)
This equation explains the distribution of energy of turbulence, where energy is transferred from long waves to short waves like a cascade, and finally dissipates as heat through molecular shear stress.
By analogy with turbulence, the rainfall field may also be considered to have a similar mechanism, wherein rainfall has large-scale rain bands and numerous small convective cells, and water is transferred from the larger scale of phenomena to the smaller scale and falls as rainfall. Based on this consideration, and applying the achievements of turbulence studies, the Fourier spectrum of the rainfall field is expected to have the form shown in equation 4 for the one dimensional case, corresponding to Eq. (3) for turbulence:

A_i² ∝ k_i^(-λ)    (4)
On the other hand, the Fourier spectrum A_i² is theoretically obtained by applying the Fourier transformation to the auto-correlation function of the rainfall field. The characteristics of the auto-correlation function of rainfall have been studied by many researchers (I.I. Zawadzki 1974, Ohshima 1992), reporting that the A.C.F. of the rainfall decreases linearly or exponentially. The Fourier spectrum A_i² derived by applying the inverse Fourier transform to the exponential A.C.F. is of the form

A_i² ∝ 1 / (a² + k_i²)    (5)

In this research the combined form

A_i² ∝ 1 / (a_s² + k_i²)^(λ_s/2)    (6)

is also used to express the Fourier spectrum of the rainfall, which can express both equations 4 and 5. That is, equation 6 with λ_s = 2 reduces to equation 5, and equation 6 with a_s = 0 reduces to equation 4.
These expressions of the Fourier spectrum give clues about the discussion of the rainfall
simulation.
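A sketch of the combined model as reconstructed in equation 6 above, showing the two limiting cases; the constant c and the sample wave numbers are arbitrary illustrations.

    import numpy as np

    def spectrum_model(k, c, a_s, lam_s):
        # A_i^2 = c / (a_s^2 + k^2)^(lam_s/2); lam_s = 2 gives the eq (5)
        # form, a_s = 0 gives the eq (4) power law with exponent lam_s
        return c / (a_s**2 + k**2) ** (lam_s / 2.0)

    k = np.array([0.1, 0.5, 1.0, 3.0])
    print(spectrum_model(k, 1.0, 0.0, 5.0 / 3.0).round(4))   # pure k^(-5/3) decay
    print(spectrum_model(k, 1.0, 0.1, 2.0).round(4))         # exponential-A.C.F. form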
The various rainfall simulators mentioned before can produce rainfall in time and space. However, they cannot properly simulate the rainfall field corresponding to a certain return period T. The return period, an average recurrence interval of rainfall, is usually defined for the point rainfall depth of an event. In considering the spatial distribution of rainfall, the return period should be analyzed for rainfall over a specified area, because rainfall has centrifugal characteristics originating from the hierarchical structure of the rain band, the convective cell, etc. Therefore the maximum value of spatially averaged rainfall for a certain area is strongly dependent on the area considered. These characteristics are usually discussed as DAD analysis.
In the Fourier series, a_0/2 is the spatial average value of rainfall within the area where the Fourier series is spanned. Therefore, from the above discussion, a_0 characterizes the total amount of rainfall in the area and is evaluated for a certain return period through DAD analysis. In other words, through the parameter a_0, the statistical characteristics (i.e., the return period of the occurrence of rainfall) are explicitly introduced in the simulation model. On the other hand, the parameters A_i and θ_i determine how the total rainfall (a_0/2)L should be distributed in space and time. So these parameters should be carefully evaluated to mimic the stochastic characteristics of rainfall. In this research, however, we concentrate on the discussion of the spatial distribution of rainfall; the time variation of rainfall is not treated.
RAINFALL DATA
Figure 1. Nagoya radar site and region of rain data.  Table 1. Characteristics of rainfall data.
The rainfall data used to analyze the Fourier spectrum and the phase angle are radar rain gage data obtained at the Nagoya Meteorological Observatory (Nagoya City, Aichi, Japan), shown in Figure 1. PPI data are given at grid points of a 2.5 km by 2.5 km mesh in a 500 km by 500 km square region at every 7.5 minutes. We analyze five storms which occurred during the years 1986 and 1988. The characteristics of the rainfall are listed in Table 1. Because we do not discuss the time variation of rainfall, spatial distributions of hourly rainfall are analyzed independently of time. In the case of a one dimensional rainfall field, rainfall data along a certain grid line are used.
Figure 2-a. Observed rainfall distribution; Figure 2-b. Fourier spectrum distribution; Figure 2-c. Distribution of the phase angle and the phase angle difference.
Figure 3-a. Observed rainfall distribution; Figure 3-b. Fourier spectrum distribution; Figure 3-c. Distribution of the phase angle and the phase angle difference.
In the following discussions, one dimensional rainfall fields are analyzed, because the simplified treatment is helpful in understanding the fundamental characteristics of the rainfall. Figures 2 and 3 show typical examples of rain events and the results of the analysis. Figure 2-a is an example of uniformly distributed rainfall and Figure 3-a, a rainfall with a single peak. These two types of events are called "uniform rainfall" and "single peak rainfall" in the following discussions. In between these typical cases, there is a type of rainfall with several peaks in one event, called "complex rainfall." Both lengths of the rain fields analyzed in these figures are 100 km.
Figures 2-b and 3-b show the Fourier spectrum A_i² in relation to the wave number k_i for the two rainfalls. From these figures, it can be seen that A_i² decreases with an increase of the wave number k_i in both cases. However, different characteristics are also found for the two types of rainfall; that is, the single peak rainfall has rather large values of A_i² over almost the whole region of k_i and is convex in the low wave number range. On the other hand, the uniform rainfall shows smaller A_i² values and a linear recession on log-log axes.
In the modeling of the Fourier spectrum, three types of relationships (equations 4 to 6) are used, based on the previous discussion. One of these relationships is from the analogy with turbulence, where the equation is assumed from Kolmogorov's power law. Another is from the exponential auto-correlation function of rainfall. The other is a combined form of these two relations. These relationships are fitted to the observed spectrum and shown as a solid line for equation 4, a broken line for equation 5 and a dotted line for equation 6 in Figures 2-b and 3-b. From these figures, it may be concluded that equation 4 is applicable to the spectrum of uniform rainfall for a wide range of wave numbers, but for the single peak rainfall, the solid line shows a remarkable deviation from the observed data in the lower range of k_i. Because the vertical axis is logarithmic, the deviation cannot be ignored.

Figure 4. Histogram of the parameter λ in equation 4 (single peak, complex and uniform rain).
Figure 5. Histogram of the parameter a (single peak, complex and uniform rain).

This difference may originate
from the non-uniform characteristics of the rainfall in nature, while equation 4 is derived on the assumption of a uniform field. Figure 4 shows the histogram of the parameter λ in equation 4 for 363 cases of rainfall. In this figure, rainfall events are classified into three types, namely, single peak rain, uniform rain and complex rain. The figure shows that single peak rain has a large value of λ compared to uniform rainfall. It is interesting to note that the overall average value of λ (2.28) is almost comparable to Kolmogorov's theoretical value of 5/3. On the other hand, equation 5 seems to be applicable especially in the lower range of k_i. Although the apparent errors in the larger range of k_i are big, the real errors of the spectrum are small. Figure 5 shows the histogram of the parameter a. This figure shows that a is large for single peak rain events and almost zero for uniform rain events, which converge to equation 4.
Compared to equation 4 and equation 5, equation 6 is most applicable for both single peak rain and uniform rain because of its higher degree of freedom to express the Fourier spectrum. The estimated dotted line fits well in both the low and high ranges of k_i.
In addition to the deterministic parts discussed above, random fluctuations in the Fourier spectrum are also observed in Figures 2-b and 3-b. The deviation of plots from the deterministic trend is not clear, but it seems to depend on the type of rainfall, whether it is a single peak rain, a complex rain or a uniform rain.
Phase angle θ_i
Figures 2-c and 3-c show the phase angle θ_i and the phase angle difference Δθ_i (= θ_i - θ_{i-1}). From these figures, an obvious difference can be seen between the single peak rainfall and uniform rainfall. The uniform rainfall shows random scatter in both θ_i and Δθ_i. On the other hand, the phase angle θ_i of the single peak rainfall shows an almost linear change, and consequently Δθ_i takes a constant value. To understand these differences, it should be recognized that the phase angle θ_1 determines the location of the peak of the basic wave, and that the other phase angles θ_i have meaning with respect to θ_1. In addition, for the Fourier series where the peaks of the sine curves coincide at a certain point, which produces a single peak rain, it can be proved theoretically that Δθ_i should satisfy the relation Δθ_i = θ_1 - π/2. These characteristics can be found in figure 3-c for the single peak rainfall.
Although the deterministic component is dominant in the phase angle of the single peak rain, a random component is also observed around the deterministic part. The randomness in the phase angle is found to be strongest for uniform rain, intermediate for complex rain, and weakest for single peak rain.
The magnitude of the storm and the amplitude of the first term
As explained above, the spatial distribution of rainfall is described by the average rainfall a_0/2 and the Fourier coefficients A_i. Among these properties, a_0/2 is determined from DAD analysis and A_i can be determined by equations 4 to 6 for adequate parameters. The parameters A_i, however, cannot be independent of the rainfall magnitude. Here we focus on A_1, the amplitude of the first term, as the representative parameter, and relate it to the areal average rainfall. Figures 6-a and 6-b show the relationship between A_1² and the square of the areal average rainfall for single peak and uniform rainfall. Although some scatter was
Figure 6-a. Relationship of A_1² to r² for a single peak rain. Figure 6-b. Relationship of A_1² to r² for a uniform rain.
observed, almost linear relationships can be assumed. It can also be seen that the slope of the relationship for single peak rainfall is steeper than that for uniform rainfall. Because Figure 6-a and Figure 6-b are plotted from selected typical homogeneous and single peak rain events, complex rain plots may fall in between these two linear relationships.
For the purpose of simulation, these results make it possible to determine not only a_0/2 but also A_i based on the return period of the design rainfall.
The two-dimensional field is described by equation 1. Among the parameters in equation 1, the Fourier spectrum A_ij² is expressed in equation 7 by multiplying equation 5 applied in the two directions. This expression is based on the assumption of independence of the process in the two directions. This simplified approach, however, can express the heterogeneity of the field which is reported by Zawadzki (1974), Matsubayashi et al. (1987) and Oshima (1972).

A_ij² = 16 r_0² / ((a_x² + k_xi²)(a_y² + k_yj²))    (7)
Figures 7-a and 8-a are typical examples of the single peak rainfall and uniform rainfall respectively. Figures 7-b and 8-b are the observed Fourier spectrum distributions, and Figures 7-c and 8-c are the distributions estimated by equation 7. In Figures 7-a and 7-c, the Fourier spectrum in the small k_x zone is relatively large, which means that the rainfall field is prolonged in the N-S direction. This assures that equation 7 can express heterogeneous
Figure 7-b. Observed 2-D spectrum. Figure 7-c. Estimated 2-D spectrum.
Figure 7-d. 2-D phase angle θ_xij. Figure 7-e. 2-D phase angle θ_yij.
Figure 8-b. Observed 2-D spectrum. Figure 8-c. Estimated 2-D spectrum.
Figure 8-d. 2-D phase angle θ_xij. Figure 8-e. 2-D phase angle θ_yij.
characteristics of the field. It can also be seen that they have characteristics similar to the one-dimensional analysis. These results show the applicability of equation 7 for expressing the two-dimensional Fourier spectrum.
As for the phase angle, Figures 7-d, e and Figures 8-d, e show the distributions of θ_xij and θ_yij for single peak rainfall and uniform rainfall. The thick and thin lines are contour lines of positive and negative phase angles. A cyclic change of the phase angle is observed in the single peak rainfall; on the contrary, the homogeneous rainfall shows a random distribution. These characteristics correspond to the ones observed in the one-dimensional case.
Based on the characteristics of the rainfall described above, a simulation procedure for the one-dimensional rainfall field is proposed here. The two-dimensional case is not presented because the data analyzed are not sufficient to obtain reliable characteristics of the parameters. The procedures are as follows:
1) Evaluate the areally averaged rainfall a_01^2, the spectrum distribution parameters α_s and λ_s, and the phase angle θ_1 of the first term.
2) Evaluate A_1^2 from the relationship between A_1^2 and the average rainfall.
3) Calculate the A_i^2 distribution based on equation 4, 5 or 6, with random noise to express the scatter observed in Figures 2b and 3b.
4) Calculate the θ_i by using a constant increment Δθ_i = θ_i - θ_{i-1} (with a certain scatter) for single peak rainfall, and by using a uniform random distribution for uniform rainfall.
5) Calculate the rainfall distribution by equation 2.
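As a worked illustration of steps 1) to 5), the following minimal sketch (in Python with NumPy) assembles a one-dimensional field from a Fourier series. The spectral scaling constant, the amount of random scatter and the phase increment used below are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def simulate_rainfall_1d(mean_rain, alpha, length_km=100.0, n_terms=20,
                             single_peak=True, seed=None):
        # Steps 1-2: mean_rain plays the role of the areal average rainfall;
        # alpha is the exponential autocorrelation parameter (assumed known).
        rng = np.random.default_rng(seed)
        x = np.linspace(0.0, length_km, 201)
        k = 2.0 * np.pi * np.arange(1, n_terms + 1) / length_km  # wave numbers

        # Step 3: amplitudes from an exponential-autocorrelation type spectrum,
        # A_i^2 proportional to 1/(alpha^2 + k_i^2), with random scatter.
        amp = mean_rain / np.sqrt(alpha ** 2 + k ** 2)
        amp *= rng.lognormal(mean=0.0, sigma=0.2, size=n_terms)

        # Step 4: phases -- near-constant increment for a single peak field,
        # independent uniform angles for a uniform field.
        if single_peak:
            theta = np.cumsum(0.3 + rng.normal(0.0, 0.05, n_terms))
        else:
            theta = rng.uniform(0.0, 2.0 * np.pi, n_terms)

        # Step 5: superpose the series around the mean (equation 2).
        field = mean_rain + (amp[:, None] *
                             np.cos(k[:, None] * x[None, :] + theta[:, None])).sum(axis=0)
        return x, field

As in the discussion of Figure 9 below, such a superposition can produce slightly negative intensities at some points; in practice these are small and can be truncated to zero.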
Figures 9 and 10 show two examples of simulation, for single peak rainfall and uniform rainfall respectively.

[Figure 9. Simulated single peak rainfall (λ = 3.9, α_s = 0.04, average rain 13.6 mm/h). Figure 10. Simulated uniform rainfall (λ = 2.7, α_s = 0.02, average rain 9.88 mm/h). Both plot rainfall intensity (mm/h) against distance (km, 0 to 100).]

In these figures, especially in Figure 9, it is found that the rainfall is negative at some points in the single peak case. This is an intrinsic characteristic of the Fourier series and is difficult to remove, but the effect of these negative values is small enough in practice. Compared with observed rainfall events, these two results are feasible as natural rainfall.
CONCLUSIONS
In this paper, the Fourier series is utilized to simulate design rainfall, which makes it easy to assign a return period to the simulated rainfall. The characteristics of the Fourier coefficients and phase angles, which play an important role in reproducing the spatial distribution of the rainfall, are discussed.
Results obtained here are summarized as follows:
1) The return period can be explicitly included in a_01^2 through DAD analysis.
2) The Fourier spectrum for uniform rainfall can be formulated by a relationship similar to Kolmogorov's law of turbulence. On the other hand, the spectrum for rainfall with a single dominant peak can be derived from the exponential auto-correlation function.
3) The formulation incorporating both the Kolmogorov type and the exponential auto-correlation type can be applied to almost all rainfall.
4) The Fourier coefficient of the first term can be linearly related to the areally averaged rainfall. Therefore, these coefficients can also be related to the return period.
5) The phase angle varies randomly for uniform rainfall and changes linearly with the number of terms for single peak rainfall.
6) It is shown that the proposed simulation procedure can reproduce rainfall fields with characteristics similar to natural rainfall.
REFERENCES
1) Austin, P.M. & Houze, R.A. (1972) "Analysis of the structure of precipitation patterns in New England", Jour. of Applied Meteorology, Vol. 11, 926-935.
2) Amorocho, J. & Wu, B. (1977) "Mathematical models for the simulation of cyclonic storm sequences and precipitation fields", Jour. Hydrology, 32, 329-345.
3) Bras, R.L. & Rodriguez-Iturbe, I. (1984) "Random Functions and Hydrology", Addison-Wesley, Menlo Park, California.
4) Corotis, R.B. (1976) "Stochastic considerations in thunderstorm modeling", Jour. of the Hydraulic Division, ASCE, HY7, 865-879.
5) Hobbs, P.V. (1978) "Organization and structure of clouds and precipitation on the mesoscale and microscale in cyclonic storms", Review of Geophysics and Space Physics, 16, No. 4, 741-755.
6) Kavvas, M.L., Saquib, M.N. & Puri, P.S. (1987) "On a stochastic description of the time-space behavior of extratropical cyclonic precipitation fields", Stochastic Hydraulics, Vol. 1, 37-52.
7) Matsubayashi, U. and Takagi, F. (1987) "On the probabilistic characteristics of point and areal rainfall", Proc. of International Conference on Hydrologic Frequency Modelling, 265-275.
8) Matsubayashi, U. (1988) "On the simulation of the rainfall of the extratropical cyclones", Bull. Nagoya University Museum, 4, 81-94.
9) Mejia, J.M. & Rodriguez-Iturbe, I. (1974) "On the synthesis of random field sampling from the spectrum: An application to the generation of hydrologic spatial processes", Water Resour. Res., Vol. 10, No. 4, 705-711.
10) Oshima, T. (1992) "On the statistical spatial structure of rainfall fields", Graduation thesis, Civil Engineering Department, Nagoya University.
11) Waymire, E., Gupta, V.K. & Rodriguez-Iturbe, I. (1984) "A spectral theory of rainfall intensity at the mesoscale", Water Resour. Res., Vol. 20, No. 10, pp. 1453-1465.
12) Zawadzki, I.I. (1973) "Statistical properties of precipitation patterns", Journal of Applied Meteorology, Vol. 12, 459-472.
PART VIII
Traditional methods of streamflow analysis and synthesis are based upon information
contained in individual data. These methods ignore information contained in and among
groups of data. Recently, the concept of extracting information from data groupings
through pattern recognition techniques has been found useful in hydrology. For
streamflow analysis, this paper proposes several objective functions to minimize the
classification error encountered in currently used techniques employing minimum
Euclidean distance. The relevance of these functions has been tested on streamflow data for the Thames River at Thamesville. Specifically, three objective functions considering the properties of shape, peak, and gradient of streamflow pattern vectors are suggested. Similar objective functions can be formulated to consider other specific properties of streamflow patterns. AIC and intra- and inter-distance criteria are reasonable means of arriving at an optimal number of clusters for a set of streamflow patterns. The random initialization technique for the K-mean algorithm appears superior, especially when one is able to reduce the number of initialization runs by a factor of 20 to arrive at the optimal cluster structure. The
streamflow synthesis model is adequate in preserving the essential properties of historical
streamflows. However, additional experiments are needed to further examine the utility
of the proposed synthesis model.
INTRODUCTION
Various kinds of data related to hydrologic variables such as precipitation, snowfall and streamflows are measured at a number of points using various equipment, with the view of assessing and controlling water resources systems. Several techniques exist for separately handling time-sequenced and multi-point data. However, one can extend
such techniques to time sequenced multi-point data by considering all kinds of data at a
measurement point to form time sequenced data vectors. Such vectors can then easily be
treated and analyzed as pattern vectors [Panu et al. (1978) and Unny et al. (1981)].
Based on the consideration of spatial and temporal correlations, one can classify
time sequenced spatial pattern vectors corresponding to precipitation or streamflows for
the extraction of representative reference vectors. Similarly, the differences among
pattern vectors can be utilized to classify them by incorporating information related to
precipitation, meteorology, geology, physiography etc.
Consideration of groups of data makes the process of estimation and prediction
easier. It is in this vein that the capability of pattern recognition techniques in handling
the time sequenced multi-point data becomes readily useful.
A pattern recognition system (PRS) utilized by Panu et al. (1978) and Unny et al.
(1981) for streamflow analysis and synthesis was based on the minimum Euclidean
distance concept. In their investigations, it was found that in some cases, the minimum
distance concept tends to misclassify streamflow patterns.
In such cases, additional constraints were invoked to minimize the classification
error. It is noted that the misclassification of streamflow patterns by PRS is caused by
the consideration of the entire shape of patterns in the minimum distance concept. To
overcome such difficulties in the PRS for classification of streamflow patterns in
particular and hydrologic patterns in general, the following classification functions are
suggested.
Streamflow patterns inherently possess several characteristics. Among them, the most
obvious is the peak flow. Other characteristics could be long periods of low flows. The
following objective functions are based on some of these characteristics. Each of these
functions considers a specific property of streamflow patterns. These functions are later
shown to be effective in dealing with specific problems of streamflow analysis and
synthesis such as in cases of flood and drought conditions.
This function examines the shape aspect of streamflow patterns in the form of individual distances, where x_i(t) is the observed or transformed data value at time t of the ith pattern vector, and z_j(t) is the value of the jth reference vector (or cluster centre) at time t. The absolute differences between the corresponding elements x_i(t) and z_j(t) are summed over all times t.
CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS 365
The peak discharge is the single most important characteristic influencing any flood flow
analysis and planning of flood control structures. The function defined below examines
the peak flow component of a pattern vector and consequently facilitates the classification
of all the patterns related to flood flows.
The streamflows tend to rise and fall sharply (i.e., steep gradient) in response to rain or snowmelt conditions. Such rises and falls are more pronounced in daily rather than in monthly streamflows. Further, the rises and falls are milder (i.e., mild gradient) during the low streamflow periods. The gradient can thus distinguish the streamflow patterns with sharp fluctuations from those with low fluctuations, even when such streamflow patterns have the same value of the OF1 function. The objective function OF3, based on the normalized gradient, is given below, where β represents the normalizing factor for bringing the above three functions (OF1, OF2 and OF3) to the same order of magnitude.
In some situations, one may need all the above functions collectively to improve
upon the classification process. In such cases, one can formulate an aggregate objective
function (OFa) as follows.
OFa(x_i, z_j) = max[OF1(x_i, z_j), OF2(x_i, z_j), OF3(x_i, z_j)]   (4)
The aggregate function collectively involves all the above three functions. Therefore, this
function can be used for simultaneously classifying streamflow patterns corresponding
to different events, measurement points, and seasonal variations.
The objective functions OF1, OF2, OF3, and OFa are used in the K-mean algorithm for classification of streamflow patterns. The manner in which the K-mean algorithm is applied for classification is described in the Appendix. Further, bias in the selection of initial cluster centres is avoided by using the random initialization of the K-mean algorithm suggested by Ismail and Kamel (1986).
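To make the classification step concrete, here is a minimal sketch, in Python with NumPy, of a K-mean loop driven by an OF1-style shape objective and restarted from several random choices of initial centres in the spirit of Ismail and Kamel (1986). The restart count, convergence test and empty-cluster handling are illustrative assumptions.

    import numpy as np

    def of1(x, z):
        # OF1-style shape objective: sum of absolute element-wise differences.
        return float(np.sum(np.abs(x - z)))

    def k_mean(patterns, k, objective=of1, n_init=100, max_iter=50, seed=0):
        # patterns: (N, 6) array of pattern vectors; returns the best partition
        # found over n_init random choices of initial centres.
        rng = np.random.default_rng(seed)
        best = (np.inf, None, None)
        for _ in range(n_init):
            centres = patterns[rng.choice(len(patterns), size=k, replace=False)]
            for _ in range(max_iter):
                labels = np.array([min(range(k), key=lambda j: objective(x, centres[j]))
                                   for x in patterns])
                new_centres = np.array([patterns[labels == j].mean(axis=0)
                                        if np.any(labels == j) else centres[j]
                                        for j in range(k)])
                if np.allclose(new_centres, centres):
                    break
                centres = new_centres
            cost = sum(objective(x, centres[j]) for x, j in zip(patterns, labels))
            if cost < best[0]:
                best = (cost, labels, centres)
        return best  # (total objective, cluster labels, reference vectors)

Passing a different objective function (a peak- or gradient-based one, say) changes only the distance used in the assignment step, which is exactly the flexibility the OF2 and OF3 functions exploit.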
For processes obeying linear transition among observed data at different time periods,
one would obtain the same number of clusters (or reference vectors) containing the same
number of pattern vectors within a specified time period. Most hydrologic processes are inherently non-linear and, as a result, in an actual process one may obtain a different number of clusters, or the clusters may have different combinations of pattern vectors.
The structural relationships among various clusters within a process or among processes
are evaluated through the concept of goodness of fit. Secondly, one defines the
conditional probability of occurrence, p(j/j') of a reference vector [Suzuki (1973)] as
follows:
p(j/j') = n(j/j') / Σ_{j=1}^{k(j)} n(j/j')   (5)
where n(j/j') is the number of pattern vectors associated with cluster j, given that j' has occurred, and k(j) is the number of clusters considered for the analysis. It is advantageous to develop structural relationships among clusters exhibiting higher correlations within a process or among processes. Such structural relationships are, in turn, utilized in the prediction or simulation of streamflow patterns. The Markovian structure among clusters
is obtained as follows:
p(j/j') > Σ_u^{k(u)} p(u/u'),   ∀ u excluding j   (6)
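In practice the conditional probabilities of equation (5) can be estimated by counting cluster-to-cluster transitions in the chronological sequence of cluster labels. A small sketch follows; the 0-to-(k-1) label encoding is an assumption of the sketch.

    import numpy as np

    def transition_probabilities(labels, k):
        # Count transitions j' -> j in the chronological label sequence and
        # normalize each row, as in equation (5).
        counts = np.zeros((k, k))
        for prev, nxt in zip(labels[:-1], labels[1:]):
            counts[prev, nxt] += 1.0
        rows = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)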
Simulation algorithm
Step 2: Synthesize each cluster to its pattern vector by using a multivariate normal
distribution.
Step 3: Test whether the elements of a synthesized pattern vector lie within their
specified limits. If not, synthesize another pattern vector until its elements
are found within limits.
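A minimal sketch of Steps 2 and 3, assuming the multivariate normal distribution is fitted to the member vectors of each cluster and that the element-wise limits are supplied by the user:

    import numpy as np

    def synthesize_pattern(members, lower, upper, rng=None, max_tries=1000):
        # Step 2: fit a multivariate normal to the cluster's pattern vectors.
        # Step 3: redraw until every element lies within its specified limits.
        rng = rng or np.random.default_rng()
        mean = members.mean(axis=0)
        cov = np.cov(members, rowvar=False)
        for _ in range(max_tries):
            candidate = rng.multivariate_normal(mean, cov)
            if np.all(candidate >= lower) and np.all(candidate <= upper):
                return candidate
        raise RuntimeError("no admissible pattern vector within max_tries")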
Prediction algorithm
(9)
Further, assuming that the fuzzy membership function of each cluster has the same
weights as the frequency of occurrence of a cluster, the membership function is
represented as follows.
V_j = exp{ (-a_j h_j D_j^observed) / Σ_i^{k(i)} h(i) }   (10)

where h_j denotes the frequency gained in the classification procedure and a_j is a constant depending on the logic situation of the distance (i.e., large, medium, or small) related to D_j^observed. One can then predict the pattern vector based on the fuzzy inference technique [Kojiri et al. (1988)] as follows:

Predicted Pattern Vector = Σ_j^{k(j)} V_j X_j^predicted / Σ_j^{k(j)} V_j   (11)
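The following sketch mirrors equations (10) and (11): membership weights V_j are computed from the observed distances and classification frequencies, and the prediction is the V_j-weighted average of the cluster-wise predicted vectors. A single constant a is assumed here in place of the cluster-dependent a_j.

    import numpy as np

    def fuzzy_predict(d_observed, h, cluster_predictions, a=1.0):
        # Equation (10): membership weights from observed distances and
        # classification frequencies h (one constant a for all clusters).
        d = np.asarray(d_observed, dtype=float)
        h = np.asarray(h, dtype=float)
        v = np.exp(-a * h * d / h.sum())
        # Equation (11): V_j-weighted average of the cluster-wise predictions.
        preds = np.asarray(cluster_predictions, dtype=float)
        return (v[:, None] * preds).sum(axis=0) / v.sum()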
The Thames River basin, covering a 4300 km^2 area at Thamesville, was selected to test the applicability of the proposed pattern synthesis and forecasting procedures. The monthly
discharge and precipitation records are available from October 1952 to September 1967.
The mean monthly discharge values are used in the analysis. Based on the correlogram
and spectral analysis, the discharge data was divided into two seasons: a dry season from
October to March, and a wet season from April to September. In general, every seasonal
segment appears to be different from the rest, and the variation in standard deviation for
some months is very large.
The seasonal segments (or pattern vectors) are now clustered into groups to derive
the structural relationships among them. The K-mean algorithm is used for grouping the
seasonal segments. A random initialization technique [Ismail and Kamel (1986)] is used to achieve the global optimum, because the behaviour of the K-mean algorithm is influenced by several factors such as the choice of initial cluster centres, the number of cluster centres, the order in which seasonal segments are considered in the clustering process, and the geometrical properties of the seasonal segments. Several test runs indicated
that four clusters would be adequate to capture the relationships among and within
various seasonal segments. In general, there exist 15C4 combinations to group 15
seasonal segments into four clusters in each season. To find out the minimum possible
run of the K-mean algorithm for optimal cluster configurations, 200 runs of the K-mean
algorithm were made to group 15 seasonal segments into four clusters. The value of OF1
was evaluated for each run [Figure 1]. From this figure it is apparent that a significantly
small value of OF1 has occurred twice in 200 initial runs of the K-mean algorithm.
These significantly small values can be attributed to a situation when four clusters have
attained optimal cluster configuration, i.e., a condition when the intra distance DK(K)
is minimum and inter distance EK(K) is maximum. Therefore, the number of initial
conditions could be appreciably smaller [Table 1] for the various combinations. Further, this table also contains the values of the intra distance DK(K), the inter distance EK(K) and the Akaike Information Criterion (AIC) [see Appendix] for OF1. The values of DK(K),
EK(K) and AIC are plotted against the number of clusters in Figure 2. An examination
of the figure and the table indicates that for a case of four clusters, and a reasonable
number of 100 initialization runs, the value of AIC is minimum, the intra distance is
continuously decreasing up to four clusters and the rate of decrease is very small from
four to eight clusters, and the inter distance is fluctuating but is maximum for the four-
cluster case. Based on such considerations, it was assumed reasonable that four clusters
sufficiently describe the variability of pattern vectors in both the seasons. Considerations
of intra and inter distances and the values of AIC provide a useful but somewhat
inflexible method of obtaining optimal number of clusters for a set of given pattern
vectors.
[Figure 1. Objective function OF1 versus run number for 200 random initializations of the K-mean algorithm.]
[Figure 2. Intra distance DK(K), inter distance EK(K) and AIC plotted against the number of clusters.]
[Figure 3. Inter distance versus intra distance in the multi-optimization of the number of clusters, showing the transformation curve (TC) through points A(d1, e1) and B(d2, e2), the indifference curve (IC), the goal C(d0, e0) and the optimal point.]
Based on the results of the above considerations, the constraints for the K-mean algorithm are obtained [Table 2]. The values of the constraints are found to be less than seven-tenths of the maximum value of DK(K) and greater than two-tenths of the inter distance EK(K). The optimum number of clusters is obtained as the minimum number satisfying the constraints, subject to the condition that it should not be greater than half the total number of pattern vectors. In cases where neither the inter distance nor the intra distance satisfies the constraints, the inter distance takes priority over the intra distance.
Table 2. Constraints used in AIC, intra and inter distances for obtaining the optimal number of clusters

Criterion             Optimal number of clusters   Constraints
Multi-optimization    4                            None
AIC                   4                            None
The same optimum number of clusters was obtained using the K-mean algorithm
for the various cases of the objective functions [Table 3]. The objective functions OFa and OF1 render the same structure for the optimal number of clusters because the resulting values of OFa are strongly influenced by the function OF1. However, the OF2 function related to peak and the OF3 function related to gradient give different structures for the optimal number of clusters. These functions evaluate properties of pattern vectors such as the occurrence of peak flows or the gradient between successive events and, as a result, deal with properties which are least correlated. It is in this vein that these functions will provide an optimal structure of clusters in specific situations such as flood or drought analysis.
The Markovian transition from one cluster to another is summarized in Table 4.
The cluster centres in each season are exhibited in Figure 4. As each reference vector
is unique, the OFa function has been effective in classifying streamflow data, especially
for peak considerations. It is noted that if one were to consider drought characteristics, one would replace the OF2 function with one reflecting the low-flow characteristics.
Table 3. Cluster configurations obtained with the K-mean algorithm (dry season):

OF1, 4 clusters (optimum): Cluster 1: 13; Cluster 2: 10, 12, 15; Cluster 3: 3, 4, 6, 8, 11; Cluster 4: 1, 2, 5, 7, 9, 14
2 clusters (optimum): Cluster 1: 1, 3, 4, 6, 9, 10, 11, 12, 13, 14, 15; Cluster 2: 2, 5, 7, 8
2 clusters (optimum): Cluster 1: 1, 2, 3, 4, 5, 6, 7, 10, 13, 15; Cluster 2: 8, 9, 11, 12, 14
Based on the above cluster configuration and their intra and inter structural
relationships, streamflow patterns were synthesized for the Thames River at Thamesville.
The observed and synthesized Markovian transition probabilities for various clusters are
summarized in Table 5. In this table, the variation between the observed and synthesized
Markovian structure is less than 5%. In other words, the Markovian structure is
preserved in the synthesized streamflow patterns. A few sample realizations of
synthesized streamflow patterns are exhibited in Figure 5. The variations in these
realizations indicate the flexibility of the proposed procedure in synthesizing the extreme
as well as the normal streamflow characteristics.
The results of the forecast model are given in Figure 6. The forecast sequences at three sequential time stages from April, May and June 1966 are made on the assumption that these data points are not known. The forecast model needs further improvement.
[Figure 4. Representative Reference Vectors for (a) Dry Season [Oct. to March] and (b) Wet Season [April to September]; each panel plots discharge (m^3/s) against the elements of the pattern vectors.]
Table 5. Synthesized Markovian transition probabilities among clusters

From \ To      1              2              3              4
1              0.0 (0.0)      0.0 (0.0)      1.0 (1.0)      0.0 (0.0)
2              0.496 (0.5)    0.0 (0.0)      0.26 (0.25)    0.244 (0.25)
3              0.135 (0.143)  0.0 (0.0)      0.602 (0.571)  0.263 (0.286)
4              0.0 (0.0)      0.330 (0.333)  0.316 (0.333)  0.354 (0.334)

Note: Historical values of probabilities are given in parentheses.
[Figure 5. Sample realizations of synthesized streamflow patterns (discharge in m^3/s versus time in months), comparing observed and simulated series; the 2nd and 3rd cluster patterns are indicated.]
[Figure 6. Forecast results: discharge (m^3/s) versus time in months across Season 2 and Season 1, comparing the observed series with forecasts made at prediction times t = 4, 5 and 6.]
CONCLUSIONS
Several objective functions are proposed to improve upon the existing pattern recognition system (PRS) for streamflow pattern analysis and synthesis. Specifically, three objective functions considering the properties of shape, peak, and gradient of streamflow pattern vectors are proposed. Similar objective functions can be formulated to consider other specific properties of streamflow patterns. AIC and intra- and inter-distance criteria are reasonable means of arriving at an optimal number of clusters for a set of streamflow patterns. The random initialization technique for the K-mean algorithm appears superior, especially when one can reduce the number of initialization runs by a factor of 20 to arrive at an optimal structure of clusters. The streamflow synthesis model is adequate in preserving the essential properties of historical streamflows. However, additional experiments are needed to further examine the utility of the proposed synthesis model.
REFERENCES
Ismail, M.A. and Kamel, M.S. (1986) "Multidimensional Data Clustering Using Hybrid Search Strategies". Unpublished report, Systems Design Engineering, Univ. of Waterloo.
Kojiri, T., Ikebuchi, S. and Hori, T. (1988) "Real-Time Operation of Dam Reservoir by Using Fuzzy Inference Theory". Paper presented at the Sixth APD/IAHR Conf., Kyoto, July 20-22, 1988.
Panu, U.S., Unny, T.E. and Ragade, R.K. (1978) "A Feature Prediction Model in Synthetic Hydrology Based on Concepts of Pattern Recognition". Water Resources Research, Vol. 14, No. 2, pp. 335-344.
Panu, U.S. and Unny, T.E. (1980a) "Stochastic Synthesis of Hydrologic Data Based on Concepts of Pattern Recognition: I: General Methodology of the Approach". Journal of Hydrology, Vol. 46, pp. 5-34.
Panu, U.S. and Unny, T.E. (1980b) "Stochastic Synthesis of Hydrologic Data Based on Concepts of Pattern Recognition: II: Application to Natural Watersheds". Journal of Hydrology, Vol. 46, pp. 197-217.
Suzuki, E. (1973) Statistics in Meteorology, Modern Meteorology No. 5, Chijin-syokan Co., 4th Edition, pp. 254-261 [in Japanese].
Unny, T.E., Panu, U.S., MacInnes, C.D. and Wong, A.K.C. (1981) "Pattern Analysis and Synthesis of Time-dependent Hydrologic Data". Advances in Hydroscience, Vol. 12, Academic Press, pp. 222-244.
APPENDIX
Classification procedure
By using the K-means algorithm, the reference vectors at the K cluster centres are obtained
as follows:
(i) define K initial cluster centres, in other words, the tentative reference vectors. Arbitrary pattern vectors may be used for these centres. The following matrices are defined to explain the procedure:
Z(j,u) = [z(j,1), z(j,2), ..., z(j,6)]'   (12)

X(i) = [x(i,1), x(i,2), ..., x(i,6)]'   (13)
where Z(j,u) is the jth reference vector at the uth iterative step among the K clusters, and X(i) is the pattern vector consisting of the data points x(i,t), t = 1, 2, ..., 6.
(iii) the new centre of cluster j is decided from the pattern vectors belonging to cluster j, as follows:
Z(j, u+1) = (1/N(j)) Σ_{i=1}^{N(j)} x(i)   (16)
where N(j) is the number of pattern vectors in the rearranged cluster j.
(iv) if Z(j, u+1) = Z(j,u), the iteration stops; otherwise, go back to step (ii).
(v) calculate the maximum intra distance, DK(K), and the distance between the centres, i.e. the inter distance, EK(K), according to DK(K) at K clusters.
(vi) go back to step (i) with the next number of clusters, (K+1); otherwise, the iterations are completed.
Upon completion of the above procedure, the optimum number of clusters should be
decided using criteria as described below.
[1] The first criterion is to give the thresholds to the objective function and choose the
minimum number of clusters. For example,
(i) the intra distance, DK(K), which is similar to the objective function at the cluster centres, is less than 3.
The inter distance is calculated as the mean value of the distances over the same combinations of the reference vectors, because any reference vector can be the centre of the objective function [Equation (10)].
[2] The second criterion is to decide the optimum number of clusters through the multi-optimization technique. The objective function is formulated as a vector as follows:

[DK(K) → min,  EK(K) → max]   (19)
When the restriction on EK(K) is relaxed in the optimization and the value of EK(K) is calculated according to the value of DK(K) for the same cluster number K, each combination [DK(K), EK(K)] is plotted in Figure 3.
Considering the arc AB as the Pareto optimum (also called the transformation curve (TC)), one can convert this curve onto equivalent coordinates. The arc AB represents one part of the TC, which is defined by the limited number of clusters. The EK(K) coordinate is multiplied by the ratio ||BC||/||AC||, because the points A and B have the same weight and the same difference from the goal, which has been found to be the most desirable position [Kojiri et al. (1988)]. The multiplication factor γ is calculated as follows:

γ = √[((e2 - e0)^2 - (e1 - e0)^2) / ((d1 - d0)^2 - (d2 - d0)^2)]   (20)

where (e0, e1, e2) and (d0, d1, d2) are the EK(K) and DK(K) coordinates of the points A, B and C. On the equivalent coordinates, the point closest to the goal becomes the optimum, and the optimum number of clusters is the corresponding cluster number. The element d0 is taken as the minimum value among all the probable intra distances, or the value zero, in which case all pattern vectors are the same. The element e0 is taken as the maximum value among the probable inter distances, or the intra distance in the case of cluster number 1, where all pattern vectors are in the significant range of the reference vectors. If the necessary conditions are not satisfied, the indifference curve (IC) is drawn parallel to the line which passes through points A and B.
[3] The third criterion is to estimate the distribution of the pattern vectors using the Akaike Information Criterion (AIC). Assuming that the K-means algorithm gives the optimum value of the objective function for pattern vectors distributed normally around the centre, the maximum log-likelihood of cluster j is represented as follows:

W(j) = Σ_{i=1}^{N(j)} [(log 2π/4) (OFa(x(i), Z(j,u)))^2]   (22)
~
As the whole information is given by summing W(j) over j, the optimum number of clusters is decided as the number which yields the minimum value of the following equation among all the clusters:

AIC = Σ_{j=1}^{K} W(j) + 2K → min   (23)
where the second term represents the number of parameters treated in the objective function. With the multi-optimization and the AIC, it is not necessary to consider the constraints to arrive at the optimum solution. However, the K-means algorithm retains flexibility according to the observed pattern vectors.
REMUS, SOFTWARE FOR MISSING DATA RECOVERY
INTRODUCTION
In order to manage their water resources adequately, Hydro-Quebec often uses simulation of energy production at different points of the hydrological network. This simulation is
based on monthly means computed from daily observed flows. However, it is possible
that some daily values are missing. Hydro-Quebec will then reject the calculated
monthly mean flow when four or more daily observations are not available or if they
seem incorrect. Furthermore, the monthly means may be rejected for many consecutive
months at certain sites. Also, when a basin is large, more than one station is needed to
obtain a good estimation of flows at reservoir sites. As very few stations have complete series over a long period, it is very important to be able to estimate these missing values in order to obtain reliable estimates from the energy production simulation models.
The missing values for a given site are estimated by multiple regression using data
from other sites. Until recently, Hydro-Quebec used the software REMUL, which is an adaptation of HEC-4, developed by the US Army Corps of Engineers (Beard, 1971). HEC-4 and REMUL suffer from many weaknesses, mainly due to the theoretical hypotheses which must be verified prior to using the regression models. Some of these
are:
If the theoretical assumptions are not fulfilled, data may be incorrectly reconstituted and the interpretation of the results may be fallacious. Therefore, in order to overcome these weaknesses, ReMuS has been developed at INRS-Eau, within a partnership project with Hydro-Quebec and NSERC.
The first part of this paper shows how it is possible to perform data recovery using multiple regression. In the second part, we discuss problems caused by multicollinearity and the solution proposed in ReMuS, namely ridge regression; we also explain the procedure that gives an optimal value of the ridge parameter k. In the third part, we
present multivariate regression, a procedure that considers the relation between the dependent variables; we show how the parameters are estimated and the data reconstituted. Finally, in the last part, we present the software ReMuS and all its characteristics.
Estimation of parameters
In the multiple regression procedure used by Hydro-Quebec for extending data series, the relation between the dependent variable Y and the explanatory variables X_1, X_2, ..., X_p is linear, i.e. given by the following expression:

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip + e_i   (1.)

The parameters β_i must be estimated from the observed historical data series. This is done by the least squares (LS) method, which consists in minimizing the sum of the squared residuals e_i:

min Σ_{i=1}^{n} e_i^2   (2.)

If we let
y = [y_1, y_2, ..., y_n]'  (n×1),   X = [1 x_11 ... x_1p; 1 x_21 ... x_2p; ...; 1 x_n1 ... x_np]  (n×(p+1)),

b = [β_0, β_1, ..., β_p]'  ((p+1)×1)   and   e = [e_1, e_2, ..., e_n]'  (n×1),   (3.)

then the LS estimate is b = (X'X)^{-1} X'y.   (4.)
Prior to making the data reconstitution, it is necessary to examine the residuals to verify that the basic assumptions are fulfilled, especially that:
If the analysis of residuals indicates that the underlying assumptions are violated, it is usually possible to correct the problem by an appropriate transformation of the variables. Two common situations where a transformation is needed are when:
1- the relation between the dependent variable and the explanatory variables is not linear; in this case, one may transform the explanatory variables to linearize the relation;
2- the residuals are not normally distributed and/or their variance is not constant; one may then apply a transformation (for instance Box-Cox, 1964) to the dependent variable.
Reconstitution of data
Not only the prediction of the dependent variable is important for Hydro-Quebec, but also extensions of data series which preserve the mean and the variance of the observed series. To this end, one first computes the value predicted by the model:

ŷ = Xb   (5.)

It can be shown that the mean of the estimated values of Y is equal to that of the observed series. However, the variance is not reproduced by this estimator. Fiering (1963) proposed to add a random term, δ_i, given by equation (6), based on the sum of the squared residuals

SCE = Σ_{i=1}^{n} (y_i - ŷ_i)^2   (7.)

so that the reconstituted values

ỹ_i = ŷ_i + δ_i   (8.)

preserve the mean and the variance of the n observations y_1, y_2, ..., y_n used in the
regression.
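As an illustration, the following Python sketch fits the regression by LS, predicts the missing periods (equation 5) and adds a normal random term so that the variability of the series is not understated (equation 8). Since the exact form of the random term in equation (6) is not reproduced above, the sketch assumes δ ~ N(0, SCE/n), which is only one common choice.

    import numpy as np

    def reconstitute(X_obs, y_obs, X_missing, rng=None):
        # Fit the multiple regression by LS on the observed periods.
        rng = rng or np.random.default_rng()
        A = np.column_stack([np.ones(len(X_obs)), X_obs])
        b, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
        sce = np.sum((y_obs - A @ b) ** 2)            # equation (7)
        # Predict the missing periods and add the random term (eq. 8);
        # the N(0, SCE/n) variance is an assumption standing in for eq. (6).
        A_mis = np.column_stack([np.ones(len(X_missing)), X_missing])
        delta = rng.normal(0.0, np.sqrt(sce / len(y_obs)), size=len(X_missing))
        return A_mis @ b + delta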
RIDGE REGRESSION
Estimation of parameters
The problems of inverting the matrix X'X in the classical regression model (eqn 1) when multicollinearity is present can be transposed to the matrix r_XX constructed from the standardized variables X_1, ..., X_p. Ridge regression estimators are obtained by introducing a constant k ≥ 0 in the normal equations of the LS procedure:

(r_XX + k I_p) b^R = r_YX   (9.)

where b^R = (b_1^R, b_2^R, ..., b_p^R)' is the vector of standardized parameters estimated by ridge regression, I_p is the p×p identity matrix, and r_YX is the vector of correlations between Y and the X's obtained from the standardized variables X_1, ..., X_p and Y. A constant k is thus added to each element on the diagonal of the matrix r_XX. This facilitates the inversion of the matrix. The solution of the system of equations now depends on the constant k:

b^R = (r_XX + k I_p)^{-1} r_YX   (10.)
The value of k is related to the bias of the estimators. If k = 0, equation (10) is equivalent to (4), and the ridge estimators correspond to those obtained by ordinary LS. When k > 0, the estimators are biased, but more stable than the LS estimators. As for an ordinary regression, the analysis of residuals can be done, the data can be transformed if necessary, and finally missing data can be reconstituted by ridge regression following the same procedure as in the case of ordinary LS analysis.
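Equation (10) translates directly into a few lines of linear algebra. The sketch below (Python/NumPy) standardizes the variables, forms r_XX and r_YX, and solves the ridge system for a given k.

    import numpy as np

    def ridge_coefficients(X, y, k):
        # Standardize, form the correlation matrices r_XX and r_YX, and
        # solve (r_XX + k I_p) b_R = r_YX, i.e. equation (10).
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)
        ys = (y - y.mean()) / y.std()
        n, p = Xs.shape
        r_xx = Xs.T @ Xs / n
        r_yx = Xs.T @ ys / n
        return np.linalg.solve(r_xx + k * np.eye(p), r_yx)

Setting k = 0 recovers the ordinary LS estimates of the standardized parameters, as noted above.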
Determination of k
It can be shown (Hoerl and Kennard, 1970a) that increasing the value of k increases the bias of b^R but decreases its variance. In fact, it is always possible to find a value of k such that the ridge estimators have a smaller mean square error than the ordinary LS estimators. However, the choice of the optimal value of k is difficult. A commonly used method for determining k is based on a graphical inspection of the traces of the estimates of the p parameters as functions of k. Figure 1 is a typical example of ridge traces for a model with three independent variables.
In general, the values of the estimated parameters can fluctuate considerably when k
is close to zero, and can even change sign. However, as k increases, the values of the
estimated parameters stabilize. In practice, one examines the ridge traces and chooses
graphically the smallest value of k in the zone where all traces show reasonable stability (Hoerl and Kennard, 1970b). However, Vinod (1976) showed that this procedure may
lead to an overestimation of k, and devised an alternative method by which k is
estimated automatically. This procedure uses the index ISRM defined by :
ISRM = Σ_{i=1}^{p} [ p (λ_i/(λ_i + k)^2) / Σ_{j=1}^{p} (λ_j/(λ_j + k)^2) - 1 ]^2   (11.)

where λ_1, λ_2, ..., λ_p are the eigenvalues of the matrix r_XX. The index is zero if the explanatory variables are uncorrelated. Vinod (1976) suggests using the value of k which corresponds to the smallest value of the index ISRM.
[Figure 1. Example of ridge traces for a model with 3 variables: estimated parameters plotted as functions of k.]
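A sketch of the automatic selection of k via the ISRM index, using the reconstructed form of equation (11) given above; the candidate grid of k values is an assumption.

    import numpy as np

    def isrm(eigenvalues, k):
        # ISRM index for a given ridge constant k; eigenvalues of r_XX.
        lam = np.asarray(eigenvalues, dtype=float)
        p = lam.size
        w = lam / (lam + k) ** 2
        return float(np.sum((p * w / w.sum() - 1.0) ** 2))

    def optimal_k(eigenvalues, k_grid):
        # Vinod's rule: keep the k with the smallest ISRM over the grid.
        return min(k_grid, key=lambda k: isrm(eigenvalues, k))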
MULTIVARIATE REGRESSION
Let Y_1, Y_2, ..., Y_q be a set of q dependent variables, and X_1, X_2, ..., X_p be p explanatory variables. Assume that we have n corresponding measurements y_1i, y_2i, ..., y_qi and x_1i, x_2i, ..., x_pi, i = 1, 2, ..., n (for instance, discharges measured during n years at p+q sites). Moreover, the values of the explanatory variables are assumed known exactly. Hence, using matrix notation, the multidimensional regression model can be written in the following form:

Y = b X + e   (12.)

where Y is q×n, b is q×(p+1), X is (p+1)×n and e is q×n, with
Y = [y_11 y_12 ... y_1n; y_21 y_22 ... y_2n; ...; y_q1 y_q2 ... y_qn] = [y_1 y_2 ... y_n],

X = [1 1 ... 1; x_11 x_12 ... x_1n; ...; x_p1 x_p2 ... x_pn] = [x_1 x_2 ... x_n],

b = [β_10 β_11 ... β_1p; β_20 β_21 ... β_2p; ...; β_q0 β_q1 ... β_qp] = [β_0 β_1 ... β_p],

e = [ε_11 ε_12 ... ε_1n; ε_21 ε_22 ... ε_2n; ...; ε_q1 ε_q2 ... ε_qn] = [ε_1 ε_2 ... ε_n].
The matrix Y contains n column vectors y_1, y_2, ..., y_n whose q elements correspond to measurements of the q dependent variables for a given period. The X matrix contains n column vectors x_1, x_2, ..., x_n with p+1 elements; the first element of each of these vectors is equal to one, whereas the others correspond to the p explanatory variables for a given period. The b matrix contains the column vectors β_0, β_1, ..., β_p. The first vector corresponds to the intercept and each of the following vectors corresponds to a given explanatory variable (β_ij, j ≠ 0, is the parameter of the explanatory variable X_j for the dependent variable Y_i).
Finally, the e matrix contains the n vectors ε_1, ε_2, ..., ε_n of error terms. We assume that each of the residual vectors is multidimensionally normally distributed with zero mean and covariance matrix Σ:

ε_i ~ N(0, Σ)   ∀ i

where ε_i is q×1 and Σ is q×q. Therefore, there is, for this model, a non-zero correlation between the error terms corresponding to different dependent variables.
Estimation of parameters
To estimate the parameter matrix

B = [b_0 b_1 ... b_p]  (q×(p+1))   (13.)

the LS method is once again invoked. Srivastava and Carter (1983) show that the estimator of B is given by

B = [b_0 b_1 ... b_p] = Y X'(X X')^{-1}   (14.)
The estimators b_ij of the parameters β_ij are jointly distributed according to a multidimensional normal distribution with mean and covariance matrix given by

(15.)

where a_ii and a_ij are elements of the matrix (X X')^{-1}, and σ_ii and σ_ij are elements of the matrix Σ. Note that the parameter estimators are unbiased and correlated, reflecting the correlation that exists between the dependent variables. If q independent multiple regressions had been made, this correlation would have been zero; that is, for two variables Y_i and Y_j, we would have:
Once the parameters for each dependent variable have been obtained, their significance
should be tested. The software ReMuS permits one to test, for each explanatory
variable, whether the corresponding parameter vector is equal to the null vector. For a given variable X_k, the following hypotheses are tested:
H_0: β_1k = β_2k = ... = β_qk = 0   against   H_1: at least one β_ik ≠ 0

F_k = (n - p - q - 1)(1 - U_k) / (q U_k)   (17.)
where:
Reconstitution of data
In a given period l, for example a year, where the dependent variables are missing, we want to reconstitute the vector y_l = (y_1l, y_2l, ..., y_ql)' from the observed vector x_l = (x_1l, x_2l, ..., x_pl)' by means of multidimensional regression, in a way that preserves the mean, the variance, and the correlation structure observed in the series of dependent variables. The same principle as in the case of multiple regression is used here: we first compute the prediction, ŷ_l, and then add a random vector drawn from a multidimensional normal distribution to obtain the reconstituted data vector, ỹ_l. More precisely, we have the following expression:
ỹ_l = B x_l + δ_l

where the vector δ_l has a multidimensional normal distribution with zero mean and covariance matrix Σ_δ.
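A minimal sketch of this multivariate reconstitution, assuming Σ_δ is approximated by the sample covariance matrix of the multivariate regression residuals (the exact definition of Σ_δ is not legible in the source):

    import numpy as np

    def reconstitute_multivariate(B_hat, x_l, residuals, rng=None):
        # y_l = B_hat x_l + delta_l, with delta_l ~ N(0, Sigma_delta).
        # x_l must include the leading 1 for the intercept; residuals is
        # an (n, q) array of fitted-model residuals.
        rng = rng or np.random.default_rng()
        sigma_delta = np.cov(np.asarray(residuals), rowvar=False)
        delta = rng.multivariate_normal(np.zeros(sigma_delta.shape[0]),
                                        sigma_delta)
        return np.asarray(B_hat) @ np.asarray(x_l) + delta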
The software ReMuS was developed in order to overcome some of the deficiencies in
REMUL, which was previously used by Hydro-Quebec. The following improvements
have been introduced in the new software:
- Automatic optimal value of k. If the user so wishes, ReMuS suggests an optimal value for k (in ridge regression), depending on the chosen variables.
- Testing of hypotheses. One of the important weaknesses of REMUL and HEC-4 is that they do not check the following hypotheses:
ReMuS includes many graphical tests allowing the user to examine those hypotheses.
- Transformations. Residuals may be normalized by a Box-Cox transformation of the dependent variable. ReMuS also gives the user a choice of transformations (the most widely used in practice) of the independent variables in order to obtain a linear relationship with the dependent variable.
Moreover, ReMuS is a user-friendly software that provides many other tools to help the
user in the modeling phase:
• correlation matrix;
• the Y vs X graphic;
• the graphic of concomitances;
• on-line theoretical and technical help.
After the estimation of a given model, its adequacy can be visualized graphically. The output of the regression is:
ReMuS contains a procedure which permits the user to automatically select the optimal value of the constant k as a function of the chosen independent variables. Having chosen the explanatory variables and the constant k, the user can invoke the model computation procedure, examine the results graphically, and analyse the residuals just as in the case of multiple regression. If k is set equal to zero, one obtains the same result as with multiple regression.
Analysis of residuals
In multiple (or ridge) regression, it is important to verify the basic assumptions concerning the residuals.
ReMuS can also show the residuals as a function of an independent variable, which permits the user to identify the variables that convey information to the model.
Reconstitution of data
The reconstitution of data is based on known explanatory variables. Equation (8) permits one to preserve the mean and the variance of the observed data series. However, the user can choose to omit the random term, δ_i; in this case, the reconstituted data are those obtained directly from the regression.
ReMuS produces the following output, which can be used to verify the quality of the
reconstitution:
We have included Wilcoxon's (1945) test in ReMuS, which can be used to verify whether the means of observed and predicted data are significantly different. Likewise, Levene's (1960) test for equal variance in two data sets is available in ReMuS.
• graphic of Y versus X, that is, the relation between the dependent variable and each of the explanatory variables;
• graphic of concomitances, which may help to choose the explanatory variables;
• correlation matrix;
• Box-Cox transformation, which permits the user to normalize the residuals;
• other classical transformations (1/X, 1/X^2, log X, √X, X^2), which permit the user to linearize the relation between the explanatory variables and the dependent variables;
• technical as well as theoretical on-line help;
• applicability to monthly, annual, weekly (or other time period) means and to other types of data (e.g. flood flows vs. rain, basin area, etc.).
CONCLUSIONS
As discussed in the introduction, HEC-4 and REMUL are based on hypotheses which are not always valid. This may result in incorrect data reconstitution. The software ReMuS, which includes basic functions similar to those in HEC-4 and REMUL, has been developed in order to cope with the following problems:
These additions make ReMuS a powerful tool to reconstitute missing data and to extend
short data series in hydrology as well as in many other domains.
REFERENCES
Beard, L.R. (1971). HEC-4 Monthly Streamflow Simulation. The Hydrologic Engineering Center, Corps of Engineers, US Army, Davis, California 95616.
Bernier, J. (1971). Modèles probabilistes à variables hydrologiques multiples et hydrologie synthétique. International Symposium on Mathematical Models in Hydrology, Warsaw.
Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Ser. B, 211-252.
Devroye, L. (1986). Non-uniform Random Variate Generation. Springer-Verlag, New York.
Fiering, M.B. (1963). Use of Correlation to Improve Estimates of the Mean and Variance. United States Geological Survey, Professional Paper 434C.
Hoerl, A.E. and Kennard, R.W. (1970a). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.
Hoerl, A.E. and Kennard, R.W. (1970b). Ridge regression: applications to nonorthogonal problems. Technometrics, 12, 69-82.
Law, A.M. and Kelton, W. (1982). Simulation Modeling and Analysis. McGraw-Hill, Inc.
Levene, H. (1960). Robust tests for the equality of variances. In Contributions to Probability and Statistics, ed. I. Olkin. Stanford University Press, Palo Alto, 278-292.
Srivastava, M.S. and Carter, E.M. (1983). An Introduction to Applied Multivariate Statistics. North-Holland, New York.
Vinod, H.D. (1976). Application of new ridge regression methods to a study of Bell System scale economies. Journal of the American Statistical Association, 71, 835-841.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80-83.
SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE
which gives the inflation factor required when the annual flows have a serial correlation ρ, as compared to when they are independent. The
above two results are valid for over-year systems where the effect of
seasonality of flows is damped out.
There are some situations, however, when we are interested only in a
preliminary estimate but simple and elegant expressions such as (1)
and (2) are not available. In such situations, one takes recourse to
charts or tables, quantitatively relating the various quantities of
interest. A prime example of such results is the charts of Svanidze
(see Kartvelshvili (1969)) giving the reservoir size against the
coefficient of variation of annual flows, for different draft ratios,
reliability and annual serial correlation coefficients, for flows
having the general three-parameter gamma distribution - the Kritskiy
and Menkel distribution.
In this paper we consider the effect of seasonality of flows on the reservoir size for within-year systems and, for reasons given below, our results belong to the middle category just mentioned, i.e. they are of fairly general applicability yet not simple enough to be expressed in analytical forms of the type (1) and (2).
It is known, of course, that a reservoir with flows which have a seasonal variation would need to be larger than one without seasonal variation for the same reliability, yield and annual flow
with the same value of p for all the seasons. The values of n for the four seasons are denoted by n_1, n_2, n_3, n_4. The annual flow is also negative binomial, with the same value of p and with n = n_1 + n_2 + n_3 + n_4, to be denoted by k. The mean of X for the above distribution is nq/p and C_v is (nq)^{-1/2}. The probability generating function (p.g.f.) is P(s) = p^n/(1 - qs)^n. We first take values of k and p such that the annual flows have a specific mean μ and C_v. Table 2 gives the cases considered.
[TABLE 1. Sequences of seasonal means for the values of R', T' shown.]

[TABLE 2. Values of parameters p and k for given μ and C_v.]
(6)
To use the above result for our case (N = 4), with the seasonal flows having negative binomial distributions, we take, for given values of p and k, values of n_1, n_2, n_3, n_4 such that the seasonal means form a specified sequence. Using these P_i(s) in (6), the probability of emptiness (or failure) of the reservoir was calculated for all the 116 sequences of seasonal means for all the cases considered in Table 2. It was found that within each group with the same values of R' and T', they were fairly close to each other.
5. EFFECT OF SEASONALITY ON PROBABILITY OF FAILURE
First, to remove the dependence of the two measures on the unit of measurement, as well as for convenience, we introduce the adjusted measures R = R'/θ and T = T'/θ. The range for both R and T is (0, 1). Note, because of the way we selected the unit of measurement in Section 2, the above relations are valid for both cases of draft values, and in general we have R = 4R'/3μ and T = 2T'/μ, with R' and T' calculated as in Section 2. To obtain an effective relation between P_E, the probability of emptiness or failure of the reservoir, and R and T (for a given combination of p and k and reservoir size S), we tried fitting regression equations of P_E on various functions of R and T. The best fit obtained was for a relation of the form

P_E = β + ηT + δR^2.

Table 3 gives the required regression coefficients β, η, δ for all the cases considered. Figure 1 gives the probability P_E plotted as a function of R, for different values of the reservoir size S/μ and C_v and a draft ratio of 67%, and Figure 2 gives similar graphs for the draft ratio 50%.
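For readers who wish to reproduce such fits, the following Python sketch estimates β, η and δ by ordinary least squares from tabulated (R, T, P_E) triples; the arrays shown in the usage comment are hypothetical placeholders, not values from Table 3.

    import numpy as np

    def fit_pe_surface(R, T, PE):
        # Design matrix for the relation P_E = beta + eta*T + delta*R^2.
        A = np.column_stack([np.ones_like(R), T, R ** 2])
        (beta, eta, delta), *_ = np.linalg.lstsq(A, PE, rcond=None)
        return beta, eta, delta

    # Hypothetical usage:
    # R = np.array([0.2, 0.4, 0.6]); T = np.array([0.3, 0.5, 0.7])
    # PE = np.array([0.05, 0.09, 0.14]); print(fit_pe_surface(R, T, PE))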
[TABLE 3. Values of parameters β, η, δ for the various cases of S/μ.]
[Figure 1. Probability of emptiness P plotted against R (0.0 to 1.0) for the draft ratio 67%: three panels (i) to (iii) for different reservoir sizes S/μ, with curves labelled by C_v (0.500 to 0.707). Figure 2. Corresponding plots for the draft ratio 50%, with curves labelled by C_v (0.433 to 0.791).]
6. CONCLUSIONS
The seasonality of flows, for the case of four seasons, is quantified
by two measures, R and T; if one considers the situation where we
have more than four seasons, say months, then the measure R would
remain the same, but, perhaps, T would be modified to be the
depletion of the last three seasons. For different values of these
two parameters and draft-ratios of 50% and 67%, and for six values of
Cv ' the probabilities of emptiness of the reservoir are calculated
for different reservoir sizes. The method used for this calculation
was analytical, using the approximating technique of the bottomless reservoir. These probabilities enabled regression equations to be formulated, linking the probability of emptiness to R, for various values of T, C_v, draft ratio, and reservoir size.
REFERENCES
Feller, W. (1967) An Introduction to Probability Theory and its Applications, 3rd ed., Vol. 1, John Wiley, New York.
Fiering, M.B. (1967) Streamflow Synthesis, Harvard Univ. Press, Cambridge, Mass.
Gould, B.W. (1964) Discussion of paper by Alexander, in Water Resources Use and Management, Melbourne University Press, Melbourne, 161-164.
Hazen, A. (1914) "Storage to be provided in impounding reservoirs for municipal supply". Trans. Am. Soc. Civ. Engrs. 77, 1539-1640.
Kartvelshvili, N.A. (1969) Theory of Stochastic Processes in Hydrology and River Runoff Regulation, Israel Program for Scientific Translations, Jerusalem.
Kritskiy, S.N. and Menkel, M.F. (1940) "A generalized approach to streamflow control computations on the basis of mathematical statistics" (in Russian), Gidrotekhn. Stroit., 2, 19-24.
Moran, P.A.P. (1959) The Theory of Storage, Methuen, London.
Phatarfod, R.M. (1979) "The bottomless dam", J. Hydrol., 40, 337-363.
Phatarfod, R.M. (1980) "The bottomless dam with seasonal inputs", Austral. J. Statist., 22, 212-217.
Phatarfod, R.M. (1986) "The effect of serial correlation on reservoir size", Water Resources Research, 22, 927-934.
Rippl, W. (1883) "The capacity of storage-reservoirs for water-supply", Min. Proc. Instn. Civ. Engrs. 71, 270-278.
The Hurst effect is approached from the hypothesis that there is a fundamental problem in the estimation of the Hurst exponent, h. The estimators given throughout the literature are reviewed, and a test is performed for some of them using iid and non-stationary stochastic processes. The so-called GEOS diagrams (R_n*/n^0.5 vs. n) are introduced as very powerful tools for determining whether a given time series exhibits the Hurst effect, depending on the value of the scale of fluctuation. Various cases of the test model are presented through both the GEOS and GEOS-h diagrams. Results indicate that there are indeed problems in estimating h, and in some cases the effect could be due to erroneous estimation when using the classical estimators. A proposed estimator gives better results, which confirms the pre-asymptotic behavior of the Hurst effect.
INTRODUCTION
The Hurst exponent, h, has become one of the most important scaling exponents in
hydrology, transcending its old presence in hydrology and reaching status in the recent
literature on chaos and fractals (Mandelbrot, 1983; Feder, 1988; Schroeder, 1991). In
hydrology the whole paradox of the Hurst effect (Hurst, 1951) has received a renewed
attention due to the implications and the physical significance of its existence in
geophysical and paleo-hydrological time series (Gupta, 1991; Poveda, 1992), and also
because we have shown that the existence of the Hurst effect is not such a widespread universal feature of time series, neither geophysical nor anthropogenic (Poveda, 1987; Poveda and Mesa, 1991; Mesa and Poveda, 1993).
In part 2 we give a brief introduction to the Hurst effect. In part 3 some of the approaches proposed to explain the paradox are mentioned. In part 4 we review the estimators of h, and the hypothesis that the Hurst effect results from an incorrect estimation of h is developed. Part 5 presents the so-called GEOS and GEOS-H diagrams for a non-stationary stochastic process, and part 6 presents the conclusions.
The Hurst effect has been extensively studied in hydrology since the original paper by
Hurst (1951), and therefore its classical definition will not be developed here (see Mesa
and Poveda, 1993 or Salas et al., 1979, for detailed reviews). Let us define the Hurst
effect as an anomalous behavior of the rescaled adjusted range, R_n*, in a time series of record length n. For geophysical phenomena and "anthropogenic" time series Hurst (1951) found the power relation R_n* = a n^h, with a = 0.61 and a mean value of h = 0.72.
For processes belonging to the Brownian domain of attraction it can be shown that the expected value and the variance of the adjusted range are (Troutman, 1978; Siddiqui, 1978; Mesa and Poveda, 1993):

E(R_n*) = (π θ n / 2)^{1/2}   (1)

Var(R_n*) = (π^2/6 - π/2) θ n   (2)
where the scale of fluctuation, θ, is given as (Taylor, 1921; Vanmarcke, 1983)

θ = 2 ∫_0^∞ ρ(τ) dτ = lim_{T→∞} T Γ(T) / Γ(0)   (3)
The different types of hypotheses that have been set forth to explain the paradox are reviewed in Mesa and Poveda (1993), and a brief review of the models proposed to mimic the Hurst effect (preserve h > 0.5) is made by Boes (1990) and Salas et al. (1979b). Basically, the problem has been explained as the result of violations of the functional central limit theorem hypotheses: a) the correlation structure of geophysical processes, b) a pre-asymptotic transient behavior, c) non-stationarity in the mean of the processes, d) self-similarity, e) fat-tailed distributions with infinite second moments. In addition to these, we have examined the possibility of an incorrect estimation of the Hurst exponent.
Non-stationarity of the mean of geophysical time series has been found in several
phenomena (Potter, 1976). This means that either their central tendency changes in time,
or it exhibits sudden changes (shifting levels). This last idea has re-emerged in the context
of climate dynamics (Demaree and Nicolis, 1990), trying to explain climatic variability
in Central Africa as a result of recurrent aperiodic transitions between two stable states
whose dynamics are governed by a non-linear stochastic differential equation.
Bhattacharya et al. (1983) showed that the Hurst effect is asymptotically exhibited
by a process X(n) formed by weakly dependent random variables perturbed with a small
trend, as follows:

X(n) = Y(n) + c(m + n)^β ,                                         (4)

where Y(n) is a sequence of iid random variables with zero mean and unit variance, and
c and m are integer constants. The value of β is tightly linked to the asymptotic value of
the Hurst exponent, h, in the following way: for −∞ < β ≤ −0.5, h = 0.5; for −0.5 <
β < 0, h = 1 + β; for β = 0, h = 0.5; and for β > 0, h = 1.
ESTIMATORS OF THE HURST EXPONENT
Different estimators have been proposed in the literature in order to determine the value
of h in finite time series. Each of them has been linked to the hypotheses presented to
explain the paradox. In this section we review these estimators in order to test
their performance for both iid and non-stationary processes (equation 4). In his original
work, Hurst (1951) used three different estimators:

Estimator 1. The slope of the line passing through the point (log 2, log 1) and the
center of gravity of the sample values of log R_n* vs. log n.

Estimator 2. For each of the i sample values of R_n*, an estimator K_i is given as
Chow (1951) questioned the adequacy of the linear relationship between log R_n* and log
n passing through the point (2, 1) in logarithmic space. He proposed an estimator of h
(our estimator number 4) as the least-squares regression slope for sample values of log
R_n* vs. log n. That procedure applied to Hurst's data led to the relationship R_n* = 0.31 n^0.87.
Estimator 5. Mandelbrot and Wallis (1969) suggested an estimator H defined as
the least-squares regression slope including all subsets of length j, 5 ≤ j ≤ n.

Estimator 6. Wallis and Matalas (1970) proposed a modified version of
estimator 5, using the averaged values of R_n*. Both estimators 5 and 6 are biased in
the sense that they exhibit a positively asymmetrical distribution that diminishes as n
increases, and they also exhibit a large variance.

Estimator 7. In the chronology of the h estimation history, Gomide (1975) marks
a turning point because his estimator, YH, does not deal with least-squares slope
regression. It is based on the expected value of R_n* for iid processes, in such a way that
Estimator 8. Using the functional central limit theorem, Siddiqui (1976) introduced
the asymptotic result for the expected value of R_n* for ARMA(p,q) processes. Based on that
result he suggested the SH estimator

where γ₀ is the ratio of the theoretical variance of the process to the noise variance. The
a_i's and φ_i's are the parameters of the corresponding ARMA process. From (8) a similar
estimator, J, can be given as
Poveda (1987) showed that a large set of geophysical time series exhibit values
of SH which are close to 0.5. This result is in agreement with analyses developed in the
context of Bhattacharya et al.'s (1983) definition: a value of SH = 0.5 (SH < 0.5) (SH > 0.5)
implies that, for the largest value of n, the sample value of R_n*/n^0.5 is exactly (below)
(above) its expected value, which can be derived from (1). As a result, it turns out that
the Hurst effect is not such a widespread feature of geophysical time series. Estimator 8
is also tantamount to the slope SH of the regression line of R_n* vs. n (log space),
imposing a fixed intercept: log(θπ/2)^0.5. Therefore, this seems to confirm our hypothesis
of the incorrect estimation of the exponent as the cause of the Hurst effect. As a matter
of fact, these results confirm a pre-asymptotic behavior in the relation R_n* vs. n before
the behavior h = 0.5 settles.
Estimator 9. Anis and Lloyd (1976) introduced an estimator for h in the case of
iid normally distributed processes, as a function of the sampling interval n, as

E(R_n*) = [Γ((n−1)/2) / (π^{0.5} Γ(n/2))] Σ_{r=1}^{n−1} ((n−r)/r)^{1/2}           (12)

E(K) = log E(R_n*) / log(n/2)                                                     (13)
     = [1/log(n/2)] log{ [2 Γ((n+1)/2) / ((π n(n−1))^{0.5} Γ(n/2))] Σ_{r=1}^{n−1} ((n−r)/r)^{1/2} }
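Since (12) is exact for iid normal summands, it is easily tabulated; a short sketch (using SciPy's log-gamma for numerical stability at large n):

    import numpy as np
    from scipy.special import gammaln

    def expected_rn_anis_lloyd(n):
        """Exact E(R_n*) of equation (12) for iid normal summands."""
        r = np.arange(1, n)
        coef = np.exp(gammaln((n - 1) / 2.0) - gammaln(n / 2.0)) / np.sqrt(np.pi)
        return coef * np.sum(np.sqrt((n - r) / r))

    for n in (250, 500, 2500):
        print(n, expected_rn_anis_lloyd(n))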
Estimator 11. McLeod and Hipel (1978) proposed two estimators of h. One is
based on the result obtained for E(R_n*) by Anis and Lloyd (1976), which is
Some of the estimators of h that have been proposed have been evaluated, and the results
appear in Table 1. According to those results, the following conclusions can be drawn:

- Sen (1977, p. 973, Table 1) presents an erroneous result for estimator 10 (Table
1, column 3) and the corrected values are shown here in Table 1, column 5. Also,
estimator 12 shows differences in Poveda's (1988) results compared with those of
McLeod and Hipel (1978), as can be seen in columns 6 and 7 of Table 1.

- Note that similar results are obtained with estimators 10 and 11 for values of n
≥ 200. Despite the fact that estimator 10 was introduced for small samples, it produces
the same results as estimator 11 for large n. It is simple to show their analytical
equality as n goes to infinity.

- Results obtained with estimators 9 and 13 differ for n = 250, 500, and 2500. The
differences are due to the simulated samples used to evaluate the latter, as the former
is an exact estimator.
A non-stationary process such as the one given by (4) permits one to know the asymptotic
Hurst exponent, depending on the value of β. We used that model to generate synthetic
time series of 20,000 terms to evaluate some of the aforementioned estimators of h, for
different values of β. The estimators of h used are those numbered 1, 2, 3, 6 and 7, and
three other estimators described as follows.
Estimator 15. The least-squares regression slope of all sample values of log R_m*
vs. log m, taking only values of m larger than or equal to a given n.

Estimator 16. Analogous to estimator 15, but in this case using averaged values
of R_n* for each value of n.
Equation (4) was used to generate simulated non-stationary sequences, with
different sets of parameters c and m. Detailed analyses were conducted using two
different groups of parameters, the first one for c = 1 and m = 1,000, and the second for c = 3
and m = 0. For the trend we used the values β = −1.0; −0.5; −0.4; −0.3; −0.2; −0.1; 0 and
0.5. As an illustration, Tables 2 and 3 show the results of the different h estimators for
the cases c = 1, m = 1,000, β = −0.3 (h = 0.7), and c = 3, m = 0, β = −0.2 (h = 0.8), respectively.

For values of β different from 0 (existence of trend), the results obtained confirm
the poor performance of all the estimators, except for estimator 16, which reproduces
with good accuracy the asymptotic value of h according to the value of β, although the
pre-asymptotic interval is variable. With the first set of parameters, c = 1, m = 1,000, the
estimator gives very good results for values of n ≥ 2500, and in the second simulation
there is a variable interval of n from which the asymptotic value of h is reached. Again,
these results seem to confirm the hypothesis of the Hurst effect as a pre-asymptotic effect.
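In code, estimator 16 is simply the averaged-R_n* regression of the earlier sketch restricted to block lengths beyond an assumed pre-asymptotic cutoff (the cutoff n ≥ 2500 mirrors the c = 1, m = 1,000 case just described; `hurst_regression` and `x` are from the sketches above):

    # Estimator 16 (sketch): fit only where the pre-asymptotic
    # behavior is assumed to have died out.
    asymptotic_lengths = [n for n in (250, 500, 1000, 2500, 5000, 10_000)
                          if n >= 2500]
    h_hat_16, a_hat_16 = hurst_regression(x, asymptotic_lengths)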
As was mentioned before, the more precise definition of the Hurst effect deals with the
convergence in distribution of R_n*/n^h, with h > 0.5 (see Bhattacharya et al., 1983).
Recently, based on that definition, we have introduced the so-called GEOS diagrams
(R_n*/n^0.5 vs. n) and the GEOS-H diagrams (R_n*/n^h vs. n, with h > 0.5) (Poveda, 1987; Mesa
and Poveda, 1993). A statistical test of the existence of the Hurst effect in a given time
series is based on those diagrams. The asymptotic distribution of R_n*/n^0.5, for processes
belonging to the Brownian domain of attraction, has a mean (μ′) and a standard deviation
(σ′) derived from (1) and (2) (Siddiqui, 1976; Troutman, 1978; Mesa and Poveda, 1993).

Convergence of sample values of R_n*/n^0.5 into the asymptotic interval given by μ′
± 2σ′ permits one to accept the hypothesis of non-existence of the Hurst effect. Thus, the
estimation of θ becomes a fundamental issue for processes with a finite scale of
fluctuation (see Vanmarcke, 1983; Poveda and Mesa, 1991; Mesa and Poveda, 1993). On
the other hand, divergence of sample values of R_n*/n^0.5 from that interval does not permit
the acceptance of the hypothesis of non-existence of the Hurst effect in a time series.
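A minimal sketch of the GEOS construction and band test (reusing the helpers sketched earlier; the band constants follow from (1) and (2), with θ = 1 for the iid case; estimating θ for a real series is the separate problem just mentioned):

    import numpy as np

    def geos_points(x, lengths):
        """Sample values of R_n*/n**0.5 for a GEOS diagram."""
        return {n: rescaled_adjusted_range(x[:n]) / np.sqrt(n) for n in lengths}

    def geos_band(theta=1.0):
        """Asymptotic mean and +/- 2 std band of R_n*/n**0.5 from (1)-(2)."""
        mu = np.sqrt(np.pi * theta / 2.0)                       # 1.2533 at theta=1
        sd = np.sqrt((np.pi ** 2 / 6.0 - np.pi / 2.0) * theta)  # ~0.27 at theta=1
        return mu - 2.0 * sd, mu + 2.0 * sd

    lo, hi = geos_band()
    inside = {n: lo <= v <= hi
              for n, v in geos_points(x, (500, 2500, 10_000)).items()}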
For the case of the simulated sequences obtained using (4), the estimation of the
scale of fluctuation, θ, makes no sense because its value is a function of time, and the
ergodic property fails to hold. Nevertheless, the qualitative behavior of the sample values
of R_n*/n^0.5 in the GEOS and GEOS-H diagrams was examined, for different values of c
and m. Some of the obtained results are shown in Figures 1 to 5.
Figure 1 shows the GEOS diagram for the case c = 10, m = 25 and β = −1.0 (h = 0.5). It is
clear that sample values of R_n*/n^0.5 converge into the asymptotic interval μ′ ± 2σ′ for the
case of iid processes (μ′ = 1.2533, σ′ = 0.2733). The effect that the trend produces on the iid
process is clearly observed in Figure 2 (GEOS for c = 10, m = 25, β = −0.3, h = 0.7). Sample
values of R_n*/n^0.5 are contained within the asymptotic interval μ′ ± 2σ′ corresponding to
the underlying iid process, except for a notable bifurcation due to the trend itself. In this
case there is clear evidence of the Hurst effect. This shows the power of GEOS
diagrams.
The parameter c indicates the relative weight of the trend. The stronger the trend,
the slower the convergence to the asymptotic value of R_n*/n^h. Figures 3 and 4 illustrate
this point: the theoretical limit values are 0.19 and 19, respectively, and the only
difference in the parameters is in the value of c. Notice that in Figure 3 the limit is
already reached, whereas in Figure 4 it is not yet.
CONCLUSIONS

The non-stationary model given by (4) facilitates the performance of experiments on the
estimation of the Hurst exponent, h, and allows one to draw conclusions about the existence
of the Hurst effect.
The estimation of the Hurst exponent, h, has been found to be a delicate and
sensitive task. Most of the estimators given in the literature perform poorly
in the experiments developed in this work, especially for the cases where h > 0.5.
This behavior is explained by the treatment of the regression intercept in the relation
R_n* vs. n (log space). The intercept is not independent of the slope in the relation R_n* = a n^h.
This constraint is ignored in most of the estimators proposed in the literature, and in
the whole approach to the Hurst effect. Indeed, estimator 16 (Poveda, 1987) provides
better results because it concentrates on the region of values of n where the pre-
asymptotic behavior has disappeared, or at least is less dominant, and therefore the values
of both the intercept and the slope (the exponent) are statistically fitted to the limit values.
This work has shown that the estimation of the Hurst exponent h is not a trivial
problem: there are many difficulties involved, and therefore the hypothesis tests that
use the GEOS and GEOS-H diagrams constitute a more direct and conclusive tool to
identify the presence of the Hurst effect.

Possible errors in the estimation of h using the least-squares slope in the POX
diagrams are produced by pre-asymptotic effects and the shortness of the record.
Estimation of θ provides an answer to that problem, even in the case of a short record,
for in that case n/θ, a measure of the effective record length, will be small. This fact is not
apparent in the POX analyses.
The Hurst effect holds its importance in hydrology and geophysical time series
modeling and prediction due to its links with self-similar processes, fractals and
multifractals.
REFERENCES
Anis, A. A. and Lloyd, E. H. (1976) "The expected value of the adjusted rescaled Hurst
range of independent normal summands", Biometrika, 63, 111-116.
Bhattacharya, R. N., Gupta, V. K. and Waymire, E. (1983) "The Hurst effect under trends", Jour. Appl.
Probab., 20, 3, 649-662.
Boes, D. C. (1988) "Schemes exhibiting Hurst behavior", in J. N. Srivastava (ed.) Essays in honor of
Franklin A. Graybill, Elsevier, 21-42.
Chow, V. T. (1951) "Discussion on 'Long-term storage capacity of reservoirs' by H. E. Hurst", Trans.
A.S.C.E., pap. 2447, 800-802.
Demaree, G. R. and Nicolis, C. (1990) "Onset of the Sahelian drought viewed as a fluctuation-induced
transition", Q. J. R. Meteorol. Soc., 116, 221-238.
Feller, W. (1951) "The asymptotic distribution of the range of sums of independent random variables", Ann.
Math. Stat., 22, 427-432.
Gomide, F. L. S. (1975) "Range and deficit analysis using Markov chains", Hydrol. Pap. No. 79, Colorado
State University, Fort Collins, 1-76.
Gupta, V. K. (1991) "Scaling exponents in hydrology: from observations to theory", in Self-similarity:
theory and applications in hydrology, Lecture Notes, AGU 1991 Fall Meeting, San Francisco, 1991.
Hipel, K. W. and McLeod, A. I. (1978) "Preservation of the rescaled adjusted range, 2. Simulation studies
using Box-Jenkins models", Water Res. Res., 14, 3, 509-516.
Hurst, H. E. (1951) "Long-term storage capacity of reservoirs", Trans. ASCE, 116, 776-808.
Mandelbrot, B. B. (1983) The fractal geometry of nature, Freeman and Co., New York.
Mandelbrot, B. B. and Wallis, J. R. (1969) "Some long-run properties of geophysical records", Water Res.
Res., 5, 2, 321-340.
Mesa, O. J. and Poveda, G. (1993) "The Hurst phenomenon: The scale of fluctuation approach", Water Res.
Res., in press.
McLeod, A. I. and Hipel, K. W. (1978) "Preservation of the rescaled adjusted range, 1. A reassessment
of the Hurst phenomenon", Water Res. Res., 14, 3, 491-508.
Potter, K. W. (1976) "Evidence for nonstationarity as a physical explanation of the Hurst phenomenon",
Water Res. Res., 12, 5, 1047.
Poveda, G. (1992) "Do paleoclimatic records exhibit the Hurst effect?", Fourth International Conference on
Paleoceanography (Abstracts), GEOMAR, Kiel, Germany.
Poveda, G. (1987) "El Fenómeno de Hurst" (The Hurst phenomenon, in Spanish), unpublished Master's
thesis, Universidad Nacional de Colombia, Medellín, 230 p.
Poveda, G. and Mesa, O. J. (1991) "Estimación de la escala de fluctuación para la determinación del
fenómeno de Hurst en series temporales en hidrología" (Estimating the scale of fluctuation to determine the
Hurst effect in hydrological time series, in Spanish), II Colombian Congress on Time Series Analysis,
Universidad Nacional de Colombia, Bogotá.
Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram, G. G. S. (1979a) "On the Hurst phenomenon", in H.
J. Morel-Seytoux (ed.) Modeling hydrologic processes, Water Resources Publ., Fort Collins, Colorado.
Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram, G. G. S. (1979b) "Hurst phenomenon as a
pre-asymptotic behavior", Jour. of Hydrology, 44, 1-15.
Siddiqui, M. M. (1976) "The asymptotic distribution of the range and other functions of partial sums of
stationary processes", Water Res. Res., 12, 6, 1271-1276.
Taylor, G. I. (1921) "Diffusion by continuous movements", Proc. London Math. Soc. (2), 20, 196-211.
Troutman, B. M. (1978) "Reservoir storage with dependent, periodic net inputs", Water Res. Res., 14, 3,
395-401.
Vanmarcke, E. (1983) Random fields: analysis and synthesis, The M.I.T. Press, Cambridge.
Wallis, J. R. and Matalas, N. C. (1970) "Small sample properties of H and K estimators of the Hurst
coefficient h", Water Res. Res., 6, 1583-1594.
OPTIMAL PARAMETER ESTIMATION OF CONCEPTUALLY-BASED
STREAMFLOW MODELS BY TIME SERIES AGGREGATION

P. CLAPS AND F. MURRONE
INTRODUCTION
Claps and Rossi (1992) and Murrone et al. (1992) identified stochastic models of
streamflow series over different aggregation scales starting from a conceptual
interpretation of the runoff process. In this conceptual-stochastic framework there are
conceptual parameters common to models related to different scales. The point of view
characterizing this framework, which is summarized in the next section, is that the
analysis of streamflow series should be extended beyond the scale at which data are
collected, taking advantage of information available from models of the aggregated data.

The question arises whether there is a particular time scale (and, consequently, a particular
model) leading to an optimal estimation of a given parameter. The choice of an optimal
time scale is important because aggregation of data tends to reduce the correlation effects
due to runoff components with small lag times relative to the effects produced by
components with large lag times. At the same time, aggregation reduces the number of
data and, consequently, the quality of estimates.
In the above approach, Claps and Rossi (1992) and Murrone et al. (1992) considered
a limited number of time scales, such as annual, monthly, and T-day (with T ranging
from 1 to 7) and showed that conceptual parameters of models of monthly and T-day
runoff are more efficiently estimated using different scales of aggregation.
An attempt to introduce a more systematic procedure for the selection of the optimal
time scale for the estimation of each parameter is made in this paper. In this direction,
simulation experiments are performed with regard to ARMA (Box and Jenkins, 1970)
and Shot Noise (Bernier, 1970) stochastic models equivalent to a simple conceptual
model of the runoff process.
In the approach of Claps and Rossi (1992) and Murrone et al. (1992), the formulation of a
conceptual model for river runoff is founded on the "observation" of riverflow series over
different aggregation scales and on knowledge of the main physical (climatic and
geologic) features of the basins.

Considering the watersheds of Central-Southern Italy, dominated by the hydrogeological
features of the Apennine mountains, distinct components can be recognized in the runoff: (1)
the contribution provided by aquifers located within large carbonate massifs, which has an
over-year response time to recharge (deep groundwater runoff); (2) a component, due
to both overflow springs and aquifers within non-carbonate geological formations,
which usually run dry by the end of the dry season (seasonal groundwater runoff); (3)
the contribution of soil drainage, having a delay of several days with respect to
precipitation (subsurface runoff); (4) the surface runoff, having a lag time that depends on
the size of the watershed (for the rivers analyzed by Murrone et al. (1992), this lag
ranges from a few hours to almost two days). In some cases, the deep groundwater
component is lacking, reducing the runoff components to three. The snowmelt runoff in the
region considered is negligible. The above runoff components assume different
importance with respect to the time scale of aggregation, leading to conceptual models of
increasing complexity moving from the annual to the daily scale.
The bases for the conceptual-stochastic model building proposed for the monthly scale
(Claps and Rossi, 1992; Claps et al., 1993) and for the daily scale (Murrone et al., 1992)
are essentially: (1) subsurface and groundwater systems are considered as linear
reservoirs, with storage coefficients K1, K2, K3, going from the smallest to the largest; (2)
runoff is the output of a conceptual system made up of the above reservoirs in parallel
with a zero-lag linear channel reproducing the direct runoff component; (3) when a
storage coefficient is small with respect to the time scale considered, the related
groundwater component becomes part of the direct runoff term, which is proportional to
the system input; (4) the effective rainfall, i.e. total precipitation minus
evapotranspiration, is the conceptual input to the system; this variable is not explicitly
accounted for in the models, which are univariate; (5) effective rainfall volumes
infiltrate into the subsurface and groundwater systems at constant rates (recharge
coefficients c1, c2, c3, respectively) over time.
The main issues of model identification for annual, monthly and daily scales are
summarized below.
Annual scale

Rossi and Silvagni (1980) first supported on a conceptual basis the use of the ARMA(1,1)
model for annual runoff series, based on the consideration that the correlation structure
at that scale is determined by the deep groundwater runoff component. The use of this
model for annual runoff modelling was proposed by O'Connell (1971) by virtue of its
capacity to reproduce the long-term persistence displayed by annual runoff data. Salas
and Smith (1981) showed how a conceptual system composed of a linear reservoir in
parallel with a linear channel fed by a white noise input behaves as an ARMA(1,1)
process.
Given an effective rainfall input I_t which infiltrates at a rate c3·I_t and whose remaining
part (1 − c3)·I_t goes into direct runoff, and based on the hypothesis that the input is concentrated at
the beginning of the interval [t−1, t], the volume balance equations produce

D_t = φ D_{t−1} + (1 − c3 φ) I_t − φ (1 − c3) I_{t−1} ,   φ = e^{−1/K3}          (1)

where D_t is the runoff in year t. This hypothesis can be removed by considering different
shapes of the within-year input function (Claps and Murrone, 1993). The hypothesis that
I_t is a white noise process leads to the ARMA(1,1) model

d_t = φ d_{t−1} + ε_t − θ ε_{t−1}                                                 (2)

in which d_t equals D_t − E[D_t], φ and θ are the autoregressive and moving average
coefficients, respectively, and ε_t is the zero-mean model residual. The conceptual and
stochastic parameters in (1) and (2) are related by:

θ = φ (1 − c3) / (1 − c3 φ)                                                       (3)

K3 = −1 / ln(φ) ;   c3 = (φ − θ) / (1 − θ)
The expression of c3 for a uniform within-period distribution of the input is

c3 = (φ − θ) / [(1 − θ) K3 (1 − e^{−1/K3})]                                       (4)

The ARMA model residual is proportional to the effective rainfall by means of:

ε_t = (1 − c3 φ)(I_t − E[I_t])                                                    (5)
In the absence of significant groundwater runoff, Rossi and Silvagni (1980) showed that
annual runoff in the hydrologic year is an independent process that follows a Box-Cox
transformation of the Normal distribution. The notion of the hydrologic year, which starts at
the end of the dry season, is important because if a wet season and a dry season can be
distinguished, the absence of significant runoff in the dry season determines the absence of
correlation in the hydrologic-year runoff series.
Monthly scale

The assumptions recalled above on the role of the different components in streamflow
lead to the consideration that correlation effects in monthly runoff are due both to long-term
persistence, due to the deep groundwater runoff, and to short-term persistence due to
the seasonal groundwater runoff. The conceptual system identified by means of these
considerations consists of two parallel linear reservoirs plus a zero-lag linear channel.
The latter accounts for the sub-monthly response components included in the direct
runoff.

The share c3·I_t of the effective rainfall is the recharge of the over-year groundwater,
with storage coefficient K3, while c2·I_t is the recharge of the seasonal groundwater, with
storage coefficient K2. All c_j and K_j parameters are kept constant. The approximations
determined by the latter assumption are compensated for by parsimony in the number of
parameters and by the significance given to the characteristics of the input I_t,
considered as a periodic-independent process. The periodic variability of the recharge
coefficients c2 and c3 is substantially due to variability in soil moisture, which is a
product of the periodic variability of rainfall.
Claps and Rossi (1992) and Claps et al. (1993) have shown that the volume balance
equations for the conceptual model under examination are equivalent to an ARMA(2,2)
stochastic process with periodic-independent residual (PIR-ARMA), expressed as

d_t = φ1 d_{t−1} + φ2 d_{t−2} + ε_t − θ1 ε_{t−1} − θ2 ε_{t−2}                     (6)

with d_t and ε_t having zero mean. The formal correspondence between the stochastic and
conceptual representations is obtained through the relations:

(7)

(8)
(9)

(11)

D(τ) = Σ_{i=N(−∞)}^{N(τ)} Y_i h(τ − τ_i)                                          (12)

h(s) = (c0/K0) e^{−s/K0} + (c1/K1) e^{−s/K1} + (c2/K2) e^{−s/K2} + (c3/K3) e^{−s/K3}   (13)

with s = τ − τ_i. The basin response is defined by 8 parameters: the four storage
coefficients, K_j, and the four recharge coefficients, c_j, of which only 7 are to be estimated,
given the volume continuity condition Σ_j c_j = 1. The c_j coefficients represent the share of
runoff produced, on average, by each component. To limit the number of parameters and
to take advantage of the linearity hypotheses, the coefficients c_j and K_j are considered
constant, i.e. the response function h(·) is kept constant.
The process (12) has infinite memory, which represents the current effect of
previous inputs to the system. This effect can be evaluated at a fixed initial time, t0 = 0,
by knowing the groundwater runoff quota at that time. At the beginning of the
hydrological year (October 1 in our case), the seasonal and subsurface groundwater
contributions are negligible relative to the deep groundwater runoff. Therefore the value
D0 of the discharge at that time can be a good preliminary estimate of the groundwater
runoff amount, thus expressing (12) as:

D(t) = D0 e^{−t/K3} + Σ_{j=N(0)}^{N(t)} Y_j h(t − τ_j)                            (14)
The discretized form of the continuous process (14) is obtained by its integration
over the interval [(t−1)T, tT], where t = 1, 2, ... is the index describing the set of sampling
instants and T is the sampling time interval. If the aggregation occurs on a T-day scale
and the integration is applied according to the linearity and stationarity hypotheses, the
following discretized formulation is obtained:

D_t = K3 e^{−tT/K3} (e^{T/K3} − 1) X_0 + Σ_{s=1}^{t} Y_{t−s+1} h_s                (15)
where Y_t represents the sum of the impulses occurring during the interval [(t−1)T, tT] and
the integrated response is expressed as:

h_s = Σ_{j=0}^{3} (c_j K_j / T) [e^{T/K_j} + e^{−T/K_j} − 2] e^{−T(s−1)/K_j} ,   s > 1   (16)

The function h_s represents the response of the system determined by a unit volume
impulse of effective rainfall, uniformly distributed within the interval.
When the scale of aggregation T is chosen to be considerably larger than the surface
runoff lag time, the surface runoff component can be considered as the output of a zero-
lag linear channel, which has response function c0 δ(·), with δ(·) the Dirac delta
function. This reduces to six the number of parameters to be estimated.
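A sketch of the integrated response (16) for s > 1 (our code; the separate s = 1 term, through which the direct-runoff share c0 enters, is not reproduced here):

    import numpy as np

    def integrated_response(T, K, c, smax):
        """h_s of (16) for s = 2..smax: the response, integrated over
        sampling intervals of length T, to a unit effective-rainfall
        volume spread uniformly over one interval.  K, c: storage and
        recharge coefficients of the parallel reservoirs."""
        K = np.asarray(K, dtype=float)
        c = np.asarray(c, dtype=float)
        s = np.arange(2, smax + 1)[:, None]
        terms = (c * K / T) * (np.exp(T / K) + np.exp(-T / K) - 2.0) \
                * np.exp(-T * (s - 1) / K)
        return terms.sum(axis=1)

    h = integrated_response(T=7.0, K=(7.0, 30.0, 120.0),
                            c=(0.3, 0.3, 0.2), smax=50)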
The structure of daily effective precipitation has been represented as uncorrelated,
as in Poisson white noise models (Bernier, 1970), or characterized by a Markovian arrival
process (e.g. Kron et al., 1990), or described by models based on the arrival of clusters of
cells, such as the Neyman-Scott instantaneous pulse model (e.g. Cowpertwait and
O'Connell, 1992). The distribution considered by Murrone et al. (1992) is a Bessel
distribution, corresponding to a Poisson white noise probabilistic model.
SIMULATION STUDY
residual variance (taken as the variance of the surface runoff component in the Shot
Noise model) and σ² indicates the variance of the synthetic runoff series.
Application

The main points to be addressed with the aid of simulations are: (1) in which manner does the
resolution of a linear reservoir depend on the relative mean (coefficient c) of its output
with respect to total runoff? (2) is there a preferential scale for the estimation of the
storage coefficient?

The results of parameter estimation on simulated data, reported below, suggest a
number of comments.
ARMA(1,1) model

The following comments arise from the estimation of c and K through the ARMA(1,1)
model:
1. The results of parameter estimation, reported in Tables 1a to 1c (referring to c = 0.5,
c = 0.8 and c = 1, respectively), show that aggregation reduces the variance of the fraction
of input not entering the reservoir, producing higher values of the explained variance
R². The obvious exception is the case c = 1, in which there is no pure white noise
component. For the case c = 1, the model fitted to the data is still the ARMA(1,1), for it is
the most general model of a single linear reservoir with a generic within-period form of
the input function (Claps and Murrone, 1993). This adds information in providing an
estimate of c, obtained through (4).
2. A progressive increase in the standard error of the estimates also occurs with
aggregation, due solely to the decrease in the number of data. Table 2 shows that
estimations made on the reference scale (1 t.u.) over limited samples produce standard
errors greater than the corresponding standard errors for data aggregated over 7, 15 and
30 t.u.
3. For c < 1, K is clearly underestimated. This tendency becomes more noticeable
with increasing K and with decreasing c. The values found for R², which decreases in the
same circumstances, reflect the poor estimate of K. A tendency toward a preferential
scale for parameter estimation is not recognizable from the results shown in Tables 1a
and 1b.
4. More understandable results are obtained by estimating K and c on scales
aggregated in unit steps, from 1 to 15 t.u. In this regard, Figures 1-3 clearly show that
when K is much greater than the reference scale, aggregation produces better conditions
for parameter estimation. The progressive increase in K̂ and ĉ up to a sill (Figure 1)
gives a sufficient indication of this benefit. Therefore, the preferential scale must be the
one at which the sill is reached (7 t.u. for this case), as a trade-off
between the increase in R² and the increase in the standard error of the estimates.
Based on the results reported above, it seems that the preferential scale decreases
with increasing c and with decreasing K (in general one should speak in terms of a non-
dimensional preferential scale, i.e. divided by K). Figure 2, with c = 0.5 and K = 15,
confirms this tendency, showing substantial constancy in both K̂ and ĉ, which would
indicate that the sill is reached at the reference scale. On the other hand, when c = 1 the
reference scale is the best one for estimation regardless of K, since the quality of the estimates
degrades with aggregation (see the decrease of ĉ and R² in Table 1c and the decrease of
ĉ in Figure 3).
TABLE 1a. Estimations from: ARMA(1,1) model, Gaussian input, c = 0.5 (scale in t.u.)

scale  K̂(t.u.)   ĉ      φ̂      θ̂      R²      K̂(t.u.)   ĉ      φ̂      θ̂      R²
K=120                                         K=90
1 10.49 0.227 0.909 0.884 0.002 17.01 0.322 0.943 0.917 0.004
7 104.5 0.490 0.935 0.877 0.029 89.53 0.508 0.925 0.853 0.037
15 96.18 0.508 0.856 0.727 0.063 81.30 0.524 0.832 0.677 0.078
30 83.31 0.528 0.698 0.457 0.100 72.00 0.548 0.659 0.383 0.117
60 84.12 0.538 0.490 0.171 0.111 73.20 0.553 0.441 0.102 0.117
K=60 K=30
1 29.08 0.461 0.966 0.938 0.008 22.74 0.509 0.957 0.914 0.018
7 67.72 0.522 0.902 0.805 0.051 34.42 0.513 0.816 0.657 0.074
15 61.82 0.537 0.785 0.588 0.098 36.08 0.537 0.660 0.393 0.117
30 57.12 0.565 0.592 0.272 0.135 36.30 0.559 0.438 0.092 0.128
60 57.82 0.557 0.354 0.012 0.111 36.67 0.523 0.195 -0.084 0.069
K=15 K=7
1 13.14 0.515 0.927 0.855 0.033 7.22 0.529 0.871 0.744 0.059
7 14.78 0.492 0.623 0.380 0.090 6.33 0.491 0.331 0.048 0.083
15 18.63 0.505 0.447 0.154 0.098 7.63 0.471 0.140 -0.083 0.046
30 25.99 0.483 0.315 0.041 0.078
TABLE 1b. Estimations from: ARMA(1,1) model, Gaussian input, c = 0.8 (scale in t.u.)

scale  K̂(t.u.)   ĉ      φ̂      θ̂      R²      K̂(t.u.)   ĉ      φ̂      θ̂      R²
K=120 K=90
1 83.24 0.784 0.988 0.946 0.067 71.23 0.795 0.986 0.934 0.089
7 119.9 0.799 0.943 0.746 0.273 95.48 0.806 0.929 0.683 0.317
15 120.4 0.811 0.883 0.507 0.398 96.02 0.814 0.855 0.413 0.430
30 109.8 0.816 0.761 0.164 0.456 90.78 0.820 0.719 0.068 0.462
60 109.6 0.791 0.578 -0.079 0.386 87.66 0.781 0.504 -0.141 0.350
K=60 K=30
1 55.27 0.809 0.982 0.910 0.126 29.45 0.811 0.967 0.835 0.205
7 64.34 0.806 0.897 0.565 0.369 30.44 0.796 0.795 0.289 0.414
15 67.23 0.812 0.800 0.266 0.448 33.87 0.787 0.642 0.017 0.399
30 67.80 0.812 0.643 -0.039 0.436 40.41 0.747 0.476 -0.109 0.303
60 62.46 0.750 0.383 -0.192 0.273 36.44 0.654 0.193 -0.188 0.126
K=15 K=7
1 15.57 0.813 0.938 0.707 0.300 7.59 0.814 0.877 0.481 0.398
7 14.36 0.780 0.614 -0.008 0.384 6.59 0.743 0.345 -0.207 0.255
15 15.57 0.733 0.381 -0.168 0.255 5.51 0.762 0.066 -0.265 0.093
30 27.79 0.596 0.340 -0.039 0.137
TABLE 1c. Estimations from: ARMA(1,1) model, Gaussian input, c = 1.0 (scale in t.u.)

scale  K̂(t.u.)   ĉ      φ̂      θ̂      R²      K̂(t.u.)   ĉ      φ̂      θ̂      R²
K=120 K=90
1 128.8 1.000 0.992 -0.981 0.996 98.27 1.000 0.990 -0.980 0.995
7 124.1 0.985 0.945 -0.301 0.937 94.22 0.980 0.928 -0.300 0.917
15 132.7 0.968 0.893 -0.260 0.863 99.47 0.957 0.860 -0.258 0.823
30 135.2 0.938 0.801 -0.252 0.752 104.7 0.920 0.751 -0.242 0.691
60 115.7 0.878 0.596 -0.285 0.546 89.46 0.848 0.511 -0.277 0.458
K=60 K=30
1 66.11 1.000 0.985 -0.980 0.993 32.74 1.000 0.970 -0.977 0.985
7 62.92 0.971 0.895 -0.299 0.878 30.81 0.943 0.797 -0.299 0.767
15 65.88 0.937 0.796 -0.255 0.747 32.13 0.879 0.627 -0.253 0.555
30 73.47 0.884 0.665 -0.221 0.581 41.82 0.781 0.488 -0.158 0.348
60 63.00 0.791 0.386 -0.253 0.326 36.34 0.663 0.192 -0.196 0.134
K=15 K=7
1 16.14 1.000 0.940 -0.969 0.969 7.46 1.000 0.875 -0.947 0.933
7 14.97 0.893 0.627 -0.303 0.585 6.82 0.812 0.358 -0.304 0.330
15 15.08 0.793 0.370 -0.266 0.309 4.98 0.858 0.049 -0.304 0.103
30 29.16 0.602 0.358 -0.030 0.144
The first consideration arising from the observation of Tables 3-4 and Figure 4 is that
aggregation has quite different effects on the Shot Noise model estimates than on those of the
ARMA model. With the Shot Noise model there are no evident benefits arising from
aggregation, since the best estimates are always obtained at the reference scale. The
increasing bias of the estimated values of both parameters with aggregation does not
leave much room for other considerations.

This outcome could be due to the alterations that aggregation induces in the
impulse occurrence and intensity, and reflects some peculiar characteristics of this class of
models. The positive aspect of this behavior is that even large storage constants can be
identified (with some bias) at the reference scale.
Figure 1. ARMA(1,1) model: parameter estimates on aggregated data
(c = 0.5, K = 120 t.u.).
[Figure 2. Parameter estimates on aggregated data (c = 0.5, K = 15 t.u.).]

[Figure 3. Parameter estimates on aggregated data (c = 1.0, K = 120 t.u.).]
TABLE 3. Estimations from: Shot Noise model, Bessel input, conceptual model of one
reservoir with a linear channel (K = 60 t.u.)

            c = 0.5                c = 0.8                c = 1.0
scale   ĉ       K̂       R²     ĉ       K̂       R²     ĉ       K̂       R²
1 0.532 48.52 0.026 0.813 55.19 0.092 0.993 66.39 0.933
7 0.704 49.54 0.210 0.872 77.68 0.386 0.969 110.56 0.796
15 0.756 82.36 0.242 0.896 112.24 0.474 0.953 172.23 0.707
30 0.837 214.76 0.263 0.912 261.86 0.419 0.942 229.29 0.613
TABLE 4. Estimations from: Shot Noise model, Bessel input (c = 0.5, K = 120 t.u.)

            c = 0.5
scale   ĉ       K̂       R²
1 0.533 87.11 0.018
7 0.695 69.35 0.158
15 0.766 162.39 0.175
30 0.837 339.92 0.216
[Figure 4. Shot Noise model: parameter estimates (ĉ and K̂) on aggregated data.]
FINAL REMARKS
REFERENCES
Bartolini, P. and J. D. Salas (1993) "Modeling of streamflow processes at different time
scales", Water Resour. Res., 29(8), 2573-2587.
Benjamin, J. R. and C. A. Cornell (1970) Probability, Statistics and Decision for Civil
Engineers, McGraw-Hill Book Co., New York.
Box, G. E. and G. Jenkins (1970) Time Series Analysis, Forecasting and Control, Holden-
Day, San Francisco (revised edition 1976).
Bernier, J. (1970) "Inventaire des Modèles des Processus Stochastiques applicables à la
Description des Débits Journaliers des Rivières", Rev. Int. Stat. Inst., 38, 49-61.
Claps, P. and F. Rossi (1992) "A conceptually-based ARMA model for monthly streamflows",
in J. T. Kuo and G. F. Lin (Eds.) Stochastic Hydraulics '92, Proc. of the Sixth IAHR Intl.
Symp. on Stochastic Hydraulics, Dept. of Civil Engrg., NTU, Taipei (Taiwan), 817-824.
Claps, P. (1992) "Sulla validazione di un modello dei deflussi a base concettuale", Proc.
XXIII Conf. Hydraul. and Hydraul. Struct., Dept. Civil Eng., Univ. of Florence, D.91-
D.102.
Claps, P. and F. Murrone (1993) "Univariate conceptual-stochastic models for spring
runoff simulation", in M. H. Hamza (Ed.) Modelling and Simulation, Proc. of the XXIV
IASTED Annual Pittsburgh Conference, May 10-12, 1993, Pittsburgh, USA, 491-494.
Claps, P., F. Rossi and C. Vitale (1993) "Conceptual-stochastic modeling of seasonal
runoff using Autoregressive Moving Average models and different scales of
aggregation", Water Resour. Res., 29(8), 2545-2559.
Cowpertwait, P. S. P. and P. E. O'Connell (1992) "A Neyman-Scott shot noise model for the
generation of daily streamflow time series", in J. P. O'Kane (Ed.) Advances in Theoretical
Hydrology, Part A, chapter 6, Elsevier, The Netherlands.
Jakeman, A. J. and G. M. Hornberger (1993) "How much complexity is warranted in a
rainfall-runoff model?", Water Resour. Res., 29(8), 2637-2649.
Kavvas, M. L., L. J. Cote and J. W. Delleur (1977) "Time resolution of the hydrologic time-
series models", Journal of Hydrology, 32, 347-361.
Kron, W., Plate, E. J. and Ihringer, J. (1990) "A model for the generation of simultaneous
daily discharges of two rivers at their point of confluence", Stochastic Hydrol. and
Hydraul., 4, 255-276.
Obeysekera, J. T. B. and J. D. Salas (1986) "Modeling of aggregated hydrologic time
series", Journal of Hydrology, 86, 197-219.
O'Connell, P. E. (1971) "A simple stochastic modeling of Hurst's law", Int. Symp. on
Math. Models in Hydrology, Int. Ass. Hydrol. Sci., Warsaw.
Murrone, F., F. Rossi and P. Claps (1992) "A conceptually-based multiple shot noise
model for daily streamflows", in J. T. Kuo and G. F. Lin (Eds.) Stochastic Hydraulics '92,
Proc. of the Sixth IAHR Intl. Symp. on Stochastic Hydraulics, Dept. of Civil Engrg.,
NTU, Taipei (Taiwan R.O.C.), 857-864.
Rossi, F. and G. Silvagni (1980) "Analysis of annual runoff series", Proc. Third IAHR Int.
Symp. on Stochastic Hydraulics, Tokyo, A-18(1-12).
Salas, J. D. and R. A. Smith (1981) "Physical basis of stochastic models of annual flows",
Water Resour. Res., 17(2), 428-430.
Salas, J. D., Delleur, J. W., Yevjevich, V. and Lane, W. L. (1980) Applied Modeling of
Hydrologic Time Series, Water Resources Publications, Littleton, Colorado.
Vecchia, A. V., J. T. B. Obeysekera, J. D. Salas and D. C. Boes (1983) "Aggregation and
estimation of low-order periodic ARMA models", Water Resour. Res., 19(5), 1297-1306.
Acknowledgments
This work was supported by funds granted by the Italian National Research Council -
Group for Prevention from Hydrogeological Disasters, grant no. 91.02603.42.
ON IDENTIFICATION OF CASCADE SYSTEMS BY
NONPARAMETRIC TECHNIQUES WITH APPLICATIONS TO
POLLUTION SPREAD MODELING IN RIVER SYSTEMS
A. KRZYZAK
Department of Computer Science
Concordia University
1455 de Maisonneuve Blvd. West
Montreal, Quebec, Canada H3G 1M8
INTRODUCTION
[Block diagram: a cascade of dynamic subsystems S1, S2, ..., Sn interconnected with
memoryless nonlinearities M1, M2, ..., Mn, with inputs x1, ..., xn and noises z1, ..., zn.]
by our kernel estimate includes the class of all Borel measurable functions. This
class is too large to be finitely parameterized; therefore a nonparametric approach
is chosen in this paper. Standard nonlinearities such as dead-zone limiters, hard
limiters and quantizers are included in the class of nonlinearities we can identify
by our techniques.
Remark 1. If the whole function Ψ is invertible then Ψ† = Ψ⁻¹, where Ψ⁻¹ is
the inverse of Ψ. To clarify the stated results consider the following example. Let
d = 1 and Ψ₁(u) = u² for −1 ≤ u ≤ 1 and Ψ₁(u) = exp(u − 1) for u ≥ 1.
Then Ψ†(u) = −√u for −1 ≤ u ≤ 0, Ψ†(u) = √u for 0 ≤ u ≤ 1 and
Ψ†(u) = ln(u) + 1 for u ≥ 1.
In order to estimate the optimal model (2) we assume that we have a se-
quence of independent observations (X₁, Y₁, Z₁), ..., (Xₙ, Yₙ, Zₙ) of the random
vector (X, Y, Z), and that X has distribution μ. We apply the kernel regression esti-
mate to recover Φ and a combination of the kernel regression estimate and the regression
inverse estimate to recover M₂.
IDENTIFICATION OF SUBSYSTEM S1

h_n → 0

n^{(s−1)/s} h_n^d / log n → ∞                                                     (7)

then

Φ_n(x) → Φ(x) almost surely as n → ∞
Remark 2. Convergence in Theorem 2 takes place for all input distributions (that
is, for distributions having a density, or discrete, or singular, or any combination of the
aforementioned) and we impose no restrictions on the regression function Φ. Examples of
kernels satisfying (6) are listed below:

a) rectangular kernel

K(x) = 1/2 for ||x|| ≤ 1, and 0 otherwise;

b) triangular kernel

K(x) = 1 − ||x|| for ||x|| ≤ 1, and 0 otherwise;

c) Gaussian kernel

K(x) = (2π)^{−1/2} exp(−||x||²/2);

d) de la Vallée-Poussin kernel

K(x) = (1/2π) (sin(x/2) / (x/2))² if x ≠ 0, and 1/2π if x = 0.
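A sketch of the kernel regression estimate Φ_n(x) = Σᵢ Yᵢ K((x − Xᵢ)/h) / Σᵢ K((x − Xᵢ)/h) with the kernels above (the code and the hard-limiter example are ours, not the paper's):

    import numpy as np

    rectangular = lambda r: np.where(r <= 1.0, 0.5, 0.0)
    triangular  = lambda r: np.clip(1.0 - r, 0.0, None)
    gaussian    = lambda r: np.exp(-0.5 * r ** 2) / np.sqrt(2.0 * np.pi)

    def phi_n(x, X, Y, h, kernel=rectangular):
        """Kernel regression estimate of Phi(x) = E[Y | X = x]."""
        w = kernel(np.linalg.norm((x - X) / h, axis=-1))
        return (w @ Y) / w.sum() if w.sum() > 0 else np.nan

    # Recovering a hard limiter from noisy observations:
    rng = np.random.default_rng(3)
    X = rng.uniform(-2.0, 2.0, (500, 1))
    Y = np.sign(X[:, 0]) + 0.1 * rng.standard_normal(500)
    print(phi_n(np.array([0.7]), X, Y, h=0.2))   # close to +1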
Theorem 3. Let ess sup_x E{||Y||² | X = x} < ∞. Suppose that there exist positive
constants r, a, b, r ≤ b, such that

μ(S_{x,r}) ≥ a λ(S_{x,r})                                                         (8)

where A is a compact subset of R^d, S_{x,r} is the ball with radius r centered at x, and
λ is the Lebesgue measure. Let K assume finitely many, say k, values. If
h_n → 0                                                                           (9)

n h_n^d / log n → ∞                                                               (10)

as n → ∞, then

ess sup_A ||Φ_n(x) − Φ(x)|| → 0 almost surely                                     (11)

as n → ∞.

If K is a bounded kernel satisfying (6) then (11) follows provided that condition
(10) is replaced by

n h_n^{2d} / log n → ∞.                                                           (12)
Remark 3. Hypothesis (11) follows under condition (10) if K is, for example, a
window kernel. The essential supremum is taken with respect to the distribution of
the input. Notice that we do not assume that X and Y have densities. Condition
(8) guarantees that there is sufficient mass of the distribution in the neighborhood
of the point x. The condition imposes restrictions on μ and on the shape of A. It
also implies that A is contained in the support of μ, that is, the set of all x such
that μ(S_{x,r}) > 0 for all positive r. If X has density f then (8) is equivalent to
ess inf_{x∈A} f(x) > 0.
Proof of Theorem 3. Define Ȳᵢ = Yᵢ I{|Yᵢ| ≤ n^{1/2}} and Φ̄(x) = E{Ȳ | X = x}. Let
also K_h(x) = K(x/h). Clearly

Φ_n(x) − Φ(x) = Σᵢ₌₁ⁿ (Yᵢ − Ȳᵢ) K_h(x − Xᵢ) / Σᵢ₌₁ⁿ K_h(x − Xᵢ)
  + Σᵢ₌₁ⁿ (Ȳᵢ − Φ̄(Xᵢ)) K_h(x − Xᵢ) / Σᵢ₌₁ⁿ K_h(x − Xᵢ)
  + Σᵢ₌₁ⁿ (Φ̄(Xᵢ) − Φ(x)) K_h(x − Xᵢ) / Σᵢ₌₁ⁿ K_h(x − Xᵢ)
  = I + II + III.

By the moment assumption I = 0 a.s. for n large enough. By the result of
Devroye (1978, p. 183) there exist finite constants c₃, c₄ and c₅ such that for h_n
small enough

P{ess sup_A |III| > ε} ≤ P{inf_A Σᵢ₌₁ⁿ K_h(x − Xᵢ) < c₃ n h_n^d} ≤ c₄ h_n^{−d} exp(−c₅ n h_n^d).

Term II needs special care. Suppose that K takes finitely many different values
a₁, ..., a_k. The vector [K((x − X₁)/h), ..., K((x − Xₙ)/h)] as a function of x can take
at most (2n)^{c(d)k} values, contrary to the intuitive number kⁿ (Devroye 1978, Theo-
rem 1). The constant c(d) is equal to the VC dimension of the class of d-dimensional
balls. We thus obtain

P{ess sup_A |II| > ε} ≤ P{(X₁, ..., Xₙ) ∉ B}
  + E{ I_B(X₁, ..., Xₙ) (2n)^{c(d)k} sup_j
      P{ |Σᵢ₌₁ⁿ (Ȳᵢ − Φ̄(Xᵢ)) a_{ji} / Σᵢ₌₁ⁿ a_{ji}| ≥ ε | X₁, ..., Xₙ } }          (13)
sup_{xᵢ, xᵢ′} |f(x₁, ..., xᵢ, ..., xₙ) − f(x₁, ..., xᵢ′, ..., xₙ)| ≤ cᵢ ,  1 ≤ i ≤ n.

Then

P{|f(X₁, ..., Xₙ) − E f(X₁, ..., Xₙ)| ≥ ε} ≤ 2 exp(−2ε² / Σᵢ₌₁ⁿ cᵢ²).

Using this inequality we obtain for the second probability in (13)

where the last inequality follows from the fact that on the set B

Σᵢ₌₁ⁿ a_{ji} / maxᵢ a_{ji} ≥ const · n h_n^d.

So the second term on the rhs of (13) is not larger than

2 (2n)^{c(d)k} exp(−c₆ n h_n^d).

Theorem 3 follows from (10) and the Borel-Cantelli lemma. In the case when K
assumes infinitely many values we can use a stepwise approximation of K and
obtain an upper bound for (13)
ESTIMATE OF Ψ

As we shall see later, we need a consistent estimate of Φ⁻¹ in order to identify
M₂. The estimate of Φ⁻¹ will be derived from Φ_n. Since Φ_n may not be invertible
even when Φ is, we need to define a pseudoinverse.
Definition 1. Let φ : X → Y, where X is a complete space, and let s = inf_{x∈X} ||φ(x) −
y|| for some y ∈ Y. A function φ⁺ : Y → X is called a pseudoinverse of φ if for
any y ∈ Y, φ⁺(y) is equal to any x ∈ X in the set

A(y) = ∩_{n=1}^{∞} cl(Aₙ) ,   Aₙ = {x ∈ X : ||φ(x) − y|| ≤ s + 1/n}.

Remark 4. Since the cl(Aₙ) are closed and nonempty and X is a complete set, the set
A(y) is nonempty and φ⁺ is well defined. If φ is continuous and X is compact then
φ⁺(y) is equal to any x* such that

min_{x∈X} ||φ(x) − y|| = ||φ(x*) − y||.

If φ is invertible then φ⁺ coincides with φ⁻¹. The pseudoinverse depends on the
norm.
Two versions of φ⁺ will be useful in applications in the case of a scalar function:

φ⁺(y) = inf{x : x ∈ A(y)}                                                         (16)

and

φ⁺(y) = sup{x : x ∈ A(y)}.                                                        (17)
then

IDENTIFICATION OF SUBSYSTEM S2

(18)

and

x_n → x

as n → ∞, then

Theorem 5. Assume that Φ and Ψ are continuous on A and ess sup_x E{||Y||² | X} < ∞,
ess sup_x E{||Z||² | X} < ∞. If K assumes finitely many values and (8)-(10) hold then

(19)
[Block diagram: the input x_n passes through a memoryless nonlinearity producing w_n,
then through a linear dynamic subsystem {k_j} producing s_n and the noisy output y_n.]

(20)
where X_n is an R^d-valued stationary white noise with distribution μ and ξ_n is a
stationary white noise with zero mean and finite variance σ_ξ². No correlation is
assumed between ξ_n and X_n. Assume for simplicity that ψ is a scalar function.
The linear dynamic subsystem is described by the ARMA model (assumed to be
stable):

where l is the unknown order of the system and S_n is its noise-free output. The linear
subsystem can also be described by state equations.

(21)

Y_n = Σ_{j=0}^{∞} k_j W_{n−j}                                                     (23)
446 A.KRZYZAK
TJn
un rn tn Zn
{b i } f 1\ ?jJ
\..L/
where l₀ = e ≠ 0, lᵢ = f B^{i−1} e, i = 1, 2, ..., and Σ_{i=0}^{∞} |lᵢ| < ∞ guarantees the asymptotic
stability of the linear subsystem. It can be shown that the estimation techniques for
inverses of regression functions from Section 5 are applicable to Wiener system
identification (see Greblicki (1992) and Krzyzak (1993)). Wiener and Hammer-
stein systems can be combined into a cascade of memoryless nonlinear systems
interconnected with dynamic linear components. Such models are very general
but still simple enough to obtain identification algorithms.
CONCLUSION
ACKNOWLEDGEMENTS
This research was sponsored by NSERC grant A0270 and FCAR grant EQ
2904.
REFERENCES
PATCHING MONTHLY STREAMFLOW DATA

G. G. S. PEGRAM
Department of Civil Engineering
University of Natal
King George V Avenue
4001 Durban, South Africa
Water resource systems in many parts of the world rely almost exclusively on surface
water. Streamflow records are, however, woefully short, patchy and error-prone; therefore
a major effort needs to be put into the cleansing, repair and possible extension of the
streamflow data-base. Monthly streamflow records display an appreciable amount of serial
correlation, due mainly to the effects of storage in the catchment areas, both surface and
subsurface. A linear state-space model of the rainfall-runoff process has been developed,
with the missing data and the parameters of the model being estimated by a combination of
the Kalman Filter and the EM algorithm. Model selection and outlier detection were then
achieved by recursively calculating deleted residuals and developing a cross-validation
statistic that exploits the Kalman filtering equations. The method used here, which relates
several streamflow records to each other and then uses some appropriate rainfall records
to increase the available information set in order to facilitate data repair, can be developed
if one recasts the above models in a state-space framework.

Experience with real data sets shows that transformation and standardization are not
always necessary to obtain good patching. "Good" in this context is defined by the cross-
validation statistic derived from the deleted residuals. These in turn are a fair indicator of
which data may be in error compared to the remainder, as a result of being identified
as possible outliers. Examples of data patching and outlier detection are presented using
data from a river basin in Southern Africa.
INTRODUCTION

Water resource system designs depend heavily on the accuracy and availability of
hydrological data in addition to economic and demand data, the latter being more difficult
to obtain. However, system reliability is becoming more commonly assessed by means of
simulation.
In order to simulate, one has to have data to mimic, and when records of streamflow in
particular are short or patchy, the reliability of the system as a whole can be severely called
into question. In southern Africa there has been a strong emphasis on water resource
system development and analysis since the savage drought of the early eighties and again
in the early nineties until late 1993. The Lesotho Highlands Scheme for hydro-electric
generation for Lesotho, with a spin-off of water supply bought by the Republic of South
Africa, bears testimony to the expense and ingenuity required to fuel major economic
growth in the region. Part of the water resources analysis programme involved the
patching of rainfall data in Lesotho to enable streamflow records to be constructed and, in
addition, the lengthening and repair of existing streamflow records in South Africa. The
methodology discussed in this paper was developed to support this endeavour.

Streamflow records are strongly correlated to records of streams which flow in their
vicinity and less strongly correlated to rainfall records where the rain gauges are located
on or around the appropriate catchments. This is especially so with short-term daily
flows but is somewhat better for the monthly data typically used in water resources
studies, which primarily involve over-year storage.
Where data are missing in a streamflow record, it is tempting to use other streamflow and
raingauge records to infill the missing data. There are several difficulties which arise when
one uses conventional regression techniques. The first and most serious is that there are
frequently pieces of data missing in the records of the control stations being used to infill
the record of the target station. These gaps may, and often do, occur concurrently. A
conventional regression requires that special steps be taken, such as re-estimating the
regression for the reduced set by best-subset selection procedures etc., or alternatively,
abandoning the attempt to infill or patch. A second difficulty arises from the strong
dependence structure in time, due to catchment storage, making standard regression
difficult to apply to the untransformed flows. The third problem is one of seasonality,
which violates the assumption of homoscedasticity. A fourth is the problem of non-
linearity of the processes. These problems were addressed by Pegram (1986) but the
difficulty of the concurrent missing data was not overcome in that treatment. Most other
methods of infilling are defeated by the concurrently missing data problem.
estimated by the Kalman Filter to estimate parameters in the missing-data context and also,
as a by-product, to estimate the missing data. This procedure will be referred to as the
EMKF algorithm in this paper.
It was important to ascertain what characteristics of the rainfall-runoff process producing
monthly streamflow totals could be maintained without sacrificing the validity of the
EMKF approach. Specifically, one is concerned about the seasonal variation of the process,
especially in a semi-arid climate where the skewness of the flows during the wet season
and the dry season tends to be quite different, as do the mean and variance. The state-space
model lends itself to modelling the dependence structure we expect to be present in streamflow.
Whether the linear aspect of the model is violated can only be tested by examining the
residuals of the fitted regression model. Thus the EMKF algorithm has promise in
addressing the four difficulties associated with patching monthly streamflow referred to
above - the concurrently missing data, the seasonality, the dependence and the non-
linearity.
The EMKF algorithm of Shumway and Stoffer (1982) provides estimates of the
parameters and the missing data, and some idea of the variance of the state variables.
There still remains the problem facing all modellers as to which records to include in the
regression and which model to fit, where the model is defined by the number of lags in time
and the parameterization. The AIC is a possible answer but presents considerable
difficulties in the context of the Kalman Filter when data are missing. An alternative was
suggested by Murray (1990) using a method of cross-validation for model selection. This
technique has the added advantage that the cross-validation calculation produces a so-called
deletion residual, which gives estimates of how good the intact data are in relation to the
model, and flags possible outliers for attention or removal. The methodology is fully
described with examples in Pegram and Murray (1993).
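A minimal sketch of the filtering half of the EMKF idea is given below (Python/NumPy, our notation; the EM parameter re-estimation and the smoothing and deleted-residual passes of Shumway and Stoffer (1982) and Murray (1990) are omitted, and the system matrices are assumed already estimated):

    import numpy as np

    def kalman_patch(y, F, H, Q, R, x0, P0):
        """Filter the state-space model x_t = F x_{t-1} + w_t,
        y_t = H x_t + v_t.  A time step with any NaN observation is
        treated here as fully missing: the update is skipped and y_t is
        infilled by the one-step prediction H x_t."""
        x, P = x0.copy(), P0.copy()
        patched = y.copy()
        for t in range(len(y)):
            x, P = F @ x, F @ P @ F.T + Q                # predict
            if np.isnan(y[t]).any():
                patched[t] = H @ x                       # infill missing month
            else:
                S = H @ P @ H.T + R
                K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
                x = x + K @ (y[t] - H @ x)
                P = (np.eye(len(x)) - K @ H) @ P
        return patched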
To demonstrate the efficacy of this approach it was decided to perform the following
experiment.

Twenty-eight years (1955 to 1983 inclusive) of the streamflow record into the Vaal Dam
in South Africa were selected for the experiment, together with the flow into an upstream dam
- Grootdraai - and six rainfall gauges scattered around and in the catchment. Three years
of the Vaal Dam flows (1964 to 1966) were hidden and re-estimated using the EMKF
In Figure 4 are shown the deleted residuals for the 36 months during 1974 to 1976, the
years which most closely corresponded to the "missing" years (deleted residuals are
not estimated for missing data but only for intact data). One of the reasons why this
section of data was used is that it includes the largest flow on record - that of
February 1974 - which was 2 200 units (millions of cubic metres). The value concerned is
shown as a very large deleted residual, skewing the regression above the line. This
suggests that the nonlinearities have not been satisfactorily handled; however, the obvious
choice of a log-transformation does not eliminate the problem but raises others.
CONCLUSIONS
A methodology has been suggested in this paper which should provide a useful tool for the
water resources practitioner in the search for better ways of repairing and extending
streamflow data.
ACKNOWLEDGEMENTS
Thanks are due to the Department of Water Affairs and Forestry, Pretoria, South Africa,
for sponsoring this work and for giving permission to publish it. The software to
accomplish the streamflow patching (PATCHS) and the manual to go with it are available
from them at nominal cost.
'"0
~
:I:
~
Comparison - Series 1/Recorded (1964-6)
(Series 1 =Standardized) ><
1400
I
V,l
;J
1200
I f Record~. Series 1
.! 1000
0
I'CII
I
~
;;
.... 800
~
it
>-
:E 600
c
-0
:E 400
200
Months
Figure 1. Comparison of Patched with hidden recorded flows for Vaa1 Dam during 1964-66. The flows are patched using the ~
...,
fully standardised flows in both target and controls.
Vt
"""
"""
[Chart: Comparison - Series 2/Recorded (1964-66); Series 2 = Scaled; monthly flow
totals plotted against months.]

Figure 2. Comparison of patched with hidden recorded flows for Vaal Dam during 1964-66.
The flows are patched using the scaled (unshifted but divided by monthly standard
deviation) flows in both target and controls.
[Chart: Comparison - Series 3/Recorded (1964-66); Series 3 = Untransformed; monthly flow
totals plotted against months.]

Figure 3. Comparison of patched with hidden recorded flows for Vaal Dam during 1964-66.
The flows are patched using the untransformed flows in both target and controls.
[Figure 4. Deleted residuals for the intact Vaal Dam flows during 1974-76, plotted against
flow (up to 2 600 millions of cubic metres).]
REFERENCES
Dempster, A. P., N. M. Laird and D. B. Rubin (1977) "Maximum likelihood from
incomplete data via the EM algorithm", J. of the Royal Statist. Soc., Ser. B, 39, 1-38.
Murray, M. (1990) "A Dynamic Linear Model for Estimating the Term Structure of
Interest Rates in South Africa", unpublished Ph.D. thesis in Mathematical Statistics,
University of Natal, Durban.
Pegram, G. G. S. and M. Murray (1993) "Cross-validation with the Kalman Filter and
EM algorithm for patching missing streamflow data", resubmitted to the Journal of the
American Statistical Association in January.
Shumway, R. H. and D. S. Stoffer (1982) "An approach to time series smoothing and
forecasting using the EM algorithm", Journal of Time Series Analysis, Vol. 3, 253-264.
RUNOFF ANALYSIS BY THE QUASI CHANNEL NETWORK MODEL
IN THE TOYOHIRA RIVER BASIN

H. SAGA
Dept. of Civil Eng., Hokkai-Gakuen Univ., Chuo-ku, Sapporo 064, Japan
T. NISHIMURA
Hokkaido E.P. Co., Toyohira-ku, Sapporo 004, Japan
M. FUJITA
Dept. of Civil Eng., Hokkaido Univ., Kita-ku, Sapporo 060, Japan
INTRODUCTION
This paper describes runoff analysis using the quasi channel network
model of the Misumai experimental basin, which is part of the Toyohira
River basin. The Toyohira River flows through Sapporo, which has a population of 1.7 million. Four multi-purpose dams are located in the Toyohira River basin; thus, it is very important to verify the runoff process not only analytically but also based on field observations.
In this study, a quasi channel network model and a three-cascade tank
model were adopted as runoff models. Both provide for treatment of the
spatial distribution of rainfall.
RUNOFF MODEL
The mesh-size is 250 × 250 m. Each link of this model has an independent sub-catchment area. The tank model transforming rainfall into discharge in the sub-catchment is adopted because this model includes the mechanism of rainfall loss. The three-cascade tank model is shown in Figure 3, and the state variables and parameters of this model are defined in the next section. The data from the five raingauges indicate that the distribution of rainfall intensity generally depends on altitude. Figure 4 shows the distributions of 9 rainfall events observed in 1989. In a large-scale rainfall, the amount of rainfall is proportional to the altitude, though for a small-scale event, the amount is independent of altitude.
[Figure 4. Distribution of rainfall against altitude (m), Misumai Experimental Basin.]
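As an illustration of how this altitude dependence might be exploited, the sketch below fits the heavy-storm gauge totals of Figure 5 linearly against gauge altitude and uses the fit to assign event rainfall to individual mesh cells. Whether the model distributes rainfall in exactly this way is an assumption of the sketch, and the mesh-cell elevations are invented.

import numpy as np

# Heavy-storm totals (mm) at the five raingauges and their elevations (m),
# read from the hyetographs of Figure 5; the linear fit distributes the
# event rainfall over 250 m x 250 m mesh cells according to cell elevation.
alt = np.array([390.0, 460.0, 510.0, 600.0, 750.0])
total = np.array([84.5, 88.0, 94.0, 97.5, 113.0])

slope, intercept = np.polyfit(alt, total, 1)        # least-squares line
cell_elev = np.array([420.0, 550.0, 700.0])         # invented cell elevations
print(np.round(slope * cell_elev + intercept, 1))   # cell totals [mm]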
The combination of a quasi channel network model and a tank model appeared to be effective for treating the spatial distribution of rainfall, because a quasi channel network model can treat the rainfall data observed by the five raingauges as multiple inputs. The estimation of the velocity of propagation of a flood wave is one of the problems that must be solved in order to apply this model to practical cases of runoff analysis. In this paper, the velocity V is assumed to be spatially constant along the channel and not to vary with time.
[Figures 5 and 6. Hyetographs r (mm/10 min) at the five raingauges (No. 1-5; E.L. = 750, 600, 510, 460 and 390 m; storm totals Σr of about 84-113 mm and 43-54 mm respectively) and observed versus calculated hydrographs q (mm/10 min) against t (hr), both with V = 2.0 m/sec.]
Figures 5 and 6 show the hyetographs and hydrographs for a heavy storm and a weak rainfall, respectively. In this calculation, V was assumed to be 2.0 m/sec and the tank model parameters were set by trial and error. In spite of using the same values for the tank parameters and propagation velocity, the calculated results were in good agreement with the observed ones.
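The role of the constant propagation velocity V can be sketched as a pure translation: each link's sub-catchment outflow, one input per tank, is lagged by its along-channel distance divided by V and superposed at the outlet. The distances, the time step and the triangular stand-in hydrographs below are invented, not the Misumai network.

import numpy as np

# Sketch of the multi-input quasi channel network idea: translate every
# sub-catchment hydrograph to the basin outlet with a constant flood-wave
# velocity V, i.e. a pure time lag of distance/V, and superpose them.

DT = 600.0                                    # time step [s] (10 minutes)
V = 2.0                                       # propagation velocity [m/s]
DIST = [1200.0, 3600.0, 6000.0, 8400.0]       # link distance to outlet [m]

def route_to_outlet(sub_q, dist, n_steps):
    """Lag each sub-catchment discharge by dist/V (rounded to whole steps)
    and superpose the lagged hydrographs at the outlet."""
    q_out = np.zeros(n_steps)
    for q, d in zip(sub_q, dist):
        lag = int(round(d / V / DT))
        q_out[lag:lag + len(q)] += q[:max(n_steps - lag, 0)]
    return q_out

# Identical triangular hydrographs from four sub-catchments.
tri = np.concatenate([np.linspace(0.0, 1.0, 6), np.linspace(1.0, 0.0, 6)])
outlet = route_to_outlet([tri] * 4, DIST, 30)
print(np.round(outlet, 2))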
$\frac{dC_1}{dt} = f_1 = r(t) - q_{1u}\,Y(C_1,h_{1u}) - q_{1m}\,Y(C_1,h_{1m}) - q_{1l}\,Y(C_1,h_{1l}) - i_1(t)$  (1)
where $q_i = a_i \times (C_1 - h_i)$, $i_j = b_j \times C_j$, and $Y(C,h) = \frac{1}{\pi}\left\{\tan^{-1}\!\frac{C-h}{\varepsilon} + \frac{\pi}{2}\right\}$, $(0 < \varepsilon \ll 1)$;
$Y(C,h)$ is a smoothed Heaviside function and $\varepsilon = 10^{-6}$.
The unknown parameters are identified at each time that the discharge is
observed.
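A minimal sketch of the top tank of equation (1), assuming invented outlet coefficients, heights and infiltration rate, shows how the smoothed Heaviside switch Y(C, h) gates each outlet while remaining differentiable:

import numpy as np

# Sketch of equation (1): Y(C, h) turns each outlet of the top tank on only
# when the storage C exceeds the outlet height h, yet stays differentiable.
# The coefficients a, heights h and infiltration rate b1 are invented.

EPS = 1e-6  # the epsilon of equation (1)

def Y(c, h):
    """Smoothed Heaviside: ~0 for c < h, ~1 for c > h."""
    return (np.arctan((c - h) / EPS) + np.pi / 2.0) / np.pi

def dC1_dt(c1, rain, a, h, b1):
    """Rate of change of the top-tank storage following equation (1):
    rainfall in, three gated outflows a_i*(C1 - h_i)*Y(C1, h_i) out,
    and infiltration i1 = b1*C1 to the tank below."""
    outflow = sum(ai * (c1 - hi) * Y(c1, hi) for ai, hi in zip(a, h))
    return rain - outflow - b1 * c1

# One explicit Euler step with storage between the lower two outlet heights.
c1 = 12.0                                          # storage [mm]
rate = dC1_dt(c1, rain=5.0, a=(0.10, 0.05, 0.02),
              h=(20.0, 10.0, 0.0), b1=0.01)
print(f"dC1/dt = {rate:.3f} mm per time step")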
$q^{o}(t) = q(t) + v(t)$  (6)
where $v(t)$ is the measurement error.
MODEL VERIFICATION
... from the true values. The symbol ◇ denotes the discharge calculated by using the initial values. The identification ...
[Figure 7. Results of calculation using the E.K.F., q against t (hr).]
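The identification can be sketched as a joint state-parameter Extended Kalman Filter: the state vector is augmented with an unknown outflow coefficient, modelled as a random walk, and both storage and coefficient are updated at each observation of the discharge. A single tank replaces the three-cascade model here, and every numerical value is an illustrative assumption:

import numpy as np

# Sketch of EKF parameter identification: augment the state with an unknown
# outflow coefficient 'a' and update both the storage C and 'a' whenever a
# discharge q = a*C is observed. Noise levels and parameters are invented.

def ekf_step(x, P, rain, q_obs, Q=np.diag([0.1, 1e-5]), R=0.05):
    C, a = x
    x_pred = np.array([C + rain - a * C, a])      # Euler step of dC/dt = r - a*C
    F = np.array([[1.0 - a, -C],                  # Jacobian of the dynamics
                  [0.0, 1.0]])
    P_pred = F @ P @ F.T + Q
    H = np.array([[x_pred[1], x_pred[0]]])        # q = a*C, linearized
    S = (H @ P_pred @ H.T + R).item()
    K = (P_pred @ H.T) / S
    innov = q_obs - x_pred[0] * x_pred[1]
    x_new = x_pred + (K * innov).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(0)
a_true, C_true = 0.05, 20.0
x, P = np.array([10.0, 0.01]), np.eye(2)          # deliberately poor start
for _ in range(300):
    rain = 1.0 + rng.random()                     # varying rainfall input
    C_true = C_true + rain - a_true * C_true      # "true" tank
    q_obs = a_true * C_true + rng.normal(scale=0.1)
    x, P = ekf_step(x, P, rain, q_obs)
print(f"estimated a = {x[1]:.4f}  (true a = {a_true})")

Running the filter over the record in successive passes, as the authors do, lets the parameter estimates settle toward values consistent with the observed hydrograph.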
[Figure. Identified parameters (a1u, a1m, a2, a3, b1, b2, b3) after the first and second passes of the filter, compared with their true values (shown in parentheses).]
CONCLUSION
A model based on the combination of a quasi channel network model and a tank model has been shown to be effective for treating the spatial distribution of rainfall. The estimates of the unknown parameters of the model converge well toward the true values in successive, automatic iterations of an Extended Kalman Filter.
[Figure. Hyetographs (Σr = 39.5 to 48.5 mm) at the five raingauges (No. 1-5; E.L. = 750, 600, 510, 460 and 390 m) and observed versus calculated hydrographs q (mm/10 min) against t (hr), with V = 3.0 m/sec.]
REFERENCE
AUTHOR INDEX
A
Alpaslan, N. 135, 177
B
Baciu, G. 149
Bardossy, A. 19
Bass, B. 33
Benzaquen, F. 99
Bobee, B. 381
Bodo, B. A. 271, 285
Bogardi, I. 19
Bosworth, B. 301
Bruneau, P. 381
C
Chiu, C.-L. 121
Claps, P. 421
Cohen, J. 33
Corbu, I. 99
D
Dillon, P. J. 285
Druce, D. J. 63
Duckstein, L. 19
F
Fujita, M. 205, 459
Fuller, J. D. 229
G
Goodier, C. 191
H
Harmancioglu, N. B. 135, 163
Hashimoto, N. 205
Hayashi, S. 347
Hipel, K. W. 245, 271
Huang, G. 33
K
Kapur, J. N. 149
Kesavan, H. K. 149
Kojiri, T. 363
Krzyzak, A. 435
L
Lachtermacher, G. 229
Lai, L. 99
Lall, U. 47, 301
Lee, T.-Y. 87
Lennox, W. C. 77
Lettenmaier, D. P. 3
Lewandowski, A. 333
Liu, C.-L. 87
M
Matsubayashi, U. 347
McLeod, A. I. 245, 271
Mesa, O. J. 409
Murrone, F. 421
Muster, H. 19
N
Nishimura, T. 459
P
Panu, U. S. 191, 363
Pegram, G. G. S. 449
Penn, R. 99
Pereira, B. de B. 105
Perreault, L. 381
Perron, H. 381
Phatarfod, R. M. 395
Poveda, G. 409
R
Rajagopalan, B. 47, 317
S
Saga, H. 459
Sales, P. R. H. 105
Satagopan, J. 317
Singh, V. P. 135
Srikanthan, R. 395
T
Takagi, F. 347
Tao, T. 77,99
Tarboton, G. 47
U
Unny, T. E. 363
V
Vieira, A. M. 105
W
Watanabe, H. 217
y
Yamada, R. 217
Yin, Y. 33
Yu, P.-S. 87
Z
Zhang, S. P. 217
Zhu, M.-L. 205
SUBJECT INDEX
A
Abelson-Tukey trend test 245, 248
Akaike information criterion 369, 380, 451
aggregation of data 422
Aras river basin, Eastern Turkey 168, 171
atmospheric variables 3
autocorrelation 279-280
autocorrelation function (ACF) 264, 267, 349, 352, 358
autoregressive (AR) 340-341
autoregressive integrated moving average (ARIMA) 223
autoregressive moving average (ARMA) 106-107, 110, 116, 230, 234, 239, 334-335, 340, 342, 422, 424, 427-431, 434, 445-446
autoregressive moving average exogenous (ARMAX) 106-107, 111, 116
auto-series (AS) model 193-194
B
backfitting 276
'back-of-the-envelope' methods 396
back propagation 229, 231-232
backward propagation 207-208
Bayes/Bayesian 11-12, 149-155, 161
Bayesian decision theory 142
bin filter 276
Bode plot 338-340
Box-Jenkins model identification 334
Box-Jenkins models 229-231, 237
Box-Cox transformation 110, 116
Brillinger trend test 245-248
British Columbia (B.C.) Hydro 63, 65-66, 68-69, 72
Butternut Creek, N.Y. 205, 215-216
C
Canada - U.S. Great Lakes Water Quality Agreement 272, 282
Central Southern Italy watersheds 422
change of information 175
circulation patterns (CPs) and classifications 19-21
circular pipe 121, 124-129
Clearwater Lake, Ontario 285
climatic change 1
clustering/clustering analysis 194, 196, 363
Cold Lake, Alberta 38, 44
Columbia River Basin, Washington and Oregon 317, 322-323
conceptual-stochastic models 422-427, 434
convergence of identification algorithms 436, 440, 448
cost effectiveness of monitoring network 138, 140, 142-143, 147
cross series (CS) model 193, 195
cross validation 451
cumulative periodogram 263, 265, 266, 268
curse of dimensionality 301
D
DAD analysis 350, 354, 358
data vs. information 137, 141, 147, 164
de-acidification trends 285
decision-making 80-81
decision support system for time series analysis 263
dynamic efficiency index (DEI) 179, 187
E
East River watershed, China 82-83
efficiency of water and wastewater treatment plants 177-178
ELETROBRAS 105, 109, 116
H
Hammerstein systems 436, 445-447
M
Mann-Kendall trend test 245, 248, 249
T
Thames river at Thamesville 368
thermodynamic efficiency 179, 185, 187
three-cascade tank model 460
transfer function models 88
trend analysis 243
two-reservoir system 78, 82
U
uncertainties 121
univariate autoregressive moving average
(Univariate ARMA) 110
universal law 125, 128
user interface 99
V
Vaal Dam, South Africa 451
variogram fitting 319
velocity distribution 121-125, 129-131, 133
Water Science and Technology Library
1. A.S. Eikum and R.W. Seabloom (eds.): Alternative Wastewater Treatment.
Low-Cost Small Systems, Research and Development. Proceedings of the
Conference held in Oslo, Norway (7-10 September 1981). 1982
ISBN 90-277-1430-4
2. W. Brutsaert and G.H. Jirka (eds.): Gas Transfer at Water Surfaces. 1984
ISBN 90-277-1697-8
3. D.A Kraijenhoff and J.R. Moll (eds.): River Flow Modelling and Forecasting.
1986 ISBN 90-277-2082-7
4. World Meteorological Organization (ed.): Microprocessors in Operational
Hydrology. Proceedings of a Conference held in Geneva (4-5 September
1984). 1986 ISBN 90-277-2156-4
5. J. Nemec: Hydrological Forecasting. Design and Operation of Hydrological
Forecasting Systems. 1986 ISBN 90-277-2259-5
6. V.K. Gupta, I. Rodriguez-Iturbe and E.F. Wood (eds.): Scale Problems in
Hydrology. Runoff Generation and Basin Response. 1986
ISBN 90-277-2258-7
7. D.C. Major and H.E. Schwarz: Large-Scale Regional Water Resources
Planning. The North Atlantic Regional Study. 1990 ISBN 0-7923-0711-9
8. W.H. Hager: Energy Dissipators and Hydraulic Jump. 1992
ISBN 0-7923-1508-1
9. V.P. Singh and M. Fiorentino (eds.): Entropy and Energy Dissipation in Water
Resources. 1992 ISBN 0-7923-1696-7
10. K.W. Hipel (ed.): Stochastic and Statistical Methods in Hydrology and
Environmental Engineering. A Four Volume Work Resulting from the
International Conference in Honour of Professor T. E. Unny (21-23 June
1993). 1994
10/1: Extreme values: floods and droughts ISBN 0-7923-2756-X
10/2: Stochastic and statistical modelling with groundwater and surface water
applications ISBN 0-7923-2757-8
10/3: Time series analysis in hydrology and environmental engineering
ISBN 0-7923-2758-6
10/4: Effective environmental management for sustainable development
ISBN 0-7923-2759-4
Set 10/1-10/4: ISBN 0-7923-2760-8
11. S. N. Rodionov: Global and Regional Climate Interaction: The Caspian Sea
Experience. 1994 ISBN 0-7923-2784-5
12. A. Peters, G. Wittum, B. Herrling, D. Meissner, C.A. Brebbia, W.G. Gray and
G.F. Pinder (eds.): Computational Methods in Water Resources X. 1994
Set 12/1-12/2: ISBN 0-7923-2937-6