
STOCHASTIC AND STATISTICAL METHODS

IN HYDROLOGY
AND ENVIRONMENTAL ENGINEERING

VOLUME 3

TIME SERIES ANALYSIS


IN HYDROLOGY AND
ENVIRONMENTAL ENGINEERING
Water Science and Technology Library
VOLUME 10/3

Series Editor:
V. P. Singh, Louisiana State University,
Baton Rouge, U.S.A.

Editorial Advisory Board:

S. Chandra, Roorkee, U.P., India


J. C. van Dam, Pijnacker, The Netherlands
M. Fiorentino, Potenza, Italy
W. H. Hager, Zurich, Switzerland
N. Harmancioglu, Izmir, Turkey
V. V. N. Murty, Bangkok, Thailand
J. Nemec, Genthod/Geneva, Switzerland
A. R. Rao, West Lafayette, Ind., U.S.A.
Shan Xu Wang, Wuhan, Hubei, P.R. China

The titles published in this series are listed at the end of this volume.
STOCHASTIC AND STATISTICAL METHODS
IN HYDROLOGY
AND ENVIRONMENTAL ENGINEERING

Volume 3

TIME SERIES ANALYSIS


IN HYDROLOGY AND
ENVIRONMENTAL
ENGINEERING
edited by

KEITH W. HIPEL
Departments of Systems Design Engineering and Statistics and Actuarial Science,
University of Waterloo, Waterloo, Ontario, Canada

A. IAN McLEOD
Department of Statistical and Actuarial Sciences,
The University of Western Ontario, London, Ontario, Canada
and Department of Systems Design Engineering, University of Waterloo,
Waterloo, Ontario, Canada

U.S.PANU
Department of Civil Engineering,
Lakehead University, Thunder Bay, Ontario, Canada

VIJAY P. SINGH
Department of Civil Engineering,
Louisiana State University, Baton Rouge, Louisiana, U.S.A.

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.


Library of Congress Cataloging-in-Publication Data
Stochastic and statistical methods in hydrology and environmental
engineering.
p. cm. -- (Water science and technology library; v. 10)
Papers presented at an international conference held at the
University of Waterloo, Canada, June 21-23, 1993.
Includes index.
Contents: v. 1. Extreme values: floods and droughts / edited by
Keith W. Hipel -- v. 2. Stochastic and statistical modelling with
groundwater and surface water applications / edited by Keith W.
Hipel -- v. 3. Time series analysis in hydrology and environmental
engineering / edited by Keith W. Hipel ... [et al.] -- v.
4. Effective environmental management for sustainable development /
edited by Keith W. Hipel and Liping Fang.
ISBN 978-90-481-4379-5 ISBN 978-94-017-3083-9 (eBook)
DOI 10.1007/978-94-017-3083-9
1. Hydrology--Statistical methods--Congresses. 2. Stochastic
processes--Congresses. I. Series.
GB656.2.S7S815 1994
551.48'01'5195--dc20 94-27708

ISBN 978-90-481-4379-5

Printed on acid-free paper

All Rights Reserved


© 1994 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1994
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
In Memory of Professor T.E. Unny
(1929 - 1991)

Professor Unny is shown examining posters for the International Conference on


Stochastic and Statistical Methods in Hydrology and Environmental Engineering held
in his honour June 21 to 23, 1993. The photograph was taken at the University of
Waterloo on December 20, 1991, eight days before Professor Unny's untimely death.
TABLE OF CONTENTS

PREFACE xi

AN INTERNATIONAL CELEBRATION xv

ACKNOWLEDGEMENTS xix

PART I: CLIMATIC CHANGE

Applications of Stochastic Modeling in Climate Change Impact Assessment


D. P. LETTENMAIER 3

Knowledge Based Classification of Circulation Patterns


for Stochastic Precipitation Modeling
A. BARDOSSY, H. MUSTER, L. DUCKSTEIN and I. BOGARDI 19

Grey Theory Approach to Quantifying the Risks Associated


with General Circulation Models
B. BASS, G. HUANG, Y. YIN and S. J. COHEN 33

A Nonparametric Renewal Model for Modeling Daily Precipitation


B. RAJAGOPALAN, U. LALL and D. G. TARBOTON 47

PART II: FORECASTING

Forecasting B.C. Hydro's Operation of Williston Lake


- How Much Uncertainty is Enough
D. J. DRUCE 63

Evaluation of Streamflow Forecasting Models


T. TAO and W. C. LENNOX 77

Application of a Transfer Function Model to a Storage-Runoff Process


P.-S. YU, C.-L. LIU and T.-Y. LEE 87

Seeking User Input in Inflow Forecasting


T. TAO, I. CORBU, R. PENN, F. BENZAQUEN and L. LAI 99

Linear Procedures for Time Series Analysis in Hydrology


P. R. H. SALES, B. de B. PEREIRA and A. M. VIEIRA 105

PART III: ENTROPY

Application of Probability and Entropy Concepts in Hydraulics


C.-L. CHIU 121

Assessment of the Entropy Principle as Applied to


Water Quality Monitoring Network Design
N. B. HARMANCIOGLU, N. ALPASLAN and V. P. SINGH 135

Comparisons between Bayesian and Entropic Methods


for Statistical Inference
J. N. KAPUR, H. K. KESAVAN and G. BACIU 149

An Entropy-Based Approach to Station Discontinuance


N. B. HARMANCIOGLU 163

Assessment of Treatment Plant Efficiencies by the Entropy Principle


N. ALPASLAN 177

Infilling Missing Monthly Streamflow Data Using a Multivariate Approach


C. GOODIER and U. PANU 191

PART IV: NEURAL NETWORKS

Application of Neural Networks to Runoff Prediction


M.-L. ZHU, M. FUJITA and N. HASHIMOTO 205

Prediction of Daily Water Demands by Neural Networks


S. P. ZHANG, H. WATANABE and R. YAMADA 217

Backpropagation in Hydrological Time Series Forecasting


G. LACHTERMACHER and J. D. FULLER 229

PART V: TREND ASSESSMENT

Tests for Monotonic Trend


A. I. MCLEOD and K. W. HIPEL 245

Analysis of Water Quality Time Series Obtained for


Mass Discharge Estimation
B. A. BODO, A. I. MCLEOD and K. W. HIPEL 271

De-Acidification Trends in Clearwater Lake near Sudbury, Ontario 1973-1992


B. A. BODO and P. J. DILLON 285

PART VI: SPATIAL ANALYSIS

Multivariate Kernel Estimation of Functions of Space and Time


Hydrologic Data
U. LALL and K. BOSWORTH 301

Comparing Spatial Estimation Techniques for Precipitation Analysis


J. SATAGOPAN and B. RAJAGOPALAN 317

PART VII: SPECTRAL ANALYSIS

Exploratory Spectral Analysis of Time Series


A. LEWANDOWSKI 333

On the Simulation of Rainfall Based on the Characteristics of


Fourier Spectrum of Rainfall
U. MATSUBAYASHI, S. HAYASHI and F. TAKAGI 347

PART VIII: TOPICS IN STREAMFLOW MODELLING

Cluster Based Pattern Recognition and Analysis of Streamflows


T. KOJIRI, T. E. UNNY and U. S. PANU 363

ReMus, Software for Missing Data Recovery


H. PERRON, P. BRUNEAU, B. BOBEE and L. PERREAULT 381

Seasonality of Flows and its Effect on Reservoir Size


R. M. PHATARFOD and R. SRIKANTHAN 395

Estimation of the Hurst Exponent h and Geos Diagrams


for a Non-Stationary Stochastic Process
G. POVEDA and O. J. MESA 409

Optimal Parameter Estimation of Conceptually-Based Streamflow Models


by Time Series Aggregation
P. CLAPS and F. MURRONE 421

On Identification of Cascade Systems by Nonparametric Techniques


with Applications to Pollution Spread Modeling in River Systems
A. KRZYZAK 435

Patching Monthly Streamflow Data


- A Case Study Using the EM Algorithm and Kalman Filtering
G. G. S. PEGRAM 449

Runoff Analysis by the Quasi Channel Network Model


in the Toyohira River Basin
H. SAGA, T. NISHIMURA and M. FUJITA 459

Author Index 469

Subject Index 471


PREFACE

Objectives

To understand how hydrological and environmental systems behave dynamically,
scientists and engineers take measurements over time. In time series modelling and
analysis, time series models are fitted to one or more sequences of observations
describing the system for purposes such as environmental impact assessment,
forecasting, simulation and reservoir operation. When applied to a natural system, time
series modelling furnishes an enhanced appreciation about how the system functions,
especially one that is heavily affected by land use changes. This in turn means that
better decisions can ultimately be made so that human beings can properly manage
their activities in order to live in harmony with their natural environment. The major
objective of this edited volume is to present some of the latest and most promising
approaches to time series analysis as practiced in hydrology and environmental
engineering.

Contents

As listed in the Table of Contents, the book is divided into the following main parts:

PART I CLIMATIC CHANGE


PART II FORECASTING
PART III ENTROPY
PART IV NEURAL NETWORKS
PART V TREND ASSESSMENT
PART VI SPATIAL ANALYSIS
PART VII SPECTRAL ANALYSIS
PART VIII TOPICS IN STREAMFLOW MODELLING

An important topic of widespread public concern in which time series analysis has a
crucial role to play is the systematic study of climatic change. In Part I, significant
contributions to climatic change are described in an interesting set of papers. For
instance, the first paper in this part is a keynote paper by Dr. D. P. Lettenmaier
that focuses upon time series or stochastic models of precipitation that account for
climatic driving variables. These models furnish a mechanism for transcending the
spatial scales between general circulation models and the much smaller spatial scale
at which water resources effects have to be studied and interpreted.

The contributions contained in Part II provide useful results in hydrological
forecasting. A range of intriguing applications in hydrological forecasting is given for
case studies involving reservoir operation in British Columbia, Canada, Guangdong
Province in China, Taiwan, the Canadian Province of Ontario, and Brazil.

Within Part III, new developments in entropy are described and entropy concepts
are applied to problems in hydraulics, water quality monitoring, discontinuance of
hydrologic measurement stations, treatment plant efficiency and estimating missing
monthly streamflow data. Neural networks are employed in Part IV for forecasting
runoff and water demand.

Trend assessment techniques have widespread applicability to environmental impact


assessment studies. In Part V, a number of trend assessment techniques are evaluated
and graphical, nonparametric and parametric trend methods are applied to water
quality data.

In Part VI, nonparametric and parametric approaches to spatial analysis are described
and applied to practical hydrological problems. Next, some unique findings in spectral
analysis are given in Part VII. Finally, Part VIII is concerned with a variety of
interesting topics in streamflow modelling.

Audience

This book should be of direct interest to anyone who is concerned with the latest
developments in time series modelling and analysis. Accordingly, the types of
professionals who may wish to use this book include:

Water Resources Engineers


Environmental Scientists
Civil Engineers
Earth Scientists
Hydrologists
Geographers
Planners
Statisticians
Systems Engineers
Management Scientists

Within each professional group, the book should provide useful information for:

Researchers
Teachers
Students
Practitioners and Consultants

When utilized for teaching purposes, the book could serve as a complementary text
at the upper undergraduate and graduate levels. The recent environmetrics book
by K. W. Hipel and A. I. McLeod entitled Time Series Modelling of Water
Resources and Environmental Systems (published by Elsevier, Amsterdam, 1994,
ISBN 0 444 89270-2) contains an extensive list of time series analysis books (see
Section 1.6.3) that could be used in combination with this current volume in university
courses. Researchers should obtain guidance and background material for carrying
out worthwhile research projects in time series analysis in hydrology and environmental
engineering. Consultants who wish to keep their companies at the leading edge
of activities in time series analysis and thereby serve their clients in the best possible
ways will find this book to be an indispensable resource.
AN INTERNATIONAL CELEBRATION

Dedication

The papers contained in this book were originally presented at the international
conference on Stochastic and Statistical Methods in Hydrology and Envi-
ronmental Engineering that took place at the University of Waterloo, Waterloo,
Ontario, Canada, from June 21 to 23, 1993. This international gathering was held in
honour and memory of the late Professor T.E. Unny in order to celebrate his
life-long accomplishments in many of the important environmental topics falling within
the overall conference theme. When he passed away in late December, 1991, Professor
T.E. Unny was Professor of Systems Design Engineering at the University of Water-
loo and Editor-in-Chief of the international journal entitled Stochastic Hydrology and
Hydraulics.

About 250 scientists from around the world attended the Waterloo conference in
June, 1993. At the conference, each participant was given a Pre-Conference
Proceedings, published by the University of Waterloo and edited by K.W. Hipel. This
584 page volume contains the detailed conference program as well as the refereed
extended abstracts for the 234 papers presented at the conference. Subsequent to the
conference, full length papers submitted for publication by presenters were mailed
to international experts who kindly carried out thorough reviews. Accepted papers
were returned to authors for revisions and the final manuscripts were then published
by Kluwer according to topics in the following four volumes:

STOCHASTIC AND STATISTICAL MODELLING WITH


GROUNDWATER AND SURFACE WATER APPLICATIONS
edited by
Keith W. Hipel
EFFECTIVE ENVIRONMENTAL MANAGEMENT FOR
SUSTAINABLE DEVELOPMENT
edited by
Keith W. Hipel and Liping Fang
EXTREME VALUES: FLOODS AND DROUGHTS
edited by
Keith W. Hipel
as well as the current book on:

TIME SERIES ANALYSIS IN HYDROLOGY AND


ENVIRONMENTAL ENGINEERING
edited by
Keith W. Hipel, A. Ian McLeod, U. S. Panu and Vijay P. Singh


The Editors of the volumes as well as Professor Unny's many friends and colleagues
from around the globe who wrote excellent research papers for publication in these
four volumes, would like to dedicate their work as a lasting memorial to Professor
T. E. Unny. In addition to his intellectual accomplishments, Professor Unny will be
fondly remembered for his warmth, humour and thoughtful consideration of others.
Conference Organization and Sponsorships
The many colleagues and sponsors who took part in the planning and execution of the
international conference on Stochastic and Statistical Methods in Hydrology
and Environmental Engineering are given below.

Organizing Committee
K. W. Hipel (Chairman) A. I. McLeod
U. S. Panu V. P. Singh

International Programme Committee


S. Al-Nassri (Malaysia) Z. Kundzewicz (Poland)
H. Bergmann (Austria) Gwo-Fong Lin (Taiwan)
J. Bernier (France) C. Lemarechal (France)
B. Bobee (Canada) I. Logan (Canada)
B. Bodo (Canada) D. P. Loucks (U.S.A.)
D. S. Bowles (U.S.A.) I. B. MacNeill (Canada)
W. P. Budgell (Norway) A. Musy (Switzerland)
S. J. Burges (U.S.A.) P. Nachtnebel (Austria)
F. Camacho (Canada) D. J. Noakes (Canada)
S. Chandra (India) N. Okada (Japan)
C-L. Chiu (U.S.A.) R. M. Phatarfod (Australia)
J. Ding (China) V. Privalsky (U.S.S.R.)
L. Duckstein (U.S.A.) D. Rosbjerg (Denmark)
A. H. El-Shaarawi (Canada) J. D. Salas (U.S.A)
M. Fiorentino (Italy) G. A. Schultz (Germany)
E. Foufoula (U.S.A.) S. Serrano (U.S.A.)
I. C. Goulter (Australia) U. Shamir (Israel)
Y. Y. Haimes (U.S.A.) S. P. Simonovic (Canada)
N. Harmancioglu (Turkey) S. Sorooshian (U.S.A.)
S. Ikebuchi (Japan) A. Szollosi-Nagy (France)
Karmeshu (India) C. Thirriot (France)
M. L. Kavvas (U.S.A.) W. E. Watt (Canada)
J. Kelman (Brazil) S. J. Yakowitz (U.S.A.)
J. Kindler (Poland) V. Yevjevich (U.S.A.)
G. Kite (Canada) Y. C. Zhang (China)
T. Kojiri (Japan) P. Zielinski (Canada)
R. Krzysztofowicz (U.S.A.)

University of Waterloo Committee

A. Bogobowicz T. Hollands
S. Brown J. D. Kalbfleisch
D. Burns E. LeDrew
C. Dufournaud E. A. McBean
L. Fang K. Ponnambalam
G. Farquhar E. Sudicky

Financial Support

Conestoga/Rovers and Associates


Cumming Cockburn Limited
Department of Systems Design Engineering, University of Waterloo
Faculty of Engineering, University of Waterloo
Natural Sciences and Engineering Research Council (NSERC) of Canada

Sponsors

American Geophysical Union


American Water Resources Association
Association of State Floodplain Managers
Canadian Society for Civil Engineering
Canadian Society for Hydrological Sciences
IEEE Systems, Man and Cybernetics Society
Instituto Panamericano de Geografia e Historia
International Association for Hydraulic Research
International Association of Hydrological Sciences
International Commission of Theoretical and Applied Limnology
International Commission on Irrigation and Drainage
International Institute for Applied Systems Analysis
International Statistical Institute
International Water Resources Association
Lakehead University
Louisiana State University
North American Lake Management Society
The International Environmetrics Society
The Pattern Recognition Society
The University of Western Ontario
University of Waterloo

University of Waterloo
President James Downey, Opening and Banquet Addresses
D. Bartholomew, Graphic Services
Danny Lee, Catering and Bar Services Manager
D. E. Reynolds, Manager, Village 2 Conference Centre
T. Schmidt, Engineering Photographic
Audio Visual Centre
Food Services
Graduate Students in Systems Design Engineering

Technical Assistance
Mrs. Sharon Bolender
Mr. Steve Fletcher
Mr. Kei Fukuyama
Ms. Hong Gao
Ms. Wendy Stoneman
Mr. Roy Unny
ACKNOWLEDGEMENTS

The Editors would like to sincerely thank the authors for writing such excellent papers
for publication in this as well as the other three volumes. The thoughtful reviews
of the many anonymous referees are also gratefully acknowledged. Moreover, the
Editors appreciate the fine contributions by everyone who attended the Waterloo
conference in June, 1993, and actively took part in the many interesting discussions
at the paper presentations.

Additionally, the Editors would like to say merci beaucoup to the committee members
and sponsors of the Waterloo conference listed in the previous section. Dr. Roman
Krzysztofowicz, University of Virginia, and Dr. Sidney Yakowitz, University of Arizona,
kindly assisted in organizing interesting sessions at the Waterloo conference
for papers contained in this volume. Furthermore, Dr. R. M. Phatarfod, Monash
University in Australia, and Dr. K. Ponnambalam, University of Waterloo, were
particularly helpful in suggesting reviewers as well as carrying out reviews for papers
published in this book. Finally, they sincerely appreciate all the thoughtful personnel
at Kluwer who assisted in the publication of the volumes, especially Dr. Petra D.
Van Steenbergen, the Acquisition Editor.

Keith W. Hipel
Professor and Chair
Department of Systems Design Engineering
Cross Appointed Professor to Department of Statistics and Actuarial Science
University of Waterloo

A. Ian McLeod
Professor
Department of Statistical and Actuarial Sciences
The University of Western Ontario
Adjunct Professor, Department of Systems Design Engineering
University of Waterloo

U.S. Panu
Professor
Department of Civil Engineering
Lakehead University

Vijay P. Singh
Professor
Department of Civil Engineering
Louisiana State University

April, 1994

PART I

CLIMATIC CHANGE
APPLICATIONS OF STOCHASTIC MODELING IN CLIMATE CHANGE
IMPACT ASSESSMENT

DENNIS P. LETTENMAIER
Department of Civil Engineering FX-10
University of Washington
Seattle, WA 98195

The development of stochastic models of precipitation has been driven primarily


by practical problems of hydrologic data simulation, particularly for water
resource systems design and management in data-scarce situations, and by
scientific interest in the probabilistic structure of the arrival process of
precipitation events. The need for better methods of developing local climate
scenarios associated with alternative climate simulations produced by global
atmospheric general circulation models (GCMs) has provided another application
for stochastic models of precipitation, but necessitates different model structures.
Early attempts to model the stochastic structure of the precipitation arrival
process are reviewed briefly. These include first order homogeneous Markov
chains, as well as more advanced point process models designed to represent the
clustering of precipitation events often recorded in daily and
shorter time scale observation series. The primary focus of this paper, however, is
stochastic models of precipitation that account for climatic driving variables.
Such models provide a means of transcending the spatial scales between GCMs
and the much smaller spatial scale at which water resources effects need to be
interpreted. The models reviewed generally make use of two types of information.
The first is a set of atmospheric variables measured over the GCM grid mesh with
node spacing of several degrees latitude by longitude. The second is a set of
concurrent point precipitation observations, at several locations within the large-
scale grid mesh, observed at the same time frequency (usually one day or less) as
the large-scale atmospheric variables. A variety of methods of summarizing the
atmospheric variables via subjective and objective weather typing procedures are
reviewed, as are various approaches for stochastically coupling the large-scale
atmospheric indicator variables with the precipitation arrival and amounts
process.

INTRODUCTION
Stochastic models of the precipitation arrival process were originally developed to
address practical problems of data simulation, particularly for water resource
systems design and management in data-scarce situations, and to aid in
understanding the probabilistic structure of precipitation. Early attempts to
model the stochastic structure of the precipitation arrival process (wet/dry
occurrences) were based on first-order homogeneous Markov chains (e.g., Gabriel
and Neumann 1957; 1962). Various extensions of Markov models have since been
explored to accommodate inhomogeneity (such as seasonality) of the transition


probabilities (e.g., Weiss, 1964; Woolhiser and Pegram, 1979; Stern and Coe,
1984) and to incorporate precipitation amounts (e.g., Khanal and Hamrick, 1974;
Haan, et al., 1976). Markov models have fallen from favor, however, because they
are unable to reproduce the long-term persistence of wet and dry spells and the
clustering observed in rainfall occurrence series at daily or shorter time intervals
(Foufoula-Georgiou, 1985). Since that time, more advanced point process models
have been developed, such as those of Kavvas and Delleur (1981), Rodriguez-
Iturbe, et al. (1987), Foufoula-Georgiou and Lettenmaier (1987), Smith and Karr
(1985), and others. Most of this recent work, which is reviewed in Georgakakos
and Kavvas (1987), is based on point process theory (e.g., LeCam, 1961). The
Markov chain and point process models are similar to the extent that they are
restricted to single-station applications, and are not easily generalizable to
multiple station applications, at least without (in the case of Markov chain
models) explosive growth in the number of parameters. In addition, all of the
above models describe the precipitation process unconditionally, that is, they do
not incorporate cause-effect information, such as descriptors of large-scale
meteorological conditions that might give rise to wet, or dry, conditions.
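
As a concrete illustration of the first-order homogeneous Markov chain occurrence model (a minimal sketch, not code from any of the papers cited; the transition probabilities are hypothetical values chosen for illustration):

import numpy as np

# First-order homogeneous Markov chain for daily wet/dry occurrences,
# in the spirit of Gabriel and Neumann (1957).
P_WET_GIVEN_DRY = 0.25   # P(wet today | dry yesterday), hypothetical
P_WET_GIVEN_WET = 0.65   # P(wet today | wet yesterday), hypothetical

def simulate_occurrences(n_days, seed=0):
    """Return a 0/1 (dry/wet) occurrence series of length n_days."""
    rng = np.random.default_rng(seed)
    state, series = 0, np.empty(n_days, dtype=int)
    for t in range(n_days):
        p_wet = P_WET_GIVEN_WET if state == 1 else P_WET_GIVEN_DRY
        state = int(rng.random() < p_wet)
        series[t] = state
    return series

occ = simulate_occurrences(3650)
print("fraction of wet days:", occ.mean())

Because such a chain produces geometrically distributed wet- and dry-spell lengths, a simulation of this kind cannot reproduce the clustering and long-term persistence noted above, which is precisely the limitation that motivated the point process models.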
Recent interest in assessments of the hydrologic effects of climate change has
placed different demands on stochastic precipitation models. Much of the concern
about global warming has been based on simulations of climate produced by
global general circulation models of the atmosphere (GCMs). These models
operate at spatial scales of several degrees latitude by several degrees longitude,
and time steps usually from several minutes to several tens of minutes. The
models are fully self-consistent with respect to the energy and water budgets of
the atmosphere, and therefore produce predictions of both free atmosphere
variables (e.g., vertical profiles of atmospheric pressure, temperature, wind, and
liquid and vapor phase moisture) as well as surface fluxes (precipitation, latent
and sensible heat, short and long-wave radiation, ground heat flux). In principle,
the surface fluxes could be used directly to drive hydrologic models which could
serve to disaggregate the GCM surface fluxes spatially to predict, for instance,
streamflow. However, this approach is not at present feasible for two reasons.
First, the scale mismatch between the GCM grid mesh and the catchment scale
(typically 10^2-10^4 km^2) that is of interest for effects studies presents formidable
obstacles. Second, GCM surface flux predictions are notoriously poor at scales
much less than continental. Figure 1 shows, as an example, long-term average
rainfall predicted by the CSIRO GCM (Pittock and Salinger, 1991) for present
climate and CO2 doubling for a grid cell in southeastern Australia as compared
with an average of several long-term precipitation gauges located in the grid cell.
Although the seasonal pattern (winter-dominant precipitation) is the same in the
observations and model predictions, the model underpredicts the annual
precipitation by a factor of about two. Such differences in GCM predictions of
precipitation are not atypical (see, for instance, Grotch, 1988), and in fact the
predictions shown in Figure 1 might in some respects be considered a "success"
because the seasonal pattern is correctly predicted by the GCM. These results do
highlight one of the dangers in attempting to disaggregate GCM output directly:
the signal (difference between 2 x CO2 and 1 x CO2 climates) is considerably less
than the bias (difference between 1 x CO2 and historical climates).
Giorgi and Mearns (1991) review what they term "semi-empirical approaches"
to the simulation of regional climate change. These are essentially stochastic
models, which rely on the fact that the GCM predictions of free atmosphere
variables are usually better than those of surface fluxes. They therefore attempt
to relate local

[Figure 1 appears here: monthly average precipitation (mm) versus month (J-D), comparing 1 x CO2, 2 x CO2, and historical station data.]

Figure 1. Comparison of monthly average precipitation simulated by CSIRO


GCM (Pittock and Salinger, 1991) for a southeastern Australia grid cell with
historical average of station data in the grid cell.

(e.g., catchment-scale) surface variables (especially precipitation) to GCM free


atmosphere variables. Among these methods are Model Output Statistics (MOS)
routines, which are essentially regressions that adjust numerical weather
predictions (produced on grid meshes smaller than those used for GCM climate
simulations, but still "large" compared to the scale required for local assessments).
MOS adjustments to precipitation are used, for instance, in quantitative
precipitation forecasts (QPFs) for flood forecasting. One drawback of these
routines is that they attempt to produce "best estimates" in the least squares
sense. While this may be appropriate for forecasting, least squares estimates will
usually underestimate the natural variability, which is a critical deficiency for
climate effects assessments.
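
The variance-damping effect of least squares can be seen from a simple linear regression (an illustrative aside, not taken from the paper): if local precipitation $Y$ is regressed on a large-scale predictor $X$ with correlation $\rho$, then

$$\hat{Y} = \bar{Y} + \rho \frac{\sigma_Y}{\sigma_X}\,(X - \bar{X}), \qquad \operatorname{Var}(\hat{Y}) = \rho^2 \sigma_Y^2 \le \sigma_Y^2,$$

so a regression with $\rho = 0.5$ reproduces only a quarter of the observed local variance unless stochastic noise is added back.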
Other semi-empirical approaches have been developed to relate longer term
GCM simulations of free atmosphere variables to local precipitation. Among
these are the canonical correlation approach of von Storch, et al. (1993), and the
regression approach of Wigley, et al. (1990). The disadvantage of these
approaches is that the resulting local variable predictions are at a time scale much
longer than the catchment response scale, hence there is no practical way to
incorporate the predictions within a rainfall-runoff modeling framework from
which water resources effects interpretations might be made. This difficulty could
presumably be resolved by using shorter time steps (e.g., daily rather than
monthly, which would be more appropriate for hydrologic purposes). However, in


the case of the regression approach of Wigley, et al. (1990), the input variables for
local precipitation predictions include large-scale precipitation, which is
responsible for much of the predictive accuracy. However, in their analysis,
Wigley, et al. (1990) used the mean of station data for the large-scale
precipitation. Unfortunately, as shown by Figure 1, GCM precipitation
predictions are often badly biased, and this bias would be transmitted to the local
predictions. Nonetheless, the considerable experience that has been developed
over the last thirty years in developing local meteorological forecasts has largely
been unexploited for local climate simulation. There is sufficient similarity
between the two problems that investigation of extensions of methods such as
MOS to hydrological simulation may prove fruitful.

STOCHASTIC PRECIPITATION MODELS WITH EXTERNAL FORCING


Several investigators have recently explored stochastic precipitation models that
operate at the event scale (defined here as daily or shorter) and incorporate,
explicitly or implicitly, external large-area atmospheric variables. The motivation
for development of these methods has been, in part, to provide stochastic
sequences that could serve as input to hydrologic (e.g., precipitation-runoff)
models. Most of the work in this area has utilized, either directly or via summary
measures, large-scale free atmosphere variables rather than large-area surface
fluxes. In this respect, their objective has been to simulate stochastically realistic
precipitation sequences that incorporate large area information as external drivers.
This approach is fundamentally different than disaggregation methods, such as
MOS, which attempt to relate large-scale predictions directly to smaller scales.

Weather classification schemes

Weather classification schemes have been the mechanism used by several authors
to summarize large-area meteorological information. The general concept of
weather classification schemes (see, for example, Kalkstein, et al., 1987) is to
characterize large-area atmospheric conditions by a single summary index.
Externally forced stochastic precipitation models can be grouped according to
whether the weather classification scheme is subjective or objective, and whether
it is unconditional or conditional on the local conditions (e.g., precipitation
occurrence).
Subjective classification procedures include the scheme of Baur (1944), from
which a daily sequence of weather classes dating from 1881 to present has been
constructed by the German Federal Weather Service (Bardossy and Caspary,
1990), and the scheme of Lamb (1972), which has formed the basis for
construction of a daily sequence of weather classes for the British Isles dating to
1861. These subjective schemes are primarily based on large-scale features in the
surface pressure distribution, such as the location of semipermanent pressure
centers, the position and paths of frontal zones, and the existence of cyclonic and
anticyclonic circulation types (Bardossy and Caspary, 1990).
Objective classification procedures utilize statistical methods, such as principal
components, cluster analysis, and other multivariate methods to develop rules for
classification of multivariate spatial data. For instance, McCabe, et al. (1990)
utilized a combination of principal components and cluster analysis to form daily
weather classes at Philadelphia. The statistical model was compared to a


subjective, conceptual model, which was found to give similar results. Wilson, et
al. (1992) explored classification methods based on K-means cluster analysis, fuzzy
cluster analysis, and principal components for daily classification of weather over a
large area of the Pacific Northwest. Both of the above methods are unconditional
on local conditions, that is, no attempt is made to classify the days in such a way
that local precipitation, for instance, is well-described by the weather classes.
Figure 2, taken from Wilson, et al. (1992) shows that for one of the precipitation
stations considered, the unconditional classification scheme used (principal
components of surface pressure, geopotential heights at 850 and 700 mb, and the
east-west wind component at 850 mb, over a widely spaced grid mesh) resulted in
good discrimination of precipitation for only one (winter) of the four seasons
considered.

[Figure 2 appears here: four panels, one per three-month season, plotting precipitation (mm) against exceedance probability (percent) for weather classes 1-4.]

Figure 2. Cumulative distribution of precipitation by three-month seasons


(JFM, AMJ, JAS, OND) and weather class for Stampede Pass, WA (from
Wilson, et al., 1992).
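
A minimal sketch of such an objective, unconditional classification (hypothetical data and parameter choices; scikit-learn is used here as a stand-in for the tools employed in the studies above):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Objective weather classification: leading principal components of
# gridded atmospheric anomaly fields, clustered into daily classes.
# 'fields' stands in for standardized daily anomalies of, e.g., sea
# level pressure flattened to (n_days, n_gridpoints); random numbers
# are used here only so the sketch runs.
rng = np.random.default_rng(1)
fields = rng.standard_normal((1000, 120))        # hypothetical input

pcs = PCA(n_components=5).fit_transform(fields)  # retain leading PCs
classes = KMeans(n_clusters=4, n_init=10,
                 random_state=1).fit_predict(pcs)
print("daily weather classes:", classes[:10])

Because no precipitation information enters the clustering, there is no guarantee that the resulting classes discriminate local precipitation well, which is the behaviour seen in Figure 2.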

Hughes, et al. (1993) used an alternative approach that selected the weather
classes so as to maximize the discrimination of local precipitation, in terms of
joint precipitation occurrences (presence/absence of precipitation at four widely
separated stations throughout a region of dimensions about 1000 km). The
procedure used was CART (Breiman, et al., 1984), or Classification and
Regression Trees. The large area information was principal components of sea
level pressure. Figure 3 shows the discrimination of the daily precipitation
distribution at one of the stations modeled,

[Figure 3 appears here: precipitation plotted against exceedance probability (percent) for weather states 1-6.]

Figure 3. Cumulative distributions of precipitation by weather state for Forks,


WA in winter, using CART weather classification procedure of Hughes, et al.
(1993).

Forks, according to weather class. As expected, because the classification scheme


explicitly attempts to "separate" the precipitation (albeit occurrence/absence
rather than amount) by the selected classes, the resulting precipitation
distributions are more distinguishable than those obtained by Wilson, et al.
(1992). Hughes, et al. (1993) also simulated daily temperature minima and
maxima. For this purpose, they used a Markov model conditioned on the present
and previous days' rain state.
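
A rough sketch of the CART-based, conditional classification idea (synthetic placeholder data; scikit-learn's decision tree is used here as a stand-in CART implementation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Grow a classification tree that predicts the joint rain state at a
# few index stations from large-scale predictors; the tree's leaves
# then serve as weather states, in the spirit of Hughes, et al. (1993).
rng = np.random.default_rng(2)
n_days = 2000
pcs = rng.standard_normal((n_days, 3))       # PCs of sea level pressure
rain = rng.random((n_days, 4)) < 0.4         # occurrence at 4 index stations
joint_state = rain.astype(int) @ (1 << np.arange(4))  # 4-station pattern as 0..15

tree = DecisionTreeClassifier(max_leaf_nodes=6, random_state=0)
tree.fit(pcs, joint_state)
weather_state = tree.apply(pcs)              # leaf index = weather state
print("distinct weather states:", np.unique(weather_state))

Because the target is the joint occurrence pattern itself, the selected classes separate local precipitation by construction, which is why the distributions in Figure 3 are more distinguishable than those of Figure 2.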
A final method of weather class identification is implicit.
Guttorp (1991) describe the application of a set of models known as hidden
Markov to precipitation occurrences. The objective of their study was to model
the (unconditional) structure of the precipitation arrival process. The properties
of the hidden states, which could be (although do not necessarily need to be)
interpreted as weather states, were not explicitly evaluated. Hughes, et al. (1993)
explored a larger class of nonhomogeneous hidden Markov models (NHMM), of
which the model of Zucchini and Guttorp is a special case. They explored models of
the precipitation occurrence process in which the atmospheric states were explicit,
but were inferred by the NHMM estimation procedure. In this model, therefore,
the weather state and stochastic precipitation structure are completely integrated.
For this reason, further comments on the NHMM model are deferred to the next
section.

Conditional stochastic precipitation models

Hay, et al. (1991) used a classification method based on wind direction and cloud
cover (McCabe, 1990) which was coupled with a semi-Markov model to simulate
temporal sequences of weather types at Philadelphia. Semi-Markov models (Cox
and Lewis, 1978) with seasonal transition probabilities and parameters of the
sojourn time distribution were used to simulate the evolution of the weather
states. This step is not strictly necessary if a lengthy sequence of variables
defining the large-area weather states (the classification used required daily wind
direction and cloud cover data) is available. Where such sequences are not
available, which is sometimes the case for GCM simulations, fitting a stochastic
model to the weather states has the advantage that it decouples the simulation of
precipitation, and other local variables, from a particular GCM simulation
sequence. The method of simulating daily precipitation conditional on the
weather state used by Hay, et al. (1991) was as follows. For each weather state
and each of 11 weather stations in the region, the unconditional probability of
precipitation was estimated from the historic record. Then, conditional on the
weather state (but unconditional on precipitation occurrence and amount at the
other stations and previous time) the precipitation state was selected based on the
unconditional precipitation occurrence probability. Precipitation amounts were
drawn from the product of a uniform and exponential distribution. Retrospective
analysis of the model showed that those variables explicitly utilized for parameter
estimation (conditional precipitation occurrence probabilities, mean precipitation
amounts) were reproduced by the model. An analysis of dry period lengths
suggested that the length of extreme dry periods was somewhat underestimated.
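
The conditional simulation step can be sketched as follows (hypothetical occurrence probabilities and scale; the uniform-times-exponential amounts model is as described above):

import numpy as np

# Given a daily weather-state sequence, draw precipitation occurrence
# from a state-specific probability and, on wet days, draw the amount
# as the product of a uniform and an exponential variate.
rng = np.random.default_rng(3)
p_wet_by_state = {0: 0.15, 1: 0.45, 2: 0.70}   # hypothetical P(wet | state)
mean_amount = 8.0                               # hypothetical exponential scale (mm)

states = rng.integers(0, 3, size=365)           # stand-in weather-state sequence
p_wet = np.array([p_wet_by_state[int(s)] for s in states])
wet = rng.random(365) < p_wet
amounts = np.where(wet,
                   rng.random(365) * rng.exponential(mean_amount, 365),
                   0.0)
print("wet days:", int(wet.sum()), "total precip (mm):", round(float(amounts.sum()), 1))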
Bardossy and Plate (1991) also used a semi-Markov model to describe the
structure of the daily circulation patterns over Europe, with circulation types
based on synoptic classification. They developed a model of the corresponding
rainfall occurrence process that was Markovian within a weather state (circulation
type), but independent when the weather state changed. Precipitation
occurrences were assumed spatially independent. Bardossy and Plate (1991)
applied the model to simulate the precipitation occurrences at Essen, Germany.
For this station, they found that the persistence parameter in the occurrence
model was quite small, so that the model was almost conditionally independent
(that is, virtually all of the persistence in the rainfall occurrence process was due
to persistence in the weather states). The model reproduced the autocorrelations
of the rainfall occurrences, as well as the distributions of dry and wet days,
reasonably well. This is somewhat surprising, since other investigators (e.g.,
Hughes, et al., 1993) have found that conditionally independent models tend to
underestimate the tail of the dry period duration distribution. However, this
finding is likely to depend on both the structure of the weather state process, and
the precipitation occurrence process, which is regionally and site-specific.
Bardossy and Plate (1992) extended the model of Bardossy and Plate (1991) to
incorporate spatial persistence in the rainfall occurrences, and to model
precipitation amounts explicitly. The weather state classification procedure was
the same as in Bardossy and Plate (1991), and they retained the assumption of
conditional independence under changes in the weather state. Rather than
modeling the occurrence process explicitly, they modeled a multivariate normal
random variable W. Negative values of W corresponded to the dry state, and (a
transform of) positive values are the precipitation amount. Within a run of a
weather state, W was assumed to be lag-one Markov. Spatial correlation in the


occurrence process, and in the precipitation amounts, was modeled via the first
two moments of W, which were weather state-dependent. The model was applied
to 44 stations in the Ruhr River basin. The model was able to reproduce the first
two unconditional moments of rainfall amounts, and precipitation probabilities, as
well as the dry day durations, reasonably well at one of the stations (Essen, also
used in the 1991 paper) selected for more detailed analysis.
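
A minimal sketch of this latent-normal construction, for one run of a single weather state (all parameter values are hypothetical; in the real model they are weather state-dependent):

import numpy as np

# Within a run of one weather state, a multivariate normal variable W
# evolves as a lag-one Markov (AR(1)) process; W <= 0 means dry, and a
# power transform of positive W gives the precipitation amount.
rng = np.random.default_rng(4)
n_days, n_stations = 30, 2
mu = np.array([-0.2, -0.2])                 # mean of W under this state
cov = np.array([[1.0, 0.6], [0.6, 1.0]])    # spatial covariance of W
phi, beta = 0.5, 1.5                        # AR(1) coefficient, transform power

w = mu + np.linalg.cholesky(cov) @ rng.standard_normal(n_stations)  # stationary start
L_inn = np.linalg.cholesky(cov * (1 - phi**2))   # innovation covariance for AR(1)
precip = np.zeros((n_days, n_stations))
for t in range(n_days):
    w = mu + phi * (w - mu) + L_inn @ rng.standard_normal(n_stations)
    precip[t] = np.maximum(w, 0.0) ** beta  # dry if W <= 0, transformed amount if wet
print("wet fractions by station:", (precip > 0).mean(axis=0))

Setting the innovation covariance to (1 - phi^2) times the target covariance keeps the stationary covariance of W equal to cov, so spatial correlation in occurrences and amounts is controlled by the first two moments of W, as in the paper's description.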
Wilson, et al. (1991) developed a weather classification scheme for the Pacific
Northwest based on cluster analysis of surface pressure and 850 mb temperature
over a 10 degree by 10 degree grid mesh located over the North Pacific and the
western coast of North America. A ten-year sequence of weather states (1975-84)
was formed, and was further classified according to whether or not precipitation
occurred at a station of interest. The partitioned weather state vector was then
modeled as a semi-Markov process. For wet states, precipitation amounts were
simulated using a mixed exponential model. The wet and dry period lengths were
simulated quite well, although some of the weather state frequencies were
misestimated, especially in summer. The authors suggested that a heavier tailed
distribution than the geometric, used for the lengths-of-stay in the semi-Markov
model, might give better performance. The model is somewhat limited in that its
generalization to multiple stations results in rapid growth in the number of
parameters.
Wilson, et al. (1992) explored a slightly different multiple station model, based
on a Polya urn structure. Rather than explicitly incorporating the wet-dry state
with the weather state, they developed a hierarchical modified model for the
rainfall state conditioned on the weather state and the wet-dry state of the higher
order station(s). In a Polya urn, the wet-dry state is obtained by drawing from a
sample, initially of size N + M, a state, of which N are initially wet, and M are
initially dry. For each wet state drawn, the sample of wet states is increased by
n. Likewise, for each dry state drawn, the sample of dry states is increased by m,
and in both cases the original state drawn is "replaced". Thus, the Polya urn has
more persistence than a binomial process, in which the state drawn would simply
be replaced, and the probability of the wet or dry state is independent of the dry
or wet period length. The modification to the Polya urn (employed by others as
well, e.g., Wiser, 1965) is to replace the persistent process with a binomial process
once a given run (wet or dry period) length w has been reached. In addition, the
parameters of the model (N, n, M, m, and w) are conditioned on the weather state
and the wet-dry status of the higher stations in the hierarchy, but the memory is
"lost" when the state (combination of weather state and wet-dry status of higher
stations in the hierarchy) changes. The model was applied to three precipitation
stations in the state of Washington, using a principal components-based weather
classification scheme for a region similar to that used by Wilson, et al. (1991).
The precipitation amounts were reproduced reasonably well, especially for the
seasons with the most precipitation. The dry and wet period lengths were also
modeled reasonably well, although there was a persistent downward bias,
especially for the lowest stations in the hierarchy. The major drawback of this
model is that the number of parameters grows rapidly (power of two) with the
number of stations. Also, the model performs best for the highest stations in the
hierarchy, but there may not be an obvious way of determining the ordering of
stations.
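
One plausible reading of the modified urn, sketched for a single station under a fixed conditioning state (all counts and increments hypothetical):

import numpy as np

# Modified Polya urn for wet/dry occurrences: each draw reinforces the
# state drawn (with replacement), giving persistence, until the current
# run length reaches w_max, after which draws revert to binomial.
rng = np.random.default_rng(5)
N0, M0 = 3, 7            # initial wet and dry counts
n_inc, m_inc = 2, 2      # reinforcement added per wet / dry draw
w_max = 10               # run length at which memory is capped

n_wet, n_dry = N0, M0
states, run_len, last = [], 0, None
for _ in range(365):
    if run_len >= w_max:
        p = N0 / (N0 + M0)              # binomial once the run exceeds w_max
    else:
        p = n_wet / (n_wet + n_dry)     # urn probability of a wet draw
    s = int(rng.random() < p)
    if s == 1:
        n_wet += n_inc                  # a wet draw adds n wet states
    else:
        n_dry += m_inc                  # a dry draw adds m dry states
    run_len = run_len + 1 if s == last else 1
    last = s
    states.append(s)
print("wet fraction:", np.mean(states))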
All of the above models define the weather states externally, that is, the
selection of the weather states does not utilize station information. Hughes, et al.
(1993) linked the selection of the weather states with observed precipitation
occurrence information at a set of index stations using the CART procedure
described above. Precipitation occurrences and amounts were initially modeled
assuming conditional independence, by simply resampling at random from the
historical observations of precipitation at a set of target stations, given the
weather states. They found that this model tended to underestimate the
persistence of wet and dry periods. Model performance was improved by
resampling precipitation amounts conditional on the present day's weather state
and the previous day's rain state. Unlike the models of Bardossy and Plate (1991;
1992) the Markovian persistence was retained regardless of shifts in the weather
state. Inclusion of the previous rain state reduced the problem with simulation of
wet and dry period persistence. However, by incorporating information about the
previous day's rain state the number of parameters grows rapidly with the number
of stations.
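
A sketch of the improved resampling scheme (the historical record below is a synthetic stand-in; the conditioning on weather state and previous rain state follows the description above):

import numpy as np

# For each simulation day, draw a historical day at random from the
# subset of days that share the current weather state and the previous
# day's rain state, and copy its observed precipitation.
rng = np.random.default_rng(6)
n_hist = 5000
hist_state = rng.integers(0, 4, n_hist)            # stand-in weather states
hist_precip = np.where(rng.random(n_hist) < 0.4,
                       rng.exponential(6.0, n_hist), 0.0)
hist_prev_wet = np.roll(hist_precip > 0, 1)        # previous day's rain state

def simulate(sim_states):
    precip, prev_wet = [], False
    for s in sim_states:
        pool = np.flatnonzero((hist_state == s) & (hist_prev_wet == prev_wet))
        amount = hist_precip[rng.choice(pool)]     # resample a matching day
        precip.append(amount)
        prev_wet = amount > 0
    return np.array(precip)

out = simulate(rng.integers(0, 4, 365))
print("simulated wet fraction:", float((out > 0).mean()))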
A somewhat different approach is the hidden Markov model (HMM), initially
investigated for modeling rainfall occurrences by Zucchini and Guttorp (1991).
The hidden Markov model is of the form

$$P(R_t \mid S_1^T, R_1^{t-1}) = P(R_t \mid S_t) \qquad (1a)$$

$$P(S_t \mid S_1^{t-1}) = P(S_t \mid S_{t-1}) \qquad (1b)$$

where $R_t$ is the rainfall occurrence (presence-absence) at time t, $S_t$ is the value of
the hidden state at time t, and the notation $S_1^T$ denotes the values of the
unobserved process S from time 1 to T. Essentially, the model assumptions are
that the rainfall state is conditionally independent, that is, it depends only on the
value of the hidden state at the present time, and the hidden states are Markov.
R t can be a vector of rainfall occurrences at multiple stations, in which case a
model for its (spatial) covariance is required. The shortcoming of the HMM is
that the hidden states are unknown, and even though they may well be similar to
weather states, they cannot be imposed externally. Therefore, only unconditional
simulations are possible, and in this respect the model is similar to the
unconditional models of the precipitation arrival process discussed in Section 1.
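
A minimal simulation of this structure (hypothetical transition matrix and conditional wet probabilities):

import numpy as np

# Hidden Markov occurrence model of equations (1a)-(1b): hidden states
# evolve as a Markov chain, and the rain state each day depends only on
# the current hidden state.
rng = np.random.default_rng(7)
A = np.array([[0.9, 0.1],          # P(S_t | S_{t-1}) for two hidden states
              [0.3, 0.7]])
p_wet = np.array([0.1, 0.8])       # P(R_t = wet | S_t)

s, rain = 0, []
for _ in range(365):
    s = rng.choice(2, p=A[s])                    # equation (1b)
    rain.append(int(rng.random() < p_wet[s]))    # equation (1a)
print("wet fraction:", np.mean(rain))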
Hughes (1993) explored a class of nonhomogeneous hidden Markov models
(NHMM) of the form

$$P(R_t \mid S_1^T, R_1^{t-1}, X_1^T) = P(R_t \mid S_t) \qquad (2a)$$

$$P(S_t \mid S_1^{t-1}, X_1^T) = P(S_t \mid S_{t-1}, X_t) \qquad (2b)$$
where $X_t$ is a vector of atmospheric variables at time t. In this model, the
precipitation process is treated as in the HMM, that is, it is conditionally
independent given the hidden state $S_t$. However, the hidden states depend
explicitly on a set of atmospheric variables at time t, and the previous hidden
state. As for the HMM, if $R_t$ is a vector of precipitation states at multiple
locations, a model for the spatial covariances is required. Also, $X_t$ can be (and in
practice usually will be) multivariate. Hughes (1993) explored two examples in
which $X_t$ was a vector of principal components of the sea level pressure and 500
mb pressure height, and the model for $P(S_t \mid S_{t-1}, X_t)$ was either Bayesian or
autologistic. The Bayes and autologistic models are similar in terms of their
parameterization; the structure of the autologistic model is somewhat more
transparent, and it is used for illustrative purposes here. It is of the form
$$P(S_t \mid S_{t-1}, X_t) = \frac{\exp\left(a_{s_{t-1},s_t} + X_t b_{s_{t-1},s_t}\right)}{\sum_k \exp\left(a_{s_{t-1},k} + X_t b_{s_{t-1},k}\right)} \qquad (3)$$

where $s_t$ denotes the particular value of $S_t$. In this model, if there are m hidden
states and w atmospheric variables (that is, $X_t$ is w-dimensional), the logistic
model has m(m-1)(w+1) free parameters. Note that the model for the evolution of
$S_t$ conditioned on $S_{t-1}$ and $X_t$ is effectively a regional model, and does not depend
on the precipitation stations. Hughes (1993) explored two special cases of the
model:

Case 1: $a_{s_{t-1},s_t} = a_{s_t}$ and $b_{s_{t-1},s_t} = b_{s_t}$; and

Case 2: $b_{s_{t-1},s_t} = b_{s_t}$.
In the first case, the Markovian property of the NHMM is dropped, and the
evolution of the hidden states depends only on the present value of the
atmospheric variables. In the second model, the "base" component of the hidden
state transition probabilities is Markov, but the component that depends on the
atmospheric variables is a function only of the present value of the hidden state,
and not the previous value. In one of the two examples explored, Hughes found,
using a Bayes Information Criterion to discriminate between models, that the
second model was the best choice. In the other example, the full Markov
dependence was retained.
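
Equation (3) is straightforward to evaluate as a softmax over candidate states; a sketch with hypothetical parameter arrays:

import numpy as np

# Autologistic transition model of equation (3): the probability of
# moving to hidden state k given s_{t-1} and the atmospheric vector X_t
# is a softmax over the m candidate states.
rng = np.random.default_rng(8)
m, w = 3, 2                        # hidden states, atmospheric predictors
a = rng.standard_normal((m, m))    # a[s_prev, k], hypothetical values
b = rng.standard_normal((m, m, w)) # b[s_prev, k, :], hypothetical values

def transition_probs(s_prev, x):
    """P(S_t = k | S_{t-1} = s_prev, X_t = x) for k = 0..m-1."""
    logits = a[s_prev] + b[s_prev] @ x   # a_{s_{t-1},k} + X_t b_{s_{t-1},k}
    z = np.exp(logits - logits.max())    # numerically stable softmax
    return z / z.sum()

print(transition_probs(0, np.array([0.5, -1.0])))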
In the two examples, Hughes evaluated the means of the weather variables
corresponding to the hidden states. He found that the large area characteristics
were reasonable. For winter, the states with the most precipitation on average
corresponded to a low pressure system off the north Pacific coast, and the cases
with the least precipitation corresponded to a high pressure area slightly inland of
the coast. In the second example, with modeled precipitation occurrences at 24
stations in western Washington, transitional states with differences in the surface
and 500 mb flow patterns were shown to result in partial precipitation coverage
(precipitation at some stations, and not at others). These results suggest that the
NHMM may offer a reasonable structure for transmitting the effects of large-area
circulation patterns to the local scale.

APPLICATIONS TO ALTERNATIVE CLIMATE SIMULATION


Case studies
Although the development of most of the models reviewed above has been
motivated in part by the need for tools to simulate local precipitation for
alternative climate scenarios, there have only been a few applications where
climate model (GCM) scenarios have been downscaled using stochastic methods.
Hughes, et al. (1993) estimated parameters of semi-Markov models from five-year
1 x CO2 and 2 x CO2 GFDL simulations of surface pressure and 850 mb
temperature. From these five-year sequences, they computed daily weather states
using algorithms developed from historical sea level pressure observations, and fit
semi-Markov models to the weather states as described above. The semi-Markov


models were used to simulate 40-year weather state sequences corresponding to
the 1 x CO2 and 2 x CO2 runs. Daily precipitation (and temperature maxima-
minima, using the model described in Section 2.2) were then simulated for the 40-
year period, and were used as input to a hydrologic model which was the basis for
an assessment of shifts in flood risk that might be associated with climate change.
Zorita, et al. (1993) used a model similar to that of Hughes, et al. (1993) to
simulate daily precipitation for four sites in the Columbia River basin. They
found that the model performed reasonably well in winter, but they had
difficulties in application of the CART procedure to determine climate states in
summer. They found that when stations relatively far from the Pacific Coast
were used in definition of the multistation rain states in the CART algorithm, no
feasible solutions resulted. They were able to avoid this problem by restricting
the index stations to be relatively close to the coast, but when they attempted to
apply the model to the middle Atlantic region, they could obtain CART weather
states only when the index stations were quite closely spaced, and then only in
winter. The difficulty appeared to be the shorter spatial scale of summer
precipitation, and weaker coupling of local precipitation with the regional
circulation patterns. Summer precipitation in the middle Atlantic region is
dominated by local, convective storms, the occurrence of which is not well
predicted by large-scale circulation patterns. This is true also, to a lesser extent,
in the inland portion of the Columbia River basin.

Some complications

One of the motivations for development of stochastic models that couple large-
area atmospheric variables with local variables, such as precipitation, is to provide
a means of downscaling simulations of alternative climates for effects assessments.
However, as noted in the previous sections, most of the applications to date have
been to historic data. For instance, local precipitation has been simulated using
either an historic sequence of weather states (e.g., Hughes, et al., 1993) or via a
stochastic model of the historic weather states (e.g., Hay, et al., 1991, Bardossy
and Plate, 1992; Wilson, et al., 1992).
For most of the models reviewed, it should be straightforward to produce a
sequence of weather states corresponding to an alternative climate scenario (e.g.,
from a lengthy GCM simulation). There are, nonetheless, certain complications,
the most obvious of which is biases in current climate (baseline) GCM
simulations. For instance, Zorita, et al. (1993) found it necessary to use a weather
state classification scheme based on sea level pressure anomalies to filter out
biases in the mean GCM pressure fields. Otherwise the stochastic structure of the
weather state sequences formed from the baseline GCM simulation was much
different than that derived from the historic observations.
Selection of the variables to use in the weather state classification is
problematic. Wilson, et al. (1991) classified weather states using sea level
pressure and 850 mb temperature. However, if this scheme is used with an
alternative, warmer climate, the temperature change dominates the classification,
resulting in a major change in the stochastic structure of the weather class
sequence that may not be physically realistic. Although this problem is resolved
by use of variables, such as sea level pressure, that more directly reflect large-area
circulation patterns, elimination of temperature from consideration as a classifying
variable is troublingly arbitrary. A related problem is the effect of the strength of
the linkage between the weather states and the local variables. In a sense, the
problem is analogous to multiple regression. If the regression is weak, i.e., it
doesn't explain much of the variance in the dependent variable (e.g., local
precipitation), changes in the independent variables (e.g., weather states)
won't be evidenced in predictions of the local variable. Therefore, one might
erroneously conclude that changes in, for instance, precipitation would be small,
merely because of the absence of strong linkages between the large-scale and local
conditions (see, for example, Zorita, et al., 1993).
Application of all of the models for alternative climate simulation requires that
certain assumptions be made about what aspect of the model structure will be
preserved under an alternative climate. All of the models have parameters that
link the large-area weather states with the probability of occurrence, or amount
of, local precipitation. For instance, in the model of Wilson, et al. (1992) there
are parameters that control the probability of precipitation for each combination
of weather state and the precipitation state at the higher order stations. In the
model of Bardossy and Plate (1991) there is a Markov parameter that describes
the persistence of precipitation occurrences given the weather state. These
parameters, once estimated using historical data, must then be presumed to hold
under a different sequence of weather states corresponding, for instance, to a GCM
simulation. Likewise, many of the models (e.g., Bardossy and Plate, 1992;
Hughes, 1993) have spatial covariances that are conditioned on the weather state.
The historical values of these parameters likewise must be assumed to hold under
an alternative climate.
Essentially, the assumption required for application of the models to
alternative climate simulation is that all of the nonstationarity is accounted for by
the weather classes. One opportunity that has not been exploited is use of
historical data to validate this assumption. For instance, Bardossy and Caspary
(1990) have demonstrated that long-term changes have occurred in the
probabilities of some European weather states. It should be possible, by
partitioning the historical record, to determine whether conditional simulations
properly preserve associated shifts in local precipitation, such as wet and dry spell
lengths, and precipitation amounts.
Another complication in application of these models to alternative climate
simulation is comparability of the GCM predictions with the historic observations.
For instance, Hay, et al. (1991) used a weather classification scheme based on
surface wind direction and cloud cover. The resulting weather classes were shown
to be well related to precipitation at a set of stations in the Delaware River basin.
Unfortunately, however, GCM predictions of cloud cover and wind direction for
current climate are often quite biased as compared to historic observations, and
these biases will be reflected in the stochastic structure of the weather class
sequence.
Finally, most of the models have been limited to simulation of local
precipitation, although other variables, such as temperature, humidity, and wind
are usually required for hydrological simulations. Hughes, et al. (1993) developed,
along with the daily precipitation model described earlier, a model of the daily
mean temperature and temperature range. They conditioned these variables on
the present and previous days' rain state. For simulation of a CO2-doubled
scenario, they incremented the conditional mean temperatures by the difference
between the 1×CO2 and 2×CO2 850 mb temperatures. Bogardi, et al. (1993a;b)
coupled a weather state-driven precipitation model similar to that of Bardossy and
Plate (1992) with a model of daily temperature conditioned on weather state, and,
via nonparametric regression, the 500 mb pressure height. The 500 mb pressure
height was found to provide a reasonable index to within-year surface temperature
variations for several stations in Nebraska. Both of these models have
disadvantages. The method used by Hughes, et al. (1993) to infer temperature
under CO2-doubled conditions makes an arbitrary assumption that the change in
the mean station temperature would be the same as the regional mean
temperature change at 850 mb. The regression-based approach of Bogardi, et al.
(1993a) essentially assumes that the transfer function relating within-year
variations in the 500 mb pressure height to station temperature variations applies
to differences between climate scenarios as well. That this may not be realistic is
suggested by the fact that the simulated changes in winter station temperatures
for CO2 doubling are much larger than those in summer, even though most GCMs
simulate large summer surface air temperature changes in the Great Plains.

CONCLUSIONS
The coupling of weather state classification procedures, either explicitly or
implicitly, with stochastic precipitation generation schemes is a promising
approach for transferring large-area climate model simulations to the local scale.
Most of the work reported to date has focused on the simulation of daily
precipitation, conditioned in various ways on weather classes extracted from large-
area atmospheric features. The approach has been shown to perform adequately
in most of the studies, although there remain questions as to how best to
determine the weather states. Further, no useful means has yet been proposed to
determine the strength of the relationship between large-area weather classes and
local precipitation, and to ensure that weak relationships do not result in spurious
downward biases in inferred changes in precipitation at the local level. This
is an important concern, since at least one of the studies reviewed (Zorita, et al.,
1993) found conditions under which weather classes well-related to local
precipitation could not be identified.
There have been relatively few demonstrated applications of these procedures
for climate effects interpretations. One of the major difficulties is accounting for
biases in the GCM present climate, or "base" runs. In addition, few of the models
reviewed presently simulate variables other than precipitation needed for
hydrological studies. Temperature simulation is especially important for many
hydrological modeling applications, but methods of preserving stochastic
consistency between local and large-scale simulations are presently lacking.

ACKNOWLEDGMENTS
The assistance of James P. Hughes and Larry L. Wilson in assembling the review
materials is greatly appreciated.

REFERENCES
Bardossy, A., and H. J. Caspary (1990) "Detection of climate change in Europe by
analyzing European circulation patterns from 1881 to 1989", Theor. Appl.
Climatol., 42(3), 155-167.
Bardossy, A., and E. J. Plate (1991) "Modeling daily rainfall using a semi-Markov
representation of circulation pattern occurrence", J. Hydrol., 122(1-4), 33-47.


Bardossy, A., and E. J. Plate (1992) "Space-time model for daily rainfall using
atmospheric circulation patterns", Water Resour. Res., 28(5), 1247-1259.
Baur, F., P. Hess, and H. Nagel (1944) "Kalender der Grosswetterlagen Europas
1881-1939", Bad Homburg, 35.
Bogardi, 1., 1. Matyasovszky, A. Bardossy, and L. Duckstein (June 1993a)
"Estimation of local climatic factors under climate change, Part 1: Methodology",
in Proceedings, NATO Advanced Study Institute on Engineering Risk and
Reliability in a Changing Physical Environment, Deauville, France.
Bogardi, 1., I. Matyasovszky, A. Bardossy, and L. Duckstein (June 1993b)
"Estimation of local climatic factors under climate change, Part 2: Application",
in Proceedings, NATO Advanced Study Institute on Engineering Risk and
Reliability in a Changing Physical Environment, Deauville, France.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone (1984) Classification and
regression trees, Wadsworth, Monterey.
Cox, D.R., and P.A.W. Lewis (1978) The Statistical Analysis of Series of Events,
Methuen, London.
Foufoula-Georgiou, E. (1985) "Discrete-time point process models for daily
rainfall", Water Resources Technical Report No. 93, Univ. of Washington, Seattle.
Foufoula-Georgiou, E., and D. P. Lettenmaier (1987) "A Markov renewal model
for rainfall occurrences", Water Resour. Res., 23(5), 875-884.
Gabriel, K. R., and J. Neumann (1957) "On a distribution of weather cycles by
lengths", Q. J. R. Meteorol. Soc., 83, 375-380.
Gabriel, K. R., and J. Neumann (1962) "A Markov chain model for daily rainfall
occurrences at Tel Aviv", Q. J. R. Meteorol. Soc., 88, 90-95.
Georgakakos, K. P., and M. L. Kavvas (1987) "Precipitation analysis, modeling,
and prediction in hydrology", Rev. Geophys., 25(2), 163-178.
Giorgi, F., and L.O. Mearns (1991) "Approaches to the simulation of regional
climate change: A review", Rev. Geophys., 29(2), 191-216.
Grotch, S.L. (April 1988) "Regional intercomparisons of general circulation model
predictions and historical climate data", U.S. Department of Energy Report
DOE/NBB-0084, Atmospheric and Geophysical Sciences Group, Lawrence
Livermore National Laboratory, Livermore, CA.
Haan, T.N., D.M. Allen, and J.O. Street (1976) "A Markov chain model of daily
rainfall", Water Resour. Res., 12(3),443-449.
Hay, L. E., G. J. McCabe, Jr., D. M. Wolock, and M. A. Ayers (1991)
"Simulation of precipitation by weather type analysis", Water Resour. Res., 27(4),
493-501.
Hughes, J.P., D.P. Lettenmaier, and P. Guttorp (1993) "A stochastic approach for
assessing the effects of changes in regional circulation patterns on local
precipitation" , in press, Water Res. Res.
Hughes, J.P. (1993) "A class of stochastic models for relating synoptic
atmospheric patterns to local hydrologic phenomena", Ph.D. Dissertation,
Department of Statistics, University of Washington.
Kalkstein, L.S., G. Tan, and J.A. Skindlov (1987) "An evaluation of three
clustering procedures for use in synoptic climatological classification", Journal of
Climate and Applied Meteorology 26(6), 717-730.
Kavvas, M. L., and J. W. Delleur (1981) "A stochastic cluster model of daily
rainfall sequences", Water Resour. Res., 17(4), 1151-1160.
Khanal, N.N., and R.L. Hamrick (1974) "A stochastic model for daily rainfall data
synthesis", Proceedings, Symposium on Statistical Hydrology, Tucson, AZ, U.S.
Dept. of Agric. Publication No. 1275, 197-210.


Lamb, H.H. (1972) "British Isles weather types and a register of daily sequence of
circulation patterns, 1861-1971", Geophysics Memorandum No. 110, Meteorology
Office, London.
LeCam, L. (1961) "A stochastic description of precipitation", paper presented at
the 4th Berkeley Symposium on Mathematics, Statistics, and Probability,
University of California, Berkeley, CA.
McCabe, G. J., Jr. (1990) "A conceptual weather-type classification procedure for
the Philadelphia, Pennsylvania, area", Water-Resources Investigations Report 89-
4183, U.S. Geological Survey, West Trenton, NJ.
Pittock, A.B. and M.J. Salinger (1991) "Southern hemisphere climate scenarios",
Climate Change, 18, 205-222.
Rodriguez-Iturbe, I., B. Febres de Power, and J. B. Valdes (1987) "Rectangular
pulses point process models for rainfall: Analysis of empirical data", J. Geophys.
Res., 92(D8), 9645-9656.
Smith, J. A., and A. F. Karr (1985) "Statistical inference for point process models
of rainfall", Water Resour. Res., 21(1), 73-80.
Stern, R.D., and R. Coe (1984) "A model fitting analysis of daily rainfall data", J.
R. Statist. Soc. A, 147, 1-34.
von Storch, H., E. Zorita, and U. Cubasch (1993) "Downscaling of climate change
estimates to regional scales: Application to winter rainfall in the Iberian
Peninsula", in press, Journal of Climate.
Weiss, L.L. (1964) "Sequences of wet and dry days described by a Markov chain
model", Monthly Weather Review, 92, 169-176.
Wigley, T.M.L., P.D. Jones, K.R. Briffa, and G. Smith (1990) "Obtaining sub-
grid-scale information from coarse-resolution general circulation model output", J.
Geophys. Res., 95, 1943-1953.
Wilson, L. L., D. P. Lettenmaier, and E. F. Wood (1991) "Simulation of daily
precipitation in the Pacific Northwest using a weather classification scheme", in
Land Surface-Atmosphere Interactions for Climate Modeling: Observations,
Models, and Analysis, E. F. Wood, ed., Surv. Geophys., 12(1-3), 127-142, Kluwer,
Dordrecht, The Netherlands.
Wilson, L.L., D.P. Lettenmaier, and E. Skyllingstad (1992) "A hierarchical
stochastic model of large-scale atmospheric circulation patterns and multiple
station daily precipitation", J. Geophys. Res., 97(D3), 2791-2809.
Wiser, E. H. (1965) "Modified Markov probability models of sequences of
precipitation events", Mon. Weath. Rev., 93(8~, 511-516.
Woolhiser, D.A., and G.G.S. Pegram (1979) 'Maximum likelihood estimation of
Fourier coefficients to describe seasonal variations of parameters in stochastic
daily precipitation models", J. Appl. Meteorol., 8(1), 34-42.
Zorita, E., J.P. Hughes, D.P. Lettenmaier, and H. von Storch (1993) "Stochastic
characterization of regional circulation patterns for climate model diagnosis and
estimation of local precipitation", in review, J. Climate.
Zucchini, W., and P. Guttorp (1991) "A hidden Markov model for space-time
precipitation", Water Resour. Res., 27(8), 1917-1923.
KNOWLEDGE BASED CLASSIFICATION OF CIRCULATION PAT-
TERNS FOR STOCHASTIC PRECIPITATION MODELING

A. BARDOSSY 1, H. MUSTER 1, L. DUCKSTEIN 2 and I. BOGARDI 3


1 Institut für Hydrologie und Wasserwirtschaft, University of Karlsruhe, Kaiserstr. 12, 76128 Karlsruhe, Germany
2 Systems and Industrial Engineering Department, University of Arizona, Tucson, Arizona 85721, USA
3 Department of Civil Engineering, W348 Nebraska Hall, University of Nebraska, Lincoln, NE 68588-0531, USA

A fuzzy rule-based methodology is applied to the problem of classifying daily
atmospheric circulation patterns (CP). The subjective classification of European CP's
given in Hess and Brezowsky (1969) provides a basis for constructing the rules. The
purpose of the approach is to produce a classification that can be used to simulate
local precipitation on the basis of the 700 hPa pressure field rather than reproduce
the existing subjective classification.
For comparison, an artificial neural network is applied to the same problem. The
performance of the fuzzy classification as measured by any of three precipitation-
related indices is in general better than that of the neural net. The performance is
about equal to that of Hess and Brezowsky. The fuzzy rule-based approach thus has
potential to be applicable to the classification of Global Circulation Model (GCM)
produced daily CP for the purpose of predicting the effect of climate change on
space-time precipitation over areas where no classification exists.

INTRODUCTION
The main purpose of this paper is to develop a methodology based on fuzzy rules
(FR) to reproduce the precipitation generation features of an existing subjective
classification of daily atmospheric circulation patterns (CPs) over Europe. This is a
novel approach both from the methodological and application viewpoint: FR have
been used extensively in control problems but not in modeling. One of the first
applications of fuzzy rule-based modeling in hydrology (groundwater) is found in
Bardossy and Disse (1993), but so far no surface hydrology or hydrometeorologic
examples could be found in the open literature.
Like FR, artificial neural networks (NN) provide a non-linear numerical mapping
of inputs into outputs. Neither approach needs a mathematical formulation of how
the output depends on the input. Unlike NN, FR needs the formulation of rules,
which may be difficult even for experts, but does not necessarily require a training
data set. On the other hand, unlike FR, NN may be applied to ill-defined problems,
but after training the NN is a pure unstructured black box and knowledge gained by the
training algorithm cannot be decoded. Because the two approaches complement each
other, it seems interesting to compare their performance.
The present study belongs to a long-term collaborative project directed at devel-
oping a new methodology for the disaggregation of global hydrometeorological input
into regional or local hydrological systems and the prediction of climate change effects
on these systems. In the first part of this approach, the CP's are classified, so that
their occurrence, persistence and transition probabilities may be investigated. The
CP's may be based on observation data or on the output of GCM's, in a changed
climate (for example 2×CO2) scenario case. The reason for selecting large scale
CP as input into local or regional hydrologic systems is that long records of reliable
pressure measurements are available. Furthermore the GCM's are based on weather
forecasting models which can predict the future pressure condition with much higher
accuracy than other parameters. Results obtained in West Germany, Nebraska and
Central Arizona indicate the existence of a strong relationship between daily CP
types and local hydrologic observations, such as precipitation, temperature, wind or
floods (Bardossy and Caspary, 1990; Bardossy and Plate,1991; Bogardi et al., 1992;
Matyasovszky et al., 1992; Duckstein et al., 1993). This relationship was essentially
described under the form of, say, daily precipitation at a given site conditioned upon
the event that the type of CP over the region was i = 1, ..., I. A fundamental ele-
ment of this approach is thus a phenomenologically valid classification of the CP that
can generate a simulated series of precipitation events with high information content.
On the other hand, CPs are elements of complex dynamic large- scale atmospheric
phenomena, so that any classification scheme is fraught with uncertainty and impre-
cision (Yarnal and White, 1987, 1988; Bardossy and Caspary, 1990). Here we apply
an FR- and an NN-based approach to account explicitly for the imprecision in the sub-
jective classification of European CPs by Baur et al. (1944). These two approaches
should be able to reproduce the classification quite well with respect to the prediction
of climate change effects on local hydrological systems.
The paper is organized as follows: in the next section, background information
on existing CP classification schemes is presented. The following section provides
a description of the fuzzy rule-based approach to modeling geophysical phenomena
and describes briefly the NN approach. Then the two approaches are applied to a
European case study and the results are evaluated. The final section consists of a
discussion and conclusions.

DEFINITION OF CIRCULATION PATTERNS AND CLASSIFICATION


APPROACHES
Definition of circulation pattern
Following Baur et al. (1944) daily atmospheric circulation patterns consisting of
continent-size pressure contours (at sea level, 700 hPa or 500 hPa) are described only
in terms of three large scale features, namely:

1. The location of sea level semipermanent pressure centers, such as Azores high
or Iceland low.

2. The position and paths of frontal zones.

3. The existence of cyclonic and anticyclonic circulation types.

Furthermore, a distinction between seasons may be in order.

Classification approaches
CP classification techniques may be grouped into subjective and objective procedures.
The former group has been in existence for over a century in Europe, as described in
Baur et al. (1944), and about half a century in the USA. The latter type has emerged
as a result of the development of high-speed computers and availability of statistical
software packages, mostly principal component analysis and clustering.
Subjective or manual techniques, such as the one used as a case study herein
and described below, depend on the hydrometeorologist's interpretation of the major
features and persistence of a given pattern.
Objective techniques, in contrast, are based on statistical approaches such as
hierarchical methods (Johnson, 1967), k-means methods (MacQueen, 1967), and cor-
relation methods (Bradley et al., 1982; Yarnal, 1984). For example, in our Eastern
Nebraska study, 9 types of CPs have been identified after having performed a prin-
cipal components analysis coupled with the k-means method (Matyasovszky et al.,
1992).
Between these two groups, FR and NN are knowledge based classifications. They
are subjective and phenomenological. The subjectivity is taken into account either
explicitly by rules or implicitly by a training phase that uses expert knowledge. They
are objective because, after model construction, the same inputs are always classified
in the same way.

Subjective classification for European conditions


Using a record of 109 years, Baur et al. (1944) have developed a subjective classifi-
cation of CPs for European conditions; later Hess and Brezowsky (1969) have used
this classification to construct a catalogue of European CPs from 1881 to 1966. On
the basis of Baur et al. (1944), Hess and Brezowsky (1969) recognize 3 groups of CPs
divided into 10 major types, 29 subtypes and one additional subtype for the undeter-
mined cases. The 10 major types are divided into subtypes primarily by adding the
letter a or z at the end of the abbreviation of the major type to denote anticyclonic
(a) or cyclonic (z) circulation.
The three groups of CP are zonal, half-meridional and meridional. Hess and
Brezowsky (1969) describe the characteristics of each circulation group in some detail.
The zonal circulation group is described as follows: broad areas of high sea level
pressure cover subtropical and lower middle latitudes. Low sea level pressure occurs
in subarctic and higher middle latitudes. Upper air flow is west to east. Cyclone
tracks run from the eastern North Atlantic Ocean to the European continent. All
circulations of the major type "West" (W) are classified as zonal circulations. As
an illustration, Figure 1 shows the 500 hPa contour map for the circulation subtype
"West, cyclonic" (Wz) which persisted for 8 days after November 11, 1987 (Deutscher
Wetterdienst, 1948-1990).

Figure 1: Typical 500 hPa contour map of circulation type Wz.

The half-meridional circulation group corresponds to a near equilibrium between


zonal and meridional components of air flow. Typical examples of half-meridional
circulations are the major types "Northwest" and "Southwest". In comparison with
the major type "West" the anticyclonic pressure centers are shifted northwards to
about 50°N. The pressure centers are located above the eastern Atlantic Ocean in
the case of "Northwest" (NW) types, over Eastern Europe for the "Southwest" (SW)
types, and over Central Europe for the "Central European high" (HM). Due to the varying
circulation components the major subtype "Central European low" (TM) has been
added to the half-meridional circulation types.
The meridional circulation group is characterized by stationary, blocking high
pressure centers at sea level. Due to the locations of the sea level pressure centers
and the resulting main flow directions to Central Europe the major types "North"
(N), "South" (S) and "East" (E) can be distinguished. In addition all trough types
with a north to south axis are classified as meridional circulations. The major types
"Northeast" (NE) and "Southeast" (SE) are also included with the meridional circula-
tion group because they normally coincide with blocking North and Eastern European
highs.
Further illustrations of these three types of CP are found in Bardossy and Cas-
pary (1990). The fuzzy rule-based approach will describe CPs by assigning fuzzy
quantifiers to the normalized value of pressure at grid points or pixels. Linguistic at-
tributes such as "The pressure centers are located above the eastern Atlantic Ocean"
could also have been used, but certain properties of CPs would be very difficult to
express even by fuzzy sets.

KNOWLEDGE BASED CLASSIFICATION


Classification data base
Building up the FR classifier at a first step, expert knowledge encoded in the subjec-
tive classification of Hess and Brezowsky (1969) is used. No training data set (given
pressure maps and corresponding subjective circulation pattern) is needed. Applying
the FR classifier at a second step, in order to have a unified basis for the
classification, the daily observed 700 hPa values have to be normalized. The observed pressure
maps were available from the gridded data set of the National Meteorological Center
(NMC), USA. Let $h(u_k, t)$ be the observed 700 hPa surface at location $u_k$ and time
t, T be the total time horizon, and the temporal mean $\bar{h}(u_k)$ be:

    \bar{h}(u_k) = \frac{1}{T} \sum_{t=1}^{T} h(u_k, t)    (1)

For each day t, the height is normed using the formula:

    g(u_k, t) = \frac{[h(u_k,t) - \bar{h}(u_k)] - \min_j [h(u_j,t) - \bar{h}(u_j)]}{\max_j [h(u_j,t) - \bar{h}(u_j)] - \min_j [h(u_j,t) - \bar{h}(u_j)]}    (2)

This way, for each day, the 700 hPa surface is mapped onto the interval [0,1].
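A minimal numerical sketch of this normalization follows (not the authors' code; the anomaly-based min-max form of Eq. (2) is an assumed reading of the garbled original, chosen so that the temporal mean of Eq. (1) enters the calculation and each daily field lands on [0,1]):

    import numpy as np

    def normalize_daily_heights(h):
        """Map each day's 700 hPa surface onto [0, 1].

        h : array of shape (T, K) holding h(u_k, t) at K grid points
            over T days.
        """
        h = np.asarray(h, dtype=float)
        h_bar = h.mean(axis=0)                 # temporal mean, Eq. (1)
        anom = h - h_bar                       # daily anomaly field
        lo = anom.min(axis=1, keepdims=True)   # daily minimum over pixels
        hi = anom.max(axis=1, keepdims=True)   # daily maximum over pixels
        return (anom - lo) / (hi - lo)         # Eq. (2): each day in [0, 1]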
Training the NN at the first step needs a properly defined input/output data
set. The input data set is encoded as follows: for each "subjectively" defined CP i
the mean and the standard deviation of the corresponding normalized 700 hPa daily
values are calculated:

    \bar{m}_i(u_k) = \frac{1}{|T_i|} \sum_{t \in T_i} g(u_k, t)    (3)

    s_i(u_k) = \left( \frac{1}{|T_i|} \sum_{t \in T_i} [g(u_k, t) - \bar{m}_i(u_k)]^2 \right)^{1/2}    (4)

where $T_i$ denotes the set of days subjectively classified as CP i.

The training of the NN was performed with a sequence of circulation patterns. For
a given subjective circulation pattern i, the corresponding 700 hPa surface $h(u_k, t)$
data was obtained using a normally distributed random variable with mean $\bar{m}_i(u_k)$
and standard deviation $s_i(u_k)$. The heights $h(u_k, t)$ are normed by (2) and used in
this form as input for the NN. The output was an activation of the output neuron i
corresponding to CP i.
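A hedged sketch of this training-pattern generator (function and argument names are illustrative; the one-hot target encoding is an assumption consistent with "an activation of the output neuron i"):

    import numpy as np

    def make_training_pair(m_i, s_i, i, n_classes, rng=None):
        """Draw one synthetic normalized map for CP class i plus its target.

        m_i, s_i : per-pixel mean and standard deviation for class i
                   (Eqs. (3)-(4)).
        """
        rng = np.random.default_rng() if rng is None else rng
        h = rng.normal(m_i, s_i)                 # random surface for class i
        g = (h - h.min()) / (h.max() - h.min())  # re-norm onto [0, 1], cf. Eq. (2)
        target = np.zeros(n_classes)
        target[i] = 1.0                          # activate output neuron i
        return g, target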

Fuzzy rule based classification


To classify CPs by the use of fuzzy rules, each CP type is first described by a set of
rules, and then the classification is done by selecting the CP type for which the
so-called degree of fulfillment (DOF) is highest and at least at a given level. If
this level is not reached, a transition CP can be defined as a combination of CPs, for
example the two CPs with the highest DOF.
Each fuzzy rule corresponding to a circulation pattern of type i consists of a set of
premises $B_{i,h}$ given in the form of fuzzy numbers with properly chosen membership
functions $\mu_{B_{i,h}}$, $h = 1, \ldots, H_i$:

    If $B_{i,1}$ and $B_{i,2}$ ... and $B_{i,H_i}$ then CP is i    (5)

Here $H_i$ is the number of premises used to describe type i. As mentioned above,
the premises consist of normalized pressure values at a few selected pixels.
In contrast to ordinary (crisp) rules, fuzzy rules allow partial and simultaneous
fulfillment of rules. This means that instead of the usual case, in which a rule is either
applied or is not applied, a partial applicability becomes possible. For any vector of
premises $(a_1, \ldots, a_K)$ the DOF of rule i can be defined as a function of the individual
fulfillment grades expressed by the corresponding values of the membership functions:

    D_i = F\left( \mu_{B_{i,1}}(a_1), \ldots, \mu_{B_{i,H_i}}(a_{H_i}) \right)    (6)

Finally the classifier selects the index i with the highest $D_i$ value (highest DOF of
the rules), provided it is at least at a given level, as the class i.
Four classes v of rules are defined, according to the normalized pressure values:

• very low values, class v = 1
• not high values, class v = 2
• not low values, class v = 3
• very high values, class v = 4

The combination of the fulfillment grades within a rule class (for example, v:
very high values) is done by a combination of "AND" and "OR" operations. Suppose
$B_{i,1}, \ldots, B_{i,R}$ correspond to the same class v; then the partial DOF $D_{iv}$ corresponding
to this class v is taken as a convex combination of "OR" and "AND" fulfillment
grades, using a properly selected value of $\gamma_v$ as described in more detail below. With
$0 \le \gamma_v \le 1$, the value of $D_{iv}$ is calculated as:

    D_{iv} = \gamma_v F_O\left( \mu_{B_{i,1}}(a_1), \ldots, \mu_{B_{i,R}}(a_R) \right) + (1 - \gamma_v) F_A\left( \mu_{B_{i,1}}(a_1), \ldots, \mu_{B_{i,R}}(a_R) \right)    (7)
with $F_O$ being the "OR" function:

    F_O(x_1, x_2) = x_1 + x_2 - x_1 x_2, \quad (x_1, x_2) \in R^2    (8)

For R variables $(x_1, \ldots, x_R)$, $F_O$ is defined recursively as:

    F_O(x_1, \ldots, x_R) = F_O\left( F_O(x_1, \ldots, x_{R-1}), x_R \right)    (9)

The "AND" function $F_A$ is defined as in Bardossy and Disse (1993) as:

    F_A(x_1, \ldots, x_R) = \prod_{r=1}^{R} x_r    (10)

Finally the four class values $D_{i1}, D_{i2}, D_{i3}, D_{i4}$ are combined into the DOF of rule i,
$D_i$, as:

    D_i = \prod_{v=1}^{4} D_{iv}    (11)
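The DOF calculus of Eqs. (7)-(11) reduces to a few lines of code. The following sketch (illustrative data structures, not the authors' implementation) reproduces the worked example that follows, e.g. $D_{24} = 0.7 \cdot 1.00 + 0.3 \cdot 0.6375 \approx 0.8912$ for the "very high" class:

    import numpy as np

    def f_or(xs):
        """Recursive 'OR' of Eqs. (8)-(9): F_O(x1, x2) = x1 + x2 - x1*x2."""
        d = 0.0
        for x in xs:
            d = d + x - d * x
        return d

    def f_and(xs):
        """'AND' of Eq. (10): product of the fulfillment grades."""
        return float(np.prod(xs))

    def dof(rule, g, gamma):
        """Degree of fulfillment of one CP rule, Eqs. (7) and (11).

        rule  : mapping v -> list of (pixel index, membership function)
        g     : normalized pressure values g(u_k, t) for one day
        gamma : mapping v -> gamma_v, the OR/AND mixing weight
        """
        d = 1.0
        for v, premises in rule.items():
            mu = [mf(g[k]) for k, mf in premises]
            d *= gamma[v] * f_or(mu) + (1.0 - gamma[v]) * f_and(mu)  # Eq. (7)
        return d                                  # Eq. (11): product over v

The day is then assigned to the CP type with the largest DOF, provided it exceeds the prescribed level.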

The FR classification is numerically very simple. November 29, 1986, has been
selected to illustrate the methodology. The CP type that persisted November 26-29,
1986 is "BM", with i = 2. The membership functions of the four classes are given as
above.
Normalized pixel values, membership grades for the selected day and the rule defining
the CP type BM are shown in Table 1. For example, in the case v = 4, "very high",
$\gamma_4 = 0.7$, $F_O = 1.00$, $F_A = 0.6375$ and $D_{24} = 0.7 \cdot 1.00 + 0.3 \cdot 0.6375 = 0.8912$. The
overall fulfillment grade is $D_2 = 0.3944$, which is the maximum value. The DOF of the
other CP types is less than 0.3; for Wz it is 0 and for type HM it is found to be
0.1247.
This example shows that even in the case when one pixel does not fulfill the
prescribed pressure level, as is the case for pixel 1 at v = 2, the fuzzy rules can
assign the day to the selected class.

Neural Network based classification


NN are mathematical models of brain activity. Several network architectures have
been developed for different applications (McCord Nelson and Illingworth, 1991). We have
used a four-layer feedforward architecture with a "back-propagation" (Rumelhart
et al., 1986) learning algorithm.
The basic unit of an NN is the neuron. The function of a neuron is described by
the transformation of the input signals to an output signal. Given the outputs of
neurons i, $o_i$, $i = 1, \ldots, n$, the input of a neuron j, $inp_j$, is given by

    inp_j = \sum_i w_{ij} o_i + \theta_j    (12)

In (12), $w_{ij}$ are weights between neuron i and neuron j, chosen properly by a
learning algorithm described below, and $\theta_j$ is the bias of neuron j.
Table 1: Numerical example of the calculation of the DOF for CP type BM, which
occurred on Nov 29, 1986.

Rule Class     Pixel (Long., Lat.)   Normalized Pressure g   Membership Value mu   DOF D_2v
v = 1          W25°, N65°            0.13                    0.68
Very low       W15°, N65°            0.00                    1.00                  0.8265
               W0°,  N70°            0.15                    0.62
               W15°, N35°            0.63                    0.00
v = 2          W25°, N75°            0.12                    0.84
Medium low     W0°,  N80°            0.18                    0.96                  0.6983
               E25°, N75°            0.31                    0.63
               W20°, N40°            0.62                    0.40
v = 3          W15°, N45°            0.84                    0.92
Medium high    W0°,  N50°            0.96                    0.68                  0.7473
               E10°, N50°            0.95                    0.70
               E15°, N45°            0.85                    0.90
v = 4          W5°,  N55°            0.94                    0.85
Very high      E5°,  N55°            1.00                    1.00                  0.8912
               E15°, N55°            0.90                    0.75

The output of neuron j, $o_j$, equals the activation of neuron j, $a_j$, which is given
by applying a sigmoid transformation function to $inp_j$:

    a_j = \frac{1}{1 + \exp(-inp_j)}    (13)

The reason for employing the sigmoid function is that it is differentiable, which is an
essential condition for back-propagation.
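For illustration, a sketch of the resulting forward pass through the feedforward network (Eqs. (12)-(13)); the layer sizes are those of the 51-45-40-29 configuration used in the application section below, and all names are illustrative:

    import numpy as np

    def sigmoid(x):
        """Eq. (13): differentiable activation, as back-propagation requires."""
        return 1.0 / (1.0 + np.exp(-x))

    def forward(g, weights, biases):
        """Propagate a normalized pressure map through the network.

        weights : list of arrays, weights[l][i, j] = w_ij from layer l to l+1
                  (shapes 51x45, 45x40, 40x29)
        biases  : list of bias vectors theta_j for the non-input layers
        Returns the activations of the 29 output neurons; the day is
        assigned to the CP whose output neuron is most strongly activated.
        """
        a = np.asarray(g, dtype=float)
        for W, theta in zip(weights, biases):
            a = sigmoid(a @ W + theta)            # Eq. (12), then Eq. (13)
        return a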
The NN utilized herein consists of a set of structured neurons with three different
types of layers:
• The input layer: these are the neurons which are activated by the input signal
coming from outside.
• Two hidden layers: these are the neurons which are supposed to perform the
transformation of the input to an output.

• The output layer: these are the neurons which provide signals for the outside.
Each neuron of a layer is connected to each neuron of the adjacent layer. This means
that a signal is sent to the next layer (feedforward). Figure 2 shows the four layers
of the NN.
The interconnecting weights Wij have to be determined with the help of a back-
propagation supervised learning procedure. For this purpose a training set consisting
Figure 2: Structure of the Neural Network (input layer, two hidden layers, output layer).

of measured input data and corresponding desired output data is used. Weights
which minimize the squared difference between the known output of the training set
and the calculated output of the NN have to be found stepwise from the output layer
to the input layer by a gradient search method.

APPLICATION
CP classification by fuzzy rules and Neural Networks
The above described two procedures were used to classify the CPs over Europe. As
stated earlier the basis of the classification is the subjective Baur classes given in
Hess and Brezowsky (1969). For each day the measured 700 hPa surface is taken at
51 selected points Uk. For both procedures, the data base for model building and
validation respectively was presented above.
FR was defined encoding the expert knowledge of Hess and Brezowsky (1969) by
fuzzy rules as follows: a few (2 to 4) points are selected for each class v = 1, ..., 4
(very high to very low). The $\gamma_v$ values are selected depending on the class v, with
$\gamma_1 = \gamma_4 = 0.7$ and $\gamma_2 = \gamma_3 = 0.1$. Classes v = 1, 4 carry higher uncertainty; thus a convex
combination with a higher "OR" component and a value of $\gamma_v$ closer to 1 is needed.
Classes v = 2, 3 are more restrictive; thus a convex combination with a lower "OR"
component and a value of $\gamma_v$ closer to 0 is needed. The proper selection of $\gamma$ was done
by trial and error; results turned out not to be very sensitive to the choice of $\gamma$, but
it is necessary to use some mix of AND and OR rules because a pure AND rule may
be too weak (one zero element makes the DOF equal to zero) and an OR rule too
strong (DOF too large).
The architecture of the NN used consists of an input layer of 51 neurons corresponding
to the 51 data sites; the output layer consists of 29 neurons, corresponding to the 29
CP subtypes subjectively classified by Hess and Brezowsky (1969). The first hidden
layer consists of 45 neurons, the second hidden layer of 40 neurons. Concerning the
procedure for building the NN architecture (number of hidden layers, number of
neurons), there exist some heuristic rules, but nevertheless the answer has been found
by trial and error. With an increasing number of neurons the estimation error in the
training phase decreases, but the model becomes overdetermined.
Both classification schemes were applied to a measured sequence of daily 700 hPa
elevations for the 10-year period 1977 to 1986. The methods were unable to reproduce
the subjective series exactly, but the stated goal of the classification was to develop
a semi-objective classification method which resembles the subjective one and whose
quality is measured by the difference between generated and measured precipitation
values, as in the next section.

Use for precipitation modeling


Parameters of the precipitation model (Bardossy and Plate, 1992) are estimated so as
to obtain the conditional probability of precipitation and the mean daily precipitation
at a site given the CP. To measure the quality of a classification for precipitation
estimation or generation, three information measures are introduced as follows.
The first one measures the squared difference between the conditional probability
$p_{A_t}$ of precipitation on day t, given that the CP is $A_t$, and the unconditional
probability p of precipitation at a given site:

    I_1 = \left( \frac{1}{T-1} \sum_t (p_{A_t} - p)^2 \right)^{1/2}    (14)

with

    p = \frac{1}{T} \sum_t p_{A_t}    (15)

Thus the maximum value of $I_1$ depends on p. In the present application, the possible
maximum of $I_1$ is about 0.5.
The second information measure describes the squared difference between conditional
and unconditional mean precipitation:

    I_2 = \left( \frac{1}{T-1} \sum_t (m_{A_t} - m)^2 \right)^{1/2}    (16)

with

    m = \frac{1}{T} \sum_t m_{A_t}    (17)

where m is the unconditional mean daily precipitation amount at a given site and
$m_{A_t}$ is the mean daily precipitation conditioned on the CP being of type $A_t$.
The third information measure also depends on the mean precipitation and measures
the relative deviation from the unconditional mean:

    I_3 = \frac{1}{T} \sum_t \left| \frac{m_{A_t}}{m} - 1 \right|    (18)

with

    0 \le I_3 \le 1    (19)
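A compact sketch of the three measures (the (T-1) normalization in $I_1$ and $I_2$ is an assumed reading of the garbled originals; p and m may be obtained from the conditional values as in Eqs. (15) and (17)):

    import numpy as np

    def information_measures(p_cond, m_cond, p, m):
        """I1, I2, I3 of Eqs. (14)-(18) over a daily CP sequence.

        p_cond : p_{A_t}, conditional wet-day probability for each day t
        m_cond : m_{A_t}, conditional mean daily precipitation for each day t
        p, m   : unconditional wet-day probability and mean precipitation
        """
        p_cond = np.asarray(p_cond, dtype=float)
        m_cond = np.asarray(m_cond, dtype=float)
        T = len(p_cond)
        i1 = np.sqrt(np.sum((p_cond - p) ** 2) / (T - 1))   # Eq. (14)
        i2 = np.sqrt(np.sum((m_cond - m) ** 2) / (T - 1))   # Eq. (16)
        i3 = np.mean(np.abs(m_cond / m - 1.0))              # Eq. (18)
        return i1, i2, i3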

Table 2: Average information content of three CP classification schemes (summer).

Classification Method    I1       I2       I3
Hess-Brezowsky           0.243    1.996    0.609
Fuzzy rules              0.211    1.778    0.521
Neural Network           0.167    1.580    0.441

Table 3: Average information content of three CP classification schemes (winter).

Classification Method    I1       I2       I3
Hess-Brezowsky           0.265    2.559    0.679
Fuzzy rules              0.217    2.251    0.629
Neural Network           0.193    2.353    0.586

In order to compare the subjective and knowledge based classifications, the mean
information content of 25 stations in the Ruhr catchment was calculated for the
subjective classification of Hess and Brezowsky (1969) and for the FR and NN
classifications. Tables 2 and 3 show the results for the summer and winter seasons.
From these tables it is clear that the subjective classification delivers the best results.
However, none of the three measures of information content, $I_1$, $I_2$, $I_3$, varies much
between the three approaches, the difference between the two seasons being larger
than that between the approaches.
For the fuzzy rule based classification the information loss compared to the subjective
classification is less than 20%. The NN classification does not perform as well as
the FR. Given the simplicity of the fuzzy rule-based approach, we would recommend
it for future use.

DISCUSSION AND CONCLUSIONS


Why is it important to develop a fuzzy rule based classification if the subjective
approach is slightly better than the fuzzy rule-based one? This question may be
answered as follows:

• We do not have a subjective classification for GCM-produced CPs and thus


intend to use the FR (or trained NN) classification to obtain a catalog of daily
CP types corresponding to the 1×CO2 and 2×CO2 cases. The use of stochastic
linkage between daily CP types and daily local climatic factors makes it possible
to predict the effect of climatic change on local/regional precipitation (Bartholy
et al., 1993) .
• There is no catalog of subjectively classified daily CP over the USA or most of
the regions of the world. Using FR, it is possible to obtain such a catalog of
CPs for any large-scale area. It has been often argued that there is no physical
basis for types obtained by objective classification schemes such as principal
component analysis and cluster analysis. The fuzzy rule-based classification
has the capability of using the identification of the main weather patterns by
meteorological experts and may thus be constructed on a phenomenological
basis.

In contrast to NN, the FR classifier has been built encoding expert knowledge
without using a time series of subjectively classified daily CP. Thus FR (but not NN)
can be applied in regions where no time series of CP exists but both approaches need
expert knowledge of CP for these regions.
For further research, the performance of an objective phenomenological classifier,
taking into account explicit local hydrological parameters (precipitation, temperature,
winds, floods), seems to be of interest. For this, the FR classifier may have to be
modified using some features of NN. The two approaches complement each other in a
very evident way, as demonstrated recently by Takagi and Hayashi (1991), Kosko
(1992) and Goodman et al. (1992).
To conclude, the fuzzy rule-based methodology appears to perform better than the
neural network and almost as well as the Baur-type subjective classification. It
appears to be a usable approach for constructing a time series of classifications where
none is available.
ACKNOWLEDGMENTS
Research presented in this paper has been partially supported by the US National
Science Foundation under grants #BCS 9016462/9016556, EAR 9217818/9205717
and a grant from the German Science Foundation (DFG).

REFERENCES
Bardossy A. and Caspary, H. (1990) "Detection of climate change in Europe by an-
alyzing European Atmospheric Circulation Patterns from 1881 to 1989", The-
oretical and Applied Climatology 42, 155-167.

Bardossy, A. and Disse, M. (1993) "Fuzzy rule-based models for infiltration", Water
Resources Research 29, 2, 373-382.

Bardossy, A. and Plate, E.J. (1991) "Modeling daily rainfall using a semi-Markov
representation of circulation patterns" , Journal of Hydrology 122, 33-47.

Bardossy, A. and Plate, E.J. (1992) "Space-time model for daily rainfall using at-
mospheric circulation patterns", Water Resources Research 28, 5, 1247-1260.

Bartholy, J., Bogardi, I., Matyasovszky, I. and Bardossy, A. (1993) "Prediction of


daily precipitation reflecting climate change", Session HS4/1 European Geo-
physical Society, XVIII General Assembly, Wiesbaden, Germany.

Baur, F., Hess, P. and Nagel, H. (1944) Kalender der Großwetterlagen Europas 1881-
1939, Bad Homburg, FRG.

Bradley, R. S., Barry, R. G. and G. Kiladis (1982) Climatic fluctuations of the


Western United States during the period of instrumental Records, Final Report
to the National Science Foundation, University of Massachusetts.

Bogardi, I., Duckstein, L., Matyasovszky, I. and Bardossy, A. (1992) Estimating
space-time local hydrological quantities under climate change, Proceedings,
Fifth International Conference on Statistical Climatology, Toronto.

Deutscher Wetterdienst (1948-1990) Die Großwetterlagen Europas, Amtsblatt des


Deutschen Wetterdienstes, 1-33, Deutscher Wetterdienst - Zentralamt, Offen-
bach am Main.

Duckstein, L., Bardossy, A. and Bogardi, I. (1993) "Linkage between the occurrence
of daily atmospheric circulation patterns and floods: an Arizona case study" ,
Journal of Hydrology, to appear.

Goodman, R.M., Higgins, C.M., Miller, J.W. (1992) "Rule-based neural networks
for classification and probability estimation", Neural Computation 4, 781-804.
Hess, P. and Brezowsky, H. (1969) Katalog der Grosswetterlagen Europas, Berichte
des Deutschen Wetterdienstes Nr. 113, Bd. 15, 2. neu bearbeitete und ergänzte
Aufl., Offenbach a. Main, Selbstverlag des Deutschen Wetterdienstes.

Johnson, S. C. (1967) "Hierarchical clustering schemes", Psychometrika 32, 261-274.

Kosko, B. (1992) Neural Networks and Fuzzy Systems, Prentice-Hall International,


London.

Matyasovszky, I., Bogardi, I., Bardossy, A. and Duckstein, L. (1992) Comparing


historical and global circulation model produced atmospherical circulation pat-
terns, working paper 93-2, SIE, Bldg 20, University of Arizona, Tucson, AZ
85721.

MacQueen, J. (1967) "Some methods for classification and analysis of multivariate


observations", Fifth Berkeley Symposium on Mathematics 1, 281-298.

McCord Nelson, M. and Illingworth, W.T. (1991) A practical guide to neural nets,
Addison-Wesley, Reading.

Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) "Learning representations
by back-propagating errors", Nature 323, 533-536.

Takagi, H. and Hayashi, I. (1991) "NN-Driven Fuzzy Reasoning", Int. Jour. of


Approximate Reasoning 5, 191-212.

Yarnal, B. (1984) "A procedure for the classification of synoptic weather maps from
gridded atmospheric pressure surface data", Computers and Geosciences 10,
397-410.

Yarnal, B. and White, D. (1987) "Subjectivity in a computer - assisted synoptic


climatology I: classification results", J. Climatol. 7, 119-128.

Yarnal, B., White, D. and Leathers, D. J. (1988) "Subjectivity in a computer-


assisted synoptic climatology II: relationships to surface climate", J. Climatol.
8, 227-239.
GREY THEORY APPROACH TO QUANTIFYING THE RISKS
ASSOCIATED WITH GENERAL CIRCULATION MODELS

1 Atmospheric Environment Service, 4905 Dufferin Street, Downsview, Ontario M3H 5T4, Canada
2 Dept. of Civil Engineering, McMaster University, Hamilton, Ontario L8S 4L7

Assessing the risk to water resources facilities under climate change is difficult
because the uncertainty associated with 2×CO2 climate scenarios cannot be readily
quantified. Grey systems theory is used to develop a grey prediction model (GPM)
that provides an interval of uncertainty. The GPM is used to extrapolate a
numerical interval around the decadal averages of precipitation and temperature
through the year 2010 for a site in Northwestern Canada. The extrapolation is
calibrated on 20 years of data and validated against observations for the 1980's.
The values in the 1990's correspond to observed trends in the area. The
temperature and precipitation values are used to develop a grey water balance
model. The grey intervals for annual potential evapotranspiration, deficit and
surplus are used to evaluate the reliability of a transient and three equilibrium
climate change scenarios. The grey intervals are not coincident with the
transient output, but they are trending towards the equilibrium scenario values.
This suggests that this particular transient scenario is inadequate for risk
assessment, and although the equilibrium scenarios appear to be within the grey
interval, they represent years beyond a reliable GPM extrapolation.

INTRODUCTION

The possibility of global climate change, and the subsequent changes to climate at
the local level, may alter the viability of new and existing water resource
structures. One decision-making tool that has been gaining acceptance in water
resource management and hydrology is risk assessment - a series of techniques that
are used to evaluate decisions when the future cannot be forecast with certainty.
Risk is defined as a combination of the probability of an event occurrence and the
consequences associated with that event. It is partly a function of the quality of
information used to define a climatological event, and the uncertainty in the
observed or predicted data is strongly linked to the level of risk in a decision.
Generally, the larger the uncertainty, the higher the risk in making a decision.
General circulation models (GCMs) have been used in developing
scenarios because they provide a physically-based dynamic simulation of the
atmosphere. GCMs can be used to simulate another climate in its equilibrium state
(equilibrium model) or they can be used to simulate the transition leading up to a
new climate (transient model). The scenario of a future climate equilibrium under
a doubling of atmospheric levels of carbon dioxide is an example of the former.
The output of the transient model is provided as a decadal average from the
present up to a decade when the climate is expected to achieve a new equilibrium.
With either type of simulation, many GCM variables, in particular precipitation,
are not easily verified. The GCM output is not sufficient for estimating future
variability; therefore it is not amenable to standard approaches of risk estimation.
Grey theory can be used to estimate an interval within which a variable is expected
to fall. This type of evaluation has been successfully applied in risk estimation in
agriculture (Bass et al., 1992).
Grey theory was first applied by Deng (1984) to deal with uncertainty in
systems analysis. In a grey theory approach, all components of a system are
divided into three categories: white (certain), grey (uncertain) and black
(unknown). Unlike a probability or fuzzy distribution, a grey set only has an
upper and lower limit, which can approximate uncertainty when the available data
are insufficient for standard stochastic approaches. In dealing with climate
scenarios, a grey systems approach could incorporate the data into a grey decision-
making model, or a grey model could be used to extrapolate an interval within
which a variable is expected to fall - a grey prediction model (GPM). This paper
is a preliminary exploration of the latter approach as a means of evaluating climate
scenarios derived from general circulation models (GCMs) for decision-making and
risk assessment.
A GPM is developed from an observed series of temperature and
precipitation (1965-1985). The GPM is used to extrapolate the series to 2013, and
the resulting grey interval is validated against observations and compared to the
Goddard Institute for Space Studies transient GCM (GISST). During the 1970's
and 1980's, the GISST decadal average precipitation exceeds the observed averages,
and the temperature change during the 1980's and 1990's is lower than what has
been observed. The grey interval is used to adjust the GISST scenario without
imputing the variability from the observed data onto the GCM scenario. The grey
decadal monthly averages are input into a water budget accounting model. The
grey water budget interval is interpreted as a range within which the surplus,
deficit and the potential evapotranspiration are expected to fall. The results are
compared with the water budgets derived from the GISST, the observed decadal
monthly averages, and three 2×CO2 equilibrium climate change scenarios based
on the GISS, Geophysical Fluid Dynamics Laboratory 1987 (GFDL87) and the
Oregon State University (OSU) GCMs (Cohen, 1989).
Although the use of grey theory in this context appears to be similar to time
series and other stochastic approaches, it can be applied without the required
assumptions of these other methods. It is also appropriate for examining data that
lack significant autocorrelation, and two different data sources can be incorporated
into one series. The GPM-based climate change scenario does not provide a
probability distribution, but the grey interval may be more appropriate since it is
difficult to make assumptions regarding the future variability of temperature and
precipitation. The analysis illustrates that grey theory may be an effective means
of using transient GCM scenarios, and of evaluating the 2×CO2 scenarios of other
GCMs. The analysis also suggests that the water balance may provide a better
means of detecting climate change, although this is only a preliminary result and
bears further investigation.

GREY THEORY MODELLING

Grey theory is a method for estimating and incorporating uncertainty when
the data are too sparse for the use of standard stochastic approaches (Deng, 1984).
It has proven to be an appropriate method for systems analysis in linear
programming models (Huang and Moore, 1993). Grey theory creates a model of
the data from a minimum and maximum value. Grey theory can also be used in a
prediction mode to estimate and extrapolate a series of maximum and minimum
values from a series of observations. In this approach the series is split into two
series of high and low values, and two extrapolated series are generated, thus
yielding a dynamic grey interval.

Let us consider a data set $X^{(0)}$ with n elements corresponding to n time
periods:

    X^{(0)} = \{ x^{(0)}(i) \mid i = 1, 2, \ldots, n \}    (1)

where $x^{(0)}(i)$ is the ith element corresponding to period i. The problem under
consideration, for the grey prediction model (GPM), is the prediction of $x^{(0)}(i)$ for i
> n when standard statistical approaches are not applicable. A GPM is introduced
which can effectively approximate the uncertainty existing in $X^{(0)}$. First, there are
some requisite definitions related to the GPM.

Definition 1. Let R denote the set of real numbers. A grey number $\otimes(y)$ is a closed
and bounded set of real numbers with a known interval but an unknown probability
distribution (Huang and Moore, 1993):

    \otimes(y) = [\underline{\otimes}(y), \overline{\otimes}(y)] = \{ y' \in R \mid \underline{\otimes}(y) \le y' \le \overline{\otimes}(y) \}    (2)

where $\underline{\otimes}(y)$ is the whitened lower bound of $\otimes(y)$ and $\overline{\otimes}(y)$ is the whitened upper
bound of $\otimes(y)$. When $\underline{\otimes}(y) = \overline{\otimes}(y)$, $\otimes(y)$ becomes a deterministic number, y.

Definition 2. A grey vector $\otimes(Y)$ is a tuple of grey numbers:

    \otimes(Y) = (\otimes(y_1), \otimes(y_2), \ldots, \otimes(y_m))    (3)

and a grey matrix $\otimes(Y)$ is a matrix whose elements are grey numbers:

    \otimes(Y) = [\otimes(y_{ij})]    (4)
The operations for grey vectors and matrices are defined to be analogous to those
for real vectors and matrices.
The concepts of the accumulated generating operation (AGO) and the inverse
accumulated generating operation (IAGO) are required for the GPM.

Definition 3. The rth AGO of $X^{(0)}$ is defined as follows (Huang, 1988):

    X^{(r)} = \{ x^{(r)}(i) \mid i = 1, 2, \ldots, n \}, \quad r \ge 1    (5a)

where:

    x^{(\rho)}(k) = \sum_{i=1}^{k} x^{(\rho-1)}(i), \quad k \in [1, n], \ \rho = 1, \ldots, r

The rth IAGO of $X^{(t)}$, $\alpha^{(r)}(X^{(t)})$, is defined in a similar manner (Huang, 1988):

    \alpha^{(r)}(X^{(t)}) = \{ \alpha^{(r)}(x^{(t)}(i)) \mid i = 1, 2, \ldots, n \}, \quad t \ge r    (5b)

where:

    \alpha^{(\rho)}(x^{(t)}(k)) = x^{(t-\rho)}(k), \quad k \in [1, n], \ \rho = 1, \ldots, r

The concept of the grey derivative is introduced as follows:

    d\otimes(x) / d\otimes(t) = \alpha^{(1)}(x(k+1)), \quad k \in [1, n]    (6)

where $\otimes(t) = [k, k+1]$. The support of $\otimes(x)$ corresponding to $\otimes(t)$ is defined as
follows:

    S[x(\otimes(t))] = [x(k+1) + x(k)] / 2    (7)

Thus, given the following GPM as a differential equation:

    d\otimes(x^{(1)}) / d\otimes(t) + a\, x^{(1)}(\otimes(t)) = b    (8)

we can convert it to:

    \alpha^{(1)}(x^{(1)}(k+1)) + a\, S[x^{(1)}(\otimes(t))] = b, \quad k \in [1, n-1]    (9)

Let:

    C = [a, b]^T    (10)

    Y^{(0)} = \{ x^{(0)}(i) \mid i = 2, 3, \ldots, n \}^T    (11)

    S[X^{(1)}] = \{ S[x^{(1)}(\otimes(t))] \text{ with } \otimes(t) = [k, k+1] \mid k = 1, 2, \ldots, n-1 \}    (12)

where $C^T$ is a vector of the parameters in (9); $Y^{(0)T}$ is a vector of $x^{(0)}(i)$ with
i = 2, 3, \ldots, n; and $S[X^{(1)}]^T$ is a vector of supports of $x^{(1)}(\otimes(t))$ corresponding
to $\otimes(t)$. Thus, we have:

    Y^{(0)} = a\, S[X^{(1)}] + b\, E = [S[X^{(1)}], E]\, [a, b]^T = [S[X^{(1)}], E]\, C    (13)

where $E = (1, 1, \ldots, 1)^T$.

Letting:

    B = [S[X^{(1)}], E]    (14)

where B is a matrix consisting of $S[X^{(1)}]$ and E, we have, by least squares:

    C = (B^T B)^{-1} B^T Y^{(0)}    (15)

Hence, $x^{(1)}(k+1)$, $\forall k$, can be obtained by solving (9). Thus, from the definition of
the IAGO, $x^{(0)}(k+1)$, $\forall k$, can be obtained from $x^{(1)}(k+1)$. Obviously, when k >
n-1, the obtained $x^{(0)}(k+1)$ provides a prediction of the x value in the future period
k+1.
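The entire fit-and-extrapolate cycle of Eqs. (5a)-(15) can be sketched as follows (a GM(1,1)-style implementation, not the authors' code; the minus sign in B follows the usual grey-model convention, and the closed-form solution of the differential equation (8) replaces a step-by-step recursion). The paper's dynamic grey interval is then obtained by fitting the model separately to the high and low sub-series of the observations:

    import numpy as np

    def gpm_fit_predict(x0, horizon):
        """Grey prediction in the spirit of Eqs. (5a)-(15).

        x0      : observed series x^(0)(1), ..., x^(0)(n)
        horizon : number of future periods to extrapolate
        Returns fitted/predicted x^(0) values for periods 2, ..., n+horizon.
        """
        x0 = np.asarray(x0, dtype=float)
        n = len(x0)
        x1 = np.cumsum(x0)                           # first AGO, Eq. (5a)
        s = 0.5 * (x1[1:] + x1[:-1])                 # supports, Eq. (7)
        B = np.column_stack([-s, np.ones(n - 1)])    # Eq. (14), usual sign
        a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]   # Eq. (15)
        k = np.arange(1, n + horizon)
        x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # solves Eq. (8)
        return np.diff(np.concatenate([[x0[0]], x1_hat]))  # IAGO, Eq. (5b)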

WATER BUDGET MODELLING

The Thornthwaite climatic water balance model produces a climate-based
accounting of the water gains and losses at a location or for a region. The air
temperature and precipitation are used to compute the water budget, estimating soil
moisture deficit and runoff as residuals (Mather, 1978). Although the
Thornthwaite model does not incorporate wind effects, or the direct effect of
elevated levels of CO2 on transpiration, it has been found to provide reasonably
reliable estimates of water balance components in most climates (Mather, 1978).
The model can be used to compute daily water budgets, but the monthly water
budget mode is used here in order to utilize the transient GCM output and for
comparison with other water balance studies in the same area (Cohen, 1991). The
soil moisture capacity was assumed to be 100 mm, and the minimum mean
monthly temperature required for snowmelt was 0.1°C, in order to match the
assumptions used by Cohen (1991) in a GCM-based assessment of water balance
in the same area.
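A simplified monthly sketch of this accounting follows (an assumed reading, not the model actually run in the study: it uses the unadjusted Thornthwaite PE, omits the day-length correction and the snow storage implied by the 0.1°C snowmelt threshold, and starts with a full soil column):

    import numpy as np

    def thornthwaite_budget(temp_c, precip_mm, capacity=100.0):
        """Annual PE, deficit and surplus from 12 monthly values.

        temp_c    : monthly mean air temperatures (deg C); at least one
                    month must be above 0 deg C
        precip_mm : monthly precipitation totals (mm)
        capacity  : soil moisture capacity (100 mm, as in the paper)
        """
        t = np.maximum(np.asarray(temp_c, dtype=float), 0.0)
        I = np.sum((t / 5.0) ** 1.514)               # annual heat index
        a = 6.75e-7 * I**3 - 7.71e-5 * I**2 + 1.792e-2 * I + 0.49239
        pe = 16.0 * (10.0 * t / I) ** a              # unadjusted PE, mm/month
        soil, deficit, surplus = capacity, 0.0, 0.0
        for p, e in zip(precip_mm, pe):
            if p >= e:                               # wet month: recharge soil
                soil += p - e
                surplus += max(soil - capacity, 0.0)
                soil = min(soil, capacity)
            else:                                    # dry month: draw down soil
                draw = min(soil, e - p)
                soil -= draw
                deficit += (e - p) - draw
        return pe.sum(), deficit, surplus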

STUDY LOCATION

The analysis was carried out at Cold Lake, Alberta. This location was chosen
because it is adjacent to the Saskatchewan River sub-basin that Cohen (1991)
examined using two GCM scenarios and the Thornthwaite water balance model.
Being relatively close to the Mackenzie Basin, which is the focus of a major
climate impact assessment, the results may also have bearing on future research in
that area. Cold Lake is also situated half-way between two GISS-GCM grid
points, which are used as a geographic "grey interval" in the water balance model.
Although the Thornthwaite model may not be appropriate for locations at these
latitudes (50°N - 58°N is at the northerly edge of acceptability for the
Thornthwaite model), this study is only an exploratory evaluation of the technique.

DATA

The GPM is computed with 20 years of monthly temperature and precipitation data
(1966-1985) for Cold Lake, Alta. The grey temperature model is validated against
four years of data (1986-1989), and the precipitation grey model is tested against
six years of data (1986-1991). The water budget model is run with the GPM and
input from two grid points from the GISS transient GCM (GISST). The two grid
points are almost equidistant, directly north (110°W, 58°N) and south
(110°W, 50°N) of Cold Lake, Alta., which is situated at 110°W, 54°N. The
temperature and precipitation output from the GISST are only available as decadal
averages, and for this study the first four decades (1970-2010) are compared to
the GPM for the same time period. Three water budget components are compared
to the GISST and the three equilibrium scenarios for the decade of the 2040's. The
three equilibrium scenarios (GFDL87, GISS, OSU) have been interpolated to
110°W, 54°N.

RESULTS

The grey prediction model (GPM)

The GPM was developed for monthly temperature and precipitation data (1965-
1985). The interval is validated against observations for the years 1986-1989 for
temperature (Figure 1) and 1986-1991 for precipitation (Figure 2).

Figure 1. Observed and grey (Grey-Max, Grey-Min) monthly temperature (°C), 1986-1989.

Figure 2. Observed and grey (Grey-Max, Grey-Min) monthly precipitation (mm), 1986-1991.
The grey model interval appears to be a valid description of the monthly
temperature and precipitation at Cold Lake, although it does not incorporate all of
the summer peaks in precipitation.
The decadal grey interval averages from the GPM are compared to the
GISST-GCM at the two grid points for the 1970's through the first decade of the
twenty-first century (Figure 3). Climatological analysis of the annual and spring
temperature departures from normal, for the climate region enclosing Cold Lake,
indicates that there are significant positive anomalies during the 1980's and the early
part of the 1990's (Environment Canada, 1992; personal communication), which are
reflected in the GPM but not in the GISST output.

Figure 3. Decadal monthly average temperature (°C): Observed, GCM 110°W/50°N, GCM 110°W/58°N, Grey-Max, Grey-Min.

The GISST precipitation is greater than both the observed and the GPM
precipitation for all four decades, except during the summer months in the first
decade of the twenty-first century (Figure 4).

Figure 4. Decadal monthly average precipitation (mm), 1970's-2000's: Observed, GISST 110°W/50°N, GISST 110°W/58°N, Grey-Max, Grey-Min.
Water budget modelling

Three variables are extracted from the water balance modelling: the annual
potential evapotranspiration (PE), the deficit (D) and the surplus (S). The water
budget model is run using decadal averages derived from observations (1970's and
1980's), the GPM and the GISST-GCM (Table 1). During the 1970's and 1980's
the observed PE falls within the grey interval; for the 1980's, the observed
deficit falls within the grey interval, while it is smaller than the lower grey range
for the 1970's. This is most likely due to the higher monthly average rainfall
between April and August in the 1970's. During the same period the water budget
derived from the GISST temperature and precipitation produces the opposite result.

During the next two decades the grey deficits decrease (95.9 to 25.9 mm) and a
small surplus is evident (20.1 to 93.1 mm). The PE is slightly larger throughout the
four decades at the high end of the grey interval (530.6 to 595.6 mm), while it
remains almost constant at the lower end of the interval. This result reflects the
fact that the grey temperature interval is quite small (Figure 3), while the grey
precipitation interval is much larger (Figure 4).

TABLE 1. Annual water budget (mm)

                        PE       D        S
1970  OBS              522.1    -63.1     0.0
      GREYMAX          530.6    -95.9     0.0
      GREYMIN          513      -148.9    0.0
      GISST 50N        467.8    -1.7      396.4
      GISST 58N        402.8    -0.2      347.9

1980  OBS              530.4    -140.5    0.0
      GREYMAX          543.6    -65.9     0.0
      GREYMIN          515.5    -230.4    0.0
      GISST 50N        478.2    -38.1     339.5
      GISST 58N        406.8     0.0      398.1

1990  GREYMAX          565.3    -35.4     20.9
      GREYMIN          513.8    -264.9    0.0
      GISST 50N        477.3    -17.1     380.3
      GISST 58N        414.9     0.0      365.3
      GISST 50N(A)     536      -126.6    23.2
      GISST 58N(A)     457.2    -110.5    0.0

2000  GREYMAX          595.6    -25.9     93.1
      GREYMIN          519      -297.8    0.0
      GISST 50N        495.8    -8.6      405.6
      GISST 58N        435.6    -2.7      329.8
      GISST 50N(A)     645      -153      16.1
      GISST 58N(A)     549.3    -173.2    0.0

The GISST water budgets also exhibit patterns similar to the 1970's and
1980's over the next two decades. At 110°W, 50°N the surplus increases from the
1980's through the twenty-first century due to an increase in spring and summer
precipitation (Figure 4). At 110°W, 58°N, the surplus decreases for the same period
due to smaller levels of summer precipitation. Since the GPM appears to be a
valid description of the monthly temperature and precipitation at Cold Lake during
the 1970's and 1980's, it is used to adjust the GISST temperature upward and the
GISST precipitation downward (Appendix 1). At both grid points (GISSTA) the
PE increased, and at both grid points the deficits and surplus are now within the
respective grey intervals. The GISSTA water budget corresponds more closely
than the unadjusted GISST to the GISS, GFDL87, and OSU 2XCO2 equilibrium
scenarios for the Cold Lake location (Cohen, 1989; 1991).

Analysis of Climate-Dependent Decisions

Bass et al. (1994) present a method for evaluating data quality for weather-
dependent decisions. This framework presents data uncertainty as a numerical
interval which a decision maker interprets as encompassing the actual or "true"
value for a decision. How much risk a user is willing to accept depends on the
importance of the decision and on the size of interval that is acceptable.
The annual grey deficit, surplus and PE, both the GISST and GISSTA
scenarios, and the three equilibrium scenarios are plotted in Figures 5-7. The
GFDL87, GISS and OSU scenarios are plotted for the 2040's, since their climates
are supposed to be representative of some future equilibrium. In addition, the
GISST values for the 2040's are also plotted. For each water budget component,
the GISST scenario is outside of the grey interval. In each figure the grey interval
is projected toward the three equilibrium scenarios, although there are obvious limits
in projecting the grey interval to the 2040's.
In Figure 5, the grey interval is probably too large for an effective decision
(small level of risk) for the 1990's. Nevertheless, it demonstrates that the GISSTA
deficit is within the grey interval, although the high end is very close to the GISST
deficit as well. Figure 6 provides a more reasonable decision interval,
encompassing the GISSTA surplus, for the first decade of the twenty-first century,
that is clearly separated from the GISST surplus. In addition, the grey water budget
points in the general direction of the three equilibrium scenarios. Figure 7
provides a clear evaluation of the quality of both the GISST and the GISSTA PE.

Figure 5. Annual deficits (1970-2040): Grey-Max and Grey-Min intervals, GISST and GISST(A) at 110°W 50°N and 58°N, and the equilibrium scenarios.

Figure 6. Annual surplus (1970-2040): Grey-Max and Grey-Min intervals, GISST and GISST(A) at 110°W 50°N and 58°N, and the equilibrium scenarios.



Figure 7. Potential evapotranspiration (1970-2040): Grey-Max and Grey-Min intervals, GISST and GISST(A) at 110°W 50°N and 58°N.

In this case, most of the scenarios, all the GISST and two of the GISSTA,
fall outside of the grey interval and would most likely be rejected. The grey water
budget interval also indicates that beyond the first decade of the twenty-first
century, neither the GISST nor the GISSTA appears to be valid. However, the grey
water budget points in the direction of the three equilibrium scenarios, and this
includes the GISST for 110°W, 50°N.

CONCLUSIONS

The results of this analysis suggest that the grey prediction model may be an
appropriate tool for evaluating the risks associated with a GCM. The
GPM adequately represented the decadal averages for temperature and precipitation
for the 1970's and the 1980's. Preliminary analysis of temperature trends in north-
western Canada in the early 1990's suggests that spring temperatures have been
anomalously warm, which is also reflected in the GPM. The grey water budget
components also enclosed the water budgets based on observations for the same
period. Assuming that the GPM is valid for the 1990's and the first decade of the
twenty-first century, it provides a means of evaluating scenarios of the PE and the
surplus. However, the grey deficit interval is most likely too large to provide a
useful evaluation of deficit scenarios in the twenty-first century. In addition, the
grey PE and surplus intervals also point in the general direction of the three
equilibrium 2XCO2 scenarios, although it would be premature to suggest that these
scenarios will remain valid for Cold Lake, Alberta in the 2040's. While the GPM

appears to be valid for the monthly temperature and precipitation data at Cold
Lake, further testing at other sites and through the 1990's is required in order to
generalize these results.

REFERENCES

Bass, B., Russo, J.M. and Schlegel, J.W. (1994) "Data Quality in Weather-
Dependent Decisions" (in press).

Cohen, S.J., Welsh, L.E. and Louie, P.Y.T. (1989). Possible impacts of climatic
warming scenarios on water resources in the Saskatchewan River sub-basin.
Canadian Climate Centre, Report No. 89-9. Available from Climatological
Services Division, AES, Downsview, Ontario, Canada.

Cohen, S.J. (1991) "Possible impacts of climatic warming scenarios on water
resources in the Saskatchewan River sub-basin, Canada", Climatic Change 19,
291-317.

Deng, J. (1984) The Theory and Methods of Socio-economic Grey Systems (in
Chinese), Science Press, Beijing, China.

Environment Canada (1992) The State of Canada's Climate: Temperature Change
in Canada 1895-1991. SOE Report 92-2.

Huang, G.H. (1988) "A Grey Systems Analysis Method for Predicting Noise
Pollution in Urban Areas", The Third National Conference on Noise Pollution
Control, Chengdu, Sichuan, China (in Chinese).

Huang, G.H. and Moore, R.D. (1993) "Grey linear programming, its solving
approach, and its application", International Journal of Systems Science 24,
159-172.

Mather, J.R. (1978) The Climatic Water Balance in Environmental Analysis.
Lexington Books, Lexington, Mass., USA.

APPENDIX I

The GISST outputs were adjusted using the grey white mid value (WMV)
and the half-width between the two GISST grid points. The WMV is analogous to
a grey mean and was defined for each decade. A mean value was defined for
each month,

(16)

where i represents the month. The WMV is defined as

(17)

Similarly, a mean GISST value was defined for each decade. Each monthly mean
GCM precipitation value was adjusted by subtracting the difference between
the GCM mean and the WMV. For the GCM temperature values, this difference
was added to each monthly mean. The values for each grid point were recreated
by adding the GCM half-width to the adjusted monthly mean for 110°W, 50°N and
subtracting this value from the adjusted monthly mean for 110°W, 58°N. The half-
width is defined as
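A rough Python sketch of the adjustment procedure described above; since the printed equations are not legible in this copy, it assumes that the WMV is the midpoint of the monthly grey interval, that the adjustment shifts each monthly GCM mean onto the WMV, and that the half-width is half the spread between the two grid-point values. The function and variable names are hypothetical.

    import numpy as np

    def adjust_gisst(grey_max, grey_min, gisst_50n, gisst_58n):
        # White mid value (WMV): assumed to be the midpoint of the monthly
        # grey interval (the printed equations (16)-(17) are unreadable here).
        wmv = 0.5 * (np.asarray(grey_max) + np.asarray(grey_min))
        # Mean GCM value across the two grid points bracketing Cold Lake.
        gcm_mean = 0.5 * (np.asarray(gisst_50n) + np.asarray(gisst_58n))
        # Subtracting (GCM mean - WMV) for precipitation, or adding
        # (WMV - GCM mean) for temperature, both shift the mean onto the WMV.
        adjusted_mean = wmv
        # Assumed half-width: half the spread between the two grid points.
        half_width = 0.5 * np.abs(np.asarray(gisst_50n) - np.asarray(gisst_58n))
        # Recreate the adjusted grid-point values (50N above, 58N below).
        return adjusted_mean + half_width, adjusted_mean - half_width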
A NONPARAMETRIC RENEWAL MODEL FOR MODELING DAILY
PRECIPITATION

Balaji Rajagopalan, Upmanu Lall and David G. Tarboton


Utah Water Research Laboratory
Utah State University, Logan, UT - 84322-8200
USA

ABSTRACT

A nonparametric wet/dry spell model is developed for describing daily precipitation at a


site. The model considers alternating sequences of wet and dry days in a given season of
the year. All the probability densities of interest are estimated nonparametrically using
kernel probability density estimators. The model is data adaptive, and yields stochastic
realizations of daily precipitation sequences for different seasons at a site. Applications of
the model to data from rain gauges in Utah indicate good performance of the model.

INTRODUCTION

Stochastic models for precipitation occurrence at a site have a long, rich history in
hydrology. The description of precipitation occurrence is a challenging problem since
precipitation is an intermittent stochastic process that is usually nonstationary and can exhibit
clustering, scale dependence, and persistence in time and space. Our particular interest is in
developing a representation for daily precipitation in mountainous regions in the western
United States.
Webb and Betancourt (1992) note that a mixture of markedly different mechanisms leads to the
precipitation process in the western United States over the year and even within a given
season. A rigorous attack on the problem would perhaps need to consider the classification
of different precipitation regimes at different time scales, the identification of such classes
from available data, and the specification of a stochastic model that can properly reproduce
these at a variety of time scales. Our focus is on developing appropriate tools to analyze the
raw daily data without deconvolution of the mixture based on synoptic weather
classification.
In most traditional stochastic models, probability distributions are assumed for the
length of wet or dry spells and also for the precipitation amounts. While such distributions
may fit the data reasonably well in some situations and for some data sets, it is rather
disquieting to adopt them by fiat. It is our belief that hydrologic models should (a) show
(rather than obscure) the interesting features of the data; (b) provide statistically consistent
estimators; and (c) be robust. Consistency implies that the estimates converge in probability
to the correct behaviour. The standard practice of assuming a distribution and then
calibrating the model to it clearly obscures features of the data and may not lead to a
consistent estimator from site to site. This is particularly relevant where the underlying
process is represented by a mixture of generating processes and is inhomogeneous. The
issue of interest is not the best fit of a model but the ability to represent a heterogeneous
process in a reasonable manner.
This motivates the need for a stochastic model for the generation of synthetic precipitation
sequences that is conceptually simple, theoretically consistent, allows the data to determine
its structure as far as possible, and accounts for clustering of precipitation events and
process heterogeneity.
Here we present results from a nonparametric seasonal wet/dry spell model that is
capable of considering an arbitrary mixture of generating mechanisms for daily precipitation
and is data adaptive.
The model yields stochastic realizations of daily precipitation sequences for different
seasons at a site that effectively represent a smoothed bootstrap of the data and are thus
equivalent in a probabilistic sense to the single realization observed at the site. The
nonparametric (kernel) probability density estimators considered in the model do not
assume the form of the underlying probability density, rather they are data driven and
automatic. The model is illustrated through application to data collected at Woodruff, Utah.

MODEL FORMULATION

The random variables of interest are the wet spell length, w days, dry spell length, d
days, daily precipitation, p inches, and the wet spell precipitation amount, Pw inches.
Variables w and d are defined on the set of integers greater than 1 (and less than the
season length), and p and Pw are defined as continuous, positive random variables. The
year is divided into four seasons, viz., Season I (January - March), Season II (April -
June), Season III (July - September), and Season IV (October - December). The
precipitation process is assumed to be stationary within these seasons. Precipitation
measurements are usually rounded to measurement precision (e.g., 0.01 inch increments).
We do not expect the effect of such quantization of the data to be significant relative to the
scale of the precipitation process, and treat precipitation as a continuous random variable. A
mixed set of discrete and continuous random variables is thus considered. The precipitation
process over the year is shown in Figure 1.

Figure 1. Precipitation process over the year.

The key feature of the model is the nonparametric estimation of the probability density
function (using kernel density estimators) for the variables of interest, rather than fitting
parametric probability densities. The reader is referred to Silverman (1986) for a pragmatic
treatment of kernel density estimation and examples of applications to a number of areas.
The model is applied to daily precipitation for each season. The pdfs estimated for
each season are f(w), the pdf of wet spell length; f(d), the pdf of dry spell length; and
f(p), the pdf of daily precipitation amount. Kernel density estimators are used to estimate
the pdfs of interest from the data set.
Synthetic precipitation sequences are generated continuously from season to season,
following the strategy indicated in Figure 2. A dry spell is first generated using f(d), a wet
spell is next generated using f(w). Precipitation for each of the w wet days is then
generated from f(p). The process is repeated with the generation of another dry spell. If a
season boundary is crossed, the pdfs used for generation are switched to those for the new
season.
For the univariate continuous case, the random variable of interest (p) is generated from
the kernel density estimate following a two step procedure given by Devroye (1986, p.
765) and also in Silverman (1986). The discrete variables (w and d) are generated
from the cumulative mass function.
The above procedure neglects correlation between sequential wet and dry spell lengths
and correlation between daily rainfall amounts within a wet spell. These correlations can be
incorporated through the use of conditional pdf's and the disaggregation of total wet spell
precipitation into daily amounts (Lall and Rajagopalan, in preparation).
For the data sets analysed here, all the correlations mentioned above were found to be
insignificant. Consequently we did not use conditional pdfs and disaggregation here.
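A minimal sketch of the two generation steps, under simplifying assumptions (a Gaussian kernel is used for convenience, whereas the paper works with Epanechnikov kernels and log-transformed precipitation, as described in the next section). The two step procedure amounts to resampling an observation at random and perturbing it with bandwidth-scaled kernel noise; the discrete spell lengths are drawn by inverting the cumulative mass function.

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_kde(data, h, size):
        # Two step smoothed bootstrap (Devroye, 1986; Silverman, 1986):
        # (1) pick an observation uniformly at random, (2) add kernel noise
        # scaled by the bandwidth h. A Gaussian kernel is assumed here.
        picks = rng.choice(np.asarray(data), size=size)
        return picks + h * rng.standard_normal(size)

    def sample_pmf(values, probs, size):
        # Draw discrete spell lengths by inverting the cumulative mass function.
        cmf = np.cumsum(probs) / np.sum(probs)
        return np.asarray(values)[np.searchsorted(cmf, rng.random(size))]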

Figure 2. Structure of the renewal model for daily precipitation: a dry spell of d days is drawn from f(d), a wet spell of w days from f(w), and independent daily precipitation for the w wet days from f(p).

Kernel estimation of continuous univariate PDF


The continuous, univariate pdf of interest is f(p), the pdf of daily precipitation for each
season. The kernel density estimator (Rosenblatt, 1956) is defined as:

    f_n(p) = (1/(nh)) Σ_{i=1}^{n} K((p - p_i)/h)                          (2.1)

This estimates the probability density f(p) based on n observations p_i. K(.) is a kernel
function defined to be positive and symmetric, with unit integral and finite variance.
These requirements ensure that the resulting kernel density estimate is a valid
density. The symmetry condition is not essential, but is used to avoid bias. The subscript
'n' emphasizes that this is an estimate based on 'n' data points. The bandwidth parameter h
controls the amount of smoothing of the data in the density estimate. An estimator with
constant bandwidth h is called a fixed kernel estimator. Commonly used kernels are:

Gaussian kernel:        K(t) = (2π)^(-1/2) exp(-t^2/2)                    (2.2a)
Epanechnikov kernel:    K(t) = 0.75 (1 - t^2),       |t| ≤ 1              (2.2b)
Bisquare kernel:        K(t) = (15/16) (1 - t^2)^2,  |t| ≤ 1              (2.2c)

One can see from Equation 2.1 that the kernel estimator is a convolution estimator. This
is illustrated in Figure 3. The kernel density estimate can also be viewed as a smoothing of
the derivative of the empirical distribution function of the data.
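As a concrete illustration, Equation (2.1) can be evaluated on a grid of points in a few lines; the sketch below uses the Epanechnikov kernel of Equation (2.2b), and the function and variable names are hypothetical.

    import numpy as np

    def epanechnikov(t):
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

    def kde(p_grid, data, h):
        # Equation (2.1): fn(p) = (1/(nh)) sum_i K((p - p_i)/h),
        # evaluated for every grid point against every observation.
        u = (np.asarray(p_grid)[:, None] - np.asarray(data)[None, :]) / h
        return epanechnikov(u).sum(axis=1) / (len(data) * h)

For example, kde(np.linspace(0.01, 0.6, 60), data, 0.05) would evaluate a density estimate on a grid of daily precipitation values.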

Figure 3. Example of a kernel pdf using 5 equally spaced values with the bisquare kernel and a fixed bandwidth (h = 4). Note that x is assumed to be a continuous variable.

The choice of the bandwidth and kernel can be optimized through an analysis of the
asymptotic mean square error (MSE), E[(f(p) - f_n(p))^2], or the mean integrated square
error (MISE), the integral of the MSE over the domain. Under the requirements that the
kernel be positive and symmetric, with unit integral and finite variance, Silverman (1986,
p. 41) shows that the optimal kernel in terms of minimizing MISE is the Epanechnikov
kernel. However, it is only marginally better than the others listed above. Silverman
(1986, Eqn. 3.21) shows that the optimal bandwidth, h_opt, is a function of the unknown
density f(p). In practice a certain distribution is assumed for f(p) and the MISE is
minimized to obtain the optimal bandwidth h_opt with reference to the assumed
distribution. Kernel probability density estimation can also be improved by taking h to be
variable, so that the smoothing is larger in the tails where data is sparse, and less where
the data is dense.
A number of bandwidth selection methods have historically been used, like the
cross validation methods (maximum likelihood and least squares cross validation; see
Silverman (1986), Sec. 3.4). These methods are prone to undersmoothing (Silverman,
1986). This is pronounced when the data is concentrated near a boundary. This is the case
with precipitation, where there is a finite lower bound (precipitation > 0) to the domain.
Symmetric kernels near the boundary can violate this. One approach is to relax the
symmetry constraint and use boundary kernels such as suggested by Muller (1991). Here,
however, we chose to avoid the issue by working in a log transformed variable space. A
fixed bandwidth kernel density estimate (Eqn. 2.1) is applied to ln(p) and the resulting
probability density is back transformed, to get:

    f_n(p) = (1/p) (1/(nh)) Σ_{i=1}^{n} K((ln p - ln p_i)/h)              (2.3)

h was chosen as optimal with reference to the normal distribution in the log space.
Epanechnikov kernels were used. The optimal bandwidth is (using Silverman 1986,
Eqn. 3.1)

    h_p = 2.125 σ n^(-1/5)                                                (2.4)

where σ is the standard deviation of the log transformed data. This method provides
adaptability of bandwidth and also gets around the boundary issue. Figure 4(a) shows this
method applied to precipitation data collected at Woodruff, Utah over the years 1948-1989
for Season 1 (Jan.-Mar.). Note that the estimate follows the data, as reflected by the
histogram, well. There are differences between the kernel estimate and a fitted exponential
distribution, but from Figure 4(a) it is hard to see which is better. Figure 4(b) shows the
cumulative distribution functions obtained by integrating the probability density function.
Both the kernel and exponential cdf estimates are compared to the empirical cdf using the
Weibull plotting position (i/(n+1)), with 95% confidence limits ±d_α set up around the
empirical cdf, calculated using (Kendall and Stuart, 1979, p. 481)

    d_α = 1.3581 n^(-1/2)                                                 (2.5)

It can be seen from Figure 4(b) that the cdf from the exponential distribution leaves the 95%
confidence interval, while that from the kernel estimator lies entirely within it. This suggests
that the true density function of the data is different from exponential.
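A sketch of the log-space estimator, combining Equations (2.1), (2.3) and (2.4) under the assumptions stated in the text (Epanechnikov kernel, normal-reference bandwidth in the log space); names are hypothetical.

    import numpy as np

    def kde_log_space(p_grid, precip):
        # Work with x = ln(p); Equation (2.4) supplies the bandwidth.
        x = np.log(np.asarray(precip, dtype=float))
        n = x.size
        h = 2.125 * x.std(ddof=1) * n ** (-0.2)
        # Fixed-bandwidth Epanechnikov estimate of the density of ln(p).
        u = (np.log(np.asarray(p_grid))[:, None] - x[None, :]) / h
        k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
        f_log = k.sum(axis=1) / (n * h)
        # Back transform, Equation (2.3): f(p) = f_ln(ln p) / p.
        return f_log / np.asarray(p_grid)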

Kernel estimation of discrete univariate PDF


The discrete, univariate probability mass functions (pmf's) of interest are f(d) and f(w)
for each season. In a traditional alternating renewal model, the wet spell length and dry
spell length are assumed to be continuous and independent variables, often assumed to be
exponentially distributed. Roldan and Woolhiser (1982) consider wet and dry spells to be
discrete variables and assume a geometric distribution. Indeed, to account for clustering,
one could sample the dry spells from two geometric distributions with switching based on a
Markov Chain, as done by Foufoula-Georgiou and Lettenmaier (1986).
However, the kernel method allows the representation of an arbitrary structure or
appropriate degree of mixing of distributions that is "honest" to the data, and provides a
good, alternate building block. Even under the assumption of independence of w and d,
the kernel estimator will tend to reproduce wet spell lengths and dry spell lengths with
relative frequencies that match those in the historical data set. One nonparametric estimator
of the discrete probability distribution of w or d is the maximum likelihood estimator,
which yields directly the relative frequencies (e.g., the number of occurrences of the ith
wet spell length w_i divided by the sample size n). The kernel method is better, because (a)
it allows interpolation and extrapolation of probabilities to spell lengths that were
unobserved in the sample, and (b) it has higher MSE efficiency.
Wang and Van Ryzin (1981) developed geometric kernels for discrete variables.
Simonoff (1983) has developed a maximum penalized likelihood (MPLE) estimator for
discrete data. Hall and Titterington (1987) show how discrete kernel estimators can be
formed continuous kernels evaluated at discrete points and renormalized. These three
methods were compared and we found that the Hall and Titterington (1987) approach
worked best. The geometric tended to undersmooth, while MPLE oversmoothed.
The Hall and Titterington (1987) estimator is similar to the estimator in Equation (2.1),
but with an adjustment for use with discrete data. It is given as:

    f(w) = (h/n) Σ_{i=1}^{n} K_h(h(w - w_i))                              (2.6)

where h is the bandwidth, h ∈ (0,1], and K_h(h(w - w_i)), which is evaluated only at
discrete (integer) values j = w - w_i, is defined as

    K_h(jh) = s(h) K(jh)                                                  (2.7)

K(jh) is the kernel and s(h) is the scale factor that rescales the discrete kernel to sum
to unity:

    s(h) = {h Σ_j K(jh)}^(-1)                                             (2.8)

The sum is effectively over all integers j in (-h^(-1), h^(-1)). For h = 1 the kernel
estimator equals the cell proportion estimator. Note that in Equation (2.6) the pmf f(w) is
conditional on the bandwidth h, i.e., it is f(w|h). The bandwidth h is selected as the
minimizer of a cross validation function, suggested as

    CV(h) = Σ_w f^2(w|h) - (2/n) Σ_{i=1}^{n} f_{-i}(w_i|h)                (2.9)

where f(w|h) is estimated using Equation (2.6), while f_{-i}(w_i|h) is also estimated using
Equation (2.6) but with the point w_i dropped. They proved that the above cross-validation
automatically adapts the estimator to an extreme range of sparseness types. If the data is
only slightly sparse, cross-validation will produce an estimator which is virtually the same
as the cell proportion estimator. As sparseness increases, cross-validation will
automatically supply more and more smoothing, to a degree which is asymptotically
optimal. To alleviate the boundary problem, Dong and Simonoff (1992) have developed
boundary kernels for most of the commonly used kernels. Dong and Simonoff (1993) have
successfully tested the boundary kernels on data sets that are similar to the ones we have.
We have used the bisquare kernel (as defined in Equation 2.2c) and the corresponding
boundary kernels for our analysis.
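A sketch of the Hall and Titterington estimator and its cross-validated bandwidth (Equations 2.6-2.9), using the bisquare kernel. The boundary-kernel correction described above is omitted for brevity, so this is not the full estimator used in the paper, and the names are hypothetical.

    import numpy as np

    def bisquare(t):
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1.0, (15.0 / 16.0) * (1.0 - t ** 2) ** 2, 0.0)

    def discrete_kde(w_grid, spells, h):
        # Equations (2.6)-(2.8): rescale the kernel so that the pmf sums to one.
        j = np.arange(-int(1.0 / h), int(1.0 / h) + 1)
        s = 1.0 / (h * bisquare(j * h).sum())
        diffs = np.subtract.outer(np.asarray(w_grid), np.asarray(spells))
        return (h / len(spells)) * (s * bisquare(h * diffs)).sum(axis=1)

    def cv_score(spells, h):
        # Equation (2.9): leave-one-out least squares cross validation.
        spells = np.asarray(spells)
        w_grid = np.arange(spells.min(), spells.max() + 1)
        term1 = (discrete_kde(w_grid, spells, h) ** 2).sum()
        term2 = sum(discrete_kde([w], np.delete(spells, i), h)[0]
                    for i, w in enumerate(spells)) / len(spells)
        return term1 - 2.0 * term2

Minimizing cv_score over a grid of h in (0, 1] then selects the bandwidth.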
Figures 4(c) and (d) illustrate this approach applied to wet and dry spells.

Figure 4(a). PDF of daily precipitation for Season 1 (histogram of historical data, kernel estimated PDF and fitted exponential distribution; 718 data points). Figure 4(b). CDF of daily precipitation for Season 1 (kernel and exponential CDFs against the empirical CDF with 95% confidence limits). Figure 4(c). Pmf of wet spell length for Season 1 (observed proportions, kernel estimate and fitted geometric distribution). Figure 4(d). Pmf of dry spell length for Season 3 (observed proportions, kernel estimate and fitted geometric distribution).

In Figure 4(c) there is no appreciable difference between the kernel estimate and the fitted
geometric distribution. In Figure 4(d) the kernel estimate is seen to be a better smoother of
the observed proportions than a fitted geometric distribution.
The above results indicate that the kernel estimators provide a flexible, adaptive
representation of the underlying structure, under weaker assumptions (e.g., continuity,
smoothness) of the density than classical parametric methods.

Simulation Results

The above pdf and pmf estimates were used in the simulation model described earlier,
applied to the Woodruff, Utah data. In order to test the synthetic generation of the model,
the following statistics were computed for comparison with the historical record.

1. Probability distribution function, mean, standard deviation and probability mass
   function of dry and wet spells per season.
2. Length of longest wet and dry spell per season.
3. Mean and standard deviation of daily precipitation per season.
4. Probability density function of daily precipitation per season.
5. Maximum daily precipitation per season.
6. Percentage of yearly precipitation per season.
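Each of these statistics yields one value per simulation; a small sketch, with hypothetical names, of the five-number summary used in the boxplot comparisons described below:

    import numpy as np

    def boxplot_summary(stat_samples):
        # Interquartile box, 95% whiskers, and points outside the 95% range,
        # matching the boxplot convention described in the text.
        s = np.sort(np.asarray(stat_samples, dtype=float))
        q25, q50, q75 = np.percentile(s, [25.0, 50.0, 75.0])
        lo, hi = np.percentile(s, [2.5, 97.5])
        return {"median": q50, "box": (q25, q75), "whiskers": (lo, hi),
                "outside": s[(s < lo) | (s > hi)]}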

Twenty-five simulations were made and the above statistics were calculated for the three
variables. They are plotted along with the historical statistics as boxplots. The box in the
boxplots indicates the interquartile range of the statistic computed from the twenty-five
simulations, while the lines extending outward from the boxes go up to the 95% range of the
statistic. The dots are the values of the statistic that fall outside the 95% range. The black
dot joined by solid lines is the statistic of the historical record. The boxplots show the range
of variation in the statistics from the simulations and also show the capability of the
simulations to reproduce the historical statistics.
Figures 5, 6 and 7 show the boxplots of the various statistics, for each season, for the
three variables: daily precipitation, wet spell length and dry spell length, respectively. It
can be seen from these figures that the simulation procedure reproduces the
characteristics well. More wet and dry spells are simulated than appear in the historic
record, because the historic data contain many missing values, which results in fewer wet
and dry spells, while simulations are made for the entire length of the record. This
introduces a small bias, as a result of which the historical statistics tend to fall outside the
boxes in the boxplots. This can be observed in Figures 6(c) and 7(a).
Thus, the model provides a promising alternative to the parametric approach. The
assumption-free, data-adaptive nature of the nonparametric estimators makes the model
more robust to distributional assumptions.
Further work is required by way of analysing more data sets and comparing with
traditional stochastic models such as the Markov chain and Markov renewal models. The
model is being generalized by Lall and Rajagopalan (in preparation) to handle situations
where the correlations between the variables are significant.

Figure 5. Boxplots of (a) mean of daily precipitation, (b) standard deviation of daily precipitation, (c) percentage of yearly precipitation, and (d) maximum of daily precipitation, by season.

Figure 6. Boxplots of (a) mean, (b) standard deviation, and (c) maximum of wet spell length, by season.

Figure 7. Boxplots of the corresponding statistics of dry spell length, by season.

Acknowledgements

Partial support of this work by the U.S. Forest Service under contract notes INT-
915550-RJVA and INT-92660-RJVA, Amend #1, is acknowledged. The principal
investigator of the project is D.S. Bowles. We are grateful to J. Simonoff, J. Dong, H.G.
Muller, M.C. Jones, M. Wand and S.J. Sheather for stimulating discussions, provision of
computer programs and relevant manuscripts. The work reported here was supported in
part by the USGS through their funding of the second author's 1992-93 sabbatical leave,
when he worked with BSA, WRD, USGS, National Center, Reston, VA.

REFERENCES
Devroye, L. (1986) Non-uniform random variate generation, Springer-Verlag, New York.

Dong, J. and Simonoff, J.S. (1992) "On improving convergence rates for
ordinal contingency table cell probability estimation", unpublished report.

Dong, J. and Simonoff, J.S. (1991) "The construction and properties of boundary
kernels for sparse multinomials", unpublished report.

Foufoula-Georgiou, E. and Lettenmaier, D.P. (1986) "Continuous-time versus discrete-
time point process models for rainfall occurrence series", Water Resources Research 22(4),
531-542.

Hall, P. and D.M. Titterington. (1987) "On smoothing sparse multinomial data",
Australian Journal of Statistics 29(1), 19-37.

Kendall, Sir M. and A. Stuart. (1979) Advanced Theory of Statistics, Vol. 2, Macmillan
Publishing Co., New York.

Lall, U. and B. Rajagopalan. " A nonparametric wet/dry spell model for daily
precipitation", in preparation for submission to Water Resources Research.

Muller, H.G. (1991) "Smooth optimum kernel estimators near endpoints", Biometrika
78(3), 521-530.

Roldan, J. and Woolhiser, D.A. (1982) "Stochastic daily precipitation models, 1. A
comparison of occurrence processes", Water Resources Research 18(5), 1451-1459.

Rosenblatt, M. (1956) "Remarks on some nonparametric estimates of a density function",
Annals of Mathematical Statistics 27, 832-837.

Silverman, B.W. (1986) Density estimation for statistics and data analysis, Chapman and
Hall, New York.

Simonoff, J.S. (1983) "A penalty function approach to smoothing large sparse contingency
tables", The Annals of Statistics 11(1), 208-218.

Wang, M.C. and J. Van Ryzin. (1981) "A class of smooth estimators for discrete
distributions", Biometrika 68(1), 301-309.

Wand, M.P., Marron, J.S. and Ruppert, D. (1991) "Transformations in density estimation",
Journal of the American Statistical Association 86(414), 343-353.

Webb, R.H. and J.L. Betancourt. (1992) "Climatic variability and flood frequency of the
Santa Cruz river, Pima county, Arizona", USGS water-supply paper 2379.
PART II

FORECASTING
FORECASTING B.C. HYDRO'S OPERATION OF WILLISTON LAKE -
HOW MUCH UNCERTAINTY IS ENOUGH

D.J. DRUCE
Resource Planning, B.C. Hydro
Burnaby Mountain System Control Centre
c/o Podium B, 6911 Southpoint Drive
Burnaby, B.C., Canada V3N 4X8

For the past several years, the British Columbia Hydro and Power Authority has made
available to the forest industry and others probabilistic forecasts of month-end elevations
for its largest storage reservoir, Williston Lake. These forecasts consist of the median,
lower decile and upper decile values for each month over a 24 month period and are
updated on a monthly basis. They are generated by a stochastic dynamic programming
(SDP) model, in combination with a simulation model. The SDP model derives a monthly
operating policy for Williston Lake, the adjacent 2416 MW G.M. Shrum generating
station and the 700 MW Peace Canyon run-of-river hydroelectric project located
downstream on the Peace River. The operating policy provides releases for each month
that are conditional on the reservoir storage state and on a randomized historical weather
state. The sample of month-end reservoir levels calculated by the simulation model is
easily manipulated to directly elicit the percentiles for the forecast. Analyses of the
forecasts issued since April 1989 indicate that while the median forecasts have been
relatively accurate, the decile forecasts with lead times of less than a year have
underrepresented the uncertainty in the reservoir levels. Furthermore, preliminary results
suggest that an upgraded version of the SDP model, that includes a stochastic
export/import market, will not add sufficient uncertainty for the shorter lead times. A
source of forecast error that deserves more attention has, however, been identified.
INTRODUCTION

Williston Lake, with a live storage capacity of 40,000 x 10^6 m^3, is the largest reservoir
operated by the British Columbia Hydro and Power Authority (B.C. Hydro). Williston
Lake was created when the W.A.C. Bennett Dam was constructed on the Peace River in
the 1960's. It filled for the first time in 1972. Adjacent to the W.A.C. Bennett Dam is the
2416 megawatt (MW) G.M. Shrum (GMS) hydroelectric power plant and just
downstream is the 700 MW Peace Canyon (PCN) run-of-river project. This hydroelectric
complex is located in northeastern British Columbia, as shown in Figure 1.

Traditionally, forecasts of Williston Lake levels were produced only for in-house use,
i.e., for operations planning. Then, in the late 1980's, it became apparent to B.C. Hydro
that other users of Williston Lake, primarily the forest industry, were keenly interested in
future water levels. B.C. Hydro responded by distributing the forecasts to the forest
companies and to area managers who deal more directly with the local interests. At the
same time, B.C. Hydro began relying more on economic criteria and less on water level

forecasts for the operations planning of Williston Lake.

Figure 1. Location map.
The more recent versions of the water level forecasts consist of the median, lower
decile and upper decile values for each month over a 24 month period and are updated
monthly. They are obtained by simulating the operation of Williston Lake and the
GMS/PCN power plants under a sample of historical weather years. The simulation
operating policy is developed by a stochastic dynamic programming (SDP) model in
which uncertainty is driven by randomized monthly weather sequences. The SDP model
also establishes the marginal value of water stored in Williston Lake, as a function of
time and storage level. Due to their size, Williston Lake and the GMS/PCN plants are
often used to balance the system load and resources over time periods ranging from hours
to years. The marginal value of water in Williston Lake is therefore a good proxy for the
B.C. Hydro system marginal cost. A forecast of the system marginal cost is produced by
combining the Williston Lake water level forecast with the marginal water value
information.

For this paper, the 24 month water level forecasts issued since April 1989 have been
analyzed to obtain performance statistics for the median forecasts and to determine
whether the methodology produces upper and lower decile forecasts that are credible.
The evaluation of the decile forecasts is particularly relevant because it provides some
indication of where the forecasting methodology may be deficient in accounting for
uncertainty.
The following section provides a brief description of the B.C. Hydro system and
explains some of the operating considerations that add uncertainty to the Williston Lake
water level forecasts.
OVERVIEW OF THE B.C. HYDRO SYSTEM

B.C. Hydro supplies electricity to most of the province from an integrated system of
power plants that are predominantly hydroelectric. The nameplate generating capacity of
the system is 10,390 MW, with the 30 hydroelectric plants contributing 9332 MW.
Based on their operating regime, each of the hydroelectric plants can be placed into one
of three groups, namely, the Peace River plants, the Columbia River plants and the rest of
the system. The Peace River plants, OMS and PeN, are unique in that they provide most
of the monthly and annual operating flexibility in the system. The Columbia River
projects are characterized by large power plants with some storage reservoirs, but
relatively little control over monthly or annual generation levels. This lack of control is
partly the result of the Columbia River Treaty signed with the United States in 1961.
Mica Dam, on the main stem of the Columbia River is one of three dams built in Canada
under terms of the Columbia River Treaty. The reservoir created by Mica Dam,
Kinbasket Lake, has a live storage capacity of 14,800 x 10^6 m^3, of which approximately
8630 x 10^6 m^3 are operated in accordance with Treaty requirements. Monthly releases
from Treaty storage or month-end storage targets for the upcoming year are
predetermined from studies jointly prepared by the United States and Canadian Entities
and, for the most part, are independent of runoff conditions. Storage targets rather than
releases are specified for the summer months to facilitate refill of Treaty storage over a
wide range of monthly inflow volumes. As a result, there is greater uncertainty over the
monthly releases and generation from the 1736 MW Mica power plant at that time of the
year. The 1843 MW Revelstoke project, located downstream of Mica Dam, has
considerable local inflow and 1560 x 10^6 m^3 of live storage capacity, but is generally
operated as a run-of-river plant. Consequently, the Columbia River Treaty obligations
play a dominant role in the monthly operation of both projects. There is however some
short-term operating flexibility available through the use of non-Treaty storage. B.C.
Hydro has an additional 1137 MW of generating capacity at two power plants, Kootenay
Canal and Seven Mile, located on tributaries of the Columbia River, but has little or no
control over the operation of the respective upstream storage reservoirs. The variation in
the summer generation from Columbia River plants is usually accommodated by
adjusting the generation from Peace River plants and is reflected in the Williston Lake
levels. The rest of the hydroelectric system is comprised of many small to moderate-
sized projects where the operation is dictated more by the hydrologic regime and storage
limitations than by system requirements. The percentage of the total system generation
provided by each of the three groups is shown in Figure 2. Thermal projects have not
contributed much generation in recent years, but the 912 MW Burrard thermal plant,
located near Vancouver, is available for system support as required.

By design, B.C. Hydro generally has more energy than it needs to meet the domestic
demand, plus any other firm obligations, except under prolonged low water conditions.
Rather than have this surplus energy accumulate (as water) in the system reservoirs, until
it eventually spills, B.C. Hydro attempts to market the energy to utilities in the Pacific

Figure 2. B.C. Hydro system generation by source, 1984 to 1992 (Peace 31%, Columbia 46%, other hydro 15%, thermal 2%).

Northwest and Southwest United States. However, depending on the load/resource


balance in the Pacific Northwest, transmission access to these markets may be quite
restricted. Utilities in the Pacific Northwest have priority rights to the transmission grid
and, under many water conditions, crowd out energy from B.C. Hydro.

In the past, the quantity of surplus energy available for the current year was estimated
from deterministic energy studies that assumed average water conditions over a four year
planning period. Energy beyond what was required to refill the reservoirs in the first year
could be exported without concern about the effect on the longer term reliability of the
system. However, for these studies, only the energy that could be generated by the
hydroelectric projects was considered. In the event that the reservoir inflows were less
than expected, thermal power plants could then be operated to refill the system. As a
policy, thermal power plants were not operated concurrently with exports. Since B.C.
Hydro is only a marginal supplier to the large Pacific Northwest/Southwest market,
export prices were typically set just below the decremental cost of the importing utility,
after allowing for transmission costs. The incremental cost of actually supplying energy
from the hydroelectric system was generally unknown, but was thought to be small, and
had no influence on either the price or quantity of the exports.

By the late 1980's, B.C. Hydro had started to move away from the notion that
hydroelectric energy had only a nominal production cost and toward marginal pricing,
based on opportunity costs. In the short-run, B.C. Hydro's opportunity costs are directly
affected by the forecasted markets for exports and for alternate resources and by the risk
of a system spill. For estimates of its system marginal cost, B.C. Hydro relies on the SDP
model developed for the operations planning of Williston Lake. An outline of how that
model is used for forecasting both the Williston Lake levels and the system marginal
marginal cost is included as the next section.

FORECASTING METHODOLOGY
Most stochastic dynamic programming models that are used for reservoir management
treat the inflows as observations of a stochastic sequence (Yakowitz, 1982). However,
for the SDP model of Williston Lake operation, it was the weather that was assumed to be
the stochastic process (Druce, 1990). The weather not only causes the inflows to
Williston Lake, but also affects the domestic supply and demand for energy. This SDP
formulation was chosen so as to increase the level of uncertainty considered by the model
and thereby add realism. Yeh et al. (1992) have also acknowledged reservoir inflows and
load demands to be seasonally and cyclically varying stochastic quantities and caution
that they can present severe complications, if accommodated explicitly in the
optimization.

Under the assumption that Williston Lake is operated to balance system loads and
resources, it is necessary to establish how much of the load can be served by the other
resources. The GMS/PCN plants can then be operated to supply the residual firm load
and to accommodate any interruptible sales or purchases. The schematic, presented as
Figure 3, shows the components included in the derivation of the GMS/PCN load, as
well as other input and output for what has come to be known internally as the "marginal
cost" model.

Figure 3. Schematic for B.C. Hydro's marginal cost model. The domestic load forecast, the Columbia River inflow forecast, the historical generation from the other hydroelectric plants, and the committed exports/imports/Burrard operation determine the generation from the Columbia River plants and the residual firm load for GMS/PCN; together with the Williston Lake inflow forecast and the interruptible export and alternate resource market forecasts, these are input to the marginal cost model, which produces the marginal cost forecast and the Williston Lake level forecast.

Uncertainty for both loads and resources is based on the effects of randomized
monthly weather sequences from the years 1973 to 1992, i.e. the year before the current
year. The monthly domestic load forecast is adjusted by 20 gigawatt-hours (GW.h) per
degree Celsius whenever the mean temperature for an historical month at a weather
station near the load centre deviates from the long term mean temperature for the
corresponding month. However, this empirical relationship typically alters the monthly
load by less than three per cent. Uncertainty is added to the resource side by linking the
weather months and the energy historically generated by the rest of the hydroelectric
system, excluding the four large plants in the Columbia River group. As mentioned
earlier, the majority of these smaller projects have limited storage capacity and their
operation is related to the local hydrology rather than the system load. The year-to-year
variability in the small hydro generation for a given month is usually less than about five
percent of the system load. Examination of Figure 4 will help put this source of
uncertainty into perspective.
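As a worked example of the load adjustment described above, a month whose historical mean temperature is 2 degrees Celsius below the long term mean would raise the monthly load forecast by about 40 GW.h. A one-line sketch follows; the sign convention (colder months increasing load) is an assumption, as are the names.

    def adjust_monthly_load(load_gwh, t_hist_c, t_normal_c, sens_gwh_per_degc=20.0):
        # Shift the domestic load forecast by 20 GW.h per degree Celsius that
        # the historical month departs from the long term mean temperature.
        # Assumed sign: colder-than-normal months increase the load.
        return load_gwh + sens_gwh_per_degc * (t_normal_c - t_hist_c)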
Figure 4. Variability in monthly load and generation, 1973 to 1992: total system load, GMS/PCN generation, and small hydro generation (GW.h).


For the Mica and Revelstoke plants on the Columbia River, seasonal inflow forecasts
are generated for the current year, using a conceptual hydrologic model (Druce, 1984).
The model is initialized with the best estimate of the hydrologic state of the basin at
the time of the forecast and then the runoff response is calculated for a sample of
historical weather years. These forecasts are first issued in January, then updated each
month through to August. It is therefore feasible to forecast the generation available from
the Mica and Revelstoke plants as a function of the weather sequence. Unfortunately, it
is difficult to generalize the Treaty operation of Mica Dam to cover a wide range of
runoff conditions and to date, B.C. Hydro has only been able to forecast the generation
corresponding to the expected inflows. For the Kootenay Canal and Seven Mile plants, it
is not possible to forecast the generation associated with various weather sequences
because the prerequisite type of inflow forecast is simply not available. The net effect is
that a deterministic forecast is made of the energy supplied by the four large Columbia
River plants.
The committed sales, purchases and Burrard operation make up the last component in
the derivation of the firm load for the GMS/PCN plants. This information is assumed to
be accurately known, but in fact may be quite uncertain due to forced outages and gas
supply disruptions.
The largest source of uncertainty considered by the SDP model is the water supply
forecast for Williston Lake. For the current year, the forecast is generated by a
conceptual hydrologic model, in the same manner as for the Mica and Revelstoke plants
on the Columbia River. For subsequent years, the inflow forecast is based on the
historical inflows. In each case, the inflow sequences corresponding to the weather years
of 1973 to 1992 are divided into monthly values to produce monthly inflow and weather
pairings. Monthly weather sequences are explicitly assumed to be independent in the SDP
formulation, and that implies that the monthly inflow values should not be highly
autocorrelated. The statistical summary of historical inflows to Williston Lake, included
as Table 1, shows that monthly autocorrelation is low to moderate. The range in the
annual water supply for Williston Lake, when converted to energy, amounts to about 18
per cent of the annual system load, although, for the current year, that level of
uncertainty can be reduced by the hydrologic forecasting, as illustrated by Figure 5.
TABLE 1. Summary of inflow volumes to Williston Lake for 1973 to 1992.

TIME         INFLOW VOLUMES (m^3 x 10^6)          R^2 WITH
PERIOD       MEAN      MINIMUM    MAXIMUM         PREVIOUS MONTH
JANUARY      834       547        1323            0.31
FEBRUARY     625       429        851             0.39
MARCH        705       448        1302*           0.17
APRIL        1449      671        3542*           0.58
MAY          6593      3900       10359           0.14
JUNE         9573      5762       13121*          0.03
JULY         5444      2532       8992            0.10
AUGUST       2584      1327*      5008            0.61
SEPTEMBER    2043      1107*      3103            0.09
OCTOBER      2179      1056*      3694*           0.39
NOVEMBER     1367      722        2007            0.29
DECEMBER     955       512        1348*           0.55
ANNUAL       34351     25396      42383

* Records established during the 1989 to 1992 forecasting period.

For the past four years, the observed annual inflows have generally fallen within the
forecasted ranges. But individual monthly values were often outside the forecasted
monthly range and that has resulted in substantial short-term errors in the water level
forecasts. The occurrence of such errors will tend to reduce over time as more weather
years are added to the data base.

The interruptible export and alternate resource markets are also major sources of
uncertainty for B.C. Hydro. In an upgraded version of the SDP model, these markets are
treated as stochastic variables. However, for the model that has been used for
forecasting, deterministic forecasts of the monthly quantities and prices for both markets
are input.

Figure 5. Reduction in the uncertainty of the annual inflow to Williston Lake: forecast ranges from the naive and hydrologic forecasting models for forecast dates from 1 January to 1 August.

Given all the information described above, the SDP model selects, for each reservoir
state and weather state, the monthly release from Williston Lake that will maximize the
net revenue to B.C. Hydro. That operating policy is then passed on to a companion
model that simulates the operation of the hydroelectric complex for each of the historical
weather years, from a known initial reservoir level. The sample of possible month-end
reservoir levels calculated by the simulation model is manipulated to obtain the
percentiles that are presented graphically, as shown in Figure 6. Based on the water level
forecast and the marginal value of water stored in Williston Lake, established by the SDP
model for each storage level and month over the planning horizon, a marginal cost
forecast is also routinely produced for internal use. It provides decision support for
interruptible sale, purchase and storage transactions. Other applications for the marginal
cost model have been previously described by Druce (1989).
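Eliciting the published percentiles from the simulated sample is a one-step calculation; a sketch, assuming one simulated trace of month-end levels per historical weather year and hypothetical names:

    import numpy as np

    def level_forecast(simulated_levels):
        # simulated_levels: array of shape (n_weather_years, 24) holding one
        # month-end level trace per historical weather year.
        # Returns the lower decile, median and upper decile for each month.
        return np.percentile(np.asarray(simulated_levels), [10, 50, 90], axis=0)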

Upper and lower decile values are plotted along with the median values to provide
some indication of the uncertainty associated with the forecasts. This is common practice
with the seasonal water supply forecasts available in British Columbia, Alberta and the
western United States. Moreover, an economic benefit can be expected whenever a
probabilistic forecast is used instead of a categorical forecast for decision making, and the
gain tends to increase as the predictability decreases (Alexandridis and Krzysztofowicz,
1985). In the following section, the median and the decile water level forecasts for
Williston Lake are analyzed in an attempt to establish forecast credibility and to learn
where the forecasting methodology can be improved.

FORECAST EVALUATION

The water level forecasts were evaluated for accuracy by comparing the median
forecasted and the observed month-end values over each rolling 24 month period since
April 1989. Forecast statistics are displayed, as a function of lead time, in Figure 7.

Figure 6. Water level forecast for Williston Lake: elevation (m) by month, April 1993 to March 1995, showing the full supply level and the 0.10, 0.50 and 0.90 probability of exceedence curves.

Figure 7. Forecast statistics for the SDP and simulation models: bias, mean absolute error and RMSE as a function of lead time.

From these results, it appears that the forecasting methodology is very slightly
positively biased, but reasonably accurate. It was surprising to find, however, that the
greatest errors, on average, have occurred in the first 12 months of the forecast period.
The same forecast statistics were calculated for a naive model, i.e., using the average
observed month-end water levels available prior to each forecast year. Those results,
plotted in Figure 8, also reveal greater errors in the first 12 months of the forecast period.
It was therefore concluded that the perverse error pattern was not due to the simulation
modelling, but due to the weather and market conditions that prevailed in the four year
forecasting period. The modelling actually compensated quite well.
Figure 8. Forecast statistics for the naive model: bias, mean absolute error and RMSE as a function of lead time.

The reliability of the upper and lower decile forecasts was tested by calculating how
frequently they contained the observed values. If the forecasting methodology accurately
accounted for uncertainty, then the observed values should fall between the decile
forecasts 80 per cent of the time. Once again, as shown in Figure 9, results are poorer
for the forecasts with the shorter lead times. However, this was no surprise. In fact, work
on upgrading the SDP model to include more uncertainty has been underway for some
time. B.C. Hydro has not had much success in subjectively forecasting export markets
and it was decided that the uncertainty in the export market should be acknowledged
explicitly in the modelling. Also, by adding a stochastic export market to the SDP model
it was anticipated that the reliability of the decile forecasts of Williston Lake levels would
improve for the shorter lead times.
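The reliability check itself reduces to an empirical coverage count; a sketch with hypothetical argument names:

    import numpy as np

    def decile_coverage(lower, upper, observed):
        # Fraction of forecast months in which the observed month-end level
        # fell between the lower and upper decile forecasts; a well
        # calibrated methodology should approach 0.80.
        lower, upper, observed = map(np.asarray, (lower, upper, observed))
        return float(((observed >= lower) & (observed <= upper)).mean())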

The upgraded SDP model has two additional state variables, to separately account for
the monthly export quantity and export price. The size of the export market available to
B.C. Hydro is determined, for the most part, by water conditions in the United States
Pacific Northwest. When the forecasted or observed water supply for the Columbia River
at The Dalles is 111,000 x 10^6 m^3 or less, B.C. Hydro has, on average, had access to a
larger export market. The revised SDP model, therefore, has a state variable for water
conditions in the Pacific Northwest that can take on one of two values.
FORECASTING B.C. HYDRO' S OPERATION OF WILLISTON LAKE 73

[Figure omitted: frequency in per cent (50 to 100) that the decile forecasts contained the observed values, versus lead time of 0 to 24 months, with the expected frequency of 80 per cent marked.]

Figure 9. Frequency that decile forecasts contained observed values.
The export market quantity varies monthly for each water supply state, but over a year amounts to
7900 GW.h versus 4700 GW.h. For simulation purposes, the future water supply states
are modelled as a lag-one Markov process with the monthly state transition matrices
based on the actual forecasts issued over the period 1970 to 1992.
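As an illustration of how such a state sequence could be simulated (a sketch only; the transition probabilities themselves would come from the 1970 to 1992 forecast record, and the variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_supply_states(P_monthly, s0, n_months):
    """Simulate a two-state lag-one Markov chain of Pacific Northwest
    water-supply conditions (0 = smaller export market, 1 = larger).
    P_monthly is a (12, 2, 2) array; P_monthly[m, i, j] is the
    probability of moving from state i to state j in month m."""
    states = np.empty(n_months, dtype=int)
    states[0] = s0
    for t in range(1, n_months):
        row = P_monthly[t % 12, states[t - 1]]
        states[t] = rng.choice(2, p=row)
    return states
```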

The expected monthly price for energy in the interruptible export market is forecasted
using an empirical relationship with the spot price of natural gas in the United States, at
Henry Hub, Louisiana. The New York Mercantile Exchange futures market provides the
forecast for the spot price of natural gas at Henry Hub for the first 18 months, then
various other sources of information are used to extend the forecast over the six year
planning horizon. However, the empirical relationship is not particularly strong, with an
R² value just over 0.50. The SDP model has therefore been reformulated to consider
deviations from the expected price as a state variable. Three possible deviations are
included, roughly -5, 0 and +5 mills per kilowatt-hour from the expected price. The
exact values are calculated from the residuals of the regression equation and are updated
as new data on prices becomes available. They are the mean deviations for three class
intervals chosen to have equal frequencies over the period of record. Again, the export
price deviation states are modelled as a lag-one Markov process with the monthly state
transition matrices calculated from the residual pattern. The alternate resource or import
market has some links with the export market through share-the-profit pricing formulas.
Other sources of uncertainty for the alternate resources have yet to be considered.
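The construction of the three price-deviation states can be sketched as follows, assuming only that the classes are equal-frequency intervals of the regression residuals, as the text states:

```python
import numpy as np

def price_deviation_states(residuals):
    """Mean deviation (mills/kWh) of three equal-frequency classes of
    the price-regression residuals; these means define the three
    deviation states (roughly -5, 0 and +5)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    return [c.mean() for c in np.array_split(r, 3)]
```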

For February, March and April of 1993, the original and the upgraded versions of the
SDP and simulation models were operated in parallel. From a comparison of their
respective water level forecasts, it appears that the addition of the stochastic
export/import market will increase the reliability of the decile forecasts, but only for lead
times of four months or more. However, this observation is based on a very small sample
of forecasts and is subject to change as more forecasts are produced.

Another area of concern has been the deterministic forecast of the generation from the
large Columbia River power plants, since that group supplies such a large proportion of
the system generation. Perhaps through more reliable modelling of the operation of these
plants, the accuracy of the Williston Lake level forecasts could improve sufficiently, over
the shorter lead times, that it would not be necessary to add even more complexity to the
SDP model. This hypothesis was investigated by calculating error statistics for the
Columbia River generation forecasts for durations of one to eight months. The error
statistics were computed for three combinations of plants - the main stem plants Mica and
Revelstoke, the tributary plants Kootenay Canal and Seven Mile and all four plants
together. The results for the Mica and Revelstoke plants are shown in Figure 10, in terms
of Williston Lake storage. The monthly generation patterns for the main stem plants and
the tributary plants are negatively correlated. Consequently, the errors for the Mica and
Revelstoke generation forecasts are, in most cases, worse than those for all four plants
combined. It is apparent from a comparison of the error statistics presented in Figures 7
and 10 that much of the error in the Williston Lake level forecasts, for the shorter lead
times, can be attributed to the forecasts of the generation supplied by the Mica and
Revelstoke plants. These results are quite encouraging because they point to greater
effort in modelling those Columbia River plants that B.C. Hydro has more control over
and for which probabilistic seasonal inflow forecasts already exist.

[Figure omitted: bias, mean absolute error and RMSE, in terms of Williston Lake storage, versus duration of one to eight months.]

Figure 10. Forecast statistics for Mica and Revelstoke generation.

CONCLUSIONS
The SDP and simulation models used for the operations planning of Williston Lake
produce forecasts of month-end water levels that are relatively accurate, over a planning
horizon of 24 months. However, upper and lower decile forecasts, based on the
uncertainty of weather effects, are not credible for lead times of less than one year.
Preliminary results from upgraded versions of the SDP and simulation models, which

include the extra uncertainty of stochastic export/import markets, indicate that the
reliability of the decile forecasts will likely increase only for lead times of four months or
more. The credibility of decile forecasts, with shorter lead times, may improve with
better modelling of the effects of the operation of the Mica and Revelstoke power plants
on the Columbia River.

REFERENCES
Alexandridis, M.J. and Krzysztofowicz, R. (1985) "Decision models for categorical and
probabilistic weather forecasts", Applied Mathematics and Computation 17, 241-266.

Druce, D.J. (1984) "Seasonal inflow forecasts by a conceptual hydrologic model for Mica
Dam, British Columbia", in J.J. Cassidy and D.P. Lettenmaier (eds.), A Critical
Assessment of Forecasting in Western Water Resources Management, American Water
Resources Association, Bethesda, Md., pp. 85-91.

Druce, D.J. (1989) "Decision support for short term export sales from a hydroelectric
system", in J.W. Labadie, L.E. Brazil, I. Corbu and L.E. Johnson (eds.), Computerized
Decision Support Systems for Water Managers, American Society of Civil Engineers,
New York, N.Y., pp. 490-497.
Druce, D.J. (1990) "Incorporating daily flood control objectives into a monthly stochastic
dynamic programming model for a hydroelectric complex", Water Resources Research
26(1), 5-11.
Yakowitz, S. (1982) "Dynamic programming applications in water resources", Water
Resources Research 18(4), 673-696.

Yeh, W. W-G., Becker, L., Hua, S-Q., Wen, D-P., and Liu, D-P. (1992) "Optimization of
real-time hydrothermal system operation", Water Resources Planning and Management
Division, ASCE 118(6), 636-653.
EVALUATION OF STREAMFLOW FORECASTING MODELS

Tao Tao1 and William C. Lennox2

1Water Resources Department, Ontario Hydro
700 University Ave. H9C22, Toronto, Ontario, Canada M5G 1X6
2Department of Civil Engineering, University of Waterloo
Waterloo, Ontario, Canada N2L 3G1

For application purposes, it is no longer a sound investment to develop a streamflow
forecasting model from basics. Currently, streamflow forecasting models are available
for nearly every scenario one can imagine. A model could be stochastic or conceptual;
lumped parameter or distributed parameter. The task of developing a model has been
transferred to one of evaluation and selection since no single model can be applied
universally without sacrificing some element of its performance. Therefore, it is
necessary to have some kind of consensus as to how forecasting models are evaluated
and selected for each individual application. In the past, the evaluations were often
conducted by comparing the forecasted and the observed streamflows with numeric
and/or graphic criteria with little consideration given to the specific application.
However, in this study, forecasting models are evaluated through simulated real-time
applications to investigate which one maximizes the system performance.

INTRODUCTION
Being able to forecast future streamflows is of major interest to operators of water
resource systems. Many different approaches have been tried: repeating historic inflow
series, using mean values of historic inflow series, constructing stochastic models based
on the statistical analysis of historic inflow series and developing physically based
conceptual models. Unfortunately, a streamflow forecasting model which is a success
in one application could be a failure in another. Different forecasting models may have
to be selected for different applications. Historically, the evaluation and selection are
often conducted by comparing the forecasted and the observed streamflows through
numeric and/or graphic criteria (WMO 1986). Little consideration is given to the
particular application where the forecast is required. However, an operator could be
more concerned with which model can be used to achieve a better management of the
water resource system rather than how good the forecasted values are in comparison
with the actual future streamflows. In this study, different forecasting models are
evaluated through simulated real-time applications, in addition to using numeric criteria
(WMO 1986), to see which one can best improve the performance of a reservoir system.

In this study, eleven models are created for one-month ahead inflow forecasting. The
forecasted inflow for the coming month and the mean inflows for future months (leading

to infinity) constitute a future inflow sequence. The inflow sequence is then used in
deciding the optimal release policy for the coming month in the simulated real-time
monthly operation of a two-reservoir system. In the following sections, eleven
forecasting models and five numeric evaluation criteria (WMO 1986) are first detailed.
The real-time operation of a reservoir system and its simulation are then described.
Finally, a case study involving the simulated real-time operation of a system of two
reservoirs in parallel under different scenarios is given.
STREAMFLOW FORECASTING

The eleven forecasting models created are listed as follows:
1. Using monthly mean values of historic series.
For a given inflow series Q_i(m, yr), the monthly mean values are calculated as follows:

$$\bar{Q}_i(m) = \frac{1}{Y}\sum_{yr=1}^{Y} Q_i(m, yr) \qquad (1)$$

where i (i = 1, 2, ..., n), m (m = 1, 2, ..., 12) and yr (yr = 1, 2, ..., Y) represent generating
station, month and year, respectively; n is the number of generating stations and Y is the
length of the series in years. The forecast based on the monthly mean values is

$$\hat{Q}_i(m, yr) = \bar{Q}_i(m) \qquad yr = Y+1, Y+2, \ldots \qquad (2)$$

2. Periodic, first order Markov process based on historic series, spatial correlation
assumed.
Streamflows usually repeat the same pattern annually. The periodic, first order Markov
process is constructed by finding a set of parameters for every two consecutive months
instead of estimating one set of parameters for the whole series. The forecast is given
by:

$$\hat{Q}_i(m, yr) = a_i(m) + \sum_{j=1}^{n} b_{ij}(m)\, Q_j(m-1, yr) \qquad yr = Y+1, Y+2, \ldots \qquad (3)$$

The parameters a_i(m) and b_ij(m) are estimated using the least squares estimation
technique based on Y years of data.
3. Periodic, first order Markov process based on historic series, independence
assumed.
The forecast is given by Eq. (3) except that b_ij(m) equals zero for i ≠ j.
4. Second order Markov process based on deseasonalized historic series, spatial
correlation assumed.
The deseasonalization is in fact a process of removing the trend from a stochastic series,
which is equivalent to deducting the monthly mean values from the historical series. Let
the deseasonalized series be q_i[(yr-1)·12+m]; then

$$q_i[(yr-1)\cdot 12 + m] = Q_i(m, yr) - \bar{Q}_i(m) \qquad yr = 1, 2, \ldots, Y \qquad (4)$$

The forecast is given by:

$$\hat{Q}_i(m, yr) = \bar{Q}_i(m) + a_i + \sum_{t=1}^{2}\sum_{j=1}^{n} b_{ij}(t)\, q_j[(yr-1)\cdot 12 + m - t] \qquad yr = Y+1, Y+2, \ldots \qquad (5)$$

Theoretically a_i should be zero because q_i[(yr-1)·12+m] is a series with zero mean.
However, since there exist rounding errors in the computation process, a_i will hardly
be zero even though it might be close to zero.

5. Second order Markov process based on deseasonalized historic series,
independence assumed.
The forecast is given by Eq. (5) except that b_ij(t) equals zero for i ≠ j.

6. Second order Markov process based on logarithm-taken and deseasonalized
historic series, spatial correlation assumed.
The streamflow series usually follows a log-normal distribution. Let the new series, after
taking the natural logarithm of the original historical series, be W_i(m, yr). Then

$$W_i(m, yr) = \ln[Q_i(m, yr)] \qquad yr = 1, 2, \ldots, Y \qquad (6)$$

The model building and parameter estimation are the same as for Model #4 except that
Q_i(m, yr) is now replaced by W_i(m, yr). However, the noise term is no longer additive
as far as the forecasts are concerned. The final forecast is obtained as:

$$\hat{Q}_i(m, yr) = \exp[\hat{W}_i(m, yr)] \qquad (7)$$

7. Second order Markov process based on logarithm-taken and deseasonalized
historic series, independence assumed.
Model #7 is the counterpart of Model #5, as Model #6 is to Model #4.

The next four forecasting models are based on a first order Markov process. They are the
same as Models #4 to #7 except that b_ij(t) equals zero for t = 2.

8. First order Markov process based on deseasonalized historic series, spatial
correlation assumed.

9. First order Markov process based on deseasonalized historic series, independence
assumed.

10. First order Markov process based on logarithm-taken and deseasonalized historic
series, spatial correlation assumed.

11. First order Markov process based on logarithm-taken and deseasonalized historic
series, independence assumed.

Models #2 and #3 have twelve sets of parameters for the twelve months. Models #4 to #11
have only one set of parameters for all twelve months. The models are used only for
one-month ahead inflow forecasting. The future inflow sequence used in finding the
optimal release policy is generated by combining the forecast for the coming month and
the mean values for all other months leading to infinity.
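As an illustration, a minimal least squares fit of the single-station periodic model (Model #3, the independence case of Eq. (3)) might look as follows; the array layout and function names are assumptions, not the authors' code:

```python
import numpy as np

def fit_periodic_ar1(Q):
    """Least squares fit of the single-station periodic model
    (Model #3): Q(m, yr) = a(m) + b(m) * Q(m-1, yr).
    Q is a (Y, 12) array of monthly flows for one station."""
    a, b = np.zeros(12), np.zeros(12)
    for m in range(12):
        if m == 0:                        # January follows last December
            y, x = Q[1:, 0], Q[:-1, 11]
        else:
            y, x = Q[:, m], Q[:, m - 1]
        X = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        a[m], b[m] = coef
    return a, b
```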

The numeric evaluation criteria (WMO 1986) used here are given below:

a. Ratio of standard deviations of forecasted to observed streamflows:

$$s_{y_f} \big/ s_{y_o} \qquad (8)$$

b. Ratio of the sum of squares of the monthly residuals to the centred sum of
squares of the monthly observed streamflows:

$$\sum_{t=1}^{N}\left[y_f(t) - y_o(t)\right]^2 \Big/ \sum_{t=1}^{N}\left[y_o(t) - \bar{y}_o\right]^2 \qquad (9)$$

c. Ratio of the standard deviation of the residuals to the mean observed
streamflow:

$$s_{y_f - y_o} \big/ \bar{y}_o \qquad (10)$$

d. Ratio of the mean error to the mean observed streamflow:

$$\frac{1}{N}\sum_{t=1}^{N}\left[y_f(t) - y_o(t)\right] \Big/ \bar{y}_o \qquad (11)$$

e. Ratio of the mean absolute error to the mean observed streamflow:

$$\frac{1}{N}\sum_{t=1}^{N}\left|y_f(t) - y_o(t)\right| \Big/ \bar{y}_o \qquad (12)$$

where y_o and y_f represent observed and forecasted streamflows, an overbar denotes a
mean value, s denotes a standard deviation, and N is the total number of months involved.
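A sketch of criteria (8)-(12) as written out above (the dictionary key names are illustrative):

```python
import numpy as np

def wmo_criteria(y_obs, y_for):
    """Criteria (8)-(12) for observed (y_obs) and forecasted (y_for)
    monthly streamflows."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_for = np.asarray(y_for, dtype=float)
    e = y_for - y_obs
    return {
        "std_ratio":  y_for.std() / y_obs.std(),                        # (8)
        "ss_ratio":   (e**2).sum() / ((y_obs - y_obs.mean())**2).sum(), # (9)
        "resid_std":  e.std() / y_obs.mean(),                           # (10)
        "mean_err":   e.mean() / y_obs.mean(),                          # (11)
        "abs_err":    np.abs(e).mean() / y_obs.mean(),                  # (12)
    }
```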

REAL-TIME RESERVOIR OPERATIONS

Real-time operation is an on-going decision-making process, run daily, weekly or


monthly over the life expectancy of a reservoir system. The operation is best described
as an attempt to achieve the optimal compromise between reservoir storages and
reservoir releases to meet the multiple objectives of the system.

The multiple objectives of a reservoir system may be represented by an appropriate


performance index. This performance index serves two purposes: to evaluate the after-
the-fact system performance during a specific operating span, and to decide the optimal
release policy for each decision period. The performance index for evaluation is in a
form as:

$$J = \sum_{t=t_0}^{t_f} g[S(t), R(t), t] \qquad (13)$$

where S(t) is a storage vector, representing reservoir storages at the beginning of


decision period t; R(t) is a release vector, representing reservoir releases during decision

period t; t_0 and t_f represent the beginning and the end of a specific operating horizon and
g indicates some physical or non-physical quantity for evaluation.
The performance index for decision-making or real-time operation is derived from the
performance index for evaluation and is expressed as
$$\hat{J} = \sum_{k=t}^{\infty} g[S(k), R(k), k] \qquad (14)$$

where t is for the coming decision period. It should be noted that in this study the
optimization horizon extends to infinity.

An n-reservoir system can be described in matrix form as:

$$S(t+1) = S(t) - F\,R(t) - L(t) + Q(t) \qquad (15)$$

where Q(t) is an n×1 uncontrollable inflow vector; L(t) is an n×1 loss vector and F is
an n×n system matrix. The diagonal elements f_ii of F are always equal to 1. The off-
diagonal element f_ij is -1 if reservoir i receives release from reservoir j, and 0
otherwise. The uncontrollable inflow is defined as the part of the total inflow to a reservoir
after subtraction of releases from upstream reservoirs.

The formulated problem requires the use of nonlinear optimization. Most optimization
techniques search for the optimum of a non-linear system through time consuming
iterative processes. The selected technique may be feasible in real-time operations.
However, the time required for simulating this process many times over may not be
acceptable. The eleven forecasting models are tested indirectly by finding an
approximately equivalent quadratic performance index and using linear quadratic control,
where the analytic solution is available.

If such a quadratic performance index exists, the problem can now be rewritten as

$$\hat{J} = \frac{1}{2}\sum_{k=t}^{\infty}\left\{[S(k)-S_r(k)]^T A\,[S(k)-S_r(k)] + [R(k)-R_r(k)]^T B\,[R(k)-R_r(k)]\right\} \qquad (16)$$

subject to the following system equation:

$$S(k) = S(k-1) - F\,R(k-1) - L(k-1) + Q(k-1) \qquad (17)$$

The optimal controller is given as (Pindyck 1972):

$$R(k) = R_r(k) + B^{-1}F^T\left[(P-A)S(k) + p(k) + A\,S_r(k)\right] \qquad (18)$$

with

$$P = A + \left[P^{-1} + F B^{-1} F^T\right]^{-1} \qquad (19)$$

$$p(k) = (P-A)\left[Q(k) - L(k) - F R_r(k) - F B^{-1} F^T p(k+1)\right] + p(k+1) - A\,S_r(k) \qquad (20)$$

where S_r(k) is an n×1 target storage vector; R_r(k) is an n×1 target release vector; A and
B are n×n weighting matrices; P is the solution to the Riccati equation Eq. (19) and is
an n×n matrix with all elements constant; and p(k) is the solution to the tracking
equation Eq. (20) and is an n×1 vector. A is symmetric, positive semi-definite and B and
P are symmetric, positive definite.

The solution process starts with finding the target storages and releases to formulate an
approximately equivalent performance index. The targets are obtained by optimizing the
original performance index with respect to mean monthly values of historic inflows for
several years. The optimization is subject to all system constraints. Since the future
inflows are made of monthly mean values from the second month on and the inflows are
periodic, the targets are periodic as well. No matter what the initial storages are, the
future storages and releases will be the same as the target storages and releases after
some months. It can be observed from Eq. (18) that p(k) can be predetermined as

$$p(k) = -P\,S_r(k) \qquad (21)$$

The optimal release policy for the coming month can then be obtained as follows:

$$R(t) = R_r(t) + C\left\{[S_r(t+1) + F R_r(t)] - [S(t) + Q(t) - L(t)]\right\} \qquad (22)$$

where

$$C = -B^{-1}F^T(P-A) = B^{-1}F^T\left[(P-A)F B^{-1}F^T P - P\right] \qquad (23)$$

If the reservoir releases required in the release policy of Eq.(22) are not within the
release constraints, they are first set to meet the release constraints. In the process of
implementing the release policy, the releases are further adjusted to ensure that the
storage constraints are met, even if it means violating the release constraints. That is
consistent with reality.
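For illustration, the fixed-point solution of the Riccati equation (19) and the release policy of Eqs. (22)-(23) can be sketched as follows; the iteration scheme and the starting guess are assumptions, not taken from the paper:

```python
import numpy as np

def solve_riccati(A, B, F, tol=1e-10, max_iter=1000):
    """Iterate Eq. (19), P = A + (P^-1 + F B^-1 F^T)^-1, to a fixed point."""
    P = np.eye(A.shape[0])                     # positive definite start
    FBFT = F @ np.linalg.inv(B) @ F.T
    for _ in range(max_iter):
        P_new = A + np.linalg.inv(np.linalg.inv(P) + FBFT)
        if np.abs(P_new - P).max() < tol:
            break
        P = P_new
    return P_new

def release_policy(S, Q, L, S_tgt_next, R_tgt, A, B, F, P):
    """Eqs. (22)-(23): feedback release vector for the coming month."""
    C = -np.linalg.inv(B) @ F.T @ (P - A)      # Eq. (23)
    return R_tgt + C @ ((S_tgt_next + F @ R_tgt) - (S + Q - L))
```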

CASE STUDY: THE EAST RIVER WATERSHED

The East River watershed is located in the Province of Guangdong, Southern China.
There are two generating stations on the East River Watershed: the Harvest Dam and
the Maple Dam (Figure 1). The Maple Dam is upstream in the East River, and the
Harvest Dam is downstream of the tributary, the Harvest River, which joins the East
River below the Maple Dam. The system parameters are listed in Table 1.

Table 1 Basic parameters of generating stations

                               Unit      Harvest Dam   Maple Dam

Drainage Area                  km²           5734         5150
Streamflow Record              year            34           34
Average Inflow                 cms            196          134
Normal Pool Level              m              116          166
Useful Storage                 10⁶ m³        6480         1249
Minimum Pool Level             m               94          128
Dead Storage                   10⁶ m³        4310          286
Average Tail Water Elevation   m               35         90.5
Minimum Release Requirement    cms            100           90
Installed Capacity             MW           302.5          160

[Map omitted: the East River watershed in Southern China, showing the Maple Dam on the East River, the Harvest Dam on the Harvest River, Canton and the South China Sea.]

Figure 1 Location of the East River Watershed

Thirty-four years of monthly historic inflow series are available. The data serves two
purposes. One is for parameter estimation of forecasting models. Another is for
simulating the real-time operation where they are used as future "actual" inflows.

The system performance index used to decide the optimal release policy for the coming
month is

$$J = \sum_{k=t}^{\infty}\left[\sum_{i=1}^{2} C_i - \sum_{i=1}^{2} P_i(k)\right]^2 \qquad t = 1, 2, \ldots, 408 \qquad (24)$$

where C_i is the installed capacity at station i and

$$P_i(k) = \begin{cases} C_i, & \eta\, g\, H_i(k)\, R_i(k) \ge C_i \\ \eta\, g\, H_i(k)\, R_i(k), & \eta\, g\, H_i(k)\, R_i(k) < C_i \end{cases} \qquad (25)$$

and

(26)

where η is the efficiency factor and g is the acceleration of gravity.

The constraints on the system are given as follows:


(27)

The system is optimized over seven years or eighty-four months based on monthly mean
values of inflows and subject to constraints given in Eqs.(15) and (27) to obtain the
target storages and releases. February is assumed to have 28.25 days.
The results from the numerical evaluation of the eleven forecasting models are presented in
Table 2. The values in brackets in the first row indicate what should be expected
from a perfect forecast. It can be observed that the second model is generally the most
favoured.
Table 2 Numerical evaluation of forecasting models

         CO(1.0)   NTM(-1.0)   S(0.0)    R(0.0)    A(0.0)


M#1 0.6939 0.0369 0.7124 -0.0007 0.4394
M#2 0.8080 -0.3017 0.5873 0.0008 0.3583
M#3 0.7138 -0.1271 0.6221 -0.1074 0.3595
M#4 0.7686 -0.1699 0.6428 -0.0005 0.3843
M#5 0.7646 -0.1576 0.6476 -0.0004 0.3858
M#6 0.7547 -0.1313 0.6639 -0.1013 0.3579
M#7 0.7517 -0.1262 0.6644 -0.1020 0.3597
M#8 0.7890 -0.1829 0.6573 -0.0023 0.3978
M#9 0.7495 -0.1226 0.6568 -0.0004 0.3928
M#10 0.7380 -0.0965 0.6716 -0.1048 0.3645
M#11 0.7339 -0.0915 0.6711 -0.1056 0.3648

Table 3 shows the results of the simulated system operations. The first eleven represent the
simulated real-time operations using linear quadratic control with forecasted inflows
from eleven different forecasting models for the coming month. The twelfth is the same
as the first eleven except that the 'actual' inflows are substituted for the forecasted
inflows for the coming month. There is an expected improvement over the first eleven.
This indicates the merits of improved inflow forecasting.
Table 3 System performance of simulated real-time operation

              Avg. Power Generation (MW)              Reliability: min rel.

              Harvest    Maple    Total    J(10⁻⁶)    Harvest    Maple
LQC M#1 120.33 67.35 187.68 33.31 100% 94.9%
LQCM#2 120.21 67.20 187.41 33.51 100% 95.1%
LQC M#3 119.98 67.87 187.85 33.37 100% 95.6%
LQCM#4 120.74 67.23 187.97 33.46 100% 95.4%
LQC M#5 120.75 67.21 187.96 33.45 100% 95.1%
LQC M#6 120.56 67.57 188.13 33.49 100% 95.4%
LQC M#7 120.51 67.58 188.09 33.49 100% 95.4%
LQC M#8 120.25 66.94 187.19 33.68 100% 95.4%
LQC M#9 120.31 67.17 187.48 33.51 100% 95.1%
LQC M#10 120.14 67.54 187.68 33.53 100% 95.4%
LQC M#11 120.12 67.59 187.71 33.52 100% 95.4%
LQC actual 121.03 69.28 190.31 32.76 100% 97.3%
Ideal case 121.78 73.05 194.82 30.30 100% 97.8%

Since there is no iteration involved in the linear quadratic control, it takes less than one
minute to complete the computation for the first twelve scenarios on a Compaq 386/33L.
However, it takes much more time to find the targets.

The last scenario, called the 'Ideal case', serves as a reference and can never be
reached. It is assumed that the future inflows are perfectly known beforehand. The
system goes through a one-shot optimization over all 408 months. It should be noted that
the simulated operation using the forecasting model #6 produces only 3.4% less total
power than the ideal case.

The interesting finding is that the forecasting model #2, which is favoured by the numeric
criteria, does not outperform the other forecasting models in terms of maximizing the power
generation or minimizing the performance index.

SUMMARY

The results shown above do not warrant any specific conclusion as to which method
should be used in the evaluation and selection of a streamflow forecasting model.
However, the study demonstrates that a streamflow forecasting model can be evaluated
by a method other than the usual numeric and/or graphic criteria. The method is to
recognize the application for which the model is intended and to assess its merits based
on the final results, such as increase of power generation, reduction of flood damage,
etc.

ACKNOWLEDGEMENTS

The writers would like to thank Professor Xi-can Shi of Tsinghua University, Beijing,
China for providing the data used in this study.

REFERENCES

Pindyck, R.S. (1972) "An application of the linear quadratic tracking problem to
economic stabilization policy." IEEE Trans. Auto. Control AC-17(3), 287-300.

World Meteorological Organization (1986): Intercomparison of Models of Snowmelt


Runoff, No.646, Geneva, Switzerland.
APPLICA TION OF A TRANSFER FUNCTION MODEL TO A
STORAGE-RUNOFF PROCESS

P.-S. YU, C.-L. LIU and T.-Y. LEE

Department of Hydraulics and Ocean Engineering, National Cheng Kung University
Tainan, Taiwan 70101, R.O.C.

In the storage approach to conceptual rainfall-runoff models, runoff is commonly
simulated as a function of storage. Based on this hydrologic phenomenon, a storage-runoff
forecasting model is developed and compared with the rainfall-runoff forecasting model. The
model order and parameters are first calibrated by using Schwartz's Bayesian criterion.
Eight storm events were then used for verification. The one- to six-hour-ahead forecast
hydrographs from both the rainfall-runoff and storage-runoff models have a timing
problem. After both models are corrected by a backward shift operator, the timing problem
is relieved. Based on a comparison between the forecasted and observed hydrographs and
eight kinds of criteria, it appears that the storage-runoff forecasting model has performance
superior to that of the rainfall-runoff forecasting model.

INTRODUCTION

Floods are one of the most destructive acts of nature. Real-time flood forecasting has
recently been developed for flood protection and warning systems. Depending upon the response
time of a basin, a mathematical model to be used for real-time forecasting may consist of some
of the following three basic elements: (1) a rainfall forecasting model, (2) a rainfall-runoff
forecasting model, and (3) a flood routing model (Reed, 1984). Because many catchments
respond quickly to rainfall input, a rainfall forecasting model is desirable, acting
in unison with a rainfall-runoff model to extend the forecast lead time. Normally,
rainfall forecasting models are subject to significant forecasting error even if the forecast
lead time is short (Einfalt, 1991). An alternative method used to extend the forecast lead
time is discussed in this paper.

MATHEMATICAL MODEL

A wide range of rainfall-runoff forecasting models have been developed recently, including:
(1) the unit hydrograph and other methods using the S-curve, the discrete linear cascade
reservoir model (Chander and Shanker, 1984; Bobinski and Mierkiewicz, 1986; Corradini
and Melone, 1987; Corradini et al., 1986), (2) conceptual models, (3) non-linear storage
models, and (4) transfer function models. O'Connel and Clark (1981) and Reed (1984)
have reviewed some of these models. In this paper, the transfer function model is used to
simulate the process of rainfall-runoff and storage-runoff for flood forecasting separately.

Rainfall-runoff forecasting model

Powell (1985), Owens (1986), and Cluckie and Owens (1987) have demonstrated that the
rainfall-runoff process can be satisfactorily simulated by the transfer function model

$$Q(t) = a_1 Q(t-1) + a_2 Q(t-2) + \cdots + a_p Q(t-p) + b_1 R(t-1) + b_2 R(t-2) + \cdots + b_q R(t-q) + e(t) \qquad (1)$$

where
Q(t), Q(t-1), ... = discharge at times t, t-1, ...,
R(t), R(t-1), ... = rainfall at times t, t-1, ...,
e(t) = noise,
p and q = model orders,
a_1, ..., a_p and b_1, ..., b_q = model parameters.

When (1) is applied for runoff forecasting, a rainfall forecasting model is required to
forecast the rainfall in the future. The model used to forecast rainfall in this paper is an
autoregression on past rainfall:

$$R(t) = c_1 R(t-1) + c_2 R(t-2) + \cdots + c_r R(t-r) + e(t) \qquad (2)$$

Storage-runoff forecasting model

In conceptual rainfall-runoff models based on the storage approach, the runoff at the
outlet of a catchment is commonly simulated as a function of storage, S (for example: Q
= KS). Based on this hydrologic phenomenon, a storage-runoff forecasting model is
developed, in which the runoff at the present time is assumed to be a function of previous
runoff and catchment storage. The major difference from the rainfall-runoff forecasting
model described in the previous section is that storage replaces rainfall as the model
input. The forecasting models include the storage-runoff forecasting model

$$Q(t) = a_1 Q(t-1) + a_2 Q(t-2) + \cdots + a_p Q(t-p) + b_1 S(t-1) + b_2 S(t-2) + \cdots + b_q S(t-q) + e(t) \qquad (3)$$

and the storage forecasting model

$$S(t) = c_1 S(t-1) + c_2 S(t-2) + \cdots + c_r S(t-r) + e(t) \qquad (4)$$

As the storage over the catchment area cannot be directly measured, the storage at the present
time is computed from the mass balance between the rainfall input and the discharge
output:

$$S(t) = [R(t) - Q(t)] + S(t-1) \qquad (5)$$
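Eq. (5) amounts to a running sum; a one-line Python sketch (the variable names are illustrative):

```python
import numpy as np

def storage_series(rainfall, discharge, s0=0.0):
    """Catchment storage from the mass balance of Eq. (5):
    S(t) = [R(t) - Q(t)] + S(t-1), with initial storage s0.
    Rainfall and discharge must be expressed in the same units."""
    r = np.asarray(rainfall, dtype=float)
    q = np.asarray(discharge, dtype=float)
    return s0 + np.cumsum(r - q)
```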

PARAMETER ESTIMATION

Either the rainfall-runoff forecasting model or the storage-runoff forecasting model can be written
in the general form

$$O(t) = a_1 O(t-1) + a_2 O(t-2) + \cdots + a_p O(t-p) + b_1 I(t-1) + b_2 I(t-2) + \cdots + b_q I(t-q) + e(t) \qquad (6)$$

in which O(t), O(t-1), ..., O(t-p) are system outputs (i.e. discharge); I(t), I(t-1), ..., I(t-q)
are system inputs, which are the rainfall in the rainfall-runoff forecasting
model or the storage in the storage-runoff forecasting model; p and q are model orders;
and a_1, ..., a_p, b_1, ..., b_q are model parameters, which are calibrated by using historical
input and output data. If the historical data have N observations and m is the larger of p
and q, (6) can be written as:

$$\begin{bmatrix} O_{m+1} \\ O_{m+2} \\ \vdots \\ O_N \end{bmatrix} = \begin{bmatrix} O_m & O_{m-1} & \cdots & O_{m-p+1} & I_m & I_{m-1} & \cdots & I_{m-q+1} \\ O_{m+1} & O_m & \cdots & O_{m-p+2} & I_{m+1} & I_m & \cdots & I_{m-q+2} \\ \vdots & & & \vdots & \vdots & & & \vdots \\ O_{N-1} & O_{N-2} & \cdots & O_{N-p} & I_{N-1} & I_{N-2} & \cdots & I_{N-q} \end{bmatrix} \times \begin{bmatrix} a_1 \\ \vdots \\ a_p \\ b_1 \\ \vdots \\ b_q \end{bmatrix} + \begin{bmatrix} e_{m+1} \\ e_{m+2} \\ \vdots \\ e_N \end{bmatrix} \qquad (7)$$

i.e.,

$$O_{(N-m)\times 1} = Z_{(N-m)\times (p+q)} \cdot C_{(p+q)\times 1} + E_{(N-m)\times 1} \qquad (8)$$

Based on the minimum square error between the observed and simulated output data, the
optimum parameters are determined according to the following least squares equation:

$$\hat{C} = (Z^T Z)^{-1} Z^T O \qquad (9)$$

The model order can be decided based on the minimum value of SBC (Schwartz's
Bayesian Criterion) (Schwartz, 1978):

$$SBC(p, q) = N \ln \hat{\sigma}_e^2 + (p+q) \ln N \qquad (10)$$

where σ̂²_e is the variance of the residuals (the difference between the observed discharge,
O(t), and the simulated discharge, Ô(t)):

$$\hat{\sigma}_e^2 = \frac{(O - \hat{O})'(O - \hat{O})}{N - (p+q)} \qquad (11)$$
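Under this reading, calibration reduces to ordinary least squares plus an SBC search over (p, q); a compact Python sketch (the function names and search grid are assumptions):

```python
import numpy as np

def fit_tf_model(O, I, p, q):
    """Least squares fit of Eq. (6); returns parameters C, residual
    variance (Eq. 11) and SBC (Eq. 10)."""
    O, I = np.asarray(O, float), np.asarray(I, float)
    N, m = len(O), max(p, q)
    Z = np.column_stack(
        [O[m - k:N - k] for k in range(1, p + 1)]     # lagged outputs
        + [I[m - k:N - k] for k in range(1, q + 1)]   # lagged inputs
    )
    y = O[m:N]
    C, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ C
    sigma2 = resid @ resid / (N - (p + q))            # Eq. (11)
    sbc = N * np.log(sigma2) + (p + q) * np.log(N)    # Eq. (10)
    return C, sigma2, sbc

def select_order(O, I, max_p=3, max_q=3):
    """Pick the (p, q) pair with minimum SBC."""
    grid = [(p, q) for p in range(1, max_p + 1) for q in range(1, max_q + 1)]
    return min(grid, key=lambda pq: fit_tf_model(O, I, *pq)[2])
```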

CASE STUDY

Sixteen storm events over the Fei-Tsui reservoir catchment in northern Taiwan were
collected for a case study. Eight of the storm events are used for calibration to
determine the model orders and parameters. The other eight storm events are used to
verify the model performance based on criteria of eight kinds (Habaieb et al., 1991; Wang
et al., 1991; Abraham and Ledolter, 1983), which are divided into two groups: statistical
and hydrologic indexes.

Statistical index

(a) Mean Absolute Deviation (MAD)

$$MAD = \frac{1}{N}\sum_{t=1}^{N}\left|Q(t) - \hat{Q}(t)\right| \qquad (12)$$

where
N = number of observations,
Q(t) = observed flow,
Q̂(t) = forecasted flow.

(b) Mean Square Error (MSE)

$$MSE = \frac{1}{N}\sum_{t=1}^{N}\left[Q(t) - \hat{Q}(t)\right]^2 \qquad (13)$$

(c) Revised Theil Inequality Coefficient (RTIC)

$$RTIC = \left[\frac{\sum_{t=1}^{N}\left[Q(t) - \hat{Q}(t)\right]^2}{\sum_{t=1}^{N}\left[Q(t)\right]^2}\right]^{1/2} \qquad (14)$$

(d) Correlation Coefficient (CC)

$$CC = \frac{\sum_{t=1}^{N}\left[Q(t) - \bar{Q}\right]\left[\hat{Q}(t) - \bar{\hat{Q}}\right]}{\left\{\sum_{t=1}^{N}\left[Q(t) - \bar{Q}\right]^2 \sum_{t=1}^{N}\left[\hat{Q}(t) - \bar{\hat{Q}}\right]^2\right\}^{1/2}} \qquad (15)$$

where
Q̄ = the mean value of observed flow,
Q̂̄ = the mean value of forecasted flow.
Hydrologic index

(e) Coefficient of Efficiency (CE)

$$CE = 1 - \sum_{t=1}^{N}\left[Q(t) - \hat{Q}(t)\right]^2 \Big/ \sum_{t=1}^{N}\left[Q(t) - \bar{Q}\right]^2 \qquad (16)$$

(f) Error of Peak Discharge (EQp)

$$EQp = \left(\hat{Q}_p - Q_p\right) \big/ Q_p \qquad (17)$$

where
Q̂_p = the peak value of forecasted flow,
Q_p = the peak value of observed flow.

(g) Error of Time to Peak (ETp)

$$ETp = \hat{T}_p - T_p \qquad (18)$$

where
T̂_p = the time to forecasted peak flow,
T_p = the time to observed peak flow.

(h) Error of Total Volume (EV)

$$EV = \sum_{t=1}^{N}\left[\hat{Q}(t) - Q(t)\right] \Big/ \sum_{t=1}^{N} Q(t) \qquad (19)$$

RESULTS

Based on the SBC criterion in (10), the optimal models calibrated by using eight storm
events are:

Rainfall-Runoff Forecasting Model:

Q(t) = 1.1194Q(t-1) - 0.4162Q(t-2) + 0.1551Q(t-3) + 0.0736R(t-1) + 0.0491R(t-2),   (20)

Rainfall Forecasting Model:

R(t) = 0.7891R(t-1) - 0.0363R(t-2) + 0.0015R(t-3) + 0.0448R(t-4) + 0.0646R(t-5),   (21)

Storage-Runoff Forecasting Model:

Q(t) = 1.2285Q(t-1) - 0.4146Q(t-2) + 0.1259Q(t-3) + 0.0516S(t-1) - 0.0129S(t-2) - 0.0356S(t-3),   (22)

Storage Forecasting Model:

S(t) = 1.4148S(t-1) - 0.3717S(t-2) - 0.0485S(t-3).   (23)

Both the rainfall-runoff forecasting model and the storage-runoff forecasting model are applied to
eight storm events to verify the model performance. One- to six-hour-ahead forecast
hydrographs are compared with the observed hydrographs as shown in Figure 1 and
Figure 2, in which only two of the eight storm events are shown. It is found that the forecast
hydrographs have a timing problem in both models. The r-hour-ahead forecast hydrograph is
shifted by nearly r hours relative to the observed hydrograph; therefore a backward
shift operator is applied to correct the timing problem:

(24)

where r is the lead time. Figure 3 and Figure 4 show the forecast hydrographs corrected by
Eq. (24). It appears that the timing problem in the forecast hydrographs is significantly reduced.
Figure 5 presents the comparison of model performance based on the eight kinds of criteria.
From the analytical results of the eight storm events one may conclude that the storage-runoff
forecasting model has better performance than the rainfall-runoff forecasting model.

CONCLUSIONS

The rainfall-runoff forecasting model and the storage-runoff forecasting model are developed and
compared by using 16 storm events over the Fei-Tsui reservoir catchment in northern Taiwan.
It is found that the forecast hydrographs have a timing problem in both models. After both
models are corrected by a backward shift operator, based on the comparison between the
forecast hydrographs (one to six hours ahead) and eight kinds of criteria, the storage-
runoff forecasting model has better performance than the rainfall-runoff forecasting
model. More research is required to confirm this conclusion.

REFERENCES

Abraham, B. and Ledolter, J. (1983) Statistical Methods for Forecasting, John Wiley &
Sons, Inc., New York.
Bobinski, E. and Mierkiewicz, M. (1986) "Recent developments in simple adaptive flow
forecasting models in Poland", Hyd. Sci. J. 31, 263-270.
Chander, S. and Shanker, H. (1984) "Unit hydrograph based forecast model", Hyd. Sci. J.
31, 287-320.
Cluckie, I. D. and Owens, M. D. (1987) ''Real-time Rainfall Runoff Models and Use of
Weather Radar Information", Chap. 12, Weather Radar and Flood Forecasting, ed. by V.
K. Collinge and C. Kirby, Wiley.
94 P.-s. YU ET AL.

Corradini, C., Melone, F. and Ubertini, L. (1986) "A semi-distributed model for real-time
flood forecasting", Water Resources Bulletin 22, 6, 1031-1038.
Corradini, C. and Melone, F. (1987) "On the structure of a semi-distributed adaptive
model for flood forecasting", Hyd. Sci. J. 32, 2, 227-242.
Einfalt, T. (1991) "Inaccurate rainfall forecasts: hydrologically valuable or useless?", New
Technologies in Urban Drainage UDT '91, ed. by C. Maksimovic, Elsevier Applied
Science, London.
Habaieb, H., Troch, P. A. and De Troch, F. P. (1991) "A coupled rainfall-runoff and
runoff-routing model for adaptive real-time flood forecasting", Water Resources
Management 5, 47-61.
O'Connel, P. E. and Clark, R. T. (1981) "Adaptive hydrological forecasting - a review",
Hyd. Sci. Bulletin 26,2, 179-205.
Owens, M. (1986) Real-Time Flood Forecasting Using Weather Radar Data, Ph. D.
Thesis, University of Birmingham, Department of Civil Engineering, U.K.
Powell, S. M. (1985) River Basin Models for Operation Forecasting of Flow in Real-
Time, Ph. D. Thesis, University of Birmingham, Department of Civil Engineering, U.K.
Reed, D. W. (1984) A Review of British Flood Forecasting Practice, Institute of
Hydrology, Report No 90.
Schwartz, G. (1978) "Estimating the dimension of a model", Annals of Statistics 6,
461-464.
Wang, R. Y. and Tan, C. H. (1991) "Study on the modified tank model and its application
to the runoff prediction of a river basin", Taiwan Water Conservancy 39, 3, 1-23. (in
Chinese)
Wei, W. W. S. (1990) Time Series Analysis: Univariate and Multivariate Methods,
Addison-Wesley Publishing Company, Inc., New York.

[Figure omitted: observed rainfall and discharge hydrographs with one- to six-hour-ahead forecasts for storm events 7021 and 7916, rainfall and rainfall-runoff panels.]

Figure 1. Using the rainfall-runoff forecasting model to estimate one- to six-hour-ahead forecast hydrographs.

[Figure omitted: observed and one- to six-hour-ahead forecast hydrographs for storm event 7916, storage-runoff model.]

Figure 2. Using the storage-runoff forecasting model to estimate one- to six-hour-ahead forecast hydrographs.

[Figure omitted: corrected forecast hydrographs for storm events 7021 and 7916, rainfall and rainfall-runoff panels.]

Figure 3. Using the corrected rainfall-runoff forecasting model to estimate one- to six-hour-ahead forecast hydrographs.

[Figure omitted: corrected forecast hydrographs for storm events 7021 and 7916, storage and storage-runoff panels.]

Figure 4. Using the corrected storage-runoff forecasting model to estimate one- to six-hour-ahead forecast hydrographs.

[Figure omitted: bar charts of the eight criteria (MAD, MSE, RTIC, CC, CE, EQp, ETp, EV) for storm events 7021 and 7916.]

Figure 5. Comparison of model performance based on the eight kinds of criteria for the corrected rainfall-runoff and storage-runoff forecasting models.
SEEKING USER INPUT IN INFLOW FORECASTING

T. Tao, I. Corbu, R. Penn, F. Benzaquen and L. Lai


Water Resources Department, Ontario Hydro
700 University Avenue, H9C22, Toronto, Ontario M5G 1X6
CANADA

Traditionally, inflow forecasting models were used like black boxes. Users prepared the
inputs at one end and received the outputs at the other end. In such an environment, the
inflow forecasting models were aimed at generating inflow forecasts without any user
intervention. This paper describes a new, user-friendly approach to inflow forecasting.
It allows users to input their preferences, interactively, during the execution of the
model and to generate inflow forecasts with which they feel comfortable.

INTRODUCTION

One of the requirements for integrating the water management and operation of Ontario
Hydro's hydroelectric facilities into its Grid is to generate daily inflow forecasts for up
to 732 days for those facilities and their control reservoirs. It is recognized that longer-term
inflow forecasts could hardly be indicative of what will occur, due to the
random process upon which they are based. In order to assess the risk associated with
the use of inflow forecasts, it is necessary to have a variety of equally likely inflow
forecasts. The model described in this paper provides such a capability, allowing users
to generate inflow forecasts with any desired exceedance probabilities of volume.

Each generated inflow series has three parts. The first part comprises inflows for the
first four days. They are provided interactively by a user-selected inflow forecasting
model. The second part includes inflows from day 5 to day N-l. These are heuristically
modified historic inflows which precede those in the third part. They ensure a smooth
transition from the first part to the third part. The third part contains inflows from day
N, which ranges from 15 to 60, to day 732. They are part of historic inflow records.
The process of generating such local inflow series is accomplished through three
modules. They are called expected forecast, heuristic forecast and probabilistic forecast
respectively. Users play an active role in deciding what will be produced in each module
(Tao 1993). The paper describes how such user interfaces are achieved.

EXPECTED FORECAST

The expected forecast generates the inflows for the first four days of the two-year time

horizon. Preliminary comparative studies have been conducted to retain the best of
several conceptual and stochastic models (Tao and Lai, 1991) for the hydrologic
conditions in Ontario, Canada. Three stochastic models have been selected as a result
of these studies. They are ARMAX(3,0,2), ARIMA(1,1,1) and ARIMA(1,2,2). Each
of these models generates a different four-day forecast. The exogenous input in
ARMAX(3,0,2) reflects the contribution of the precipitation of the previous and current
days to the forecasted inflows. Natural logarithms of the inflows are used in formulating
these three models. This ensures that the forecasted inflows will always take non-
negative values.

The forecasts are performed one watershed at a time. For each run of the model, users
are required to provide the past three days of observed local inflows, the temperature
and precipitation for the previous day, and forecasted temperature and precipitation for
the next four days at the indicated site. The temperature and precipitation data represent
maximum daily temperature and total daily precipitation respectively (see the upper half
of Figure 1). The maximum daily temperature is used to decide whether or not the
precipitation should be considered in ARMAX(3,0,2). If the maximum temperature is
equal to or lower than zero degrees Celsius, the precipitation is assumed not to contribute
to the runoff. Once all required inputs have been provided, three different four-day
forecasts are instantly displayed on the screen (see the lower half of Figure 1). Users
can select one of them or issue their own forecast based on experience with the river
system, knowledge of the meteorological forecast and the output of the three stochastic
forecasting models.

[Screen display omitted: observed and forecasted maximum temperature (°C), total precipitation (mm) and inflow (cms) for the Mississagi River at Rocky Island Lake, 18-24 June 1993, with four-day forecasts from the ARMAX(3,0,2), ARIMA(1,1,1) and ARIMA(1,2,2) models and a prompt for the user's own forecast.]

Figure 1 EXPECTED FORECAST: input and output screen

HEURISTIC FORECAST

The heuristic forecast provides a smooth transition between the single four-day forecast
retained by the user, which ends on day 4, and the multiple series of daily historic
inflows, which start from day N. Basically, it eliminates the discontinuity that occurs on
day 4 between the expected forecast and the historical series of inflows (Figure 2). The
module extends the inflow forecast from the fourth day to the Nth day. From the Nth
day on, the forecast is represented by series of historic inflows. The value of N is
selected by the user and must be between 15 and 60. When selecting the value of N,
the user places judgement on the current state of the watershed and on how far apart this
is from its historical average at the time of the forecast. The rule for a smooth transition
from day 4 to day N is to reduce the difference identified on day 4 between the
expected forecast and each historic series at a constant rate, continuously, until the
difference on day N-1 is only 1% of the difference on the fourth day. The reduction rate
r can be determined by solving equation (1):

$$r^{N-5} = 0.01 \qquad (1)$$

The heuristically calculated inflow forecasts Q_{H,t} are obtained as follows:

$$Q_{H,t} = Q_{A,t} + (Q_{E,4} - Q_{A,4}) \cdot r^{t-4} \qquad 5 \le t < N \qquad (2)$$

where Q_{E,4} and Q_{A,4} are the expected inflow and the actual historic inflow on the fourth
day, respectively. Figure 2 shows an example where N=60.
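A minimal sketch of the transition rule of Eqs. (1) and (2) (the array names are illustrative):

```python
import numpy as np

def heuristic_forecast(q_hist, q_exp4, N):
    """Blend the day-4 expected forecast into one historic daily series
    over days 5..N-1, following Eqs. (1)-(2).  q_hist[0] is day 1;
    days 1-4 come from the expected forecast and are handled separately."""
    r = 0.01 ** (1.0 / (N - 5))                  # Eq. (1): r**(N-5) = 0.01
    q = np.array(q_hist, dtype=float)
    diff4 = q_exp4 - q[3]                        # discontinuity on day 4
    for t in range(5, N):                        # days 5 .. N-1
        q[t - 1] += diff4 * r ** (t - 4)         # Eq. (2)
    return q
```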

[Figure omitted: flow (cms) versus time (days 0-150), showing the expected forecast (days 1-4), a historic record, the heuristic forecast bridging days 5-59, and the discontinuity at day 4 that the transition removes.]

Figure 2 HEURISTIC FORECAST: modification of inflows (N=60)

The heuristic forecast also takes care of the occurrence of the peak inflow during the spring
freshet. It is assumed that the peak inflow caused by snowmelt during the spring freshet
can only happen once a year. Users are asked to indicate if the peak inflow for the
current spring freshet has passed. If that is the case, the heuristic forecast will phase out
those peak inflows of historic series which lag behind. The process is demonstrated in
Figure 3 for one of the historic series. It first shifts the peak inflow of the historic series
to day one. The daily inflows after the peak are continuously moved to day 3, day 5,
and so on. This process is repeated n_d times, where n_d represents the number of days
between day one and the day when the peak of the historical inflow series occurred (see
Figure 3). The inflows on even days are set to the averages of the two neighbouring
inflows on odd days. The shifted inflows are then modified using equations (1) and (2)
to achieve a smooth transition from the fourth day to the Nth day.

[Figure omitted: flow (cms) versus time (days), showing a historic record, the shifted record with its peak moved to day one, the heuristic forecast and the expected forecast; actual inflows to the left of day zero, forecasted inflows to the right.]

Figure 3 HEURISTIC FORECAST: phase-out of peak inflows

PROBABILISTIC FORECAST
The probabilistic forecast sorts out representative series based on user-specified
exceedance probabilities of volume. The user has a choice of defining two such
probabilities. The first one is based on the biannual volume. The second one is based
on the volume corresponding to a user-specified time period, which ranges from 45 days
to 366 days. The process starts with finding two sets of volumes: one set covers the
inflow volumes of each series from day one until the end of a user-specified time period;
another set covers the total volume of all inflows of each series. The volumes in each
set are then ranked in descending order and exceedance probabilities are calculated.
Finally, each inflow series is associated with its exceedance probability. Figure 4
presents four forecasted local inflow series with representative exceedance probabilities
based on annual volumes, corresponding to a user-specified time period of 366 days, at
Rocky Island Lake on the Mississagi River. Forty years of inflow data were used.
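The ranking step can be sketched as follows; the plotting-position formula used to turn ranks into probabilities is an assumption, since the paper does not state one:

```python
import numpy as np

def exceedance_probabilities(series, horizon_days):
    """Attach an exceedance probability of volume to each candidate
    inflow series.  series is an (n_series, n_days) array of daily
    inflows; volumes are taken over the first horizon_days."""
    volumes = np.asarray(series, dtype=float)[:, :horizon_days].sum(axis=1)
    n = len(volumes)
    ranks = (-volumes).argsort().argsort() + 1   # rank 1 = largest volume
    return ranks / (n + 1.0)                     # Weibull plotting position
```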

[Figure omitted: forecasted flows (cms) versus time (days 0-366) for series with exceedance probabilities of 5%, 50%, 75% and 95%, with an inset showing the first 60 days.]

Figure 4 Inflow series with representative exceedance probabilities

SUMMARY

The inflow forecasting approach introduced in this paper provides users with more than
one optional scenario at each step, allowing them to make decisions interactively during
the forecasting process. Every decision made by users has an effect on the final forecast.
The major assumption of the new approach is that users are experienced practitioners.
The new approach is designed to enhance their capability of making an acceptable
forecast, and not to relieve them of doing the forecast. Figure 4 can, in fact, be viewed
on screen at the end of the forecast. Users can go back and try with different inputs until
they are comfortable with the inflow forecast they generate.

REFERENCES

Tao, T. and L. Lai (1991) Design Specifications of Short Term Inflow Forecaster,
Technical Report, Ontario Hydro.

Tao, T. (1993) Short Term Inflow Forecaster: User's Manual, Technical Report, Ontario
Hydro.
LINEAR PROCEDURES FOR TIME SERIES ANALYSIS IN HYDROLOGY

P. R. H. SALES1, B. de B. PEREIRA2 and A. M. VIEIRA1

1Centrais Eletricas Brasileiras S. A. - ELETROBRAS
Av. Pres. Vargas, 642, 8º andar, Rio de Janeiro, PO Box 1639
20079-900, Brazil
2COPPE/UFRJ, Rio de Janeiro, PO Box 68507
21945-970, Brazil

Linear procedures for time series modeling, recently introduced in the Brazilian
Electrical Sector by ELETROBRAS through the Coordinating Group for the
Interconnected Operation of the Power System - GCOI, are presented for the following
model sub-classes: Univariate Autoregressive Moving Average (ARMA), ARMA
Exogenous or Transfer Function (ARMAX or TF), Seemingly Unrelated or
Contemporaneous ARMA (SURARMA or CARMA), Multivariate or Vectorial ARMA
(MARMA or VARMA) and MARMA Exogenous (MARMAX). The methodology and
the algorithms proposed here had, as a cornerstone, the works of Professor Hannan,
developed alone or with other researchers after 1980, and constitute a real application to
inflow forecasting, which takes a very important place in Brazilian electrical operation
planning.

INTRODUCTION

The Brazilian interconnected hydrothermal electrical generating system has 55900 MW
of installed capacity, of which hydroelectric plants account for 93 percent. Since 1973,
GCOI, the Coordinating Group for the Interconnected Operation of the Power System,
which has representatives from the 18 major Brazilian utilities and the holding company,
ELETROBRAS, has been responsible for achieving the most efficient utilization of the
available hydro and thermal resources in the system. GCOI activities range from
operations planning for the next five years to real-time control of the power system. In
its operations planning, inflow forecasting is one of the most important points, as
shown by Terry et al. (1986).

Up to now, GCOI inflow forecasting has been based on the Yevjevich/Box-Jenkins
methodologies; see Sales et al. (1986) or Salas et al. (1980). In selecting a
methodology for a particular data set we should clearly answer the following questions:
(a) why has a technique been employed relative to other techniques?
(b) what has guided the choice of technique in the applications case?
(c) what utility have the resultant models?
The nature of GCOI coordinating activities in the operation of a large scale power
system does not allow the use of much time in the analysis of a large number of
extensive hydrological time series, each one with more than 700 observations. We
believe that in this case linear automatic methods would result in much faster action
and a more efficient performance of the GCOI specialists group on operational hydrology.
The other available methods, such as non-linear and non-automatic techniques, would
be less adequate for being more dependent on the analyst. Furthermore,
automatic methods are also recommended for the training of new specialists who
replace utilities' representatives.
The methodology and algorithms utilized in this paper were first proposed by Hannan and
Rissanen (1982); further developments are presented in Hannan and Deistler (1988). They
were applied to several hydrological time series of monthly average natural inflows of
reservoirs in Brazil. Four linear algorithms were used for identification and
estimation of the parameters of the ARMA, ARMAX, SURARMA and MARMA models,
as presented in Sales (1989) and Sales et al. (1986, 1987, 1989a, b), and forecasts for
one year ahead with standard error intervals were obtained for each of the selected
models.
Theoretical properties, simulation results and applications of the identification and
estimation procedures given in this paper are presented in Hannan and Deistler (1988),
Poskitt (1989), Koreisha and Pukkila (1989, 1990a, b) and Pukkila et al. (1990) and
references therein.
The purpose of this paper is to present a first known real application of some of these
methodologies.

TIME SERIES MODELS. LINEAR APPROACHES

A great number of univariate and multivariate time series models have been recently
proposed in hydrology, and they can be classified according to the dependency and
relationship previously mentioned.

Consider m time series represented by W_t = (W_1t, W_2t, ..., W_mt)'.

The MARMA(p,q) model is written as

$$\Phi_p(B)\,Z_t = \Theta_q(B)\,a_t \qquad (1)$$

where
i) Φ_p(B) and Θ_q(B) are respectively the autoregressive and moving average
polynomial matrices in the backward shift operator B. It is assumed that all the
roots of the determinantal equations |Φ_p(B)| = 0 and |Θ_q(B)| = 0 are outside the
unit circle.
ii) Z_t is a suitable transformation of the time series W_t. In our applications we first
applied the Box-Cox transformation and then standardized by the monthly means and
standard deviations, since both were periodic.

As concerns the complete multivariate model expressed by (1), the following
remarks can be made:
(i) during the model building process the degrees of the operators Φ_ij(B) and Θ_ij(B)
can be adjusted so that the ARMA models for the individual time series accurately
describe the behavior of each series;
(ii) the SURARMA model is the result of the Φ_ij(B) and Θ_ij(B) coefficients being null for i ≠
j, that is, the parameter matrices are diagonal;
(iii) the ARMAX model is obtained when the coefficients Φ_ij(B) and Θ_ij(B) are null for
i < j or, in other words, when the parameter matrices are triangular;
(iv) the MARMAX is obtained when one or more rows are deleted from the
autoregressive parameter matrix, Φ_p(B), and one or more columns from the moving average
parameter matrix, Θ_q(B).

Basically, the procedures for the linear algorithms consist in linearizing the estimates
of the innovations a_t in β_0i (i = 1, ..., k), where β_0i is an initial estimate of the
parameter vector β_i, obtained previously in the Initial Identification and
Estimation Stage. Thus, the β̂_i estimator expression can be written as

(2)

where ȧ_t^(i) is the partial derivative of a_t with respect to β_i.

The asymptotic variance estimator of β̂_i is given by

(3)

and the covariance estimator between β̂_i and β̂_j by

(4)

If the model does not have moving average terms, a_t is linear in β and the solution
is simply given by
(5)

Otherwise, a new parameter vector given by (5) is used in place of β_0 and the
linearization process is repeated until final convergence.

APPLICATIONS
In order to get the forecasts, the proposed algorithms were used on eight series of natural
monthly average flow rates of the reservoirs of Furnas, on the Grande River; Itumbiara, on
the Paranaiba River; Ilha Solteira, on the Parana River; Barra Bonita, on the Tiete River;
Jurumirim, on the Paranapanema River; Tres Marias and Sobradinho, on the Sao Francisco
River; as well as the incremental series of Sobradinho. Each of the hydrological time
series analyzed has 648 observations. The data cover the period from January 1931 to
December 1984 and were obtained from Centrais Eletricas Brasileiras S.A. -
ELETROBRAS, Brazil - see Figure 1.

Flow Rates (m³/s)

                    Max     Min   Average
1. FURNAS           3650    196     837
2. ITUMBIARA        5320    254    1379
3. I.SOLTEIRA      15293   1280    4583
4. B.BONITA         2104     42     335
5. JURUMIRIM        1539     51     203
6. T.MARIAS         3859     92     632
7. SOBRADINHO      15364    640    2569
8. Incr.SOBRAD.    12514    504    1926

Figure 1. Location of the time series used in the linear algorithms.



Upon application of the Box-Cox transformation, Table 1 shows that all of the selected
transformations were of the natural logarithmic type.

TABLE 1. Box-Cox transformations selected for the monthly natural average
flow rates at the site developments of Furnas, Itumbiara, Ilha Solteira,
Barra Bonita, Jurumirim, Tres Marias, Sobradinho and the Intermediate Basin

SERIES                λ1       λ2
FURNAS                 0     -179
ITUMBIARA              0     -179
I. SOLTEIRA            0    -1238
B. BONITA              0      -35
JURUMIRIM              0      -45
T. MARIAS              0      -90
SOBRADINHO             0     -529
INTERMEDIATE BASIN     0     -468
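A minimal sketch of the preprocessing pipeline implied by Table 1 (our own code with hypothetical data; the function and variable names are assumptions): a shifted natural-log transform, $z = \ln(w + \lambda_2)$ for $\lambda_1 = 0$, followed by standardization with the monthly means and standard deviations.

    import numpy as np

    def transform(w, lam2, month):
        """Shifted log (Box-Cox with lam1 = 0), then periodic standardization."""
        z = np.log(w + lam2)                                    # lam2 < 0 here
        mu = np.array([z[month == m].mean() for m in range(12)])
        sd = np.array([z[month == m].std(ddof=1) for m in range(12)])
        return (z - mu[month]) / sd[month]

    # Placeholder flows: 648 monthly values (Jan 1931 - Dec 1984), kept above
    # the Furnas shift so that w + lam2 stays positive.
    rng = np.random.default_rng(1)
    w = 200.0 + rng.gamma(shape=4.0, scale=200.0, size=648)
    month = np.tile(np.arange(12), 54)
    z = transform(w, lam2=-179.0, month=month)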

Univariate ARMA (p,q) model for the Furnas series

From the initial estimates of the parameters of the ARMA(1,1) model identified in the
previous stages, $\hat{\phi}_1 = 0.8426$, $\hat{\theta}_1 = -0.2278$ and $\hat{\sigma}_a^2 = 0.4351$, we moved to the final
stage of the proposed algorithm. The final estimates shown in Table 2 were obtained
after ten iterations with accuracy of $1 \times 10^{-4}$.

TABLE 2. Final estimates of the parameters $\phi_1$, $\theta_1$ and $\sigma_a^2$ of the ARMA(1,1)
model fitted to the transformed series of the Furnas site development

PARAMETER       ESTIMATE    STANDARD ERROR
$\phi_1$          0.8421            0.0237
$\theta_1$       -0.2398            0.0426
$\sigma_a^2$      0.4343

Forecasts for one year ahead with two standard error intervals are shown in Figure 2.
Figure 2. Forecasts for 1985 with two standard error intervals. ARMA model - Furnas.
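A minimal sketch of how such 12-steps-ahead point forecasts and $\pm 2$ standard error limits can be produced in the transformed domain from ARMA(1,1) estimates like those of Table 2 (our own code; the last observation and innovation below are placeholders, and the limits would then be back-transformed to flows).

    import numpy as np

    def forecast_arma11(phi, theta, sig2, z_n, a_n, horizon=12):
        """Point forecasts and +/-2 SE limits for z_{n+1}, ..., z_{n+horizon}."""
        zhat = [phi * z_n - theta * a_n]      # one-step-ahead forecast
        for _ in range(horizon - 1):
            zhat.append(phi * zhat[-1])       # psi-weights: psi_j = phi**(j-1)*(phi-theta)
        psi2 = [1.0] + [((phi - theta) * phi ** j) ** 2 for j in range(horizon - 1)]
        se = np.sqrt(sig2 * np.cumsum(psi2))
        zhat = np.array(zhat)
        return zhat, zhat - 2 * se, zhat + 2 * se

    point, lower, upper = forecast_arma11(0.8421, -0.2398, 0.4343,
                                          z_n=1.2, a_n=0.3)   # placeholder state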

ARMAX (p,r,s,q) model for the Tres Marias, Sobradinho and incremental Sobradinho series

The identified ARMAX model was: p = 1, r = 2, s = 0 and q = 1. The initial estimates
of the parameters were $\hat{c}_1 = 0.8075$, $\hat{d}_1 = 0.6227$, $\hat{d}_2 = -0.5202$, $\hat{f}_1 = -0.4130$
and $\hat{\sigma}_a^2 = 0.2946$.
After the initial estimates of the previous stages of the algorithm, we moved to the
final stage. The final estimates were obtained after eight iterations with accuracy of
$1 \times 10^{-4}$ and are summarized in Table 3 with the corresponding standard errors.

TABLE 3. Final estimates of the $c_1$, $d_1$, $d_2$, $f_1$ and $\sigma_a^2$ parameters of the
ARMAX model fitted to the Tres Marias, Sobradinho and
Intermediate Basin series

SERIES         VARIABLE       PARAMETER      ESTIMATE    STD ERROR
SOBRADINHO     $z_{t-1}$      $c_1$            0.8469       0.0866
TRES MARIAS    $x_{t-1}$      $d_1$            0.5996       0.0358
               $x_{t-2}$      $d_2$           -0.4626       0.0742
RESIDUAL       $a_{t-1}$      $f_1$           -0.3536       0.0899
               $\sigma_a^2$                    0.2939

Ex-ante forecasts for one year ahead with two standard error intervals are shown in
Figure 3.
Figure 3. Ex-ante forecasts for 1985 with two standard error intervals. ARMAX model - Sobradinho, input Tres Marias.

SURARMA (p,q) model for the Ilha Solteira, Barra Bonita and Jurumirim series

The algorithm considered, in an iterative way, the estimates of the $\Omega$ matrix in order to
obtain the vector $\boldsymbol{\beta}$ estimates and the corresponding standard errors.
Table 4 summarizes the results of the convergence of the algorithm after four iterations
with accuracy of $1 \times 10^{-4}$.

TABLE 4. Final estimates of the SURARMA(1,1) model fitted to the Ilha
Solteira, Barra Bonita and Jurumirim series (standard errors in parentheses)

SERIES         FINAL ESTIMATES
I. SOLTEIRA    $\hat{\phi}_1 = 0.7954$ (0.0234)    $\hat{\theta}_1 = -0.1610$ (0.0404)    $\hat{\sigma}_a^2 = 0.4099$
B. BONITA      $\hat{\phi}_1 = 0.7616$ (0.0188)    $\hat{\theta}_1 = -0.3824$ (0.0245)    $\hat{\sigma}_a^2 = 0.5151$
JURUMIRIM      $\hat{\phi}_1 = 0.6588$ (0.0231)                                           $\hat{\sigma}_a^2 = 0.5421$

Forecasts of one year ahead for the three series with two standard error intervals are
shown in Figure 4.

25
20 1985
... 15
........

-a + forecast
"b1O • observed
5

° J FMAMJJA SON D

Figure 4a. Forecasts for 1985 with two standard error. SURARMA model- TIha
Solteira.

1,4
1,2 1985
1
+ forecast

..
...~ 0,8
"S 0,6 • observed
0,4
0,2 • • • •• • • i +

+

° J F M A M J J A S 0 N D

Figure 4b. Forecasts for 1985 with two standard error. SURARMA model- B. Bonita.

0,8

0,6
1985
0,4

• • • •• i
:.!'!:!
"'s + + + forecast
"S 0,2
0
• • + + + +
•••
• observed

J F M A M J J A S 0 N D

Figure 4c. Forecasts for 1985 with two standard error. SURARMA model- Jurumirim.

MARMA (p,q) model for the Furnas, Itumbiara and Tres Marias series

Using the residuals of the three series which were obtained previously, several
multivariate MARMA (p,q) models were estimated. With the BIC (p,q) criterion the
MARMA (1,1) model was identified.
A careful analysis of the results permitted the fullest use of the proposed algorithm
on the Furnas, Itumbiara and Tres Marias series.
First, the iterative process of the final stage considered the complete multivariate
model, that is, with no restriction imposed on its parameters. The final estimates were
obtained after five iterations with accuracy of $1 \times 10^{-4}$.
After this, restrictions were imposed on the parameters of the MARMA(1,1) model. In
other words, the hypothesis that some parameters of the model did not differ significantly
from zero was considered consistent. In fact, the SURARMA model seems suitable here, but for
illustration of the MARMA algorithm we deleted only parameters within one standard error of zero.
The final estimates of the parameters of the restricted MARMA(1,1) model were
obtained after four iterations in the final stage of the proposed algorithm with $1 \times 10^{-4}$
accuracy. Table 5 summarizes the principal results of the final convergence process.
Standard errors of the estimates are shown in parentheses.

TABLE 5. Final estimates for the restricted MARMA(1,1) model parameters
fitted to the Furnas, Itumbiara and Tres Marias series

                 Φ MATRIX                               Θ MATRIX                               RESIDUAL
SERIES           FURNAS     ITUMBIARA   T. MARIAS       FURNAS     ITUMBIARA   T. MARIAS      VARIANCE
FURNAS           0.8685     -0.0551     -               -0.3397    0.0827      -                0.4299
                 (0.0206)   (0.0354)                    (0.0363)   (0.0363)
ITUMBIARA        -          -0.7380     -               -          0.2239      0.0971           0.4478
                            (0.0333)                               (0.0565)    (0.0445)
TRES MARIAS      0.0459     -0.0665     0.8202          -          0.1093      -0.2219          0.4497
                 (0.0215)   (0.0369)    (0.0243)                   (0.0562)    (0.0431)

Forecasts for the three series for one year ahead with two standard error intervals are
shown in Figure 5.

Figure 5a. Forecasts for 1985 with two standard error intervals. MARMA model - Furnas.

Figure 5b. Forecasts for 1985 with two standard error intervals. MARMA model - Itumbiara.

Figure 5c. Forecasts for 1985 with two standard error intervals. MARMA model - Tres Marias.

FINAL COMMENTS
Some general comments can now be made.
i) Variances of dry periods are smaller than those of wet periods in all graphs of the
12-steps-ahead forecasts; this becomes apparent when we transform back to the original
variables.
ii) Since the SURARMA model is, from the physical point of view, the most sensible in most
applications, we obtained smaller standard errors for parameters and residual
variances than with ARMA models (e.g., Tables 2 and 5 for Furnas). Whenever
ARMAX models were convenient, the residual variances were smaller than for ARMA
models.
iii) Currently, at ELETROBRAS, forecast comparisons are being made between the
automatic methodology of this paper and the Box-Jenkins methodology. The
results seem promising for the automatic methodology.
iv) The computer times on an IBM 4381 R14 were respectively 3.06 sec. for the
ARMA, 10.52 sec. for the ARMAX, (4.67 + 3 x 3.06) sec. for the SURARMA and
19.78 sec. for the MARMA application. Currently we are working on a
microcomputer version with more efficient numerical algorithms.
v) Theoretical properties of the identification and estimation procedures given in this
paper are presented in Hannan and Deistler (1988) and references therein.
Simulation results and applications of this and related work are given in Newbold
and Hotopp (1986), Hannan and McDougall (1988), Poskitt (1989), Koreisha
and Pukkila (1989, 1990a, b) and Pukkila et al. (1990).

ACKNOWLEDGMENT
The authors are grateful to the late Professor E. J. Hannan for his encouragement and
for making available many of his, at the time, unpublished papers, and to an anonymous
referee for his useful suggestions.

REFERENCES

Hannan, E.J. and McDougall, A.J. (1988) "Regression procedures for ARMA
estimation", Journal of the American Statistical Association, Theory and Methods, 83,
490-498.
Hannan, E.J. and Rissanen, J. (1982) "Recursive estimation of mixed autoregressive-
moving average order", Biometrika, 69, 81-94. Correction, Biometrika, 70, 303.
Hannan, E.J. and Deistler, M. (1988) The Statistical Theory of Linear Systems, John
Wiley & Sons, New York.
Koreisha, S. and Pukkila, T. (1989) "Fast linear estimation methods for vector
autoregressive moving-average models", J. of Time Series An., 10, 325-339.
Koreisha, S. and Pukkila, T. (1990a) "Linear methods for estimating ARMA and
regression models with serial correlation", Commun. Statist. - Simula., 19, 71-102.
Koreisha, S. and Pukkila, T. (1990b) "A generalized least-squares approach for
estimation of autoregressive moving-average models", J. of Time Series An., 11,
139-151.
Newbold, P. and Hotopp, S.M. (1986) "Testing causality using efficiently parametrized
vector ARMA models", Applied Mathematics and Computation, 20, 329-348.
Poskitt, D.S. (1989) "A method for the estimation and identification of transfer function
models", J. Royal Statist. Soc. B, 51, 29-46.
Pukkila, T., Koreisha, S. and Kallinen, A. (1990) "The identification of ARMA models",
Biometrika, 77, 537-548.
Salas, J.D., Delleur, J.W., Yevjevich, V. and Lane, W.L. (1980) Applied Modeling of
Hydrologic Time Series, Water Resources Publications.
Sales, P.R.H. (1989) "Linear procedures for identification and parameters estimation of
models for uni- and multivariate time series", D.Sc. Thesis, COPPE/UFRJ (in
Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1986) "Inflows forecasting in the
operation planning of the Brazilian hydroelectric system", Annals of the II Lusitanian-
Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal, 217-226
(in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1987) "Linear procedures for
identification and estimation of ARMA models for hydrological time series", Annals of
the VII Brazilian Symposium of Hydrology and Water Resources, Salvador, Bahia,
605-615 (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989a) "A linear procedure for
identification of transfer function models for hydrological time series", Annals of the IV
Lusitanian-Brazilian Symposium of Hydraulics and Water Resources, Lisbon, Portugal,
321-336 (in Portuguese).
Sales, P.R.H., Pereira, B. de B. and Vieira, A.M. (1989b) "A linear procedure for
identification and estimation of SURARMA models applied to multivariate hydrological
time series", Annals of the IV Lusitanian-Brazilian Symposium of Hydraulics and
Water Resources, Lisbon, Portugal, 283-248 (in Portuguese).
Terry, L.A., Pereira, M.V.F., Araripe Neto, T.A., Silva, L.F.C.A. and Sales, P.R.H.
(1986) "Coordinating the energy generation of the Brazilian national hydrothermal
electrical generating system", Interfaces, 16, 16-38.
PART III
ENTROPY
APPLICATION OF PROBABILITY AND ENTROPY CONCEPTS
IN HYDRAULICS

CHAO-LIN CHIU
Department of Civil Engineering
University of Pittsburgh
Pittsburgh, PA 15261
USA

This paper describes the present status of efforts to develop an alternative approach
to hydraulics, in which probability and entropy concepts are combined with
deterministic, fluid-mechanics principles. Some results of applying the approach in
analysis and modeling of flows in pipes and open channels are also presented.
INTRODUCTION

Uncertainties always exist in parameters and variables involved in hydraulic studies
of flows in pipes and open channels, such as velocity distribution, discharge, shear
stress, friction factor, diffusion, and transport of mass, momentum and energy. The
uncertainties are due both to the inherent randomness of these parameters and
variables, and to man's ignorance or inability to fully understand them. Hydraulic
studies under such uncertainties require an approach that has a probability element.
A possible approach being developed is based on probability and entropy concepts
combined with deterministic, fluid-mechanics principles. Some of the research
results have been published in a series of papers (Chiu, 1987, 1988, 1989, 1991;
Chiu and Murray, 1992; Chiu et al., 1993). This paper summarizes the approach
applied in analysis and modeling of flows in pipes and open channels.
MODELING OF VELOCITY DISTRIBUTION

The spatial distribution of mean-flow velocity in the
longitudinal direction affects the discharge, shear stress
distribution, friction factor, energy gradient, diffusion, and
concentration of sediment or pollutant, etc. Therefore, to
study various transport processes in pipes and open channels,
a reliable mathematical model of velocity distribution is

needed. A system of velocity distribution equations derived by Chiu (1987, 1988, 1989)
can be represented by

$$\frac{\xi-\xi_0}{\xi_{max}-\xi_0} = \int_0^{u} p(u)\,du \qquad (1)$$

in which $u$ = velocity at $\xi$; $\xi$ = an independent variable with which $u$ develops, such
that each value of $\xi$ corresponds to a value of $u$; $\xi_{max}$ = maximum value of $\xi$, where
the maximum velocity $u_{max}$ occurs; $\xi_0$ = minimum value of $\xi$, which occurs at the
channel bed where $u$ is zero; and

$$p(u) = \exp\left(\sum_{i=0}^{N} a_{i+1}\,u^{i}\right) \qquad (2)$$

which is the probability density function of $u$, derived by maximizing the entropy
function (Shannon, 1948),

$$H = -\int_0^{u_{max}} p(u)\,\ln p(u)\,du \qquad (3)$$

subject to the following constraints:

$$\int_0^{u_{max}} p(u)\,du = 1 \qquad (4)$$

$$\int_0^{u_{max}} u\,p(u)\,du = \bar{u} = \frac{Q}{A} \qquad (5)$$

$$\int_0^{u_{max}} u^{2}\,p(u)\,du = \beta\,\bar{u}^{2} \qquad (6)$$

and

$$\int_0^{u_{max}} u^{3}\,p(u)\,du = \alpha\,\bar{u}^{3} \qquad (7)$$

Equation (1) means that if $\xi$ is randomly sampled a large number of times within the
range $(\xi_0, \xi_{max})$ and the corresponding velocity samples are obtained, the probability
of velocity falling between $u$ and $u+du$ is $p(u)\,du$. Equation (4) is based on the
definition (condition) of a probability density function. Equation (5) is based on the
condition that the mean or average velocity in a cross section must be equal to $Q/A$,
where $Q$ is the discharge and $A$ is the cross-sectional area of the channel. Equation
(6) is based on the condition

that the rate of momentum transport through a cross section is $\rho A\overline{u^{2}}$ or $\rho A\beta\bar{u}^{2}$, where
$\beta$ is the momentum coefficient. Equation (7) represents the condition that the rate of
kinetic-energy transport through a section is $\rho A\overline{u^{3}}/2$ or $\rho A\alpha\bar{u}^{3}/2$, where $\alpha$ is the
energy coefficient.
A system of three different velocity distribution models, Models I, II and III, can be
obtained by using three different sets of constraints (Chiu, 1989). If the first two
constraints, (4) and (5), are used in entropy maximization, $p(u)$ in (1) is given by (2)
with N = 1, and becomes

$$p(u) = \exp(a_1 + a_2 u) \qquad (8)$$

Equation (1) can then be integrated analytically to yield Model I,

$$u = \frac{u_{max}}{M}\ln\left[1+\left(e^{M}-1\right)\frac{\xi-\xi_0}{\xi_{max}-\xi_0}\right] \qquad (9)$$
in which $M = a_2 u_{max}$, a parameter called the "entropy parameter" since, among other
reasons, the entropy of the distribution $p(u/u_{max})$, obtained by (3) with $u$ replaced by
$u/u_{max}$ and the upper limit of integration replaced by unity, is a function of only $M$
(Chiu, 1989, 1991). Smaller values of $M$ correspond to a more uniform pattern of the
probability distribution $p(u/u_{max})$, a greater value of entropy, and a less uniform
velocity distribution. By substituting (2) with N = 1 into (4) and (5), the following
equation can be obtained:

$$e^{a_1} = \frac{a_2}{e^{M}-1} \qquad (10)$$

$$\frac{\bar{u}}{u_{max}} = \frac{e^{M}}{e^{M}-1}-\frac{1}{M} \qquad (11)$$

Equation (10) was used in deriving (9); and equation (11) is a very useful equation
that can be employed in parameter estimation and many other applications. For
instance, an obvious application is to determine the entropy parameter $M$ from the
ratio of the mean velocity to the maximum velocity. It appears that an erodible
channel tends to shape the channel and velocity distribution pattern so that $\bar{u}/u_{max}$
may fall in a range between 0.85 and 0.9, which corresponds to values of the entropy
parameter $M$ between 6 and 10 (Chiu, 1988), as shown by the data obtained by
Blaney (1937) from canals in the Imperial Valley. Very few laboratory and field data available

include $u_{max}$, probably because, without the probability concept, there has not been
any basis or motivation to measure it. According to the probability concept, $u_{max}$
contains important information about the velocity. It is an important statistical
parameter that defines the range of velocity, as it is known that the minimum
velocity is zero. $u_{max}$, along with the mean value $\bar{u}$ and the probability density
function $p(u)$, will fully describe the probability law governing the velocity
distribution in a channel cross section. The importance of $u_{max}$ as a parameter or
variable for characterizing a streamflow should, therefore, be emphasized in future
studies.
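A minimal sketch of the application just described (our own code, with scipy assumed available): Eq. (11) is inverted numerically to recover the entropy parameter M from an observed mean-to-maximum velocity ratio.

    import numpy as np
    from scipy.optimize import brentq

    def mean_to_max(M):
        """Eq. (11): ubar/umax as a function of M."""
        return np.exp(M) / (np.exp(M) - 1.0) - 1.0 / M

    def entropy_parameter(ratio):
        """Solve Eq. (11) for M; the ratio must lie between 0.5 and 1."""
        return brentq(lambda M: mean_to_max(M) - ratio, 1e-6, 50.0)

    for r in (0.85, 0.90):              # the range cited for erodible channels
        print(r, entropy_parameter(r))  # gives M of roughly 6.6 and 10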
If (4)-(6) are used as constraints, $p(u)$ in (1) is given by (2) with N = 2; and the
velocity distribution equation given by (1) is Model II. Similarly, if all four
constraints, (4)-(7), are used, N = 3 in (2) and (1) yields Model III. To determine $u$
for a given value of $\xi$ by Model II or III, (1) can be integrated numerically to select
$u$, the upper limit of integration, that will balance the two sides of (1). Chiu (1989)
presented a discrete parameter estimation technique for these models.
With the probability density function $p(u)$, the cross-sectional mean values of $u$,
$u^2$ and $u^3$ can be obtained by taking their mathematical expectations (expected
values), without integrating over the physical plane. This is an attractive feature of
the analytical treatment in the probability domain, especially when the channel cross
section has an irregular and complex geometrical shape. For instance, if (8) is used
to represent $p(u)$, then $\bar{u}$, the cross-sectional mean of $u$, can be obtained as the
mathematical expectation of $u$ (Chiu, 1988), as expressed by (11) in the form of the
ratio of $\bar{u}$ to $u_{max}$ as a function of $M$. The expected values of $u^2$ and $u^3$ give the
momentum and energy coefficients, also as functions of only $M$ (Chiu, 1991).
Equation (1) indicates that $(\xi-\xi_0)/(\xi_{max}-\xi_0)$ is equal to the probability of the
velocity being less than or equal to $u$. This provides guidance in selecting a suitable
form of equation for $\xi$. Flows through pipes and various open channels can be
studied by selecting a suitable equation for $\xi$ in (1).

FLOW IN CIRCULAR PIPE

An axially symmetric flow in a circular pipe can be studied by defining $\xi$ in (9) to be

$$\xi = 1-\left(\frac{r}{R}\right)^{2} \qquad (12)$$

in which $r$ = radial distance from the pipe center; and $R$ = radius of the pipe (Chiu
et al., 1993). $\xi$ as expressed by (12) is the ratio of the area in which the velocity is
less than or equal to $u$, to the total cross-sectional area of the pipe. With $\xi$ defined
by (12), $\xi_0 = 0$ and $\xi_{max} = 1$; and (9) (Model I) becomes

$$u = \frac{u_{max}}{M}\ln\left[1+\left(e^{M}-1\right)\left(1-\left(\frac{r}{R}\right)^{2}\right)\right] \qquad (13)$$

This is the new velocity distribution equation proposed by Chiu et al. (1993) for a
pipe flow. In contrast, a widely used form of the Prandtl-von Karman universal
velocity distribution law for a pipe flow is

$$\frac{u_{max}-u}{u_*} = -\frac{1}{\kappa}\ln\left(\frac{R-r}{R}\right) \qquad (14)$$

Equation (13) satisfies the boundary conditions that $u = 0$ at $r = R$ and $du/dr = 0$ at
$r = 0$, but (14) (the "universal law") does not. Furthermore, unlike (14), (13) does not
give a velocity gradient that approaches infinity at the pipe wall. Therefore, (13) is
applicable in the entire flow field.
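The contrast can be checked numerically with the following sketch (our own code; the universal law is written in the standard pipe-flow form with von Karman constant kappa = 0.4 and wall distance y = R - r, which is an assumption consistent with (14) as reconstructed above).

    import numpy as np

    def u_entropy(r, R, umax, M):
        """Eq. (13); finite everywhere, u = 0 at r = R, du/dr = 0 at r = 0."""
        return umax / M * np.log1p((np.exp(M) - 1.0) * (1.0 - (r / R) ** 2))

    def u_universal(r, R, umax, ustar, kappa=0.4):
        """Eq. (14); diverges to -infinity as r -> R (the wall defect noted above)."""
        return umax + (ustar / kappa) * np.log((R - r) / R)

    r = np.linspace(0.0, 0.999, 200)         # avoid the wall singularity of (14)
    print(u_entropy(r, 1.0, 1.0, 6.55)[:3])  # M = 6.55, as in Figure 3 below
    print(u_universal(r[:3], 1.0, 1.0, 0.05))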
Figure 1(a) exhibits a system of velocity distributions, with $u/u_{max}$ given by (13)
plotted against $1-r/R$ in the physical plane, for a range of values of $M$. It correctly
shows the velocity gradient of each of the velocity distributions to be zero at the
pipe center (where $1-r/R = 1$). Figure 1(b) shows the same velocity distributions,
but has $u/u_{max}$ plotted against $\xi$, or $1-(r/R)^{2}$.
By equating the sum of the pressure and gravity forces with the frictional resistance,
the wall shear can be written as

$$\tau_0 = \rho\,g\,R_h\,S_f = \rho\,u_*^{2} \qquad (15)$$

in which $\rho$ = fluid density; $R_h$ = hydraulic radius, equal to $D/4$; $u_*$ = shear velocity,
equal to $(gR_hS_f)^{1/2}$; and $S_f$ = energy gradient, which can be expressed as $h_f/L$. Based
on a balance between the shear stress and the diffusion of momentum at the pipe wall,

$$\tau_0 = \rho\,\varepsilon_0\left(-\frac{du}{dr}\right)_{r=R} \qquad (16)$$

In (16), $\varepsilon_0$ is the momentum-transfer coefficient at the wall, which is equal to the
kinematic viscosity $\nu$ of the fluid if the flow is laminar, or if the flow is turbulent
with a viscous sub-layer (i.e., the pipe is hydraulically "smooth").

Figure 1. Velocity distribution given by (13): (a) $u/u_{max}$ plotted against $1-r/R$; (b) $u/u_{max}$ plotted against $\xi = 1-(r/R)^{2}$.

For turbulent flows in "rough" pipes, $\varepsilon_0$ is different from $\nu$ and varies with the pipe
roughness and fluid turbulence. With the velocity distribution represented by (13),
the velocity gradient can be written as

$$\frac{du}{dr} = -\frac{u_{max}}{M}\,\frac{\left(e^{M}-1\right)\dfrac{2r}{R^{2}}}{1+\left(e^{M}-1\right)\left[1-\left(\dfrac{r}{R}\right)^{2}\right]} \qquad (17)$$

which is zero at $r = 0$, as it should be. At the wall, (17)



becomes

$$\left(\frac{du}{dr}\right)_{r=R} = -\frac{2\,u_{max}\left(e^{M}-1\right)}{MR} \qquad (18)$$
which, unlike the velocity gradient given by (14), remains finite. Equations (15),
(16) and (18) give the head loss due to friction over the pipe length $L$ as

$$h_f = \frac{32\left(e^{M}-1\right)}{M}\left(\frac{\bar{u}}{u_{max}}\right)^{-1}\left(\frac{D\bar{u}}{\nu}\right)^{-1}\left(\frac{\varepsilon_0}{\nu}\right)\frac{L}{D}\,\frac{\bar{u}^{2}}{2g} \qquad (19)$$

By comparing (19) with the Darcy-Weisbach equation, and using (11) for $\bar{u}/u_{max}$,
the friction factor can be obtained as

$$f = \frac{32}{N_R}\,\frac{\varepsilon_0}{\nu}\,F(M) \qquad (20)$$

in which

$$F(M) = \frac{e^{M}-1}{M}\,\frac{u_{max}}{\bar{u}} \qquad (21)$$

Equation (20) gives the friction factor $f$ as a function of the three dimensionless
parameters $M$, $N_R$ (the Reynolds number $D\bar{u}/\nu$) and $\varepsilon_0/\nu$. The entropy parameter
$M$ represents the velocity distribution pattern and, hence, affects the transport of
mass, momentum and energy. In "smooth" pipes, a viscous sub-layer exists at the
wall and, hence, $\varepsilon_0 = \nu$. If the flow is laminar, $\varepsilon_0 = \nu$ and $f = 64/N_R$, and (20)
yields $F(M) = 2$ or, from (21), $M = 0$. As $M$ approaches zero, (11) gives
$u_{max} = 2\bar{u}$ according to L'Hopital's rule; and (13) becomes

$$u = u_{max}\left[1-\left(\frac{r}{R}\right)^{2}\right] \qquad (22)$$

which is identical to the parabolic velocity distribution obtained by applying the
momentum equation to a viscous, Newtonian fluid. The results presented so far are
strictly analytical. By combining these results with experimental data, Chiu et al.
(1993) derived an equation that relates the entropy parameter $M$ to the friction
factor, as shown in Figure 2.
Figure 3 gives a comparison of (13) and (14), based on velocity data from a rough
pipe (Nikuradse, 1932). The two equations differ primarily near the center and the
wall. The region near the wall is enlarged in Figure 3 to help depict the difference.
Figure 4 compares the velocity gradients given by the two equations. As expected,
the main differences also occur near the center and the wall. The region near the
center is enlarged in Figure 4 to give a better contrast.
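A minimal numerical sketch of Eqs. (20)-(21) as reconstructed here (our own code): the laminar limit M -> 0 reproduces f = 64/N_R, which is the check stated above.

    import numpy as np

    def mean_to_max(M):
        return np.exp(M) / (np.exp(M) - 1.0) - 1.0 / M   # Eq. (11)

    def friction_factor(M, N_R, eps0_over_nu=1.0):
        F = np.expm1(M) / M / mean_to_max(M)             # Eq. (21): F(M)
        return 32.0 / N_R * eps0_over_nu * F             # Eq. (20)

    print(friction_factor(1e-4, 2000.0))                 # ~0.032 = 64/2000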

Figure 2. Friction factor as a function of M.

Figure 3. Comparison between (13) (with M = 6.55) and the "universal" law, for velocity data from a rough pipe (Nikuradse, 1932), N_R = 105,000.



Figure 4. Velocity gradients given by (13) and (14).

OPEN-CHANNEL FLOW

To study a flow in a wide open channel of width $B$, $\xi$ in (9) can be defined as

(23)

in which $y$ = vertical distance from the channel bed; and $D$ = water depth. For a
two-dimensional velocity distribution in a channel which is not "wide," such that the
velocity distribution is affected by the two sides of the channel cross section, a
suitable equation for $\xi$ derived by Chiu and Chiou (1986) is

$$\xi = Y(1-Z)^{\beta_i}\exp\left[\beta_i Z-Y+1\right] \qquad (24)$$

in which

$$Y = \frac{y+\delta_y}{D-h+\delta_y} \qquad (25)$$

$$Z = \frac{|z|}{B_i+\delta_i} \qquad (26)$$

In (25)-(27) the $y$-axis is selected such that it passes through the point of maximum
velocity; $D$ = water depth at the $y$-axis; $B_i$, for $i$ equal to either 1 or 2, is the
transverse distance on the water surface between the $y$-axis and either the left or
right side of the channel cross section; $z$ = coordinate in the transverse direction;
$y$ = coordinate in the vertical direction; and $h$ = a parameter that mainly controls the
slope and shape of the velocity distribution curve near the water surface. If $h \le 0$,
$\xi$ increases monotonically from the channel bed to the water surface. However, if
$h > 0$, the magnitude of $h$ is the depth of the point of maximum velocity below the
water surface; hence, $\xi$ increases with $y$ only from the channel bed to the point of
maximum velocity, where $\xi = \xi_{max} = 1$, and then decreases towards the water
surface. Figure 5 shows the coordinates chosen, along with other variables and
parameters which appear in (25)-(27). $\delta_y$, $\delta_i$ and $\beta_i$ are parameters which vary with
the shape of the zero-velocity isovel (i.e., the channel cross section) and the isovels
near the boundary (bed and sides). Both $\delta_y$ and $\delta_i$ are approximately zero if the
channel cross section is fairly rectangular, and increase as the cross-sectional shape
deviates from the rectangular, as indicated by Figure 5. For a method to determine
these parameters, see Chiu and Chiou (1986). The $\eta$ curves shown in Figure 5 are
orthogonal trajectories of the $\xi$ curves. The idea of using the $\xi$-$\eta$ coordinates in
modeling the two-dimensional velocity distribution in a channel cross section is
similar to that of using the cylindrical coordinate system in modeling the velocity
distribution and other processes in circular pipes.
The velocity distribution along the $y$-axis that passes through the point of maximum
velocity, or along a vertical at the middle of a symmetrical cross section in a straight
reach, can be represented by (9) with $\xi$ represented by

$$\xi = \frac{y}{D-h}\exp\left(1-\frac{y}{D-h}\right) \qquad (27)$$

which can be obtained by setting $Z = 0$ and $\delta_y = 0$ in (24) and (25). Figure 6 shows
the performance of (9) (labeled as Model I) with $\xi$ defined by (27), as compared to
a set of measured data (Hou & Kuo, 1987). Also shown in Figure 6 for comparison
are the results of using Models II and III, obtained by making N = 2 and 3,
respectively, in (2) and using it in (1). Model III seems to be the most accurate, but
requires a relatively complex method to estimate its parameters. In contrast, its
simplicity makes Model I, or (9), attractive. However, Models II and III should also
be examined for possible applications in certain situations, such as studies concerned
with bed scour and erosion, for which an accurate estimation of the velocity gradient
at the channel bed may be needed.
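A minimal sketch (our own code and placeholder numbers) of the vertical profile just described: $\xi$ from Eq. (27), with a velocity-dip depth h, substituted into Model I, Eq. (9), with $\xi_0 = 0$ and $\xi_{max} = 1$.

    import numpy as np

    def xi_vertical(y, D, h):
        """Eq. (27): xi = 1 (its maximum) at y = D - h, the velocity-dip point."""
        s = y / (D - h)
        return s * np.exp(1.0 - s)

    def model_I(xi, umax, M):
        """Eq. (9) with xi0 = 0 and ximax = 1."""
        return umax / M * np.log1p((np.exp(M) - 1.0) * xi)

    y = np.linspace(0.0, 2.0, 50)      # water depth D = 2 (placeholder)
    u = model_I(xi_vertical(y, D=2.0, h=0.3), umax=1.5, M=6.0)
    print(u.max())                     # peaks near umax at y = D - h = 1.7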

Figure 5. Velocity distribution and $\xi$-$\eta$ coordinates.


Figure 6. Comparison of velocity distribution models (Models I, II and III) against measured data (Hou & Kuo, 1987).



If a set of velocity data is available, the velocity distribution parameters, such as $M$
and $u_{max}$ of Model I, can be estimated easily by the method of least squares. In
practical applications, a simple technique using a graph such as Figure 7, obtained
analytically from (9) and (11), could also be used to simplify the parameter
estimation. If the mean velocity $\bar{u}$ is known through the discharge and the cross-
sectional area, the parameter $M$ can be determined quickly by simply plotting data
points on the graph. The known value of $\bar{u}$ and the estimated value of $M$ can then
be used in (11) to determine the other parameter, $u_{max}$, of (9). Also applicable to
pipe flows, such a graphical method will be very useful when the number of velocity
samples is small. Figure 7 also shows that, for $M$ greater than about 5 or 6, the
(cross-sectional) mean velocity occurs at $\xi = e^{-1} \approx 0.368$. The actual location (as
measured by $y$, the vertical distance above the bed) of the mean velocity depends on
the relation between $\xi$ and $y$. For a wide channel, $\xi$ can be approximated by $y/D$;
therefore, the mean velocity occurs at $y/D = 0.368$. However, for a channel which
is not "wide" according to the width-to-depth ratio, such that the velocity distribution
is affected by the side walls, the maximum velocity tends to occur below the water
surface and $\xi$ must be represented by a non-linear function of $y$ such as (23) or (25).
Then, the mean velocity occurs at $y/D$ less than 0.368 (Chiu, 1988).
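A minimal sketch of the least-squares route mentioned above (our own code, with scipy's generic curve fitter standing in for whatever routine a practitioner prefers; the "data" are synthetic).

    import numpy as np
    from scipy.optimize import curve_fit

    def model_I(xi, umax, M):
        """Eq. (9) with xi0 = 0 and ximax = 1."""
        return umax / M * np.log1p((np.exp(M) - 1.0) * xi)

    # Synthetic measurements from Model I (umax = 1.2, M = 6) plus noise.
    rng = np.random.default_rng(2)
    xi = np.linspace(0.05, 1.0, 30)
    u_obs = model_I(xi, 1.2, 6.0) + 0.01 * rng.standard_normal(xi.size)

    (umax_hat, M_hat), _ = curve_fit(model_I, xi, u_obs, p0=(u_obs.max(), 4.0))
    print(umax_hat, M_hat)             # recovers roughly (1.2, 6.0)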

Figure 7. Entropy parameter M and velocity distribution.



Under the analytical framework described above, the shear stress distribution and the
secondary-flow velocities can be expressed as functions of $M$ and $u_{max}$ (Chiu and
Chiou, 1986). A new equation for the vertical distribution of sediment concentration
in open channels can also be derived under the same analytical framework (Chiu and
Rich, 1992). The new equation is similar to the well-known "Rouse equation."
However, the Rouse equation is based on the Prandtl-von Karman velocity
distribution model, which gives an infinite velocity gradient at the channel bed. It
also gives an infinite sediment concentration at $y = 0$ and is, therefore, not applicable
at and near the channel bed. The Rouse equation is applied only down to a certain
finite distance above the channel bed. However, without including the high sediment
concentration at and near the bed, the Rouse equation tends to underestimate the
mean sediment concentration on a vertical. In contrast, the new equation does not
have that problem, since it is based on (9) as the velocity distribution, which gives
a velocity gradient of finite value at the channel bed, such that the boundary
conditions of both the shear stress and the sediment concentration can be satisfied.
The sediment characteristics (size and concentration) should have effects on the
entropy parameter $M$ and, hence, on the velocity distribution and other related
variables.

SUMMARY AND CONCLUSION

This paper presents an alternative approach to the study of flows in pipes and open
channels, which consists of the following elements:

- The probabilistic formulation of velocity distribution in fluid flows, used to lay
the foundation of the analytical framework.
- The variational principle of maximizing the entropy, employed to identify the
probability law governing the velocity distribution.
- Deterministic, fluid-mechanics principles, used to provide the physical basis for
establishing the linkages among hydraulic variables and the entropy parameter.
- A geometrical technique using a special coordinate $\xi$, to build an effective,
analytical framework for both the deterministic and probabilistic components of
the approach.

The system of velocity distribution models derived by using the approach is capable
of describing the one- or multi-dimensional velocity distribution in the entire cross
section of a pipe or an open channel, regardless of whether the pipe or the open
channel is smooth or rough, and regardless of whether the flow is laminar or
turbulent.
Under the probability and entropy concepts, $u_{max}$ and $M$ have emerged as two new
hydraulic parameters that are useful in studying the flow and various transport
processes in pipes and open channels. $M$ is called the "entropy parameter," a
measure of entropy. $u_{max}$ gives the range of flow velocity in a channel cross section
and, therefore, contains important statistical information. The relation among $M$, $\bar{u}$
and $u_{max}$, as represented by Eq. (11), is very useful in any hydraulic study. Under
the analytical framework developed, it has become possible to establish linkages
among various hydraulic variables that can be used to gain insight into the
unobservable interactions among the variables.

REFERENCES

Blaney, H. F. (1937). "Discussion of 'Stable Channels in Erodible Material' by E. W.
Lane." Trans., ASCE, Vol. 102.
Chiu, C.-L., and Chiou, J.-D. (1986). "Structure of 3-D Flow in Rectangular Open
Channels." J. Hydr. Engrg., ASCE, 112(11).
Chiu, C.-L. (1987). "Entropy and Probability Concepts in Hydraulics." J. Hydr.
Engrg., ASCE, 113(5), 583-600.
Chiu, C.-L. (1988). "Entropy and 2-D Velocity Distribution in Open Channels." J.
Hydr. Engrg., ASCE, 114(7), 738-756.
Chiu, C.-L. (1989). "Velocity Distribution in Open-Channel Flow." J. Hydr. Engrg.,
ASCE, 115(5), 576-594.
Chiu, C.-L. (1991). "Application of Entropy Concept in Open-Channel Flow Study."
J. Hydr. Engrg., ASCE, 117(5), 615-628.
Chiu, C.-L., and Murray, D. W. (1992). "Variation of Velocity Distribution along
Non-Uniform Open Channel Flow." J. Hydr. Engrg., ASCE, 118(7), 989-1001.
Chiu, C.-L., and Rich, C. A. (1992). "Entropy-Based Velocity Distribution Model in
Study of Distribution of Suspended-Sediment Concentration." Proc., ASCE National
Conf. on Hyd. Engrg., Baltimore, Aug. 1992.
Chiu, C.-L., Lin, G.-F., and Lu, J.-M. (1993). "Application of Probability and
Entropy Concepts in Pipe-Flow Study." J. Hydr. Engrg., ASCE, 119(6).
Nikuradse, J. (1932). "Gesetzmäßigkeit der turbulenten Strömung in glatten Rohren."
Forschungsheft 356.
Schlichting, H. (1979). Boundary-Layer Theory, McGraw-Hill Book Co., New York,
596-621.
Shannon, C. E. (1948). "A Mathematical Theory of Communication." The Bell
System Technical Journal, Vol. 27, October 1948, pp. 623-656.
ASSESSMENT OF THE ENTROPY PRINCIPLE AS APPLIED TO WATER
QUALITY MONITORING NETWORK DESIGN

N.B. HARMANCIOGLU¹, N. ALPASLAN¹, and V.P. SINGH²

¹Dokuz Eylul University, Faculty of Engineering
Bornova 35100 Izmir, Turkey
²Louisiana State University, Department of Civil Engineering
Baton Rouge, LA 70803, U.S.A.

With respect to design of water quality monitoring networks, the entropy principle
can be effectively used to develop design criteria on the basis of quantitatively
expressed information expectations and information availability. Investigations on
the application of the entropy method in monitoring network design have revealed
promising results, particularly in the selection of technical design features such as
monitoring sites, time frequencies, variables to be sampled, and sampling duration.
Yet, there are still certain problems that need to be overcome so that the method
can gain wide acceptance among practitioners. The presented study discusses the
advantages as well as the limitations of the entropy method as applied to the design
of water quality monitoring networks.

INTRODUCTION

Despite all the efforts and investment made on monitoring of water quality, the
current status of existing networks shows that the accruing benefits are low (Sanders
et al., 1983). That is, most monitoring practices do not fulfill what is expected of
monitoring. Thus, the issue still remains controversial among practitioners and
researchers for a number of reasons. First, there are difficulties in the selection of
temporal and spatial sampling frequencies, the variables to be monitored, and the
sampling duration. Second, benefits of monitoring cannot be defined in quantitative
terms for reliable benefit/cost analyses. There are no definite criteria yet
established to solve these two problems. The entropy principle can be effectively
used to develop such criteria on the basis of quantitatively expressed information
expectations and information availability. This approach is justified in the sense that
a monitoring network is basically an information system. In fact, investigations on

application of the entropy principle in water quality monitoring network design have
revealed promising results, particularly in the selection of technical design features
such as monitoring sites, time frequencies, variables to be sampled, and sampling
duration.
There are still certain difficulties that need to be overcome so that the method
can gain wide acceptance among practitioners. Some of these difficulties stem from
the mathematical structure of the concept. For example, entropy, as a measure of
the uncertainty of random processes, has not yet been precisely defined for
continuous variables. The derivation of mathematical expressions for multivariate
distributions other than normal and lognormal are highly complicated. Other
difficulties encountered in the application of the method are those that are valid for
any other statistical procedure. As such, the entropy principle requires sufficient
data on the processes monitored to produce sound results. However, it is uncertain,
particularly in case of messy water quality data, to determine when a data record can
be considered sufficient. Another difficulty occurs in assessing monitoring
frequencies higher than the already selected ones.
The presented study addresses the above difficulties as well as the merits of
using the entropy principle in the design of water quality monitoring networks. The
discussions are supported by case studies relating to specific design problems such
as the selection of monitoring sites, sampling frequencies, and variables.

ASSESSMENT OF CURRENT DESIGN METHODOLOGIES

In recent years, the adequacy of collected water quality data and the performance
of existing monitoring networks have been seriously evaluated for two basic reasons.
First, an efficient information system is required to satisfy the needs of water quality
management plans. Second, this system has to be realized under the constraints of
limited financial resources, sampling and analysis facilities, and manpower. Problems
observed in available data and shortcomings of current networks have led
researchers to focus more critically on the design procedures used.
The early and even some current water quality monitoring practices were often
restricted to "problem areas" or "potential sites for pollution", covering limited
periods of time and limited numbers of variables to be observed. Recently, water
quality related problems and the need for monitoring have intensified so that the
information expectations to assess water quality have also increased. This pressure
has resulted in an expansion of monitoring activities to include more observational
sites and larger number of variables to be sampled at smaller time intervals. While
these efforts have produced plenty of data, they have also raised the question
whether one "really" needs "all" these data to meet information requirements.
Therefore, a more systematic approach to monitoring is required. As a result,
various network design procedures have been proposed and used to either set up

a network or evaluate and revise an existing one.


Current methods of water quality monitoring network design basically cover two
steps: (a) description of design considerations, and (b) the actual design process.
Although often overlooked, proper delineation of design considerations is an
essential step before attempting the technical design of the network. In other words,
objectives of monitoring and information expectations for each objective must be
specified first (Ward and Loftis, 1986; Sanders et al., 1983; Whitfield, 1988; Tirsch
and Male, 1984). Such design considerations are often presented as general
guidelines, rather than as fixed rules to be pursued in the second step of the actual
design process (Sanders et al., 1983). The technical design of monitoring networks
relates to the determination of sampling sites, sampling frequencies, variables to be
sampled, and the duration of sampling. It is only for this actual design stage that
fixed rules or methods are proposed. Considerable amount of research has been
carried out on the above-mentioned four aspects of the design problem. Sanders
et al. (1983), Tirsch and Male (1984), or Whitfield (1988) provide a comprehensive
survey of research results and practices on the establishment of sampling strategies.
It is also well recognized that water quality monitoring is a statistical procedure
and the design problem must therefore be addressed by means of statistical methods.
Accordingly, information expectations from a monitoring system must be defined in
statistical terms so that the selection of sampling strategies (i.e., sampling sites,
variables, frequencies, and duration) can be accomplished and justified by a
statistical approach. The statistical methods employed in selection of spatial and
temporal frequencies basically cover regression techniques, analysis of variance
methods, standard error criteria, decision theory, and optimization techniques.
Selection of variables to be sampled is, however, a more complicated issue, as
there are no definite and readily acceptable criteria to guide the eventual decisions.
Objectives and economics of monitoring provide the basic guidelines for an overall
selection of variables, and regression and multivariate statistical analysis techniques
may be used to reduce the number of required variables. Determination of the
duration of monitoring is also often treated together with the problem of temporal
design. However, the amount of research carried out on the analysis of this problem
has been quite insufficient to bring even a reasonable resolution to the issue.
Deficiencies related to current design procedures are primarily associated with
an imprecise definition of information and value of data, transfer of information in
space and time, and cost-effectiveness. The major difficulty associated with these
current design methods is related to the lack of a precise definition for "information".
They either do not give a precise definition of how information is measured, or they
try to express it indirectly in terms of other statistical parameters like standard error
or variance. One important consequence of failure to define information can
possibly be the interchangeable use of the terms "data" and "information". Although
current methods stress the distinction between the two, a direct link between them

has not yet been established (Harmancioglu et al., 1992b).


Another difficulty with current design methods is how to define the value of
data. In every design procedure, the ultimate goal is an "optimal" network.
"Optimality" means that the network must meet the objectives of the data gathering
at minimum cost. While costs are relatively easy to assess, the major difficulty arises
in the evaluation of benefits because such benefits are essentially a function of the
value of data collected. The value of data lies in their ability to fulfill information
expectations. However, how this fulfillment might be assessed in quantifiable terms
still remains unsolved. As in the case of information, the value of data has been
described indirectly (Dawdy, 1979; Moss, 1976), often by the Bayesian decision
theory (Tirsch and Male, 1984).
Another criticism of the current design methods relates to how the techniques
are used in spatial and temporal design. The majority of current techniques are
based on classical correlation and regression theory, which basically constitutes a
means of transferring information in space and time. The use of regression theory
in transfer of information has some justification. However, regression approaches
transfer information on the basis of certain assumptions regarding the distributions
of variables and the form of the transfer function such as linearity and nonlinearity.
Thus, how much information is transferred by regression under specified assumptions
has to be evaluated with respect to the amount of information that is actually
transferable. One may refer to Harmancioglu et al. (1986) for the definition and
comparison of the terms "transferred information" and "transferable information".
To summarize the above discussions, one may state that the existing methods
of water quality network design are deficient because of the following specific
difficulties: (a) a precise definition of "information" contained in the data and how
it is measured is not given; (b) the value of data is not precisely defined, and
consequently, existing networks are not "optimal" either in terms of the information
contained in these data or in terms of the cost of getting the data; (c) the method
of information transfer in space and time is restrictive; (d) cost-effectiveness is not
emphasized in certain aspects of monitoring; (e) the flexibility of the network in
responding to new monitoring objectives and conditions is not measured and not
generally considered in the evaluation of existing or proposed networks. Within this
context, a methodology based on the entropy theory can be used for design of
efficient, cost-effective, and flexible water quality monitoring networks to alleviate
many of the above shortcomings of the existing network design methods
(Harmancioglu et al., 1992b; Alpaslan et al., 1992).

ENTROPY THEORY AS APPLIED TO MONITORING NETWORK DESIGN

Entropy is a measure of the degree of uncertainty of random hydrologic processes.
Since the reduction of uncertainty by means of making observations is equal to the
amount of information gained, the entropy criterion indirectly measures the
information content of a given series of data (Harmancioglu, 1981).
the entropy concept as defined in communication (or information) theory, the term
"information content" refers to the capability of signals to create communication.
The basic problem is the generation of correct communication by sending a sufficient
amount of signals, leading neither to any loss nor to repetition of information
(Shannon and Weaver, 1949).
Each sample collected actually represents a signal from the natural system which
has to be deciphered so that the uncertainty about the real system is reduced.
Application of engineering principles to this problem calls for a minimum number
of signals to be received to obtain the maximum amount of information. Redundant
information does not help reduce the uncertainty further; it only increases the costs
of obtaining the data. These considerations represent the essence of the field of
communications and hold equally true for hydrologic data sampling, which is
essentially communicating with the natural system. On the basis of this analogy, a
methodology based on the entropy concept of information theory has been proposed
for the design of hydrologic data networks. The basic characteristic of entropy as
used in this context is that it is able to represent quantitative measures of
"information". As a data collection network is basically an information system, this
characteristic is the essential feature required in a monitoring network (Alpaslan et
al., 1992; Harmancioglu et al., 1992b).
The definitions of entropy given in information theory (Shannon and Weaver,
1949) to describe the uncertainty of a single variable can be extended to the case
of multiple variables. In this case, the stochastic dependence between two processes
causes their total entropy and the marginal entropy of each process to be decreased.
The same is true for dependent multi-variables (Harmancioglu, 1981). This feature
of the entropy concept can be used in the spatial design of monitoring stations to
select appropriate numbers and locations so as to avoid redundant information. On
the other hand, the marginal entropy of a single process that is serially correlated
is less than the uncertainty it would contain if it were independent. In this case,
serial dependence acts to reduce marginal entropy and causes a gain in information
(Harmancioglu, 1981). This feature of the entropy concept is suitable for use in the
temporal design of sampling stations.
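For jointly normal variables these quantities have closed forms, and the redundancy between two candidate stations can be sketched as follows (our own illustrative code with synthetic data; entropies are in napiers, and the transinformation T = H(X) + H(Y) - H(X,Y) is the information the two stations share).

    import numpy as np

    def normal_entropies(x, y):
        """Marginal entropies and transinformation for a bivariate normal pair."""
        hx = 0.5 * np.log(2.0 * np.pi * np.e * np.var(x, ddof=1))
        hy = 0.5 * np.log(2.0 * np.pi * np.e * np.var(y, ddof=1))
        hxy = 0.5 * np.log((2.0 * np.pi * np.e) ** 2 * np.linalg.det(np.cov(x, y)))
        return hx, hy, hx + hy - hxy

    rng = np.random.default_rng(3)
    x = rng.standard_normal(200)                   # water quality series, station 1
    y = 0.8 * x + 0.6 * rng.standard_normal(200)   # correlated neighbouring station
    hx, hy, T = normal_entropies(x, y)
    print(T)     # grows with inter-station dependence: redundant information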
The entropy measures of information were applied by Krstanovic and Singh
(1993a and 1993b) to rainfall network design, by Husain (1989) to design of
hydrologic networks, by Harmancioglu and Alpaslan (1992) to design of water quality
monitoring networks, and by Goulter and Kusmulyono (1993) to prediction of water
quality at discontinued water quality monitoring stations in Australia. Similar
considerations were used for design of data collection systems (Singh and Krstanovic,
1986; Harmancioglu, 1984). In these studies, the entropy concept has been shown
to hold significant potential as an objective criterion which can be used in both

spatial and temporal design of networks.


With respect to water quality in particular, the entropy principle can be used to
evaluate five basic features of a monitoring network: temporal frequency, spatial
orientation, combined temporal/spatial frequencies, variables sampled, and sampling
duration. The third feature represents an optimum solution with respect to both the
time and the space dimensions, considering that an increase in efforts in one
dimension may lead to a decrease in those in the other dimension (Harmancioglu
and Alpaslan, 1992). To determine variables to be sampled, the method can be
employed, not to select from a large list of variables but to reduce their number by
investigating information transfer between the variables (Harmancioglu et al., 1986,
1992a and b). Assessment of sampling duration may be approached in a number
of ways. If station discontinuance is the matter of concern, decisions may be made
in an approach similar to that applied in spatial orientation. The problem is much
simpler when a sampling site is evaluated for the redundancy of information it
produces in the time domain. If no new information is obtained by continuous
measurements, sampling may be stopped permanently or temporarily.

ADVANTAGES OF THE ENTROPY METHOD IN DESIGN OF MONITORING NETWORKS

The basic role of the entropy method

The studies carried out so far show that the method works quite well for the
assessment of an existing network. It appears as a potential technique when applied
to cases where a decision must be made to remove existing observation sites, and/or
reduce frequency of observations, and/or terminate sampling program. The method
may also be used to select the numbers and locations of new sampling stations as
well as to reduce the number of variables to be sampled (Harmancioglu and
Alpaslan, 1992).
On the other hand, the entropy method cannot be employed to initiate a
network; that is, it cannot be used for design purposes unless a priori collected data
are available. This is true for any other statistical technique that is used to design
and evaluate a monitoring network. In fact, the design process is an iterative
procedure initiated by the selection of preliminary sampling sites and frequencies.
This selection has to be made essentially by nonstatistical approaches. After a
certain amount of data is collected, initial decisions are evaluated and revised by
statistical methods. It is throughout this iterative process of modifying decisions that
the entropy principle works well. Its major advantage is that such iterations are
realized by quantifying the network efficiency and cost-effectiveness parameters for
each decision made.

Measure of information and usefulness of data

One of the valuable aspects of the entropy concept as used in network design is its
ability to provide a precise definition of "information" in tangible terms. This
definition expresses information in specific units (i.e., napiers, decibels, or bits) so
that it constitutes a completely quantitative measure. At this point, it is important
to note again the distinction between the two terms "data" and "information". The
term "data" represents a series of numerical figures which constitute a means of
communication with nature; what these data actually communicate to us is
"information". This distinction means that availability of data is not a sufficient
condition for availability of information unless those data have utility, and the term
"information" describes this utility or usefulness of data. Among the various
definitions of information proposed to date, the entropy measure appears to be the
only one that gives credence to the relevance or utility of data.
The value of data can also be expressed in quantitative terms since it is
measured by the amount of information the data convey. This observation implies
that monitoring benefits may eventually be assessed on the basis of quantitative
measures rather than indirect descriptions of information. In comparison with the
current methods, the entropy method develops a clearer and more meaningful
picture of data utility versus cost (or information versus cost) tradeoffs. This
advantage occurs because both the information and the costs can be measured in
terms of quantitative units. For example, if cost considerations require data to be
collected less frequently, the entropy measure describes quantitatively how much
information would be risked by increasing the sampling intervals (Harmancioglu,
1984; Harmancioglu and Alpaslan, 1992). By such an approach, it is possible to
express how many bits of information would be lost against a certain decrease in
costs (or in monetary measures). Similarly, it is possible to define unit costs of
monitoring in such terms as the amount of dollars per bit of information.
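As a stylized illustration of this bookkeeping (not the exact computation of the case studies below), the following sketch assumes a normal series whose lag-k correlation is rho**k and tabulates, in bits, the transinformation between samples k intervals apart; widening the sampling interval shows up directly as a drop in the information carried between successive samples.

    import numpy as np

    def transinformation_bits(rho, k):
        """T(k) = -0.5 * log2(1 - rho**(2k)) for a lag-one-correlated normal series."""
        return -0.5 * np.log2(1.0 - rho ** (2 * k))

    rho = 0.7                                  # assumed monthly lag-one correlation
    for k in (1, 2, 3):                        # monthly, bimonthly, quarterly
        print(k, transinformation_bits(rho, k))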

Network efficiency and flexibility

Efficiency is related to objectives of monitoring in that the latter delineates


"information-expected" from monitoring and the former describes "information
produced" by a network. The "information produced" is a function of the technical
features of a network related to variables sampled, spatial and temporal sampling
frequencies, and the duration of sampling. It is plausible then to define efficiency
as the "informativeness" of a network. If the design of the network is such that this
information is maximized, then the requirement of efficiency is satisfied. The
entropy theory can be used to test whether the supplied information is optimal or
not thereby ensuring system efficiency (Harmancioglu and Alpaslan, 1992).
A network, once designed and in operation, has to be evaluated for efficiency,

particularly if the monitoring objectives have been changed or revised. The entropy
method may again be used to assess the data collected to determine how much
information is conveyed by the network under present conditions. If revisions and
modifications are made, their contnbution to an increase in information can be
measured by the same method. Within this respect, the entropy theory also serves
to maintain flexibility in the network since each decision regarding the technical
features can be assessed on objective grounds.

Information-based design strategy

The entropy theory may be used to set up an information-based design strategy. As


noted earlier, the approach for developing the design strategy for efficient and
cost-effective networks encompasses two steps: (a) delineation of design
considerations, and (b) technical design of the network. The first step is to define
objectives of monitoring and information needs associated with each objective. The
entropy method can be employed basically for the second step to be combined with
cost considerations to realize both informativeness and cost-effectiveness
(Harmancioglu et aI., 1992b). This two-stage process can permit the design
procedures to be developed to match the information expected from monitoring.
Such an approach covers both the "demand" (objectives of monitoring) and the
"reaction" (monitoring practices) parts of the problem in an integrated fashion. Both
parts can then be defined in terms of "information" as "information needed" and
"information supplied". Efficiency and effectiveness of the network can be realized.
by matching these two aspects.
The "demand" part of the design problem can be addressed by specifying the
information expected of each objective of monitoring. The "reaction" portion of the
problem covers the more specific questions of any design procedure, such as
the selection of variables to be sampled and the selection of temporal and spatial
frequencies. Solution of the problems associated with this "reaction" step requires:
(a) an extraction of information from available data, and (b) a transfer of information
among water quality variables with respect to time and space. These two steps are
shown to be effectively accomplished by entropy-based measures (Harmancioglu et
al., 1986).

Cost-effectiveness

A major difficulty underlying both the design and the evaluation of monitoring
systems is the lack of an objective criterion to assess cost-effectiveness of the
network. In this assessment, costs are relatively easy to estimate, but benefits are
often described indirectly in terms of other parameters, using optimization
techniques, Bayesian decision theory or regression methods (Schilperoort et al., 1982;
Tirsch and Male, 1984). Thus, a realistic evaluation of benefit/cost considerations
cannot be achieved since benefits are not directly quantified. Actually, benefits of
monitoring can only be measured by means of the information conveyed by collected
data; that is, they are a function of the value or worth of data. The concept of
entropy can also be used to quantify the benefits of monitoring since it describes
the utility of data. Here, benefits of monitoring are expressed as the information
supplied which is quantified in tangible units by entropy measures. Cost-
effectiveness can be evaluated by comparing costs of monitoring versus information
gained via monitoring. The issue is then an optimization problem to maximize the
amount of information (benefits of monitoring) while minimizing the accruing costs.
The technical features of design can then be evaluated with respect to cost
effectiveness (Harmancioglu and Alpaslan, 1992).
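As a small numerical illustration of this optimization view (a sketch only; the intervals, information yields, and costs below are hypothetical placeholders, not values from the cited studies), candidate designs can be screened against a budget and ranked by unit cost of information:

candidates = [
    # (sampling interval in months, information in napiers/yr, cost in $/yr)
    (1, 4.70, 12000.0),
    (2, 4.05, 6000.0),
    (3, 3.40, 4000.0),
    (4, 2.90, 3000.0),
]
budget = 7000.0  # hypothetical annual monitoring budget

# Among affordable designs, take the one supplying the most information.
feasible = [c for c in candidates if budget >= c[2]]
dt, info, cost = max(feasible, key=lambda c: c[1])
print(f"chosen interval: {dt} months ({info:.2f} napiers/yr at ${cost:.0f}/yr)")

# The unit cost of information ($ per napier) gives a single-number ranking.
for dt, info, cost in candidates:
    print(f"dt = {dt} mo: {cost / info:7.1f} $/napier")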

Space-time sampling frequencies and selection of variables

The entropy method measures the information content of available data (extraction
of information) and assesses the goodness of information transfer between temporal
or spatial data points (transfer of information). These two functions constitute the
basis of the solution to the design problems of what, where, when, and how long to
observe. Such a solution is based on the maximization of information transfer
between variables, space points, and time points, respectively. The amount of
information transfer used in such an analysis can be measured by entropy in specific
units. The selection of each technical design factor can be evaluated again by means
of entropy to define the amount of information conveyed by the data collected by
each of the selected monitoring procedures. These evaluations may eventually
provide the ability to make quantitatively based rational decisions on how long a
gauging station should be operated (Alpaslan et al., 1992).
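These two functions, extraction and transfer of information, can be illustrated computationally. The sketch below (an illustration, not from the cited studies; the synthetic series, bin count, and histogram-based estimation are all assumptions) computes marginal entropies, the joint entropy, and the transinformation T(X,Y) = H(X) + H(Y) - H(X,Y) for two correlated series:

import numpy as np

def entropy(counts):
    # Discrete entropy in napiers from a table of class frequencies.
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def transinformation(x, y, bins=10):
    # T(X,Y) = H(X) + H(Y) - H(X,Y), estimated from a 2-D histogram.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

rng = np.random.default_rng(0)
x = rng.normal(size=500)                          # series at one site
y = 0.8 * x + rng.normal(scale=0.6, size=500)     # correlated neighbor
print(f"T(X,Y) = {transinformation(x, y):.3f} napiers")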
Harmancioglu and Alpaslan (1992) demonstrated the applicability of the entropy
method in assessing the efficiency and the benefits of an existing water quality
monitoring network with respect to temporal, spatial and combined temporal/spatial
design features. They described the effect of each feature upon network efficiency
and cost-effectiveness by entropy-based measures. For example, extending the
sampling interval from monthly to bimonthly measurements for the three variables
investigated leads to a loss of information in the order of 20.4% for DO,
32.8% for Cl, and 68.7% for EC, as shown in Fig.1. Here, the selection of an
appropriate sampling interval is made by assessing how much information the
decision-maker would risk versus given costs of monitoring. A similar evaluation can
be made with respect to the number and locations of required sampling sites as in
Fig.2, where changes in rates of information gain are investigated with respect to the
number of stations in the network. Harmancioglu and Alpaslan (1992) further
combined both spatial and temporal frequencies as in Fig.3 to assess the variation
[Figure 1 appears here: information ratio (ordinate, 0.20 to 1.0) versus sampling interval Δt of 1 to 4 months (abscissa), with curves for DO, Cl, and EC.]

Figure 1. Effects of sampling frequency upon information gain about three water
quality variables (Harmancioglu and Alpaslan, 1992).

of information with respect to both space and time dimensions. The results of these
analyses have shown the applicability of the entropy concept in network assessment.

LIMITATIONS OF THE ENTROPY METHOD IN NETWORK DESIGN

The above-mentioned advantages of the entropy principle indicate that it is a
promising method in water quality monitoring network design problems because it
permits quantitative assessment of efficiency and benefit/cost parameters. However,
some limitations of the method must also be noted for further investigations on
entropy theory.
As is the case with the majority of statistical techniques, a sound
evaluation of network features by the entropy method requires the availability of
sufficient and reliable data. Applications with inadequate data often cause numerical
difficulties and hence unreliable results. For example, when assessing spatial and
temporal frequencies in the multivariate case, the major numerical difficulty is
related to the properties of the covariance matrix (Harmancioglu and Alpaslan, 1992).
[Figure 2 appears here: rates of information gain versus number of stations (2 to 6), with curves for DO and EC.]

Figure 2. Changes in rates of information gain (cumulative transinformation/joint
entropy) with respect to number of stations (Harmancioglu and Alpaslan, 1992).

[Figure 3 appears here: information content versus sampling interval Δt of 1 to 4 months, for alternative sampling site combinations.]

Figure 3. Variation of information with respect to both alternative sampling sites
(numbered 2 to 6) and sampling frequencies (Harmancioglu and Alpaslan, 1992).

When the determinant of the matrix is too small, entropy measures cannot
be determined reliably since the matrix becomes ill-conditioned. This often occurs
when the available sample sizes are very small.
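Since the joint entropy of a multivariate normal sample is H = ½ ln[(2πe)^M |C|], where C is the covariance matrix, this difficulty can be screened for directly before any entropy is reported. A minimal sketch (illustrative; the condition-number threshold and the test data are arbitrary assumptions) is:

import numpy as np

def gaussian_joint_entropy(data, cond_limit=1e8):
    # Joint entropy (napiers) of a multivariate normal fitted to `data`
    # (rows = observations, columns = variables/stations), with a crude
    # check against an ill-conditioned sample covariance matrix.
    c = np.cov(data, rowvar=False)
    if np.linalg.cond(c) > cond_limit:
        raise ValueError("covariance matrix is ill-conditioned; "
                         "the entropy estimate would be unreliable")
    sign, logdet = np.linalg.slogdet(c)
    if sign != 1:
        raise ValueError("covariance matrix is not positive definite")
    k = c.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(1)
sample = rng.multivariate_normal(np.zeros(3), np.eye(3), size=40)
print(f"H = {gaussian_joint_entropy(sample):.3f} napiers")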
On the other hand, the question with respect to data availability is "how many
data would be considered sufficient". For example, Goulter and Kusmulyono (1993)
claim that the entropy principle can be used to make "sensible inferences about
water quality conditions" but that sufficient data are not available for a reliable
assessment. The major difficulty here arises from the nature of water quality data,
which are often sporadically observed for short periods of time. With such "messy"
data, application of the entropy method poses problems both in numerical
computations and in evaluation of the results. Particularly, it is difficult to determine
when a data record can be considered sufficient.
With respect to the temporal design problem, all evaluations are based on the
temporal frequencies of available data so that, again, the method inevitably appears
to be data dependent. At present, it appears to be difficult to assess smaller time
intervals than what is available. However, the problem of decreasing the sampling
intervals may also be investigated by the entropy concept provided that the available
monthly data are reliably disaggregated into short interval series. This aspect of
entropy applications has to be investigated in future research.
Another important point in entropy applications is that the method requires the
assumption of a valid distribution-type. The major difficulty occurs here when
different values of the entropy function are obtained for different probability
distribution functions assumed for the same variable. On the other hand, the
entropy method works quite well with multivariate normal and lognormal
distributions. The mathematical definition of entropy is easily developed for other
skewed distributions in bivariate cases. However, the computational procedure
becomes much more difficult when their multivariate distributions are considered.
When such distributions are transformed to normal, then uncertainties in parameters
need to be assessed.
Another problem that has to be considered in future research is the
mathematical definition of entropy concepts for continuous variables. Shannon's
basic definition of entropy is developed for a discrete random variable, and the
extension of this definition to the continuous case entails the problem of selecting
the discretizing class interval Δx to approximate probabilities with class frequencies.
Different measures of entropy vary with Δx such that each selected Δx constitutes a
different base level or scale for measuring uncertainty. Consequently, the same
variable investigated assumes different values of entropy for each selected Δx. It may
even take on negative values, which contradicts the positivity property of the entropy
function in theory.
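The effect is easy to reproduce numerically. In the sketch below (illustrative; the distribution and the set of class intervals are arbitrary choices), the discrete entropy of a binned continuous variable shifts with Δx by roughly ln Δx, and the Δx-adjusted value, which approximates the continuous entropy, can indeed become negative:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=0.2, size=10000)   # a continuous variable

for dx in (1.0, 0.5, 0.1, 0.01):
    edges = np.arange(x.min(), x.max() + dx, dx)
    counts, _ = np.histogram(x, bins=edges)
    p = counts / counts.sum()
    p = p[p > 0]
    h = -(p * np.log(p)).sum()                   # depends on the chosen dx
    # h + ln(dx) approximates the continuous entropy and can be negative.
    print(f"dx = {dx:5.2f}: H = {h:6.3f}, H + ln(dx) = {h + np.log(dx):6.3f}")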
One last problem that needs to be investigated in future research is the
development of a quantifiable relationship between monitoring objectives and
technical design features in terms of the entropy function. As stated earlier, an
information-based design strategy requires the delineation of data needs or
information expectations. To ensure network efficiency, "information supplied" and
"information expected" must be expressed in quantifiable terms by the entropy
concept. At the current level of research, if one considers that the most significant
objective of monitoring is the determination of changes in water quality, then the
entropy principle does show such changes with respect to time and space. However,
future research has to focus on the quantification of information needs for specific
objectives (e.g., trend detection, compliance, etc.) by means of entropy measures.

CONCLUSION

Fundamental to the accomplishment of an efficient and cost-effective design of a
monitoring network is the development of a quantitative definition of "information"
and of the "value of data". Within this context, application of the concept of
information in entropy theory has produced promising results in water quality
monitoring network design problems because it permits quantitative assessment of
efficiency and benefit/cost parameters.
However, there are still certain difficulties associated with the entropy
theory that need to be overcome so that the method can gain wide acceptance
among practitioners. The majority of these difficulties stem from the mathematical
structure of the concept. Other difficulties encountered in application of the method
are those that are valid for any other statistical procedure. These problems need
to be investigated further as part of future research on design of networks so that
the validity and the reliability of entropy theory can be accepted without doubt.

REFERENCES

Alpaslan, N.; Harmancioglu, N.B.; Singh, V.P. (1992) "The role of the entropy
concept in design and evaluation of water quality monitoring networks", in: V.P.
Singh & M. Fiorentino (eds.), Entropy and Energy Dissipation in Water Resources,
Dordrecht, Kluwer Academic Publishers, Water Science and Technology Library,
pp.261-282.
Dawdy, D.R. (1979) "The worth of hydrologic data", Water Resources Research,
15(6), 1726-1732.
Goulter, I. and Kusmulyono, A. (1993) "Entropy theory to identify water quality
violators in environmental management", in: R. Chowdhury and M. Sivakumar (eds.),
Geo-Water and Engineering Aspects, Balkema Press, Rotterdam, pp.149-154.
Harmancioglu, N. (1981) "Measuring the information content of hydrological
processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil
Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N. (1984) "Entropy concept as used in determination of optimum
sampling intervals", Proceedings of Hydrosoft '84, International Conference on
Hydraulic Engineering Software, Portoroz, Yugoslavia, pp.6-99 and 6-110.
Harmancioglu, N.B., Yevjevich, V., Obeysekera, J.T.B. (1986) "Measures of
information transfer between variables", in: H.W. Shen et al. (eds.), Proc. of Fourth
Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499.
Harmancioglu, N.B., Alpaslan, N. (1992) "Water quality monitoring network design:
a problem of multi-objective decision making", AWRA, Water Resources Bulletin,
Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28,
no.1, pp.1-14.
Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992a) "Versatile uses of the entropy
concept in water resources", in: V.P. Singh & M. Fiorentino (eds.), Entropy and
Energy Dissipation in Water Resources, Dordrecht, Kluwer Academic Publishers,
Water Science and Technology Library, pp.91-117.
Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992b) "Design of Water Quality
Monitoring Networks", in: R.N. Chowdhury (ed.), Geomechanics and Water
Engineering in Environmental Management, Rotterdam, Balkema Publishers, ch.8.
Husain, T. (1989) "Hydrologic uncertainty measure and network design", Water
Resources Bulletin, 25(3), 527-534.
Krstanovic, P.F. and Singh, V.P. (1993a) "Evaluation of rainfall networks using
entropy: I. Theoretical development", Water Resources Management, v.6, pp.279-293.
Krstanovic, P.F. and Singh, V.P. (1993b) "Evaluation of rainfall networks using
entropy: II. Application", Water Resources Management, v.6, pp.295-314.
Moss, M.E. (1976) "Decision theory and its application to network design",
Hydrological Network Design and Information Transfer, World Meteorological
Organization WMO, no.433, Geneva, Switzerland.
Tirsch, F.S., Male, J.W. (1984) "River basin water quality monitoring network
design", in: T.M. Schad (ed.), Options for Reaching Water Quality Goals,
Proceedings of the 20th Annual Conference of AWRA, AWRA Publ., pp.149-156.
Sanders, T.G., Ward, R.C., Loftis, J.C., Steele, T.D., Adrian, D.D., Yevjevich, V.
(1983) Design of Networks for Monitoring Water Quality, Water Resources
Publications, Littleton, CO, 328p.
Schilperoort, T., Groot, S., Wetering, B.G.M., Dijkman, F. (1982) Optimization of the
Sampling Frequency of Water Quality Monitoring Networks, Waterloopkundig
Laboratorium (Delft Hydraulics Lab), Delft, the Netherlands.
Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication,
The University of Illinois Press, Urbana, Illinois.
Singh, V.P. and Krstanovic, P.F. (1986) "Space design of rainfall networks using
entropy", Proc., International Conference on Water Resources Needs and Planning
in Drought Prone Areas, pp.173-188, Khartoum, Sudan.
Ward, R.C., Loftis, J.C. (1986) "Establishing statistical design criteria for water
quality monitoring systems: review and synthesis", Water Resources Bulletin,
AWRA, 22(5), 759-767.
Whitfield, P.H. (1988) "Goals and data collection designs for water quality
monitoring", Water Resources Bulletin, AWRA, 24(4), 775-780.
COMPARISONS BETWEEN BAYESIAN AND ENTROPIC
METHODS FOR STATISTICAL INFERENCE

J. N. KAPUR¹, H. K. KESAVAN², and G. BACIU³

¹Jawaharlal Nehru University, New Delhi, INDIA
²Department of Systems Design Engineering, University of Waterloo,
Waterloo, Ontario, CANADA N2L 3G1
³Department of Computer Science, HKUST,
Clear Water Bay, Kowloon, Hong Kong

Four methods of statistical inference are discussed. These include the two well-known non-entropic
methods due to Fisher and Bayes, and two entropic methods based on the principles of maximum
entropy and minimum cross-entropy. The spheres of application of these methods are elucidated in
order to give a comparative understanding. The discussion is interspersed with illustrative examples.

INTRODUCTION
Maximum entropy and minimum cross-entropy principles provide methods distinct from the classical
methods of statistical inference. In this context the following questions naturally arise:
• What is statistical inference?
• What are the classical methods of statistical inference?
• How do these methods compare with entropic methods of statistical inference?
• When should one use entropic rather than non-entropic methods?
The answers to these questions are related to the age-old controversy arising from the two methods
of non-entropic statistical inference: (1) Bayesian and (2) non-Bayesian. There are strong arguments
for and against both of these methods of inference. The object of the present paper is to shed some
light on these fundamental questions, from the vantage point of the entropic methods of inference.

What is statistical inference?


The scope of this vast subject is summarized in the following categories for purposes of highlighting
the entropic methods of inference:
• it is concerned with drawing conclusions on the basis of noisy data, i.e., data which is influenced
by random errors.
• since it is probabilistic, its nature depends upon our concept of probability itself.
• procedures are inductive, and as such, they depend on the axioms that are introduced to make
deductive reasoning possible.
• it deals with methods of inference when only partial information is available about a system.
Since data can be made available in many different forms, since we have the subjective and objective
concepts of probability, since statistical inference is inductive, and since we can assume many different
axioms of induction, it is not surprising that we have many different methods of statistical inference.
Each method tries to capture some aspect of statistical truth contained in a given set of data. The
different approaches can, however, supplement one another to enlarge the scope of investigation. It
is therefore essential that users of statistical inference, irrespective of the discipline they belong to,
understand the differences and similarities between the various types of statistical inference, without
being overly concerned about the doctrinaire controversies that beset the different groups.
Over the course of its evolution, the discipline of statistics has been divided into two views: the
Bayesian, and the non-Bayesian or frequentist view. More recently, however, statistical inference
has been enriched within the framework of the principles of maximum entropy or minimum cross-
entropy. These principles provide a more unified foundation for the problem specification and the
criteria for obtaining a meaningful solution. Since the Bayesian and the frequentist schools are
well established, the newer methodologies tend to be classified with respect to these two, either as
Bayesian or as non-Bayesian. However, entropic methods represent a distinct class of statistical
reasoning. Nevertheless, the followers of the entropic methods feel closer to the Bayesians despite
their claim for a separate identity.
In the next sections, the similarities and differences of the following four types of statistical
inference are discussed:
(i) classical, traditional, or orthodox statistical inference;
(ii) Bayesian inference;
(iii) inference based on maximum entropy principle (MaxEnt); and
(iv) inference based on minimum cross-entropy principle (MinxEnt).
Furthermore, the conditions under which each is most appropriate, and the circumstances under
which these can supplement one another are considered.

Classical statistical inference


This approach is based on the frequentist view of probability, and on the use of sampling distribution
theory. Every observed value is regarded as a random sample from some population, and the object is
to draw conclusions about the population from the observations given in the data. For instance, one
may specify that the population is normal with two parameters, namely the mean and variance, and
may construct functions of observations x₁, x₂, ..., xₙ to estimate these parameters. Since different
functions can be used to estimate the same parameter, some well-established criteria are needed for
good estimators, such as consistency, efficiency, and sufficiency. Furthermore, one needs a deductive
procedure such as the method of maximum likelihood to provide good estimators.
The parameters are regarded as fixed numbers and confidence intervals are constructed based on
observations, in which the parameters are expected to lie with different degrees of certainty. We
even form null hypotheses about the parameters which are then accepted or rejected. This decision
is taken on the basis of observations. In this process, either a correct hypothesis may be rejected or
a wrong hypothesis may be accepted resulting in an error of first or second type, respectively. We
design estimation procedures which minimize both types of errors, or minimize the second type of
error when the first type of error is kept at a fixed level. We also wish to design experiments to give
us data on which statistical analysis can be performed effectively and efficiently.
Sometimes, the population is not specified with a parametric density function and non-parametric
methods of statistical inference are developed in order to estimate the density function directly from
the data.

Bayesian inference
This approach differs from the classical approach in the sense that, in addition to specifying a
population density f(x, θ), where x and θ may be scalars or vectors, and having observations
x₁, x₂, ..., xₙ, it also assumes some a priori distribution for the parameter θ. It assumes that there
is a specific a priori distribution which the parameter follows. This method then proceeds to use the
observations x₁, x₂, ..., xₙ to update the knowledge of the a priori distribution, resulting in an
a posteriori distribution which incorporates the knowledge of the data. If more independent
observations are obtained, the a posteriori distribution obtained as a result of the first set of
observations may be treated as an a priori distribution for the next set of observations, resulting in
a second a posteriori distribution.
The assumption of an a priori distribution of θ and the continuous updating of this distribution in
the light of independent observations are essential features of Bayesian inference. The traditionalists
assert that θ is a fixed number and, consequently, its probability distribution is an inadmissible
consideration. The Bayesians counter the objection by saying that the probability distribution of θ
is not to be understood in the relative frequency sense; in fact, it is not the true value of θ that is
being discussed, but rather our perception of this value. This perception changes as our knowledge
based on observations increases. The probability distribution of θ depends on the observations and
it is our objective to find this distribution.
In fact, not assuming any probability distribution for θ is very often equivalent to assuming that
all values of θ are equally likely, or that θ has a uniform, regular or degenerate, distribution. By
not making a statement about the a priori distribution, we may in fact be making a statement about
it. In this process, we may restrict our choice drastically. If our knowledge of the situation, prior
to our getting the data, warrants an a priori distribution for θ, it should be used. In some sense,
we may say that, as the amount of Bayesian a priori information decreases, the Bayesian inference
methodology approaches the classical inference methodology. From this point of view, the classical
inference may be regarded as a limiting form of Bayesian inference. Bayesian and non-Bayesian
methods of statistical inference are conceptually quite different, although in many situations, they
may give the same results.
There are many situations in which Bayesian methods are better geared to provide answers than
the classical methods. Bayesian inference uses informative priors, while classical inference uses non-
informative priors. The uniform distribution gives a non-informative prior, but other non-informative
priors also exist.

Maximum entropy statistical inference


In this approach, there is no assumption of any a priori knowledge about the distribution. Infer-
ences about probability density functions are made on the basis of knowledge of moments of the
distribution. There may be many probability distributions consistent with the information given in
the moments. The distribution sought is that which has the maximum entropy, or uncertainty.
This approach differs from the traditional approach in the sense that, instead of presuming the
knowledge of x₁, x₂, ..., xₙ, it presumes the knowledge of expected values of some functions, like

\sum_{i=1}^{n} p_i g_r(x_i), \qquad r = 1, 2, \ldots, n.   (1)

It can be shown, however, that if we assume a density function f(x, θ) and further assume only the
knowledge of the random sample x₁, x₂, ..., xₙ, then the principle of maximum entropy leads to
the principle of maximum likelihood for estimating θ. In this special case, it should lead to the
same results as in the traditional theory. On the other hand, the traditional theory is of no help if
knowledge is available in the form of moments (or expected values of some functions) only.
This differs from the Bayesian approach in the sense that it assumes no prior distribution either for
the parameter or for the random variable. In fact, one of its goals is to generate a priori distributions
or density functions which can later be used in tandem with Bayesian inference.

Minimum cross-entropy statistical inference


This approach differs from the classical approach in that knowledge of moments is used, rather than
knowledge of sample values x₁, x₂, ..., xₙ. It also presumes knowledge of an a priori distribution.
It is different from the Bayesian approach due to the use of knowledge of moments. Although
it presumes an a priori distribution, this distribution is not that of the parameter, but rather that
of the density function itself. However, like the Bayesian method, it continuously updates the
knowledge of the density function.
It is unlike the maximum entropy approach in the sense that an a priori knowledge of the density
function is presumed. Recall that in the maximum entropy approach, either no a priori density
function is assumed to be known, or equivalently, the prior distribution is taken to be the uniform
distribution. Thus, it can be observed that, as the a priori distribution approaches the uniform
distribution, the minimum cross-entropy distribution approaches that of the maximum entropy
distribution.
SOME EXAMPLES OF BAYESIAN INFERENCE


Some examples of the use of Bayesian inference are given below, and then these examples are used
to illustrate the similarities and differences between various methods of statistical inference.
In Bayesian inference, the following are given: (1) the a priori density function for a population
parameter, and (2) a random sample from the population. Bayes's theorem is used to find the
a posteriori probability distribution of the parameter, on the basis of the information provided by
the random sample.
In the following examples, notes are provided for each example to bring out the finer points of the
method. These notes are relevant to our discussions of the comparative roles of the various methods
of statistical inference.
Problem 1: Given that the mean, m, of a distribution is an N(m₀, σ₀²) variate, and given a random
sample x₁, x₂, ..., xₙ from the population N(m, σ²), where m₀, σ₀², and σ² are known, find the
a posteriori distribution for the mean of the distribution.
Solution 1: Using Bayes's theorem, viz., the a posteriori density is proportional to the product of
the a priori density and the likelihood function, the a posteriori probability density function is obtained as

C \exp\left[-\frac{(m - m_0)^2}{2\sigma_0^2} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - m)^2\right] = C' \exp\left[-\frac{m^2}{2}\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right) + m\left(\frac{m_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right)\right],   (2)

so that the a posteriori probability distribution is N(m₁, σ₁²), where

m_1 = \frac{m_0/\sigma_0^2 + \bar{x}/(\sigma^2/n)}{1/\sigma_0^2 + 1/(\sigma^2/n)} \quad \text{and} \quad \frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}.   (3)
Remarks:
• As σ₀ → ∞, i.e., as the a priori distribution tends to the degenerate uniform distribution, the
influence of the a priori distribution tends to disappear and m₁ → x̄ and σ₁² → σ²/n, which
are the classical results. This result illustrates the limiting case of minimum
cross-entropy when the a priori distribution is the uniform distribution.
• On the other hand, if σ₀ is small, then the a priori distribution dominates and the data make
only relatively small contributions to m₁ and σ₁². In fact, it is this fear of 'dominance of the
a priori' which deters many people from using Bayesian methods.
• If a further random sample, y₁, y₂, ..., y_r, is obtained from the distribution, then the final
a posteriori distribution has

m_2 = \frac{m_0/\sigma_0^2 + n\bar{x}/\sigma^2 + r\bar{y}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2 + r/\sigma^2}   (4)

and

\frac{1}{\sigma_2^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} + \frac{r}{\sigma^2}.   (5)

Thus, the final distribution is the same as that obtained from N(m₀, σ₀²) with a random sample
x₁, x₂, ..., xₙ; y₁, y₂, ..., y_r. This is an important feature of Bayesian inference, viz., that the
final result is independent of the number of stages in which updating is done, so long as the
total information in all stages combined is the same.
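These updating formulas are easy to check numerically. The following is a minimal sketch (an illustration, not part of the original paper; the prior parameters, noise level, and simulated data are arbitrary assumptions) that verifies equation (3) and the stage-independence property just noted:

import numpy as np

def update_normal_mean(m0, s0_sq, xbar, s_sq, n):
    # Posterior N(m1, s1_sq) for the mean, per equation (3):
    # precisions add, and m1 is the precision-weighted average.
    prec = 1.0 / s0_sq + n / s_sq
    m1 = (m0 / s0_sq + n * xbar / s_sq) / prec
    return m1, 1.0 / prec

rng = np.random.default_rng(3)
sigma = 2.0
x = rng.normal(5.0, sigma, size=25)          # hypothetical sample

# Single batch update with prior N(0, 10).
m1, s1_sq = update_normal_mean(0.0, 10.0, x.mean(), sigma**2, len(x))

# Staged update: first 10 observations, then the remaining 15.
ma, sa = update_normal_mean(0.0, 10.0, x[:10].mean(), sigma**2, 10)
mb, sb = update_normal_mean(ma, sa, x[10:].mean(), sigma**2, 15)

print(f"batch : N({m1:.4f}, {s1_sq:.5f})")
print(f"staged: N({mb:.4f}, {sb:.5f})")      # identical, per equations (4)-(5)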
Problem 2: Given that the parameter p of a binomial distribution is a B(a, b) variate and n
independent Bernoulli trials have resulted in r successes, find the a posteriori probability
distribution for p.
Solution 2: Using Bayes's theorem, an a posteriori distribution is obtained as

C\, p^{a-1}(1-p)^{b-1} \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i},   (6)

where x_i = 1 or 0, according to whether there is a success or a failure in the ith trial, so that

x_1 + x_2 + \cdots + x_n = r,   (7)

and the a posteriori density is

\frac{1}{B(a+r,\, b+n-r)}\, p^{a+r-1}(1-p)^{b+n-r-1}.   (8)
B(a+r,b+n-r)
If m additional independent Bernoulli trials are performed and s successes are obtained, then the
a posteriori density would be

\frac{1}{B(a+r+s,\, b+n+m-r-s)}\, p^{a+r+s-1}(1-p)^{b+n+m-r-s-1},   (9)

which again verifies that the final result is the same as if m + n trials had been done at the same
time. Again, if a = 1 and b = 1, i.e., if the a priori distribution is the regular uniform distribution,
equation (8) gives

\frac{1}{B(r+1,\, n-r+1)}\, p^{r}(1-p)^{n-r} \propto \binom{n}{r} p^{r}(1-p)^{n-r},   (10)

which is the classical result.
Remarks:
• The elegance of the results in these two examples depends upon our ability to make an a priori
choice of a "conjugate a priori" distribution. However, the following results are independent of
this choice:
1. The Bayesian result would approach the classical result as the a priori distribution ap-
proaches the uniform distribution.
2. The final Bayesian a posteriori probability distribution is the same whether observations
are made in stages or are all made simultaneously.
• However, if a distribution like the Poisson distribution is considered as the a priori distribution,
it will not be easy to get the classical results in the limit, since the Poisson distribution does
not approach the uniform distribution for any value of its parameter.
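The conjugate updating in equations (8) and (9) can likewise be verified with a few lines of code (an illustrative sketch, not from the paper; the prior parameters and trial counts are arbitrary assumptions, and scipy.stats supplies the Beta distribution):

from scipy.stats import beta

a, b = 2.0, 3.0            # hypothetical Beta(a, b) prior for p
n, r = 20, 12              # n Bernoulli trials with r successes
posterior = beta(a + r, b + n - r)          # equation (8)
print(f"posterior mean of p: {posterior.mean():.3f}")

# A further m trials with s successes, done in stages or combined,
# give the same final distribution (equation (9)).
m, s = 10, 7
staged = beta((a + r) + s, (b + n - r) + (m - s))
combined = beta(a + r + s, b + n + m - r - s)
assert staged.args == combined.args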
Problem 3: Let the observation equation be

y = Ax + e,   (11)

where y is the m×1 observed vector, A is a known m×n matrix, x is an n×1 vector to be estimated,
and e is an m×1 error vector which follows an N(0, R) distribution, where R is a non-singular
m×m matrix. Let x have an a priori N(m₀, Σ₀) distribution. Find its a posteriori distribution
in the light of the observation vector y. This is the well-known state estimation problem which has
many engineering applications.
Solution 3: The a posteriori density function is proportional to

\exp\left[-\tfrac{1}{2}(x - m_0)^T \Sigma_0^{-1}(x - m_0) - \tfrac{1}{2}(Ax - y)^T R^{-1}(Ax - y)\right]
= C' \exp\left\{-\tfrac{1}{2}\left[x^T(\Sigma_0^{-1} + A^T R^{-1} A)x - x^T(\Sigma_0^{-1} m_0 + A^T R^{-1} y) - (m_0^T \Sigma_0^{-1} + y^T R^{-1} A)x + m_0^T \Sigma_0^{-1} m_0 + y^T R^{-1} y\right]\right\},   (12)


so that the new distribution is N(m₁, Σ₁), where

\Sigma_1^{-1} = \Sigma_0^{-1} + A^T R^{-1} A   (13)

and

\Sigma_1^{-1} m_1 = \Sigma_0^{-1} m_0 + A^T R^{-1} y.   (14)

If a further observation equation is available,

z = Bx + f,   (15)

where f is an error vector with an N(0, S) distribution, the following is obtained:

\Sigma_2^{-1} = \Sigma_1^{-1} + B^T S^{-1} B = \Sigma_0^{-1} + A^T R^{-1} A + B^T S^{-1} B   (16)

and

\Sigma_2^{-1} m_2 = \Sigma_1^{-1} m_1 + B^T S^{-1} z = \Sigma_0^{-1} m_0 + A^T R^{-1} y + B^T S^{-1} z,   (17)

which again illustrates the updating property.
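Equations (13)-(14) and (16)-(17) say that each observation set simply adds an information-form (precision) term. A minimal numerical sketch (illustrative only; the matrices and data below are arbitrary assumptions) is:

import numpy as np

def information_update(prec, prec_mean, A, R, y):
    # Add one observation set: equations (13)-(14) / (16)-(17).
    Ri = np.linalg.inv(R)
    return prec + A.T @ Ri @ A, prec_mean + A.T @ Ri @ y

m0 = np.array([0.0, 0.0])                    # prior mean
S0 = 10.0 * np.eye(2)                        # prior covariance
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = 0.5 * np.eye(3)
y = np.array([1.0, 2.0, 3.2])

prec0 = np.linalg.inv(S0)
prec1, pm1 = information_update(prec0, prec0 @ m0, A, R, y)
S1 = np.linalg.inv(prec1)                    # posterior covariance
m1 = S1 @ pm1                                # posterior mean
print("m1 =", m1)
print("Sigma1 =", S1)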
Remarks:
• As R⁻¹ goes to the zero matrix, Σ₁ → Σ₀ and m₁ → m₀. As Σ₀⁻¹ goes to the zero matrix,
Σ₁⁻¹ → AᵀR⁻¹A and Σ₁⁻¹m₁ → AᵀR⁻¹y, which are the classical results.
• If m < n, then AᵀR⁻¹A is a singular matrix, and Σ₁ does not exist in the strict sense, although
it may exist in the generalized inverse matrix sense. However, Σ₀⁻¹ + AᵀR⁻¹A is not a singular
matrix. In this case, the Bayesian solution exists, but the non-Bayesian solution does not, except
in a special sense. This example shows the need for the Bayesian approach to the
solution of this important problem.
• This example gives us the Bayesian solution to the problem of state-space estimation. If m = n,
the solution is obtained without assuming any a priori distribution. Of course, the solution is
also obtained by assuming an a priori distribution. The solution is well-known.
• However, the present approach also gives the solution when m < n or m > n. Of course, the
solution will then depend upon the a priori distribution assumed. Its influence can be reduced by
taking the a priori distribution to be nearly uniform, but not exactly uniform. This example
also gives a method of updating the estimates as more observations become available.
• As a special case when m = 1 and A = (1/n, 1/n, ..., 1/n), it gives a method of estimation
when only the mean is prescribed.
• The model, equation (11), is called the general linear model, and includes models associated
with polynomial regression, multiple regression, analysis of variance, randomized block designs,
incomplete block designs, and factorial designs.
• Equation (11) can be written as

y_i = \sum_{j=1}^{n} a_{ij} x_j + e_i, \qquad i = 1, 2, \ldots, m.   (18)

Here x₁, x₂, ..., xₙ are regression coefficients, a_{i1}, a_{i2}, ..., a_{in} is the ith input, and yᵢ is the ith
output. If rank A = n and R = σ²I, then the a posteriori density is

\left(\frac{1}{\sqrt{2\pi}}\right)^{m} \sigma^{-m} \exp\left[-\frac{1}{2\sigma^2}(y - Ax)^T(y - Ax)\right]
= \left(\frac{1}{\sqrt{2\pi}}\right)^{m} \sigma^{-m} \exp\left[-\frac{1}{2\sigma^2}\left[(x - \hat{x})^T A^T A (x - \hat{x}) + \nu s^2\right]\right],   (19)

where

\nu = m - n   (20)

and

\hat{y} = A\hat{x}.   (21)

It then follows that
1. x̂ is sufficient for x, if σ² is known.
2. x̂ and s² are jointly sufficient for (x, σ²).
3. x̂ is an N[x, σ²(AᵀA)⁻¹] variate.
4. νs²/σ² is distributed independently of x̂ as χ².
We have discussed the more general case when R is any m x m non-singular matrix and the rank
of A is not necessarily equal to n.

EXAMPLES OF MINIMUM CROSS-ENTROPY STATISTICAL INFERENCE
Kullback's principle of minimum cross-entropy is advantageous when there is not enough information
to determine an a posteriori probability distribution uniquely from the moments alone. In Bayesian
inference, Bayes's theorem yields a unique a posteriori distribution in the first instance. However,
both the a priori and a posteriori probability distributions in the minimum cross-entropy method
are distributions of the variates themselves, rather than of the parameters as in Bayesian inference.
We now illustrate the similarities and the differences with the following examples.

Problem 4: Given that a random variate has an a priori distribution N(m₀, σ₀²) and that its mean
is m, find its a posteriori distribution. Next, given in addition that its variance is also known and
is σ², find its new a posteriori distribution. Given further that E[(x − m)⁴] = b₄, find the third
a posteriori distribution, and calculate entropies and cross-entropies.

Solution 4: Here

g_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{(x - m_0)^2}{2\sigma_0^2}\right].   (22)

Minimizing

\int_{-\infty}^{\infty} f(x) \ln\frac{f(x)}{g_0(x)}\, dx,   (23)

subject to

\int_{-\infty}^{\infty} f(x)\, dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} x f(x)\, dx = m,   (24)

results in

f(x) = g_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{(x - m)^2}{2\sigma_0^2}\right].   (25)

Thus our first a posteriori distribution is N(m, σ₀²). Again, minimizing

\int_{-\infty}^{\infty} f(x) \ln\frac{f(x)}{g_1(x)}\, dx,   (26)

subject to

\int_{-\infty}^{\infty} f(x)\, dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} (x - m)^2 f(x)\, dx = \sigma^2,   (27)

results in

f(x) = g_2(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right],   (28)

so that the second a posteriori distribution is N(m, σ²). It can be observed that this distribution is
independent of both m₀ and σ₀². Again, minimizing

\int_{-\infty}^{\infty} f(x) \ln\frac{f(x)}{g_2(x)}\, dx,   (29)

subject to

\int_{-\infty}^{\infty} f(x)\, dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} (x - m)^4 f(x)\, dx = b_4,   (30)

we obtain

f(x) = A \exp\left[-\frac{(x - m)^2}{2\sigma^2} - \lambda (x - m)^4\right],   (31)

where A and λ are determined by equations (30). This distribution will be different from the normal
distribution unless b₄ = 3σ⁴.
Remarks:
• In example 1, the a priori distribution of m was given; here, the a priori distribution of x is
given.
• In example 1, Bayes's theorem was used; here, the principle of minimum cross-entropy is used.
• In example 1, information was supplied in terms of sample values x₁, x₂, ..., xₙ; here, infor-
mation is supplied in terms of knowledge of the mean, variance, or other moments.
• Continuous updating is done in both cases.
• Example 1 shows that, as more and more observations are obtained, the uncertainty about
the mean of the population decreases. In fact, even with a single set of observations, as
n → ∞, σ₁ → 0 and the entropy goes to its minimum value of −∞. Example 4 demonstrates that
if information is given in the form of different values for the moments already involved in the
a priori distribution, then the uncertainty may decrease, or may in fact increase. However, if
the information is about moments other than those involved in the a priori distribution,
the uncertainty is likely to decrease and will certainly not increase.
• Some cross-entropies of interest are

\int_{-\infty}^{\infty} g_1(x) \ln\frac{g_1(x)}{g_0(x)}\, dx = \frac{(m - m_0)^2}{2\sigma_0^2}   (32)

and

\int_{-\infty}^{\infty} g_2(x) \ln\frac{g_2(x)}{g_0(x)}\, dx = \frac{(m - m_0)^2}{2\sigma_0^2} + \frac{\sigma^2 - \sigma_0^2}{2\sigma_0^2} + \ln\frac{\sigma_0}{\sigma}.   (33)

In spite of the additional information, the second cross-entropy may be either smaller or larger
than the first. Some entropies of interest are

-\int_{-\infty}^{\infty} g_0(x) \ln g_0(x)\, dx = \frac{1}{2}\ln(2\pi e \sigma_0^2),   (34)

-\int_{-\infty}^{\infty} g_1(x) \ln g_1(x)\, dx = \frac{1}{2}\ln(2\pi e \sigma_0^2),   (35)

-\int_{-\infty}^{\infty} g_2(x) \ln g_2(x)\, dx = \frac{1}{2}\ln(2\pi e \sigma^2).   (36)

Thus the entropy does not change when the mean alone is changed, but it does change when
the variance is changed, although it may either increase or decrease depending on whether σ
is greater than or less than σ₀. On the other hand, the entropies of the distributions obtained in
example 1 are

\frac{1}{2}\ln(2\pi e \sigma_0^2) \quad \text{and} \quad \frac{1}{2}\ln(2\pi e \sigma_1^2)   (37)

and

\frac{1}{2}\ln(2\pi e \sigma_2^2),   (38)

so that, in this case, the entropy goes on decreasing as more and more information becomes
available.
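Equation (33) can be confirmed by direct integration. The sketch below (illustrative; the parameter values are arbitrary) compares the closed-form cross-entropy between two normal densities with numerical quadrature:

import numpy as np
from scipy.integrate import quad

def kl_normals(m, s, m0, s0):
    # Closed form of equation (33): the Kullback-Leibler divergence
    # of N(m, s^2) from N(m0, s0^2).
    return (m - m0)**2 / (2 * s0**2) + (s**2 - s0**2) / (2 * s0**2) + np.log(s0 / s)

m, s, m0, s0 = 1.0, 1.5, 0.0, 1.0

def integrand(x):
    g2 = np.exp(-(x - m)**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    g0 = np.exp(-(x - m0)**2 / (2 * s0**2)) / (np.sqrt(2 * np.pi) * s0)
    return g2 * np.log(g2 / g0)

numeric, _ = quad(integrand, -30.0, 30.0)
print(f"closed form: {kl_normals(m, s, m0, s0):.6f}")
print(f"quadrature : {numeric:.6f}")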
Problem 5: Let x be an N(m₀, Σ₀) variate and let it be given that E[Ax] = y and that the
covariance matrix of Ax is R. Find the minimum cross-entropy distribution for x.

Solution 5: We must minimize the cross-entropy

\int f(x) \ln\frac{f(x)}{g(x)}\, dx,   (39)

subject to

\int f(x)\, dx = 1,   (40)

\int (Ax - y) f(x)\, dx = 0,   (41)

and

\int (Ax - y)(Ax - y)^T f(x)\, dx = R.   (42)

This gives

f(x) = \exp\left[-\tfrac{1}{2}(x - m_0)^T \Sigma_0^{-1}(x - m_0) - \lambda_0 - \lambda^T(Ax - y) - \tfrac{1}{2}(Ax - y)^T D^{-1}(Ax - y)\right],   (43)

where λ₀ is a scalar, λ is an m×1 vector, and D is an m×m matrix. These must be determined
using equations (40), (41), and (42). These equations are sufficient to determine all the Lagrange
multipliers.
This distribution is still multivariate normal, but it is different from the one obtained by Bayesian
inference in example 3 because the distribution there does not satisfy the constraints used here.
The problems solved are different, the methods of attack are different, and as such the solutions are
bound to be different in spite of the fact that the a priori probability distributions are the same.

Remarks:
• In examples 3 and 5, though the problems look similar, these are not exactly the same. The
methods of attack are different and the solutions are different.
• In example 3, it is required that the standardized likelihood satisfies the given constraints.
• In example 5, on the other hand, we want the a posteriori distribution to satisfy these con-
straints.
• The standardized likelihood is a conditional probability distribution while the a posteriori
distribution is not.
• In example 3, the observations give a unique likelihood and a unique Bayes a posteriori distribution.
In example 5, however, the constraints alone do not give a unique a posteriori distribution, and the
minimum cross-entropy principle must be used in order to obtain a unique probability distribution.

Problem 6: A box contains n + 1 balls which can be either white or black, but there is at least
one ball of each color. As such, there are n possible hypotheses, H₁, H₂, ..., Hₙ, where Hᵢ is the
hypothesis that there are i white balls and n + 1 − i black balls. The a priori probabilities of these
hypotheses being true are given by the probability distribution q = (q₁, q₂, ..., qₙ). Now, let a ball
be drawn from the box and let it be white. Find the a posteriori probabilities of the n hypotheses,
{P(Hᵢ | E), i = 1, 2, ..., n}, where E is the event that the ball drawn is white.
Solution 6: Using Bayes's theorem,

P(H_i \mid E) = \frac{P(H_i)\, P(E \mid H_i)}{\sum_{j=1}^{n} P(H_j)\, P(E \mid H_j)}, \qquad i = 1, 2, \ldots, n.   (44)

Now,

P(H_i) = q_i \quad \text{and} \quad P(E \mid H_i) = \frac{i}{n+1},   (45)

so that

P(H_i \mid E) = i\, q_i \Big/ \sum_{j=1}^{n} j\, q_j.   (46)

Remarks:
• We may consider a parameter, θ, which takes values 1, 2, ..., n according to which of the
hypotheses H₁, H₂, ..., Hₙ is true. Given the a priori probability distribution of θ, its a posteriori
probability distribution is found.
• Here, the information given was sufficient to determine the conditional probabilities and a pos-
teriori probabilities.
Problem 7: In the previous example, it is given that the a priori probabilities are q₁, q₂, ..., qₙ, and
that the mean number of white balls observed is m. Find the a posteriori probability distribution.
Solution 7: If p = (p₁, p₂, ..., pₙ) is the a posteriori probability distribution, then it is given that

\sum_{i=1}^{n} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} i\, p_i = m.   (47)

Unlike the situation in example 6, this information is not sufficient to determine p uniquely. We
therefore appeal to the principle of minimum cross-entropy and minimize a suitable measure of
cross-entropy subject to equations (47).
Minimizing the Kullback-Leibler measure, we get

p_i = a\, q_i\, b^{i}, \qquad i = 1, 2, \ldots, n,   (48)

where a and b are determined by the equations

a \sum_{i=1}^{n} q_i b^{i} = 1 \quad \text{and} \quad a \sum_{i=1}^{n} i\, q_i b^{i} = m.   (49)

This result gives an a posteriori distribution different from equation (46). However, minimizing the
Havrda-Charvat measure of cross-entropy of second order,

\sum_{i=1}^{n} \frac{p_i^2}{q_i} - 1,   (50)

subject to equations (47), we get

p_i = q_i (c + d\, i),   (51)

where c and d are determined by

c + d \sum_{i=1}^{n} i\, q_i = 1 \quad \text{and} \quad c \sum_{i=1}^{n} i\, q_i + d \sum_{i=1}^{n} i^2 q_i = m.   (52)

If

m = \sum_{i=1}^{n} i^2 q_i \Big/ \sum_{i=1}^{n} i\, q_i,   (53)

then

c = 0 \quad \text{and} \quad d = \left(\sum_{i=1}^{n} i\, q_i\right)^{-1},   (54)

so that

p_i = i\, q_i \Big/ \sum_{i=1}^{n} i\, q_i,   (55)

which is the same as equation (46).
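Equations (48)-(49) reduce to one-dimensional root finding, since the mean implied by the tilted distribution is monotone in b. A minimal sketch (illustrative; the prior q and the prescribed mean are arbitrary assumptions) is:

import numpy as np
from scipy.optimize import brentq

q = np.array([0.1, 0.2, 0.4, 0.2, 0.1])     # hypothetical prior over i = 1..n
i = np.arange(1, len(q) + 1)
m = 3.4                                     # prescribed mean

def mean_given_b(b):
    w = q * b**i                            # unnormalized p_i = q_i b^i
    return (i * w).sum() / w.sum()

# Solve equations (49): find b matching the mean, then normalize with a.
b = brentq(lambda bb: mean_given_b(bb) - m, 1e-6, 1e6)
a = 1.0 / (q * b**i).sum()
p = a * q * b**i
print("p =", np.round(p, 4), " mean =", round((i * p).sum(), 4))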

Remarks:
• In example 6, there is enough information to find p uniquely.
• In example 7, the knowledge of the mean alone does not give a unique a posteriori distribution.
However, using the minimum cross-entropy principle, the same a posteriori distribution as that
given by Bayes's theorem can be obtained, provided we
1. take the prescribed value of the mean to be the value given by Bayes's theorem, and
2. use the Havrda-Charvat measure of cross-entropy of second order.
• In general, using the generalized minimum cross-entropy principle, we can say that we can
always find an appropriate measure of cross-entropy and an appropriate set of constraints
such that when this measure is minimized subject to these constraints, the same a posteriori
distribution is obtained as is given by Bayes's theorem.
• If the Kullback-Leibler measure is required, the geometric mean can be prescribed, so that the
given information is

\sum_{i=1}^{n} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} p_i \ln i = \ln g.   (56)

Minimizing the Kullback-Leibler measure subject to constraints (56), we obtain

p_i = e\, q_i\, i^{f},   (57)

where e and f are determined by

e \sum_{i=1}^{n} q_i i^{f} = 1 \quad \text{and} \quad e \sum_{i=1}^{n} q_i i^{f} \ln i = \ln g,   (58)

so that

\ln g = \sum_{i=1}^{n} q_i i^{f} \ln i \Big/ \sum_{i=1}^{n} q_i i^{f}.   (59)

If the distribution given by equation (46) is required, then

\sum_{i=1}^{n} q_i\, i \ln i \Big/ \sum_{i=1}^{n} i\, q_i = \ln g,   (60)

which is the mean of the distribution obtained using Bayes's theorem.


AN OUTLINE OF MAXIMUM ENTROPY STATISTICAL INFERENCE
Here it is not assumed that any a priori knowledge exists and we must make inferences about the
probability density function on the basis of knowledge of some moments of the distribution. There
may be many probability distributions consistent with the information of the moments. We seek
that distribution which has the maximum uncertainty, or entropy.
Maximum-entropy statistical inference is a special case of the minimum cross-entropy principle,
when the a priori distribution is uniform, regular, or degenerate. However, conceptually, this is a
different principle, since it is based on the concept of uncertainty, rather than on the concept of
"distance" or "divergence" from an a priori distribution.
The entropy of the a posteriori distribution is always less than the entropy of the a priori (i.e.,
uniform) distribution. This inference has an updating property, i.e., if we are given some moments,

\sum_{i=1}^{n} p_i = 1   (61)

and

\sum_{i=1}^{n} p_i g_r(x_i) = a_r, \qquad r = 1, 2, \ldots, m,   (62)

then by maximizing the entropy subject to these constraints, we get

p_i = \exp[-\lambda_0 - \lambda_1 g_1(x_i) - \lambda_2 g_2(x_i) - \cdots - \lambda_m g_m(x_i)].   (63)

Now, suppose that we are given the additional moment constraints,

\sum_{i=1}^{n} p_i g_r(x_i) = a_r, \qquad r = m+1, m+2, \ldots, m+k,   (64)
then the maximum-entropy distribution

p_i = \exp[-\mu_0 - \mu_1 g_1(x_i) - \mu_2 g_2(x_i) - \cdots - \mu_m g_m(x_i) - \mu_{m+1} g_{m+1}(x_i) - \cdots - \mu_{m+k} g_{m+k}(x_i)]   (65)

is obtained, where the μ's are determined by the constraints (61), (62), and (64). Next, we
shall start with equation (63) as the a priori distribution and use constraints (61) and (64) to get
the a posteriori distribution

p'_i = \exp[-\lambda_0 - \lambda_1 g_1(x_i) - \cdots - \lambda_m g_m(x_i)] \times \exp[-\nu_0 - \nu_{m+1} g_{m+1}(x_i) - \nu_{m+2} g_{m+2}(x_i) - \cdots - \nu_{m+k} g_{m+k}(x_i)],   (66)

where the λ's and the ν's are obtained using constraints (61), (62), and (64). Since equations (65)
and (66) are of the same form, and since the multipliers are determined by the same constraints, the
final probability distribution is the same, so that pᵢ = p'ᵢ for all i.
Also, the entropy of the final distribution, (p'₁, p'₂, ..., p'ₙ), is less than or equal to the entropy of the
intermediate distribution, (p₁, p₂, ..., pₙ), since the intermediate distribution has greater entropy
than any other distribution which satisfies constraints (61) and (62), and since the final distribution
is one such distribution.
As such, the principle of gain in information continues to hold for both maximum entropy and
minimum cross-entropy principles. However, there will be information gain when old constraints
continue to hold and additional constraints, linearly independent of the earlier ones, are imposed.
Under these conditions, there will be a positive information gain and the uncertainty will decrease.
If additional independent constraints are not imposed, and we only give information changing the
values of the moments given earlier, the entropy can, in fact, increase.
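For a single moment constraint, the multiplier in equation (63) is likewise found by one-dimensional root finding. The sketch below (illustrative; it uses the familiar die-with-prescribed-mean example, which is an assumption for demonstration, not an example from this paper) also confirms that the constrained maximum entropy falls below that of the uniform distribution:

import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)          # support, e.g., faces of a die
a1 = 4.5                     # prescribed mean E[X]

def mean_given_mu(mu):
    w = np.exp(-mu * x)      # unnormalized p_i = exp(-mu0 - mu1 x_i)
    return (x * w).sum() / w.sum()

mu1 = brentq(lambda mu: mean_given_mu(mu) - a1, -5.0, 5.0)
p = np.exp(-mu1 * x)
p /= p.sum()

entropy = -(p * np.log(p)).sum()
print("p =", np.round(p, 4))
print(f"H = {entropy:.4f} napiers (uniform: {np.log(len(x)):.4f})")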
CONCLUSIONS
A Bayesian approach to statistical inference implies an initial "opinion" in the form of an a priori
probability distribution. Then, it uses the available "evidence" in the form of knowledge of a
random sample or of some moments to obtain a final "opinion" in the form of an a posteriori
probability distribution. In this sense, all our first three methods follow the Bayesian approach.
In Bayesian inference, we are given a density function f(x, θ). We start with an a priori distribution
for θ, use the values of the random sample to construct a likelihood function, and then use Bayes's
theorem to obtain the a posteriori probability distribution for θ.
In the minimum cross-entropy inference, we are given the a priori distribution for the random
variates. We use the evidence in the form of values of some moments to get the a posteriori prob-
ability distribution via Kullback's minimum cross-entropy principle [Kullback and Leibler, 1951;
Kullback, 1959].
In the maximum entropy inference, we are not given any initial opinion about the probability
distribution. We use Laplace's principle of insufficient reason and assume that the a priori probability
distribution is uniform. Then, we proceed as in the minimum cross-entropy approach.
In the classical approach, only the density function f(x, θ) is considered and no other prior
opinion. The evidence is in the form of a random sample from the population and the final opinion is
in the form of a Dirac delta function for θ. Thus, there is a great deal of commonality between the
four methods.
The principles of maximum entropy and minimum cross-entropy have been explored in detail in
[Kapur, 1989], [Kapur and Kesavan, 1989], [J.N. Kapur and H.K. Kesavan, 1990], [Kapur and
Kesavan, 1992], [H.K. Kesavan and J.N. Kapur, 1989], and [H.K. Kesavan and J.N. Kapur, 1990a,
1990b], where some other aspects of statistical estimation are also given.
Methods of estimating nonparametric density functions by using the maximum entropy principle
have been discussed by [Theil and Fiebig, 1984] for both the univariate and multivariate cases.
Earlier, the discussion of [Campbell, 1970] on the equivalence of Gauss's principle and minimum
discrimination information estimation of probabilities illustrated the interaction between entropic and
non-entropic methods of inference.
The principle of maximum entropy can also be used to derive maximum entropy priors for use in
Bayesian estimation. Given a density function f(x, θ), the maximum entropy prior is that prior
density P(θ) for which the entropy

-\int\!\!\int P(\theta)\, f(x, \theta) \ln\left[P(\theta)\, f(x, \theta)\right] d\theta\, dx   (67)

is maximum. The principle is closely related to the maximum data information priors discussed by
[Zellner, 1977].

ACKNOWLEDGEMENTS
This work was possible due to the financial support in the form of grants from the Natural Sciences
and Engineering Research Council of Canada and the Province of Ontario's Centres of Excellence
Programme.

REFERENCES
Burg, J. (1972). "The Relationship between Maximum Entropy Spectra and Maximum Likelihood
Spectra". In Childers, D., editor, Modern Spectral Analysis, pages 130-131. M.S.A.
Campbell, L. L. (1970). "Equivalence of Gauss's Principle and Minimum Discrimination Estima-
tion of Probabilities". Ann. Math. Stat., 41, 1011-1013.
Cramer, H. (1957). "Mathematical Methods of Statistics". Princeton University Press.
Fisher, R. (1921). "On the Mathematical Foundations of Theoretical Statistics". Phil. Trans.
Roy. Soc., 222(A), 309-368.
Fougere, P., editor (1990). "Maximum Entropy And Bayesian Methods Proceedings of the 9th
MaxEnt Workshop". Kluwer Academic Publishers, New York.
Goel, P. K. and Zellner, A., editors (1986). "Bayesian Inference and Decision Techniques". North-
Holland, Amsterdam.
Grandy, W. T. J. and Schick, L. H., editors (1991). "Maximum Entropy and Bayesian Methods".
Kluwer Academic Press, Dordrecht.
Havrda, J. H. and Charvat, F. (1967). "Quantification Methods of Classification Processes: Concept
of Structural α-Entropy". Kybernetika, 3, 30-35.
H.K. Kesavan and J.N. Kapur (1989). "The Generalized Maximum Entropy Principle". IEEE
Trans. Syst. Man. Cyb., 19, pages 1042-1052.
H.K. Kesavan and J.N. Kapur (1990a). "Maximum Entropy and Minimum Cross Entropy Principles:
Need for a Broader Perspective". In Fougere, P. F., editor, Maximum Entropy and Bayesian
Methods, pages 419-432. Kluwer Academic Publishers.
H.K. Kesavan and J.N. Kapur (1990b). "On the Family of Solutions of Generalized Maximum and
Minimum Cross-Entropy Models". Int. Jour. Gen. Systems, vol. 16, pages 199-219.
Jaynes, E. (1957). "Information Theory and Statistical Mechanics". Physical Review, 106, 620-630.
G.J. Erickson and C.R. Smith, editors (1988). "Maximum Entropy and Bayesian Methods in Science
and Engineering, vol. 1 (Foundations), vol. 2 (Applications)". Kluwer Academic Publishers,
New York.
J.H. Justice, editor (1986). "Maximum Entropy and Bayesian Methods in Applied Statistics".
Cambridge University Press, Boston.
J.N. Kapur and H.K. Kesavan (1990). "Inverse MaxEnt and MinxEnt Principles and their Applica-
tions". In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods, pages 433-450. Kluwer
Academic Publishers.
J.N. Kapur and Seth, A. (1990). "A Comparative Assessment of Entropic and Non-Entropic
Methods of Estimation". In Fougere, P. F., editor, Maximum Entropy and Bayesian Methods,
pages 451-462. Kluwer Academic Publishers.
J. Skilling, editor (1989). "Maximum Entropy and Bayesian Methods". Kluwer Academic Pub-
lishers, Dordrecht.
Kapur, J. (1989). "Maximum Entropy Models in Science and Engineering". Wiley Eastern, New
Delhi.
Kapur, J. and Kumar, V. (1987). "A measure of mutual divergence among a number of probability
distributions". Int. Jour. of Maths. and Math. Sci., 10(3), 597-608.
Kapur, J. N. and Kesavan, H. K. (1989). "Generalized Maximum Entropy Principle (with Appli-
cations)". Sandford Educational Press, University of Waterloo.
Kapur, J. N. and Kesavan, H. K. (1992). "Entropy Optimization Principles and their Applica-
tions". Academic Press, San Diego.
Kullback, S. (1959). "Information Theory and Statistics". John Wiley, New York.
Kullback, S. and Leibler, R. (1951). "On Information and Sufficiency". Ann. Math. Stat., 22,
79-86.
Rao, C. R. (1989). "Statistical Data Analysis and Inference". North-Holland, Amsterdam.
Renyi, A. (1961). "On Measures of Entropy and Information". Proc. 4th Berkeley Symp. Maths.
Stat. Prob., I, 547-561.
Seth, A. K. (1989). "Prof. J. N. Kapur's Views on Entropy Optimization Principles". Bull. Math.
Ass. Ind., 21-22, 1-38, 1-42.
Shannon, C. E. (1948). "A Mathematical Theory of Communication". Bell System Tech. Journal,
27, 379-423, 623-656.
Smith, C. R. and Grandy, W. T., Jr., editors (1985). "Maximum-Entropy and Bayesian Methods in
Inverse Problems". D. Reidel, Dordrecht, Holland.
Theil, H. and Fiebig, D. (1984). "Exploiting Continuity: Maximum Entropy Estimation of Con-
tinuous Distributions". Ballinger, Cambridge.
Tribus, M. (1966). "Rational Descriptions, Decisions and Designs". Pergamon Press, Oxford.
Wilks, S. S. (1963). "Mathematical Statistics". John Wiley, New York.
Zellner, A. (1977). "Maximal data information prior distributions". In Aykac, A. and Brumat, C.,
editors, New Developments in the Applications of Bayesian Methods. North Holland, Amsterdam.
AN ENTROPY-BASED APPROACH TO STATION DISCONTINUANCE

N.B. HARMANCIOGLU
Dokuz Eylul University
Faculty of Engineering
Bornova 35100 Izmir, Turkey

In the design and operation of hydrologic data collection networks, the question of
how long data gathering should be continued is a problem that is not often
addressed. The current situation is that no definite criteria have been established
to decide upon when and where to terminate data collection. Entropy-based
measures of information, as used in this study, present convenient and objective
means of assessing the status of an existing station with respect to information
gathering. Such an assessment can be realized by evaluating the redundancy of
information in both the time and the space domains. Accordingly, a particular site
that either repeats the information provided by neighboring stations or produces the
same information by successive measurements can be discontinued. The presented
study shows that the entropy concept can be effectively employed to evaluate the
spatial and temporal redundancy of information produced by a hydrologic network.
The application of the method is demonstrated on case studies comprising water
quality and quantity monitoring networks in selected Turkish river basins.

INTRODUCTION

There are four basic issues to be considered in the design and operation of
hydrologic data collection networks: what to measure, where, when, and for how
long. Among these, the last issue of how long data collection should be continued
is not often addressed even though monitoring agencies, often due to budgetary
constraints, would like to know whether continuous data collection practices are
required or not. The current situation is that no definite criteria have been
established to decide upon when and where to terminate data collection.
Maddock (1974) was among the first to address the problem; he considered
station discontinuance based on correlation links with other stations in the network.
Wood (1979) used the sequential probability ratio test, where the decision whether
to discontinue a station is dependent on statistical considerations that include the
error probabilities of accepting a model when it is incorrect as well as rejecting it
when it is correct. Wood's approach is directed at stations whose primary purpose
is to provide hydrologic information for design; in particular, he considered flood
protection as the design purpose and demonstrated the method in case of flood
frequency curve determination. Lane et al. (1979) considered termination of
hydrologic data collection at experimental watersheds in New Mexico and Arizona.
Their decision-making procedure is modelled in terms of the theory of "bounded
rationality", where a regional analysis and the Bayesian decision theory indicated the
closure of the watersheds with respect to research objectives.
With respect to the station discontinuance problem, two controversial
approaches exist among hydrologists. Some claim that "the more data, the better",
considering that long records of data increase our chances of understanding the
natural phenomena. Others differentiate between "data" and "information" and
consider that more data does not necessarily imply more information. This latter
approach is adopted as the basic idea underlying the presented study. That is, it is
claimed here that a station which does not convey any new information about the
process observed should be discontinued. In this case, the problem is to determine
the amount of information provided by a station with respect to both space and
time.
Entropy-based measures of information, as used in this study, present convenient
and objective means of assessing the status of an existing station with respect to
information gathering. To solve the problem in the space domain, spatial orientation
of stations within a network has to be evaluated for redundancy of information so
that a particular site that repeats the information provided by other stations can be
discontinued. New observational sites may be considered where additional
information is needed. Entropy-based measures can be effectively employed here
to assess the spatial redundancy of information produced by the network.
The problem is similar in the time domain. A monitoring site is again evaluated
for redundancy of information, this time with respect to the temporal frequency and
the duration of observations. If a station produces the same information via
continuous monitoring, data collection may be terminated permanently or
temporarily. Two approaches may be adopted here. The first one involves the
investigation of temporal frequencies by the entropy method to assess whether the
station, with its selected monitoring frequencies, repeats the "same information" by
successive measurements. In this case, the results may indicate either a decrease in
time frequencies or a complete termination of data collection at the site. The
second approach focuses on the change of information conveyed by a station with
respect to record length. Here, one may decide to discontinue the observations if,
after a certain duration of monitoring, no new information is obtained by additional
measurements. Again, the entropy method can be easily used to delineate the
change of information conveyed with respect to length of data records.

The presented study shows that entropy measures of information constitute effective tools in evaluating station discontinuance in both space and time
dimensions. The application of the method to the problem of station discontinuance
is demonstrated on case studies comprising water quality and quantity monitoring
networks in selected Turkish river basins.

STATION DISCONTINUANCE IN THE SPACE DOMAIN

Entropy principle applied to multi-variables in the space domain

In the multivariate case, the total entropy of M stochastically independent variables X_m (m=1,...,M) is (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):

H(X_1, X_2, ..., X_M) = Σ_{m=1}^{M} H(X_m)    (1)

where H(X_m) represents the marginal entropy of each variable X_m in the form of:

H(X_m) = K Σ_{n=1}^{N} p(x_n) log [1/p(x_n)]    (2)

with K = 1 if H(X_m) is expressed in napiers for logarithms to the base e. Eq.(2) defines the entropy of a discrete random variable X_m with N elementary events of probability p_n = p(x_n) (n=1,...,N) (Shannon and Weaver, 1949). For continuous density functions, p(x_n) is approximated as [f(x_n)·Δx] for small Δx, where f(x_n) is the relative class frequency and Δx the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(x) is:

H(X_m; Δx) = ∫_{-∞}^{+∞} f(x) log [1/f(x)] dx + log (1/Δx)    (3)

In the above, the selection of Δx becomes a crucial decision as it affects the values of entropy (Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1986).
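To make the discretized computation concrete, a minimal sketch is given below; the function name, the NumPy dependency, and the default Δx = 1.0 are illustrative assumptions made here and are not part of the original study.

import numpy as np

def marginal_entropy(x, dx=1.0):
    # Discrete approximation of H(X; dx) of Eq.(3), in napiers: the series
    # is binned into class intervals of width dx and Eq.(2) is applied to
    # the relative class frequencies.
    x = np.asarray(x, dtype=float)
    nbins = max(1, int(np.ceil((x.max() - x.min()) / dx)))
    counts, _ = np.histogram(x, bins=nbins)
    p = counts[counts > 0] / counts.sum()      # nonzero class probabilities
    return float(np.sum(p * np.log(1.0 / p)))  # natural logarithm -> napiers

The value returned is rated against the reference level log(1/Δx) discussed above, so entropies of different series are comparable only when the same Δx is used.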
If significant stochastic dependence occurs between M variables, the total entropy has to be expressed in terms of conditional entropies H(X_m | X_1,...,X_{m-1}) added to the marginal entropy of one of the variables (Harmancioglu, 1981; Topsoe, 1974):

H(X_1, X_2, ..., X_M) = H(X_1) + Σ_{m=2}^{M} H(X_m | X_1, ..., X_{m-1})    (4)

Since entropy is a function of the probability distribution of a process, the multivariate joint and conditional probability distribution functions of M variables need to be determined to compute the above entropies (Harmancioglu, 1981):

H(X_1, X_2, ..., X_M) = -∫...∫ f(x_1,...,x_M) log f(x_1,...,x_M) dx_1 dx_2 ... dx_M    (5)

H(X_M | X_1, ..., X_{M-1}) = -∫...∫ f(x_1,...,x_M) log f(x_M | x_1,...,x_{M-1}) dx_1 dx_2 ... dx_M    (6)

(with all integrals taken from -∞ to +∞)

The common information between M variables, or the so-called transinformation T(X_1,...,X_M), can be computed as the difference between the total entropy of Eq.(1) and the joint entropy of Eq.(5). It may also be expressed as the difference between the marginal entropy H(X_m) and the conditional entropy of Eq.(6). It follows from the above that the stochastic dependence between multi-variables causes their marginal entropies and the joint entropy to be decreased. This feature of the entropy concept can be used in the spatial design of monitoring stations to select appropriate numbers and locations so as to avoid redundant information (Harmancioglu and Alpaslan, 1992).
An important step in the computation of any kind of entropy is to determine the type of probability distribution function which best fits the analyzed processes. If a multivariate normal distribution is assumed, the joint entropy of X (the vector of M variables) is obtained as (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):

H(X) = (M/2) ln(2π) + (1/2) ln |C| + M/2    (7)

where M is the number of variables and |C| is the determinant of the covariance matrix C. Equation (7) gives a single value for the entropy of M variables. If logarithms of observed values are evaluated by the above formula, the same equation can be used for lognormally distributed variables. The calculation of conditional entropies in the multivariate case can also be realized by Eq.(7) as the
difference between two joint entropies. For example, the conditional entropy of
variable X with respect to two other variables Y and Z can be determined as:

H(X|Y,Z) = H(X,Y,Z) - H(Y,Z)    (8)
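As a hedged illustration of Eqs.(7)-(8), the sketch below evaluates both quantities for jointly normal variables stored as columns of a data matrix; the function names and the NumPy-based formulation are assumptions made here for demonstration, not the computations of the original study.

import numpy as np

def joint_entropy_normal(data):
    # Eq.(7): H(X) = (M/2) ln(2*pi) + (1/2) ln|C| + M/2, where C is the
    # covariance matrix of the M columns of `data` (rows = observations).
    M = data.shape[1]
    C = np.atleast_2d(np.cov(data, rowvar=False))
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * M * np.log(2.0 * np.pi) + 0.5 * logdet + 0.5 * M

def conditional_entropy_normal(data, target, given):
    # Eq.(8): conditional entropy as the difference of two joint entropies.
    cols = [target] + list(given)
    return joint_entropy_normal(data[:, cols]) - joint_entropy_normal(data[:, list(given)])

For lognormally distributed variables the same functions may be applied to the logarithms of the observations, as noted above.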

Application to Assessment of Station Discontinuance

An investigation of station discontinuance in the space domain requires the assessment of reduction in the joint entropy of two or more variables due to the
presence of stochastic dependence between them. In this case, the reduction is
equivalent to the redundant information in the series of the same hydrologic variable
observed at different sites. Application of the entropy principle within this context
is demonstrated here on water quality and runoff data collected in two different
basins in Turkey.
Available monthly dissolved oxygen (DO) and electrical conductivity (EC) data
from six sampling stations in the Porsuk river basin (numbered 010, 011, 013, 015,
016, and 019, consecutively in the downstream order) are used to investigate
information transfer in space. Here, the common period of observation at all sites
covers 38 months, as all stations except 010 have sporadic observations. Series of
DO and EC data collected at the six stations are assumed to be normally distributed.
Joint entropies are computed by Eq.(7) for M=2, ... ,6, which can be used to
determine the conditional entropies by Eq.(8). Next, transinformations are
computed for M=2, ... ,6. For each variable, the joint entropy of simultaneous
observations at all 6 stations represents the total amount of uncertainty that may be
reduced by observations at each station. Increasing the number of stations
contributes to this reduction so that the total uncertainty is decreased. Results of
such computations are shown in Table 1, where the joint entropy of 6 stations
represents the total uncertainty about the variable considered. The number of
stations is increased by starting at the downstream station and successively adding
to the list the next station in the upstream direction (Harmancioglu and Alpaslan,
1992).
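The stepwise computation just described can be sketched as follows, under the same normality assumption; the matrix `data`, with columns ordered from the downstream station upstream, and the function names are hypothetical placeholders for illustration.

import numpy as np

def joint_H(d):
    # Eq.(7) applied to the columns of d (rows = simultaneous observations).
    M = d.shape[1]
    _, logdet = np.linalg.slogdet(np.atleast_2d(np.cov(d, rowvar=False)))
    return 0.5 * M * np.log(2.0 * np.pi) + 0.5 * logdet + 0.5 * M

def uncertainty_reduction(data):
    # Transinformation of each growing station combination, expressed as a
    # percentage of the joint entropy of all stations (cf. Table 1).
    total = joint_H(data)
    for m in range(2, data.shape[1] + 1):
        sub = data[:, :m]                              # add the next station
        T = sum(joint_H(sub[:, [j]]) for j in range(m)) - joint_H(sub)
        print(f"{m} stations: T = {T:.3f} napiers"
              f" ({100.0 * T / total:.0f}% of total uncertainty)")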
For DO, the first four stations (019, 016, 015, and 013) reduce 95% of the total
uncertainty of 14.733 napiers so that the last stations produce redundant information
in this combination. Thus, it appears that if stations 010 and 011 were discontinued,
one would still recover 95% of the information available about DO. In the case of
EC, however, all 6 stations disclose only 35% of the total uncertainty. This implies
that 65% of the information has to be obtained by the addition of new stations.
This result appears to be physically justified for EC since it is a significant indicator
of nonpoint source pollution, which dominates water quality in the basin
(Harmancioglu and Alpaslan, 1992).
TABLE 1. Reduction in total uncertainty about water quality by increasing the number of stations (Harmancioglu and Alpaslan, 1992).

Variable                    DO        EC
Joint entropy (napiers)     14.733    36.543

No. of stations             Transinformations (napiers)
2                           4.658     2.510
3                           9.326     5.050
4                           14.025    7.626
5                           18.724    10.207
6                           23.424    12.812

Next, different combinations of sampling sites are considered for DO to investigate additional reductions in total uncertainties. Results of three alternatives and the best combinations are shown in Table 2. The last combination of sampling sites provides the optimum solution in obtaining almost 100% of information.
Accordingly, it appears that the two stations 013 and 019 do not contribute
significantly to the reduction of uncertainty about DO levels in the Porsuk river.
Thus, one may infer here that these two stations may discontinue DO observations.
The assessment of station discontinuance in the above example is based on
transinformations of various combinations of stations. An increase in these values
is accomplished by either adding new stations or excluding some of the existing ones.
In this case, costs of adding new sites or decreases in sampling costs by discontinuing
some stations are to be evaluated in comparison with rates of information gain, which are expressed as the ratio of the transinformation of a particular combination of stations
to the total uncertainty described by all stations in the network. When entropy
measures indicate a need for new stations as in the case of EC, the same principle
can be used to determine the numbers and locations of these sites. Such an
investigation is not presented here since the major purpose of this study is to assess
station discontinuance. Previously, Husain (1989) addressed this problem by
proposing the information decay concept for network expansion purposes. Husain
determines the location of new sites by analyzing the variation of entropy with
distance.
TABLE 2. Transinformations and rates of uncertainty reduction for alternative combinations of stations in case of DO (Harmancioglu and Alpaslan, 1992).

Stations                Transinformation (napiers)    Rate of reduction (percent)
019, 016, 015, 013      14.025                        95
019, 015, 013, 011      14.009                        95
016, 015, 011, 010      14.344                        97

Similar analyses are carried out for 5 years of daily runoff data of three streamgaging stations, called Kagizman (X_1), Cobandede (X_2), and Mescitli (X_3) in the down- to upstream order along the Aras river basin in eastern Turkey. Assuming that the three variables are normally distributed, joint entropies and transinformations are computed as in Table 3. It may be observed here that neither a combination with two stations nor one with three stations is sufficient to reduce the total uncertainty of 14.162 napiers. Furthermore, the two upstream stations X_2 and X_3 produce a significant amount of redundant information so that both combinations practically result in the same amount of uncertainty reduction. That is, when two stations reduce uncertainty by 8%, addition of the third station increases this rate only to 10%. This result is also confirmed by the conditional entropies H(X_1|X_2) and H(X_1|X_2,X_3) of Table 3, which indicate that the presence of the third station does not help to further reduce the uncertainty of 5.536 napiers about X_1. In this case, one may consider the termination of Mescitli (X_3) as it contributes a smaller amount of information as compared to Cobandede (X_2) when the two conditional entropies, H(X_1|X_2) and H(X_1|X_3), are considered. It may be preferred to investigate new sites that will be more informative than Mescitli (X_3). The final decision whether Mescitli should be terminated or continued has to be based on costs of sampling versus the information provided by this station.

STATION DISCONTINUANCE IN THE TIME DOMAIN

Assessment of station discontinuance with respect to record length

The analysis of station discontinuance in the time domain is based on the assessment
of temporal information transfer between successive measurements. By using the
entropy principle, a monitoring site is evaluated for redundancy of information with
respect to two factors: length of data records and sampling frequency. Investigation
on the basis of the first factor focuses on the change of information conveyed by a
station with respect to record length N. Here, one may decide to terminate a
station if, after a certain duration of monitoring, no new information is obtained by
additional measurements. The entropy method can be used to delineate the change
of information conveyed by data on the basis of the record length N.

TABLE 3. Assessment of runoff data for station discontinuance.

Joint entropy: 14.162 napiers        Marginal entropy of X_1: 5.536 napiers

Stations          Transinformation   Conditional entropy   Rate of reduction
                  (napiers)          of X_1 (napiers)      (percent)
X_1, X_2          1.258              4.278                 8.9
X_1, X_2, X_3     1.495              4.263                 10.0

The definition given in Eq.(3) for the marginal entropy of X involves the term "log (1/Δx)", which essentially describes a reference level of uncertainty according to which the entropy (uncertainty or information conveyed) of the process is evaluated. In this case, the entropy function assumes values relative to the reference level described by "log (1/Δx)" (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1992). On the other hand, some researchers propose the use of a function m(x) such that the marginal entropy of a continuous variable is expressed as:

H(X) = -∫_{-∞}^{+∞} f(x) ln [f(x)/m(x)] dx    (9)

where m(x) is often considered to be an a priori probability distribution function (Jaynes, 1983; Harmancioglu et al., 1992). If a uniform distribution is assumed to be the prior distribution used to describe maximum uncertainty about the process prior to making any observations, then the marginal entropy of Eq.(3) becomes:

H(X) = ∫_{-∞}^{+∞} f(x) log [1/f(x)] dx + log (1/N)    (10)

or:

H(X) = ∫_{-∞}^{+∞} f(x) log [1/f(x)] dx - log N    (11)


where "log N" becomes the reference level of uncertainty. Another property of this term is that, for a record of N observations, "log N" represents the upper limit of the entropy function (Harmancioglu, 1981; Topsoe, 1974). Using this property and rearranging Eq.(11), one may write:

H'(X) = log N - ∫_{-∞}^{+∞} f(x) log [1/f(x)] dx    (12)

where H'(X) now describes the change in information conveyed by data compared to the reference level of "log N". Here, the absolute value of the H'(X) function has to be considered as it defines the difference between the upper limit "log N" and the information provided by data of N records. Since "log N" represents the maximum amount of uncertainty, H'(X) measures the amount of information gained by making observations. This amount, or the rate of change of information conveyed, changes as N is increased. The particular point where H'(X) stabilizes or remains constant with respect to N indicates that no new information is gained by additional observations. If this point is reached within the available record of observations, then one may decide to discontinue sampling, as further observations will not bring further information.
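A minimal sketch of this record-length analysis is given below, assuming a one-dimensional array of daily flows and a class width dx; the histogram approximation of the integral term, and all names, are assumptions made for illustration.

import numpy as np

def h_prime_by_record_length(daily, dx=1.0):
    # |H'(X)| of Eq.(12) for records of 1, 2, ... years of daily data;
    # discontinuance is indicated once successive values stop changing.
    daily = np.asarray(daily, dtype=float)
    values = []
    for years in range(1, len(daily) // 365 + 1):
        x = daily[:365 * years]
        nbins = max(1, int(np.ceil((x.max() - x.min()) / dx)))
        counts, _ = np.histogram(x, bins=nbins)
        p = counts[counts > 0] / counts.sum()
        # histogram approximation of the integral term of Eq.(12)
        integral = np.sum(p * np.log(1.0 / p)) + np.log(dx)
        values.append(abs(np.log(len(x)) - integral))
    return values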
The approach described above is applied to nine years of daily runoff data of
the three stations in Aras basin, which were analyzed in the previous section for
station discontinuance in the space domain. Figure 1 shows the H'(X) values for
each station in terms of the rates or changes of information gain with respect to
record length N. These values are computed with daily data of each station for N
years, starting with one year (365 observations) and consecutively increasing the
length of record to 9 years by 365 x N. The curves obtained show that the rate of
information gain is high for the first three years. The rate of change decreases after
the fourth year, indicating a relatively smaller amount of information conveyed by
data beyond this period. However, none of the stations have yet reached the point
where this change is negligible; that is, the H'(X) values have not yet become
constant with respect to N in any one of the stations. Thus, one may infer that none
of the stations have reached the point of termination.

[Figure 1: |H'(X)| in napiers plotted against record length N in years, curves (a)-(c).]

Figure 1. The change of information gain with respect to length of data records: (a) Kagizman, (b) Cobandede, (c) Mescitli.

Assessment of temporal sampling frequencies

If a monitoring station is found not to have reached the point of discontinuance, one may investigate further whether temporal sampling frequencies may be decreased or not. Again, entropy measures can be used to analyze this problem by evaluating redundancy of information with respect to sampling frequencies. In this case,
temporal frequencies have to be investigated to assess whether the station, with its
already selected monitoring frequencies, repeats the "same information" by successive
measurements. In this case, the results may indicate either a decrease in time
frequencies or a complete termination of data collection at the site.
It was stated earlier that the stochastic dependence between two processes
causes their marginal and joint entropies to be decreased. The same is true for a
dependent process when it is considered as a single variable. The marginal entropy
of a single process that is serially correlated is less than the uncertainty it would
contain if it were independent. If the values that a variable assumes at a certain
time t can be estimated by those at times t-l, t-2, ..., the process is not completely
uncertain because some information can be gained due to the serial dependence
present in the series. In this case, stochastic dependence again acts to reduce
entropy and causes a gain in information (Harmancioglu, 1981). This feature is
suitable for use in the temporal design of sampling stations. Sampling intervals can
be selected so as to reduce the redundant information between successive
measurements (Harmancioglu and Alpaslan, 1992).
For a single process, the marginal entropy as defined in Eq.(3) represents the
total uncertainty of the variable without having removed the effect of any serial
dependence. However, if the i-th value of variable X, or x_i, is significantly correlated to the values x_{i-k}, k being the time lag, knowledge of these previous values x_{i-k} will make it possible to predict the value of x_i, thereby reducing the marginal entropy of X.

To analyze the effect of serial correlation upon marginal entropy, the variable X can be considered to be made up of series X_i, X_{i-1}, ..., X_{i-k}, each of which represents the sample series for time lags k=0,1,...,K and which obey the same probability distribution function. Then conditional entropies such as H(X_i | X_{i-1}), ..., H(X_i | X_{i-1}, X_{i-2}, ..., X_{i-k}) can be calculated. If the X_{i-k} (k=1,...,K) are considered as different variables, the problem turns out to be one of the analysis of K+1 dependent multi-variables; thus, formulas similar to Eq.(6) can be used to compute the necessary conditional entropies (Harmancioglu, 1981; Harmancioglu and Alpaslan, 1992):

H(X_i | X_{i-1}, ..., X_{i-k}) = -∫...∫ f(x_i,...,x_{i-k}) log f(x_i | x_{i-1},...,x_{i-k}) dx_i ... dx_{i-k}    (13)

(with all integrals taken from -∞ to +∞). For a serially correlated variable, the relation:

H(X_i) ≥ H(X_i | X_{i-1}) ≥ H(X_i | X_{i-1}, X_{i-2}) ≥ ... ≥ H(X_i | X_{i-1}, ..., X_{i-K})    (14)

exists between the variables X_{i-k} (k=0,...,K). Thus, as the degree of serial dependence increases, the marginal entropy of the process will decrease until the condition:

H(X_i | X_{i-1}, ..., X_{i-(k-1)}) - H(X_i | X_{i-1}, ..., X_{i-k}) ≤ ε    (15)

is met for an infinitesimally small value of ε. It is expected that the lag k where the above condition occurs indicates the degree of serial dependence within the analyzed process (Schultze, 1969; Harmancioglu, 1981).
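Under the normality assumption used in the case studies, this lag analysis can be sketched as below: each conditional entropy is obtained as a difference of two Gaussian joint entropies, and the loop stops once condition (15) is met for a chosen ε. The function names and the tolerance are assumptions for illustration.

import numpy as np

def joint_H(d):
    # Eq.(7)-type Gaussian joint entropy of the columns of d.
    M = d.shape[1]
    _, logdet = np.linalg.slogdet(np.atleast_2d(np.cov(d, rowvar=False)))
    return 0.5 * M * np.log(2.0 * np.pi) + 0.5 * logdet + 0.5 * M

def lagged_conditional_entropies(x, K=10, eps=1e-3):
    x = np.asarray(x, dtype=float)
    H = []
    for k in range(1, K + 1):
        # each row is the vector (x_i, x_{i-1}, ..., x_{i-k})
        lagged = np.column_stack([x[k - j: len(x) - j] for j in range(k + 1)])
        H.append(joint_H(lagged) - joint_H(lagged[:, 1:]))  # Eq.(13) via Eq.(8)
        if len(H) > 1 and H[-2] - H[-1] < eps:               # condition (15)
            break
    return H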
The above approach is applied to the same three series of nine years of daily
runoff data analyzed in the previous section. Figure 2 shows the change of their
marginal entropies with respect to time lag k. It is observed here that, for all
stations, only the first time lag, or the first order serial dependence, is effective in
reducing the uncertainty of each station. The following lags do not contribute
significantly to this reduction so that nonnegligible uncertainty still remains in the
processes at time lags beyond k= 1. This result indicates that successive
measurements do not repeat the same information in the time domain. Thus, the
stations should not be discontinued. The fact that the first lag in each station
produces the highest reduction of uncertainty raises the question whether the
temporal frequencies may be extended from Δt = 1 day to Δt = 2 days or to larger time intervals. This problem is investigated by evaluating the transinformations (common information) between successive measurements for different Δt sampling frequencies.
[Figure 2: conditional entropies of daily runoff, in napiers, plotted against time lag k, curves (a)-(c).]

Figure 2. Reduction in marginal entropies of runoff data in the form of conditional entropies at successive time lags: (a) Kagizman, (b) Cobandede, (c) Mescitli.

Transinformations T(X_i, X_{i-1}) are described here as percentages of the marginal (or total) information H(X_i). Figure 3 shows the changes in these ratios with respect to the Δt sampling frequencies. It is observed here that the extension of the frequency to 2 days retains only about 25% of the information, thereby leading to an information loss of about 75% for each station. Information is almost completely lost when the frequency is extended to three months, as the ratio of 25% drops down to as low as 0.2-0.4%. Even for a Δt of 2 days, 75% is a significant percentage of information loss. Thus, one may decide here to continue with daily observations at each station.

[Figure 3: T(X_i, X_{i-1}) as a percentage of H(X_i), plotted against the sampling interval Δt.]

Figure 3. Effects of sampling frequencies upon information gain about runoff processes at the three stations investigated.
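This frequency check can be sketched as follows: the daily record is subsampled at an interval dt, and the transinformation between successive samples, estimated here from a two-dimensional histogram, is expressed as a percentage of the marginal entropy. This estimator and all names are assumptions, not the computations of the study.

import numpy as np

def retained_information(x, dt, dx=1.0):
    # T(X_i, X_{i-1}) as a percentage of H(X_i) for a record sampled every
    # dt days, with class intervals of width dx (cf. Figure 3).
    s = np.asarray(x, dtype=float)[::dt]
    a, b = s[1:], s[:-1]                       # successive measurements
    nbins = max(1, int(np.ceil((s.max() - s.min()) / dx)))
    pxy, _, _ = np.histogram2d(a, b, bins=nbins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    T = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    H = np.sum(px[px > 0] * np.log(1.0 / px[px > 0]))
    return 100.0 * T / H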

CONCLUSION

The study presented addresses the problem of station discontinuance on the basis
of the information provided by a gage in both the space and the time domains. The
entropy measures of information are used to quantitatively describe the contribution
of a sampling site to the reduction of uncertainty about the processes observed. The
approach used considers a particular gage as part of a gaging network, the purpose
of which is not to serve particular design objectives but to gather information about
a hydrologic process.
Within this context, the application of the entropy principle to cases of observed water quality and quantity data shows that the method can be effectively used to assess station discontinuance as it defines the information provided by a gaging network in quantitative terms. Thus, it is possible to measure such concepts as "information gain", "information loss", "redundant information", or "change of information" in specific units. Accordingly, assessment of station discontinuance can now be based on concrete grounds. One problem that has to be investigated further is the assessment of a sampling site in a combined spatial/temporal framework to merge the two separate approaches (spatial and temporal) applied in this study.
The entropy method still entails some limitations most of which are of a
mathematical nature. To name a few, entropy, as a measure of the uncertainty of
random processes, has not yet been precisely defined for continuous variables. The
derivation of mathematical expressions for multivariate distributions other than
normal and lognormal is highly complicated. Other difficulties encountered in
application of the method are those that are valid for any other statistical procedure.
These and other similar problems associated with the method (Harmancioglu et al.,
1993) will have to be solved as part of future research so that the entropy principle
can be used more effectively in assessment of station discontinuance.

REFERENCES

Amorocho, J.; Espildora, B. (1973) "Entropy in the assessment of uncertainty of hydrologic systems and models", Water Resources Research, 9(6), pp.1511-1522.
Harmancioglu, N. (1981) "Measuring the information content of hydrological
processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil
Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N.B., Yevjevich, V., Obeysekera, J.T.B. (1986) "Measures of
information transfer between variables", in: H.W.Shen et al.(eds), Proc. of Fourth
Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499.
Harmancioglu, N.B., Alpaslan, N. (1992) "Water quality monitoring network design:
a problem of multi-objective decision making", A WRA, Water Resources Bulletin,
Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28,
no.1, pp.1-14.
Harmancioglu, N.B.; Alpaslan, N.; Singh, V.P. (1993) "Assessment of the entropy
principle as applied to water quality monitoring network design", International
Conference on Stochastic and Statistical Methods in Hydrology and Environmental
Engineering, Waterloo, Canada, June 21-23, 1993.
Harmancioglu, N.B.; Singh, V.P.; Alpaslan, N. (1992) "Versatile uses of the entropy
concept in water resources", in: V.P. Singh & M. Fiorentino (eds.), Entropy and
Energy Dissipation in Water Resources, Dordrecht, Kluwer Academic Publishers, Water Science and Technology Library, pp.91-117.
Husain, T. (1989) "Hydrologic uncertainty measure and network design", Water
Resources Bulletin, 25(3), 527-534.
Jaynes, E.T. (1983) Papers on Probability, Statistics and Statistical Physics (ed. by
R.D. Rosenkrantz). Dordrecht, D. Reidel, vol.158.
Lane, L.J.; Davis, D.R.; Nnaji, S. (1979) "Termination of hydrologic data collection
(a case study)", Water Resources Research, vol.15, no.6, pp.1851-1858.
Maddock, T.III (1974) "An optimum reduction of gauges to meet data program
constraints", Hydrologic Sciences Bulletin, 19, pp.337-345.
Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication,
The University of Illinois Press, Urbana, Illinois.
Schultze, E. (1969) Einfuhrung in die Mathematischen Grundlagen der
Informationstheorie. Berlin, Springer-Verlag, Lecture Notes in Operations
Research and Mathematical Economics, 116 pp.
Topsoe, F. (1974) Informationstheorie. Stuttgart, B.G. Teubner, 88 pp.
Wood, E.F. (1979) "A statistical approach to station discontinuance", Water
Resources Research, vol.15, no.6, pp.1859-1866.
ASSESSMENT OF TREATMENT PLANT EFFICIENCIES BY THE ENTROPY
PRINCIPLE

N. ALPASLAN
Dokuz Eylul University
Faculty of Engineering
Bornova 35100 Izmir, Turkey

The inputs and outputs of water and wastewater treatment plants (TP) are
significantly variable so that the design and operation of TP require an assessment
of such random fluctuations. In practice, however, this variability is often
inadequately accounted for, and mean values (or maximum values) of the input and
output processes are used in either designing the TP or evaluating their operational
performance. The study presented introduces the use of the entropy concept in
assessing the uncertainty of input and output processes of TP within an informational
context. In particular, the entropy measures of information are employed to define
a "dynamic efficiency index" (DEI) as the rate of reduction in input uncertainty or
entropy to arrive at a minimum amount of uncertainty in the outputs. Besides
describing the performance of TP, the definition of such an index has further
advantages, the most significant one being in sensitivity analyses of process
parameters. The approach described is demonstrated in the case of an existing TP
of a paper factory, for which input and output data on BOD, COD, and TSS
concentrations are available.

INTRODUCTION

The inputs and outputs of water and wastewater treatment systems fluctuate
significantly with time so that the proper design and operation of treatment plants
(TP) require an assessment of such variability. Inputs to a water treatment plant are
provided by sources such as surface waters, groundwater, or reservoirs whose output
processes are basically random variables. The outputs from the TP also have a
variable character depending upon the operation of the plant, which essentially
produces such variability (Weber and Juanico, 1990). Inputs to wastewater TP show
more fluctuations as they are constituted by domestic and industrial wastewaters.
Again, the operation of the TP inevitably produces variable outputs.

It is not easy to accurately quantify these random fluctuations in both the inputs
and the outputs of TP. This is often due to poor knowledge of the various sources
of variability that can affect such processes. Consequently, in practice, the random
nature of inputs and outputs are inadequately accounted for, and their mean or
maximum values are used in either designing the TP or in evaluating its operational
performance.
With respect to design of TP, the most important factor that affects the selection
of design parameters is the variability and thereby the uncertainty of the input
processes. Use of mean or maximum values, which is often the case in practice, to
describe the inputs may lead to overdesign or underdesign. At this point, one needs
to properly assess the variability of the inputs so that the design parameters can be
selected accordingly.
With respect to the operation of TP, the performance of the treatment system
has to be evaluated. This is often realized by defining the "operational efficiency"
of the TP in terms of both the inputs and the outputs. Efficiency is simply described
as:

E = (S_i - S_e) / S_i    (1)

where E is the efficiency, and S_i and S_e are the time-variant random input and output (effluent) concentrations, respectively. In practice, the TP system is assumed
to reach a level of steady state conditions. That is, no matter how variable the
inputs are, the TP is expected to produce an effluent of uniform and stable
characteristics. This implies a reduction in input variability to obtain smooth effluent
characteristics. The assumption of steady state conditions is made for the sake of
simplicity both for design purposes and for assessment of operation. In that case,
the above definition of efficiency is given either for a single time point or by using
the means of S_i and S_e data within the total period of observation. In reality, the
operation of a TP reflects a dynamic character; yet, due to the assumptions of steady
state conditions, there exists no definition of efficiency that accounts for the dynamic
or random character of the inputs and outputs of the TP.
There are only a few studies that focus on the variability of input/output
processes of TP in evaluating its performance. Weber and Juanico (1990) describe
in statistical terms that the effluents from a TP may be as variable as the input raw
water or sewage. They claim that the coefficient of variation is a better indicator
of input/output variability than the standard deviation as a representative statistic.
Ersoy (1992) and Tas (1993) arrived at similar results in comparing various statistical
parameters to describe random fluctuations of these processes. They also
investigated the relationships between input and output variables to express
operational efficiency so as to account for the dynamic character of TP. However,
none of the above studies have arrived at a precise relationship between the

variability of input/output processes and the efficiency of the TP. Tai and Goda
(1980 and 1985) define "thermodynamic efficiency" on the basis of thermodynamic
entropy and show that the efficiency of a water treatment system can be described
by the rate of decrease in the entropy of polluted water. They also relate the
reduction in thermodynamic entropy to entropy of discrete information conveyed by
the output process of a TP.
The study presented considers the use of the informational entropy concept as
a measure of uncertainty in evaluating the variability of both the input and the
output processes of a TP. Furthermore, the entropy concept is also proposed here
to define a "dynamic efficiency index" (DEI) as the rate of reduction in input
uncertainty or entropy to arrive at a minimum amount of uncertainty in the outputs.
In the ideal case, the TP is expected to produce outputs that comply with an effluent
standard, the value of which is generally constant. In terms of entropy, this constant
indicates a value of zero entropy. In practice, though, the outputs will fluctuate
around the standard, and the TP is expected to reduce the variability of the output
(effluent) so that such fluctuations are kept below the standard value. In entropy
terms again, this indicates that the uncertainty of the effluents should be minimum
or that it should approach zero. Then the performance of the TP can be evaluated
by means of the DEI which measures the rate of reduction in uncertainty (entropy)
of the inputs so that it approaches zero, or in practical terms, a minimum value for
the entropy of the outputs. The term "dynamic" is used here to indicate that
efficiency is expressed on the basis of variability of input/output processes by using
entropy measures of uncertainty.
The approach described above is demonstrated in the case of an existing TP of
a paper factory, for which input and output data on BOD, COD, and TSS
concentrations are available. The results of the application indicate the need for
further investigations on the subject, particularly in relating the informational
entropy concept to the dynamic processes of treatment.

APPLIED METHODOLOGY

The informational entropy concept

Entropy is a measure of the degree of uncertainty of random hydrological processes. Since the reduction of uncertainty on the observer's side by means of collecting data
is equal to the amount of information gained, the entropy criterion indirectly
measures the information content of a given series of data (Shannon and Weaver,
1949; Harmancioglu, 1981).
The entropy of a discrete random variable X with N elementary events of probability p_n = p(x_n) (n=1,...,N) is defined in information theory as (Shannon and Weaver, 1949):

H(X) = K Σ_{n=1}^{N} p(x_n) log [1/p(x_n)]    (2)

with K =1 if H(X) is expressed in napiers for logarithms to the base e. H(X) gives
a single value for the information content of X and is called the "marginal entropy"
of X, which always assumes positive values within the limits 0 and log N.
When two random processes X and Y occur at the same time, stochastically
independent of each other, the total amount of uncertainty they impart or the total
amount of information they may convey is the sum of their marginal entropies
(Harmancioglu, 1981):

H(X,Y) = H(X) + H(Y) (3)

When significant dependence exists between variables X and Y, the concept of "conditional entropy" has to be introduced as a function of the conditional probabilities of X and Y with respect to each other:

H(X|Y) = -K Σ_{n=1}^{N} Σ_{m=1}^{N} p(x_n, y_m) log p(x_n | y_m)    (4)

H(Y|X) = -K Σ_{n=1}^{N} Σ_{m=1}^{N} p(x_n, y_m) log p(y_m | x_n)    (5)

where p(x_n, y_m) define the joint probabilities and p(x_n | y_m) or p(y_m | x_n) the conditional probabilities of the values x_n and y_m. The conditional entropy H(X|Y) defines the amount of uncertainty that still remains in X, even if Y is known; and the same amount of information can be gained by observing X.
If the variables X and Y are stochastically dependent, the total entropy is
expressed as (Schultze, 1969; Harmancioglu, 1981):

H(X,Y) = H(X) + H(Y|X)    (6)

H(X,Y) = H(Y) + H(X|Y)    (7)

The total entropy H(X,Y) of dependent X and Y will be less than the total entropy
if the processes were independent:

H(X,Y) < H(X) + H(Y) (8)

In this case, H(X, Y) represents the joint entropy of X and Y and is a function of
their joint probabilities:

H(X,Y) = K Σ_{n=1}^{N} Σ_{m=1}^{N} p(x_n, y_m) log [1/p(x_n, y_m)]    (9)

The difference between the total and the joint entropy is equal to another concept
of entropy called "transinformation":

T(X,Y) = H(X) + H(Y) - H(X,Y) (10)

When stochastic dependence exists between X and Y, the total uncertainty is reduced in the amount of T(X,Y), which is common to both processes, since
transinformation represents the amount of information that is repeated in X and Y
(Amorocho and Espildora, 1973; Harmancioglu, 1981).
By replacing the term H(X,Y) in Eq.(10) with its definition given in Eqs.(6) or (7), transinformation can be formulated as:

T(X,Y) = H(Y) - H(Y|X)    (11)

T(X,Y) = H(X) - H(X|Y)    (12)

Transinformation, like the other concepts of entropy, always assumes positive values
and equals 0 when two processes are independent of each other.
For continuous density functions, p(x_n) is approximated as [f(x_n)·Δx] for small Δx, where f(x_n) is the relative class frequency and Δx the length of class intervals (Amorocho and Espildora, 1973). Then the marginal entropy for an assumed density function f(x) is:

H(X; Δx) = ∫_{-∞}^{+∞} f(x) log [1/f(x)] dx + log (1/Δx)    (13)

and the joint entropy for a given bivariate density function f(x,y) is:

H(X,Y; Δx,Δy) = ∫∫ f(x,y) log [1/f(x,y)] dx dy + log [1/(ΔxΔy)]    (14)

Similarly, the conditional entropy of X with respect to Y is expressed as a function of f(x,y) and the conditional probability distribution f(x|y):

H(X|Y; Δx) = ∫∫ f(x,y) log [1/f(x|y)] dx dy + log (1/Δx)    (15)

The transinformation T(X,Y) is then computed as the difference between H(X; Δx) and H(X|Y; Δx). In the above, the selection of Δx becomes a crucial decision as it affects the values of entropy (Harmancioglu et al., 1986; Harmancioglu and Alpaslan, 1992).

Application of the concept to assessment of TP input/output variability

The variability and, therefore, the uncertainty of input and output processes of a TP
can be measured by the entropy method in quantitative terms. This can be realized
by computing the marginal entropy of each process by Eq.(13) under the assumption
of a particular probability density function. Representing the input process by X_i and the output process by X_e, marginal entropies H(X_i) and H(X_e) can be expressed in specific units to describe the uncertainty prevailing in such processes. It must be noted here that the level of uncertainty obtained is relative to the selected discretizing interval Δx of Eq.(13). Such relativity in entropy measures may appear to be inconvenient in assessment of uncertainty (Harmancioglu et al., 1993). However, since the objective here is to compare the uncertainty of the inputs with that of the outputs, the problem is sufficiently handled when the same Δx is used for both processes. In this case, uncertainty of both processes is rated with respect to the same reference level of uncertainty defined by log (1/Δx). Likewise, marginal entropies of different water quality variables can be compared with each other by keeping the reference uncertainty level constant for all variables.
As stated earlier, a treatment system is expected to reduce the variability of
inputs to produce an effluent with stable characteristics. This implies that the most
efficient operation of a TP is realized when maximum reduction in the uncertainty
or variability of the inputs is achieved to arrive at a minimum amount of uncertainty
in the outputs. In entropy terms, the uncertainty of the effluents X_e, or H(X_e), should be minimized. Accordingly, the DEI can be defined as the rate of reduction in the uncertainty (entropy) of the inputs, H(X_i), so that it approaches a minimum value for the entropy of the outputs, H(X_e). Such a measure can be expressed in (%) as:

DEI = [H(X_i) - H(X_e)] / H(X_i) × 100%    (16)

In essence, the above definition involves the first requirement for efficient treatment,
namely that the difference between input/output variability must be maximized.
This difference is actually an indicator of treatment capacity such that if it reflects
low efficiency, then the process or design parameters of the TP may have to be
changed to increase the DEI. Furthermore, calculation of the DEI for different
values of different process parameters can help to identify those parameters which
significantly affect the treatment system. The TP may be considered sensitive to
those values of particular parameters which lead to maximum reductions in H(X_i).
Then, such parameters will need to be more strictly observed throughout monitoring
procedures for both the design and the operation of the TP.
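A minimal sketch of Eq.(16) under the normality assumption adopted in the application below is given here; for a normal density, Eq.(13) reduces to 0.5 ln(2πeσ²) + log(1/Δx), and the function names and the default Δx are illustrative assumptions.

import numpy as np

def marginal_H_normal(x, dx=1.0):
    # Gaussian marginal entropy (napiers) relative to the reference level
    # log(1/dx); cf. Eq.(13) evaluated with a normal density.
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(x)) + np.log(1.0 / dx)

def dynamic_efficiency_index(influent, effluent, dx=1.0):
    # Eq.(16): rate of reduction in input entropy, in percent.
    Hi = marginal_H_normal(np.asarray(influent, dtype=float), dx)
    He = marginal_H_normal(np.asarray(effluent, dtype=float), dx)
    return 100.0 * (Hi - He) / Hi

With the BOD entropies reported in Table 2 below (5.4424 and 3.4837 napiers), Eq.(16) gives a DEI of about 36%, in agreement with the 35.9% listed there.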
The above approach to assessment of efficiency appears to comply with the
thermodynamic efficiency definition given by Tai and Goda (1980 and 1985). As
mentioned earlier, their description of efficiency refers to the decrease in the
thermodynamic entropy of polluted water, where the treated media moves from a
state of disorder to order. The terms "order" and "disorder" are analogical in both
the thermodynamic and the informational system considered. In the former,
"disorder" refers to thermodynamic disorder or pollution, which can be measured by
the thermodynamic entropy of the system. In the latter, "disorder" indicates high
variability in the system, again quantified by entropy measures albeit in an
informational context. Accordingly, the two efficiency definitions, one given by Tai
and Goda (1980 and 1985) and the other presented here, are similar in concept; the
major difference between them is that the former is given in a thermodynamic
framework, whereas the latter is presented on a probabilistic basis.
The second requirement for effective treatment is recognized as the insensitivity
of effluents with respect to the inputs. That is, the correlation between the inputs
and the outputs is required to be a minimum for a reliable treatment system. Such
a requirement may also be considered as an indicator of TP efficiency. Entropy
measures can again be employed to investigate the relationship between
input/output processes. In this case, conditional entropies in the form of H(X_e|X_i) have to be computed as in Eq.(15). The condition H(X_e|X_i) = H(X_e) indicates that the outputs are independent of the inputs and, consequently, that the TP is effective in processing the inputs. Otherwise, if inputs and outputs are found to be correlated, with H(X_e|X_i) < H(X_e), this implies that the effluents are sensitive to the inputs and that the treatment system fails to effectively transform the inputs.

Another entropy measure of correlation between the input and the output processes is the transinformation T(X_i, X_e). If the transinformation between the two processes is zero, this indicates that they are independent of each other. In the case of complete dependence, T(X_i, X_e) will be as high as the marginal entropy of one of the processes.
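For jointly normal input and output series the transinformation has the closed form T(X_i, X_e) = -(1/2) ln(1 - ρ²), with ρ the input/output correlation coefficient, so the independence check reduces to a one-line computation; the sketch below is an illustration, not the paper's code.

import numpy as np

def transinformation_normal(x_in, x_out):
    # Closed-form transinformation (napiers) of a bivariate normal pair:
    # T = -(1/2) ln(1 - rho^2), rho the correlation coefficient.
    rho = np.corrcoef(x_in, x_out)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

Values of T near zero, as obtained for TSS in the application that follows, signal the required insensitivity of the effluent to the inputs.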

APPLICATION

The above described methodology is applied to the case of Seka Dalaman Paper
Factory treatment plant in Turkey, for which input and output data on BOD
(biochemical oxygen demand), COD (chemical oxygen demand), and TSS (total
suspended solids) concentrations are available. The Seka treatment plant, with its
physical and biological treatment units, is designed to process wastewaters from the
factory at a capacity of 4500 m3/day. Data on daily BOD, COD, and TSS
concentrations were obtained for the period between January 1989 and October
1991. The input variables were monitored at the main entrance canal to the
treatment plant, and the output concentrations were observed at the outlet of the
biological treatment lagoon. The available data sets have a few missing values as
monitoring was not performed on days of plant repair and maintenance.
First, input/output processes of the three variables are analyzed for their
statistical properties. Table 1 shows these characteristics together with the classical
efficiency parameter computed by Eq.(1) using the mean values of the inputs and
the outputs. According to these computations, the TP appears to process TSS more
efficiently than the other two variables; in fact, the efficiency drops down to 75% for
COD. It is also interesting to note in Table 1 that both the input and the output
processes of TSS reflect more uncertainty than BOD and COD if the coefficient of
variation, Cv, is considered as an indicator of variability. Furthermore, for high levels of efficiency, the Cv of the outputs is higher than that of the inputs, as in the case
of BOD and TSS.
Next, the same data are investigated by means of entropy measures to assess
their variability and the efficiency of the TP. Table 2 shows the marginal entropies H(X_i) and H(X_e) of the input/output processes, the conditional entropies H(X_e|X_i) of the effluents with respect to the inputs, the transinformations T(X_i, X_e), the joint entropies H(X_i, X_e), and finally the DEI of Eq.(16) for each variable. These entropy measures are computed by Eqs.(13) through (15), assuming a normal probability density function for each variable. It may be observed from this table that the input TSS has the highest variability (uncertainty or entropy), followed by COD and BOD. With respect to the outputs, COD still shows high variability whereas the uncertainties (entropies) of BOD and TSS are significantly reduced. Likewise, the joint entropy of inputs and outputs is the highest for COD. These results show that the treatment processes applied result in TSS having the highest reduction in input uncertainty and COD

TABLE 1. Statistical properties of input/output data of Seka treatment plant.

Variable      Sample size N   Mean      Standard deviation   Coefficient    Efficiency
                              (mg/lt)   (mg/lt)              of variation   (%)
BOD input     843             170.95    54.11                0.32           91
BOD output    843             15.83     9.37                 0.58
COD input     819             760.17    343.33               0.45           75
COD output    819             188.17    52.39                0.28
TSS input     880             458.58    304.09               0.66           96
TSS output    880             19.26     14.41                0.73

having the lowest reduction. This feature is also reflected by the dynamic efficiency
index which reaches the highest value for TSS and the lowest for COD.
It is interesting to note that when the classical efficiency measure of Eq.(1) gives values in the order of 96%, 91%, and 75% respectively for TSS, BOD, and
COD, the efficiencies defined by entropy measures on a probabilistic basis result in
the respective values 50%, 36%, and 25%. Although the two types of efficiencies
described do not achieve similar values, their relative values for each variable are
in the same order. That is, both types of efficiencies are the highest for TSS and
the lowest for COD with BOD in between. On the other hand, the DEI values
comply with the thermodynamic efficiency definition given by Tai and Goda (1980
and 1985). The DEI rates obtained here reflect the highest reductions in input
uncertainty or entropy under the prevailing operational procedures. The only
difference between the DEI presented here and the thermodynamic efficiency of Tai
and Goda (1980 and 1985) is that the DEI are expressed on the basis of a
probabilistic measure of entropy rather than on a thermodynamic basis.
The entropy measures shown in Table 2 also reveal the relationship between
inputs and outputs for each variable. For TSS, the conditional entropy of the outputs H(X_e|X_i) is equal to the marginal entropy H(X_e), which indicates that the outputs are

TABLE 2. Assessment of input/output variability by entropy-based measures.

Entropy measures (in napiers)   BOD      COD       TSS
H(X_i)                          5.4424   7.2110    7.1357
H(X_e)                          3.4837   5.4264    3.5662
H(X_e|X_i)                      3.4586   5.3996    3.5660
H(X_i,X_e)                      8.9011   12.6106   10.7017
T(X_i,X_e)                      0.0251   0.0268    0.0002
DEI (%)                         35.9     24.8      50.0

independent of the inputs. The same result is shown by the transinformation T(X_i, X_e), as its value is very close to zero. Accordingly, it is observed here that the output TSS of the treatment process is insensitive to the inputs and that the operation of the TP can be considered reliable in the case of TSS. The level of dependence increases slightly for BOD and COD, indicating some correlation between the inputs and the outputs. For these two variables, the conditional entropies are close to but not equal to the marginal entropies of the output processes. Thus, one may state that the reliability of the TP decreases for BOD and COD.
Classical correlation analyses are also applied here to investigate the relationship
between the inputs and the outputs. The correlation coefficients obtained are 0.009,
0.051 and 0.26 respectively for TSS, BOD, and COD. Statistical significance tests,
for the available sample size, show that the coefficients for TSS and BOD are not
significantly different from zero, whereas for COD, the correlation coefficient of
0.26 is significantly different from zero. These results confirm those obtained by
entropy measures, regarding TP input/output dependence.
It must be noted in the above entropy computations that the marginal entropies,
transinformations, and joint entropies shown in Table 2 are relative to the
uncertainty level represented by Δx, which is selected as 1 mg/lt for all variables in this application. When Δx is changed, the above entropy measures also change;
however, the rates of reduction in input uncertainty, the transinformations, and the relationship between the conditional and the marginal entropies remain the same. Furthermore, the results are also sensitive to the assumption of a particular probability distribution for both the inputs and the outputs. Distribution functions of best fit must be investigated and used to obtain reliable entropy values (Harmancioglu and Alpaslan, 1992; Harmancioglu et al., 1993).

CONCLUSION

The input and output processes of a TP fluctuate significantly with time. This
variability is often insufficiently recognized in both the design and operation of
treatment systems. The study presented proposes the use of entropy measures of
information in the assessment of input/output uncertainty of TP. Such measures
help to identify how variable or how steady the inputs or the outputs are so that
both the design parameters and the operational efficiency of a TP can be evaluated.
According to the approach applied, the operational efficiency of TP is assessed
by means of the "dynamic efficiency index" (DEI) which represents the rate of
reduction in the uncertainty of inputs to produce a minimum amount of entropy or
uncertainty in the outputs. In essence, two requirements are foreseen for an
effective and reliable treatment system: (a) the highest reduction in input uncertainty
must be obtained; (b) the outputs must be insensitive to the inputs; that is, the
condition of independence must be satisfied. The entropy measures, as proposed
here, can be effectively used to assess whether these two requirements are met by
the operation of the TP.
The definition of TP efficiency on the basis of entropy measures has further
advantages, the most significant one being in sensitivity analyses of process
parameters. Such parameters in biological treatment, for instance, may be maximum
specific growth rate, decay rate, and yield coefficient, the values of which must be
selected for the design of TP. The TP system is either sensitive or insensitive to
these design parameters so that its efficiency is eventually affected by them. The
effects of these parameters are already recognized; however, the degree of
uncertainty they convey to the system are not well quantified. Values of these
parameters for design purposes are either taken from literature or determined by
laboratory analyses. Outputs from a simulation model of a TP may be observed with respect to different parameter values and used in calculating the DEI for the system in each case. The TP system may be considered sensitive to those values of
parameters which lead to maximum reductions in H(X_i). Then those parameters
will need to be more strictly observed throughout the data collection procedures for
the design as well as for the operation of TP.
As discussed earlier, the DEI definition proposed in the study is parallel to the
thermodynamic efficiency description of Tai and Goda (1980 and 1985), who express
efficiency in terms of the rate of reduction in thermodynamic entropy within the process of treatment.
process of treatment. Tai and Goda (1985) have also attempted to relate
thermodynamic entropy and the entropy of discrete information. It is claimed here
that further investigations are needed on the subject, particularly in disclosing the
relationship between the informational entropy measures and the dynamic (physical,
thermodynamic, etc.) processes of treatment.
Another feature of the entropy principle which has to be further investigated is
the relativity of entropy measures with respect to the selected discretizing intervals Δx when continuous density functions are used to represent the random variable.
This and other mathematical difficulties associated with the method are discussed
in detail by Harmancioglu et al. (1993). Once such problems are solved, the entropy
measures can be effectively used to assess TP efficiencies with respect to both design
parameters and operational procedures.

REFERENCES

Amorocho, J.; Espildora, B. (1973) "Entropy in the assessment of uncertainty of hydrologic systems and models", Water Resources Research, 9(6), pp.1511-1522.
Ersoy, M. (1992) Various Approaches to Efficiency Description for Wastewater
Treatment Systems (in Turkish), Dokuz Eylul University, Izmir, Graduation Project
in Environmental Engineering (dir. by N.Alpaslan).
Harmancioglu, N. (1981) "Measuring the information content of hydrological
processes by the entropy concept", Centennial of Ataturk's Birth, Journal of Civil
Engineering, Ege University, Faculty of Engineering, pp.13-38.
Harmancioglu, N.B., Yevjevich, V., Obeysekera, J.T.B. (1986) "Measures of
information transfer between variables", in: H.W.Shen et al.(eds), Proc. of Fourth
Int. Hydrol. Symp. on Multivariate Analysis of Hydrologic Processes, pp.481-499.
Harmancioglu, N.B., Alpaslan, N. (1992) "Water quality monitoring network design:
a problem of multi-objective decision making", A WRA, Water Resources Bulletin,
Special Issue on "Multiple-Objective Decision Making in Water Resources", vol.28,
no.1, pp.1-14.
Harmancioglu, N.B.; Alpaslan, N.; Singh, V.P. (1993) "Assessment of the entropy
principle as applied to water quality monitoring network design", International
Conference on Stochastic and Statistical Methods in Hydrology and Environmental
Engineering, Waterloo, Canada, June 21-23, 1993.
Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication,
The University of Illinois Press, Urbana, Illinois.
Schultze, E. (1969) Einfuhrung in die Mathematischen Grundlagen der
Informationstheorie. Berlin, Springer-Verlag, Lecture Notes in Operations Research
and Mathematical Economics, 116 pp.

Tai, S. and Goda, T. (1980) "Water quality assessment using the theory of entropy",
in: M.J. Stiff (ed.), River Pollution Control, Ellis Horwood Publishers, ch.21,
pp.319-330.
Tai, S. and Goda, T. (1985) "Entropy analysis of water and wastewater treatment
processes", International Journal of Environmental Studies, Gordon and Breach
Science Publishers, vol.25, pp.13-21.
Tas, F. (1993) Definition of Dynamic Efficiency in Wastewater Treatment Plants (in
Turkish), Dokuz Eylul University, Izmir, Graduation Project in Environmental
Engineering (dir. by N.Alpaslan).
Weber, B. and Juanico, M. (1990) "Variability of effluent quality in a multi-step
complex for wastewater treatment and storage", Water Research, vo1.24, no.6,
pp.765-771.
INFILLING MISSING MONTHLY STREAMFLOW DATA USING A
MULTIVARIATE APPROACH

C. GOODIER AND U. PANU


Department of Civil Engineering,
Lakehead University, Thunder Bay, Ontario, Canada, P7B-5El.

Water resources planners and managers use historic monthly streamflow data for a
variety of purposes. Often, the data set is not complete and gaps may exist due to various
reasons. This paper develops and tests two computer models for infilling the missing
values of a segment. The first model utilizes data only from the series with a segment
of missing values, whereas the second model utilizes data from the series with a segment
of missing values as well as from other concurrent series without a segment of missing
values. These models are respectively referred to as the Auto-Series (A~) model and the
Cross-Series (CS) model. Both models utilize the concepts of seasonal segmentation and
cluster analysis in estimation of the missing values of a segment in a set of monthly
streamflows. The models are evaluated based on comparison of percent differences
between the estimated and the observed values as well as on entropic measures.
Results indicate that the AS model provides adequate predictions for missing
values in the normal range of flows, but is less reliable during extreme (high) range of
flows. In contrast, the results from the CS model indicate that the use of another
concurrent set of streamflow data enhances predictions for all ranges of flows.

INTRODUCTION

In the past, various approaches have been used for infilling missing values in monthly
streamflow data [Panu (1992)], and among them, the most commonly used are the
regression approach and the multivariate approach. One multivariate approach
incorporating the concept of segmentation of data series into seasonal segments (or
groups) was suggested by Panu and Unny (1980) and later developed by Afza and
Panu (1991). This approach utilizing the characteristics of segmented data for infilling
missing values of a segment has a distinct advantage over the regression approach. The
latter approach treats each data point as an individual value, while the former approach
utilizes the group characteristics of similar data values. Based on such consideration of
values in groups, the missing values can be infilled as a whole group rather than as
individual values. The multivariate approach and the regression approach are
conceptually presented in Figure 1.

Figure 1. Data Infilling Approaches: (a) Multivariate Approach, in which the entire group of missing values is filled in one step, and (b) Regression Approach, in which each point is filled individually. (Both panels plot flow against time in months.)

One problem with the regression approach is the diverging confidence limits for
subsequent estimates as more reliance is given to the most recent estimate of a previously
unknown value. On the other hand, the multivariate approach has constant confidence
limits for the segment. The model development based on the multivariate approach
follows.

MODEL DEVELOPMENT

The development of the AS and the CS models is summarized in the form of a flow chart
in Figure 2.

Figure 2. Flow Chart for Model Development (the first step being the determination of seasonal segmentation).

The first step in developing the models is the determination of seasonal segments in the
data series. These seasonal segments, as described by Panu (1992), form the pattern
vectors. The correlogram and the periodogram are used to infer seasonality for both the
models. The AS model requires segmentation of the data series with missing values.
Whereas, the CS model requires segmentation of both, the concurrent data and the data
series with missing values.
The next step involves testing for multivariate normality of pattern vectors. In
order to test for multivariate normality, the ranked Mahalanobis Distance is plotted
against the theoretical χ² values corresponding to different probabilities. If the pattern
vectors exhibit signs of non-normality, transformations are applied to the vectors until
multivariate normality is achieved.
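As an illustration of this check, the short sketch below (in Python; the data layout, function name and plotting positions are assumptions for illustration, not the authors' code) computes the ranked squared Mahalanobis distances and the corresponding theoretical χ² quantiles:

import numpy as np
from scipy import stats

def mahalanobis_chi2_points(patterns):
    """patterns: (n_vectors, n_months) array of seasonal pattern vectors."""
    mean = patterns.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(patterns, rowvar=False))
    diff = patterns - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)   # squared distances
    n, p = patterns.shape
    probs = (np.arange(1, n + 1) - 0.5) / n              # plotting positions
    return np.sort(d2), stats.chi2.ppf(probs, df=p)      # empirical vs chi^2

Points close to the 1:1 line support multivariate normality; systematic curvature suggests a transformation, such as the natural log used later in this paper.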


Once the set of pattern vectors is determined to be multivariate normal, a K-
means algorithm [Hartigan and Wong (1979)] is applied to group the similar pattern
vectors into clusters. The purpose of clustering is to recognize the occurrence of pattern
vectors in the data set. In turn, this information is used to develop an inter-pattern
structure of the model based on the assumption that the dependence among patterns can
be described by lag-one Markovian structure.
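A minimal sketch of this clustering step is given below. Note that the paper cites the Hartigan-Wong K-means algorithm (AS 136), whereas scikit-learn's KMeans (Lloyd's algorithm) is substituted here purely for illustration, with synthetic data:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patterns = np.log(rng.lognormal(size=(18, 6)))   # e.g. 18 years of 6-month
                                                 # segments, log-transformed

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(patterns)
labels = km.labels_                              # cluster label of each segment

# Lag-one transition counts between successive segment labels give the
# assumed Markovian inter-pattern structure.
transitions = np.zeros((2, 2), dtype=int)
for a, b in zip(labels[:-1], labels[1:]):
    transitions[a, b] += 1
print(transitions)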
For the two models, the final step in the estimation of missing values is slightly
different as explained below.

AS Model: This model assumes lag-one Markovian structure for the inter-pattern
relationships and in turn estimates the missing values based on the pattern vector
occurring immediately prior to the gap [Figure 3].

Figure 3. Conceptual Seasonal Segmentation for AS Model: the sequence of segments S_1, S_2, ..., S_n, with the missing segment S_k infilled from the preceding segment S_{k-1}.

The missing segment in the sequence of streamflows is designated as S_k. The pattern vector in segment S_{k-1} is used to estimate the missing values in segment S_k. The inter-pattern structure takes into account the Markovian transitions from other segments, similar to S_{k-1} and S_k, which were identified by the clustering technique.
Johnson and Wichern (1988) suggest that the conditional mean and covariance of the segment S_k can be determined given that the segment S_{k-1} has occurred. In this paper, the conditional mean and covariance are considered sufficient statistics to describe the missing pattern vector for S_k, and their formulation is explained as follows.
Let S = [S_k | S_{k-1}]^T be distributed as multivariate normal, denoted N_d(μ, Σ), for d ≥ 2, where

μ = [μ_k | μ_{k-1}]^T   and   Σ = [ Σ_{k,k}  Σ_{k,k-1} ; Σ_{k-1,k}  Σ_{k-1,k-1} ]

The determinant of the partitioned section Σ_{k-1,k-1} must be greater than zero, i.e. |Σ_{k-1,k-1}| > 0. Given that the segment S_{k-1} has occurred, the conditional mean and covariance of the missing segment S_k are given by:

Mean of S_k = μ_k + Σ_{k,k-1} Σ_{k-1,k-1}^{-1} (S_{k-1} - μ_{k-1})   and

Covariance of S_k = Σ_{k,k} - Σ_{k,k-1} Σ_{k-1,k-1}^{-1} Σ_{k-1,k}

The mean of S_k, as given above, is considered an adequate vector to represent the missing segment S_k.
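The sketch below (a hypothetical illustration, not the authors' program) implements these two formulas directly; for the CS model, the same function applies with the base-river blocks substituted as described later:

import numpy as np

def conditional_mean_cov(mu_k, mu_km1, S_kk, S_kkm1, S_km1km1, s_km1):
    """Conditional mean and covariance of S_k given S_{k-1} = s_km1.
    S_kk, S_kkm1, S_km1km1 are the partitioned covariance blocks; the
    partition S_km1km1 must be nonsingular (|Σ_{k-1,k-1}| > 0)."""
    inv = np.linalg.inv(S_km1km1)
    mean = mu_k + S_kkm1 @ inv @ (s_km1 - mu_km1)
    cov = S_kk - S_kkm1 @ inv @ S_kkm1.T
    return mean, cov

In practice the joint mean and covariance would first be estimated from the clustered historical segments before calling such a function.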
The utility of this model is limited [Panu (1992)] due to its dependence solely on
the information contained in the data set with the missing values. The development of the
CS model overcomes this difficulty, as presented below, by using additional information
contained in the concurrent data set.

CS Model: This model assumes cross-correlation between the data set with missing
values and the concurrent data set. Both these data sets are respectively referred to as the
subject river and the base river. It is noted that the base river could be any other data set
(precipitation, streamflow, etc.), but it is simply referred to as the base river. The CS
model is conceptually presented in Figure 4.

Figure 4. Conceptual Seasonal Segmentation for CS Model: the missing segment S_k of the subject river is infilled from the concurrent segment Sb_k of the base river.

The missing values in segment S_k from the subject river are infilled based on the observed pattern vector in segment Sb_k from the base river. The cross-pattern relationships in the CS model take into account the transitions from segments with similar characteristics (identified by the clustering technique) from Sb_k to segment S_k. The conditional mean and covariance of S_k can be determined based on the considerations outlined above and the observed pattern vector in the segment Sb_k [Johnson and Wichern (1988)].
The formulation of the conditional mean and covariance for the CS model is similar to that of the AS model, with the following exceptions. The pattern vector for the segment Sb_k is substituted for S_{k-1}, and all subsequent occurrences of terms with subscripts (k-1) are substituted with the similar terms from the base river, i.e. substitute μb_k for μ_{k-1}, Σb_{k,k} for Σ_{k-1,k-1}, etc.


The computer models are developed with the capability of assuming the successive
segment to be missing. In turn, successive computer runs are conducted to infill each
such missing segment. An application of both the models to streamflow data is
presented below.

APPLICATION OF AS AND CS MODELS

The streamflow gauging station (05QA001) of English River at Sioux Lookout, Ontario,
was considered to be the site with missing values. An uninterrupted set of unregulated
streamflows is available for this site from 1922 to 1981. Another streamflow gauging
station (05QA002) of English River at Umfreville is located upstream of Sioux Lookout
station. Flow values for this station are available from 1922 to 1990. This station is used
as the base river in the CS model. Precipitation data at Sioux Lookout Airport is
available for the period of 1963 to 1990. The precipitation data is used as another base
river in the CS model. Since concurrent data from 1963 to 1981 is available for all three
data sources, this 18 years of data is used in the application.
For the application of the AS and the CS models to the monthly streamflow data
of the English River at Sioux Lookout, the seasonality was inferred, from the
correlogram and the periodogram, to be two six-month seasons or one twelve-month
season. The starting and ending months, respectively, were determined for the six-month
dry season to be November and April, and for the wet season to be May and October.
On one hand, the presence of a single twelve-month season was inferred for precipitation
data at Sioux Lookout Airport. On the other hand, the presence of two six-month seasons
or one twelve-month season was assessed for streamflows of the English River at
Umfreville. Experimental runs were conducted for both the models using two six-month
seasons or one twelve-month season.
The multivariate normality of the pattern vectors was best achieved by using
natural log transformation. Clustering of the segmented data was performed based on the
assumption that there were two clusters in each season. Results incorporating the
clustering technique showed only minor deviations from those obtained without using
sub-clusters. For brevity, only the results without using sub-clusters are presented in this
paper. Both the models are applied to infill a missing segment based on the assumption
that such a segment occurs sequentially over the entire length of the data series.

RESULTS AND DISCUSSION

Three methods of analysis were used to examine the results; graphical, statistical, and
entropic. As well, a comparison was made on the results obtained by infilling the missing
values: using the mean, minimum or maximum value for each month. Plots of the results
for both the models are presented in Figure 5.
Figure 5: Results from AS and CS models: (a) Auto-Series model, (b) Cross-Series model using precipitation, and (c) Cross-Series model using streamflow; each panel compares observed flow with infilled flow over the period Nov-63 to Oct-79.


Graphical Analysis: An examination of the results from the AS model indicates that the
estimated values closely follow the observed values for most years, but entail larger error
in case of extreme flows. This would be expected since the estimated values are based
only on the flow values that have occurred during other years in the data series. To
overcome this difficulty, one could use a multivariate random number generator to
estimate the missing values such that the conditional mean and covariance of the
estimated values are the same as those of the data series.
The estimated values from the CS model, using precipitation as the concurrent
data, show deviations from the observed flow in many cases. This could be due to the
effects of snowfall and spring runoff. During winter months, precipitation falls as snow, and does not have any effect on streamflows until spring-thaw. This lag period is
variable, and has an influence on the structure of covariance matrix in the CS model.
Future development of the model could include a procedure to account for the variability
in snowmelt phenomenon.
Further examination of the results from the CS model, using another streamflow
data as the concurrent data, indicates that the estimated values follow very closely the
observed values for the entire range of flows. This would be expected, since both sets
of data are in the same watershed, and the outcome of hydrologic events could be similar
at both gauging stations.

Table 1. Summary of Percent Error of Various Infilling Methods

  Infilling Method                  Overestimate Range   Underestimate Range
  --------------------------------------------------------------------------
  Auto-Series                        9.4% to  84.0%       1.4% to 52.5%
  Cross-Series with Precipitation   28.0% to 198%        16.2% to 58.9%
  Cross-Series with Streamflow       7.5% to  28.7%       6.3% to 28.8%
  Infilled by Mean                  24.6% to 152%        16.0% to 37.0%
  Infilled by Minimum                7.3% to 211%        54.3% to 81.5%
  Infilled by Maximum               53.4% to 467%         2.3% to 38.2%

Statistical Analysis: The percent differences (positive or negative) between the estimated
and the observed values are given in Table 1. These results were obtained separately for
two cases: with and without the use of sub-clusters in each season. The results with the
use of sub-clusters exhibit only minor deviations from the results obtained without using
sub-clusters. It is in this vein that only the results without using sub-clusters are
presented in this paper. Similar statistics are also included in the table for infilling
missing values by using the mean, minimum or maximum value for each month.

In Table 1, the least error occurs when a concurrent set of streamflow data is used in the
CS model. However, the error is much larger for this model when other concurrent data
such as precipitation is used. On the other hand, in such cases, use of the AS model
entails smaller error and would be the obvious choice for data infilling.

Table 2. Summary of Entropic Measures

  Entropic Measure              Entropy   Reduction from Hc   % Reduction
  ------------------------------------------------------------------------
  Hmax                          5.375     n/a                 n/a
  Hn=2                          0.693     n/a                 n/a
  Hclustered                    0.427     n/a                 n/a
  Hmarkov (AS)                  0.414     0.013                3%
  H(X|Y) (CS - precipitation)   0.392     0.035                8%
  H(X|Y) (CS - streamflow)      0.181     0.246               58%

Other currently popular methods of replacing the missing values by such ad hoc values
as mean, minimum, or maximum value entail too large an error [Table 1]. The results
of our analyses indicate that one should avoid [Panu and McLarty (1991)] the use of such
ad hoc methods of data infilling.
Both of the models are further assessed for their effectiveness in reducing the
uncertainty associated with infilled data values. Entropic measures, as explained in the
Appendix, are used in such an assessment.

Entropic Analysis: The results obtained for various entropic measures related to both
the models are summarized in Table 2. From the above results, it is apparent that the
maximum reduction in entropy (58%) occurs for the CS model, when using streamflows
of English River at Sioux Lookout, and at Umfreville. A large reduction in entropy also
results, when seasonal segments are grouped into sub-clusters. This reduction in
uncertainty is gained due to exclusion of certain clusters, once the occurrence of a
particular season is known.

CONCLUSIONS

The AS model is found satisfactory in estimation of the missing values in normal range
of streamflows but performs inadequately in case of extreme (high) flows. Statistical
results show that the error in the estimated values could range from -53% to +84%.
Entropic analysis indicates a small reduction (three percent) in entropy, when considering
the system as Markovian as opposed to random. In other words, the assumption for inter-
pattern structure to be of Markovian type does not appear valid for infilling purposes.
The CS model using precipitation as another concurrent data provides widely
varying estimates of missing values. In terms of percent error, the variability has been
found to range from -59% to +198%. Entropic analysis for such data sets indicates that
only a small reduction in entropy (eight percent) is achieved. Such a small reduction in
entropy is an indicator that there exists a poor cross-correlation between the streamflow
data and the precipitation data.
The CS model is found to perform adequately in estimation of the missing values
with the use of concurrent streamflow data from a nearby station. The estimates of
missing values have been found satisfactory in average flow range as well as extreme
(high) flow range. The percent error in the estimated values ranges from -29% to +29%.
Entropic analysis indicates that a reduction of 58% in entropy is achieved. In other
words, the use of concurrent data exhibiting high cross-correlation with streamflows data
having missing values, provides satisfactory estimates of the missing values.

ACKNOWLEDGEMENT

The financial support provided by the Natural Sciences and Engineering Research
Council of Canada is gratefully acknowledged in conducting various aspects of this
investigation. The computational efforts by C. Goodier are especially appreciated.

REFERENCES

Afza, N. and U.S. Panu (1991) Infilling of Missing Data Values in Monthly
Streamflows, An Unpublished Technical Report, Dept. of Civil Engineering, Lakehead
University, Thunder Bay, Ontario.
Domenico, P. (1972) Concepts and Models in Groundwater Hydrology,
McGraw-Hill, San Francisco, U.S.A.
Hartigan, J. and M. Wong (1979) "Algorithm AS 136: A K-Means Clustering
Algorithm", Applied Statistics, 28, 100-108.
Johnson, R.A. and D.W. Wichern (1988) Applied Multivariate Statistical Analysis,
Prentice Hall, New Jersey.
Khinchin, A.I. (1957) Mathematical Foundations of Information Theory, Dover
Publications Inc., New York.
Panu, U.S. and T.E. Unny (1980) "Stochastic Synthesis of Hydrologic Data Based on
Concepts of Pattern Recognition", Journal of Hydrology, 46, 5-34, 197-217, 219-237.
Panu, U.S. and B. McLarty (1991) Evaluations of Quick Data Infilling Methods in
Streamflows, An Unpublished Technical Report, Dept. of Civil Engineering, Lakehead
University, Thunder Bay, Ontario.
Panu, U.S. (1992) "Application of Some Entropic Measures in Hydrologic Data Filling
Procedures", Entropy and Energy Dissipation in Water Resources, Kluwer Academic
Publishers, Netherlands, 175-192.
Shannon, C.E. (1948) "The Mathematical Theory of Communication", Bell System
Technical Journal, 27, 379-428; 623-656.

APPENDIX: ENTROPIC MEASURES OF UNCERTAINTY IN HYDROLOGIC DATA

The entropy of a system is a measure of its degree of disorder. Shannon (1948) first
applied the concept of entropy to measure the information content of a system. Khinchin
(1957) reports on Shannon's entropy in dealing with a finite scheme, which is applicable
to a hydrologic data series. Entropy (H) as a measure of uncertainty of a system is defined
as follows:
H(p_1, p_2, ..., p_n) = - Σ_{k=1}^{n} p_k ln(p_k)

where n is the number of states and p_k is the probability of the kth state in a finite scheme.
The maximum entropy occurs when all outcomes are equally likely. For a series
of n equally likely events, the probability of each event is 1/n, and the maximum entropy of the system is obtained as follows:

H_max = ln(n)
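As a small numerical illustration of these definitions (a sketch, not part of the original study):

import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p_k ln(p_k), with 0*ln(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))   # 0.693..., matching the Hn=2 entry of Table 2
print(np.log(216))           # ln(216) = 5.375..., consistent with the Hmax of
                             # Table 2 if n is taken as the 216 monthly values
                             # (an assumption; the paper does not state n)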
While grouping the segmented data into clusters, an adjustment must be made to the
definition of entropy (H) to account for the clusters. The entropy for clustering, H_c, can be computed as given below:

H_c = - Σ_{k=1}^{w} p(s_k) Σ_{c=1}^{n_k} p(c_k) ln[p(c_k)]

where w is the total number of seasons per year, n_k is the total number of clusters in any season k, p(c_k) is the probability of cluster c in season k, and p(s_k) is the probability of season k.
The value of entropy for clustering does not take into account the ordering of
clusters. That is, the clusters could have occurred in any order, and still the value of entropy would be the same. To further look into the effect of ordering (i.e., the dependence among clusters), the entropy of a Markov chain as applicable to the AS model is
examined.

Entropy of a Markov Chain (AS Model): Domenico (1972) describes the entropy of
a Markov chain (H_m) as the average of all the individual entropies of the transitions (H_i), weighted in accordance with the probability of occurrence of the individual states. It is noted that there are as many states as there are clusters. The Markovian entropy can be expressed as follows:

H_m = Σ_{i=1}^{n} p_i H_i

where n is the number of states and p_i is the probability of occurrence of the state i.
A measure of reduction in uncertainty can be obtained by taking the difference
between H_c and H_m. In other words, the clustered data (i.e., the dependence among clusters) is treated as Markovian rather than random. For the CS model, the entropy of
a combined system consisting of two related systems must be used rather than the
Markovian entropy.

Entropy of a Combined System [CS Model]: Domenico (1972) suggests that entropic
measures can be applied to situations where two related systems are observed. The
applicability of such an entropic measure to the CS model is apparent. That is, two sets
of observed data are available for the CS model, namely, streamflow data with a missing
segment, and concurrent data with no missing segment.
The measure of entropy of one system (say, system X, the streamflow data with the missing segment), given the knowledge of the observations in the other system (say, system Y, the concurrent data without the missing segment), is obtained as follows:
H(X|Y) = - Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_i, y_j) ln[P(x_i | y_j)]

where P(x_i | y_j) is the conditional probability of system X being in state x_i, given that system Y is observed to be in state y_j, and P(x_i, y_j) is the joint probability of x_i and y_j.
The original measure of entropy of system X, H(X), can be computed from H_c for a clustered system. Thus, the measure of uncertainty reduced in X, after observing Y, is obtained as follows:

H(X~Y) = H_c - H(X|Y)

where H(X~Y) represents the decrease in uncertainty in X after observing Y.
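A short sketch of this combined-system computation, with a purely hypothetical 2x2 joint probability table, is given below:

import numpy as np

def conditional_entropy(joint):
    """H(X|Y) = -sum_ij P(x_i, y_j) * ln P(x_i | y_j)."""
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)                  # marginal of Y (column sums)
    cond = joint / p_y                       # P(x_i | y_j), column-wise
    mask = joint > 0
    return -np.sum(joint[mask] * np.log(cond[mask]))

# Hypothetical joint table: rows are cluster states of X (subject river),
# columns are cluster states of Y (base river).
joint = np.array([[0.40, 0.10],
                  [0.05, 0.45]])
p_x = joint.sum(axis=1)
H_x = -np.sum(p_x * np.log(p_x))             # entropy of X alone
H_xy = conditional_entropy(joint)
print(H_x, H_xy, H_x - H_xy)                 # decrease in uncertainty in X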
PART IV

NEURAL NETWORKS
APPLICATION OF NEURAL NETWORKS TO RUNOFF PREDICTION

MU-LAN ZHU¹, M. FUJITA¹ and N. HASHIMOTO²

¹Department of Civil Engineering
Hokkaido University
Sapporo, 060 Japan
²Hokkaido Development Bureau
Sapporo, 060 Japan

In this paper, a new method to forecast runoff using neural networks (NNs) is
proposed and compared with the fuzzy inference method suggested previously by the
authors (Fujita and Zhu, 1992). We first develop a NN for off-line runoff prediction.
The results predicted by the NN depend on the characteristics of training sets. Next,
we develop a NN for on-line runoff prediction. The applicability of the NN to runoff
prediction is assessed by making 1-hr, 2-hr and 3-hr lead-time forecasts of runoff
in Butternut Creek, NY. The results indicate that using neural networks to forecast
runoff is rather promising. Finally, we employ an interval runoff prediction model
where the upper and lower bounds are determined using two neural networks. The
observed hydrograph lies well between the NN forecasts of upper and lower bounds.

INTRODUCTION

Neural network architectures are models of neurons; they are not deterministic programs, but learn from examples. Through the learning of training sets which consist of
pairs of inputs and target outputs, neural networks iteratively adjust internal parameters
to the point where the networks can produce a meaningful answer in response to each
input. After the learning procedure is complete, information about relationship between
inputs and outputs, which may be non-linear and extremely complicated, is encoded
in the network.
As we know, the relationship between rainfall and runoff is non-linear and quite
complex due to many related factors such as field moisture capacity and evaporation
rate, etc. A mathematical definition of this kind of relationship is difficult, thus it
would be attractive to try the neural network approach which accommodates this kind
of problem.
The runoff prediction can be classified into several cases based on accessibility of
hydrological data. Table 1 shows the two cases considered in this paper, where
question mark "?" denotes the unknown future runoff that we are about to forecast.
If runoff information at every moment by the current time for the present flood is
available, the authors call it the on-line case. On the contrary, if this information is
not available for the present flood, the authors call it the off-line case.
To forecast runoff for the present flood, neural networks need first to learn about
previous flood events. The learning procedure is conducted using the back-propagation

TABLE 1. Classification of runoff prediction

                previous      present flood
  cases         flood data    past/present              future
  ------------------------------------------------------------------
  off-line      accessible    rainfall: accessible      inaccessible
                              runoff:   inaccessible    ?
  on-line       accessible    rainfall: accessible      inaccessible
                              runoff:   accessible      ?

learning algorithm involving a forward-propagation step followed by a backward-propagation step (Rumelhart et al. 1986). Figure 1 illustrates a fully interconnected
three-layer network. The details of forward and backward propagation steps are
described as follows:
Figure 1. A fully interconnected, three-layer neural network, with inputs x_1, ..., x_m entering the input layer and outputs o2_1, ..., o2_p leaving the output layer.

Forward-propagation step:

This step calculates the output from each processing unit of the neural network by
starting from the input layer and propagating forward through the hidden layer to the
output layer. Each processing unit except the units in the input layer takes a weighted
sum of its inputs and applies a sigmoid function to compute its output. Specifically,
given an input vector [x_1, x_2, ..., x_m], the outputs from each layer are calculated in this way:
First, the input layer is a special case. Each unit in this layer just sends the input
values as they are along all the output interconnections to the units in the next layer.
input layer: o_i = x_i,   i = 1, 2, ..., m   (1)

where o_i denotes the output of unit i in the input layer.
For the hidden and output layers, the output calculation of each processing unit is
identical: taking a weighted sum of inputs and using a sigmoid function f on the sum
to compute the output.
hidden layer: s1_j = Σ_{i=1}^{m} o_i·ω1_{ji} + θ1_j,   j = 1, 2, ..., n   (2)

              o1_j = f(s1_j)   (3)

output layer: s2_k = Σ_{j=1}^{n} o1_j·ω2_{kj} + θ2_k,   k = 1, 2, ..., p   (4)

              o2_k = f(s2_k)   (5)

where o1_j, o2_k are the outputs from unit j in the hidden layer and from unit k in the output layer respectively; ω1_{ji} is the interconnection weight between the i-th unit in the input layer and the j-th unit in the hidden layer, and ω2_{kj} is the interconnection weight between the j-th unit in the hidden layer and unit k in the output layer; θ1_j, θ2_k are the biases of unit j and unit k respectively. Function f, a sigmoid curve, is expressed by (6), where x is defined over (-∞, +∞) so that the function values lie in the interval (0, 1):

f(x) = 1 / (1 + e^{-x})   (6)

Backward-propagation step

The back-propagation step is an error-correction step which takes place after the
forward-propagation step is completed. The calculation begins at the output layer and
progresses backward through the hidden layer to the input layer. Specifically, the
output value from each processing unit in the output layer is compared to the target
output specified in the training set. Based on the difference between output and target
output, an error value is computed for each unit in the output layer, then the weights
are adjusted for all of the interconnections that go into the output layer. Next, an
error value is calculated for all of the units in the hidden layer and the weights are
adjusted for all of the interconnections that go into the hidden layer. The following
equations indicate this error-correction step explicitly.
For unit k in the output layer, its error value is computed as:
δ_k = (t_k - o2_k)·f′(s2_k)   (7)

where t_k = the target value of unit k in the output layer and f′(s2_k) = the derivative of the sigmoid function at s2_k.

The interconnection weights going into unit k in the output layer are then adjusted based on the δ_k value as follows:

ω2_{kj}(new) = ω2_{kj}(old) + η·δ_k·o1_j   (8)

where η is the learning rate. The bias θ2_k for unit k in the output layer is corrected as follows:

θ2_k(new) = θ2_k(old) + η·δ_k   (9)

For unit j in the hidden layer, its error value is computed as:

δ_j = [Σ_{k=1}^{p} δ_k·ω2_{kj}]·f′(s1_j)   (10)

The interconnection weight ω1_{ji}, which goes to unit j in the hidden layer from unit i in the input layer, is then corrected as follows:

ω1_{ji}(new) = ω1_{ji}(old) + η·δ_j·o_i   (11)

and the bias for unit j in the hidden layer is corrected as:

θ1_j(new) = θ1_j(old) + η·δ_j   (12)

During the learning procedure, the forward-propagation and back-propagation are


executed iteratively for the training set. The adjustments of the internal parameters of
NN including interconnection weights and biases are continued until the produced
outputs can hardly be improved further.
In our following calculations, the NN was initialized by assigning random numbers
from the interval (-1, +1) to all parameters.
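The following sketch collects (1)-(12) into a single training loop; the toy data, array shapes and random initialization in (-1, +1) are illustrative assumptions rather than the authors' implementation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
m, n, p, eta = 3, 5, 1, 0.5                    # unit numbers and learning rate
w1 = rng.uniform(-1, 1, (n, m)); b1 = rng.uniform(-1, 1, n)   # hidden layer
w2 = rng.uniform(-1, 1, (p, n)); b2 = rng.uniform(-1, 1, p)   # output layer

X = rng.uniform(0, 1, (50, m))                 # toy training inputs
T = rng.uniform(0.1, 0.9, (50, p))             # toy target outputs

for epoch in range(1000):
    for x, t in zip(X, T):
        s1 = w1 @ x + b1; o1 = sigmoid(s1)              # eqs (2), (3)
        s2 = w2 @ o1 + b2; o2 = sigmoid(s2)             # eqs (4), (5)
        d2 = (t - o2) * o2 * (1 - o2)                   # eq (7); f' = f(1 - f)
        d1 = (w2.T @ d2) * o1 * (1 - o1)                # eq (10)
        w2 += eta * np.outer(d2, o1); b2 += eta * d2    # eqs (8), (9)
        w1 += eta * np.outer(d1, x);  b1 += eta * d1    # eqs (11), (12)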

APPLICATIONS OF NEURAL NETWORKS

Off-line runoff prediction

In the case of off-line runoff prediction, the runoff data for the present flood is
inaccessible. This limitation determines that the NN should be developed to infer the
future runoff based on only rainfall data. The runoff system equation may be
expressed as:
Q(t) = f{R(t-l), R(t-l-1), ..., R(t-d)}   (13)

where R, Q, t denote rainfall, runoff and time respectively, and l, d are two parameters reflecting the hydrological characteristics of the basin.
As mentioned previously, when a NN is used to address a complex non-linear
problem, such as the relationship between rainfall and runoff, it must first learn from
a training set so that a desired output will be produced. The trained NN is generalized
to deal with inputs which may not be included in the training set. The validity of the
trained NN naturally depends on the characteristic of the selected training set. In order
to make accurate runoff forecasts, it is important to understand the dependence of the
NN on the training set. Since it is not easy to obtain various kinds of flood data from
observations to examine this dependence, we employed the storage function method
to simulate flood data arbitrarily. Equation (14) shows the basic equation of the
storage function method, where R, Q, L denote rainfall, runoff and lag-time respectively, and K, P denote the storage coefficient and storage exponent respectively:

K·d(Q(t)^P)/dt = R(t-L) - Q(t)   (14)


In the simulation, we set the parameters K=60, P=0.6, L=2, and the rainfall inputs to be several patterns of triangles shown in Figure 3-Figure 11. The simulated floods were then forecasted by applying a neural network developed corresponding to (13). It is easy to find that l in (13) is equivalent to L in (14), and d in (13) is related to the values of K and P. In this paper, the value of d was selected as 41 hr in accordance with the values of K and P previously stated. The neural network developed is shown in Figure 2. It should be pointed out that the value of d = 41 hr is not critical: slight changes in the d value hardly affect the NN's performance.
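A sketch of how such a flood can be simulated from (14) with simple Euler integration is given below; the triangular hyetograph values and the time step are illustrative, while K, P and L follow the text:

import numpy as np

K, P, L, dt = 60.0, 0.6, 2, 1.0                  # storage params; 1-hr steps
hours = np.arange(60)
Tr, rp, Tp = 15.0, 10.0, 8.0                     # duration, peak, time to peak
R = np.where(hours <= Tp, rp * hours / Tp,
             np.clip(rp * (Tr - hours) / (Tr - Tp), 0.0, None))
R[hours > Tr] = 0.0                              # no rain after the triangle

Q = np.zeros(len(hours))
S = 0.0                                          # storage, S = K * Q**P
for t in range(1, len(hours)):
    r_lag = R[t - L] if t >= L else 0.0          # lagged rainfall R(t - L)
    S = max(S + dt * (r_lag - Q[t - 1]), 0.0)    # dS/dt = R(t-L) - Q(t)
    Q[t] = (S / K) ** (1.0 / P)                  # invert S = K * Q**P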
Figure 2. The NN developed corresponding to (13), with inputs R(t-2), R(t-3), R(t-4), ..., R(t-41).

First we trained the NN using the two simulated floods shown in Figure 3; the learning rate was η = 0.5. The solid lines in Figure 3 indicate the two simulation floods used as training data and the black circles indicate the computed output from the trained NN for these two floods. The results show that the NN learned the training data extremely well. In
order to examine the degree to which the NN can generalize its training to forecast
floods not included in the training data, we used the trained NN to forecast various
types of floods. Figures 4, 5 show two of them, where the solid lines denote the Q(t)
obtained from (14) and the black squares denote the computed outputs from the NN.
The results shown in Figure 4 indicate that the trained NN may yield good results for
the validation data whose time to peak rainfall intensity lies between the ones of
training data and duration of rainfall is equal to the training ones. However, as shown
in Figure 5 the trained NN works poorly for the other validation data whose duration
of rainfall is larger than the training data.
Furthermore, we trained the above NN by presenting another training set which
consisted of the four typical simulated floods shown in Figure 6. Five flood events
as shown in Figure 7-Figure 11 were provided as validation data, where the two
floods shown in Figures 7, 8 are just the same as previous validation data shown in
Figures 4, 5. The forecast results shown in Figure 7-Figure 11 indicate that the NN
works well for these five floods.

Figure 3. The two floods as training data. Figure 4. The first flood for validation. Figure 5. The second flood for validation. Figure 6. The four floods as training data. Figure 7. The first flood for validation. Figure 8. The second flood for validation. Figure 9. The third flood for validation. Figure 10. The fourth flood for validation. Figure 11. The fifth flood for validation. (Each figure plots a triangular rainfall input, characterized by duration Tr, peak intensity rp and time to peak Tp, e.g. Tr=15 hr, rp=10 mm/hr, Tp=8 hr, together with the corresponding simulated and NN-computed hydrographs over 0-50 hr.)

Although the rainfall input profiles in the validation
data were completely different from those used in the training data, their durations of
rainfall Tr's, their peak rainfall intensities rp's as well as their time to peak rainfall
intensity Tp's could be seen to fall within the ranges of rainfall inputs used in the
training data. This characteristic is here defined as interpolation, while the contrary
is defined as extrapolation. By comparing Figure 5 with Figure 8, we can see that the
performance of NN depends on the training data used. Specifically, the performance of NN depends on whether the rainfall input of the forecasted flood event is an interpolation or an extrapolation of the training set.
In conclusion, when a NN is developed to forecast runoff off-line, its performance
depends closely on the training set. It is hard to generalize a trained NN to forecast
floods, unless the NN is trained using data representative of the expected flood
hydrographs. Besides, when a NN is applied to an actual basin, introducing some
suitable parameters reflecting the basin initial condition into the NN is necessary.
Otherwise, the NN can't achieve a convergence state in most cases when it tries to
learn various observed floods having different basin initial conditions. This is unlike
the simulation study here where the basin initial conditions were all the same and
may simply be ignored. The applications of NNs to forecast runoff off-line in actual
basins are being further studied.
In the following section, we introduce another method to forecast runoff on-line.
For this method, the utilization of current accessible runoff data provides an effect to
correct runoff forecasting at every moment through inputting the new runoff data. The
method shows applicable to actual basins.

On-line runoff prediction

In the case of on-line runoff prediction, current runoff data is accessible for the
present flood. Therefore, we may express the runoff system equation as:
ΔQ(t) = f{R(t-1), ..., R(t-m), ΔQ(t-1), ..., ΔQ(t-n)}   (15)

where ΔQ(t) = Q(t) - Q(t-1); parameters m, n can be properly chosen by taking account


of the hydrological characteristics of the basin and the forecast lead-time. In this
paper, we have made 1-hr, 2-hr and 3-hr lead-time forecasts of the runoff in
Butternut Creek, New York. The simplified runoff equation at this basin was adopted
as:
ΔQ(t) = f{R(t-3), R(t-4), ΔQ(t-1)}   (16)

The NN developed in response to the above runoff system equation is shown in


Figure 12. It should be noted that the structure of this NN is partly interconnected
since the natures of rainfall inputs and runoff inputs are distinct. Since the output
from the network is the increment of runoff which may take positive or negative
values, the sigmoid function described in the previous section was redefined by (17),
where the range of x is also defined from -∞ to +∞ but the function values lie in the interval (-1, 1):

f(x) = 2 / (1 + e^{-x}) - 1   (17)
We have five flood events in Butternut Creek. The first two floods shown in
Figure 13 were chosen as training data and the last three floods as validation data,
where the flood scales of validation data are almost the same level as the ones of
training data.

Figure 12. The NN developed corresponding to (16), with inputs R(t-3), R(t-4) and ΔQ(t-1).

We have concluded in subsection (1) that NNs may work well for interpolation problems but poorly for extrapolation problems, and this is the point for choosing a suitable training set. The internal parameters of the NN were tuned in the training procedure and remained fixed to make the 1-hr, 2-hr and 3-hr lead-time forecasts for the validation data. The learning rate η was set to 0.15 in the training procedure.
The authors have also studied forecasting floods in an adaptive forecasting
environment where the internal parameters of the NN are updated at every moment
when new flood information is received (Zhu and Fujita, 1993). However, there is
a practical difficulty with this method because the computation time increases
substantially to forecast runoff with desired accuracy.
The prediction algorithm can be expressed as follows:
ΔQ′(t+1) = f{R(t-2), R(t-3), ΔQ(t)}   (18)

ΔQ′(t+2) = f{R(t-1), R(t-2), ΔQ′(t+1)}   (19)

ΔQ′(t+3) = f{R(t), R(t-1), ΔQ′(t+2)}   (20)
where the primes (′) are added to forecasted values to distinguish them from observed values.

Figure 13. The two floods (Oct. 17, 1975 and Oct. 7, 1976) as training data.

The 1-hr, 2-hr and 3-hr lead-time forecast results for the flood of Oct. 20, 1976, as examples, are shown in Figures 14, 15, 16 respectively. From these figures, we can see that the prediction error gradually grows from the 1-hr to the 3-hr lead-time forecasts. The forecast results
for the validation data are evaluated based on the relative error of peak flow (Qp′ - Qp)/Qp, the time difference of peak flow (tp′ - tp), and the variance Σ(Q′ - Q)²/n (where Qp′, tp′, Q′ are the forecasted peak flow, the time to forecasted peak flow and the forecasted flow respectively; Qp, tp, Q are the observed ones; n is the number of samples of the forecasted flood). The evaluation results are shown in Table 2.
Figures 14, 15, 16 and Table 2 indicate that the prediction accuracy is good even
though the developed NN is extremely simple. However, overestimation around peak
flow is seen in the forecast results. This is because the NN has an effect of inertia: it is unlikely to produce a negative output of ΔQ′(t+1) with enough magnitude under a positive input of ΔQ(t) when flow turns from an increasing to a
decreasing stage. Furthermore, a larger positive prediction error may be caused in the
further lead-time forecasts since the predicted values are used as indicated in the
prediction algorithm in (19), (20).
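The recursive structure of (18)-(20), in which each forecast feeds the next, can be written compactly as in the sketch below; nn_predict is a hypothetical placeholder for the trained network of Figure 12:

def forecast_3hr(nn_predict, R, dQ_t, t):
    """R: hourly rainfall series; dQ_t: observed delta-Q at current time t.
    nn_predict(r1, r2, dq) stands for the trained network of Figure 12."""
    dq1 = nn_predict(R[t - 2], R[t - 3], dQ_t)    # eq (18), 1-hr lead time
    dq2 = nn_predict(R[t - 1], R[t - 2], dq1)     # eq (19), 2-hr lead time
    dq3 = nn_predict(R[t],     R[t - 1], dq2)     # eq (20), 3-hr lead time
    return dq1, dq2, dq3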

Figure 14. 1-hr lead-time forecast results. Figure 15. 2-hr lead-time forecast results. Figure 16. 3-hr lead-time forecast results.

TABLE 2. Evaluations for the results forecasted by neural network method

  flood             forecast                  relative error   time difference   variance
                                              of peak flow     to peak flow
  ----------------------------------------------------------------------------------------
  the flood in      1-hr lead-time forecast   0.0109            1 hr             1.025E-03
  Oct. 20, 1976     2-hr lead-time forecast   0.0529            0 hr             6.177E-03
                    3-hr lead-time forecast   0.0808            1 hr             1.765E-02
  the flood in      1-hr lead-time forecast   0.0138            0 hr             4.836E-04
  Sept. 26, 1977    2-hr lead-time forecast   0.0546            1 hr             2.412E-03
                    3-hr lead-time forecast   0.0879            1 hr             6.441E-03
  the flood in      1-hr lead-time forecast   0.0456           -1 hr             3.512E-03
  Oct. 16, 1977     2-hr lead-time forecast   0.1012            0 hr             1.948E-02
                    3-hr lead-time forecast   0.1514            1 hr             5.44E-02

The authors have previously studied the forecasting of runoff using the fuzzy
inference method. The method first establishes the fuzzy relation between rainfall and
runoff based on previous flood data; then it forecasts the future runoff for the present
flood through making a fuzzy reasoning based on the above obtained fuzzy relation.
The method was applied to forecast the same flood events in Butternut Creek. Figures
17, 18, 19, as examples, show the 1-hr, 2-hr and 3-hr lead-time forecast results for
the flood in Oct. 20, 1976 respectively. Table 3 presents the evaluation results for
these forecasts based on the same criteria as shown in Table 2.

~~"r-~~~~O

51
-e-
Lead llme=2hr

10

Figure 17. 1-hr lead- Figure 18. 2-hr lead-time Figure 19. 3-hr lead-
time forecast results. forecast results. time forecast results.

TABLE 3. Evaluations for the results forecasted by fuzzy inference method

  flood             forecast                  relative error   time difference   variance
                                              of peak flow     to peak flow
  ----------------------------------------------------------------------------------------
  the flood in      1-hr lead-time forecast   0.0297            1 hr             1.763E-03
  Oct. 20, 1976     2-hr lead-time forecast   0.0861            0 hr             1.08E-02
                    3-hr lead-time forecast   0.1617            0 hr             3.258E-02
  the flood in      1-hr lead-time forecast   0.047             0 hr             8.2E-04
  Sept. 26, 1977    2-hr lead-time forecast   0.1171            1 hr             5.757E-03
                    3-hr lead-time forecast   0.2201            0 hr             1.904E-02
  the flood in      1-hr lead-time forecast   0.0511           -1 hr             3.488E-03
  Oct. 16, 1977     2-hr lead-time forecast   0.1198            0 hr             1.913E-02
                    3-hr lead-time forecast   0.1909            1 hr             5.5E-02

Comparing Table 2 with Table 3 and Figures 14, 15, 16 with Figures 17, 18, 19,
we can see that the prediction accuracy of the NN method shows slightly better
performance than that of the fuzzy inference method. Besides, the calculation time of
the NN method for forecasting runoff is extremely short, since the time-consuming
job of tuning the internal parameters of the NN can be executed in advance. However,
when we try to make a long lead-time forecast of runoff, predicted information for
rainfall is needed. At the present level of technology, this information is provided
by weather radar described qualitatively as weak, medium, or strong intensity. To


utilize this kind of qualitative rainfall prediction information for making runoff
prediction, the fuzzy inference method appears quite useful. A successful attempt to
utilize the qualitative rainfall prediction information to make a long lead-time
forecast of runoff was made by the authors recently (Zhu and Fujita, 1994). The aim
of our next study is to develop a new method that combines the advantages of the
NN and fuzzy inference methods.

Interval prediction

In principle, the more training sets a NN learns from, the more accurate output it will
yield. These training sets should consist of various types of floods. On the other hand,
such various types of floods as training sets make it difficult for a NN to converge
to a desired state. To avoid this difficulty, we may employ a modified learning
algorithm proposed by Ishibuchi which was originally developed for determining the
upper and lower bounds of a nonlinear interval model (Ishibuchi and Tanaka, 1991).
The modified learning algorithm is the same as the back-propagation learning
algorithm, except that it introduces a coefficient C into (7):

δ_k = C·(t_k - o2_k)·f′(s2_k)   (21)
First, in the learning procedure, if the output value o2_k from the NN is greater than or equal to the target value t_k, let C take a very small positive value α, so that the internal parameters of the NN, such as the interconnection weights and the biases, whose corrections are based on the magnitude of the error value δ_k, will be changed only slightly. On the other hand, if o2_k is less than t_k, let C take 1, so that the internal parameters will be corrected substantially. As a result, through presenting the training set to the NN iteratively, the outputs from the NN will finally be expected to be all larger than or equal to the target values. The NN trained in this way is defined as NN*.
Second, let C take its value just contrary to the above case. Thus, after the learning is completed, the outputs from the NN will be expected to be all smaller than or equal to the target values. The NN trained in this way is defined as NN_*.
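A sketch of the modified error value (21) for the two networks is given below; the function and parameter names are illustrative:

def delta_output(t_k, o2_k, f_prime_s2_k, alpha=0.01, upper=True):
    """Modified error value of eq (21); alpha is the small positive constant."""
    if upper:   # NN*: corrections shrink once the output exceeds the target
        C = alpha if o2_k >= t_k else 1.0
    else:       # NN_*: corrections shrink once the output is below the target
        C = alpha if o2_k <= t_k else 1.0
    return C * (t_k - o2_k) * f_prime_s2_k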
We consider that this modified learning algorithm may be applied to make an interval prediction of runoff where the lower and upper bounds of the interval are provided by NN_* and NN* respectively. Especially when a NN fails to achieve a convergence state in learning various types of floods simultaneously, applications of NN_* and NN* may be meaningful, since NN_* and NN* may automatically focus on small runoff cases and large runoff cases respectively.
Again, we chose the Butternut Creek for calculation, and adopted the same runoff
system equation and the same structure of neural network as stated in subsection (2).
Using the above modified learning algorithm, we obtained two different neural
networks NN_* and NN* through learning the same training set which consists of the first two floods in this basin shown in Figure 13. In the calculation, α was set at 0.01. The trained NN_* and NN* were then used to carry out the forecast task for the
last three floods. The results of the 1-hr lead-time forecast for the flood of Oct. 20, 1976 are shown in Figure 20, where the solid lines represent observations, the dashed lines represent the predictions provided by NN_* and the skip-dashed lines represent the results predicted by NN*.

Figure 20. The result of interval prediction for the flood in Oct. 20, 1976.

Figure 20 shows that the observed hydrograph falls within the predicted upper and lower bounds, and the other two results show the same condition. It should be pointed out that the aim of the above application in Butternut Creek was to examine the validity of the modified learning algorithm for making interval predictions. However, the floods in this basin can be forecasted well employing a single NN as described in subsection (2).

CONCLUSIONS AND REMARKS

Several neural networks have been developed to forecast runoff in three manners: an off-line prediction, an on-line prediction and an interval prediction. The dependence
of the NNs' performance on the training set for the off-line prediction was discussed
and this work helped us understand how to construct an adequate training set.
However, for the off-line prediction, further study about the application to actual
basins is still needed. On the other hand, the method for the on-line prediction
appeared very applicable. The interval runoff prediction model, which consisted of two
neural networks, provided a way to estimate the upper and lower bounds of a flood.

REFERENCES

Fujita, M. and Zhu, M.-L. (1992) "An Application of Fuzzy Theory to Runoff
Prediction", Procs. of the Sixth IAHR International Symposium on Stochastic
Hydraulics, 727-734.
Rumelhart, McClelland and the PDP Research Group (1986) Parallel Distributed
Processing, MIT Press, Cambridge, Vol.1, 318-362.
Zhu, M.-L. and Fujita, M. (1993) "A Comparison of Fuzzy Inference Method and Neural Network Method for Runoff Prediction", Proceedings of Hydraulic Engineering, JSCE, Vol.37, 75-80.
Zhu, M.-L. and Fujita, M. (1994) "Long Lead Time Forecast of Runoff Using Fuzzy Reasoning Method", Journal of Japan Society of Hydrology & Water Resources, Vol.7, No.2.
Ishibuchi, H. and Tanaka, H. (1991) "Determination of Fuzzy Regression Model by Neural Networks", Fuzzy Engineering toward Human Friendly Systems, Procs. of the International Fuzzy Engineering Symposium '91, Vol.1, 523-534.
PREDICTION OF DAILY WATER DEMANDS
BY NEURAL NETWORKS

S.P. ZHANG, H. WATANABE and R. YAMADA


Nihon Suido Consultants, Co. Ltd.
Okubo 2-2-6, Shinjuku-ku, Tokyo, 169,
JAPAN

In this paper a new approach based on the artificial neural network model is pro-
posed for the prediction of daily water demands. The approach is compared with the
conventional ones and the results show that the new approach is more reliable and
more effective. The fluctuation analysis of daily water demands and the sensitivity
analysis of exogenous variables have also been carried out by taking advantage of the ability of neural network models to handle non-linear problems.

INTRODUCTION

An accurate short-term prediction (such as a prediction with a lead-time of one day)


of daily water demands is required for the optimal operation of city water supply
networks. Many studies dealing with prediction methods have been reported, such as the multiple regression model (e.g., Tsunoi, 1985), the ARIMA model
(Koizumi et al., 1986), and the model based on the Kalman Filtering Theory (Jinno
et al., 1986). This research effort has been made because of not only the great number of factors which make daily water demand fluctuate very widely, but also the complexity of the relations between daily water demand and exogenous variables (such as temperature, weather, etc.) and the stochastic properties of exogenous variables. In essence, the majority of the short-term water demand prediction models
published have treated daily water demands as a stochastic time series, and described
the relation between daily water demand and exogenous variables by use of a linear
expression. However, a lot of researches have shown that the relations usually are
nonlinear, then these models will no longer be adequate.
In this study we shall predict the daily water demands by use of an artificial neural
network model, which is expected to be able to handle the non-linear relations.
To verify the efficiency of the neural network model, it will be compared with the
conventional models. The fluctuation analysis of daily water demands and the sensi-
tivity analysis of exogenous variables will also be carried out by taking advantage of the ability of neural network models to handle non-linear problems.

NEURAL NETWORK MODEL FOR PREDICTION


OF DAILY WATER DEMANDS

Introduction to neural network model

A neural network is a network system constructed artificially by idealizing the neurons


(nerve cells), and consists of a number of nodes and lines that are called respectively
units and connections (or links). By network structures, neural networks generally
are classified into two types: layered network and interconnected network. It has been
shown that a layered network is suitable to prediction problems due to its abilities of
learning (self-organization) and parallel processing.
Figure 1 shows the structure of a layered neural network, which has a layer of
input units at the top, a layer of output units at the bottom, and any number of
hidden layers between the input layer and the output layer. Connections exist only
between two layers next to each other, and connections within a layer or from higher
to lower layers are forbidden.

Figure 1. Structure of a layered neural network. (Each unit receives inputs x_1, x_2, x_3 through weighted connections, forms the unit input X = w_1·x_1 + w_2·x_2 + w_3·x_3 + θ, and produces the unit output Y = f(X) = 1/[1 + exp(-X)], where w denotes a connection weight and θ a threshold value. The layers are the input layer (units i), hidden layer (units j) and output layer (units k); the network output is compared with the teach signal, and the difference drives learning.)

A neural network can be modeled as follows. For convenience' sake, we consider


a neural network consisting of three layers.
Let the unit numbers of input, hidden and output layers be N, M and 1, respec-
tively. When an input {I_i, i = 1, 2, ..., N} is given to the units of the input layer, the inputs and outputs of the hidden layer and output layer units are represented by

Y_j = f(X_j),   j = 1, 2, ..., M   (1)

X_j = Σ_{i=1}^{N} w_{ij}·I_i + θ_j,   j = 1, 2, ..., M   (2)

O = f(Z)   (3)

Z = Σ_{j=1}^{M} W_j·Y_j + Θ   (4)

where Y_j    output from unit j of the hidden layer;
      X_j    input to unit j of the hidden layer;
      f(·)   unit output function;
      w_{ij} connection weight of input layer unit i and hidden layer unit j;
      θ_j    threshold value of hidden layer unit j;
      O      output of the output layer unit;
      Z      input to the output layer unit;
      W_j    connection weight of hidden layer unit j and the output layer unit;
      Θ      threshold value of the output layer unit.
For the unit output function f(·) some expressions have been proposed. The Sigmoid function is the one used most widely:

f(X) = 1 / (1 + e^{-X})   (5)


Theoretically, the neural network model expressed by (1)-(5) approximates any
non-linear relations between inputs and outputs with any degree of accuracy by using
enough hidden layer units and setting connection weights as well as threshold values
to be appropriate (Asou, 1988).

Inputs to the neural network model

The inputs to the neural network model for the daily water demand prediction prob-
lem are the exogenous variables which cause the fluctuation of daily water demands.
According to Zhang, et al. (1992) and Yamada et al. (1992) the main factors are the
following five exogenous variables:
1) last day's delivery
2) daily high temperature
3) weather
4) precipitation
5) day's characteristics.
The forecasts of daily high temperature, weather and precipitation are available
from the weather report presented in the morning or in the last evening by the
meteorological observatory. For the model user's convenience and considering the
accuracy of weather report, we treated weather and precipitation as discrete variables,
i.e., weather is classified as sunny, cloudy and rainy, and precipitation is classified into three ranks, i.e., [0 mm/day, 1 mm/day), [1 mm/day, 5 mm/day) and [5 mm/day, ∞). For the day's characteristics, weekday and Sunday (including National Holiday) are distinguished.
The variable values as inputs to the neural network model are defined as follows.
1). The last day's delivery is transformed into a variable which belongs to (0,1) by
I_1 = 1 / (1 + e^{-α(Q-Q̄)})   (6)

where I_1 is the transformed last day's delivery, Q the real last day's delivery, Q̄ the mean of delivery records, and α a parameter for guaranteeing that the transformed daily delivery record series {I_1^(t), t = 1, 2, ..., T} is a uniform distribution, which is very important to improve the accuracy of prediction.
2). The daily high temperature is also transformed in the same way to the last
day's delivery, that is,
I_2 = 1 / (1 + e^{-β(T-T̄)})   (7)

where I_2 is the transformed high temperature, T the real daily high temperature, T̄ the mean of daily high temperature records, and β a parameter for guaranteeing that the transformed daily high temperature record series {I_2^(t), t = 1, 2, ..., T} is a uniform distribution.
3). Weather is quantified as follows.

I_3 = 0.0 for sunny,  0.5 for cloudy,  1.0 for rainy   (8)

where I_3 is the quantified variable corresponding to weather.
For most of statistical models with a linear structure, it is very important how to
quantify a non-quantitative discrete variable. However, due to the ability of the neural network model to handle non-linear problems through the learning (self-organization) process, the quantifying procedure here becomes very simple.
4). Similarly to weather, precipitation is quantified as follows.

I_4 = 0.0 for R ∈ [0 mm/day, 1 mm/day),  0.5 for R ∈ [1 mm/day, 5 mm/day),  1.0 for R ∈ [5 mm/day, ∞)   (9)

where I_4 is the quantified variable corresponding to the daily precipitation R.

5). Finally, the day's characteristics are quantified as

$$ I_5 = \begin{cases} 0.0, & \text{for Sunday and National Holiday} \\ 1.0, & \text{for Weekday} \end{cases} \qquad (10) $$

where I_5 is the quantified variable corresponding to the day's characteristics.
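The five transformations (6)-(10) can be collected into one routine, sketched below in Python; the values of α and β shown are placeholders, since in practice they are tuned so that the transformed record series is approximately uniform.

import numpy as np

def transform_inputs(q_last, q_mean, alpha, t_high, t_mean, beta,
                     weather, rain_mm, is_weekday):
    # Map the raw exogenous variables to the model inputs I_1, ..., I_5
    i1 = 1.0 / (1.0 + np.exp(-alpha * (q_last - q_mean)))       # (6)
    i2 = 1.0 / (1.0 + np.exp(-beta * (t_high - t_mean)))        # (7)
    i3 = {"sunny": 0.0, "cloudy": 0.5, "rainy": 1.0}[weather]   # (8)
    if rain_mm < 1.0:                                           # (9)
        i4 = 0.0
    elif rain_mm < 5.0:
        i4 = 0.5
    else:
        i4 = 1.0
    i5 = 1.0 if is_weekday else 0.0                             # (10)
    return np.array([i1, i2, i3, i4, i5])

# Example: a cloudy weekday with 2 mm/day of rain (alpha, beta assumed)
x = transform_inputs(q_last=380.0, q_mean=372.0, alpha=0.05,
                     t_high=27.0, t_mean=20.0, beta=0.2,
                     weather="cloudy", rain_mm=2.0, is_weekday=True)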

Composition of the neural network model

Identifying the composition of the neural network model means deciding the number of layers in the network and the number of units in each layer. In this study, the input layer has five units corresponding to the five exogenous variables, and the output layer has a single unit corresponding to the predicted daily water demand.
The number of layers has been set to 3, and the number of units in the single hidden layer has been set to 17 by the following procedure, which is based on the philosophy that a simpler structure is a better structure (Zhang et al., 1992); a sketch of this loop follows the steps.
Step 1: Set the number of hidden units j = 1.
Step 2: Train the neural network with the learning procedure (see the next section) until the difference between the outputs of successive iterations is within a specified error.
Step 3: Calculate the mean relative error of the outputs.
Step 4: Set j = j + 1 and repeat the above steps until the mean relative error is less than a specified expectation of the prediction relative error.
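In outline, this selection loop can be written as below; train_network and mean_relative_error are hypothetical stand-ins for the learning procedure of the next section and for the error calculation of Step 3.

def select_hidden_units(train_network, mean_relative_error,
                        target_mre, max_units=50):
    j = 1                                            # Step 1
    while j <= max_units:
        model = train_network(j)                     # Step 2
        if mean_relative_error(model) < target_mre:  # Step 3
            return j, model                          # simplest adequate structure
        j += 1                                       # Step 4
    raise RuntimeError("target relative error not reached")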

Learning process of the neural network model

As mentioned in the introduction to the neural network model, in order to obtain an accurate prediction of daily water demand it is necessary to set the connection weights of the units and the threshold values of each unit appropriately. In the case of neural network models, these parameters are identified through "learning". The term learning here means the self-organization process through which the neural network model automatically adjusts these parameters to appropriate values when a series of samples of input-output data (called teacher data or teacher signals) is shown to the model. If we consider the information processing in neural network models as a transformation of input data to output data, then model learning can be considered to be the process through which the neural network model gradually becomes capable of imitating the transformation patterns represented by the teacher data.
Many learning algorithms have been proposed. In this study we use the Error Back Propagation algorithm, the most popular one, which can be summarized as follows (Rumelhart et al., 1986).
Suppose T sets of teacher data

$$ \{ I_1^{(t)}, \ldots, I_5^{(t)}, O^{(t)} \}, \qquad t = 1, 2, \ldots, T, $$

are given. Consider initial values for the connection weights and threshold values. Then the outputs corresponding to the inputs of the teacher data {I_1^(t), ..., I_5^(t)}, t = 1, 2, ..., T, can be obtained from (1) to (5). Denote these outputs by {U^(t), t = 1, 2, ..., T}. It is easy to understand that {U^(t), t = 1, 2, ..., T} differ from the outputs of the teacher data {O^(t), t = 1, 2, ..., T}, and an error function can be defined as follows:
$$ R(\theta) = \sum_{t=1}^{T} \left( O^{(t)} - U^{(t)} \right)^2 \qquad (11) $$

It is clear that R(θ) is a function of the connection weights and threshold values. The Error Back Propagation algorithm seeks the connection weights and threshold values that minimize the error function R(θ); a nonlinear programming method together with an iteration process is applied to solve the optimization problem and obtain the optimal (sometimes suboptimal) connection weights and threshold values. In this study the steepest descent method is used. The final iteration (learning) procedures are as follows:
$$ w_j^{(k+1)} = w_j^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)} y_j^{(t)} \qquad (12) $$

$$ \theta^{(k+1)} = \theta^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)} \qquad (13) $$

$$ w_{ij}^{(k+1)} = w_{ij}^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)} w_j^{(k+1)} \gamma_j^{(t)} I_i^{(t)} \qquad (14) $$

$$ \theta_j^{(k+1)} = \theta_j^{(k)} - \eta \sum_{t=1}^{T} \delta^{(t)} w_j^{(k+1)} \gamma_j^{(t)} \qquad (15) $$

$$ \delta^{(t)} = \left( U^{(t)} - O^{(t)} \right) U^{(t)} \left( 1 - U^{(t)} \right) \qquad (16) $$

$$ \gamma_j^{(t)} = y_j^{(t)} \left( 1 - y_j^{(t)} \right) \qquad (17) $$
where the superscript (k) indicates the learning (iteration) number, and η is a small positive number indicating the step size of the steepest descent method.
In order to avoid the overfitting problem, the learning process is terminated when the mean relative error of the outputs is less than a specified expectation of the prediction relative error. In this study we set η = 0.25 and the expectation of the prediction relative error to 2.0%. For the initial values of the connection weights and threshold values, random numbers within the interval [-0.5, 0.5] generated by the Monte Carlo method are used. The learning of the neural network model was carried out using the weather and daily delivery records from April 1982 to March 1990 as teacher data.
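Read as a batch update over the T teacher patterns, one iteration of (12)-(17) can be sketched in Python as follows. The vectorized form and array shapes are our own; only the step size η = 0.25 and the update order follow the text.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_iteration(I, O, w_ih, th_h, w_ho, th_o, eta=0.25):
    # I: (T, N) teacher inputs; O: (T,) teacher outputs
    Y = sigmoid(I @ w_ih + th_h)           # hidden outputs y_j^(t), (T, M)
    U = sigmoid(Y @ w_ho + th_o)           # network outputs U^(t), (T,)
    delta = (U - O) * U * (1.0 - U)        # (16)
    gamma = Y * (1.0 - Y)                  # (17)

    w_ho_new = w_ho - eta * (delta @ Y)    # (12)
    th_o_new = th_o - eta * delta.sum()    # (13)
    # (14) and (15) use the already-updated hidden-to-output weights
    d = delta[:, None] * w_ho_new[None, :] * gamma   # (T, M)
    w_ih_new = w_ih - eta * (I.T @ d)      # (14)
    th_h_new = th_h - eta * d.sum(axis=0)  # (15)
    return w_ih_new, th_h_new, w_ho_new, th_o_new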

PREDICTION RESULT ANALYSIS

With the learned (identified) neural network model, the daily water demands from April 1990 to March 1991 were predicted and compared with the records (Figure 2). The relative error distribution of the predictions is shown in Figure 3. It can be seen that the relative error is less than 5% for 339 days of the year. The five days on which the relative error is greater than 10% are a special holiday, New Year's Day, and three typhoon-striking days. In general, the predictions by the neural network model are in excellent agreement with the records.

[Figure: recorded and predicted daily deliveries plotted against time, April 1990 to March 1991.]

Figure 2. Predictions of daily water demands.

[Figure: histogram of the relative errors of the predictions; the bar above 10% covers 5 days (1.4%).]

Figure 3. Relative error distribution of the predictions.

The predictions by the neural network model have also been compared with those by the multiple regression model, the ARIMA model and the model based on the Kalman Filtering Theory (for details about the compared models, see Zhang et al., 1992). The results are shown in Table 1, where the following three indexes are applied to assess the goodness of fit of the predictions to the records.

TABLE 1. Comparison of different models

Model                        MRE(%)   CC      RRMSE
Multiple Regression Model    2.90     0.764   0.659
ARIMA Model                  2.80     0.794   0.623
Kalman Filtering Model       2.69     0.808   0.599
Neural Network Model         2.12     0.877   0.483

1. Mean Relative Error (MRE). The smaller the mean relative error, the better the predictions.
2. Correlation Coefficient between predictions and records (CC). The larger the correlation coefficient, the better the predictions.
3. Relative Root Mean Square Error (RRMSE). RRMSE = 0 for perfect predictions, and RRMSE = 1 if the predictions are equal to the mean of the records. (A sketch of these indexes follows below.)
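The indexes are given above by name and endpoints only, so the exact formulas in the sketch below are our assumption of the standard definitions.

import numpy as np

def fit_indexes(pred, obs):
    mre = 100.0 * np.mean(np.abs(pred - obs) / obs)   # 1. MRE (%)
    cc = np.corrcoef(pred, obs)[0, 1]                 # 2. CC
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    rrmse = rmse / np.std(obs)                        # 3. RRMSE: 0 if perfect,
    return mre, cc, rrmse                             #    1 if pred = mean(obs)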

It can be seen that all of the indexes show that the neural network model gives the best predictions of the four tested models. From the above results we can say that the neural network model is more reliable and better suited to practical purposes.

FLUCTUATION STRUCTURE ANALYSIS OF DAILY WATER DEMANDS

Although the nonlinear relations between daily water demands and exogenous variables have been recognized, most of the published methods describe these relations with a linear expression, as pointed out in Section 1. It is clear that such methods cannot give us correct knowledge about the fluctuation structure of daily water demands. In this respect the neural network model differs greatly from the conventional models: it has a nonlinear structure and can simulate complex relations between input and output through learning.
Figure 4 shows some simulation results from the learned (identified) neural network model. From these results we can say that the neural network model is indeed able to handle a nonlinear problem, and that a model with a nonlinear structure is necessary to describe correctly the relations between daily water demands and exogenous variables. However, identifying the nonlinear relations between daily water demands and exogenous variables requires more simulations, which is a meaningful problem for further study.

[Figure: four panels of simulated daily delivery versus daily high temperature, each with the last day's delivery fixed at 372 (10^3 m^3): (sunny) weekday versus Sunday & holiday; (rain, 15 mm/day) weekday versus Sunday & holiday; (weekday) rainy versus sunny versus cloudy; (Sunday & holiday) rainy versus sunny versus cloudy.]

Figure 4. Fluctuation of daily water demands.



SENSITIVITY ANALYSIS OF EXOGENOUS VARIABLES

A sensitivity analysis is necessary to estimate the influences of the exogenous variables on daily water demands. By definition, the sensitivity coefficients of the exogenous variables can be calculated with the following equation, which is derived from (1) to (5):

$$ \frac{\partial O}{\partial I} = f(Z) \left[ 1 - f(Z) \right] A B \qquad (18) $$

where
  ∂O/∂I = {∂O/∂I_1, ..., ∂O/∂I_N}^T, an N × 1 vector,
  A = {w_ij}, an N × M matrix,
  B = {w_j f(X_j) [1 - f(X_j)]}, an M × 1 vector.
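Equation (18) transcribes directly into Python; the arrays are assumed to hold the learned weights and thresholds in the notation above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sensitivity(inputs, w_ih, th_h, w_ho, th_o):
    # Returns the N x 1 vector dO/dI of equation (18) at the given inputs
    fX = sigmoid(inputs @ w_ih + th_h)     # f(X_j) for each hidden unit
    fZ = sigmoid(fX @ w_ho + th_o)         # f(Z) at the output unit
    B = w_ho * fX * (1.0 - fX)             # M x 1 vector B
    return fZ * (1.0 - fZ) * (w_ih @ B)    # f(Z)[1 - f(Z)] A B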

Figure 5 shows how the sensitivity coefficients of the five exogenous variables change over one year; the plotted values are the monthly averages of the coefficients calculated by (18) from the 8 years of weather and daily water demand records. The results can be summarized as follows.
1. The daily high temperature is a main factor in the fluctuation of daily water demands throughout the year, except in the winter season.
2. The influence of the last day's delivery on daily water demand is also very great, especially in the fall and winter seasons, but its range of variation over time is smaller than that of the daily high temperature.
3. The influences of the other exogenous variables are far smaller than those of the daily high temperature and the last day's delivery.

[Figure: monthly average sensitivity coefficients of the five exogenous variables plotted over the months of the year.]

Figure 5. Sensitivity coefficients of exogenous variables.

SUMMARY

In this study we applied a neural network model to the prediction of daily water demands and verified that the model is very efficient and reliable. The neural network model differs greatly from most statistical models in its ability to handle nonlinear problems. The results of the fluctuation structure analysis of daily water demands and of the sensitivity analysis of the exogenous variables suggest that a neural network model can also be used to identify the demand structure, owing to the self-organization ability of the model.

REFERENCES
Jinno, K., Kawamura, T., and Ueda, T. (1986) "On the dynamic properties and predictions of daily deliveries of the purification stations in Fukuoka City", Technology Reports of the Kyushu University 59(4), 495-502.
Koizumi, A., Inakazu, T., Chida, K., and Kawaguchi, S. (1988) "Forecasting of daily water consumption by multiple ARIMA model", J. Japan Water Works Association 57(12), 13-20.
Tsunoi, M. (1985) "An estimate of water supply based on weighted regression analysis using a personal computer", J. Japan Water Works Association 54(3), 2-6.
Zhang, S.P., Watanabe, H., and Yamada, R. (1992) "Comparison of daily water demand prediction models", Annual Reports of NSC, Vol. 18, No. 1.
Asou, H. (1988) The Information Processing by Neural Network Models, Sangyo Publishers, Tokyo.
Yamada, R., Zhang, S.P., and Konda, T. (1992) "An Application of Multiple ARIMA Model to Daily Water Demand Forecasting", Annual Reports of NSC, Vol. 18, No. 1.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986) "Learning Representations by Back-Propagating Errors", Nature 323(9), 533-536.
BACKPROPAGATION IN HYDROLOGICAL TIME SERIES FORECASTING

GERSON LACHTERMACHER 1 and J. DAVID FULLER 2

1 Department of Industrial Engineering, Universidade Federal Fluminense,
Rio de Janeiro, Brazil
2 Department of Management Sciences, Faculty of Engineering, University of Waterloo,
Waterloo, Ontario, Canada N2L 3G1

One of the major constraints on the use of backpropagation neural networks as a practical
forecasting tool, is the number of training patterns needed. We propose a methodology
that reduces the data requirements. The general idea is to use the Box-Jenkins models in
an exploratory phase to identify the "lag components" of the series, to determine a
compact network structure with one input unit for each lag, and then apply the validation
procedure. This process minimizes the size of the network and consequently the data
required to train the network. The results obtained in four studies show the potential of
the new methodology as an alternative to the traditional time series models.

INTRODUCTION
Most of the available techniques used in time-series analysis, such as Box-Jenkins
methods (Box & Jenkins, 1976), assume a linear relationship among variables. In practice
this drawback can make it difficult to analyze and predict accurately the real processes
that are represented by these time series. Tong (1983) described some drawbacks of linear
modelling for time series. In the last decade several nonlinear time series models have
been studied, such as the threshold autoregressive models developed by Tong & Lim
(1980). These are 'model-driven approaches' (Chakraborty et al., 1992) in which we first
identify the type of relation among the variables (model selection) and afterwards estimate
the selected model parameters.
More recently, neural networks have been studied as an alternative to these
nonlinear model-driven approaches. Because of their characteristics, neural networks
belong to the data-driven approaches, i.e. the analysis depends on the available data, with
little a priori rationalization about relationships between variables and about the models.
The process of constructing the relationships between the input and output variables is
addressed by certain general purpose 'learning' algorithms (Chakraborty et al., 1992).
Some drawbacks to the practical use of neural nets are the long time consumed
in the modelling process and the large amount of data required by the present neural
network methodologies. Present methodologies, depending on the problem, can take
several hours or even days in the neural network calibration process, and they require

hundreds of observations. However, for this to be a practical methodology, these requirements should be reduced to a few hours and a few dozen observations. One
cause of both problems is the lack of a definitive generic methodology that could be used
to design a small network structure. Most of the present methodologies use large
networks, with a large number of parameters ('weights'). This means lengthy computations
to set their values, and a requirement for many observations. Unfortunately, in practice,
a model's parameters must be estimated quickly and just a small amount of data are
available.
Contradictory conclusions about the forecasting performance of the neural network
model, compared with the traditional methods have been reached by several authors (Tang
et al.,1991). The explanation for such contradictions may be related to differences in
factors such as the network structure used, the type of series (e.g. stationary,
nonstationary) used in the studies and the relation of the size of the network structure and
the number of entries of the time series.
The goal of this research is to devise and evaluate a practical methodology for the
use of neural networks in forecasting, using noisy and small real world time series. Until
the present time, no method available could satisfactorily deal with this kind of problem.
Our approach is a hybrid of Box-Jenkins and neural networks methods. It uses the Box-
Jenkins methods to identify the 'lag components' of the data that should be used as input
variables, and employs a heuristic to suggest the number of hidden units needed in the
structure of the model. In this way, we can define a small but adequate network, thus
reducing the time used to identify and construct an appropriate model, and reducing the
data requirements. In addition, the use of 'synthetic' time series (Hipel & McLeod, 1993)
as the validation set further decreases the data requirements, helping to overcome the
problem of short time series.
The next section briefly describes the backpropagation learning procedure for
neural networks (Rumelhart et al., 1986), and gives a brief literature review of the
application of neural networks in time series analysis. The following section describes the
methodology used in this paper and the performance tests used to compare the resultant
models to other traditional time series models. Next we present the results obtained. Four
time series have been studied and the performance of the neural network has been
compared to the proper ARMA model and to other time series models. All series are
stationary (annual river flow).
The final section concludes that the forecasting performance of the neural network
models is usually as good as or better than the alternative models. The methodology has
proven to be a practical method to reduce the modelling time by identifying a small
suitable structure that can handle the problem. Moreover, we suggest that this work
should be extended to other types of series not studied here (e.g. nonstationary, seasonal
and cyclic time series).

RELEVANT BACKGROUND
Neural networks are composed of two primitive elements: units (processing elements)
and connections ('weights') between units. In essence, a set of inputs are applied to a unit
that, based on them, fires an output. Each input has its own influence on the total output.

In other words, each input has its own weight in the total output. The connections of
several units (artificial neurons), arranged in one or more layers, make a neural network.
Many network structures are arranged in layers of units that can be classified as
Input, Output or Hidden layers. According to the type of connections between units,
neural networks can be classified as feedforward or recurrent. In feedforward networks,
the units at a layer i may only influence the activity of units at higher levels (closer to
the system's output). Also, in a pure feedforward system, the set of input units to one
layer cannot include units from two or more different layers. The recurrent models are
characterized by feedback connections. This type of connection links cells at higher levels
to cells at lower ones. In this study we used pure fully connected feedforward neural
networks.
Backpropagation learning procedure (Rumelhart et al., 1986) is a gradient descent
method that establishes the values of the neural network parameters (weights and biases)
to minimize the output errors, based on a set of examples (patterns). This learning
procedure was used in this study.
The resolution of the conflicting conclusions, reported in the literature, may be
connected to the fact that a neural network model's performance is highly dependent on
its structure, activation function of the units, type of connections, backpropagation
stopping criteria, data normalization factors and on the overfitting problem, among other
things. Furthermore, no definitive established methodology exists to deal with the neural
network modelling problem and no unique comparison method is used in all studies.
Lapedes & Farber (1987, 1988) applied neural networks to forecast two chaotic
time series, i.e. they are generated by a deterministic nonlinear process, but look like
"random" time series. They concluded that the backpropagation procedure may be thought
of as a particular, nonlinear, least squares method. Their results indicated that neural
networks allow solution of the nonlinear system modelling problem, with excellent
prediction properties, on such "random" time series when compared to traditional
methods. Unfortunately, as they pointed out, their study did not include the effects of
noisy real time series, and the related overfitting problem.
Tang et al. (1991) compared neural networks and Box-Jenkins models, using
international airline passenger traffic, domestic car sales and foreign car sales in the U.S.
They studied the influence on the forecasting performance of the amount of data used,
the number of periods for the forecast and the number of input variables. They concluded
that the Box-Jenkins models outperformed the neural net models in short term forecasting.
On the other hand the neural net models outperformed the Box-Jenkins in the long term.
Unfortunately, the stopping criteria and the overfitting problems were not investigated.
Therefore, wrong conclusions may have been reached since the networks studied are very
large, compared with the number of training patterns, and the network could have been
overtrained.
Recently, a large number of studies have tried to apply neural networks to short
term electric load forecasting (El-Sharkawi et al., 1991; Srinivasan et al., 1991; Hwang
& Moon, 1991, among others). Most of the studies used feedforward neural networks
with all the units using nonlinear activation functions (sigmoid or hyperbolic tangent). All
these studies required a large amount of data and had large network structures. The input

data included several types of variables such as weather variables, dummy variables to
represent the day of the week and variables to represent the historic pattern, among
others. Almost all the studies did not discuss the overfitting problem, the relation between
the size of the network and the number of patterns used in the training procedure, the
stopping criteria or how the best structure was found. Most of them just presented the
results without fully explaining how they were obtained. Most concluded that neural
networks performed as well as or better than the traditional methods.
Important work has been done by Weigend (1991) and Weigend et al. (1990,
1991). They introduced the weight-elimination backpropagation learning procedure to deal
with the overfitting problem. They also presented all the relevant information that
characterized the models used in the forecasting of the sunspots and exchange rate time
series. They also discussed the stopping criteria in the validation procedure and compared
the results with traditional time series models.
They concluded that the neural network model performed as well as the TAR
model (Tong, 1983,1990 and Tong & Lim, 1980) in the case of one-step-ahead
prediction. In the case of multi-step prediction the neural net models outperformed the
TAR model. The drawback of the weight-elimination procedure is to increase the training
time by the inclusion of the penalty term and another dynamic parameter, A..
Nowlan & Hinton (1992), used an alternative approach called Soft Weight-Sharing. The
results obtained are slightly better than both compared models. Besides the drawback of
increasing the complexity of the modelling process, compared to the weight-elimination
procedure, the authors concluded that the effectiveness of the technique is likely to be
somewhat problem dependent However, 1he aulhors claim 1he advantage of a more sophisticated
model is its ability to better adapt to individual problems.

METHODOLOGY
In this section we describe the hybrid methodology developed for the practical application
of neural networks in time series analysis, the performance analysis used to compare the
neural network models to other types of time series models, and the software and
hardware used in the study. In order to investigate the benefits of the methodology, four
distinct stationary time series were used.

Hybrid Methodology
The neural networks' main drawbacks of large data and time requirements are related to
the fact that it has been a data-driven approach, because no definitive procedure is
available for its use in time series modelling. The general idea of our new hybrid
methodology is to use the available Box-Jenkins methodology as an exploratory procedure
which identifies some important relationships in the data series. Based on the information
produced in the exploratory step, we define an initial structure whose small size decreases
the neural network's modelling time and reduces the number of estimated parameters and
the amount of data required. This overcomes the most important practical drawbacks to
the application of neural networks as a forecasting tool. Furthermore, because the Box-
Jenkins models are linear, we believe that the nonlinearities included in the neural
networks models will help to improve the forecasting performance of the final model.

Modelling Procedure:
The modelling procedure consists of basically two steps. The first is called the
exploratory phase and the second is called the modelling phase. In the exploratory phase,
the general idea is to observe the series in order to identify its basic characteristics. In the
modelling phase, the general idea is to use the information obtained in the first step to
aid the design of the neural network structure and then perform the neural network
modelling. In the following sections we will describe each of the methodology's phases.

Exploratory Phase:
In this research, our exploratory phase (Tukey, 1977; Hipel & McLeod, 1993) consists of two parts. In the first, we use the plot of the time series and of the autocorrelation function to try to identify trends in the mean and in the variance, seasonalities and outliers of the data series. Based on this information, the second part consists of the Box-Jenkins modelling process: the identification, estimation and diagnostic checking of the appropriate ARMA model (Hipel & McLeod, 1993) for each time series. The decision on the best model to represent each time series is based on the Akaike Information Criterion (AIC) developed by Akaike (1974).

Modelling Phase:
An important feature gathered at the exploratory phase is what we call the lag
components of the data series, i.e. the lag entries that are important to forecast a new
element of a given time series in a linear context, as suggested by the autoregressive
parameters of the calibrated Box-Jenkins model. A neural network structure that uses
these lag entries as its input variables should easily be trained to perform the linear
transformation done by the Box-Jenkins models. Furthermore, we expect that part of the
"randomness" term of the linear model is in fact nonlinearity, and it can be learned and
incorporated in the new model by the implicit nonlinearity of the neural network models.

Learning Procedure:
In all the studies performed, we used the backpropagation learning procedure, to train the
networks. The additional momentum term, that on average speeds the process (Hertz et
al. 1991), was also used in most of the studies. There exist several other methods that
speed the training procedure. In order to make it clear that any gains in the speed of
modelling are due to the hybrid methodology, we purposely used only the well known
technique of backpropagation, with a momentum term, and we avoided the use of any
other speeding technique.

Training/Validation Procedure:
As mentioned before, an important issue in the application of neural networks is the
relation of the size of the network (and so the number of free parameters) to the number
of training examples. Like other methods of function approximation, such as polynomials,
a large number of parameters will allow the network to fit the training data closely.
However, this will not necessarily lead to optimal generalization properties (i.e. good
forecasting ability, in the present context).
234 G. LACHTERMACHER AND J. D. FULLER

Weigend et al. (1990), suggested two methods to deal with the overfitting
problem. The first one, the weight-elimination backpropagation procedure, was mentioned
in the last section. The second one involves providing a large structure which contains
a large number of modifiable weights but stopping the training before the network has
made use of all of its degrees of freedom. As pointed out by Weigend et al. (1990, 1991),
the problem with this procedure is to determine when the network has extracted all useful
information and is starting to incorporate the noise in the model parameters. They
suggested that part of the available data should be separated to serve as a validation set.
The performance on this set should be monitored and the training should be stopped when
the error on the validation set starts to decrease very slowly (becoming almost constant) or to increase.
Three problems with the latter procedure were pointed out by Weigend et al. (1990). The first is that part of the available time series cannot be used directly in the training process. In some cases, the available series contain no more than 40 to 50 elements, so it is impractical to set aside one part of the series as a validation set, because all available data are needed for training and performance evaluation. The second problem is related to the particular pair of training set and validation set chosen: the authors found, when studying the sunspot series, that the results depend strongly on the specific pair chosen. The last problem is related to the stopping point: they point out that it is not always clear when the network is starting to learn the noise of the time series.
They proposed the weight-elimination procedure to overcome these problems. However,
the drawback of weight-elimination is the increase of the training time.
Our training/validation procedure is based on Weigend's validation methodology.
However, it deals with the overfitting and learning time problems in a different way. The
general idea is to use the information obtained in the exploratory phase of the
methodology to establish a small network (and so, few parameters) that would learn on
the lag components of the time series. To avoid the need to use real data for validation,
we generate a synthetic times series, that possesses the same statistical properties of the
original time series, using the Waterloo simulation procedures (Hipel & McLeod, 1993),
to be used as a validation set. The forecasting performance on this validation set is then
monitored as the training proceeds, as suggested by Weigend et al. (1990,1991), and the
stopping decision is based on this performance.
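In outline, the training/validation loop reads as below; step_fn and rmse_fn are hypothetical stand-ins for one backpropagation pass and for the RMSE evaluation on the synthetic validation series, and the patience threshold is one assumed way of detecting the "almost constant or increasing" condition.

import numpy as np

def train_with_validation(step_fn, rmse_fn, max_epochs=5000, patience=25):
    best, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        step_fn()                            # one pass of backpropagation
        rmse = rmse_fn()                     # RMSE on the synthetic series
        if rmse < best - 1e-6:               # still extracting signal
            best, best_epoch = rmse, epoch
        elif epoch - best_epoch >= patience: # flat or rising: stop training
            break
    return best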

Structure of Neural Network Model:


In this study, we used a fully connected feedforward network which has an input layer,
one hidden layer and a single unit output layer. This type of network was suggested by
Lapedes & Farber (1987) and used by Weigend (1991) and Weigend et al. (1990, 1991).
However, instead of using a linear activation function in the output layer, we are going
to use a nonlinear (sigmoid) one which was originally used by Rumelhart et al. (1986).
The number of units in the input and hidden layers varies according to the
calibrated Box-Jenkins model and the amount of data available to train the network. The
number of input units is initially determined by the number of autoregressive terms of the
calibrated ARMA model. Furthermore, in order to test the adequacy of the model, a
model with an additional input is also tested as a type of sensitivity analysis.
The number of units in the hidden layer is based on the number of input units and the
BACKPROPAGATION IN HYDROLOGICAL TIME SERIES FORECASTING 235

number of training patterns. As said before, the relation of the number of weights and the
number of patterns is important to guarantee a good generalization performance of the
neural network model. Therefore, given the fact that the validation procedure (Weigend
et al., 1990, 1991) suggests an initial large structure, the number of hidden units is
determined in order to make the number of weights follow the heuristic rule given by:

$$ \frac{1.1\,P}{10} \le H (I + 1) < \frac{3\,P}{10} \qquad (1) $$

where
P is the total number of patterns used to train the network,
H is the number of hidden units used in the structure, and
I is the number of input units used in the structure.

This heuristic is based on the relation of one weight to ten patterns suggested in
the literature (Weigend et al., 1991) in order to obtain good generalization properties. The
idea is to have a structure that is 1.1 to 3 times bigger than the relation of one to ten
suggested in the literature.
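For example, the hidden-layer sizes admitted by rule (1) can be enumerated directly:

def hidden_unit_range(P, I, max_h=100):
    # All H satisfying 1.1*P/10 <= H*(I+1) < 3*P/10, heuristic rule (1)
    lo, hi = 1.1 * P / 10.0, 3.0 * P / 10.0
    return [H for H in range(1, max_h) if lo <= H * (I + 1) < hi]

# With 100 training patterns and 2 input units this yields H in {4, ..., 9}
print(hidden_unit_range(100, 2))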

Parameters' Initialization:
In this paper all the initial sets of weights and biases were randomly generated from a
uniform distribution in the range from -0.25 to 0.25. The same range was used by
Weigend et al. (1990,1991) in their studies of neural network modelling in order to avoid
bad solutions.

Learning and Momentum Rates:

Several pairs of learning (η) and momentum (μ) rates were used in this work. The general idea was to start with the pair η = 0.5 and μ = 0.9. However, in some studies this pair of parameters drove the training process to a local minimum with very high error values (higher than 2 on a normalized scale). To overcome this problem, several new pairs with smaller parameters were tried. During our experiments we observed, as expected, that the local minimum problem happens more frequently when very small structures are used. This observation helped to formulate the initial structure heuristic rule (1) proposed above.

Performance Criterion:
Several performance criteria have been used in neural network time series modelling.
Weigend et al. (1990,1991), Weigend (1991) and Nowlan & Hinton (1992) used the
Average Relative Variance (ARV). Gorr et al. (1992) used the Mean Error (ME), the
Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). Coakley et al.
(1992) studied several alternative performance measures such as the Mean Squared Error
(MSE), the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error
(MAPE). In this study we used MSE as the cost function for backpropagation, and RMSE
to determine the stopping point of training and to evaluate forecasting performance.
236 G. LACHTERMACHER AND J. D. FULLER

Patterns Normalization:
Because of the output range of the sigmoid unit (zero to one), the patterns used to train the neural network must be normalized. Several normalization procedures are described in the literature. The most common one is to divide all the values by a number that is larger than the biggest value present in the studied time series. In this paper, this number is referred to as the normalization factor (NF). This procedure was used by Weigend et al. (1991) in their studies of the sunspot time series.
In the case of stationary series, we sometimes want to forecast values which might lie outside the historical range. Therefore we choose a normalization factor that ranges from 30% to 100% larger than the maximum value in the historical training patterns. The value varies according to visual inspection of the plot of the series (variance), carried out during the exploratory phase of the methodology.
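A minimal sketch of this normalization, with the 30% lower end of the suggested margin as an assumed default:

import numpy as np

def normalize(series, margin=0.3):
    # NF is chosen 30% to 100% larger than the historical maximum
    nf = (1.0 + margin) * np.max(series)
    return series / nf, nf

# Forecasts are mapped back to the original units by multiplying by NF.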

Stopping Criterion:
The stopping criterion used in our methodology is similar to Weigend et al.'s (1991) validation procedure. During the training process we monitor the performance on the validation set (the synthetic time series) using the RMSE criterion. Gorr et al. (1992) also used the RMSE in their validation process, but they used part of the original series as the validation set. These measurements are made in the original range of the time series. The general idea is that when the RMSE starts to increase, or to decrease very slowly, the process should be stopped in order to avoid overlearning and deterioration of the generalization properties of the resultant model. Furthermore, we observed during our study that a better criterion is to monitor the behaviour of the validation and training sets together: when the network is starting to overlearn, the trends of the validation RMSE and training RMSE plots move in opposite directions. This was observed in most of the problems studied.

Sensitivity Analysis:
In order to check whether the neural network structure is adequate, an additional input unit is included in the initial structure suggested by the calibrated Box-Jenkins models. If the results do not show significant improvement, the first model is retained. If the model with the extra input unit is preferred, then a new unit is added to the input layer, and the process is repeated. Additional hidden units are also tested. To be consistent with our aim of minimizing data requirements, we compare "forecast" performance by RMSEs on the synthetic data. This kind of procedure has been used by Hipel & McLeod (1993) to test the ARMA model in relation to the overfitting problem.

Performance Analysis
In this section we describe the types of time series studied, and the types of prediction
performed in each case.

Description of the Time Series:

Only stationary time series were used in this study, represented by four annual river flow time series. The sizes of these series vary from 96 to 150 elements. These series were chosen because they were previously studied by Hipel & McLeod (1993) in a comparison study among the ARMA model and several other time series models. In order to obtain a fair comparison between the neural network models developed here and the results obtained by Hipel & McLeod (1993), the same conditions (modelling and prediction data sets) were used to train the neural networks.

Types of Prediction:
The forecasting abilities of the neural network models were tested over prediction periods of up to 30 years, depending on the size of the studied time series. Two types of prediction were performed: one step ahead and multi-step ahead.
In one step ahead prediction, only the actual observed time series is utilized as the input of the models (neural networks and Box-Jenkins) to forecast the next entry of the series. Each new forecasted entry is therefore independent of and uncorrelated with the previous forecasts.
Two types of multi-step ahead prediction are reported in the neural network
literature. The first type is called iterated multi-step prediction. In this type, in order to
obtain several steps ahead of prediction, the results obtained in the model are used in the
subsequent forecasts. In this case, the network has just one unit in the output layer, which
represents the next forecast entry of the time series. This forecasting technique is similar
to the one used by the Box-Jenkins models.
In the second type of multi-step ahead prediction, the neural network contains
several units at the output layer of the structure, each representing one step to be
forecasted. However, this type of structure requires a larger time series than the first type
of multi-step prediction in order to avoid generalization problems. Since the goal of this
research is to develop a methodology that uses small real world time series, this technique
was not used. Instead, we used iterated multi-step predictions.
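The iterated scheme can be sketched as follows; model is a hypothetical stand-in for the trained single-output network and lags lists the lag components identified in the exploratory phase.

import numpy as np

def iterated_forecast(model, history, lags, steps):
    # Feed each one-step forecast back as an input for the next step
    series = list(history)
    for _ in range(steps):
        x = np.array([series[-k] for k in lags])   # lagged inputs
        series.append(float(model(x)))
    return series[len(history):]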

Computational Resources
In the case of the neural networks models, we used the Hybrid Backpropagation (HBP)
software developed by Lachtermacher (1991,1993). The software was rewritten in Turbo-
Pascal® utilizing Turbo Vision® (OOP library) to increase the processing speed of the
software. The training was done in a 486 - 50MHz PC compatible. The training time of
a model ranged from 30 minutes to 3 hours depending on the number of training patterns
and the structure used.

RESULTS
Hipel & McLeod (1993) compared the performance of the ARMA models against several
stationary models, using the river flow time series. We expand this study with the
inclusion of the results of the calibrated neural network models, using the same modelling
and prediction sets. All the time series models are well described in Hipel & McLeod
(1993). Table I summarizes the results obtained.

Table I. One Step Ahead Prediction Time Series Models' RMSE Comparison

Model\Series     Neumunas   Gota     Mississippi   St.Lawrence
ARMA             118.30*    87.58*   1508.03*      473.89*
FGN              115.80*    95.57*   1543.56*      630.55*
FDIFF            116.12*    97.66*   1574.85*      875.91*
Markov           114.70*    97.45*   1625.90*      450.85*
Nonparametric    115.40*    92.86*   1560.00*      426.90*
Neural Nets      115.82     85.85    1498.49       472.47

* Obtained by Hipel & McLeod (1993)

Table II. Time Series Models' Ranking Summary

Model\Rank       1   2   3   4   5   6   Rank Sum
ARMA             0   2   0   1   0   1      14
FGN              0   0   2   1   1   0      15
FDIFF            0   0   0   0   2   2      22
Markov           1   1   0   0   1   1      14
Nonparametric    1   1   1   1   0   0      10
Neural Nets      2   0   1   1   0   0       9

Table II presents a ranking summary of the results described in Table I. The rank
sum is simply the sum of the product of the rank and the number of times the model
received that rank. Therefore, models with low rank sums forecast better overall than
models with higher rank sums (Hipel & McLeod, 1993).
As can be seen from Table II, the calibrated neural network models presented the best overall performance in the one step ahead predictions, considering the four river flow studies. Furthermore, in all studies performed using stationary series, at least one neural network model outperformed the corresponding ARMA model in the one step ahead predictions.
We should note that the differences in the performances between the calibrated
neural network models and the corresponding ARMA models were very small and may
not justify the additional work to achieve them.
The hybrid methodology proved to be a good approach to forecast stationary series
well, reducing the time used to calibrate the neural network model to about two hours,
typically. Moreover, the use of the synthetic time series as the validation set proved to
be an efficient way to decrease the neural network data requirements in this type of time
series.
An important observation is that the performance of the neural networks' one step ahead predictions deteriorates with the inclusion of additional hidden and/or input units.
The deterioration is more influenced by the inclusion of an unnecessary input unit than
by a hidden one. Therefore, more care should be taken in the determination of the
relevant number of input units than in setting the number of hidden units used in the
network structure.
In the case of the multi-step prediction (Table III) both models have almost the same performance. Another important fact was noted in this type of prediction: we observed that the performance of the neural networks improves with an increase in the number of estimated parameters of the model (by increasing the number of hidden or input units) in all studies. However, there is not yet an exact explanation of these facts.

Table III. Multi-Step Ahead Prediction Time Series Models' RMSE Comparison

Model\Series     Neumunas   Gota     Mississippi   St.Lawrence
ARMA             101.34     101.98   1792.57       909.72
Neural Nets      102.84     101.60   1792.97       903.33

Normality of the Residuals and Forecast Bias:

The residuals of the training and forecasting sets were tested for normality using normal probability plots and the skewness and kurtosis statistics. In all cases, the hypothesis of normality was not rejected at a significance level of 95%. Furthermore, there was no significant bias (95%) indicated in the forecasts made by the neural network models.

CONCLUSIONS AND FUTURE RESEARCH


In this research, we developed and tested the Hybrid methodology for the use of neural
networks in time series forecasting. We tailored this methodology to be applied to
stationary and noncyclical series. The methodology consists of two phases, Exploratory

and Modelling. In the exploratory phase we identify the 'lag components' of the series
using the traditional Box-Jenkins methods. Based on these results a small structure is
suggested and then the training is performed. In the following paragraphs we describe the
conclusions reached in this study.

Performance on Stationary Time Series


In the study of the stationary series, we observed that the calibrated neural network
models have a better overall performance than all the traditional time series methods used
in the benchmark. However, the differences in performance were very small, compared
with the corresponding ARMA models, in both one and multi-step predictions. This
suggests that the additional work to create the neural network model may not be
compensated by the improvement obtained in the forecasting.

Determination of the Neural Network Structure


The use of the Box-Jenkins method to identify the 'lag components' of the time series and
to suggest a suitable neural network structure was demonstrated to be effective. Of the
four series studied, in no cases did the sensitivity analysis procedure point out a different
calibrated model than initially suggested by the Box-Jenkins method.

Synthetic Data
In most of the cases, the synthetic data seems to mimic the original data sufficiently well
to suggest an appropriate stopping point, to avoid overfitting. Furthermore, the use of
synthetic data in the sensitivity analysis procedure usually pointed to the model with the
best overall performance in both types of prediction tested.

Data Requirements and Modelling Time


By reducing the overall modelling time and decreasing the data requirements (compared
with earlier attempts to use neural networks in time series forecasting), the Hybrid
methodology proved to be a practical way to use neural networks as a forecasting tool,
in the case of stationary time series. It should be noted that in the present study, we used
small time series and achieved a better performance than the corresponding ARMA model
in a very short training/validation process.

Future Research
Further research should be done to tailor the Hybrid methodology to other types of time series, such as seasonal and nonstationary time series. Here, some attention should be given to the use of traditional methods, such as moving averages and deseasonalization factors. Furthermore, the application of the Hybrid Methodology to multivariate time series should also be studied.
In addition, the methodology should be tested on more time series in order to verify the initial results obtained for the stationary time series. Moreover, a special study should be done to establish a procedure to better identify the adequate normalization factor(s) to be used in the forecasting process.

REFERENCES

Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19, 6, 716-723.
Box, G. E. P., Jenkins, G. M. (1976) Time series analysis: forecasting and control, Holden-Day, Inc., Oakland, California.
Chakraborty, K., Mehrotra, K., Mohan, C. K., Ranka, S. (1992) Forecasting the behavior of multivariate time series using neural networks, Neural Networks, 5, 961-970.
Coakley, J. R., McFarlane, D. D., Perley, W. G. (1992) Alternative criteria for evaluating artificial neural network performance, presented at TIMS/ORSA Joint National Meeting, April.
El-Sharkawi, M. A., Oh, S., Marks, R. J., Damborg, M. J., Brace, C. M. (1991) Short term electric load forecasting using an adaptively trained layered perceptron, in Proceedings of the First Forum on Application of Neural Networks to Power Systems, 3-6, Seattle, Washington.
Gorr, W., Nagin, D., Szcypula, J. (1992) The relevance of artificial neural networks to managerial forecasting; an analysis and empirical study, Technical Report 93-1, Heinz School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, PA, USA.
Hertz, J., Krogh, A., Palmer, R. G. (1991) Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Co., Don Mills, Ontario, 1-8 and 89-156.
Hipel, K. W., McLeod, A. I. (1993) Time series modelling of water resources and environmental systems, to be published by Elsevier, Amsterdam, The Netherlands.
Lachtermacher, G. (1991) A fast heuristic for backpropagation in neural networks, Master's Thesis, Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada.
Lachtermacher, G. (1993) Backpropagation in Time Series Analysis, Ph.D. Thesis, Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada.
Lapedes, A., Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modelling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory.
Lapedes, A., Farber, R. (1988) How neural nets work, in Neural Information Processing Systems, ed. Dana Z. Anderson, 442-456, American Institute of Physics, New York.
Nowlan, S. J., Hinton, G. E. (1992) Simplifying neural networks by soft weight-sharing, Neural Computation, 4, 473-493.
Rumelhart, D. E., McClelland, J. L. and The PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, Massachusetts, USA.
Srinivasan, D., Liew, A. C., Chen, J. S. P. (1991) Short term forecasting using neural network approach, in Proceedings of the First Forum on Application of Neural Networks to Power Systems, 12-16, Seattle, Washington.
Tang, Z., Almeida, C., Fishwick, P. A. (1991) Time series forecasting using neural networks vs. Box-Jenkins methodology, in Simulation, 303-310, Simulation Councils, Inc., November.
Tong, H., Lim, K. S. (1980) Threshold autoregression, limit cycles and cyclical data, Journal of the Royal Statistical Society, Series B, 42, 3, 245-292.
Tong, H. (1983) Threshold Models in non-linear time series analysis, in Lecture Notes in Statistics, ed. D. Brillinger, S. Fienberg, J. Gani, J. Hartigan and K. Krickeberg, Springer-Verlag, New York, N.Y., USA.
Tukey, J. W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading, Massachusetts, USA.
Weigend, A. S. (1991) Connectionist architectures for time series prediction of dynamical systems, PhD Thesis, Department of Physics, Stanford University, University Microfilms International, Ann Arbor, Michigan.
Weigend, A. S., Rumelhart, D. E., Huberman, B. A. (1990) Predicting the future: a connectionist approach, International Journal of Neural Systems, 1, 3, 193-209.
Weigend, A. S., Rumelhart, D. E., Huberman, B. A. (1991) Back-propagation, weight-elimination and time series prediction, in Connectionist Models - Proceedings of the 1990 Summer School, edited by D. S. Touretzky, J. L. Elman, T. J. Sejnowski, G. E. Hinton, Morgan Kaufmann Publishers, Inc.
PART V

TREND ASSESSMENT
TESTS FOR MONOTONIC TREND

A. I. MCLEOD 1,2 and K. W. HIPEL 2,3

1 Department of Statistical and Actuarial Science, The University of Western Ontario,
London, Ontario, Canada N6A 5B7
2 Department of Systems Design Engineering, University of Waterloo,
Waterloo, Ontario, Canada N2L 3G1
3 Department of Statistics and Actuarial Science, University of Waterloo
The monotonic trend tests proposed by Mann (1945), Abelson and Tukey (1963) and
Brillinger (1989) are reviewed and evaluated. Simulation experiments using a very
large number of simulations are carried out for comparing the Abelson-Tukey and
Mann-Kendall tests. The advantages and disadvantages of each test are discussed
and the practical implementation and usefulness of these test procedures are clearly
demonstrated with some applications to environmental data.

INTRODUCTION
Recently, Brillinger (1989) proposed a new test for monotonic trend. The important
advantage of this test over the Mann-Kendall test (Mann, 1945) is its validity in
the presence of an autocorrelated error component. Brillinger demonstrated that in
this case his test would have asymptotic power equal to one whereas other tests,
which do not take into account the autocorrelation of the error component, would by
comparison have an asymptotic relative efficiency of zero.
For the situation where the errors from the monotonic trend can be assumed to
be statistically independent, the method of Brillinger (1989) may be replaced with a
test originally developed by Abelson and Tukey (1963). It is of interest to compare
the power of the Mann-Kendall and Abelson-Tukey tests since there are important
examples where the errors appear to be independent and identically distributed white
noise. However, it should be noted that if it is the case that the error component is
significantly correlated, then both of these tests would be expected to perform very
poorly relative to Brillinger's trend test.
In the next section, the Brillinger trend test and its practical implementation
details are outlined. Then, in the subsequent two sections the Abelson-Tukey and
Mann-Kendall tests are briefly summarized. Power comparisons of the Abelson-Tukey
and Mann-Kendall tests demonstrate the usefulness of these tests.

BRILLINGER TREND TEST


The basic underlying model considered can be written as

$$ Z_t = S_t + \eta_t, \qquad t = 1, \ldots, n, \qquad (1) $$

where Z_t is the observed time series, S_t represents a signal or trend component and η_t stands for an autocorrelated error component.

[Figure: the coefficients c_t rise from about -1 near t = 1 to about +1 near t = n; axes are COEFFICIENT c_t versus TIME, with n = 100.]

Figure 1. Plot of coefficient c_t in (3) against t for the Brillinger trend test.

Under the null hypothesis, it is assumed that S_t is a constant. The alternative hypothesis to be tested assumes that S_t is either a nondecreasing (S_t ≤ S_{t+1}) or nonincreasing (S_t ≥ S_{t+1}) function of time t.
The test statistic Brillinger (1989) developed is given as

$$ Z_B = \frac{\sum c_t Z_t}{\mathrm{est.sd.}\left( \sum c_t Z_t \right)}, \qquad (2) $$

where

$$ c_t = \sqrt{(t-1)\left( 1 - \frac{t-1}{n} \right)} - \sqrt{t\left( 1 - \frac{t}{n} \right)}, \qquad (3) $$
and n is the length of the series. Under the null hypothesis of no trend, the statistic
ZB is asymptotically normally distributed with mean 0 and variance 1. Large values
of IZBI indicate the null hypothesis is untenable and hence there is the possibility
of a trend in the series. The trend is increasing or decreasing according as ZB is
> 0 or < 0, respectively. A plot of Ct versus t for n = 100 is shown in Figure 1.
Notice how the function Ct contrasts the values between each end of the series so
values near the beginning are given weights close to -1 while those near the other
end are given weights near +1. The function Ct was originally derived by Abelson
and Tukey (1963) for testing the case where the error component TJt is assumed to be
independently normally distributed with mean zero and constant variance. This will
be referred to as the Gaussian white noise case. It can be shown that
$$\mathrm{var}\Big(\sum c_t Z_t\Big) \approx 2\pi f_\eta(0) \sum c_t^2, \qquad (4)$$

where $f_\eta(0)$ denotes the spectral density function of the error component evaluated
at 0.

In order to estimate $f_\eta(0)$, it is first necessary to estimate $\eta_t$. Assuming there are
no outliers, an estimate of the trend component is given by the running average of
order $V$ defined as

$$\hat{S}_t = \frac{1}{2V+1} \sum_{i=-V}^{V} Z_{t+i}. \qquad (5)$$

The practitioner should normally choose a value of $V$ to give a reasonable estimate
of the trend component. To assist in verifying the choice of $V$, Brillinger (1989)
suggests examining a plot of the trend component for the specified choice of $V$. It is
very important that $V$ not be too small, since this can cause $f_\eta(0)$ to be drastically
underestimated, which may easily result in a Type 1 error, that is, rejection of the
null hypothesis when it is true. In some cases where there are outliers in the series,
a suitable Box-Cox transformation may be used to make the data more normally
distributed. The normal probability plot can be used to choose the transformation
by examining the plot while trying different power transformations (see Hipel and
McLeod (1994, Section 3.4.5)).
After the trend component, $S_t$, has been estimated, the autocorrelated error component, $\eta_t$, can be estimated by $\hat{\eta}_t = Z_t - \hat{S}_t$. Then, an estimate of $f_\eta(0)$ derived by
Brillinger (1989) is

$$\hat{f}_\eta(0) = \frac{\sum_{j=1}^{L} |\epsilon_j|^2}{2\pi n \sum_{j=1}^{L} (1 - a_j)^2}, \qquad (6)$$

where

$$\epsilon_j = \sum_{t=V+1}^{n-V} \hat{\eta}_t \exp\Big\{-\frac{2\pi i j t}{n}\Big\}, \qquad (7)$$

where $i = \sqrt{-1}$, and

$$a_j = \frac{\sin\{(2V+1)\pi j / n\}}{(2V+1)\sin(\pi j / n)}. \qquad (8)$$
The parameter $L$ determines the degree of smoothing of the periodogram component. A plot of the periodogram of the estimated autocorrelated error component,
$\hat{\eta}_t$, showing the bandwidth corresponding to $L$ is suggested by Brillinger (1989) to
aid in choosing $L$. As with the choice of $V$, a suitable selection of $L$ is essential to
obtain a reasonably good estimate of $f_\eta(0)$.
Finally,

$$\text{est.sd.}\left(\sum c_t Z_t\right) = \sqrt{2\pi \hat{f}_\eta(0) \sum c_t^2}. \qquad (9)$$
In practice, the Fourier transform $\epsilon_j$ may either be computed using the Discrete
Fourier Transform (DFT) or the Fast Fourier Transform (FFT). If the FFT is employed, the series is padded with zeros at both ends until it is of length $n' = 2^p$,
where $p = [\log_2(n)] + 1$ and $[\cdot]$ denotes the integer part. To avoid leakage, especially when the FFT is used, Tukey (1967) and Brillinger (1981, p. 54) recommend
that data tapering be used. Tukey's split cosine bell data taper (Bloomfield, 1976,
p. 84) involves multiplying the series $\hat{\eta}_t$ by the cosine tapering function $u_t$, where

$$u_t = \tfrac{1}{2}\left(1 - \cos\frac{\pi(t - \frac{1}{2})}{l}\right), \quad \text{for } t = 1, \ldots, l,$$
$$u_t = 1, \quad \text{for } t = l + 1, \ldots, n' - l - 1,$$
$$u_t = \tfrac{1}{2}\left(1 - \cos\frac{\pi(n' - t + \frac{1}{2})}{l}\right), \quad \text{for } t = n' - l, \ldots, n', \qquad (10)$$

to form the tapered series $\hat{\eta}_t^* = \hat{\eta}_t u_t$. The Fourier transform for the tapered series is
then evaluated. The percentage of data tapered, say $r$, is then $r = 200l/n'$. Tukey
recommends choosing $r = 10$ or 20. Hurvich (1988) suggests a data-based method
for choosing the amount of tapering to be done.
The choice of the parameters $V$, $L$ and $r$ is very important in the application of
Brillinger's test, since a poor selection of these parameters may result in a completely
meaningless test result. We have found it helpful to practice with simulated time series data in order to develop a better feel for how these parameters should be chosen.
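To make the preceding recipe concrete, the following Python sketch is a minimal implementation of the test under simplifying assumptions: the function names are ours, the taper normalization is omitted, and the summation limits in (6)-(8) are simplified relative to Brillinger (1989); it is intended as an illustration, not as the authors' MHTS code.

import numpy as np

def abelson_tukey_weights(n):
    """Contrast coefficients c_t of equation (3)."""
    t = np.arange(1, n + 1)
    return np.sqrt((t - 1) * (1 - (t - 1) / n)) - np.sqrt(t * (1 - t / n))

def brillinger_test(z, V=5, L=5, taper_frac=0.10):
    """Return the statistic Z_B of equation (2) for an equally spaced series z."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    c = abelson_tukey_weights(n)
    # Running average of order V, equation (5); defined for t = V+1, ..., n-V.
    s_hat = np.convolve(z, np.ones(2 * V + 1) / (2 * V + 1), mode="valid")
    eta = z[V:n - V] - s_hat                  # estimated error component
    m = len(eta)
    # Split cosine bell taper over a fraction taper_frac of the data, eq. (10).
    ell = max(1, int(round(taper_frac * m / 2)))
    u = np.ones(m)
    ramp = 0.5 * (1 - np.cos(np.pi * (np.arange(1, ell + 1) - 0.5) / ell))
    u[:ell], u[m - ell:] = ramp, ramp[::-1]
    # Estimate of the spectral density at 0, equations (6)-(8), correcting for
    # the transfer function a_j of the running-mean filter.
    eps = np.fft.fft(eta * u)
    j = np.arange(1, L + 1)
    a = np.sin((2 * V + 1) * np.pi * j / m) / ((2 * V + 1) * np.sin(np.pi * j / m))
    f0 = np.sum(np.abs(eps[1:L + 1]) ** 2) / (2 * np.pi * m * np.sum((1 - a) ** 2))
    return np.sum(c * z) / np.sqrt(2 * np.pi * f0 * np.sum(c ** 2))

For example, brillinger_test(z, V=8, L=5, taper_frac=0.10) corresponds to the parameter choices used for the Nile River example later in this paper.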
ABELSON-TUKEY TEST
In this case, under the null hypothesis of no trend, the test statistic may be written
as

$$Z_A = \frac{\sum c_t Z_t}{\sqrt{\big(\sum c_t^2\big) \sum (Z_t - \bar{Z})^2}}, \qquad (11)$$

where $\bar{Z} = \sum Z_t / n$. Under the null hypothesis of no trend, the statistic $Z_A$ is asymptotically normally distributed with mean 0 and variance 1. Large values of $|Z_A|$ indicate
the null hypothesis is untenable and hence there is the possibility of a trend in the series. The trend is increasing or decreasing according as $Z_A$ is $> 0$ or $< 0$, respectively.
THE MANN-KENDALL TREND TEST
The Mann-Kendall trend test is derived by computing the Kendall rank correlation
(Kendall, 1975) between Zt and t (Mann, 1945). The Mann-Kendall trend test as-
sumes that under the null hypothesis of no trend, the time series is independent and
identically distributed. Since the rank correlation is a measure of the monotonic
relationship between Zt and t, the Mann-Kendall trend test would be expected to
have good power properties in many situations. Unlike the Brillinger test, one is not
restricted to having consecutive equi-spaced observations. Thus, the observed series
may be measured at irregularly spaced points in time. However, one can assume the
previous notation, where Zt is interpreted as the t-th observation in data series and
t = 1, ... ,n.
In the general case where there may be multiple observations at the same time
point producing ties in t and there may also be ties in the observations Ze, the Mann-
Kendall score is given by

$$S = \sum_{s<t} \mathrm{sign}\big((Z_t - Z_s)(t - s)\big). \qquad (12)$$

This situation arises in water quality data due to repeated uncorrelated samples
taken at the same time and the limited accuracy of the observations. Under the null
hypothesis of no trend, the expected value of $S$ is zero, while increasing or decreasing
monotonic trends are indicated when $S > 0$ or $S < 0$, respectively. Valz et al. (1994a) present
improved approximations to the null distribution of $S$ in the case of ties in both
rankings, as well as an exact algorithm to compute its significance levels (Valz et al.,

1994b). A detailed description of nonparametric trend tests used in water resources


and environmental engineering is provided by Hipel and McLeod (1994, Ch. 23).
If it is assumed that there are no ties in either $Z_t$ or $t$, then the formula for the
Kendall score may be simplified (Kendall, 1973, p. 27) to yield

$$S = 2P - \binom{n}{2}, \qquad (13)$$

where $P$ is the number of times that $Z_{t_2} > Z_{t_1}$ for all $t_1, t_2 = 1, \ldots, n$ such that $t_2 > t_1$.
Under the null hypothesis all pairs are equally likely, so Kendall's rank correlation
coefficient, which is defined in the case of no ties as

$$\tau = \frac{S}{\binom{n}{2}}, \qquad (14)$$

can be written as

$$\tau = 2\pi_c - 1, \qquad (15)$$

where $\pi_c$ is the relative frequency of positive concordance (i.e., the proportion of time
for which $Z_{t_2} > Z_{t_1}$ when $t_2 > t_1$).
In the case where there are no ties in either ranking, it is known (Kendall, 1975,
p. 51) that under the null hypothesis the distribution of $S$ may be well approximated
by a normal distribution with mean zero and variance

$$\mathrm{var}(S) = \frac{1}{18}\, n(n-1)(2n+5), \qquad (16)$$

provided that $n \ge 10$.
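For a series with no ties, the score (12) and variance (16) give the following Python sketch of the test; the function name is ours and no continuity correction is applied.

import numpy as np
from math import erf, sqrt

def mann_kendall_test(z):
    """Return the Kendall score S, its normal deviate and two-sided level."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    s = sum(np.sign(z[t] - z[:t]).sum() for t in range(1, n))  # equation (12)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0                   # equation (16)
    zstat = s / sqrt(var_s)          # normal approximation, valid for n >= 10
    pvalue = 2 * (1 - 0.5 * (1 + erf(abs(zstat) / sqrt(2))))
    return s, zstat, pvalue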

POWER COMPARISONS
The power functions at the 5% level of significance, denoted by $\beta_{MK}$ and $\beta_{AT}$ for the
Mann-Kendall and Abelson-Tukey tests, respectively, are estimated for various forms of the basic trend model

$$Z_t = f(t) + a_t, \qquad t = 1, \ldots, n, \qquad (17)$$

where $Z_t$ is the observed time series, $f(t)$ represents a monotonic trend component and
$a_t$ represents an error component which is independent and identically distributed.
Series of length $n = 10, 20, 50$ and 100 are generated one million times for each
of a variety of trend models and error component distributions. The proportion of
times that the null hypothesis of no trend is rejected gives estimates for $\beta_{MK}$ and $\beta_{AT}$.
Thus, the maximum standard error in the estimated power functions, $\hat{\beta}_{MK}$ and $\hat{\beta}_{AT}$,
is $10^{-3}/2 = 0.0005$. Consequently, one may expect that the power probabilities
may differ by at most one digit in the third decimal from the true exact result most
of the time.
Three models for trend are examined. The first trend model is a linear trend,
so $f(t) = \lambda t$. In this case, it is known that the Mann-Kendall trend test is nearly
optimal since, when $a_t$ is Gaussian white noise, the Mann-Kendall trend test has 98%
asymptotic relative efficiency with respect to the optimal estimator, which is linear
regression (Kendall and Stuart, 1968, §45.25). In the second model, $f(t)$ is taken to
be the step function
$$f(t) = 0, \quad \text{if } t \le n/2; \qquad f(t) = \lambda, \quad \text{if } t > n/2. \qquad (18)$$
Step functions such as this are often used in intervention analysis modelling (see Hipel
and McLeod (1994, Ch. 19) for detailed descriptions and applications of various types
of intervention models). A priori, it would be hoped that both the Abelson-Tukey and
Mann-Kendall trend tests should perform well for step functions. In the third model,
$f(t) = \lambda c_t$, where $c_t$ is defined in equation (3). For this model, the Abelson-Tukey
procedure is optimal when $a_t$ is Gaussian white noise. The value of the parameter
$\lambda$ in these trend models is set to $\lambda = \alpha\sqrt{10/n}$, where $\alpha = 0.01, 0.04, 0.07, 0.10$.
Two models for the error component distribution are used. The first is the normal
distribution with mean zero and variance $\sigma^2$, while the second is a scaled contaminated normal distribution, $\phi_c(z)$,

$$\phi_c(z) = \frac{1}{\sigma_1}\left((1-p)\,\phi(z/\sigma_1) + \frac{p}{\sigma_c}\,\phi(z/(\sigma_1\sigma_c))\right), \qquad \sigma_1 = \frac{\sigma}{\sqrt{1 - p + p\sigma_c^2}}, \qquad (19)$$

where $\sigma_c = 3$ and $p = 0.1$. The scaling ensures that the distribution has variance
equal to $\sigma^2$. These particular parameters are suggested by Tukey (1960) and have
been previously used in many simulation studies. The reason for $\sigma_c = 3$ is that Tukey
(1960) found that there are many datasets occurring in practice where this choice was
suggested. We choose $p = 0.1$ since, for this choice, the variance contribution from both
distributions is equal, and so the contamination effect is largest. In previous simulation
studies (see Tukey (1960)), it is found that this choice produces the greatest effect
when a non-robust estimator is compared to a robust one. We take $\sigma = 0.5, 1, 2, 4$ in
both the normal and contaminated normal cases.
Data are generated by applying the Box-Muller transformation (Box and Muller,
1958) to uniform (0,1) pseudo-random variables generated by Super-Duper (Marsaglia,
1976). The tests are applied to the same data series, but a different set of random
numbers is used for every model and parameter setting.
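The simulation design may be mimicked as follows. This Python sketch uses NumPy's default generator in place of the Box-Muller/Super-Duper combination of the study, so it will not reproduce the tables digit for digit; the function names and the replication count are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def contaminated_normal(size, sigma, sigma_c=3.0, p=0.1):
    """Scaled contaminated normal errors with variance sigma**2, eq. (19)."""
    scale = sigma / np.sqrt((1 - p) + p * sigma_c ** 2)
    widths = np.where(rng.random(size) < p, sigma_c, 1.0)
    return scale * widths * rng.standard_normal(size)

def estimate_power(test, n=50, alpha=0.04, sigma=1.0, nrep=10_000, crit=1.96):
    """Rejection rate at the 5% level for the linear trend model of eq. (17).
    `test` must return a scalar statistic, e.g. abelson_tukey_test above."""
    lam = alpha * np.sqrt(10 / n)
    t = np.arange(1, n + 1)
    hits = sum(abs(test(lam * t + contaminated_normal(n, sigma))) > crit
               for _ in range(nrep))
    return hits / nrep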
The simulation results are presented in Tables 1.1a to 3.2b. As previously noted,
due to the very large number of simulations, all results presented in these tables
are essentially exact to the number of decimal places given. Tables 1.1a and 1.1b
show the results for a simple linear trend with Gaussian white noise. For comparison
of $\beta_{MK}$ and $\beta_{AT}$, one can look at their ratios, $\beta_{MK}/\beta_{AT}$, as well as their absolute
magnitudes. These ratios vary from 0.93 to 1.56. As might be expected, in no case
is the Abelson-Tukey test substantially better than the Mann-Kendall test, whereas
there are many instances, especially for series length 100, where the Mann-Kendall
is much better. This conclusion also applies to Tables 1.2a and 1.2b. In Tables 1.1a
through 2.2b, the only situations where the Abelson-Tukey test has larger power are
when the null hypothesis is either true or very nearly true, so the power function is
really just reflecting the probability of a Type 1 error.
For step functions, the results are shown in Tables 2.1a through 2.2b. For the longer
series lengths shown in Tables 2.1b and 2.2b, the Mann-Kendall test dominates, since
the only times where the Abelson-Tukey test has larger power are when the null hypothesis
is true, and even in these cases the Mann-Kendall is better since the probability of Type
1 error is closer to its nominal 5% level. For the smaller samples shown in Tables 2.1a

TABLE 1.1a. Linear trend with Gaussian white noise


n α σ β_MK β_AT β_MK/β_AT
10 0.00 0.50 0.060 0.056 1.07
10 0.00 1.00 0.060 0.056 1.06
10 0.00 2.00 0.060 0.056 1.06
10 0.00 4.00 0.060 0.056 1.06
10 0.01 0.50 0.059 0.059 1.00
10 0.01 1.00 0.058 0.057 1.03
10 0.01 2.00 0.058 0.056 1.05
10 0.01 4.00 0.059 0.056 1.06
10 0.04 0.50 0.091 0.098 0.93
10 0.04 1.00 0.064 0.067 0.96
10 0.04 2.00 0.058 0.058 1.00
10 0.04 4.00 0.058 0.056 1.03
10 0.07 0.50 0.178 0.183 0.97
10 0.07 1.00 0.082 0.088 0.93
10 0.07 2.00 0.062 0.064 0.97
10 0.07 4.00 0.059 0.058 1.01
10 0.10 0.50 0.313 0.309 1.01
10 0.10 1.00 0.115 0.122 0.94
10 0.10 2.00 0.068 0.072 0.94
10 0.10 4.00 0.059 0.060 0.99
20 0.00 0.50 0.051 0.053 0.96
20 0.00 1.00 0.051 0.053 0.96
20 0.00 2.00 0.050 0.053 0.95
20 0.00 4.00 0.051 0.053 0.96
20 0.01 0.50 0.061 0.063 0.96
20 0.01 1.00 0.053 0.056 0.94
20 0.01 2.00 0.051 0.054 0.94
20 0.01 4.00 0.050 0.053 0.95
20 0.04 0.50 0.254 0.220 1.15
20 0.04 1.00 0.097 0.095 1.03
20 0.04 2.00 0.061 0.063 0.96
20 0.04 4.00 0.053 0.055 0.95
20 0.07 0.50 0.625 0.519 1.20
20 0.07 1.00 0.204 0.181 1.13
20 0.07 2.00 0.086 0.085 1.01
20 0.07 4.00 0.058 0.061 0.95
20 0.10 0.50 0.901 0.800 1.13
20 0.10 1.00 0.367 0.310 1.19
20 0.10 2.00 0.126 0.118 1.07
20 0.10 4.00 0.067 0.069 0.97

TABLE 1.1b. Linear trend with Gaussian white noise


n α σ β_MK β_AT β_MK/β_AT
50 0.00 0.50 0.051 0.051 0.99
50 0.00 1.00 0.051 0.051 1.00
50 0.00 2.00 0.051 0.051 0.99
50 0.00 4.00 0.051 0.052 1.00
50 0.01 0.50 0.140 0.111 1.27
50 0.01 1.00 0.072 0.066 1.10
50 0.01 2.00 0.056 0.055 1.03
50 0.01 4.00 0.052 0.052 0.98
50 0.04 0.50 0.933 0.783 1.19
50 0.04 1.00 0.410 0.289 1.42
50 0.04 2.00 0.140 0.110 1.27
50 0.04 4.00 0.073 0.066 1.10
50 0.07 0.50 1.000 0.997 1.00
50 0.07 1.00 0.858 0.675 1.27
50 0.07 2.00 0.329 0.234 1.41
50 0.07 4.00 0.118 0.096 1.23
50 0.10 0.50 1.000 1.000 1.00
50 0.10 1.00 0.991 0.924 1.07
50 0.10 2.00 0.583 0.414 1.41
50 0.10 4.00 0.192 0.144 1.33
100 0.00 0.50 0.049 0.051 0.97
100 0.00 1.00 0.050 0.051 0.99
100 0.00 2.00 0.050 0.051 0.98
100 0.00 4.00 0.050 0.051 0.99
100 0.01 0.50 0.419 0.269 1.56
100 0.01 1.00 0.141 0.104 1.35
100 0.01 2.00 0.072 0.064 1.12
100 0.01 4.00 0.056 0.054 1.03
100 0.04 0.50 1.000 0.999 1.00
100 0.04 1.00 0.940 0.757 1.24
100 0.04 2.00 0.418 0.269 1.55
100 0.04 4.00 0.141 0.105 1.35
100 0.07 0.50 1.000 1.000 1.00
100 0.07 1.00 1.000 0.996 1.00
100 0.07 2.00 0.868 0.645 1.34
100 0.07 4.00 0.336 0.218 1.54
100 0.10 0.50 1.000 1.000 1.00
100 0.10 1.00 1.000 1.000 1.00
100 0.10 2.00 0.992 0.910 1.09
100 0.10 4.00 0.593 0.388 1.53

TABLE 1.2a. Linear trend with contaminated normal white noise


n α σ β_MK β_AT β_MK/β_AT
10 0.00 0.50 0.059 0.062 0.95
10 0.00 1.00 0.059 0.062 0.95
10 0.00 2.00 0.060 0.063 0.95
10 0.00 4.00 0.059 0.063 0.95
10 0.01 0.50 0.059 0.066 0.90
10 0.01 1.00 0.058 0.064 0.91
10 0.01 2.00 0.059 0.063 0.94
10 0.01 4.00 0.059 0.063 0.95
10 0.04 0.50 0.110 0.115 0.96
10 0.04 1.00 0.067 0.076 0.89
10 0.04 2.00 0.059 0.066 0.89
10 0.04 4.00 0.058 0.064 0.91
10 0.07 0.50 0.236 0.221 1.07
10 0.07 1.00 0.096 0.103 0.94
10 0.07 2.00 0.064 0.072 0.89
10 0.07 4.00 0.058 0.065 0.89
10 0.10 0.50 0.416 0.367 1.13
10 0.10 1.00 0.144 0.145 1.00
10 0.10 2.00 0.075 0.083 0.90
10 0.10 4.00 0.061 0.067 0.90
20 0.00 0.50 0.051 0.060 0.84
20 0.00 1.00 0.051 0.060 0.85
20 0.00 2.00 0.051 0.060 0.84
20 0.00 4.00 0.050 0.060 0.84
20 0.01 0.50 0.066 0.071 0.93
20 0.01 1.00 0.054 0.063 0.85
20 0.01 2.00 0.051 0.061 0.84
20 0.01 4.00 0.051 0.061 0.83
20 0.04 0.50 0.341 0.242 1.41
20 0.04 1.00 0.120 0.105 1.14
20 0.04 2.00 0.066 0.071 0.93
20 0.04 4.00 0.054 0.062 0.86
20 0.07 0.50 0.757 0.563 1.34
20 0.07 1.00 0.273 0.199 1.37
20 0.07 2.00 0.103 0.094 1.09
20 0.07 4.00 0.062 0.069 0.90
20 0.10 0.50 0.952 0.820 1.16
20 0.10 1.00 0.488 0.341 1.43
20 0.10 2.00 0.162 0.130 1.25
20 0.10 4.00 0.076 0.077 0.98

TABLE 1.2b. Linear trend with contaminated normal white noise


n α σ β_MK β_AT β_MK/β_AT
50 0.00 0.50 0.051 0.056 0.91
50 0.00 1.00 0.051 0.056 0.91
50 0.00 2.00 0.051 0.056 0.91
50 0.00 4.00 0.051 0.056 0.91
50 0.01 0.50 0.180 0.112 1.60
50 0.01 1.00 0.082 0.069 1.18
50 0.01 2.00 0.059 0.059 0.99
50 0.01 4.00 0.053 0.057 0.93
50 0.04 0.50 0.980 0.801 1.22
50 0.04 1.00 0.543 0.298 1.83
50 0.04 2.00 0.181 0.112 1.62
50 0.04 4.00 0.082 0.069 1.18
50 0.07 0.50 1.000 0.990 1.01
50 0.07 1.00 0.945 0.698 1.35
50 0.07 2.00 0.443 0.240 1.84
50 0.07 4.00 0.149 0.098 1.52
50 0.10 0.50 1.000 0.999 1.00
50 0.10 1.00 0.998 0.925 1.08
50 0.10 2.00 0.730 0.430 1.70
50 0.10 4.00 0.255 0.146 1.74
100 0.00 0.50 0.050 0.054 0.92
100 0.00 1.00 0.050 0.054 0.92
100 0.00 2.00 0.050 0.053 0.93
100 0.00 4.00 0.050 0.054 0.93
100 0.01 0.50 0.556 0.269 2.06
100 0.01 1.00 0.183 0.103 1.78
100 0.01 2.00 0.082 0.066 1.25
100 0.01 4.00 0.058 0.057 1.01
100 0.04 0.50 1.000 0.996 1.00
100 0.04 1.00 0.986 0.774 1.27
100 0.04 2.00 0.555 0.269 2.06
100 0.04 4.00 0.183 0.102 1.78
100 0.07 0.50 1.000 1.000 1.00
100 0.07 1.00 1.000 0.990 1.01
100 0.07 2.00 0.954 0.662 1.44
100 0.07 4.00 0.451 0.216 2.09
100 0.10 0.50 1.000 1.000 1.00
100 0.10 1.00 1.000 1.000 1.00
100 0.10 2.00 0.999 0.915 1.09
100 0.10 4.00 0.745 0.393 1.90

TABLE 2.1a. Step function with Gaussian white noise

n α σ β_MK β_AT β_MK/β_AT
10 0.00 0.50 0.060 0.056 1.07
10 0.00 1.00 0.060 0.056 1.07
10 0.00 2.00 0.060 0.056 1.07
10 0.00 4.00 0.060 0.055 1.08
10 0.50 0.50 0.184 0.143 1.28
10 0.50 1.00 0.086 0.081 1.06
10 0.50 2.00 0.062 0.062 1.00
10 0.50 4.00 0.058 0.058 1.01
10 1.00 0.50 0.468 0.307 1.52
10 1.00 1.00 0.184 0.143 1.29
10 1.00 2.00 0.086 0.080 1.07
10 1.00 4.00 0.062 0.062 1.00
10 2.00 0.50 0.686 0.563 1.22
10 2.00 1.00 0.469 0.307 1.53
10 2.00 2.00 0.184 0.143 1.29
10 2.00 4.00 0.086 0.080 1.07
10 5.00 0.50 0.694 0.907 0.77
10 5.00 1.00 0.693 0.653 1.06
10 5.00 2.00 0.575 0.383 1.50
10 5.00 4.00 0.253 0.182 1.39
20 0.00 0.50 0.050 0.053 0.95
20 0.00 1.00 0.050 0.053 0.95
20 0.00 2.00 0.050 0.053 0.94
20 0.00 4.00 0.050 0.053 0.95
20 0.50 0.50 0.221 0.148 1.49
20 0.50 1.00 0.090 0.078 1.16
20 0.50 2.00 0.060 0.059 1.00
20 0.50 4.00 0.052 0.054 0.96
20 1.00 0.50 0.635 0.376 1.69
20 1.00 1.00 0.221 0.148 1.49
20 1.00 2.00 0.090 0.078 1.16
20 1.00 4.00 0.059 0.059 1.00
20 2.00 0.50 0.978 0.797 1.23
20 2.00 1.00 0.634 0.376 1.69
20 2.00 2.00 0.221 0.149 1.49
20 2.00 4.00 0.090 0.078 1.16
20 5.00 0.50 0.994 1.000 0.99
20 5.00 1.00 0.991 0.906 1.09
20 5.00 2.00 0.802 0.501 1.60
20 5.00 4.00 0.315 0.197 1.60

TABLE 2.1b. Step function with Gaussian white noise

n α σ β_MK β_AT β_MK/β_AT


50 0.00 0.50 0.051 0.051 1.00
50 0.00 1.00 0.051 0.051 1.01
50 0.00 2.00 0.051 0.051 1.00
50 0.00 4.00 0.051 0.051 1.00
50 0.50 0.50 0.252 0.142 1.77
50 0.50 1.00 0.100 0.074 1.35
50 0.50 2.00 0.062 0.057 1.09
50 0.50 4.00 0.054 0.053 1.01
50 1.00 0.50 0.723 0.394 1.83
50 1.00 1.00 0.252 0.142 1.77
50 1.00 2.00 0.099 0.074 1.34
50 1.00 4.00 0.063 0.057 1.09
50 2.00 0.50 0.999 0.881 1.13
50 2.00 1.00 0.722 0.394 1.83
50 2.00 2.00 0.252 0.143 1.76
50 2.00 4.00 0.100 0.074 1.34
50 5.00 0.50 1.000 1.000 1.00
50 5.00 1.00 1.000 0.967 1.03
50 5.00 2.00 0.886 0.545 1.63
50 5.00 4.00 0.362 0.194 1.87
100 0.00 0.50 0.050 0.051 0.98
100 0.00 1.00 0.050 0.050 0.99
100 0.00 2.00 0.050 0.051 0.98
100 0.00 4.00 0.050 0.051 0.98
100 0.50 0.50 0.258 0.135 1.90
100 0.50 1.00 0.100 0.072 1.40
100 0.50 2.00 0.062 0.056 1.11
100 0.50 4.00 0.053 0.052 1.02
100 1.00 0.50 0.743 0.384 1.93
100 1.00 1.00 0.259 0.137 1.90
100 1.00 2.00 0.100 0.072 1.40
100 1.00 4.00 0.062 0.056 1.11
100 2.00 0.50 0.999 0.891 1.12
100 2.00 1.00 0.742 0.384 1.94
100 2.00 2.00 0.259 0.136 1.90
100 2.00 4.00 0.100 0.071 1.40
100 5.00 0.50 1.000 1.000 1.00
100 5.00 1.00 1.000 0.975 1.03
100 5.00 2.00 0.902 0.539 1.67
100 5.00 4.00 0.373 0.185 2.02

TABLE 2.2a. Step function with contaminated normal white noise

n α σ β_MK β_AT β_MK/β_AT
10 0.00 0.50 0.062 0.073 0.85
10 0.00 1.00 0.071 0.059 1.20
10 0.00 2.00 0.056 0.068 0.82
10 0.00 4.00 0.059 0.060 0.98
10 0.50 0.50 0.236 0.147 1.61
10 0.50 1.00 0.109 0.105 1.04
10 0.50 2.00 0.083 0.084 0.99
10 0.50 4.00 0.042 0.062 0.68
10 1.00 0.50 0.519 0.330 1.57
10 1.00 1.00 0.224 0.153 1.46
10 1.00 2.00 0.103 0.098 1.05
10 1.00 4.00 0.065 0.084 0.77
10 2.00 0.50 0.701 0.614 1.14
10 2.00 1.00 0.505 0.309 1.63
10 2.00 2.00 0.219 0.167 1.31
10 2.00 4.00 0.104 0.091 1.14
10 5.00 0.50 0.689 0.901 0.76
10 5.00 1.00 0.695 0.685 1.01
10 5.00 2.00 0.588 0.395 1.49
10 5.00 4.00 0.314 0.211 1.49
20 0.00 0.50 0.041 0.049 0.84
20 0.00 1.00 0.049 0.068 0.72
20 0.00 2.00 0.054 0.061 0.89
20 0.00 4.00 0.054 0.067 0.81
20 0.50 0.50 0.311 0.173 1.80
20 0.50 1.00 0.094 0.084 1.12
20 0.50 2.00 0.066 0.074 0.89
20 0.50 4.00 0.058 0.057 1.02
20 1.00 0.50 0.721 0.386 1.87
20 1.00 1.00 0.285 0.164 1.74
20 1.00 2.00 0.111 0.087 1.28
20 1.00 4.00 0.068 0.073 0.93
20 2.00 0.50 0.966 0.811 1.19
20 2.00 1.00 0.748 0.409 1.83
20 2.00 2.00 0.287 0.158 1.82
20 2.00 4.00 0.103 0.089 1.16
20 5.00 0.50 0.992 0.994 1.00
20 5.00 1.00 0.974 0.907 1.07
20 5.00 2.00 0.865 0.533 1.62
20 5.00 4.00 0.412 0.206 2.00

TABLE 2.2b. Step function with contaminated normal white noise

n α σ β_MK β_AT β_MK/β_AT
50 0.00 0.50 0.044 0.053 0.83
50 0.00 1.00 0.046 0.065 0.71
50 0.00 2.00 0.055 0.058 0.95
50 0.00 4.00 0.057 0.063 0.90
50 0.50 0.50 0.350 0.161 2.17
50 0.50 1.00 0.124 0.080 1.55
50 0.50 2.00 0.062 0.081 0.77
50 0.50 4.00 0.065 0.055 1.18
50 1.00 0.50 0.852 0.418 2.04
50 1.00 1.00 0.356 0.153 2.33
50 1.00 2.00 0.113 0.074 1.53
50 1.00 4.00 0.080 0.076 1.05
50 2.00 0.50 1.000 0.897 1.11
50 2.00 1.00 0.807 0.365 2.21
50 2.00 2.00 0.311 0.126 2.47
50 2.00 4.00 0.123 0.076 1.62
50 5.00 0.50 1.000 1.000 1.00
50 5.00 1.00 1.000 0.968 1.03
50 5.00 2.00 0.958 0.537 1.78
50 5.00 4.00 0.496 0.220 2.25
100 0.00 0.50 0.049 0.050 0.98
100 0.00 1.00 0.054 0.048 1.12
100 0.00 2.00 0.052 0.060 0.87
100 0.00 4.00 0.044 0.072 0.61
100 0.50 0.50 0.365 0.147 2.48
100 0.50 1.00 0.114 0.073 1.56
100 0.50 2.00 0.051 0.067 0.76
100 0.50 4.00 0.054 0.067 0.81
100 1.00 0.50 0.880 0.387 2.27
100 1.00 1.00 0.350 0.123 2.85
100 1.00 2.00 0.121 0.083 1.46
100 1.00 4.00 0.068 0.062 1.10
100 2.00 0.50 1.000 0.899 1.11
100 2.00 1.00 0.869 0.357 2.43
100 2.00 2.00 0.363 0.135 2.69
100 2.00 4.00 0.122 0.068 1.79
100 5.00 0.50 1.000 1.000 1.00
100 5.00 1.00 1.000 0.976 1.02
100 5.00 2.00 0.970 0.570 1.70
100 5.00 4.00 0.508 0.193 2.63

TABLE 3.1a. Abelson-Tukey function with Gaussian white noise

n α σ β_MK β_AT β_MK/β_AT


10 0.00 0.50 0.059 0.056 1.07
10 0.00 1.00 0.060 0.056 1.06
10 0.00 2.00 0.059 0.056 1.06
10 0.00 4.00 0.059 0.056 1.06
10 0.50 0.50 0.178 0.266 0.67
10 0.50 1.00 0.083 0.108 0.77
10 0.50 2.00 0.062 0.069 0.90
10 0.50 4.00 0.059 0.059 0.99
10 1.00 0.50 0.494 0.740 0.67
10 1.00 1.00 0.178 0.266 0.67
10 1.00 2.00 0.083 0.108 0.77
10 1.00 4.00 0.062 0.069 0.90
10 2.00 0.50 0.904 0.999 0.90
10 2.00 1.00 0.493 0.739 0.67
10 2.00 2.00 0.178 0.266 0.67
10 2.00 4.00 0.083 0.108 0.77
10 5.00 0.50 1.000 1.000 1.00
10 5.00 1.00 0.964 1.000 0.96
10 5.00 2.00 0.646 0.897 0.72
10 5.00 4.00 0.247 0.379 0.65
20 0.00 0.50 0.051 0.054 0.95
20 0.00 1.00 0.050 0.053 0.96
20 0.00 2.00 0.050 0.053 0.95
20 0.00 4.00 0.051 0.053 0.96
20 0.50 0.50 0.128 0.191 0.67
20 0.50 1.00 0.069 0.088 0.78
20 0.50 2.00 0.054 0.061 0.89
20 0.50 4.00 0.051 0.056 0.92
20 1.00 0.50 0.357 0.569 0.63
20 1.00 1.00 0.129 0.191 0.67
20 1.00 2.00 0.068 0.087 0.79
20 1.00 4.00 0.054 0.062 0.88
20 2.00 0.50 0.827 0.988 0.84
20 2.00 1.00 0.357 0.568 0.63
20 2.00 2.00 0.128 0.191 0.67
20 2.00 4.00 0.068 0.086 0.79
20 5.00 0.50 1.000 1.000 1.00
20 5.00 1.00 0.930 1.000 0.93
20 5.00 2.00 0.497 0.757 0.66
20 5.00 4.00 0.173 0.269 0.64

TABLE 3.1b. Abelson-Tukey function with Gaussian white noise

n α σ β_MK β_AT β_MK/β_AT


50 0.00 0.50 0.051 0.051 0.99
50 0.00 1.00 0.051 0.052 0.99
50 0.00 2.00 0.051 0.051 0.99
50 0.00 4.00 0.051 0.051 0.99
50 0.50 0.50 0.088 0.119 0.74
50 0.50 1.00 0.060 0.068 0.88
50 0.50 2.00 0.053 0.055 0.96
50 0.50 4.00 0.052 0.053 0.98
50 1.00 0.50 0.202 0.328 0.62
50 1.00 1.00 0.089 0.119 0.74
50 1.00 2.00 0.060 0.068 0.88
50 1.00 4.00 0.053 0.055 0.96
50 2.00 0.50 0.576 0.855 0.67
50 2.00 1.00 0.202 0.328 0.62
50 2.00 2.00 0.088 0.120 0.74
50 2.00 4.00 0.060 0.068 0.88
50 5.00 0.50 0.997 1.000 1.00
50 5.00 1.00 0.747 0.965 0.77
50 5.00 2.00 0.284 0.472 0.60
50 5.00 4.00 0.110 0.158 0.69
100 0.00 0.50 0.050 0.051 0.98
100 0.00 1.00 0.050 0.051 0.98
100 0.00 2.00 0.050 0.051 0.98
100 0.00 4.00 0.050 0.051 0.98
100 0.50 0.50 0.069 0.089 0.78
100 0.50 1.00 0.054 0.060 0.91
100 0.50 2.00 0.051 0.053 0.96
100 0.50 4.00 0.050 0.051 0.98
100 1.00 0.50 0.129 0.207 0.62
100 1.00 1.00 0.070 0.089 0.78
100 1.00 2.00 0.055 0.060 0.92
100 1.00 4.00 0.051 0.053 0.96
100 2.00 0.50 0.360 0.625 0.58
100 2.00 1.00 0.128 0.208 0.62
100 2.00 2.00 0.069 0.089 0.78
100 2.00 4.00 0.055 0.060 0.91
100 5.00 0.50 0.958 1.000 0.96
100 5.00 1.00 0.507 0.813 0.62
100 5.00 2.00 0.173 0.297 0.58
100 5.00 4.00 0.080 0.111 0.73

TABLE 3.2a. Abelson-Tukey function with contaminated normal

n α σ β_MK β_AT β_MK/β_AT


10 0.00 0.50 0.060 0.063 0.95
10 0.00 1.00 0.060 0.063 0.96
10 0.00 2.00 0.059 0.063 0.94
10 0.00 4.00 0.060 0.063 0.95
10 0.50 0.50 0.230 0.323 0.71
10 0.50 1.00 0.097 0.128 0.76
10 0.50 2.00 0.065 0.078 0.83
10 0.50 4.00 0.059 0.067 0.88
10 1.00 0.50 0.593 0.780 0.76
10 1.00 1.00 0.230 0.323 0.71
10 1.00 2.00 0.097 0.128 0.76
10 1.00 4.00 0.065 0.078 0.83
10 2.00 0.50 0.928 0.987 0.94
10 2.00 1.00 0.592 0.780 0.76
10 2.00 2.00 0.231 0.324 0.71
10 2.00 4.00 0.097 0.128 0.76
10 5.00 0.50 1.000 1.000 1.00
10 5.00 1.00 0.971 0.997 0.97
10 5.00 2.00 0.728 0.891 0.82
10 5.00 4.00 0.321 0.452 0.71
20 0.00 0.50 0.051 0.060 0.84
20 0.00 1.00 0.051 0.060 0.85
20 0.00 2.00 0.051 0.061 0.84
20 0.00 4.00 0.051 0.060 0.84
20 0.50 0.50 0.163 0.214 0.76
20 0.50 1.00 0.077 0.097 0.80
20 0.50 2.00 0.057 0.069 0.82
20 0.50 4.00 0.051 0.063 0.82
20 1.00 0.50 0.458 0.615 0.75
20 1.00 1.00 0.163 0.214 0.76
20 1.00 2.00 0.077 0.096 0.80
20 1.00 4.00 0.057 0.070 0.81
20 2.00 0.50 0.891 0.972 0.92
20 2.00 1.00 0.457 0.614 0.74
20 2.00 2.00 0.164 0.214 0.77
20 2.00 4.00 0.077 0.097 0.80
20 5.00 0.50 1.000 1.000 1.00
20 5.00 1.00 0.961 0.993 0.97
20 5.00 2.00 0.609 0.781 0.78
20 5.00 4.00 0.225 0.302 0.75

TABLE 3.2b. Abelson-Tukey function with contaminated normal

n α σ β_MK β_AT β_MK/β_AT


50 0.00 0.50 0.051 0.056 0.91
50 0.00 1.00 0.051 0.056 0.91
50 0.00 2.00 0.051 0.056 0.90
50 0.00 4.00 0.051 0.056 0.91
50 0.50 0.50 0.105 0.122 0.86
50 0.50 1.00 0.064 0.072 0.89
50 0.50 2.00 0.054 0.060 0.90
50 0.50 4.00 0.052 0.057 0.91
50 1.00 0.50 0.265 0.342 0.77
50 1.00 1.00 0.105 0.122 0.86
50 1.00 2.00 0.064 0.072 0.89
50 1.00 4.00 0.054 0.060 0.91
50 2.00 0.50 0.702 0.863 0.81
50 2.00 1.00 0.264 0.342 0.77
50 2.00 2.00 0.105 0.122 0.86
50 2.00 4.00 0.064 0.072 0.89
50 5.00 0.50 0.999 1.000 1.00
50 5.00 1.00 0.852 0.957 0.89
50 5.00 2.00 0.374 0.494 0.76
50 5.00 4.00 0.135 0.162 0.84
100 0.00 0.50 0.050 0.054 0.92
100 0.00 1.00 0.050 0.054 0.92
100 0.00 2.00 0.050 0.054 0.92
100 0.00 4.00 0.050 0.054 0.93
100 0.50 0.50 0.078 0.089 0.88
100 0.50 1.00 0.056 0.062 0.91
100 0.50 2.00 0.052 0.056 0.93
100 0.50 4.00 0.050 0.055 0.92
100 1.00 0.50 0.163 0.206 0.79
100 1.00 1.00 0.078 0.088 0.88
100 1.00 2.00 0.057 0.062 0.91
100 1.00 4.00 0.052 0.056 0.92
100 2.00 0.50 0.471 0.642 0.73
100 2.00 1.00 0.163 0.206 0.79
100 2.00 2.00 0.078 0.088 0.88
100 2.00 4.00 0.057 0.062 0.91
100 5.00 0.50 0.988 0.999 0.99
100 5.00 1.00 0.639 0.826 0.77
100 5.00 2.00 0.227 0.299 0.76
100 5.00 4.00 0.093 0.109 0.85

and 2.2a, it generally holds true that the Mann-Kendall test outperforms the Abelson-Tukey test, although there is one curious exception. In particular, when
$n = 10$, $\alpha = 5$ and $\sigma = 0.5$, we have $\beta_{MK} = 0.694$ and $\beta_{AT} = 0.907$ in the Gaussian
white noise case, and $\beta_{MK} = 0.689$ and $\beta_{AT} = 0.901$ in the contaminated Gaussian
white noise case.
Finally for the trend function based on the Abelson-Tukey function, one can see
that the Abelson-Tukey method is better in almost all cases, as should be expected.
The difference in power is roughly comparable to the differences one can see in Ta-
bles 1.1a through 2.1b. More generally, in situations when it is not known where the
monotonic trend commences, the Abelson-Tukey contrast may be expected to out-
perform the Mann-Kendall test. On the basis of these simulations, one can conclude
that both tests seem to perform reasonably well. In actual applications, it would be
reasonable to use either or both tests.

ILLUSTRATIVE APPLICATIONS

All datasets discussed in this section are available by e-mail from the statlib archive
by sending the following e-mail message: send hipel-mcleod from datasets to
statlib@temper.stat.cmu.edu. Additionally, the graphical output and analytical
results are generated using the decision support system called the McLeod-Hipel Time
Series (MHTS) package (McLeod and Hipel, 1994a, b).

Great Lakes Precipitation

A time series plot of the estimated total annual precipitation for 1900-1986 for the
Great Lakes is shown in Figure 2, along with Cleveland's robust LOESS smooth
curve (Cleveland, 1979). As shown by this smooth, there is an apparent upward trend.
An autocorrelation plot, cumulative periodogram analysis and a normal probability
plot of the residuals from the trend curve shown in Figure 2, are shown in Figures
3, 4 and 5, respectively. These figures suggest that the data could be modelled as
Gaussian white noise. One can consult Hipel and McLeod (1994, Ch. 7) for details
on the interpretation of these plots. In order to test the statistical significance of the
apparent trend upwards one may use either the Mann-Kendall or the Abelson-Tukey
methods. These methods yield test statistics $\tau = 0.2646$ and $Z_A = 3.22$, with two-sided significance levels of $2.9 \times 10^{-4}$ and $3.3 \times 10^{-3}$, respectively. Therefore, the null
hypothesis of no trend is strongly rejected.
As a matter of interest, neither the autocorrelation plot nor the cumulative periodogram for the original data detects any statistically significant non-whiteness or
autocorrelation. Hipel et al. (1983) and Hipel and McLeod (1994, Section 23.4) demonstrate that for monotonic trends in Gaussian white noise, the Mann-Kendall trend test clearly outperforms
autocorrelation tests. This example also nicely illustrates this fact.

Figure 2. Annual precipitation in inches for the Great Lakes (1900-1986).

Figure 3. Autocorrelation function (ACF) plot of the Great Lakes residual
precipitation data (1900-1986).
TESTS FOR MONOTONIC TREND 265

1.0
::E
<
a::::
(,!) O.B
0
0
0
a:::: 0.6
UJ
~
UJ
> 0.4
~
5::>
::E 0.2
::>
u
0.0
0.00 0.10 0.20 0.30 0.40 0.50
FREQUENCY
Figure 4. Cumulative periodogram graph of the Great Lakes residual precipitation
data (1900-1986).
Figure 5. Normal probability plot of the residuals of the Great Lakes residual
precipitation data (1900-1986).

Average Annual Nile Riverflows

Hipel and McLeod (1994, Section 19.2.4) show how the effect of the Aswan dam on
the average annual riverflows of the Nile River measured just below the dam can
be modelled using intervention analysis. They also demonstrate using intervention
analysis that there is in fact a significant decrease in the mean annual flow after the
dam went into operation in 1903. It is of interest to find out whether the trend tests
are able to confirm this trend hypothesis by rejecting the null hypothesis of no trend.
Figure 6 displays a time series plot of the mean annual Nile River flow for 1870-1945
($\mathrm{m}^3/\mathrm{s}$) with a superimposed Cleveland robust LOESS trend smooth. Figures 7 and
8 show autocorrelation and cumulative periodogram plots of the error component
or residuals from the trend. These plots indicate that the error component is an
autocorrelated time series. Thus, Brillinger's trend test can be applied. We chose
a running-average smooth with $V = 8$ in (5). The smoothed curve and original
data are shown in Figure 9. Next, the smoothing parameter for the spectral density
estimation at zero is set to $L = 5$ in (6), and a 10% cosine-bell taper is used (see Figure
10). The resulting test statistic is $Z_B = -4.83$ with a two-sided significance level of
$1.4 \times 10^{-6}$. For comparison, the Mann-Kendall and the Abelson-Tukey methods yield
test statistics $\tau = -0.4508$ and $Z_A = -4.52$, with two-sided significance levels $< 10^{-10}$
and $6 \times 10^{-6}$, respectively.
Brillinger (1989) demonstrates the usefulness of his method on a very long time
series ($n > 3 \times 10^4$) of daily river heights. Our example above shows the usefulness
of Brillinger's test even for comparatively short series ($n = 75$).

CONCLUSION

The problem of testing for a monotonic trend is of great importance in environmetrics.


Hipel and McLeod (1994) survey the literature on this problem and present several
actual case studies. In this paper, we focus on tests for trend in the case of nonsea-
sonal series. We compare basically two different methods for testing for monotonic
trend. The older methods of Mann (1945) and Abelson and Tukey (1963) can be used
when one can model the time series as a monotonic trend plus an independent and
identically distributed white noise component whereas the new method of Brillinger
(1989) can be used for the more general case when the time series is comprised of a
monotonic trend plus a stationary autocorrelated error component. We show how a
trend model can be fitted to time series data and examined to see whether it appears
to be monotonic with a correlated or uncorrelated error component. The usefulness
of our approach is demonstrated with two interesting illustrative environmetrics ex-
amples.



Figure 6. Average annual flows (water year) of the Nile River ($\mathrm{m}^3/\mathrm{s}$) (1870-1945).

Figure 7. Autocorrelation function (ACF) of the error component from the trend of
the average annual Nile River flows.

Figure 8. Cumulative periodogram plot of the error component from the trend of
the average annual Nile River flows.


Figure 9. The smoothed curve and original annual data of the Nile River.

Figure 10. Test statistic with $L = 5$ and a 10% cosine-bell taper (log periodogram
versus frequency).

REFERENCES

Abelson, R. P. and Tukey, J. W. (1963), "Efficient utilization of non-numerical information in quantitative analysis: general theory and the case of simple order", The
Annals of Mathematical Statistics 34, 1347-1369.
Box, G. E. P. and Muller, M. E. (1958), "A note on the generation of random normal
deviates", Annals of Mathematical Statistics 29, 610-611.
Bloomfield, P. (1976), Fourier Analysis of Time Series, Wiley, New York.
Brillinger, D. R. (1981), Time Series Data Analysis and Theory, (expanded edition),
Holt, Rinehart and Winston, New York.
Brillinger, D. R. (1989), "Consistent detection of a monotonic trend superimposed
on a stationary time series", Biometrika 76, 23-30.
Cleveland, W. S. (1979), "Robust locally weighted regression and smoothing scatter-
plots", Journal of the American Statistical Association 74, 829-836.
Hipel, K. W. and McLeod, A. I. (1994), Time Series Modelling of Environmental and
Water Resources Systems, Elsevier, Amsterdam.
Hipel, K. W., McLeod, A. I. and Fosu, P. (1983), "Empirical power comparisons of
some tests for trend", in Statistical Aspects of Water Quality Monitoring, Developments in Water Science, Volume 27, 347-362, edited by A. H. El-Shaarawi and R. E.
Kwiatkowski.
Hurvich, C. M. (1988), "A mean squared error criterion for time series data windows",
Biometrika 75, 485-490.
Kendall, M. G. and Stuart, A. (1968). The Advanced Theory of Statistics, Volume
3, Hafner, New York.
Kendall, M. G. (1973). Time Series, Griffin, London.
Kendall, M. G. (1975). Rank Correlation Methods (4th ed), Griffin, London.
Mann, H. B. (1945), Nonparametric tests against trend, Econometrica 13, 245-259.
Marsaglia, G. (1976), "Random Number Generation", in Encyclopedia of Computer
Science, ed. A. Ralston, pp. 1192-1197, Petrocelli and Charter, New York.
McLeod, A. I. and Hipel, K. W. (1994a), The McLeod-Hipel Time Series (MHTS)
Package, copyright owned by A. I. McLeod and K. W. Hipel, McLeod-Hipel Research,
121 Longwood Drive, Waterloo, Ontario, Canada N2L 4B6, Tel: (519) 884-2089.
McLeod, A. I. and Hipel, K. W. (1994b), The McLeod-Hipel Time Series (MHTS)
Package Manual, McLeod-Hipel Research, 121 Longwood Drive, Waterloo, Ontario,
Canada N2L 4B6, Tel: (519) 884-2089.
Tukey, J. W. (1960), "A survey of sampling from contaminated distributions", in
Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling,
edited by I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow and H. B. Mann,
Stanford University Press, Stanford.
Tukey, J. W. (1967), "An introduction to the calculations of numerical spectrum
analysis" in Advanced Seminar on Spectral Analysis of Time Series, edited by B.
Harris, pp.25-46, Wiley, New York.
Valz, P., McLeod, A. I. and Thompson, M. E. (1994a, to appear), "Cumulant generating function and tail probability approximations for Kendall's score with tied
rankings", Annals of Statistics.
Valz, P., McLeod, A. I. and Thompson, M. E. (1994b, to appear), "Efficient algorithms for the exact computation of significance levels for Kendall's and Spearman's
scores", Journal of Computational and Graphical Statistics.
ANALYSIS OF WATER QUALITY TIME SERIES
OBTAINED FOR MASS DISCHARGE ESTIMATION

BYRON A. BODO¹,², A. IAN MCLEOD²,³ and KEITH W. HIPEL³,⁴


¹Byron A. Bodo & Associates, 240 Markham St., Toronto, Canada M6J 2G6
²Department of Statistical and Actuarial Sciences,
The University of Western Ontario, London, Ontario, Canada N6A 5B7
³Department of Systems Design Engineering,
University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
⁴Department of Statistics and Actuarial Science,
University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

Methods are proposed for quantifying long term mean annual riverine load reductions of the nutrient phosphorus [P] and other agricultural pollutants anticipated
in southwestern Ontario Great Lakes tributaries due to farm scale nonpoint source
[NPS] remediation measures implemented in the headwater catchments. Riverine
delivery of NPS pollutants is a stochastic process driven by episodic hydrometeorologic events; thus, progress towards tributary load reduction targets must be
interpreted as the expected mean annual reduction achieved over a suitably long,
representative hydrologic sequence. Trend assessment studies reveal that runoff
event biased water quality monitoring records are conceptualized adequately by the
additive model $X_i = \bar{x}_i + C_i + S_i + T_i + e_i$, where $X_i$ is sample concentration,
$\bar{x}_i$ is 'global' central tendency, $C_i$ is discharge effect, $S_i$ is seasonality, $T_i$ is trend
(local central tendency) and $e_i$ is residual noise. As the watershed's systematic hydrochemical response embodied in components $C_i$ and $S_i$ has remained stable in
the presence of gradual concentration trends, the expected mean annual load reductions may be inferred from the difference between the mean annual loads estimated by
adjusting the water quality series to (1) pre-remediation and (2) current mean concentration levels, where concentrations on unsampled days are simulated by Monte
Carlo methods. Fitting components by robust nonparametric smoothing filters in
the context of generalized additive models, and jointly fitting interactive discharge
and seasonal effects as a two-dimensional field $C \otimes S$, are considered.

INTRODUCTION
Diffuse or nonpoint source [NPS] water pollution by agricultural runoff has long
been recognized as a significant issue in southern Ontario. During the 1970s under
the aegis of the International Joint Commission [IJC], Canada and the U.S. un-
dertook to define NPS impacts on the Great Lakes with the PLUARG [Pollution
from Land Use Activities Reference Group] studies that documented the extent of
water quality impairment by agriculture in southern Ontario (Coote et al., 1982;

Wall et al., 1982; Miller et al., 1982; Nielsen et al., 1982; Frank et al., 1982).
A 1983 review by the IJC (1983) noted that Ontario had yet to implement any
comprehensive NPS remediation policies in response to PLUARG recommendations
tabled in 1978. In 1987 Canada and the U.S. finally set specific remediation targets
under Annex 3 of the amendments to the 1978 Canada-U.S. Great Lakes Water
Quality Agreement (IJC, 1987), which calls for a 300 tonne per annum reduction of
phosphorus [P] loading to Lake Erie from the Canadian side. Presumably much of
the reduction was to be achieved by NPS remediation which involves implementation
of various farm scale land management practices including vegetated buffer strips
along streambanks, low tillage cultivation, restricted livestock access to streams,
solid and liquid waste management, and the retirement of erosion prone land from
cultivation.
Inherently stochastic NPS contaminant delivery mechanisms driven by randomly episodic hydrometeorological processes militate against attempts to establish the quantitative impact of abatement practices. While short term pilot studies
at the plot and small watershed scale can demonstrate the general effect of a partic-
ular management practice, they often fail to provide a reliable basis for broad scale
extrapolation because the hydrologic regime of the pre-treatment monitoring phase
differs appreciably from that during treatment. Linking farm scale NPS abatement
measures implemented in small headwater catchments to progress towards specific
Great Lakes tributary nutrient load reduction targets poses a formidable challenge
as very subtle trends must be detected against appreciable background variation
imparted by stochastic NPS contaminant delivery mechanisms.
Over the past decade, federal and provincial agencies began implementing NPS
remediation policies in Ontario ostensibly to mitigate inland surface water quality
problems from agricultural runoff, and hopefully, to reduce Canadian tributary
nutrient loads to the lower Great Lakes. In the short term, miscellaneous farm
scale abatement initiatives will not discernibly influence water quality in the main
channels of larger rivers draining southwestern Ontario. However, the long term
cumulative impact of many small scale improvements should ultimately manifest in
downstream waters. It is presently unclear what has been achieved on a grander
scale by the patchwork of Ontario agricultural pollution remediation programs.
Some have been in progress since the mid 1980s and a comprehensive review of
water quality trends across southwestern Ontario would be timely. This paper
explores extensions of time series methods developed for assessing long term water
quality concentration trends to the problem of estimating long term trends in mass
delivery of nutrients and other NPS pollutants to the Great Lakes.

RIVERINE MASS-DISCHARGE ESTIMATION


In Ontario, the problem of river mass-discharge estimation came to the fore during
PLUARG when tributary inputs were required for mass budgets of nutrient P for
the Great Lakes. For tributaries draining sedimentary southern Ontario, instanta-
neous P concentrations usually correlate positively with streamflow. Thus, P mass
delivery is dominated by relatively brief periods of high streamflow superimposed
on seasonally maximal discharge norms that occur over late winter and early spring.
Accordingly, Canadian PLUARG tributary surveys emphasized high frequency sam-
pling of these dominant mass delivery periods. Because annual tributary mass loads
estimated from flow-biased concentration data were highly sensitive to the method
of calculation, the IJC ultimately imposed, as the standard reporting technique, a

method informally known as 'stratified Beale ratio estimation' by which the annual
mass load is derived as the standard ratio estimate from sampling survey theory
(Cochran, 1977) adjusted by the bias correction factor of E.M.L. Beale (Tin, 1965).
To improve estimates, data were retrospectively blocked into two or more strata or
flow classes of relatively homogeneous concentrations. Two way blocking by flow
and time categories was also applied to treat both flow and seasonal variation.
Beyond estimation technique, the quality of annual mass load estimates de-
pends on the quality of the river monitoring record. At the outset of PLUARG,
vaguely understood mass delivery phenomena and the formidable logistical burdens
posed by runoff event sampling inevitably lead to unsampled high delivery periods
and oversampled low delivery periods. Realizing that a watershed's fundamental
hydrophysicochemical response had not changed appreciably from one year to the
next, Ontario tributary loads were determined by pooling 1975 and 1976 survey
data in order to improve the respective annual and monthly estimates (Bodo and
Unny, 1983).
Following PLUARG, Environment Ontario (MOE) implemented the 'Enhanced
Tributary Monitoring' [ETM] program at 17 major tributary outlets to the Great
Lakes where higher frequency sampling was to be conducted in order to more reli-
ably estimate mass delivery for a limited suite of variables. Annual P loads were to
be reported to the IJC for inclusion in the biennial reports of the Great Lakes Water
Quality Board. Due to minimal budgets and staffing, the execution of the ETM
program was somewhat haphazard from the outset. Local 'observers' resident in
the vicinity of the ETM sites were retained to collect and ship samples. Observers
were instructed to sample more frequently at high flows with neither quantitative
prescription of river stage for judging precisely what constituted high flows nor any
regular supervision. Accordingly, sampling performance has varied erratically and
critical high flow periods have gone unsampled. Initially from 20-120, and more
recently from 20-60, samples per annum were obtained, from which annual P loads
were determined by two strata Beale ratio estimation with a subjectively deter-
mined flow boundary separating low and high flow classes. Each year was treated
independently and flow boundaries were manipulated subjectively to force data into
two classes. Consequently, annual P mass load estimates for Ontario tributaries re-
flect the vagaries of sampling practice as much or more so than legitimate trends
and hydroclimatic variations that determine the true mass transport.
Figure 1 shows the location of the 3 southwestern Ontario ETM sites most
suitable for studying mass-discharge trends. The Grand and Saugeen Rivers, which
were the Ontario PLUARG pilot watersheds, now have lengthy water quality records
spanning 1975-1993. The ETM site in the Thames River basin, the most intensely
agricultural watershed of southwestern Ontario, has low frequency data from 1966-
1975 and higher frequency records from late 1979. Figure 1 also illustrates a prob-
lem common to ETM sites that are not co-located with flow gauges where flows
estimated by areal proration from gauges in upstream or adjacent watersheds are
employed to generate annual P loads. Errors in mean daily flow estimates derived
for the Grand and Thames ETM sites are small as only 15% and 10% of respective
upstream areas are ungauged.
While studies (Dolan et al., 1981; Richards and Holloway, 1987; Young and
DePinto, 1988; Preston et al., 1989) have shown stratified Beale ratio estimation
to perform at least as well as other techniques for estimating annual loads in larger

Figure 1. Map, major southwestern Ontario Great Lakes tributaries.


rivers, other approaches are better suited for evaluating long term trends in mass
delivery. A simple alternative estimate of annual load $L$ is given by the sum

$$L = \sum_i x_i Q_i \delta_i, \qquad (1)$$

where $x_i$ is the concentration of the $i$th sample collected at time $t_i$, and $Q_i$ is the mean
discharge over $\delta_i = (t_{i+1} - t_{i-1})/2$, the time interval represented by sample $i$. This
method produces acceptable results when the sampling is conducted at a frequency
appropriate to the characteristic hydrochemical response of the stream. The rivers
we consider herein are large enough that instantaneous flow on any given day does
not differ appreciably from the mean flow for that day and that water quality
concentrations do not vary significantly during the day. For these systems, daily
sampling would produce very good annual load estimates. The quality of estimates
would decline as sampling rates fell below the characteristic duration time of a
runoff event. Technique (1) was employed by Baker (1988) to estimate sediment,
nutrient and pesticide loads to Lake Erie from U.S. tributaries. In contrast to
Ontario monitoring activity, the programs supervised by Baker and colleagues have
obtained from 400-500 samples per year at the main sites, which are co-located with
flow gauges.
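A direct transcription of (1) in Python follows; the function name, the half-interval conventions at the two ends of the record, and the unit conversion are our own choices. With $x$ in mg/L, $Q$ in $\mathrm{m}^3/\mathrm{s}$ and $t$ in days, 1 mg/L times 1 $\mathrm{m}^3/\mathrm{s}$ amounts to 0.0864 tonnes per day.

import numpy as np

def annual_load(x, q, t):
    """L = sum_i x_i * Q_i * delta_i with delta_i = (t_{i+1} - t_{i-1}) / 2."""
    x, q, t = (np.asarray(a, dtype=float) for a in (x, q, t))
    delta = np.empty_like(t)
    delta[1:-1] = (t[2:] - t[:-2]) / 2.0
    delta[0] = (t[1] - t[0]) / 2.0        # half-interval end conventions
    delta[-1] = (t[-1] - t[-2]) / 2.0
    return 0.0864 * np.sum(x * q * delta)  # tonnes, under the units above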

TIME SERIES MODELS


Motivated by the anticipated need to detect improvements from agricultural NPS pollution abatement initiatives, MOE sponsored a research project (McLeod et al., 1991)
to develop statistical methods for detecting subtle time trends in flow-biased water

quality time series like those generated at ETM sites. Lengthy concentration series
from outlet sites of the two main Ontario PLUARG watersheds, the Grand and
Saugeen Rivers, served as the development data set. Beyond statistical develop-
ments and some indications of modest trends, the results clearly demonstrated that
these larger southwestern Ontario watersheds exhibit generally stable hydrochemi-
cal response over the long term. Fundamental concentration-flow relationships and
seasonal cycles remain largely unchanged in the presence of negligible to modest
concentration trends. Thus, to good first approximation, the water quality concen-
tration process may be conceptualized as the additive process

$$X_t = \bar{X}_t + C_t + S_t + T_t + e_t, \qquad (2)$$

where $X_t$ is constituent concentration, $\bar{X}_t$ the central tendency of $X_t$, $C_t$ is covariate effect,
$S_t$ is seasonal effect, $T_t$ is chronological time trend and $e_t$ is residual noise. Here
$C_t$, $S_t$, and $T_t$ represent relative departures from the process mean $\bar{X}_t$. Covariate effect
$C_t$ is defined by a functional relation $C_t = f(Q_t)$ with stream discharge. Trend $T_t$
is the temporally local level of the constituent in the chronological time dimension
$t$ at which the process evolves. Seasonal $S_t$ represents stable annually recurrent
variation in the circular dimension of seasonal time $\tau$, defined here on the interval
$[0,1]$ as the fraction of time from the beginning of the year. In decimal format,
chronological time $t$ is the sum of the year plus the seasonal fraction $\tau$.
For watersheds with stable hydrochemical response, these times series models
of water quality concentrations afford a means of estimating reductions in riverine
contaminant mass delivery attributable to changes in upstream land practice. Sup-
pose that model (2) is successfully fit to the Grand River P series which extends
from 1972-1992 and that we wish to evaluate net changes in annual P delivery that
may have occurred since the PLUARG reference years 1975-76. Diagnostic graph-
ics and trend tests reveal that P concentrations have declined gradually through
the 1980s. After adjustments for flow and seasonal effects, we determine that the
respective 1975-76 and 1990-92 mean levels were $\bar{X}_1$ and $\bar{X}_2$. Next we construct
the two hypothetical series

$$X_t^{(1)} = \bar{X}_1 + C_t + S_t + e_t, \qquad (3a)$$

$$X_t^{(2)} = \bar{X}_2 + C_t + S_t + e_t, \qquad (3b)$$

which amounts to centring the entire data series about the two respective reference
levels. For each series we determine annual P loads by a consistent method for each
year from 1975-1992 and average the results to obtain $\bar{L}_1$ and $\bar{L}_2$, the mean annual P
loads as if mean levels $\bar{X}_1$ and $\bar{X}_2$ respectively had prevailed over the entire period.
The difference $\Delta L = \bar{L}_2 - \bar{L}_1$ gives an estimate of the mean annual reduction in
P mass delivery that could be expected for the hydrologic sequence observed from
1975-1992. On consideration of the intrinsic stochasticity of mass delivery, the
binational P reduction targets for Lake Erie can be meaningfully interpreted only
as mean annual estimates determined over a suitably long period of representative
hydrologic record of 10 years or more.
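Given fitted component series, the comparison implied by (3a)-(3b) reduces to re-centring and averaging annual loads. The Python sketch below assumes the annual_load function above and hypothetical arrays C, S and e holding the fitted components and residuals at the sample times; the Monte Carlo simulation of concentrations on unsampled days is omitted here.

import numpy as np

def mean_annual_load_at_level(xbar, C, S, e, q, t, years):
    """Mean annual load had the level xbar prevailed over the whole record."""
    x_adj = xbar + C + S + e                      # centred series, eq. (3)
    loads = [annual_load(x_adj[years == y], q[years == y], t[years == y])
             for y in np.unique(years)]
    return float(np.mean(loads))

# Expected mean annual reduction, as in the text:
#   delta_L = mean_annual_load_at_level(xbar2, C, S, e, q, t, years) \
#           - mean_annual_load_at_level(xbar1, C, S, e, q, t, years)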
Recently, Baker (1992) proposed essentially this method for estimating the ex-
tent of P load reductions from U.S. tributaries to Lake Erie, and gave preliminary

results suggesting that significant progress towards the U.S. P reduction commit-
ment of 1700 t per annum (IJC, 1987) had been achieved by NPS remediation
measures applied in the upper reaches of Ohio's Maumee River in the 1980s. Recent analyses (Richards and Baker, 1993) support the early results. Though Ohio
monitoring data are superior, there is no good reason preventing the application of
the same approach to Ontario's ETM records. Bodo (1991) applied a similar time
series adjustment technique at the Thames ETM site to contrast differences in the
expected seasonal risks of exceeding Canada's water quality guideline for the herbicide atrazine between years of high and low applications. The main requirement
of the technique is a good time series model fit, demonstrated for the Grand and
Saugeen River sites in the trend assessment study (McLeod et al., 1991). Due to
haphazard ETM records and difficulties with stratified Beale ratio estimation, it is
proposed to simulate concentrations on unsampled days according to model (3) and
then determine annual loads by the technique of equation (1).

FITTING THE ADDITIVE MODEL COMPONENTS


To determine mass-discharge trends, the best fit possible must be obtained for model
(2), particularly at high flows, which dominate mass transport. In the earlier trend
assessment work (McLeod et al., 1991) with Grand and Saugeen time series, model
(2) systematic components were fit by conventional sequential reduction without
iteration. First the discharge effect was estimated as a smooth function of flow,
$\hat{C}_i = f(Q_i)$, with the LOWESS scatterplot smoother (Cleveland, 1979). Next, seasonal effects $\hat{S}_i$ were estimated as the calendar monthly means of flow adjusted
concentrations $V_i = X_i - \bar{X} - \hat{C}_i$. Finally, trend term $\hat{T}_i$ was determined as the
LOWESS smooth of the flow-adjusted, de-seasonalized residuals $P_i = V_i - \hat{S}_i$. Difficulties with autocorrelation effects induced by event sample clusters were circumvented by analysis of both the original concentration series and the reduced series
of monthly mean concentrations. While this approach was adequate for trend assessment, extensive experience fitting simplified seasonal adjustment models to low
frequency water quality concentration series (Bodo, 1991) suggests various ways
that the fitting of model (2) can be improved.
It is useful to consider model (2) in the contemporary context of generalized additive models [GAM] (Hastie and Tibshirani, 1990), which are a broad generalization
of linear regression in which the predictors on the right side (systematic terms $C_t$,
$S_t$, $T_t$ in model (2)) are arbitrary smooth functions. Though formal parametric
models are permissible in the GAM framework, we consider fitting systematic terms
by nonparametric smoothers that are robust against outliers and asymmetric data
distributions. Additive time series models are usually fit by iterative decomposition
schemes (e.g., McLeod et al., 1983; Cleveland et al., 1979) that are particular forms
of the Gauss-Seidel algorithm known as backfitting (Hastie and Tibshirani, 1990),
which gives the systematic components as

$$\hat{G}_j = h\Big(X_t - \bar{X} - \sum_{k \neq j} \hat{G}_k\Big), \qquad (4)$$

where $\hat{G}_j$ are $C_t$, $S_t$, and $T_t$ for model (2) and $h(\cdot)$ are arbitrary smooth functions.
Iteration reduces the influence of potentially significant sampling biases embedded in the initial fits of systematic terms.
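A bare-bones version of the backfitting iteration (4) can be written with an off-the-shelf LOWESS smoother. The Python sketch below is our own illustrative reading of the scheme, not the authors' implementation: log flow as the covariate axis, monthly medians for the seasonal term, and the smoothing fractions are all assumptions.

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def backfit(x, q, t, month, n_iter=5, frac=0.5):
    """x: concentrations; q: flows; t: decimal time; month: 1-12 labels."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    month = np.asarray(month)
    xbar = np.median(x)                           # robust 'global' level
    C = np.zeros_like(x); S = np.zeros_like(x); T = np.zeros_like(x)
    for _ in range(n_iter):
        C = lowess(x - xbar - S - T, np.log(q), frac=frac, return_sorted=False)
        resid = x - xbar - C - T
        S = np.array([np.median(resid[month == m]) for m in month])
        T = lowess(x - xbar - C - S, t, frac=frac, return_sorted=False)
        for comp in (C, S, T):                    # keep components centred
            comp -= comp.mean()
    return xbar, C, S, T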

Figure 2. P concentration versus flow, Grand River.

[Log-log axes: P (mg/L) versus flow (m³/s), both spanning 10-1,000.]

For the Grand and Saugeen River data these include: (1) chronological biases
toward heavily sampled periods, particularly the PLUARG years 1975-1977, (2)
seasonal bias towards spring runoff months, and (3) a covariate bias towards higher
flows. To varying degrees, these three sampling biases overlap. The seasonal and
covariate biases are largely synonymous. In Figure 2, the unadjusted P concentration-
flow relationship fit with the LOWESS smoother is biased towards the PLUARG
years, the spring period, and high flows. On second
and higher iterations of the backfitting algorithm, chronological and seasonal bias
are reduced as the relation is determined on data that have been de-trended and
de-seasonalized.
To optimize the model fit, it is necessary to understand clearly the respective
roles of the model components. In seasonal and chronological time respectively,
seasonal St and trend Tt are expected to represent temporally local concentration
norms, where norms are interpreted as concentrations that prevail most of the time.
Consequently, model estimates St and Tt should be close to the median concen-
trations expected at time t. In contrast, the modelled covariate effect Ct must
represent as precisely as possible the contribution of high flows that are generally
atypical of seasonal flow norms; hence, maintaining a bias towards high flows is de-
sirable. Tuning the fitting procedures for St and Tt to best achieve their objectives
will contribute to attaining the best possible fit of Ct.
Iteration does not necessarily eliminate all the potential distortions introduced
by sampling biases, but additional measures can be applied to optimize the fit of
specific components. Heavy chronological and seasonal sampling biases introduce
autocorrelation effects that play havoc with the LOWESS filter currently used to fit
trend component Tt. LOWESS smoothing is controlled by a nearest neighbour spec-
ification. For time series smoothing, it performs best when data density is relatively
uniform over the time horizon of the series. More uniform chronological data density
was achieved in previous work by reducing data to a series of monthly mean concen-
trations before applying the LOWESS smoother. The procedure could be improved
in the following ways. Because Tt should represent temporally local concentration
norms, replacing the arithmetic mean with a median or a bi-weight mean (Mosteller
and Tukey, 1977) would provide resistance to abnormal concentrations. Further im-
provement can be obtained by introducing sample weights w_i^t ∝ δ_i = (t_{i+1} − t_{i−1})/2
that restrict the influence of closely spaced samples. Component Tt is now estimated
as the LOWESS smooth of a reduced series of monthly weighted means. Experience
with a wide variety of low frequency water quality series (Bodo, 1991) suggests that
Tt is most often best fit by tuning LOWESS to approximate smoothing at a fixed
temporal bandwidth of about one year by specifying the number of nearest neigh-
bours m as 1-2 times the typical annual sample density. Individual estimates Ti at
the original sample times may be derived by interpolating the smoothed output of
the reduced series.
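As an illustrative sketch of these two devices (hypothetical names throughout), the spacing weights and the weighted monthly reduction could be computed as below; a weighted mean stands in for the bi-weight mean for brevity.

```python
import numpy as np
import pandas as pd

def spacing_weights(t):
    """Weights proportional to (t_{i+1} - t_{i-1})/2 for sorted sample times,
    so clustered event samples carry less weight than isolated ones."""
    t = np.asarray(t, dtype=float)
    d = np.empty_like(t)
    d[1:-1] = (t[2:] - t[:-2]) / 2.0
    d[0], d[-1] = t[1] - t[0], t[-1] - t[-2]     # one-sided at the ends
    d = np.maximum(d, 1e-3)                      # floor for same-day replicates
    return d / d.sum()

def weighted_monthly_means(dates, resid):
    """Reduce deseasonalized residuals to one weighted value per month,
    ready for LOWESS smoothing of the trend component."""
    t = dates.map(pd.Timestamp.toordinal).astype(float)
    g = pd.DataFrame({'p': resid, 'w': spacing_weights(t),
                      'ym': dates.dt.to_period('M')})
    return g.groupby('ym').apply(lambda d: np.average(d['p'], weights=d['w']))
```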
Like trend term Tt , St should represent seasonal variation in concentration
norms; hence, a robust central tendency statistic is again preferred over an arith-
metic mean. Approximating St as a smooth function by replacing the monthly
mean sequence - essentially a bin filter with a mid month target point - with the
output of a fixed bandwidth running bin filter applied on a uniform grid of target
points distributed throughout the year will, by at least a small amount, better the
fit of recurrent seasonal variation by eliminating end of month step discontinuities.
Generally, a bandwidth of one month, or for simplicity 1/12 year, eliminates prob-
lems with high spring sampling densities and provides a sharp seasonal fit that is
rarely improved by smaller bandwidths even with very high frequency water qual-
ity data. A current implementation, the seasonal running median [SRM] smoother
(Bodo, 1989, 1991), employs 100 target points spaced every 3.65 days in non leap
years. This may be excessive, but the added computational burden is small.
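A minimal sketch of such a fixed bandwidth filter on a circular seasonal grid follows, using an unweighted median over a nominal one-month window at 100 target points as described; argument names are hypothetical.

```python
import numpy as np

def seasonal_running_median(frac_year, v, n_targets=100, bandwidth=1.0/12):
    """Fixed-bandwidth running median over circular seasonal time.

    frac_year: sample dates as fractions of the year in [0, 1)
    v:         detrended, flow-adjusted values
    Returns the target points and the zero-centred seasonal estimate."""
    targets = np.arange(n_targets) / n_targets
    s = np.empty(n_targets)
    for j, tau in enumerate(targets):
        d = np.abs(frac_year - tau)
        d = np.minimum(d, 1.0 - d)              # wrap across the year end
        in_bin = d <= bandwidth / 2.0
        s[j] = np.median(v[in_bin]) if in_bin.any() else np.nan
    return targets, s - np.nanmean(s)
```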
For the ETM river data, the robust SRM filter may yet generate output biased
to heavily sampled years and flows higher than the expected seasonal norms. If the
median is replaced by a weighted mean as the measure employed to summarize the
observations within bins, weights can be introduced to minimize the influence of
(a) heavily sampled years, and (b) sample flows remote from the expected seasonal
mean daily flow norms. Weights w_i^t defined previously can be applied to treat the
chronological bias. Weights w_i^Q to minimize flow bias for sample i at chronologi-
cal time ti could be proportioned as the probability of occurrence of sample flow
Qi conditional on the seasonally local distribution of mean daily flows encountered
at the corresponding seasonal target point Ti within a bin of the same bandwidth
employed for seasonal smoothing. Alternatively, the flow weight might be made
inversely proportional to the distance of sample flow Qi from the seasonal median
of mean daily flows at time Ti. Some experimentation will be necessary to obtain
acceptable weights. The influence of both chronological and flow bias on the def-
inition of the seasonal St may ultimately prove inconsequential; nonetheless, the
effects are presently unknown and some investigation is required.
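The second, distance-based option might be sketched as follows, given sample flows and dates plus the full record of mean daily flows; the weight function 1/(1 + |log Qi − log Qmed|) is only one plausible choice and would need the experimentation noted above.

```python
import numpy as np

def flow_weights(q_sample, frac_year, daily_q, daily_frac, bandwidth=1.0/12):
    """Down-weight samples whose flow is far from the seasonal flow norm.

    For each sample, the median of mean daily flows within a circular
    seasonal window around its date stands in for the seasonal norm."""
    w = np.empty(len(q_sample))
    for i, (q, tau) in enumerate(zip(q_sample, frac_year)):
        d = np.abs(daily_frac - tau)
        d = np.minimum(d, 1.0 - d)                   # circular distance
        q_med = np.median(daily_q[d <= bandwidth / 2.0])
        w[i] = 1.0 / (1.0 + abs(np.log(q) - np.log(q_med)))
    return w / w.sum()
```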
The covariate term Ct may also be influenced by chronological and seasonal
sampling biases. Applying the chronological weights w_i^t should approximately
equalize the influence of all years on the fitted Ct. It is reasonable to ignore the
seasonal sampling bias which is largely equivalent to the bias towards higher flows.

JOINTLY FITTING COVARIATE AND SEASONAL COMPONENTS


A deficiency of additive model (2) is that it assumes that the shape of the covariate
response function remains constant throughout the year, or equivalently, that the
seasonal response remains constant at all flow levels. Effectively, the combined co-
variate and seasonal response Ct + St maintains a constant functional relation with
discharge that is simply adjusted upward or downward as the seasons progress.
With model (2), there is no obvious way of adding a covariate-seasonal interaction
term to account for a covariate response that varies seasonally. However, within the
GAM framework it is permissible to jointly fit the response to two or more predictors.
For water quality concentrations it is most logical to consider modelling the con-
centration response surface to combined covariate-seasonal effects represented here
as CSt. The additive time series model is now written

    Xt = X̄ + CSt + Tt + εt                                   (5)
This requires smoothing concentrations over the two dimensional cylindrical field
defined by the circular seasonal time dimension and the longitudinal covariate di-
mension. The resulting response surface inherently includes covariate-seasonal inter-
actions. LOESS (Cleveland and Devlin, 1988; Cleveland and Grosse, 1991), the mul-
tivariate extension of the LOWESS filter, has been used to fit CSt within a back-
fitting algorithm, but other procedures such as two dimensional spline smoothers
could be considered.
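Because a compact LOESS-on-a-cylinder sketch is not practical here, the following stand-in uses a simple Gaussian (Nadaraya-Watson) kernel smoother on the cylindrical field, embedding seasonal time as (cos 2πτ, sin 2πτ) so that the seasonal dimension wraps correctly; the bandwidths and names are assumptions, not values from the study.

```python
import numpy as np

def cylinder_smooth(frac_year, logq, y, tau0, logq0, h_season=0.5, h_flow=0.5):
    """Kernel estimate of the joint covariate-seasonal surface CSt at the
    target point (tau0, logq0) on the seasonal-flow cylinder.

    h_season is in chord units: 2*sin(pi/12) ~ 0.52 corresponds to samples
    about one month away on the seasonal circle."""
    ang, ang0 = 2 * np.pi * frac_year, 2 * np.pi * tau0
    d_season = np.hypot(np.cos(ang) - np.cos(ang0),
                        np.sin(ang) - np.sin(ang0)) / h_season
    d_flow = (logq - logq0) / h_flow
    w = np.exp(-0.5 * (d_season ** 2 + d_flow ** 2))
    return np.sum(w * y) / np.sum(w)
```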
Experimental results (Figure 3) confirm suspicions that the shape of the con-
centration response to flow can vary through the course of the year. For the Grand
and Saugeen Rivers, variations are generally confined to low flows which suggests
that potential bias in the additive covariate-seasonal fit Ct + St of model (2) is
small and may not unduly affect annual mass-discharge estimates. Though promis-
ing, model (5) currently remains best suited as a conceptualization aid. The main
difficulties lie in optimizing the degree of smoothing obtained by the LOESS fil-
ter. Irregular water quality data distributions over the CSt field may confound
the usual automatic smoothing control selection procedures such as cross valida-
tion (Hastie and Tibshirani, 1990) which perform best with relatively uniformly
distributed data. Currently, the best approach is by graphical comparison of the
respective two dimensional and unidimensional filter outputs on thin strips of the
CSt field. For example, for each month, concentration-discharge can be plotted
with the LOWESS fit for that month's data and the LOESS fit at the mid point
of that month. Similarly, for a series of discrete flow ranges, seasonal plots may be
developed with overlays of the respective SRM filter output and the LOESS fit at
the range mid point. With experimentation, it is expected that objective rules can
be formulated for adapting LOESS and other two dimensional smoothers to jointly
fitting the covariate and seasonal effect as the two dimensional field CSt.

AUTOCORRELATION EFFECTS
While autocorrelation effects can be eliminated during fitting of the systematic
model components, to simulate daily water quality concentrations it is necessary to
develop a model for the autocorrelation structure that is evident in the residual noise
εi of model (2), particularly during periods of high flow. To first approximation,
simple autoregressive or ARMA error structures will probably suffice. Practically,
Figure 3. Joint discharge-seasonal effect, Grand River P series.

error models can be fit only to the series of daily concentrations derived by reduc-
ing multiple sample concentrations obtained on certain runoff event days to a single
value. Presuming that a suitable noise model can be fit, concentrations can then be
estimated for missing days j by Monte Carlo methods. As the task involves filling
gaps in the concentration series, closure rules are required to assure that concen-
tration trajectories simulated forward from the left end of the gap, close within
plausible proximity of the next observed concentration at the right end of the gap.
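One simple closure rule is a bridge adjustment applied to a freely simulated AR(1) path, sketched below with hypothetical arguments; the linear ramp correction is an illustrative device, not a rule taken from the study.

```python
import numpy as np

def ar1_bridge(eps_left, eps_right, n_gap, phi, sigma, rng=None):
    """Simulate AR(1) residuals for n_gap missing days between two observed
    residuals, then remove the terminal discrepancy along a linear ramp so
    the trajectory closes on the observed right endpoint."""
    rng = np.random.default_rng(rng)
    path = np.empty(n_gap + 2)                     # both endpoints + gap days
    path[0] = eps_left
    for k in range(1, n_gap + 2):                  # free forward simulation
        path[k] = phi * path[k - 1] + rng.normal(0.0, sigma)
    ramp = np.linspace(0.0, 1.0, n_gap + 2)
    path += ramp * (eps_right - path[-1])          # bridge/closure correction
    return path[1:-1]                              # residuals for missing days
```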

HYSTERESIS EFFECTS
Hysteresis is a secondary effect that may merit attention for some water quality
variables. The hysteresis effect is observed over runoff events when the concentra-
tion peak precedes the discharge peak with the net result that for the same flow,
higher concentrations are observed on the rising limb of the hydrograph than on the
recession limb. Sediment and sediment associated variables like P have exhibited
hysteresis in both the Grand and Saugeen River monitoring records. As a result, the
covariate response function underestimates concentration on rising flows and over-
estimates on receding flows. The net effects on cumulative annual mass-discharge
estimates are likely small; nonetheless, some investigation is necessary to verify this
hypothesis.

TIME VARYING MODEL COMPONENTS


Covariate and seasonal effects that change over chronological time may significantly
affect annual mass-discharge estimates. Because P data in any given year, except
for the PLUARG years, are generally insufficient to precisely define the covariate term
Ct at the high flows so crucial to accurate mass-discharge estimation, long term
variations in covariate response must be investigated by segmenting the record into
multi-year periods and comparing the fitted covariate and seasonal components. If
necessary, long term mean annual mass delivery reductions can be evaluated via the
hypothetical re-constructed water quality series

    Xt(1) = X̄ + Ct(1) + St(1) + εt                           (6a)

    Xt(2) = X̄ + Ct(2) + St(2) + εt                           (6b)

for the two respective reference periods, where the systematic components Ct(j) and
St(j) have been fit separately to two multi-year segments of the record. Prelimi-
nary analysis suggests that nutrient P appears relatively unaffected, but further
confirmatory analysis is warranted.
In contrast to P, the nutrient species total nitrate (NOₓ = NO₃⁻ + NO₂⁻) is a dis-
solved ionic constituent with a unique biochemical response. As the Saugeen River
plot (Figure 4) shows, a well defined seasonal cycle dominates the series. Discharge
variation is a virtually negligible determinant of seasonal NOₓ concentration vari-
ation, which is driven by the annual growth-decay cycle of terrestrial and aquatic
biomass. In agricultural watersheds, year-to-year shifts in mean level reflect mainly
variations in annual fertilizer application rates. For the Saugeen River, the annual
amplitude increases as the annual mean level increases. The amplitude is also affected
asymmetrically, as the late winter maxima increase disproportionately relative to
the late summer minima, which vary only slightly from one year to the next.
While the fixed amplitude seasonality models of (6a,b) are adequate to evaluate
changes in mean annual NOₓ mass delivery between two reference periods with
stable mean levels, introducing flexible models that permit year-to-year variation in
seasonal response would be advantageous for NOₓ series. Specifically, the variable
amplitude Fourier harmonic representation

    Sk,τk,i = Ak cos 2πτk,i + Bk sin 2πτk,i                   (7)


where k indexes the year and τk,i indexes the seasonal time of observation i in
year k, is often employed for variables with well defined seasonal response (e.g.,
El-Shaarawi et al., 1983). Because the summer minima are relatively stable from
year to year, Sk,τ should be fit to yearly periods beginning during the summer lows.
To assure continuity from one year to the next, we can place 'knots' at the summer
minima and formulate the seasonal model as a regression spline (Eubank, 1988).
Additionally, time weights can be introduced to neutralize the usual spring sampling
bias and robustness weights can be applied to reduce the effect of outliers. For long
NOₓ series like those at the Grand and Saugeen River sites, there exists a distinct
possibility that the seasonal model coefficients Ak, Bk can be linked empirically to
the respective mean level X̄k in year k, in which case long term NOₓ mass delivery
variations can readily be explored over a range of expected mean levels.
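A bare least squares fit of the year-indexed coefficients of equation (7) might be sketched as follows; in practice the regression spline formulation with knots at the summer minima, plus the time and robustness weights discussed above, would replace this version. All array names are hypothetical.

```python
import numpy as np

def fit_yearly_harmonics(year, tau, x):
    """Fit S = Ak*cos(2*pi*tau) + Bk*sin(2*pi*tau) plus a level, separately
    for each year k; returns {year: (level, Ak, Bk)}."""
    out = {}
    for k in np.unique(year):
        m = year == k
        X = np.column_stack([np.ones(m.sum()),
                             np.cos(2 * np.pi * tau[m]),
                             np.sin(2 * np.pi * tau[m])])
        coef, *_ = np.linalg.lstsq(X, x[m], rcond=None)
        out[k] = tuple(coef)
    return out
```

The fitted (Ak, Bk) pairs could then be regressed on the yearly mean levels to probe the empirical link suggested above.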

Figure 4. Nitrate trends, Saugeen River (concentration, mg/L as N, versus year, 1974-90).

SUMMARY
Nonpoint source agricultural pollution of waterways is a stochastic process driven
by episodic hydrometeorological events. Thus, riverine NPS mass delivery reduction
targets must be estimated as mean annual loads over a suitably long, representative
hydrological sequence. It is proposed to estimate the cumulative gradual impacts of
farm scale NPS remediation measures implemented in the headwater catchments of
southwestern Ontario Great Lakes tributaries by modelling and simulation of water
quality concentration time series. Because the crucial systematic model components
characterizing the river basin's hydrochemical response, namely discharge and sea-
sonal effects, have remained largely stable over the horizon of the available data
sets, probable mean annual NPS mass delivery reductions may be estimated using
water quality time series simulated to represent (1) pre-remediation, and (2) post-
treatment scenarios. It is expected that good results can be obtained by treating
water quality series as generalized additive models and fitting the systematic model
components by nonparametric smoothing filters. Preliminary trials with arbitrarily
chosen smoothers have been able to capture 50-75% of the variability in available
data series for the major nutrients phosphorus and total nitrates. Detailed inves-
tigations of sampling biases, secondary systematic effects and correlated residual
error structure embedded in the available data are required to obtain an optimal
model fit and ultimately to assure that good quality estimates of NPS mass delivery
reductions are available to provide policy makers with a means to assess progress
towards the Lake Erie phosphorus reduction targets and, generally, the success of
agricultural NPS remedial measures being applied in southwestern Ontario.

ACKNOWLEDGEMENT
The first author's efforts were in part supported by a Natural Sciences and Engi-
neering Research Council of Canada grant.
REFERENCES
Baker, D.B. (1988) Sediment, nutrient and pesticide transport in selected lower
Great Lakes tributaries, EPA-905/4-88-00l, U.S. Environmental Protection Agency,
Great Lakes National Program Office, Chicago.
Baker, D.B. (1992) "Roles of long-term chemical transport in river basin manage-
ment", In Abstracts, 13th Annual Meeting, November, 1992. Society of Environ-
mental Toxicology and Chemistry.
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly
spaced water quality time series", Environ. Monitoring Assessment, 12, 407-428.
Bodo, B.A. (1991a) "Trend analysis and mass-discharge estimation of atrazine
in southwestern Ontario Great Lakes tributaries: 1981-1989", Environ. Toxicol.
Chem., 10, 1105-1121.
Bodo, B.A. (1991b) TRENDS: PC software, user's guide and documentation for
robust graphical time series analysis of long term surface water quality records.
Environment Ontario, Toronto.
Bodo, B.A., and Unny, T.E. (1983) "Sampling strategies for mass-discharge estima-
tion", ASCE J. Environ. Eng. Div., 109, 812-829, 1984; "Errata and discussion",
110, 867-871.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter-
plots", J. Am. Stat. Assoc., 74, 829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: An approach
to regression analysis by local fitting", J. Am. Stat. Assoc., 83(403), 596-610.
Cleveland, W.S., and Grosse, E. (1991) "Computational methods for local regres-
sion", Statistical Computing, 1, 47-62.
Cleveland, W.S., Devlin, S.J., and Terpenning, I.J. (1979) "SABL: A resistant sea-
sonal adjustment procedure with graphical methods for interpretation and diagno-
sis", In A. Zellner (ed.) Seasonal Adjustment of Economic Time Series, U.S. Dept. of
Commerce, Bureau of the Census, Washington, D.C.
Cochran, W.G. (1977) Sampling Techniques, 2nd ed., Wiley, New York, NY.
Coote, D.R., MacDonald, E.M., Dickinson, W.T., Ostry, R.C., and Frank, R. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: I. Representa-
tive agricultural watersheds", J. Environ. Qual., 11(3), 473-481.
Dolan, D.M., Yui, A.K., and Geist, R.D. (1981) "Evaluation of river load estimation
methods for total phosphorus", J. Great Lakes Res., 7, 207-214.
El-Shaarawi, A.H., Esterby, S.R., and Kuntz, K.W. (1983) "A statistical evaluation
of trends in water quality of the Niagara River", J. Great Lakes Res., 9(2), 234-240.
Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression, Dekker,
New York.
Frank, R., Braun, H.E., Holdrinet, M.V.H., Sirons, G.J., and Ripley, B.D. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: V. Pesticide
use in the 11 agricultural watersheds and presence in stream water, 1975-1977",
J. Environ. Qual., 11(3), 497-505.
Hastie, T.J., and Tibshirani, R.J. (1990) Generalized Additive Models, Chapman
and Hall, London.
IJC (1983) Nonpoint source pollution abatement in the Great Lakes Basin: an
overview of post-PLUARG developments, International Joint Commission, Great
Lakes Regional Office, Windsor, Ontario.
IJC (1987) Revised Great Lakes Water Quality Agreement of 1978 as amended by
Protocol signed November 18, 1987, International Joint Commission, Great Lakes
Regional Office, Windsor, Ontario.
Miller, M.H., Robinson, J.B., Coote, D.R., Spires, A.C., and Draper, D.W. (1982)
"Agriculture and water quality in the Canadian Great Lakes Basin: III. Phospho-
rus", J. Environ. Qual., 11(3), 487-493.
Neilsen, G.H., Culley, J.L.B., and Cameron, D.R. (1982) "Agriculture and water
quality in the Canadian Great Lakes Basin: IV. Nitrogen", J. Environ. Qual., 11(3),
493-497.
McLeod, A.I., Hipel, K.W., and Camacho, F. (1983) "Trend assessment of water
quality time series", Water Resour. Bull., 19, 537-547.
McLeod, A.I., Hipel, K.W., and Bodo, B.A. (1991) "Trend analysis methodology
for water quality time series", Environmetrics, 2(2), 169-200.
Mosteller, F., and Tukey, J.W. (1977) Data Analysis and Regression: A Second
Course in Statistics, Addison-Wesley, Reading, MA.
Preston, D.S., Bierman Jr., V.J., and Silliman, S.E. (1989) "An evaluation of meth-
ods for the estimation of tributary mass loads", Water Resour. Res., 25, 1379-1389.
Richards, R.P., and Holloway, J. (1987) "Monte Carlo studies of sampling strategies
for estimating tributary loads", Water Resour. Res., 23, 1939-1948.
Richards, R.P., and Baker, D.B. (1993) "Trends in nutrient and suspended sediment
concentrations in Lake Erie tributaries, 1975-1990", J. Great Lakes Res., 19(2),
200-211.
Tin, M. (1965) "Comparison of some ratio estimators", J. Am. Stat. Assoc., 60,
294-307.
Wall, G.J., Dickinson, W.T., and Van Vliet, L.P.J. (1982) "Agriculture and wa-
ter quality in the Canadian Great Lakes Basin: II. Fluvial sediments", J. Envi-
ron. Qual., 11(3), 482-486.
Young, T.C., and DePinto, J.V. (1988) "Factors affecting the efficiency of some
estimators of fluvial total phosphorus load", Water Resour. Res., 24, 1535-1540.
DE-ACIDIFICATION TRENDS IN CLEARWATER LAKE NEAR
SUDBURY, ONTARIO, 1973-1992

BYRON A. BODO 1,2 and PETER J. DILLON 3


1Byron A. Bodo & Associates, 240 Markham St., Toronto, Canada M6J 2G6
2Department of Statistical and Actuarial Sciences
The University of Western Ontario, London, Ontario, Canada N6A 5B7
3Dorset Research Centre, Ontario Ministry of Environment
P.O. Box 39, Dorset, Ontario, Canada P0A 1E0

Historically, Clearwater Lake on the Precambrian Shield near Sudbury, Ontario, has
received significant acid deposition from local smelters and remote sources. From
1.46 x 10⁶ tonnes (t) in 1973, local SO₂ emissions fell to 0.64 x 10⁶ t in 1991. To assess
lake response, temporal trends were examined for 26 water quality variables with
records dating from 1973-1981. But for brief drought-induced reversals, aqueous
SO₄²⁻ fell steadily from 590 µeq/L in 1973 to under 320 µeq/L in 1991-92, while pH
rose from 4.3 in 1973 to exceed 5 in May 1992 for the first time on record. Dispropor-
tionate lake response to local SO₂ emission reductions suggests that remote-source
acid deposition is an important determinant of Clearwater Lake status. Chloride
adjusted base cation, Al and Si trends mirror SO₄²⁻ trends, indicating that geochemi-
cal weathering is decelerating as acid deposition declines. Lake levels of toxic metals
Cu and Ni derived from local smelter emissions seem to have fallen appreciably in
recent years and there has been a small surge in biological activity that may have
declined abruptly in 1991. With its unique long term record, continued sampling in
Clearwater Lake is advised to monitor the success of local and remote SO₂ emission
reduction commitments in the U.S. and Canada.

INTRODUCTION
For nearly a century terrestrial and aquatic ecosystems in the Sudbury area of
northeastern Ontario have suffered the consequences of mining and smelting of
nickel and copper bearing sulphide ores. Reports in the 1960s of soils and acidic
surface waters (Gorham and Gordon, 1960a,b) with elevated levels of metals Cu
and Ni (Johnson and Owen, 1966) prompted comprehensive limnological studies of
Sudbury area lakes begun in 1973 (Dillon, 1984; Jeffries, 1984; Jeffries et al., 1984).
Clearwater Lake, in a small headwater catchment 13 km south of the large nickel
smelter in Copper Cliff (Figure 1) was maintained as an unmanipulated control for
neutralization and fertilization experiments conducted in downstream lakes. The
past two decades have seen substantial reductions in local smelter emissions of acid
precursor sulphur dioxide (S02) and metals Cu and Ni. Pre-1972 mean annual
S02 emissions of 2.2 x 106 tonnes (t) fell to a mean of 1.41 x 106 t/yr over 1973-
1977 (Figure 2). Low emissions over 1978/79 and 1982/83 reflect extended smelter
[Map showing smelter sites (Falconbridge, Coniston (inactive)) and Clearwater Lake, with a 0-10 km scale bar and the 81°00' meridian.]
Figure 1. Location Map, Sudbury and Clearwater Lake.

shutdowns. SO₂ emissions have declined gradually from 0.767 x 10⁶ t/yr in 1984 to
0.642 x 10⁶ t/yr in 1991, for an average yearly decrease of 17,900 t, at which rate
target emissions of 0.47 x 10⁶ t/yr, or half the 1980 level, should be reached in about
10 years.
The Sudbury area also receives acid deposition from long range southerly air
flows. Though quantitative estimates vary, eastern North American sulphur emis-
sions peaked about 1970, declined significantly to about 1982 and then stabilized
through 1988 (Dillon et al., 1988; Husar et al., 1991). Bulk deposition data south of
Sudbury in Muskoka-Haliburton for 1975-1988 (Dillon et al., 1988; OMOE, 1992)
and west of Sudbury in the Algoma highlands for 1976-1985 (Kelso and Jeffries,
1988) reflect the general eastern North American emission trends. As local and
remote S02 emissions have declined, water quality has improved in the Sudbury
region's acid-stressed lakes (Dillon et al., 1986; Gunn and Keller, 1990; Keller and
Pitblado, 1986; Keller et al., 1986; Hutchinson and Havas, 1986); however, some
lakes remain beset by chronic acidity and heavy metal concentrations that would
likely prohibit the recovery of desirable aquatic biota. Recently, Keller et al.
(1992) noted that ongoing de-acidification trends in Sudbury area lakes reversed
following drought in 1986/87. In addition to dry acid deposition that accumulates
steadily throughout dry periods, drying and exposure to air induces re-oxidation
of reduced sulphur retained in normally saturated zones of the catchment. Thus,
when it resumes, normal precipitation can generate substantial acid loads. Keller
et al. present data for Swan Lake in the watershed neighbouring Clearwater
Lake that shows drought related acidification began as the drought ended in late
1987, was most prominent in 1988 and had begun to subside in 1989.
Spanning nearly 20 years, Clearwater Lake records offer a unique, ongoing
chronicle of de-acidification processes. The primary objectives of the present work
were to rigorously review the entire historical data base from 1973, to update the
chemical status of Clearwater Lake, last reported by Dillon et al. (1986) and to
examine trophic indicators for signs of incipient biological recovery.

SITE CHARACTERISTICS AND WATER QUALITY DATA BASE


Clearwater Lake hydrology and morphometry are summarized in Table 1. The lake
stratifies near the end of May and turns over in late September. During the 1970s,
the hypolimnion remained oxic with dissolved oxygen declining to 2-4 mg/L by
late summer. Normally, ice cover forms in early December and breaks between mid
March and mid April. The influent drainage area comprises 82% exposed alumino-
silicaceous bedrock (quartzite, gabbro, gneiss) with the remainder as thin till (10%),
peat (5%) and ponds (3%). The lake margin supports some cottage development.
Two of the four influent subcatchments are impacted by road salting. Detailed
descriptions can be found in Jeffries et al. (1984) and OMOE (1982).
Sudbury area annual precipitation trends are shown in Figure 3. The 1986/87
drought is the most significant extended dry period since the early 1960s. Below
normal precipitation occurred from July 1986 to October 1987 coincidentally with
extraordinarily high temperatures that extended into 1988, one of the wettest years
on record. A lesser drought was observed in 1975/76. Cumulative precipitation
deficits from long term norms were 147 mm (1976/77) and 286 mm (1986/87).
Except for 1982, 1977-1985 precipitation was well above long term norms.
Clearwater Lake water quality records extend from June 1973 to May 1992 for
the 26 water quality variables in Table 2. Sample counts were determined after
averaging quality control replicates. In the 1970s, 13-24 samples per year were
obtained mainly during open water season. Approximately monthly sampling was
maintained through the 1980s. Secchi depth and chlorophyll a are obtained only in
open water season. Due to severe fiscal constraints, sampling has all but ceased as
of 1992. In the 1970s, samples were analyzed in Sudbury and Toronto labs (OMOE,
1982). Over 1980/81 Dorset Research Centre assumed responsibility for perishable
parameters including pH, alkalinity, and nutrients, while other analyses continued
to be performed in Toronto. Dorset lab and field methodologies are described by
Locke and Scott (1986). Some excessively crude 1973-1977 ion (Ca, Mg, Na, K, Cl)
measurements were excluded from analysis.

DATA VALIDATION
Ionic species data were examined for consistency with (a) carbonate equilibria,
(b) electroneutrality, and (c) equivalent conductance. Validation exercises were
generalized as tests of agreement between two independently determined quantities
within the framework of robust regression analysis, specifically, Rousseeuw and
Leroy's (1987) 'high breakdown' Least Median of Squares [LMS] regression which
can withstand contamination by up to 50% outliers. Resistance to outliers imparts
greater confidence to the results and means to assess the frequency of measurement
errors - an important aspect of retrospective data analysis.
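For orientation, an LMS line fit can be approximated by the standard resampling device of trying many random two-point lines and keeping the one with the smallest median squared residual; the sketch below is illustrative and is not the PROGRESS code of Rousseeuw and Leroy.

```python
import numpy as np

def lms_line(x, y, n_trials=2000, seed=None):
    """Approximate least median of squares regression for a line y = a + b*x
    by random two-point resampling."""
    rng = np.random.default_rng(seed)
    best_crit, a_best, b_best = np.inf, 0.0, 0.0
    n = len(x)
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue                                # degenerate pair
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)      # median squared residual
        if crit < best_crit:
            best_crit, a_best, b_best = crit, a, b
    return a_best, b_best
```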

TABLE 1. Clearwater Lake morphometry and 1977-78 water balance.

Drainage area   Lake area   Total area   Lake volume       Mean depth   Maximum depth   Shoreline length
342 ha          76.5 ha     419 ha       6.42 x 10⁶ m³     8.4 m        21.5 m          5.0 km

Precipitation   Evapotranspiration   Mean outflow   Retention time
918 mm          388 mm               64.5 L/s       3.2 yr

TABLE 2. Clearwater Lake water quality record summary.

Variable            1st year   Samples     Variable         1st year   Samples
Conductance         1973       238         Fe               1975       193
pH                  1973       231         Mn               1975       193
Gran alkalinity     1980       114         Cu               1975       150
Total alkalinity    1979       95          Ni               1975       150
Ca*                 1977       169         Zn               1975       169
Mg*                 1977       165         DIC              1981       105
Na*                 1974       213         DOC              1981       109
K*                  1975       191         NO3              1973       222
Cl*                 1975       186         NH4              1973       232
SO4                 1973       217         TKN              1973       238
F                   1983       97          P                1973       233
Si                  1973       226         Chlorophyll a    1973       204
Al                  1975       175         Secchi depth     1973       212

* early records considered unreliable

Carbonate Equilibria
At the low pHs ranging from 4 to 5 in Clearwater Lake over 1973-1992, the definition
of Acid Neutralizing Capacity (ANC) in fresh surface waters simplifies to

    ANC = −[H⁺]                                              (1)

where concentrations are expressed in equivalents. Virtual 1:1 correspondence be-
tween concurrent Gran alkalinity (ALKG) and negative hydrogen ion data (Figure
4) confirms that ALKG accurately represents Clearwater ANC. Scatter in the lower
left quadrant is due to the earliest ALKG measurements of 1980/81. Robust regres-
sion between total alkalinity ALKT and −[H⁺] yields similar results. The relation

Figure 2. Sudbury smelter SO₂ annual emissions, 1960-1991.

Figure 3. Annual precipitation deviations from norms (mm); Sudbury airport, 1956-1990.

Figure 4. Gran alkalinity versus −[H⁺] (µeq/L) with LMS regression line.

Figure 5. Equivalent versus measured conductance (µS/cm), 1977-1992.
between the two alkalinity measurements is less precise than their respective rela-
tions with hydrogen ion; however, with ALKT as independent variable, the intercept
and slope coefficients [-31.2, 0.731] are remarkably close to the theoretical approxi-
mation [-31.1, 1.0] of Harvey et al. (1981, Appendix V) for acidified Precambrian
Shield waters.

Electroneutrality and Equivalent Conductance


By electroneutrality, C⁺ = C⁻ where C⁺ and C⁻ are the respective sums of posi-
tively and negatively charged species, and ANC may be defined as ANCE = CB − CA
where CB are base cations and CA are strong acid anions. For Clearwater Lake,

    C⁺ = [H⁺] + [Ca²⁺] + [Mg²⁺] + [Na⁺] + [K⁺] + [NH₄⁺] + [Alᵐ⁺] + [Fe²⁺] + [Mn²⁺]
       = [H⁺] + CB                                           (2a)

and

    C⁻ = [SO₄²⁻] + [NO₃⁻] + [Cl⁻] + [F⁻] = CA                (2b)

where all [·] are analytical total concentrations in µeq/L except [Alᵐ⁺], which should
be total monomeric Al concentration. Available data are for total aluminum; however,
supplementary speciation data suggest that most Al in Clearwater Lake samples is
inorganic monomeric. In (2a), net charge m of monomeric Al species is unspecified.
According to speciation models (e.g., Schecher and Driscoll, 1987), below pH 4 most
monomeric Al exists as Al³⁺. As pH rises from 4 to 5 - the change in Clearwater
Lake from 1977-1992 - the Al³⁺ fraction gives way to complexes that yield a net
Al charge approaching 2. A compromise value of m = 2.5 was selected.
Data quality was investigated via the relationships between C⁺ and C⁻, and
between ANCE and titrated ALKG. A set of 172 'complete' ionic records for 1977-
1992 was prepared by substituting time series model estimates (see Trend Analysis
section) for missing observations if no more than two of the major species (Ca²⁺,
Mg²⁺, Na⁺, Cl⁻, SO₄²⁻) were absent. Before 1983, fluoride was estimated as [F⁻] =
1.57 + 0.0181[SO₄²⁻] where [·] are µeq/L. Given the numerous potential sources
of measurement error, the Clearwater charge balance for 1977-1991 shows good
agreement with only a few outliers. Because ANCE estimates at Clearwater Lake pH
comprise largely the cumulative measurement error of 13 ionic species, the ANCE-
ALKG relationship was not especially insightful.
Equivalent conductance (Laxen, 1977) was calculated for 167 'complete' 1977-
1992 ionic records with concurrent specific conductance measurements. Figure 5,
the plot of equivalent conductance against independently measured conductance
with robust LMS regression line, shows good agreement but for several, mostly pos-
itive, outliers occurring mainly in 1977 when pHs fell to about 4, the lowest recorded
observations. Were true pH 4.2, an incorrect measurement of 4 would overestimate
[H⁺] equivalent conductance by 13 µS/cm, which would explain the conductance dis-
crepancies of 1977. Regression of 111 records for 1981-1992 reveals that since 1981,
about 3.6% of ionic records contain potentially erroneous measurements; however,
in terms of total measurements performed, the error frequency is well below 1%.
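A sketch of the equivalent conductance computation follows; the limiting equivalent ionic conductances are standard 25 °C handbook values quoted from memory, and an implementation should verify them against Laxen (1977).

```python
# Limiting equivalent ionic conductances at 25 C, S cm^2/eq (handbook values
# quoted from memory; verify before use).
LAMBDA = {'H': 349.8, 'Ca': 59.5, 'Mg': 53.1, 'Na': 50.1, 'K': 73.5,
          'NH4': 73.5, 'SO4': 80.0, 'NO3': 71.4, 'Cl': 76.3, 'F': 55.4}

def equivalent_conductance(conc_ueq):
    """Equivalent conductance in uS/cm from ion concentrations in ueq/L,
    for comparison against the measured specific conductance."""
    return sum(LAMBDA[ion] * c for ion, c in conc_ueq.items()) / 1000.0
```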

TEMPORAL TREND ANALYSIS


Trend analyses of the Clearwater Lake records were accomplished by robust, graph-
ically oriented procedures for characterizing nonmonotonic (reversing) trends in
irregular quality time series with seasonal periodicity (Bodo, 1991). Data series are
conceptualized as either the linear additive processes

    Zi = Ti + εi                                             (3a)
    Zi = Ti + Si + εi                                        (3b)

where index i corresponds to sample time ti and Zi is the data series comprising
trend Ti representing the temporally local level of the constituent, optionally sea-
sonal Si representing stable annually recurrent phenomena, and a residual noise εi.
Zi may be either raw concentrations Ci, or an appropriate skew reducing transform
of the raw series. Time trends in the sense of changes in Ti over time and the signifi-
cance of seasonal cycle Si are judged by visual inspection of graphics and statistical
diagnostics including the seasonal Mann-Kendall test (Hirsch and Slack, 1984).
Often, it is most expedient to initially assume seasonal adjustment model (3b)
which is solved iteratively for Ti and Si, and revert to the simple model (3a) if
seasonal effects are negligible. With model (3b) plots are generated of (a) trend Ti
superimposed on the seasonally adjusted series Zi-S;, and (b) the relative seasonal
Si against the de-trended series Zi-Ti. Numerically, T; is fit by modified LOWESS
smoothing filters (Cleveland, 1979) controlled to approximate short term year-to-
year trends. Better results for a few variables (DOC, Secchi depth, Chlorophyll
a) were obtained with a companion algorithm that fits Ti as the calendar year
medians of de-seasonalized data in which case Ti plots as a step function. Seasonal
Si is a zero centred function fit by a fixed bandwidth running median filter (Bodo,
1990) applied to de-trended series reordered by date within the year. Filter design
minimizes distortions introduced by seasonally variable sampling density.

Trend Results: pH, SO₄²⁻ and related minerals


Increasing pH and declining SO₄²⁻, reported by Dillon et al. (1986) to the end of
1985, have continued (Figure 6). Over 1977-1987, SO₄²⁻ declined at a net rate of 23
µeq/L per annum, while pH rose at a net rate of .06 units per year. After the 1986/87
drought, trends reversed during 1987/88, and since have continued as before. Sul-
phate seems to be declining at the same rate as over 1978-1987, but the rate of
pH increase accelerated significantly in 1991, so that the final May 1992 sample
exceeded 5 for the first time in the historical record. Total and Gran alkalinity behave
almost identically to pH. Though hydrologically less severe, drought acidification
effects during 1976/77 seem to have been stronger than in the 1987/88 episode.
Greater dry acid deposition from local and remote sources over 1976/77 was likely
responsible. Suspiciously low 1977 pH readings exaggerate effects on pH.
Declining acid deposition should decelerate chemical weathering of the water-
shed's alumino-silicaceous terrain, which should yield reduced aqueous levels of pri-
mary base cations (Ca²⁺, Mg²⁺, Na⁺, K⁺), Al, Si, and Mn (Dillon, 1983; Jeffries
et al., 1984) that mimic the SO₄²⁻ trend. Neutral road salt (NaCl, CaCl₂) applica-
tions have masked the expected base cation trends, but a plot (Figure 7) of base
cations adjusted by subtracting chloride equivalence, i.e., C = [Ca²⁺] + [Mg²⁺] +
[Na⁺] + [K⁺] − [Cl⁻], mirrors the SO₄²⁻ trend, including the 1987/88 drought-induced
reversal. Similar trends are strongly evident in Al and Si, and weakly in Mn and F.
Collective ionic trends between 1977/78 and 1990/91 are summarized in Figure 8.
Substantial decreases in H⁺, Alᵐ⁺ and SO₄²⁻ charge equivalence are compensated
by road salt constituents Ca²⁺, Na⁺ and Cl⁻, so that ionic strength is virtually
identical for the two periods.
Trend Results: Heavy Metals
Heavy metals Cu, Fe, Ni and Zn have been significant components of Sudbury
smelter emissions (Hutchinson and Whitby, 1977; Chan et al., 1984). While Cu, Fe
and Ni are emitted as coarse particulate that deposits near the source, Zn emissions
are fine particulates prone to wider dispersal. Concern focuses mainly on 'toxic
metals' Cu and Ni which are present at significantly elevated levels in precipitation
near the Sudbury smelters (Jeffries and Snyder, 1981; Chan et al., 1984). Cu and
Ni trends (Figure 9) generally parallel the declining SO₄²⁻ levels except for the
1987/88 drought-induced reversal. Recent 1990/91 data show substantial decreases
in lake concentrations to 15-20 µg/L Cu and 70 µg/L Ni; however, since 1989,
measurements are too few to consider these encouraging results conclusive. Zn
shows a similar but less dramatic decline, as levels have fallen from near 50 µg/L in
1976 to 10-15 µg/L in 1991. Other than a late 1970s decline that may be related to
reduced smelter emissions, Fe shows neither appreciable trend nor correlation with
any other time series.
Trend Results: N, P, DOC, Chlorophyll and Secchi depth
Trends for total N [TN = TKN + NO₃⁻], organic N [ON = TKN − NH₄⁺], and inorganic
N [IN = NO₃⁻ + NH₄⁺] are overlaid on Figure 10. Over 1973-1985, TN fell from 250 to
100 µg/L, driven by declining IN as NO₃⁻ dropped from 120 to 25 µg/L and NH₄⁺
fell from 50 to 15 µg/L. The cause is unclear, as N has not been associated with smelter
emissions (Chan et al., 1984) and no significant long range emission or deposition
trends have been noted since the early 1970s (Dillon et al., 1988; Husar et al.,
1991; OMOE, 1992). From 1984, TN climbed back to over 150 µg/L, mostly due
to a rise in ON to 100 µg/L over 1989/90, indicating increased biological fixation
that declined somewhat in 1991. Organic N correlates strongly with DOC, whose
1981-1987 median level of .4 mg/L rose to .75 mg/L over 1988-1991, a small but
statistically significant increase (Figure 11).
Phosphorus has shown little significant trend except for a decline from a pre-
1979 mean of 4.5 µg/L to a mean of 2.7 µg/L since then that is likely an artefact
of improved measurement technology. Echoing recent organic N trends, after the
1986/87 drought P rose slightly in mean to 3.3 µg/L over 1989/90 and declined
to 1.7 µg/L in 1991. Except for a 1973 high of 1.25 mg/L and a 1991 record low
of .2 mg/L, annual median chlorophyll a concentrations have varied from .5-.85
mg/L about the long term mean of .66 mg/L with no appreciable trend. Historically,
annual median Secchi depth varied in the range 7-10 m about the long term mean of
8.6 m without perceptible trend; however, the 1991 annual median rises to 11.1 m,
matching a previously recorded high in 1973. Collectively, this suite of variables
suggests that, in the late 1980s, there was a modest increase in primary biological
productivity that may have fallen off abruptly in 1991.

Figure 6. pH and SO₄²⁻ (dashed line) trends, 1973-92.

Figure 7. Chloride adjusted base cation and aluminum (dashed line) trends.

Figure 8. Ion distribution diagram, 1977-78 and 1990-91.

Figure 9. Ni and Cu (dashed line) trends, 1975-92.

Figure 10. Organic, inorganic and total nitrogen trends, 1973-92.

Figure 11. DOC trends, 1981-92.

Figure 12. Total chlorophyll a trends, 1973-92.

Figure 13. Inorganic N seasonal plot (seasonal date as fraction of year).

SEASONAL CYCLES
For most Clearwater time series, strong chronological trends overwhelm season-
ality; nonetheless, examination of Si provides some useful insights into seasonal
biogeochemical cycling. Figure 13 shows the strong, distinctive seasonal function
for inorganic N that peaks between mid March and mid April (.2 yr) just before
ice break-up, drops quickly to the end of May (.4 yr) just before stratification, de-
clines more gradually to its annual minimum near the end of August (.7 yr), and
then rises steadily to the late winter peak. The declining phase reflects biomass
assimilation during the growing season, and the rising phase reflects IN release by
decaying organic debris. The rapid loss of IN between ice break-up and stratifica-
tion is mainly as NH₄⁺, which then remains relatively stable until autumn overturn.
In contrast, NO₃⁻ declines gradually from ice break-up to late August. Similarly,
NH₄⁺ was preferentially consumed over NO₃⁻ early in the growing season during
fertilization experiments (Yan and Lafrance, 1984) in downstream lakes. The net
IN seasonal amplitude for 1973-1992 is 60 µg/L, to which NH₄⁺ and NO₃⁻ contribute
equally. Seasonal amplitude of IN species is most likely level dependent. Analysis
of the log transformed IN series suggests that at recent IN levels of 50 µg/L the
expected seasonal amplitude is about 35 µg/L.
Other variables also show perceptible seasonal variation. The mineral ions
(Ca²⁺, Mg²⁺, Na⁺, K⁺, Mn²⁺, SO₄²⁻, Cl⁻, F⁻) and conductance exhibit a uni-
modal pattern with a broad peak extending from ice formation (mid December) to
break-up (mid March to mid April) followed by rapid decrease to the annual low
about mid May that is likely the result of dilution by low ionic strength snowmelt
water. Though the net amplitude is only .4 mg/L, silica has a strong cycle with a
broad peak from ice break-up to stratification followed by a decline to a late sum-
mer low. Aluminum has shown a generally similar pattern. Fe has weak bimodal
seasonality with a first higher peak that occurs after ice break-up and persists into
stratification before falling to a mid summer low followed by a smaller secondary
peak at autumn turn-over.

SMELTER SO₂ EMISSIONS AND CLEARWATER LAKE STATUS


The link between lake status and smelter SO₂ emissions was examined with sim-
ple regression analogues of time series transfer function models of form (a) Yi =
α₀ + α₁Xi−k, and (b) Yi = β₀ + β₁Yi−1 + β₂Xi, where Xi is either SO₂ or log₁₀ SO₂
emissions lagged k = 0-2 years, and Yi is annual median H⁺, pH or SO₄²⁻.
For model (a), the current year's S02 emissions gave the best overall predictive
capability; however, predicted lake response disagreed significantly with observed
response for 1977 - a year of drought effects and suspiciously low pH data - and
the smelter shut down years of 1978/79 and 1982/83. Using the previous year's SO₂
emissions as the independent variable gives better predictions during smelter shut down
years, but the quality of predictions declines in other years. In particular, lake pH is
increasingly underpredicted since 1985. Regression with both current and previous
years' SO₂ emissions is not significant. Model form (b), using the previous year's
lake response as an independent variable, yields better results, but again, abnor-
mal years were identified: (1) 1974, for which surficial grab sample mean pH and
SO₄²⁻ concentrations from OMOE (1982) were employed, (2) 1978, for which pH is
underpredicted due to suspiciously low 1977 pH, (3) 1987, for which pH and SO₄²⁻
are poorly predicted due to drought effects, and (4) 1991, for which pH is underpre-
dicted. Forecasts of future steady state lake response under constant annual target
Sudbury smelter SO₂ emissions of 0.47 x 10⁶ t degenerate to implausible predictions
2-6 years forward of the 1990-91 Clearwater Lake median concentrations used as
initial values.
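Model form (b) is an ordinary lagged regression; a minimal sketch with hypothetical arrays of annual medians follows.

```python
import numpy as np

def fit_model_b(y, x):
    """OLS fit of Y_i = b0 + b1*Y_{i-1} + b2*X_i, with y the annual median
    lake response and x the annual SO2 emissions."""
    Y, Ylag, X = y[1:], y[:-1], x[1:]
    A = np.column_stack([np.ones(len(Y)), Ylag, X])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return beta, Y - A @ beta                      # coefficients, residuals
```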
At 19 years, annual lake concentration series are too brief to reliably fit a
forecasting model; however, some tentative conclusions emerge. To varying degrees,
statistical model residuals and plots of lake concentrations against SO₂ emissions
suggest that the long term decline in Clearwater H⁺ and SO₄²⁻ levels has been greater
than expected if lake status were governed by simple linear dynamical response to
local SO₂ emissions. Aerometric and bulk deposition data of the late 1970s (Scheider
et al., 1981; Chan et al., 1984) suggested that beyond the immediate vicinity of the
smelters (>5 km; Jeffries, 1984) acid deposition was dominated by remote sources.
Thus the regression results suggest that much of the drop in Clearwater SO₄²⁻ and
H⁺ over 1973-1985 was attributable to the general decline in North American SO₂
emissions over 1970-1982. Reasons for the continuing recent decline in Clearwater
SO₄²⁻ and H⁺ cannot be assessed until concurrent bulk deposition figures become
available at sites remote from the Sudbury smelters.

SUMMARY
Clearwater Lake continues to respond favourably to declining acid deposition from
local and remote sources, and declining heavy metal emissions from local smelters.
In May 1992, a pH reading above 5 was observed for the first time in recorded
water quality history and the lake is poised to experience significant biological re-
covery as further emission controls are implemented. Declining levels of chloride
adjusted base cations, aluminum, silica, manganese, and fluoride confirm that min-
eral weathering rates have decelerated. Concentrations of toxic metals Cu and Ni
have fallen appreciably over 1990/91. Since 1988, a small surge in biological activ-
ity occurred that appears to have declined abruptly in 1991 as indicated by DOC,
organic N, P, chlorophyll and Secchi depth data. Though droughts of 1975/76 and
1986/87 induced brief reversals of de-acidification trends, Clearwater Lake is rela-
tively drought resistant. Comparable data for neighbouring Swan Lake (Keller and
Pitibaldo, 1992) reveal that some Sudbury area waters may remain at serious risk
from episodic drought induced re-acidification and metal toxicity for some time after
acid emission targets are achieved. Clearwater Lake acid-base status has improved
disproportionately relative to local smelter SO₂ emission reductions, supporting in-
dications that remote source acid deposition is an important determinant of surface
water status in the Sudbury area and that further improvements depend on re-
duced acid deposition from both local and remote sources. Maintaining adequate
surface water monitoring in an era of severe fiscal restraint presents an immediate
challenge. Metal analyses for 1990/91 are so sparse that the ability to characterize
ambient levels and time trends within practical time horizons is severely jeopardized.
With its unique long term record of de-acidification processes in an unmanipulated
headwater catchment, Clearwater Lake ranks foremost among Sudbury area sites
for continued surveillance to judge the success of remedial actions implemented in
Canada and the U.S. through the forthcoming decade.

ACKNOWLEDGEMENT
The first author's efforts were supported by a research grant from Limnology Sec-
tion, Water Resources Branch, Environment Ontario.

REFERENCES
Bodo, B.A. (1989) "Robust graphical methods for diagnosing trend in irregularly
spaced water quality time series" , Environ. Monitoring Assessment 12,407--428.
Bodo, B.A. (1991) TRENDS: PC-Software, users guide and documentation for ro-
bust graphical time series analysis of long term surface water quality records, On-
tario Ministry of the Environment, Toronto.
Chan, W.H., Vet, R.J., Ro, C., Tang, A.J., and Lusis, M.A. (1984) "Impact of Inco
smelter emissions on wet and dry deposition in the Sudbury area", Atmos. Environ.,
18(5), 1001-1008.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter-
plots", J. Am. Stat. Assoc., 74(368), 829-836.
Dillon, P.J. (1983) "Chemical alterations of surface waters by acidic deposition in
Canada", p. 275-286, In Ecological Effects of Acid Deposition, National Swedish
Environment Protection Board, Report PM 1636.
Dillon, P.J. (1984) "The use of mass balance models for quantification of the ef-
fects of anthropogenic activities on lakes near Sudbury, Ontario", p. 283-347, In
J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New York.
Dillon, P.J., Reid, R.A., and Girard, R. (1986) "Changes in the chemistry of lakes
near Sudbury, Ontario following reductions of S02 emissions", Water Air Soil Pol-
lut., 31, 59-65.
Dillon, P.J., Lusis, M., Reid, R., and Yap, D. (1988) "Ten-year trends in sulphate,
nitrate and hydrogen ion deposition in central Ontario", Atmos. Environ., 22, 901-
905.
Gorham, E., and Gordon, A.G. (1960a) "Some effects of smelter pollution northeast
of Falconbridge, Ontario", Can. J. Bot., 38, 307-312.
Gorham, E., and Gordon, A.G. (1960b) "The influence of smelter fumes upon the
chemical composition of lake waters near Sudbury, Ontario, and upon the surround-
ing vegetation", Can. J. Bot., 38, 477-487.
Gunn, J.M., and Keller, W. (1990) "Biological recovery of an acid lake after reduc-
tions in industrial emissions of sulphur", Nature, 345, 431-433.
Harvey, H., Pierce, R.C., Dillon, P.J., Kramer, J.R., and Whelpdale, D.M. (1981)
Acidification in the Canadian aquatic environment, Pub. NRCC No. 18475 of the
Environmental Secretariat, National Research Council of Canada, Ottawa.
Hirsch, R.M., and Slack, J.R. (1984) "A nonparametric trend test for seasonal data
with serial dependence", Water Resour. Res., 20, 727-732.
Husar, R.B., Sullivan, T.J., and Charles, D.F. (1991) "Historical trends in atmo-
spheric sulfur deposition and methods for assessing long-term trends in surface water
chemistry", p. 65-82, In D.F. Charles [ed.] Acid Deposition and Aquatic Systems,
Regional Case Studies. Springer-Verlag, New York.
Hutchinson, T.C., and Havas, M. (1986) "Recovery of previously acidified lakes
near Coniston, Canada following reductions in atmospheric sulphur and metal emis-
sions", Water Air Soil Pollut., 29, 319-333.
Hutchinson, T.C., and Whitby, L.M. (1977) "The effects of acid rainfall and heavy
metal particulates on a boreal forest ecosystem near the Sudbury smelting region
of Canada", Water Air Soil Pollut., 7, 123-132.
Jeffries, D.S. (1984) "Atmospheric deposition of pollutants in the Sudbury area",
p. 117-154, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley, New
York.
Jeffries, D.S., Scheider, W.A., and Snyder, W.R. (1984) "Geochemical interactions
of watersheds with precipitation in areas affected by smelter emissions near Sudbury,
Ontario", p. 195-241, In J. Nriagu [ed.] Environmental Impacts of Smelters, Wiley,
New York.
Jeffries, D.S., and Snyder, W.R. (1981) "Atmospheric deposition of heavy metals
in central Ontario", Water Air Soil Pollut., 15, 127-152.
Johnson, M.G., and Owen, G.E. (1966) Report on the biological survey of streams
and lakes in the Sudbury area, 1965, Ontario Water Resources Commission,
46 pp.
Keller, W., Pitblado, J.R., and Carbone, J. (1992) "Chemical responses of acidic
lakes in the Sudbury, Ontario, area to reduced smelter emissions, 1981-89", Can.
J. Fish Aquat. Sci., 49(Suppl. 1), 25-32.
Keller, W., and Pitblado, J.R. (1986) "Water quality changes in Sudbury area
lakes: a comparison of synoptic surveys in 1974-76 and 1981-83", Water Air Soil
Pollut., 29, 285-296.
Keller, W., Pitblado, J.R., and Conroy, N.I. (1986) "Water quality improvements
in the Sudbury, Ontario, Canada area related to reduced smelter emissions", Water
Air Soil Pollut., 31, 765-774.
Kelso, J.R.M., and Jeffries, D.S. (1988) "Response of headwater lakes to varying at-
mospheric deposition in north central Ontario, 1979-1985" , Can. J. Fish Aquat. Sci.,
45, 1905-1911.
Laxen, D.P.H. (1977) "A specific conductance method for quality control in water
analysis", Water Res., 11, 91-94.
Locke, B.A. and Scott, L.D. (1986) Studies of Lakes and Watersheds in Muskoka-
Haliburton, Ontario: Methodology (1976-1985), Ontario Ministry of the Environ-
ment, Data Rep. DR-86/4, Dorset, Ontario, Canada.
OMOE (1982) Studies of lakes and watersheds near Sudbury, Ontario: final limno-
logical report, supplementary volume 10, Sudbury Environmental Study, Ontario
Ministry of the Environment, Toronto.
OMOE (1992) Summary: some results from the APIOS atmospheric deposition
monitoring program (1981-1988). Environment Ontario, Toronto.
Rousseeuw, P.J., and Leroy, A.M. (1987) Robust Regression and Outlier Detection.
Wiley, New York.
Schecher, W.D., and Driscoll, C.T. (1987) "An evaluation of uncertainty associated
with aluminum equilibrium calculations", Water Resour. Res., 23(4), 525-534.
Scheider, W.A., Jeffries, D.S., and Dillon, P.J. (1981) "Bulk deposition in the Sud-
bury and Muskoka-Haliburton areas of Ontario during the shutdown of Inco Ltd in
Sudbury" , Atmos. Environ., 15, 945-956.
Yan, N.D., and Lafrance, C. (1984) "Responses of acidic and neutralized lakes
near Sudbury, Ontario, to nutrient enrichment", p. 457-521, In J. Nriagu [ed.]
Environmental Impacts of Smelters, Wiley, New York.
PART VI

SPATIAL ANALYSIS
MULTIVARIATE KERNEL ESTIMATION OF FUNCTIONS OF SPACE
AND TIME HYDROLOGIC DATA

U. LALL, Utah Water Research Laboratory, Utah State Univ., Logan, UT 84322-8200.
K. BOSWORTH, Dept. of Mathematics, Idaho State Univ., Pocatello, ID 83209-8500.

A nonparametric methodology for exploring multivariate hydrologic data is presented. A
multivariate kernel estimator for estimating the joint probability density function of a set of
random variables is developed. A multivariate Gaussian kernel is used with its covariance
matrix specified through a truncated singular value decomposition of a robust, local data
covariance matrix. Estimators for conditional probabilities and expectations are also
presented. An application to data from the Great Salt Lake is presented.

INTRODUCTION

Statistical estimation problems of interest to hydrologists include spatial interpolation (e.g., of groundwater levels or rainfall), assessment of space/time trends (e.g., in contaminant concentration data), functional dependence between different parameters (e.g., rainfall and runoff), and generation of stochastic time series with attributes similar to the data (e.g., monthly streamflow into a reservoir). A basic building block for such analyses is the joint probability density function (p.d.f.) of two or more variables.
Traditionally, hydrologists have explicitly or implicitly fit parametric (usually in a Gaussian framework) probability models to the available data. Such an approach can be parsimonious, expedient and efficient if the correct parametric structure is chosen fortuitously. The latter is not easily verified. Techniques that reveal the structure of the observed spatial or temporal field from the relatively sparse data are desirable. A refreshing alternative to the traditional parametric approaches was provided in the applications of nonparametric regression and density estimation to hydrologic problems by Yakowitz (1985), Yakowitz and Szidarovszky (1985), and Karlsson and Yakowitz (1987a and b). A perusal of the statistical literature shows that nonparametric statistical estimation using splines, kernel functions, nearest neighbor methods and orthogonal series methods is one of the most active and exciting areas in the field, with major developments still unfolding. Most of the statistical literature on the subject has a theoretical flavor and is concentrated on the univariate case. Exceptions are Silverman (1986) and Scott (1992). A pragmatic, multivariate kernel function estimator that is effective in moderate (3 to 5) dimensional settings for the estimation of the joint p.d.f. of a set of random variables, as well as for the estimation of functions related to the conditional p.d.f. of a subset of the variables, is presented here.
Nonparametric estimation schemes are weighted moving averages of the data and thus provide "local" estimates of the target function, as opposed to parametric methods that are inherently global in their assumptions and application. While the local nature of the nonparametric estimates is attractive since it allows the procedure to adapt to the underlying function, it also leads to their suffering from a "curse of dimensionality" in multivariate settings. An exponentially increasing number of points is needed to densely populate
Euclidean spaces of increasing dimension. Consequently, the estimation variance increases dramatically for a fixed sample size as the dimension of the data set increases. Global
methods (e.g., generalized additive models, projection pursuit, average derivative estimation, sliced inverse regression) to address this issue are reported in Scott (1992). These methods try to project the data into a subspace of smaller dimension, determined by some criterion. We pursue a similar strategy here but with locally varying projections. The rationale is that even where the underlying structure is complex and high dimensional, it may be dominated locally by a few variables. The ability to reproduce Gaussian relationships was also of concern. This is similar in spirit to the locally weighted linear regression approach of Cleveland and Devlin (1988) and to the kernel estimation framework (NKERNEL) used by NSSS (1992).
An application of the techniques that explores the inter-relationship between annual inflow (q) into the Great Salt Lake (GSL), and its annual precipitation (p) and evaporation (e), is used to illustrate the techniques developed. Scatterplots of p and e, q and p, and q and net precipitation (p-e), for the 136-year record are shown in figure 1. A cubic smoothing spline fitted to each data set, with smoothing parameter chosen by Generalized Cross Validation (GCV, see Craven and Wahba, 1979), is also shown. The nonlinearity of these relationships is apparent. The correlations are -0.8 for p and e; 0.15 for q and p; -0.06 for q and e; and 0.26 for q and (p-e). All correlations and scale parameters referred to in this
paper are estimated using the robust procedures described in the Appendix. The GSL is in
an arid region where e exceeds p on the average. Based on the q and (p-e) data, one can
speculate whether there is a critical intermediate response regime related to aridity. Inflow
seems to have little dependence on (p-e) during this regime. Outside this regime, q is
responsive to (p-e). We shall revisit this question during the course of our exploration of
this data.

METHODOLOGY

Kernel density estimators (k.d.e.) can be motivated as smoothed histograms, or as an approximation to the derivative of the cumulative distribution function of the data. A local approximation to the underlying density is developed through a weighted average of the relative frequency of data. This is achieved by averaging across (the convolution of) weight functions centered at each data point. Say we have data $u_i \in R^d$, $i = 1, \ldots, n$. Then the k.d.e. is:

$$\hat{f}(u) = \frac{1}{n} \sum_{i=1}^{n} h^{-d} K\left(\frac{u - u_i}{h}\right)$$

The pointwise bias and variance of $\hat{f}(u)$ depend on the sample size, the underlying density, the scale parameter or bandwidth h, and the kernel function. If the weight or kernel function K(.) is a valid p.d.f., $\hat{f}(u)$ is also a valid p.d.f. Typical choices for K(.) are symmetric Beta p.d.f.'s, and the Gaussian p.d.f. In terms of asymptotic Mean Square Error (MSE) of approximation it has been shown that there is little to choose between the typically used kernels. The critical parameter is then h, since it governs the degree of averaging. Increasing h reduces the variance of $\hat{f}(u)$ at the expense of increasing bias. Objective, data based methods (based on minimization of a fitting metric with respect to the class of smooth p.d.f.'s, and data based measures of their attributes, or with respect to a parametric p.d.f.
[Figure 1. Relationships between GSL data (solid line is a cubic smoothing spline). Panels: GSL annual precipitation vs. evaporation; annual inflow vs. precipitation; annual inflow vs. net precipitation.]

considered as a viable choice) for choosing h are available (see Scott (1992)). Minima of MSE based criteria with respect to h tend to be broad rather than sharp. Thus, even when using an objective method for choosing h, it is desirable to examine the estimate at a range of h values in the neighborhood of the optimum.
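For concreteness, a minimal numerical sketch of the fixed-bandwidth Gaussian k.d.e. above, and of examining the estimate over a range of h values (in Python with numpy, which the paper does not use; the synthetic sample and the bandwidths 0.2, 0.4, 0.8 are arbitrary illustrations):

```python
import numpy as np

def kde(x, data, h):
    """f_hat(x) = (1/n) sum_i h^{-1} K((x - u_i)/h), with Gaussian K."""
    z = (x[None, :] - data[:, None]) / h
    return np.mean(np.exp(-0.5 * z ** 2), axis=0) / (h * np.sqrt(2.0 * np.pi))

u = np.random.default_rng(1).normal(size=136)   # synthetic sample (GSL record length)
grid = np.linspace(-4.0, 4.0, 200)
for h in (0.2, 0.4, 0.8):                       # small h: rough, spurious modes; large h: smooth
    f_hat = kde(grid, u, h)                     # examine how structure changes with h
```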
In the multivariate setting, two popular choices for K(.) are:

the "Product Kernel":
$$\hat{f}(u) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \frac{1}{h_j} K_j\!\left(\frac{u_j - u_{ij}}{h_j}\right)$$
with independent bandwidths $h_j$ and univariate kernels $K_j(.)$ located at $u_{ij}$; and

"Sphering":
$$\hat{f}(u) = \frac{\det(S)^{-1/2}}{n h^d} \sum_{i=1}^{n} K\!\left(h^{-2}(u - u_i)^T S^{-1}(u - u_i)\right)$$

where S is a d×d data covariance matrix, and $K(v) = (2\pi)^{-d/2} e^{-v/2}$, i.e., a Gaussian function.

The motivation for these choices is that the attributes of the p.d.f. are likely to vary by j, and that a radially symmetric, multivariate kernel (i.e., h invariant with j) would be deficient, in the sense that it may simply average over empty space in some of the directions j. Consequently, at least variation of h by j is desirable. The product kernel has the disadvantage that it calls for the specification of d bandwidths by the user. Where a Gaussian kernel is used, the product kernel is equivalent to the sphering kernel with an identity covariance matrix. If the variables of interest are correlated, sphering is attractive since it aligns the kernel with the principal axes of variation of the data. The advantage of this is striking if the rank of the data covariance matrix, S, is r << d, i.e., the data cloud in d dimensions can be resolved completely in r linearly independent directions. Further, the bias of $\hat{f}(u)$ is proportional to the Hessian matrix of f(u). The matrix S is proportional to an approximation to the Hessian. Thus sphering can help reduce the bias in $\hat{f}(u)$ by adjusting the relative bandwidths. Wand and Jones (1993) show that while sphering offers significant gains in some situations, it can be detrimental where the underlying p.d.f. exhibits a high degree of curvature, and/or it has multiple modes that have different orientations.
It is clear that it is worth exploring locally adaptive bandwidth variation. For example, suppose the underlying p.d.f. has two distinct modes that have different directions for their principal axes, and perhaps involve different subsets $r_1$ and $r_2$ of the d variables. A reasonable adaptation of the bandwidth would be achieved by partitioning the raw data into the two appropriate subsets, and using matrices $S_1$ and $S_2$ that are aligned with the principal axes of the two modes. This is indeed the strategy followed here. Our k.d.e. algorithm is outlined in Table 1. A manuscript with a more formal presentation is in preparation.
First, it is desirable to scale the data so that all variables are compatible. This can be done by mapping each coordinate to the interval [0,1], "normalizing" (i.e., subtracting the mean and then dividing by the standard deviation), or through an appropriate logarithmic or other transform.
The scaled data are then recursively partitioned into a k-d tree (Friedman, 1979). An illustration of this process with the q, p data is shown in Figure 2. The first partition was for p, at 0.34. The two resulting partitions were then split along q, as shown. Another iteration takes us to the eight partitions shown, each with 17 points.
FUNCTIONS OF SPACE AND TIME HYDROLOGIC DATA 305

TABLE 1
K.D.E. ALGORITHM

1. Scale the data so that each column has the same scale.
2. Partition the data $u_i$, $i = 1, \ldots, n$ into a k-d tree (npart partitions):
   - Split each partition at the median of the coordinate with the greatest spread.
   - Stop splitting partitions if the resulting number of points in the partition < mink, where mink = max(d+2, $\sqrt{n}$).
   - Define the indicator function $I_{ij} = 1$ if $u_i$ is, or 0 if it is not, in partition j.
3. Compute a (d×d) robust covariance matrix $C_j$ for each partition j (see Appendix).
4. Perform a singular value decomposition (SVD) of $C_j = E_j \Lambda_j E_j^T$, where $E_j$ is an eigenvector matrix ($E_j^T E_j = I$) and $\Lambda_j$ is an ordered (descending) diagonal eigenvalue matrix corresponding to $E_j$. All matrices are (d×d).
5. Retain the $r_j$ leading components of the decomposition, such that
   $$\sum_{l=1}^{r_j} \lambda_{l,j} \Big/ \sum_{l=1}^{d} \lambda_{l,j} \geq crit, \quad \text{where } crit \text{ is 0.95 or 0.99}$$
6. Form $S_j = E_j \Lambda_j E_j^T$, where $E_j$ is (d×$r_j$), $\Lambda_j$ is ($r_j$×$r_j$), and $E_j^T$ is ($r_j$×d).
7. Define the multivariate p.d.f. estimate as:
   $$\hat{f}(u) = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{npart} I_{ij}\, (2\pi)^{-r_j/2}\, h^{-r_j} \det(S_j)^{-1/2}\, e^{-0.5\,(u - u_i)^T S_j^{-} (u - u_i)/h^2}$$
   where h is a specified bandwidth, $\det(S_j) = \prod_{l=1}^{r_j} \lambda_{l,j}$, and $S_j^{-} = E_j \Lambda_j^{-1} E_j^T$.
8. For a partition of u defined as (y, x), where the dimension of y is q and of x is (d-q), the conditional p.d.f. f(y|x) is estimated as:
   $$\hat{f}(y|x) = \left( \sum_{i=1}^{n} \sum_{j=1}^{npart} I_{ij}\, G_{ij}(y)\, \hat{f}(x_i) \right) \Big/ \sum_{i=1}^{n} \hat{f}(x_i)$$
   where $\hat{f}(x_i)$ is the p.d.f. of x evaluated at $x_i$, and $G_{ij}(y)$ is a q-variate Gaussian p.d.f. with mean $\bar{y}_{ij} = y_i + S_{yx,j} S_{xx,j}^{-1} (x - x_i)$ and covariance $S_{yy \cdot x, j} = S_{yy,j} - S_{yx,j} S_{xx,j}^{-1} S_{xy,j}$.
9. The regression function r(x) = E[y|x], of y (q = 1) on x, is then estimated as:
   $$\hat{r}(x) = \left( \sum_{i=1}^{n} \sum_{j=1}^{npart} I_{ij}\, \bar{y}_{ij}\, \hat{f}(x_i) \right) \Big/ \sum_{i=1}^{n} \hat{f}(x_i)$$
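A compact sketch of steps 2 through 7 of Table 1 is given below (in Python with numpy; the function names kd_partition and kde_eval are ours, and the plain sample covariance np.cov stands in for the robust covariance of the Appendix, so this is illustrative rather than a transcription of the authors' code):

```python
import numpy as np

def kd_partition(idx, u, mink):
    """Recursively split indices at the median of the coordinate with greatest spread."""
    if len(idx) < 2 * mink:                      # children would fall below mink: stop
        return [idx]
    spread = u[idx].max(axis=0) - u[idx].min(axis=0)
    c = int(np.argmax(spread))                   # coordinate with the greatest spread
    order = idx[np.argsort(u[idx, c])]
    half = len(order) // 2                       # split at the median
    return kd_partition(order[:half], u, mink) + kd_partition(order[half:], u, mink)

def kde_eval(u, x, h, crit=0.95):
    """Evaluate the Table 1 density estimate f_hat(x) from data u (n x d)."""
    n, d = u.shape
    mink = max(d + 2, int(np.sqrt(n)))
    f = 0.0
    for idx in kd_partition(np.arange(n), u, mink):
        C = np.cov(u[idx].T)                     # step 3 (robust version in Appendix)
        lam, E = np.linalg.eigh(C)               # step 4: eigen-decomposition of C_j
        order = np.argsort(lam)[::-1]
        lam, E = lam[order], E[:, order]
        r = int(np.searchsorted(np.cumsum(lam) / lam.sum(), crit)) + 1
        lam, E = lam[:r], E[:, :r]               # step 5: truncate; assumes lam[:r] > 0
        S_inv = E @ np.diag(1.0 / lam) @ E.T     # step 6: pseudo-inverse of S_j
        norm = (2 * np.pi) ** (-r / 2) * h ** (-r) / np.sqrt(np.prod(lam))
        for i in idx:                            # step 7: Gaussians centered at u_i
            v = x - u[i]
            f += norm * np.exp(-0.5 * (v @ S_inv @ v) / h ** 2)
    return f / n
```

For the GSL record (n = 136) this stopping rule reproduces the eight 17-point partitions shown in Figure 2.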

[Figure 2. K-d tree development for the Great Salt Lake data: scaled annual precipitation vs. scaled annual inflow; full sample correlation 0.15 (136 years), 17 points per partition. The emphasized number in each box is the robust correlation within that partition.]

Note that requiring each partition to have the same number of points leads to partitions that are large in data sparse regions.
The partition variance is larger, leading to a larger effective bandwidth. A natural adaptation
of the bandwidth to tails and modes of the data is thus effected. The emphasized numbers in
each box report the robust correlation between q and p values in the box. We find the clustering of data shown by a k-d tree to be a useful data exploratory tool as well. Here, note that the partition corresponding to the highest p,q values has the highest p,q correlation; correlation varies with partition; and most partitions have p,q correlations that are higher than the full sample value. As the number of points in a partition approaches the dimension of the space, d, the matrix $C_j$ becomes singular. This brings up the need to
decide on an optimum data partitioning.
Our approach is exploratory. The number of partitions is a smoothing parameter: bias decreases and variance increases as the number of partitions increases. Arguments for parsimony suggest using a minimum number of partitions. Our experiments (also NSSS (1992)) with a variety of synthetic data suggest that a value of mink greater than d, and somewhere between $\sqrt{n}$ and $n^{4/(4+d)}$, works well. Since the variation in mink is by factors of 2, consistent results upon partitioning are significant. So the strategy is to form a sequence of estimates, starting with the full sample.
A robust covariance matrix is computed for each partition, and a truncated singular value decomposition of the matrix is performed. The resulting matrix $S_j$ is then used for multivariate k.d.e. as outlined. Robustness is important, since the effect of an outlier is pronounced as the sample size shrinks upon partitioning. It can force the covariance matrix to be near singular, and lead to the specification of a wild eigenvector sequence, and hence
of kernel orientation.
One can choose the bandwidth h "automatically" by optimizing a criterion such as maximum likelihood or mean integrated square error (MISE) through cross validation. However, it is known that such estimators have very slow convergence rates (O(n^{-1/10}) for d = 1), high variance (induced by fine scale structure), and are prone to undersmoothing. The recent trend in the statistical literature (Scott, 1992) is to develop data based methods for choosing h that use recursive estimation of the terms (involving derivatives of f(u)) needed in a Taylor series expansion of MISE, and thereby develop an estimate of the optimal h. In the univariate case, this can get close to the theoretically optimal convergence rate of O(n^{-1/2}). Similar developments in the multivariate case are forthcoming. In the data exploration context, choosing h by reference to a known parametric p.d.f. is reasonable. It has Bayesian connotations: correct choice under proper specification, and adaptation under mild mis-specification. Since we are interested in discerning structure in the data, it is desirable to oversmooth to avoid focusing on spurious structure. Choosing h by reference to a Gaussian p.d.f. is attractive in this context, since for a given variance one would oversmooth relative to a mixture, and the multivariate framework is well known. In our context, Silverman (1986) gives the MISE optimal h with reference to a multivariate Gaussian as:

$$h_{opt} = \left\{ \frac{4}{d+2} \right\}^{1/(d+4)} n^{-1/(d+4)}$$
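This reference rule is a one-line computation (a direct transcription in Python); for the bivariate GSL case (n = 136, d = 2) it gives $h_{opt}$ ≈ 0.44, close to the h = 0.42 reported in the figure captions:

```python
def h_opt(n, d):
    """Silverman's Gaussian-reference MISE-optimal bandwidth."""
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

# h_opt(136, 2) -> about 0.44; substituting mink for n shrinks h accordingly
```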

It was pointed out earlier that the number of partitions is a smoothing parameter as well. We have not yet determined the optimal h taking that into account. The use of mink instead of n in the above has been used with some success. NSSS (1992) uses a strategy similar to our k.d.e., and simply takes h = 1, regarding the estimator as a convolution of Gaussians with locally estimated covariance structure. At this point, recall the insensitivity of MISE to h, and note that increasing h, or reducing variance, smooths out structure and vice versa, without affecting MISE very much. If the goal is data exploration, varying h and noting data structures resistant to h variations is desirable. We shall illustrate the effect of such choices by example in the next section.
The conditional p.d.f. estimator (item 8, Table 1) is based on a weighted convolution of conditional p.d.f.'s centered at each observation, with weights proportional to the estimated density at the point. One can also estimate this as $\hat{f}(u)/\hat{f}(x)$, or equivalently by taking the appropriate slice out of an estimate $\hat{f}(u)$ and normalizing it.
The regression estimator (item 9, Table 1) is presented in Owosina et al. (1992), and compared with other nonparametric regression schemes for spatial estimation. NKERNEL by NSSS (1992) has the same framework as described here, except for data partitioning, and treatment of the covariance matrix (no SVD, no robustness).

APPLICATIONS

Selected k.d.e.'s for the data set introduced earlier are presented in figures 3 through 8. In each case, the variable referenced first is on the x-axis, and the second on the y-axis. The k.d.e. estimate of the p and e p.d.f. (fig. 3), with 1 partition and h = 0.4 (Gaussian reference), appears consistent with a bivariate Gaussian density with a correlation coefficient of -0.8. The k.d.e. of the q and p p.d.f. (fig. 4), constructed with 8 partitions (as in fig. 2) and h = 1 (oversmoothed), clarifies the features apparent from the clusters in figure 2.
[Figure 3. Joint p.d.f. of p and e, npart=1, h=0.42.]
[Figure 4. Joint p.d.f. of q and p, npart=8, h=0.42.]
K.d.e.'s of the p.d.f. of q, p and e were also constructed and evaluated along a slice defined through the line segment between (0, 0.72) and (0.72, 0) on the p,e axes (i.e., scaled p-e = -0.72). This line segment corresponds approximately to the principal axis of the p,e p.d.f. in figure 3. Figure 5 was constructed with 1 partition and an h chosen by reference to a Gaussian p.d.f. The kernel orientation and the bandwidth are globally prescribed. We see two weakly separated modes in the conditional p.d.f., in contours that suggest a skewed p.d.f. with principal axes consistent with the eigenvalues and eigenvectors of the (q,p,e) covariance matrix. In figure 6, we worked with four partitions and h = 1. With this modification we see one mode in the p.d.f., but more complex structure in the joint p.d.f. than in figure 5. Finally, in Figure 7, we worked with 4 partitions, but with h = 0.4 (Gaussian reference). The structure of figure 6 has now sharpened into 2 modes with major principal axes that are nearly orthogonal to each other. This is consistent with figure 1, where we saw low dependence between q and (p-e) in the middle range of the data. The antimode between them may reflect an instability in the q, (p-e) relationship, speculated about in figure 1.
Correlation between q and the other variables is weak, and n is small for estimating a trivariate k.d.e., let alone its conditionals. The correlation between p and e is relatively high. One would expect k.d.e. to perform poorly for conditionals of q on the other variables. A direct k.d.e. of q and (p-e) (figure 8) is similar to the p.d.f. for the slice from (q,p,e) for npart = 1 and 4 (figures 5 and 7), h by Gaussian reference, but d = 2 instead of 3.
The purpose of this application has been to illustrate the potential of k.d.e. for revealing the underlying structure in hydrologic processes, as an aid to better understanding. We find the nonparametric estimates (regression in fig. 1, scatterplot organization in fig. 2 and p.d.f.'s in the others) to be very useful tools for data exploration. The application shows the effect of varying the bandwidth and the number of partitions on the resulting k.d.e. As expected, partitioning affects the orientation of the kernels, as well as the degree of local smoothing and hence of the resulting p.d.f., while the bandwidth controls the degree of smoothing and the ability to see modes. Varying both clarifies the underlying structure.

SUMMARY

The utility of multivariate k.d.e. for using data to improve our understanding of hydrologic processes is obvious. Model specification, as well as estimation of hydrologic variables, can be improved. Pursuit of k.d.e. to generate vitriolic debates on the multimodality of a data set, or to justify one's favourite parametric p.d.f., is counterproductive. Choices of models and estimation procedures are always subjective at some level. The spirit of k.d.e. is to highlight features of the data set at the expense of estimation efficiency (in the classical parametric sense). Clearly, artifacts of the particular realization at hand are likely to be highlighted as well as genuine underlying features. Implementations of multivariate k.d.e. need to be carefully designed to balance such a trade-off.
The k.d.e. algorithm presented here was shown to be effective in circumventing the detrimental effect, noted by Wand and Jones (1993), of sphering with heterogeneous data. The p.d.f.'s in figures 5 through 8 appear to be precisely of the type that would be obscured by global sphering. We noted that this was the case, and that the structure was resolved upon partitioning. Further work on improving parameter specification is needed and is in progress.
[Figure 5. Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=1, h=0.42.]
[Figure 6. Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=4, h=1.]
[Figure 7. Joint p.d.f. of q and slice of p,e along (0, 0.72) to (0.72, 0), npart=4, h=0.45.]
[Figure 8. Joint p.d.f. of q and (p-e): (a) npart=1, h=0.42; (b) npart=4, h=0.42.]

APPENDIX

Robust estimators of scale and covariance as suggested by Huber (1981) are used. The pairwise covariance $C_{ij}$ is computed as $r_{ij} t_i t_j$, where $t_i$ is a robust estimator of standard deviation, obtained as 1.25 × (mean absolute deviation of $u_i$), and $r_{ij}$ is a robust estimator of correlation, given as:

$$r_{ij} = \frac{t^2[a u_i + b u_j] - t^2[a u_i - b u_j]}{t^2[a u_i + b u_j] + t^2[a u_i - b u_j]}$$

where $a = 1/t_i$ and $b = 1/t_j$. Huber indicates that this estimator has a breakdown point of 1/3, i.e., up to 1/3 of the data can be contaminated without serious degradation of the estimate.
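A direct transcription of these estimators (in Python with numpy; the mean absolute deviation is taken about the median here, a common convention that the text does not specify):

```python
import numpy as np

def t(x):
    """Robust scale: 1.25 x mean absolute deviation (here taken about the median)."""
    return 1.25 * np.mean(np.abs(x - np.median(x)))

def robust_corr(x, y):
    """Huber-type robust correlation from scales of the standardized sum/difference."""
    s = x / t(x) + y / t(y)
    d = x / t(x) - y / t(y)
    return (t(s) ** 2 - t(d) ** 2) / (t(s) ** 2 + t(d) ** 2)
```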

ACKNOWLEDGEMENTS

The work reported here was supported in part by the U.S. Geological Survey through Grant No. 14-08-0001-G1738, and in part through the first author's 1992-93 assignment with the Branch of Systems Analysis, WRD, USGS, National Center, Reston, VA, while on sabbatical leave.

REFERENCES

Cleveland, W. S. and S. J. Devlin (1988). "Locally weighted regression: an approach to regression analysis by local fitting." JASA 83(403): 596-610.
Craven, P. and G. Wahba (1979). "Smoothing noisy data with spline functions." Numerische Mathematik 31: 377-403.
Friedman, J. H. (1979). "A tree-structured approach to nonparametric multiple regression." Smoothing Techniques for Curve Estimation 757: 5-22.
Huber, P. J. (1981). Robust Statistics. New York, John Wiley.
Karlsson, M. and S. Yakowitz (1987a). "Nearest neighbor methods for nonparametric rainfall-runoff forecasting." Water Resources Research 23(7): 1300-1308.
Karlsson, M. and S. Yakowitz (1987b). "Rainfall-runoff forecasting methods, old and new." Stochastic Hydrol. Hydraul. 1: 303-318.
N-SSS (1992). N-Kernel User's Manual, Non-Standard Statistical Software, Santa Monica, CA.
Owosina, A., U. Lall, T. Sangoyomi, and K. Bosworth (1992). Methods for Assessing the Space and Time Variability of Groundwater Data. NTIS 14-08-0001-G1738.
Scott, D. W. (1992). Multivariate Density Estimation, John Wiley and Sons, New York.
Silverman, B. W. (1986). Density Estimation, Chapman and Hall, London.
Wand, M. P. and M. C. Jones (1993). "Comparison of smoothing parametrizations in bivariate kernel density estimation." JASA 88(422): 520-528.
Yakowitz, S. J. (1985). "Nonparametric density estimation, prediction, and regression for Markov sequences." JASA 80(389): 215-221.
Yakowitz, S. J. and F. Szidarovszky (1985). "A comparison of Kriging with nonparametric regression methods." Journal of Multivariate Analysis 16(1): 21-53.
COMPARING SPATIAL ESTIMATION TECHNIQUES FOR
PRECIPITATION ANALYSIS

J. SATAGOPAN (1) and B. RAJAGOPALAN (2)

(1) Department of Statistics, University of Wisconsin, Madison, WI 53706
(2) Utah Water Research Laboratory, Utah State University, Logan, UT 84322

Precipitation data from the Columbia River Basin were analyzed using different spatial estimation techniques. Kriging, locally weighted regression (lowess) and smoothing spline ANOVA (SS-ANOVA) were used to analyze the data. Log(precipitation) was considered as a function of easting, northing and elevation. Analysis by kriging considered precipitation only as a function of easting and northing. Various quantitative measures of comparison were considered, such as maximum absolute deviation, residual sum of squares and scaled variance of deviation. The analyses suggested that SS-ANOVA and lowess performed better than kriging. Residual plots showed that the distribution of residuals was tighter for SS-ANOVA than for lowess and kriging. Precipitation seemed to have an increasing trend with elevation but appeared to stabilize beyond a certain elevation. The analysis was also done for Willamette River Basin data, and similar results were observed.

INTRODUCTION
Spatial estimation of precipitation is of fundamental importance and a challenging task in
hydrology. It has significant application in flood frequency analysis and regionalization of
precipitation parameters for various watershed models.
The irregularity of sampling in space, and the fact that precipitation exhibits substantial variability with topography (i.e., nonstationarity), make the spatial estimation task more difficult. Kriging is the most popular geostatistical technique used by hydrologists for spatial estimation. It assumes a priori specification of the functional form of the underlying function that describes the spatial variation of the parameter of interest. Most often, this assumption is not satisfied in nonstationary situations, resulting in possible errors in
the estimates. Akin (1992) has extensively compared kriging with other nonparametric
techniques on a large number of data sets, and found that kriging was inferior to all the
other methods. Yakowitz and Szidarovszky (1985) compared the theoretical properties of
kriging and kernel functions estimation and gave comparative results from Monte Carlo
simulations for one and two dimensional situations. The kernel estimator was superior in
their theoretical and applied analyses. These serve as a motivation for our exploratory data
analysis.
In this paper, we present results from a preliminary analysis of precipitation data from a mountainous region in the Columbia River Basin, assessing the relative performance of three methods for spatial interpolation. The methods considered are kriging, locally weighted regression (lowess) and smoothing spline analysis of variance (SS-ANOVA). The rest of the paper is organized as follows. A brief discussion of kriging, SS-ANOVA and lowess is presented first, followed by a note on the study area, data set and statistical models. Comparative

results and discussion are presented at the end.

KRIGING
Kriging is a parametric regression procedure due to Krige (1951) and Journel (1977). It has become synonymous with geostatistics over the last decade and represents the state of the art for spatial analysis problems. Isaaks and Srivastava (1989) present a comprehensive and applied treatment of kriging, while Cressie (1991) provides a comprehensive treatment that covers much of the recent statistical research on the subject. Most of the work has been largely focused on ordinary kriging. The model considered is

$$y = f(x) + \epsilon \quad (1)$$

where the function f(x) is assumed to have a constant but unknown mean and stationary covariance, y is the observed vector and $\epsilon$ is the vector of i.i.d. noise. Most often the assumptions for the function f are not satisfied, especially in the case of mountainous precipitation. In our data analysis we have looked at ordinary kriging only. Cressie (1991), Journel (1989) and de Marsily (1986) have detailed discussions on the various types of kriging and their estimation procedures as applied to different situations.
Kriging is an exact interpolator at the points of observation; at other points it attempts to find the best linear unbiased estimator (BLUE) for the underlying function and its mean square error (MSE). The underlying function f(x) is assumed to be a random function; f(x) and f(x + h) are dependent random variables, leading to ergodicity and stationarity assumptions. The kriging estimate $\hat{f}_k$ of f is formed as a weighted linear combination of the observations as

$$\hat{f}_k(x_0) = \sum_{i=1}^{n} \lambda_{0i}\, y_i \quad (2)$$

where the subscript k stands for the kriging estimate. The weights are determined through a procedure that seeks to be optimal in a mean square error sense. The weights relate to the distance between the point at which the estimate is desired and the observation points, and to the degree of covariance between the observations as a function of distance, as specified by the variogram $\gamma(h)$. The variogram is given as

$$\gamma(h) = \mathrm{Var}(y(x)) - \mathrm{Cov}(y(x), y(x+h)) \quad (3)$$

where h is the distance.
The weights $\lambda_{0i}$ are determined by solving the normal equations for kriging, which are

$$\sum_{j=1}^{n} \lambda_{0j}\, \gamma(x_i - x_j) + \mu = \gamma(x_i - x_0), \quad i = 1, \ldots, n \quad (4)$$

$$\sum_{i=1}^{n} \lambda_{0i} = 1 \quad (5)$$

where $\mu$ can be interpreted as a Lagrange multiplier for satisfying the constraint that the weights sum to unity, in an optimization problem formed for minimizing the mean square
error of estimation. The $\lambda_{0i}$'s are obtained by solving the above two equations. The ideas of Gaussian linear estimation are thus implicit in the kriging process.
The MSE of the estimator $\hat{f}_k$ is given by Cressie (1991) as

$$\mathrm{MSE}(\hat{f}_k(x_0)) = \sum_{i=1}^{n} \lambda_{0i}\, \gamma(\delta x_i) + \mu \quad (6)$$

where $\gamma(\delta x_i)$ is the variogram and $\delta x_i = x_i - x_0$.
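A minimal sketch of assembling and solving the ordinary kriging system (2), (4)-(6) at a target point (in Python with numpy; the exponential variogram model and its sill/range values are illustrative assumptions, not choices made in the paper):

```python
import numpy as np

def gamma(h, sill=1.0, rng=50.0):
    """Illustrative exponential variogram model."""
    return sill * (1.0 - np.exp(-h / rng))

def ordinary_kriging(X, y, x0):
    """X: (n,2) station coordinates, y: (n,) observations, x0: (2,) target point."""
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[n, n] = 0.0
    A[:n, :n] = gamma(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(X - x0, axis=1))
    sol = np.linalg.solve(A, b)          # eqs. (4)-(5): weights and multiplier mu
    lam, mu = sol[:n], sol[n]
    return lam @ y, lam @ b[:n] + mu     # eq. (2) estimate and eq. (6) MSE
```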


The above estimation procedure is under the presumption that the variogram is a known function. In practice, the variogram is never known a priori. In reality the observations are unequally spaced, hence a direct estimate of $\gamma(h)$ from the data is not feasible. Therefore, the data are grouped into distance categories and a parametric function (e.g., exponential or spherical) is fit to the estimated or raw variogram. This is called variogram fitting and is the central issue in kriging. Fitting the variogram is the most difficult and important part of kriging, more so in the case of nonstationarity. The lack of objective methods to fit the variogram results in a poorly fit variogram, and consequently the estimates are likely to be significantly in error. For details on variogram fitting we refer the reader to Cressie (1991).
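A sketch of this grouping-and-fitting step (in Python with numpy/scipy; the bin edges and the exponential model are illustrative choices, since the text does not prescribe them):

```python
import numpy as np
from scipy.optimize import curve_fit

def raw_variogram(X, y, edges):
    """Average 0.5*(y_i - y_j)^2 over station pairs grouped into distance categories."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    g = 0.5 * (y[:, None] - y[None, :]) ** 2
    iu = np.triu_indices(len(y), k=1)            # each pair counted once
    d, g = d[iu], g[iu]
    mids, gam = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (d >= lo) & (d < hi)
        if m.any():
            mids.append(d[m].mean())
            gam.append(g[m].mean())
    return np.array(mids), np.array(gam)

exp_model = lambda h, sill, rng: sill * (1.0 - np.exp(-h / rng))
# (sill, rng), _ = curve_fit(exp_model, mids, gam)  # parametric fit to the raw variogram
```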
Yakowitz and Szidarovszky (1985) argue that there is no consistent variogram estimator,
even for the case where the data are noise-free. Wahba (1990) also shows that no consistent
estimators of the variogram parameters from the data are readily available as part of the
kriging estimation process. Journel (1989) has discussions on the demerits of kriging, and
stresses that stationarity assumptions are made for ease of analysis and are not necessarily
properties of the process studied. Akin (1992) has studied kriging with known data sets
and also groundwater data and found that kriging performed very poorly in almost all the
cases as compared to other techniques. Bowles et al. (1991) compared kriging with thin plate smoothing splines on precipitation data from a mountainous region and made similar inferences. These results support the argument by Yakowitz and Szidarovszky (1985).
Universal kriging, co-kriging and the intrinsic random function hypotheses attempt to deal with non-stationary situations, but fitting variograms in these cases is even more tenuous, which affects the estimates, and these approaches are difficult to implement. We have analyzed the precipitation data using ordinary kriging with the public domain software GeoEAS, widely used by government regulating agencies and consulting firms.

SMOOTHING SPLINE ANOVA


Smoothing spline analysis of variance (SS-ANOVA) is a semiparametric procedure for fitting models. The model considered in this case is similar to the kriging model. The SS-ANOVA method decomposes the function f into various components, as in any analysis of variance model; i.e., the function f is split into main effects and interaction terms. This is useful because one can find out how the observed data are affected by each variable.
Consider the model

$$y_i = f(x_{1i}, \ldots, x_{ki}) + \epsilon_i, \quad i = 1, \ldots, n \quad (7)$$

where $y_1, y_2, \ldots, y_n$ are observations, f is the function to be estimated, $x_1, x_2, \ldots, x_k$ are variables such that the jth variable $x_j \in \mathcal{X}_j$, some measurable space, and $\epsilon_1, \ldots, \epsilon_n$ are i.i.d.
with $\epsilon_i \sim N(0, \sigma^2)$, $\sigma^2$ unknown. Usually the space considered is $\mathcal{X}_j = [0,1]$. Whenever the variables are not in the range [0,1], we can rescale them to lie in this range. Wahba (1990), Gu (1989) and Gu and Wahba (1992) give an overview of the SS-ANOVA models. They discuss applications to polynomial splines, tensor product splines and thin plate splines.
The SS-ANOVA model is described briefly in what follows.
The assumption in this model is $f \in \mathcal{H}$, where $\mathcal{H}$ is a Hilbert space. The function f is required to be smooth in its domain, with f, $f^{(1)}$ absolutely continuous, $f^{(2)} \in \mathcal{L}_2$, where $f^{(i)}$ denotes the ith derivative of f, and $\int_0^1 f(t)\,dt = 0$. The space $\mathcal{H}$ is uniquely decomposed into a tensor sum as

$$\mathcal{H} = [1] \oplus \sum_i \mathcal{H}_i \oplus \sum_{i<j} \mathcal{H}_i \otimes \mathcal{H}_j \oplus \cdots \quad (8)$$
Based on this representation of $\mathcal{H}$, the function f is decomposed uniquely as

$$f = \mu + \sum_i f_i + \sum_{i<j} f_{ij} + \cdots \quad (9)$$

where $\mu$ is a constant, $f_i \in \mathcal{H}_i$ are the main effect terms, $f_{ij} \in \mathcal{H}_i \otimes \mathcal{H}_j$ are the 2-factor interaction terms, and so on. The space $\mathcal{H}$ is decomposed in such a way that the resulting subspaces are unique and orthogonal in the tensor product norm induced by the original inner products. This decomposition is very similar to the decomposition in any analysis of variance problem. The Hilbert space $\mathcal{H}$ can further be decomposed into polynomial and smooth spaces. Let $\mathcal{H}_i$ have an orthogonal decomposition $\mathcal{H}_{\pi i} \oplus \mathcal{H}_{si}$, where $\mathcal{H}_{\pi i}$ is the polynomial or parametric space and $\mathcal{H}_{si}$ is the smooth space. The $f_i$'s satisfy

$$\int_0^1 f_i(x_i)\,d\mu_i = 0 \quad (10)$$

where $\mu_i$ is a Lebesgue measure on [0,1]. The $f_{ij}$'s satisfy

$$\int_0^1 f_{ij}(x_i, x_j)\,d\mu_i = \int_0^1 f_{ij}(x_i, x_j)\,d\mu_j = 0 \quad (11)$$

and so on. These are similar to the side conditions in any analysis of variance model. The SS-ANOVA procedure obtains $f_\lambda$ as an estimate of f which minimizes the quantity

$$\frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(x_{1i}, \ldots, x_{ki}) \right)^2 + \lambda \left[ \sum_i \theta_i^{-1} J_i(f_i) + \sum_{i<j} \theta_{ij}^{-1} J_{ij}(f_{ij}) + \cdots \right] \quad (12)$$

where $\lambda$, $\theta_i$, $\theta_{i,j}$, ... are smoothing parameters and the J's are smoothness penalty functionals. The
polynomial space does not have any penalty. Kimeldorf and Wahba (1971) and Wahba
(1990) discuss different penalty functionals. Gu and Wahba (1991) discuss the idea of thin
plate splines and Bayesian confidence intervals for the main effect and interaction terms for
a thin plate spline. For further details, we refer the reader to Gu and Wahba (1991).
The publicly available code RKPACK by Gu (1989) enables one to fit tensor product
splines, polynomial splines and thin plate splines for SS-ANOVA models. We have used the
thin plate spline program in RKPACK for our data analysis.

LOCALLY WEIGHTED REGRESSION (LOWESS)


Locally weighted regression or lowess is a local regression procedure that is a quick and versatile tool for exploratory data analysis and regression surface estimation. Cleveland et al. (1988) and Cleveland and Devlin (1988) developed lowess. Under certain conditions, Muller (1987) shows an equivalence between lowess and nonparametric kernel regression estimation. The estimates proposed by Cleveland et al. (1988) consider local linear or quadratic fits using the k nearest neighbors of the point at which the estimate is desired. Also, Cleveland et al. (1988) consider the standard multivariate nonparametric regression situation defined through the general model

$$y_i = f(x_i, \beta) + \epsilon_i, \quad i = 1, \ldots, n \quad (13)$$

where $f(x, \beta)$ is the smooth regression function or conditional expectation with parameter $\beta$, and the $\epsilon_i$'s are i.i.d. $N(0, \sigma^2)$ variables, with $\sigma^2$ usually unknown.
An estimate $\hat{f}(x)$ of f(x) is developed nonparametrically (in the global sense) by a weighted local linear or quadratic least squares regression in the neighborhood of x defined by its k nearest neighbors. The quadratic model fitted locally is

$$\hat{f}(x) = D(x)\,\hat{\beta} \quad (14)$$

where D is a design matrix (collecting the constant, linear and quadratic terms) and the coefficients $\hat{\beta}$ are determined as the solution to a
weighted least squares problem defined as

$$\min_{\beta} \sum_{i \in k(x)} \left( y_i - f(x_i, \beta) \right)^2 w_i(x) \quad (15)$$

where k(x) is the index set of Xi that are the k nearest neighbors of x and Wi(X) is the
weight function defined as

$$w_i(x) = W\!\left(\frac{\rho(x, x_i)}{d_k(x)}\right) \quad (16)$$

where $\rho(.)$ is the Euclidean distance function and $d_k(x)$ is the Euclidean distance from x to the kth nearest neighbor in $x_i$. W(.) is usually taken as the tricube weight function.
The number of nearest neighbors, k, acts as a smoothing parameter. As k increases,
bias increases, but variance decreases. The fit determined by lowess depends on the choice
of the number, k, of nearest neighbors and the order, r, of the fit. Cleveland et al. (1988) propose a graphical method, based on analyzing an M-plot, for the selection of both these parameters. For further details we refer the reader to Cleveland et al. (1988).
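A minimal one-dimensional sketch of the locally weighted fit (14)-(16) at a point x0 (in Python with numpy; the tricube weight and local quadratic follow the description above, while the function name and the 1-D setting are our simplifications):

```python
import numpy as np

def lowess_at(x0, x, y, k, degree=2):
    d = np.abs(x - x0)
    nn = np.argsort(d)[:k]                       # k nearest neighbors of x0
    w = (1.0 - (d[nn] / d[nn].max()) ** 3) ** 3  # tricube weights W(rho/d_k)
    sw = np.sqrt(w)                              # weighted least squares via sqrt(w)
    D = np.vander(x[nn] - x0, degree + 1)        # local design: (x-x0)^2, (x-x0), 1
    beta, *_ = np.linalg.lstsq(D * sw[:, None], y[nn] * sw, rcond=None)
    return beta[-1]                              # constant term = fitted value at x0
```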
For the present analysis, the span (fraction of sample used to compute the local regression) and degree (order of local approximation, where 1 represents linear and 2 represents quadratic) were chosen by using the F-statistic to compare alternate values of span and degree. A lowess surface is fit for a span and degree and the residual sum of squares is computed. The F-statistic, using the proper number of degrees of freedom in each case, is used to compare the residual sums of squares (RSS) at a significance level of 0.01. If the residual sums of squares are significantly different (at the 0.01 level), the span/degree with

the lowest RSS is selected. Otherwise the span/degree with the lower number of equivalent
parameters (higher degrees of freedom) is selected.
Cleveland (1979) suggests that local linear fitting produces good results, especially in
the boundary region. Akin (1992) has shown that the scheme works well even for a small
number of independent variables and, from comparisons of lowess with kriging on a large
number of known data sets, found that lowess performed very well in reproducing the known
functions.

DESIGN AND MODEL


Study Area and Data Set
The application area is the Columbia River Basin in the states of Washington and Oregon,
with an area of 57 million hectares and subdivided into 9 subregions corresponding to the
USGS sub-basin classification. The data consisted of annual precipitation obtained from 491
gauges spread over the entire basin. By subregion, the numbers of gauges are 82, 41, 12, 77, 77, 50, 75, 25 and 52, respectively. The gauges are denoted in three dimensions by northing,
easting and elevation. Figure 1 gives the topographical map of the study area. The study
area is very mountainous and this could cause non-stationarity in the precipitation process.
Phillips et al. (1991) applied kriging, detrended kriging and co-kriging to the annual
precipitation data from the Willamette River Basin (region 9). They found that co-kriging
with elevation worked better than ordinary kriging for region 9. However, Phillips (personal
communication) indicated that it gave poor results when applied simultaneously to the en-
tire Columbia River Basin. This is likely because the data spans both sides of the mountain
range, which results in non-stationarity in the data set. This motivated our study for look-
ing at alternative techniques. We analyzed the data set from Willamette River Basin and
the data from the entire Columbia River Basin.

Model
We have used log(precipitation) throughout our analysis. This is a very common transformation; one of its advantages is that it stabilizes the variance. A basic requirement in any analysis is the assumption of constant error variance. This assumption is usually violated when the response y follows a probability distribution in which the variance is functionally related to the mean. Since the inclusion of elevation as a third variable in the model only makes estimation of the variogram more complicated, we have considered precipitation as a function of easting and northing only for kriging. For
lowess and SS-ANOVA the response was considered as a function of easting, northing and
elevation. The model considered for SS-ANOVA was

$$y = \mu + f_{elev} + f_{east,north} + f_{elev*(east,north)} + \epsilon \quad (17)$$

where y = log(precipitation), $\mu$ is the constant term, $f_{elev}$ denotes the effect of elevation, $f_{east,north}$ denotes the joint effect of easting and northing, $f_{elev*(east,north)}$ denotes the interaction between elevation and (easting, northing), and $\epsilon$ denotes the i.i.d. Gaussian error. This model is very similar to any analysis of variance model.
[Figure 1. Topographical maps of the Columbia River Basin and the Willamette River Basin.]

Instead of considering the effects of easting and northing separately, we have looked at them as a two-dimensional variable (easting, northing) and used a thin plate smoothing spline (Gu and Wahba, 1991)
approach. Gu and Wahba (1991) show simulation results and suggest that for geographical
data, such an approach would be appropriate.

RESULTS AND DISCUSSION


Visual examination of the estimated surfaces is very helpful in assessing the performances of different estimators. However, measures of accuracy are also important in decision making as they are based on quantitative criteria. We estimated a few measures that quantify the local and global fit, for comparing the three techniques. We estimated the following measures: maximum absolute deviation (MAD), mean absolute deviation (MNAD), residual sum of squares (RSS), scaled residual sums of squares (SRSS1 and SRSS2), and scaled variance of deviation (SVD). These measures are given below.
• MAD = $\max_{i=1,\ldots,n} |y_i - \hat{y}_i|$

• MNAD = $\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

• RSS = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

• SRSS1 = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n \cdot \mathrm{var}(y))$

• SRSS2 = a second scaled form of the residual sum of squares (see Akin, 1992)

• SVD = $\mathrm{var}(\hat{y}) / \mathrm{var}(y - \hat{y})$

where the y's denote the response (log(precipitation)) and the $\hat{y}$'s denote the estimated values.
MAD signifies how far away the estimate is from the true underlying function at the point where the fit is poorest, but does not ensure the best fit over the entire range of the function. MNAD represents the average nearness of the fit to the true function over the entire range of the function. SRSS1 and SRSS2 measure the nearness of the fit to the true function over the entire range of the function and ensure that there are no regions of relatively poor fit. SVD would return the squared signal-to-noise ratio if the estimate matched the true function; it thus indicates how well the techniques discriminate the noise in the function. Akin (1992) has a detailed discussion of these measures.
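A direct transcription of these measures (in Python with numpy; because the printed SRSS2 formula is illegible, the scaling shown for it, division by the sum of squared responses, is our assumption):

```python
import numpy as np

def measures(y, yhat):
    """y: observed log(precipitation); yhat: estimates from a given method."""
    e = y - yhat
    return {
        "MAD":   np.max(np.abs(e)),
        "MNAD":  np.mean(np.abs(e)),
        "RSS":   np.sum(e ** 2),
        "SRSS1": np.sum(e ** 2) / (len(y) * np.var(y)),
        "SRSS2": np.sum(e ** 2) / np.sum(y ** 2),   # assumed second scaling
        "SVD":   np.var(yhat) / np.var(e),          # ~squared signal-to-noise ratio
    }
```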
Table 1 gives various measures for comparing the three spatial techniques for both
Columbia and Willamette River Basins. SS-ANOVA and lowess seem to do better function
estimation than kriging for this data. Kriging is comparable to lowess on the Columbia data.
This substantiates the fact that kriging performs better as a global estimator, while on the
local scale (Willamette River Basin) it performs poorly compared to the other methods.
Lowess and SS-ANOVA are less sensitive than kriging to heterogeneous discontinuity in the data.

TABLE 1. Measures of comparison for the three spatial techniques

                      Columbia                        Willamette
Measure    SS-ANOVA    Lowess    Kriging    SS-ANOVA    Lowess    Kriging
MAD          0.1772    1.6769     1.2823      0.1412    0.2861     0.6777
MNAD         0.0402    0.2256     0.2657      0.0366    0.0657     0.1532
RSS          1.3691   55.5073    59.7683      0.1251    0.4133     2.1886
SRSS1        0.0060    0.2444     0.2632      0.0280    0.0925     0.4899
SRSS2        0.0002    0.0063     0.0429      0.0001    0.0003     0.0017
SVD        160.9792    3.8041     2.8387     31.8066   10.1950     1.2468

Figure 2 gives the histograms of residuals for the three techniques for the Columbia River Basin. The residual distribution for the SS-ANOVA method is tighter than for lowess and kriging. For kriging, the precipitation was considered only as a function of easting and northing, whereas for SS-ANOVA and lowess, precipitation was a function of easting, northing and elevation.
Figure 3 gives the histograms of residuals for the Willamette River Basin. The fit for the Willamette River Basin was not as good as for the Columbia River Basin. The distribution of residuals for the SS-ANOVA method was again tighter than for lowess and kriging. Even though the figures in Table 1 suggest that lowess would be a better approach than kriging for precipitation data, the distribution of residuals for kriging was tighter than for lowess in the case of the Columbia River Basin.
The contour plots of the estimated function obtained from the three methods for the Columbia River Basin and the Willamette River Basin are given in figures 4 and 5, respectively. Though precipitation was obtained as a function of easting, northing and elevation for SS-ANOVA and lowess, we have shown precipitation against easting and northing only in the plots for these two methods, for ease of comparison with kriging. For the Columbia River Basin data, SS-ANOVA and lowess were found to handle the boundary points better than kriging. Also, SS-ANOVA and lowess did more smoothing than kriging for this data.
Figures 6 and 7 give plots of the effect of elevation on precipitation. The effect of elevation was obtained from SS-ANOVA; the plots also give a 95% (Bayesian) confidence interval for the effect of elevation. An increasing trend in precipitation with elevation can be observed. From the plot corresponding to the Columbia River Basin it can be seen that though precipitation has an increasing trend with elevation, it seems to level off beyond a certain elevation.

[Figure 2. Histograms of residuals of the three methods for the Columbia River Basin.]

[Figure 3. Histograms of residuals of the three methods for the Willamette River Basin.]

......

........ ........

f
.......

U"'"

-- _(1<_
1.2.'" u ....

·Uto.. ...
.,..., .,.r,... ,,.","

Figure 4. Contour plot of the estimated function for Columbia River Basin .

.71" ...... .71' ....

r'" r" r'"


.11"'" .......

.8O".ft ......
_I. . . . ,
·Z.OFt'" '2""

Figure 5. Contour plot of the estimated function for Willamette River Basin.

[Figure 6. Elevation effect for the Columbia River Basin obtained from SS-ANOVA, with 95% confidence intervals.]

[Figure 7. Elevation effect for the Willamette River Basin obtained from SS-ANOVA, with 95% confidence intervals.]

So far we have looked at an exploratory data analysis which used three spatial estima-
tion techniques for precipitation analysis. Kriging has so far been the most widely used
geostatistical technique by hydrologists. As mentioned earlier, kriging has become synony-
mous with geostatistics over the last decade. Results and research by various authors like
Yakowitz and Szidarovszky (1985) and Akin (1992) motivated us to look at other spatial
estimation techniques, and we found that the other techniques, SS-ANOVA and lowess, better estimated the precipitation function than kriging. These methods suggest a possible
alternative for spatial interpolations of precipitation data, especially when the process is
non-stationary. Future work is required in terms of more data sets and other nonparametric
techniques.

ACKNOWLEDGEMENT
We would like to thank Dr. Upmanu Lall for valuable suggestions and for providing us with
relevant manuscripts. We also thank Mr. D.L. Phillips, Mr. J. Dolph and Mr. D. Marks
of USEPA, Corvallis, OR, USA for providing us with the precipitation data sets and also
the results of their analysis which motivated our study. We would also like to thank our
numerous friends for conversations and discussions on spatial estimation techniques.

REFERENCES
Akin, 0.0 (1992) "A comparative study of nonparametric regression and kriging for ana-
lyzing groundwater contamination data", M.S. Thesis, Utah State University, Logan, Utah.
Bowles, D.S., Binghan, G.E., Lall, U., Tarboton, D.G., AI-Adhami, M., Jensen, D.T., Mc-
Curdy, G.D., and Jayyousi, E.F. (1991) "Development of mountain climate generator and
snow pack model for erosion predictions in the western United States using WEPP", Project
report - III.
Cleveland, W.S. (1979) "Robust locally weighted regression and smoothing scatter plots",
Journal of American Statistical Association 74,368,829-836.
Cleveland, W.S., and Devlin, S.J. (1988) "Locally weighted regression: an approach to
regression analysis by local fitting", Journal of American Statistical Association 83, 403,
596-610.
Cleveland, W.S., Devlin, S.J., and Grosse, E. (1988) "Regression by local fitting", Journal
of Economics 37, 88-114.
Cressie, N. (1991) Statistics for Spatial Data, John Wiley and Sons, New York.
de Marsily, G. (1986) Quantitative Hydrogeology, Groundwater Hydrology for Engineers,
Academic Press, California.
Gu, C. (1989) "Rkpack and its applications: fitting smoothing spline models", Technical
Report 857, University of Wisconsin - Madison.
Gu, C. and Wahba, G. (1992) "Smoothing spline ANOVA with component-wise Bayesian
confidence intervals", Technical report 881 (rev.), University of Wisconsin - Madison.
Isaaks, E.H., and Srivastava, R.M. (1989) An Introduction to Applied Geostatistics, Oxford
University Press, New York.
Journel, A.G. (1977) "Kriging in terms of projections", Journal of Mathematical Geology
330 J. SATAGOPAN AND B. RAJAGOPALAN

9, 6, 563-586.
Journel, A.G. (1989) Fundamentals of Gcostatistics in five lessons, American Geophysical
Union, Washington, D. C.
Kimeldorf, G., and Wahba, G. (1971) "Some results on Tchebycheffian spline functions",
Journal of Mathematical Analysis and Probability 33,82-95.
Krige, D. G. (1951) "A statistical approach to some mine valuations and allied problems in
Witwaterstrand" unpublished Masters' Thesis, University of Wit waterstrand, South Africa.
Muller, H.G. (1987) "Weighted local regression and kernel methods for nonparametric curve
fitting", Journal of American Statistical Association 82, 397, 231-238.
Phillips, D.L., Dolph, J., and Marks, D. (1991) "Evaluation of geostatistical procedures for
spatial analysis of precipitation", USEPA Report, Corvallis, Oregon.
Wahba, G. (1990) Spline Models for Observational Data, SIAM series in Applied Mathe-
matics, Pennsylvania.
Yakowitz, S., and Szidarovszky, R. (1985) "A comparison of kriging with nonparametric
regression methods", Journal of Multivariate Analysis 16, 1,21-53.
PART VII

SPECTRAL ANALYSIS
EXPLORATORY SPECTRAL ANALYSIS OF TIME SERIES

ANDRZEJ LEWANDOWSKI
Wayne State University
Detroit, MI 48202
U.S.A.

INTRODUCTION
Over 20 years have passed since the publication of a book by Box and Jenkins (Box and
Jenkins, 1970) where the principles of time series analysis and modeling were formu-
lated. During this period the Box-Jenkins methodology has been successfully applied
to numerous practical problems and has become the standard procedure for time se-
ries modeling and analysis (Pankratz, 1983). Most commercial statistical computer
packages support this methodology.
According to the Box-Jenkins approach, the process of model building consists of 3 steps:

1. Model identification, during which preliminary analysis is performed and an initial version (structure) of the model is determined,
2. Parameter estimation, during which exact values of model parameters are computed,
3. Model validation, during which the quality of the resulting model is examined.

The model building procedure is interactive and iterative in nature. The procedure is repeated until the model attains the desired form and accuracy.
Although several methods and algorithms for model estimation constituting the
2nd stage of the above procedure have been developed, the identification and validation
stages are somewhat more diffuse in nature.
The original identification method proposed by Box and Jenkins is based on the
visual inspection of the autocorrelation and the partial autocorrelation functions. This
approach is very exploratory in its nature but rather difficult to apply. Although in several cases the structure of the model generating the time series can be relatively easily deduced from the shape of the autocorrelation function, in many other cases it is difficult or not possible at all. Moreover, since the dependence between values of
the autocorrelation function and parameters of the model generating the time series is
rather complicated, values of model coefficients cannot be easily determined without
performing complicated numerical calculations. Several attempts have been made to
classify possible patterns of the autocorrelation function and to build a catalog of such

functions (Polasek, 1980). Unfortunately, this approach leads to a catalog containing


hundreds of patterns, making it difficult to match the sample autocorrelation function
with one of the catalog entries.
An up-to-date review of identification methods has recently been published by Choi (Choi, 1992). In addition to the above-mentioned analysis of the autocorrelation function, he discusses 4 other groups of methods: penalty function methods, innovation regression methods, pattern identification methods and testing hypothesis methods. With the exception of the pattern identification methods, none of these methods possesses an exploratory character: the data is given to some "black box" mechanism which produces the information about the order of the model generating the time series. The pattern identification methods discussed by Choi use the fact that if a time series has an ARMA(p,q) structure then the theoretical autocorrelation function for lags greater than q satisfies a linear difference equation of order p. Therefore, to determine the structure of a model it is sufficient to check whether a subsequence of the autocorrelation function satisfies a linear difference equation. Several algorithms based on this test are discussed by Choi. Most of these algorithms are based on testing the rank of a Hankel matrix. The basic difficulty in applying these methods is connected with the fact that the computation of the rank of a matrix is an ill-defined problem, and that instead of visual inspection of the autocorrelation function the analyst must use the same visual inspection procedure to analyze patterns of columns of several matrices. The early analysis performed by De Gooijer and Heuts (1981) shows that these methods have limited applicability.

TIME SERIES ANALYSIS AND EXPLORATORY STATISTICAL ANALYSIS


The role of the model identification step in time series analysis is frequently underes-
timated. Model identification is not simply a procedure for determining the structure
of a model; it is a scientific process which leads to a deeper understanding of the phe-
nomena being investigated. In many practical cases this increased knowledge of the
system is more important than the resulting model. This is one of the main princi-
ples of Exploratory Data Analysis (Tukey, 1977). For these reasons a new approach to
Box-Jenkins model identification is proposed in this paper. In contrast to the existing
methods, this approach is based on spectral methods and involves frequency analysis
of ARMA models. However, it differs from the standard spectral approach presented in
the textbooks on time series analysis since it provides a way of understanding and in-
terpreting the spectrum and hence the model itself. The theoretical background of this
method is well known in control engineering and circuit theory and has been applied
successfully in these fields.
According to experience gained in electrical engineering and control engineering,
spectral methods are in general more useful than time domain methods. Spectral char-
acterization of a model is easier to interpret, analyze and understand than time domain
characteristics such as the autocorrelation function.

LINEARIZED TRANSFER FUNCTIONS OF ARMA MODELS


Before presenting the proposed method for time series analysis, it is necessary to discuss the basic concepts from Z transform theory (Elliott, 1987). Let

    {x_t},  t ∈ (-∞, +∞)                                    (1)

be a real sequence. The Z transform of this sequence is the complex function

    X(z) = Σ_{i=-∞}^{+∞} x_i z^i                             (2)

which is denoted by

    X(z) ↔ {x_t}                                             (3)

Under certain conditions this transformation is invertible and X(z) uniquely characterizes the series {x_t} (Elliott, 1987).
Let two time series {x_t} and {u_t} be connected by a linear relationship (operator) G:

    {x_t} = G({u_t})                                         (4)

The complex function G(z)

    G(z) = X(z) / U(z)                                       (5)

will be called the transfer function of the operator G. It is easy to calculate the transfer function of an ARMA model. Taking (2) and multiplying both sides by z, the following relationship can be obtained

    zX(z) = Σ_{i=-∞}^{+∞} x_i z^{i+1} = Σ_{i=-∞}^{+∞} x_{i-1} z^i     (6)

Thus, if

    X(z) ↔ {x_t}                                             (7)

then

    zX(z) ↔ {x_{t-1}}                                        (8)

It follows from the above that the complex variable z can be formally interpreted as the shift operator B used by Box and Jenkins. Therefore, in order to obtain the transfer function of an ARMA model it is sufficient to replace the operator B by the complex variable z.
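As a minimal numerical sketch of property (8) (the sequence values and the test point are invented, not taken from the paper), one can evaluate a truncated Z transform and check that multiplying by z corresponds to a one-lag delay:

    import numpy as np

    # Truncated Z transform X(z) = sum_i x_i z^i of a finite sequence;
    # the delayed sequence is padded so no term is lost in the truncation.
    x = [1.0, 0.5, 0.25, 0.125]             # x_0 .. x_3
    x_delayed = [0.0] + x                   # {x_{t-1}}
    z = 0.3 + 0.2j                          # arbitrary test point

    X = sum(xi * z**i for i, xi in enumerate(x))
    X_delayed = sum(xi * z**i for i, xi in enumerate(x_delayed))
    print(np.isclose(z * X, X_delayed))     # True: z acts as the shift operator B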
The term spectral or frequency transfer function will be used to describe the following formula

    G(jω) = P(e^{-jω}) / Q(e^{-jω})                          (9)

where P and Q are respectively the numerator and denominator of the operator transfer function (from now on we will consider only rational operator transfer functions). The spectrum of the output signal x generated by the model (which is actually the time series being analyzed) is equal to the modulus of the transfer function

    f(ω) = |G(jω)|                                           (10)

Formula (10) shows where most of the basic difficulties in interpreting the spec-
trum arise. Although the structure of the transfer function itself is rather simple, the
spectrum is a highly nonlinear function of frequency w since it is the modulus of
the transfer function evaluated on the unit circle and therefore includes trigonometric
functions.
This leads to an important question: does the modulus of a complex function evaluated on the unit circle characterize this function uniquely? The answer is generally no. If a function G(z) is given such that its modulus evaluated on the unit circle is

    f(ω) = |G(e^{-jω})|                                      (11)

then, for any real constant α, the function

    G̃(z) = G(z) (z - α)/(1 - αz)                             (12)

will have the same modulus on the unit circle (Hannan, 1970). But if the function G(z) has the minimum phase property (i.e., has no poles or zeros inside the unit disc), the modulus evaluated on the unit circle will characterize the function uniquely. This is one of the most important results in the theory of signal processing, electrical circuit theory and automatic control (Robinson, 1981).
Another problem can also be formulated: is it possible to evaluate the modulus of G(z) on a curve other than the unit circle to produce a simpler form of f(ω)? The real and imaginary axes appear especially attractive for this purpose. In this case a similar result can be obtained: if G(z) has the minimum phase property (i.e., it has no zeros or poles in the right half-plane), it is sufficient to know the modulus of G(z) evaluated at z = jω to determine this function uniquely. Moreover, when the real and imaginary axes are used instead of the unit circle, the function |G(jω)| has a much simpler structure than |G(e^{-jω})| since it contains no trigonometric components. Unfortunately, computing the spectrum on the imaginary axis is of little immediate use in time series analysis, since it is unreasonable to expect that the transfer function of a model generating the time series under study will have no poles or zeros in the right half-plane. Fortunately, there is a way to bypass this difficulty by transforming the region outside the unit circle into the left half-plane and evaluating the modulus of the resulting function on the imaginary axis. This can be achieved by applying the following transformation (called the λ transformation)

    λ = (1 - z)/(1 + z)                                      (13)

This transformation has the following properties:

1. It is invertible:

    z = (1 - λ)/(1 + λ)                                      (14)

2. It transforms the region outside the unit circle in the complex plane z into the left half of the complex plane λ, and the unit circle into the imaginary axis (Silverman, 1975),

3. If this transformation is applied to a rational transfer function, then the resulting function will also be rational. If the ARMA model is stable and invertible, then all poles and zeros of the transfer function of the model generating the time series will be located outside the unit circle. After applying the λ transform to this transfer function, all the poles and zeros of the transformed transfer function will be located in the left half of the complex λ plane. This means that the resulting transfer function has the minimum phase property (Robinson, 1981).

The minimum phase property is important since it allows one to work with the modulus of the function rather than with the function itself.
The rational function

    g(λ) = G((1 - λ)/(1 + λ)) = P((1 - λ)/(1 + λ)) / Q((1 - λ)/(1 + λ)) = p(λ)/q(λ)     (15)

will be called the linearized transfer function. Similarly,

    g(jω) = p(jω)/q(jω) = G((1 - jω)/(1 + jω))               (16)

will be called the linearized frequency transfer function, and

    f̃(ω) = |g(jω)|                                           (17)

will be called the linearized spectral density function (LSDF).

Comparing the definitions of the standard spectrum and the linearized spectrum (see 16), it is possible to conclude that to calculate the linearized spectrum, instead of making the substitution

    z = e^{-jω}                                              (18)

the following substitution must be used

    z = (1 - jω)/(1 + jω)                                    (19)

which is the Padé approximation (or linearization) of the exponential function (18). This is the source of the name linearized spectrum.

It is not difficult to find the exponential form of (19):

    (1 - jω)/(1 + jω) = e^{-2j arctan ω}                     (20)

Comparing this result with (18), it is possible to conclude that the linearized spectrum can be interpreted as a standard spectrum with a distorted frequency scale. Therefore, no special tools are necessary to calculate the linearized spectrum, since it is sufficient to have a plotting chart with a suitably scaled frequency axis. In contrast to the standard frequency transfer function, the linearized transfer function (16) is a rational function of frequency. This explains why analysis based on the linearized spectrum is simpler than for the standard spectrum. The transformation (13) is widely used in control engineering for the design and analysis of sampled data control systems (Bishop, 1975) and has been used by Hannan (1970) as a tool for investigating continuous-time stochastic processes.
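The frequency-warping relation (20) is easy to verify numerically. In the sketch below, the ARMA(1,1) coefficients θ = 0.4 and φ = 0.7 are assumed values chosen only for illustration:

    import numpy as np

    theta, phi = 0.4, 0.7                        # assumed ARMA(1,1) coefficients
    G = lambda z: (1 - theta * z) / (1 - phi * z)

    w = np.linspace(0.01, 10, 500)               # linearized frequency axis
    f_lin = np.abs(G((1 - 1j * w) / (1 + 1j * w)))     # |g(jw)| via substitution (19)
    f_std = np.abs(G(np.exp(-2j * np.arctan(w))))      # ordinary spectrum at 2*arctan(w)
    print(np.allclose(f_lin, f_std))             # True: same values, distorted scale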

ASYMPTOTIC FREQUENCY RESPONSES OF ARMA MODELS


It is possible to provide a simple graphical procedure for constructing an approximation of the linearized spectrum of a general ARMA model. This procedure uses the notion of an asymptotic linearized spectrum, which is the standard linearized spectrum plotted on a logarithmic plotting chart. If the logarithmic scale is used, the multiplicative factors of the linearized transfer function become additive. Therefore, the linearized spectrum of a general ARMA model can be constructed by adding the components generated by all poles and zeros of the transfer function, calculated independently.

Linearized spectrum of MA(1) model


In this section the moving average MA(1) model will be considered:

    G(z) = 1 - θz                                            (21)

After applying the λ transformation, the above transfer function takes the following form

    g(λ) = 1 - θ(1 - λ)/(1 + λ) = (1 - θ)(1 + ((1 + θ)/(1 - θ))λ) / (1 + λ)     (22)

or

    g(jω) = K(1 + jθ'ω)/(1 + jω) = p(jω)/q(jω)               (23)

where

    K = 1 - θ                                                (24)

    θ' = (1 + θ)/(1 - θ)                                     (25)

The logarithm of the modulus of the numerator of (23) has the form

    log|p(jω)| = ½ log(1 + θ'²ω²)                            (26)

It follows from the above formula that for low frequencies such that θ'ω ≪ 1,

    log|p(jω)| ≈ 0                                           (27)

In the opposite situation, for high frequencies, it is possible to write

    log|p(jω)| ≈ log θ' + log ω                              (28)

It follows from (27)-(28) that if suitable axes are chosen, the function (26) can be approximated by two asymptotes: a line of zero slope for ω < ω̂ and a line of slope +1 for ω > ω̂, where

    ω̂ = 1/θ'                                                 (29)

The asymptotic construction provides a reasonably good approximation of the modulus of the function (26). This construction (the asymptotes of a frequency response function) is known as the Bode plot. The analysis based on the Bode plot constitutes
one of the basic tools in electronics and control engineering (Sage, 1981). The Bode plot for the MA(1) model can be easily constructed using the asymptotic representation presented above. If the linearized transfer function of the MA(1) model is presented in logarithmic form

    log|g(jω)| = log K + log|p(jω)| - log|q(jω)|             (30)

then a simple graphical operation can be performed to obtain the asymptotic representation of the function (30). The shape of the Bode plot depends on the value of θ; however, the Bode plot of the MA(1) model always has a pole at ω = 1 and a zero at

    ω̂ = (1 - θ)/(1 + θ)                                      (31)

The Bode plot for MA(1) model is presented in Figure 1.

Figure 1: The Bode plot for MA(1) model
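A short numerical check of this construction follows; the value θ = 0.5 is an assumption for illustration, not a worked example from the paper. The sketch compares the exact linearized spectrum of the MA(1) model with its two-asymptote approximation:

    import numpy as np

    theta = 0.5                                   # assumed MA(1) coefficient
    K, theta_p = 1 - theta, (1 + theta) / (1 - theta)   # eqs (24)-(25)
    w = np.logspace(-2, 2, 9)

    exact = np.log10(K) + 0.5 * np.log10(1 + (theta_p * w) ** 2) \
            - 0.5 * np.log10(1 + w ** 2)          # log|g(jw)| from eq (23)
    num_asym = np.where(w < 1 / theta_p, 0.0, np.log10(theta_p * w))  # corner at 1/theta'
    den_asym = np.where(w < 1.0, 0.0, np.log10(w))                    # pole always at w = 1
    approx = np.log10(K) + num_asym - den_asym
    print(np.abs(exact - approx).max())           # stays below ~0.15 decade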

Linearized spectrum of AR(1) model


The transfer function of the autoregressive AR(1) model has the following form

    G(z) = 1/(1 - θz)                                        (32)

After applying the λ transformation, the following linearized transfer function is obtained

    g(λ) = 1/(1 - θ(1 - λ)/(1 + λ)) = (1 + λ) / ((1 - θ)(1 + ((1 + θ)/(1 - θ))λ))     (33)

and, analogously,

    g(jω) = K(1 + jω)/(1 + jθ'ω)                             (34)

where

    K = 1/(1 - θ)                                            (35)

    θ' = (1 + θ)/(1 - θ)                                     (36)

The methodology presented in the previous section can be used to construct a Bode plot for the AR(1) model. The only difference is that the transfer function of the AR(1) model always has a zero at ω = 1 and a pole at

    ω̂ = (1 - θ)/(1 + θ)                                      (37)

The Bode plot for AR(1) model is presented in Figure 2.

Linearized spectrum of ARMA(1,1) model


Using the methodology described in the previous sections, it is easy to construct a Bode plot for the autoregressive moving average ARMA(1,1) model. When the λ transformation is applied to the transfer function of the ARMA(1,1) model

    G(z) = (1 - θz)/(1 - φz)                                 (38)

the following linearized transfer function is obtained

    g(λ) = ((1 - θ)/(1 - φ)) · (1 + ((1 + θ)/(1 - θ))λ) / (1 + ((1 + φ)/(1 - φ))λ)     (39)

This function has one pole and one zero. The corresponding Bode plot can easily be constructed by applying a procedure similar to that used in building the Bode plots for the MA(1) and AR(1) models (Figure 3).

Linearized spectrum of AR(2) model with complex roots


This situation is more complicated than those dealt with previously. The following is the standard form of the transfer function of a general AR(2) model

    G(z) = 1/(1 - θ₁z - θ₂z²)                                (40)
Figure 2: The Bode plot for AR(1) model

After applying the λ transformation, this transfer function reduces to the following form

    g(λ) = 1/(1 - θ₁(1 - λ)/(1 + λ) - θ₂((1 - λ)/(1 + λ))²) = K(1 + λ)² / (1 + θ₁'λ + θ₂'λ²)     (41)

where

    K = 1/(1 - θ₁ - θ₂)                                      (42)

    θ₁' = 2(1 + θ₂)/(1 - θ₁ - θ₂)                            (43)

    θ₂' = (1 + θ₁ - θ₂)/(1 - θ₁ - θ₂)                        (44)

It is more convenient to use the canonical form of the denominator of (41):

    1 + θ₁'λ + θ₂'λ² = 1 + 2ζ(λ/ω_r) + (λ/ω_r)²              (45)

Figure 3: The Bode plot for ARMA(1,1) model

The value

    ω_r = 1/√θ₂'                                             (46)

is known as the resonance frequency, while the parameter

    ζ = θ₁'/(2√θ₂')                                          (47)

is the damping factor. These two factors determine the resonance properties of the transfer function. If ζ > 1, the quadratic polynomial has real roots and can be represented as the product of two first order factors. For 0 < ζ < 1 the roots are complex and more careful analysis is required, since for small values of ζ (low damping) and for frequencies close to the resonance frequency the asymptotic approximation will not be very accurate. However, experience has shown that asymptotic analysis can be useful even in these cases, and therefore the asymptotic behavior of (41) must be investigated.

Making the substitution

    λ = jω                                                   (48)

and considering the canonical form (45), the following expression is obtained

    g(jω) = K(1 + jω)² / (1 + 2jζ(ω/ω_r) - (ω/ω_r)²)         (49)

The logarithm of the modulus of this function is as follows

    log|g(jω)| = log K + log(1 + ω²) - ½ log{[1 - (ω/ω_r)²]² + 4ζ²(ω/ω_r)²}     (50)

The first two terms of this formula are similar to those for the MA(1) model, and their asymptotic approximation has been discussed in previous sections. The results obtained previously can be used in this case. The only difference between this and the MA(1) model is that here the slope of the asymptote of the second term is +2. Analysis of the third term is also straightforward. For low frequencies ω ≪ ω_r,

    ½ log{[1 - (ω/ω_r)²]² + 4ζ²(ω/ω_r)²} ≈ 0                (51)

while for sufficiently large ω the term (ω/ω_r)⁴ is dominant and

    ½ log{[1 - (ω/ω_r)²]² + 4ζ²(ω/ω_r)²} ≈ 2 log(ω/ω_r)     (52)

Comparing this result with (50), we observe that the third term of (50) has two asymptotes: for low frequencies the slope is equal to zero, while for high frequencies the slope is -2. The asymptotes cross at the vertex corresponding to the resonance frequency ω = ω_r.

The following component of the frequency response (50)

    -½ log{[1 - (ω/ω_r)²]² + 4ζ²(ω/ω_r)²}                   (53)

generates a peak for small values of ζ. The amplitude and the frequency at which this peak occurs are given by the following expressions

    M = log(1/(2ζ√(1 - ζ²)))                                 (54)

    ω_m = ω_r √(1 - 2ζ²)                                     (55)

It follows from (55) that this peak exists only for sufficiently small values of the damping factor, namely for

    ζ < √2/2                                                 (56)

It is now not difficult to construct a Bode plot for (49) (see Figure 4). The only difference between the asymptotic frequency response presented in Figure 4 and the frequency response of the AR(1) model is that the slope of the asymptote is equal to -2. The AR(2) model with complex roots always has a double zero at ω = 1 and a double pole at ω = ω_r.
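The resonance quantities are easy to compute from the model coefficients. In the sketch below, θ₁ = 1.2 and θ₂ = -0.8 are assumed values chosen only so that the roots are complex:

    import numpy as np

    th1, th2 = 1.2, -0.8                          # assumed AR(2) coefficients
    K      = 1 / (1 - th1 - th2)                  # eq (42)
    th1_p  = 2 * (1 + th2) / (1 - th1 - th2)      # eq (43)
    th2_p  = (1 + th1 - th2) / (1 - th1 - th2)    # eq (44)
    w_r    = 1 / np.sqrt(th2_p)                   # resonance frequency, eq (46)
    zeta   = th1_p / (2 * np.sqrt(th2_p))         # damping factor, eq (47)
    if zeta < np.sqrt(2) / 2:                     # peak exists only for low damping, eq (56)
        w_m = w_r * np.sqrt(1 - 2 * zeta ** 2)    # eq (55)
        M   = np.log10(1 / (2 * zeta * np.sqrt(1 - zeta ** 2)))   # eq (54), in decades
        print(f"w_r = {w_r:.3f}, zeta = {zeta:.3f}, peak {M:.3f} at w_m = {w_m:.3f}")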

Figure 4: The Bode plot for AR(2) model (exact curves shown for ζ = 0.3 and ζ = 0.9)

SPECTRAL ANALYSIS OF THE HYDROLOGICAL TIME SERIES


The procedure presented in this paper has been applied to several artificially generated time series (Box and Jenkins, 1970) as well as to time series describing real phenomena. In all cases, the structure of the model generating the time series was determined correctly, and the parameters estimated using the linearized spectrum were close to values obtained using a time series analysis package. Selected results have been presented by Lewandowski (1983) and more results will be presented in a forthcoming publication.

The model identification procedure presented in this paper has been used to identify the structure of a model which generates the daily flows of the Little White River, measured during the year 1979. The Little White River is a tributary of the Mississagi River in Ontario with a drainage area of 1960 km².

The spectrum of the time series, consisting of 358 points, has been estimated using the ARSPEC method of Jones (1978), based on an autoregressive approximation technique. Other spectrum estimation techniques give similar results. The identification procedure can easily be applied to these data. Although a plotting chart, a pencil and a ruler are sufficient tools to perform this procedure, it can be automated using the EXSPECT program developed by the author (Lewandowski, 1993).

The procedure consists of the following steps:

1. A horizontal line is plotted. This line is the asymptote to the linearized spectrum for ω = 0. This asymptote determines the amplification factor K of the transfer function,
2. A line with slope -1 is plotted to approximate the spectrum for frequencies close to ω = 0.06. When the exact frequency response is calculated and plotted, it becomes clear that a line with slope -2 must be used to approximate the linearized spectrum for ω > 0.5,
3. A horizontal line must be plotted to approximate the linearized spectrum for ω > 1.

Therefore, the asymptotic Bode plot consists of 4 line segments (Figure 5). The linearized transfer function has poles for ω = 0.06 and ω = 0.6 and two zeros for ω = 1.

Figure 5: Bode plot for daily flow data

Comparing these results with the standard patterns of the AR(1) model, it is possible to conclude that the model generating this time series must be AR(2) with real roots. Since there is a one-to-one relationship between the frequency ω̂ and the values of the poles and zeros of the linearized transfer function, it is possible to determine the parameters of the model directly from the plot. These values for the model under study are 0.53 and

0.94. Therefore, the transfer function of the model generating this time series is

    G(z) = 1/((1 - 0.53z)(1 - 0.94z)) = 1/(1 - 1.47z + 0.498z²)     (57)

Parameters of the ARMA(2,0) model have also been estimated using the statistical package MINITAB. The transfer function of the model generating the time series, as obtained from MINITAB, is as follows

    G(z) = 1/(1 - 1.54z + 0.593z²)                           (58)

The coincidence of the coefficients in (57) and (58) is satisfactory.
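The expansion of the factored denominator in (57) can be checked in one line:

    import numpy as np

    # Expanding (1 - 0.53z)(1 - 0.94z):
    print(np.polynomial.polynomial.polymul([1, -0.53], [1, -0.94]))
    # -> [ 1.     -1.47    0.4982], i.e. 1 - 1.47z + 0.498z^2 to rounding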

REFERENCES
Bishop, A. B. (1975) Introduction to Discrete Linear Controls: Theory and Application. Academic Press.
Box, G. E. P. and Jenkins, G. M. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Choi, B. S. (1992) ARMA Model Identification. Springer-Verlag, New York.
De Gooijer, J. G. and Heuts, R. M. J. (1981) "The corner method: an investigation of an order discrimination procedure for general ARMA processes". University of Amsterdam, Faculty of Actuarial Science and Econometrics, Report AE 9/81.
Elliott, D. F. (1987) "Transforms and Transform Properties", in D. F. Elliott (ed.), Digital Signal Processing: Engineering Applications, Academic Press, New York.
Hannan, E. J. (1970) Multiple Time Series. John Wiley and Sons, New York.
Jones, R. H. (1978) "Multivariate autoregression estimation using residuals", in D. F. Findley (ed.), Applied Time Series Analysis, Academic Press.
Lewandowski, A. (1983) "Spectral methods in the identification of time series", International Institute for Applied Systems Analysis, WP-83-97, Laxenburg, Austria.
Lewandowski, A. (1993) "EXSPECT - computer program for exploratory spectral analysis of time series", to be published.
Pankratz, A. (1983) Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, J. Wiley, New York.
Polasek, W. (1980) "ACF patterns in seasonal MA processes", in O. D. Anderson (ed.), Time Series, North-Holland Publ. Co., Amsterdam.
Robinson, E. A. (1981) "Realizability and minimum delay aspects of multichannel models", in E. A. Robinson (ed.), Time Series Analysis and Applications, Goose Pond Press.
Sage, A. P. (1981) Linear System Control. Pitman Press.
Silverman, H. (1975) Complex Variables. Houghton-Mifflin.
Tukey, J. W. (1977) Exploratory Data Analysis. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts.
ON THE SIMULATION OF RAINFALL BASED ON THE CHARACTERISTICS OF FOURIER SPECTRUM OF RAINFALL

U. Matsubayashi¹, S. Hayashi² and F. Takagi¹

¹ Department of Civil Engineering
Nagoya University, Chikusa-ku,
Nagoya 464-01, Japan
² Kawasaki Heavy Industries
Noda City, Chiba 278, Japan

Design rainfall is usually determined by magnifying historical data to an amount corresponding to a certain return period. However, the spatial distribution of the precipitation is usually not considered in the design rainfall computation. With this point of view, in this paper we aim to discuss the spatial characteristics of rainfall.

Precipitation occurs with various turbulences, i.e. convective cells, rain bands, cyclones, etc., which can be successfully expressed by a Fourier series. Moreover, a Fourier series formulation can easily relate the volume of rainfall through its linear term and various other shapes through its trigonometric terms. We apply Fourier analysis to the rainfall data of radar observations and discuss the Fourier coefficients and phase angles.

INTRODUCTION

Design rainfall is the most basic quantity in the planning of flood control projects. This design rainfall is based on the frequency analysis of historical records of point rainfall, i.e., the T-year return period. Because rainfall varies in time and space, the design rainfall is sometimes reproduced from historical rainfall data by adjusting the total amount of rain to be equal to the T-year rainfall depth. This method, however, gives unrealistic results when the magnitude of the rainfall referred to is different from the design rainfall depth. To improve the determination of the design rainfall, a stochastic simulation procedure is needed wherein the design rainfall is distributed in time and space and corresponds to a certain return period.

As for the simulation of rainfall distribution in time and space, there are several procedures which can be used (i.e., Amorocho and Wu 1977, Corotis 1976, Mejia and Rodriguez-Iturbe 1974, Bras and Rodriguez-Iturbe 1984, Kavvas et al. 1987 and Matsubayashi 1988). However, these methods do not succeed in introducing the return period. With this point of view, this paper utilizes the Fourier series to express the rainfall field subjected to a given magnitude (i.e. the areal averaged rainfall).

THEORETICAL DISCUSSION

The rainfall field is a typical random field which originates from fluctuations in the meteorological properties of the atmosphere such as vapour pressure, temperature, phase change of matter and the wind field. However, it is also recognized that the rain field consists of several scales of phenomena, individually known as the extratropical cyclone, rain band, cell cluster and convective cells, where a number of smaller scale phenomena are included in a larger phenomenon (Austin and Houze 1972). These hierarchical characteristics are a kind of regularity in the randomness found in turbulent flow. They can be expressed by a Fourier series which consists of various scales of waves expressing each scale of the rainfall phenomena stated above.
The Fourier series for a two-dimensional field can be expressed by equation 1.

    r(x,y) = a_00 + Σ_{i=1}^{m} Σ_{j=1}^{n} A_ij sin(θx_ij + kx_i x) sin(θy_ij + ky_j y),
        kx_i = 2πi/Lx,  ky_j = 2πj/Ly                        (1)

It is worthwhile to discuss the one-dimensional Fourier series shown in equation 2, because it inherently possesses almost all the important properties of the two-dimensional case, i.e. the significance of the constant term a_00, the distribution characteristics of the wave numbers kx_i and ky_j, and the phase angle differences θx_ij and θy_ij.

    r(x) = a_0/2 + Σ_{i=1}^{m} A_i sin(k_i x + θ_i),   where k_i = 2πi/L     (2)

In equation 2, a_0/2 is the areally averaged rainfall in a region of length L, and A_i and θ_i are the amplitude and phase angle of the sinusoidal function with wave number k_i. Among these parameters, randomness is included in a_0, A_i and θ_i. As mentioned above, the rainfall field is not simply a random process but also contains a deterministic component, shown in the hierarchical structure of the rain band, the convective cell and others. Therefore, to simulate the rainfall field, these variables should be properly evaluated based on their stochastic and deterministic properties.

The Fourier spectrum A_i²

Among the three variables, the amplitude A_i, or the Fourier spectrum A_i², has two aspects for treatment in the analysis. One comes from the knowledge about turbulence. Kolmogorov derived a relationship between the energy spectrum E_k of turbulence and the wave number k by

means of dimensional analysis, as shown in equation 3, where ε is the rate of energy dissipation and α a universal constant:

    E_k = α ε^(2/3) k^(-5/3)                                 (3)

This equation explains the distribution of the energy of turbulence, where energy is transferred from long waves to short waves like a cascade and finally dissipates as heat through molecular shear stress.

By analogy with turbulence, the rainfall field may also be considered to have a similar mechanism, wherein rainfall has a large scale rain band and numerous small convective cells, and water is transferred from larger scale phenomena to smaller scales and falls as rainfall. Based on this consideration, and applying the achievements of turbulence studies, the Fourier spectrum of the rainfall field is expected to have the form shown in equation 4 for the one-dimensional case, corresponding to equation 3 for turbulence:

    A_i² = a k_i^(-λ)                                        (4)

On the other hand, the Fourier spectrum A_i² can theoretically be obtained by applying the Fourier transformation to the auto-correlation function of the rainfall field. The characteristics of the auto-correlation function of rainfall have been studied by many researchers (Zawadzki 1973, Oshima 1992), reporting that the A.C.F. of rainfall decreases linearly or exponentially. The Fourier spectrum A_i², derived by applying the inverse Fourier transform to the exponential A.C.F., is

    A_i² = 4r_0² / (a_s² + k_i²)                             (5)

In this research, equation 6 is also used to express the Fourier spectrum of the rainfall, which can express both equations 4 and 5:

    A_i² = a / (a_s² + k_i²)^(λ/2)                           (6)

That is, equation 6 with λ = 2 reduces to the form of equation 5, and equation 6 with a_s = 0 reduces to equation 4. These expressions of the Fourier spectrum give clues for the discussion of the rainfall simulation.
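The two limiting cases of the combined form can be verified directly; in the sketch below all parameter values and the wave number range are assumptions for illustration:

    import numpy as np

    k = np.linspace(0.063, 6.3, 100)        # assumed range of wave numbers (1/km)

    def spectrum(k, a, a_s, lam):           # combined form, eq (6)
        return a / (a_s ** 2 + k ** 2) ** (lam / 2)

    # lam = 2 recovers the exponential-ACF shape of eq (5), up to the constant:
    print(np.allclose(spectrum(k, 4.0, 0.3, 2.0), 4.0 / (0.3 ** 2 + k ** 2)))
    # a_s = 0 recovers the pure power law of eq (4):
    print(np.allclose(spectrum(k, 4.0, 0.0, 1.67), 4.0 * k ** -1.67))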

Areal average rainfall a_0

Various rainfall simulators mentioned before can produce rainfall in time and space. However, they cannot properly simulate the rainfall field corresponding to a certain return period T. The return period, an average recurrence interval of rainfall, is usually defined for the point rainfall depth of an event. In considering the spatial distribution of rainfall, the return period should be analyzed for rainfall over a specified area, because rainfall has centrifugal characteristics originating from the hierarchical structure of the rain band, the convective cell, etc. Therefore, the maximum value of spatially averaged rainfall for a certain area is strongly dependent on the area considered. These characteristics are usually discussed as DAD (depth-area-duration) analysis.

In the Fourier series, a_0/2 in equation 2 is the spatial average value of rainfall within the area where the Fourier series is spanned. Therefore, from the above discussion, a_0 characterizes the total amount of rainfall in the area and is evaluated for a certain return period through DAD analysis. In other words, through the parameter a_0, the statistical characteristics (i.e., the return period of the occurrence of rainfall) are explicitly introduced into the simulation model. On the other hand, the parameters A_i and θ_i determine how the total rainfall (a_0/2)L should be distributed in space and time. So these parameters should be carefully evaluated to mimic the stochastic characteristics of rainfall. In this research, however, we concentrate on the discussion of the spatial distribution of rainfall; the temporal variation of rainfall is not treated.

RAINFALL DATA

Table 1. Characteristics of rainfall data

    Date      Duration      Type of rainfall
    6/29/86   1:00-12:00    warm frontal rainfall
    7/10/86   0:00-12:00    stationary frontal rainfall
    6/30/88   5:00-8:00     warm frontal rainfall
    9/20/88   1:00-11:00    warm frontal rainfall
    9/25/88   4:00-14:00    stationary frontal rainfall

Figure 1. Nagoya radar site and region of rain data (the domain of the analysis)

The rainfall data used to analyze the Fourier spectrum and the phase angle are the radar rain gage data obtained at the Nagoya Meteorological Observatory (Nagoya City, Aichi, Japan) shown in Figure 1. PPI data are given at grid points of a 2.5 km by 2.5 km mesh in a 500 km by 500 km square region at every 7.5 minutes. We analyze five storms which occurred during the years 1986 and 1988. The characteristics of the rainfall are listed in Table 1. Because we do not discuss the temporal variation of rainfall, the spatial distributions of hourly rainfall are analyzed independently of time. In the case of the one-dimensional rainfall field, rainfall data along a certain grid line are used.

Figure 2-a. One dimensional rainfall distribution of a single peak rain (September 20, 1988, 6:00-7:00)

Figure 2-b. Fourier spectrum distribution

Figure 2-c. Distribution of the phase angle and the phase angle difference

Figure 3-a. One dimensional rainfall distribution of uniform rain (July 10, 1986, 5:00-6:00)

Figure 3-b. Fourier spectrum distribution

Figure 3-c. Distribution of the phase angle and the phase angle difference

RESULTS OF THE ONE DIMENSIONAL ANALYSIS

In the following discussions, one-dimensional rainfall fields are analyzed, because the simplified treatment is helpful in understanding the fundamental characteristics of the rainfall. Figures 2 and 3 show typical examples of rain events and the results of the analysis. Figure 2-a is an example of a rainfall with a single peak, and Figure 3-a of a uniformly distributed rainfall. These two types of events are called "single peak rainfall" and "uniform rainfall" in the following discussions. In between these typical cases, there is a type of rainfall with several peaks in one event, called "complex rainfall." Both lengths of the rain field analyzed in these figures are 100 km.

Fourier spectrum A_i²

Figures 2-b and 3-b show the Fourier spectrum A_i² in relation to the wave number k_i for the two rainfalls. From these figures, it can be seen that A_i² decreases with an increase of the wave number k_i in both cases. However, different characteristics are also found for the two types of rainfall: the single peak rainfall has rather large values of A_i² over almost the whole region of k_i and is convex in the low wave number range. On the other hand, the uniform rainfall shows smaller A_i² values and a linear recession on log-log axes.

In the modeling of the Fourier spectrum, three types of relationships (equations 4 to 6) are used, based on the previous discussion. One of these relationships is from the analogy with turbulence, where the equation is assumed from Kolmogorov's power law. Another is from the exponential auto-correlation function of rainfall. The other is a combined form of these two relations. These relationships are applied to the observed spectrum and are shown as a solid line for equation 4, a broken line for equation 5 and a dotted line for equation 6 in Figures 2-b and 3-b. From these figures, it may be concluded that equation 4 is applicable to the spectrum of uniform rainfall for a wide range of wave numbers, but for the single peak rainfall, the solid
Figure 4. Histogram of λ

Figure 5. Histogram of a_s

line shows a remarkable deviation from the observed data in the lower range of k_i. Because the vertical axis is logarithmic, the deviation cannot be ignored. This difference may originate from the non-uniform characteristics of the rainfall in nature, while equation 4 is derived on the assumption of a uniform field. Figure 4 shows the histogram of the parameter λ in equation 4 for 363 cases of rainfall. In this figure, rainfall events are classified into three types, namely, single peak rain, uniform rain and complex rain. This figure shows that a single peak rain has a large value of λ compared to uniform rainfall. It is interesting to note that the overall average value of λ (2.28) is almost comparable to Kolmogorov's theoretical value of 5/3. On the other hand, equation 5 seems to be applicable especially in the lower range of k_i. Although the apparent errors in the larger range of k_i are big, the real errors of the spectrum are small. Figure 5 shows the histogram of the parameter a_s. This figure shows that a_s is large for the single peak rain events and almost zero for the uniform rain events, for which equation 6 converges to equation 4.

Compared to equations 4 and 5, equation 6 is the most applicable for both single peak rain and uniform rain because of its high degree of freedom to express the Fourier spectrum. The estimated dotted line gives a good fit in both the small and the high range of k_i.

In addition to the deterministic parts discussed above, random fluctuations in the Fourier spectrum are also observed in Figures 2-b and 3-b. The deviation of the plots from the deterministic trend is not clear, but it seems to depend on the type of rainfall, whether it is a single peak rain, a complex rain or a uniform rain.

Phase angle θ_i

Figures 2-c and 3-c show the phase angle θ_i and the phase angle difference Δθ_i (= θ_i - θ_{i-1}). From these figures, an obvious difference can be seen between the single peak rainfall and the uniform rainfall. The uniform rainfall shows random scatter in both θ_i and Δθ_i. On the other hand, the phase angle θ_i of the single peak rainfall shows an almost linear change, and consequently Δθ_i takes a constant value. To understand these differences, it should be recognized that the phase angle θ_1 determines the location of the peak of the basic wave, and that the other phase angles θ_i have meaning with respect to θ_1. In addition, for a Fourier series in which the peaks of the sine curves coincide at a certain point, which produces a single peak rain, it can be proved theoretically that Δθ_i should satisfy the relation Δθ_i = θ_1 - π/2. These characteristics can be found in Figure 2-c of the single peak rainfall.

Although the deterministic component is dominant in the phase angle of the single peak rain, a random component is also observed around the deterministic part. The randomness in the phase angle is found to be strongest for uniform rain, intermediate for complex rain, and weakest for the single peak rain.
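The single-peak phase relation can be checked numerically; in the sketch below the field length L, the peak location x0 and the number of harmonics are assumed values:

    import numpy as np

    L, x0, m = 100.0, 30.0, 8               # assumed length, peak location, harmonics
    i = np.arange(1, m + 1)
    k = 2 * np.pi * i / L
    theta = np.pi / 2 - k * x0              # phases that align every sine peak at x0
    dtheta = np.diff(theta)
    print(np.allclose(dtheta, theta[0] - np.pi / 2))   # True: dtheta_i = theta_1 - pi/2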

The magnitude of the storm and the amplitude of the first term

As explained above, the spatial distribution of rainfall is described by the average rainfall a_0/2 and the Fourier coefficients A_i. Among these properties, a_0/2 is determined from DAD analysis, and A_i can be determined by equations 4 to 6 for adequate parameters. The parameters A_i, however, cannot be independent of the rainfall magnitude. Here we focus on A_1, the amplitude of the first term, as the representative parameter, and relate it to the areal average rainfall. Figures 6-a and 6-b show the relationship between A_1² and the square of the areal average rainfall for single peak and uniform rainfall. Although some scatter was

Figure 6-a. Relationship of A_1² to r̄² for a single peak rain

Figure 6-b. Relationship of A_1² to r̄² for a uniform rain
observed, almost linear relationships can be assumed. It can also be seen that the slope of the relationship for single peak rainfall is steeper than that for uniform rainfall. Because Figures 6-a and 6-b are plotted by selecting typical homogeneous and single peak rain events, complex rain plots may fall in between these two linear relationships.

For the purpose of simulation, these results make it possible to determine not only a_0/2 but also A_i based on the return period of the design rainfall.

CHARACTERISTICS OF THE TWO DIMENSIONAL RAIN FIELD

The two-dimensional field is described by equation 1. Among the parameters in equation 1, the Fourier spectrum A_ij² is expressed in equation 7 by multiplying equation 5 applied in the two directions. This expression is based on the assumption of independence of the process in the two directions. This simplified approach, however, can express the heterogeneity of the field, which is reported by Zawadzki (1973), Matsubayashi and Takagi (1987) and Oshima (1992).

    A_ij² = 16r_0² / ((a_x² + k_xi²)(a_y² + k_yj²))          (7)

Figures 7-a and 8-a are typical examples of the single peak rainfall and uniform rainfall respectively. Figures 7-b and 8-b are the observed Fourier spectrum distributions, and Figures 7-c and 8-c are the distributions estimated by equation 7. In Figures 7-b and 7-c, the Fourier spectrum in the small k_x zone is relatively large, which means that the rainfall field is elongated in the N-S direction. This assures that equation 7 can express the heterogeneous

Figure 7-a. Two dimensional (2-D) rainfall distribution of single peak rain (September 20, 1988, 6:00-7:00)

Figure 7-b. Observed 2-D spectrum

Figure 7-c. Estimated 2-D spectrum (a_x = 0.060, a_y = 0.33)

Figure 7-d. 2-D phase angle θx_ij

Figure 7-e. 2-D phase angle θy_ij

Figure 8-a. Two dimensional (2-D) rainfall distribution of uniform rain (September 25, 1988, 11:00-12:00)

Figure 8-b. Observed 2-D spectrum

Figure 8-c. Estimated 2-D spectrum (a_x = 0.012, a_y = 0.070)

Figure 8-d. 2-D phase angle θx_ij

Figure 8-e. 2-D phase angle θy_ij
characteristics of the field. It can also be seen that they have characteristics similar to the one-dimensional analysis. These results show the applicability of equation 7 to express the two-dimensional Fourier spectrum.

As for the phase angle, Figures 7-d, e and 8-d, e show the distributions of θx_ij and θy_ij for the single peak rainfall and the uniform rainfall. The thick and thin lines are contour lines of positive and negative phase angles. A cyclic change of the phase angle is observed in the single peak rainfall; on the contrary, the homogeneous rainfall shows a random distribution. These characteristics correspond to the ones observed in the one-dimensional case.
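The separable spectrum of equation 7 is straightforward to evaluate. In the sketch below r_0 is an assumed value, while a_x and a_y follow the values quoted for Figure 7-c; the anisotropy (a_x ≠ a_y) makes the spectrum decay differently along the two wave number axes:

    import numpy as np

    r0, a_x, a_y = 5.0, 0.060, 0.33         # r0 assumed; a_x, a_y as in Figure 7-c
    kx = 2 * np.pi * np.arange(1, 21)[:, None] / 100.0   # x-direction wave numbers
    ky = 2 * np.pi * np.arange(1, 21)[None, :] / 100.0   # y-direction wave numbers
    A2 = 16 * r0 ** 2 / ((a_x ** 2 + kx ** 2) * (a_y ** 2 + ky ** 2))   # eq (7)
    print(A2[0, 0], A2[-1, 0], A2[0, -1])   # compare decay along kx and ky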

SIMULATION OF THE ONE-DIMENSIONAL RAINFALL FIELD

Based on the characteristics of the rainfall described above, a simulation procedure for the one-dimensional rainfall field is proposed here. The two-dimensional case is not presented because the data analyzed are not sufficient to obtain reliable characteristics of the parameters. The procedure, a sketch of which is given below, is as follows:

1) Evaluate the areally averaged rainfall a_0/2, the parameters of the spectrum distribution a_s and λ_s, and the phase angle θ_1 of the first term.
2) Evaluate A_1² from the relationship between A_1² and the average rainfall.
3) Calculate the A_i² distribution based on equation 4, 5 or 6, with random noise to express the scatter observed in Figures 2-b and 3-b.
4) Calculate the θ_i by using Δθ_i = θ_i - θ_{i-1} = θ_1 - π/2 with a certain scatter for single peak rainfall, and by using a uniform random distribution for uniform rainfall.
5) Calculate the rainfall distribution by equation 2.

Figure 9. Example of simulation for a single peak rain (λ_s = 3.9, a_s = 0.04, average rainfall 13.6 mm/h)

Figure 10. Example of simulation for uniform rain (λ_s = 2.7, a_s = 0.02)
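A minimal sketch of steps 1) to 5) for a single peak storm follows. All parameter values (a_0, λ_s, a_s, θ_1, the noise levels, and a unit slope between A_1 and a_0/2) are assumptions for illustration, not fitted values:

    import numpy as np

    rng = np.random.default_rng(1)
    L, m = 100.0, 40                        # field length (km), number of harmonics
    a0, lam_s, a_s, theta1 = 27.2, 3.9, 0.04, np.pi / 2

    i = np.arange(1, m + 1)
    k = 2 * np.pi * i / L                                    # wave numbers
    A = np.sqrt(1.0 / (a_s ** 2 + k ** 2) ** (lam_s / 2))    # step 3: eq (6) shape ...
    A *= np.exp(rng.normal(0.0, 0.2, m))                     # ... with random scatter
    A *= (0.5 * a0) / A[0]                  # step 2: scale A1 to the average rainfall
    dtheta = theta1 - np.pi / 2 + rng.normal(0.0, 0.1, m - 1)   # step 4: single peak rule
    theta = theta1 + np.r_[0.0, np.cumsum(dtheta)]

    x = np.linspace(0.0, L, 200)            # step 5: evaluate eq (2)
    r = a0 / 2 + (A[:, None] * np.sin(np.outer(k, x) + theta[:, None])).sum(axis=0)
    print(f"mean {r.mean():.1f}, min {r.min():.1f} mm/h")    # slight negatives can occur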
Figures 9 and 10 show two examples of simulation, for single peak rainfall and uniform rainfall respectively. In these figures, especially in Figure 9, it is found that the rainfall is negative at some points in the single peak rainfall case. This is an intrinsic characteristic of the Fourier series and is difficult to remove, but the effect of these negative values is small enough in practice. These two results, compared to the observed rainfall events, are feasible as natural rainfall.

CONCLUSIONS

In this paper, the Fourier series is utilized to simulate the design rainfall. It is easy to introduce the return period into the simulated rainfall. The characteristics of the Fourier coefficients and phase angles are discussed, since they play an important role in reproducing the spatial distribution of the rainfall.

The results obtained here are summarized as follows:
1) The return period can be explicitly included in a_0/2 through DAD analysis.
2) The Fourier spectrum for uniform rainfall can be formulated by a relationship similar to Kolmogorov's law of turbulence. On the other hand, the spectrum for the rainfall with a single dominant peak can be derived from the exponential auto-correlation function.
3) The formulation incorporating both the Kolmogorov type and the exponential auto-correlation type can be adopted for almost all rainfall.
4) The Fourier coefficient of the first term can be linearly related to the areally averaged rainfall. Therefore, this coefficient can also be related to the return period.
5) The phase angle varies randomly for uniform rainfall and changes linearly with the number of terms for the single peak rainfall.
6) It is shown that the proposed simulation procedure can reproduce rainfall fields with characteristics similar to natural rainfall.

REFERENCES

1) Austin, P.M. & Houze, R.A. (1972) "Analysis of the structure of precipitation patterns in New England", Jour. of Applied Meteorology, Vol. 11, 926-935.
2) Amorocho, J. & Wu, B. (1977) "Mathematical models for the simulation of cyclonic storm sequences and precipitation fields", Jour. Hydrology, 32, 329-345.
3) Bras, R.L. & Rodriguez-Iturbe, I. (1984) "Random Functions and Hydrology", Addison-Wesley Publication, Menlo Park, California.
4) Corotis, R.B. (1976) "Stochastic considerations in thunderstorm modeling", Jour. of the Hydraulic Division, ASCE, HY7, 865-879.
5) Hobbs, P.V. (1978) "Organization and structure of clouds and precipitation on the mesoscale and microscale in cyclonic storm", Review of Geophysics and Space Physics, 16, No. 4, 741-755.
6) Kavvas, M.L., Saquib, M.N. & Puri, P.S. (1987) "On a stochastic description of the time-space behavior of extratropical cyclonic precipitation field", Stochastic Hydraulics, Vol. 1, 37-52.
7) Matsubayashi, U. and Takagi, F. (1987) "On the probabilistic characteristics of point and areal rainfall", Proc. of International Conference on Hydrologic Frequency Modelling, 265-275.
8) Matsubayashi, U. (1988) "On the simulation of the rainfall of the extratropical cyclones", Bull. Nagoya University Museum, 4, 81-94.
9) Mejia, J.M. & Rodriguez-Iturbe, I. (1974) "On the synthesis of random field sampling from the spectrum: An application to the generation of hydrologic spatial process", Water Resour. Res., Vol. 10, No. 4, 705-711.
10) Oshima, T. (1992) "On the statistical spatial structure of rainfall field", Graduation thesis, Civil Engineering Department, Nagoya University.
11) Waymire, E., Gupta, V.K. & Rodriguez-Iturbe, I. (1984) "A spectral theory of rainfall intensity at the mesoscale", Water Resour. Res., Vol. 20, No. 10, 1453-1465.
12) Zawadzki, I.I. (1973) "Statistical properties of precipitation patterns", Journal of Applied Meteorology, Vol. 12, 459-472.
PART VIII

TOPICS IN STREAMFLOW MODELLING


CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS

T. KOJIRI¹, T.E. UNNY², U.S. PANU³

¹ Department of Civil Engineering, Gifu University,
Gifu, Japan, 501-11
² Systems Design Engineering, University of Waterloo,
Waterloo, Ontario, Canada N2L 3G1
³ Department of Civil Engineering, Lakehead University,
Thunder Bay, Ontario, Canada P7B 5E1

Traditional methods of streamflow analysis and synthesis are based upon information
contained in individual data. These methods ignore information contained in and among
groups of data. Recently, the concept of extracting information from data groupings
through pattern recognition techniques has been found useful in hydrology. For
streamflow analysis, this paper proposes several objective functions to minimize the
classification error encountered in currently used techniques employing minimum
Euclidean distance. The relevance of these functions has been tested on the streamflow
data at the Thames river at Thamesville. Specifically, three objective functions
considering the properties of shape, peak, and gradient of streamflow pattern vectors are
suggested. Similar objective functions can be formulated to consider other specific
properties of streamflow patterns. AIC, intra and inter distance criteria are reasonable
to arrive at an optimal number of clusters for a set of streamflow patterns. The random
initialization technique for the K-mean algorithm appears superior, especially when one
is able to reduce initialization runs by 20 times to arrive at optimal cluster structure. The
streamflow synthesis model is adequate in preserving the essential properties of historical
streamflows. However, additional experiments are needed to further examine the utility
of the proposed synthesis model.

INTRODUCTION

Various kinds of data related to hydrologic variables such as precipitation, snowfall and streamflows are measured at a number of points using various equipment, with the view of assessing and controlling water resources systems. Several techniques exist for separately handling time sequenced and multi-point data. However, one can extend

such techniques to time sequenced multi-point data by considering all kinds of data at a measurement point to form time sequenced data vectors. Such vectors can then easily be treated and analyzed as pattern vectors [Panu et al. (1978) and Unny et al. (1981)].

Based on the consideration of spatial and temporal correlations, one can classify time sequenced spatial pattern vectors corresponding to precipitation or streamflows for the extraction of representative reference vectors. Similarly, the differences among pattern vectors can be utilized to classify them by incorporating information related to precipitation, meteorology, geology, physiography, etc.

Consideration of groups of data makes the process of estimation and prediction easier. It is in this vein that the capability of pattern recognition techniques in handling time sequenced multi-point data becomes readily useful.

A pattern recognition system (PRS) utilized by Panu et al. (1978) and Unny et al. (1981) for streamflow analysis and synthesis was based on the minimum Euclidean distance concept. In their investigations, it was found that in some cases the minimum distance concept tends to misclassify streamflow patterns. In such cases, additional constraints were invoked to minimize the classification error. It is noted that the misclassification of streamflow patterns by the PRS is caused by the consideration of the entire shape of the patterns in the minimum distance concept. To overcome such difficulties in the PRS for the classification of streamflow patterns in particular and hydrologic patterns in general, the following classification functions are suggested.

OBJECTIVE FUNCTIONS FOR CLASSIFICATION

Streamflow patterns inherently possess several characteristics. Among them, the most
obvious is the peak flow. Other characteristics could be long periods of low flows. The
following objective functions are based on some of these characteristics. Each of these
functions considers a specific property of streamflow patterns. These functions are later
shown to be effective in dealing with specific problems of streamflow analysis and
synthesis such as in cases of flood and drought conditions.

Objective function-one [OF1]

This function examines the shape aspect of streamflow patterns in the form of individual element distances as follows:

    OF1[x_i, z_j] = max over t of |x_i(t) - z_j(t)| / z_j(t)     (1)

where x_i(t) is the observed or transformed data value at time t of the ith pattern vector, and z_j(t) is the value of the jth reference vector (or cluster centre) at time t. The absolute difference at each element is normalized. A small value of OF1 indicates strong similarity between the observed pattern vector and the reference vector. It is noted that OF1 determines the degree of similarity based on the variations in each element rather than the entire shape of the pattern vector.

Objective function-two [OF2]

The peak discharge is the single most important characteristic influencing any flood flow analysis and the planning of flood control structures. The function defined below examines the peak flow component of a pattern vector and consequently facilitates the classification of all the patterns related to flood flows.

    OF2[x_i, z_j] = |x_ip(t) - z_jp(t)| / z_jp(t)            (2)

The subscript p indicates the location of the peak.

Objective function-three [OF3]

Streamflows tend to rise and fall sharply (i.e., steep gradient) in response to rain or snowmelt conditions. Such rises and falls are more pronounced in daily rather than in monthly streamflows. Further, the rises and falls are milder (i.e., mild gradient) during low streamflow periods. The gradient can thus distinguish streamflow patterns with sharp fluctuations from those with low fluctuations, while such streamflow patterns may have the same value of the OF1 function. The objective function OF3, based on the normalized gradient, is given below.

    OF3[x_i, z_j] = max over t of |[(x_i(t) - x_i(t+1)) - (z_j(t) - z_j(t+1))] / [β(z_j(t) - z_j(t+1))]|     (3)

where β represents the normalizing factor for comparing the above three functions (OF1, OF2 and OF3) in the same order of magnitude.

In some situations, one may need all the above functions collectively to improve upon the classification process. In such cases, one can formulate an aggregate objective function (OFa) as follows:

    OFa[x_i, z_j] = max[OF1(x_i, z_j), OF2(x_i, z_j), OF3(x_i, z_j)]     (4)

The aggregate function collectively involves all three functions. Therefore, this function can be used for simultaneously classifying streamflow patterns corresponding to different events, measurement points, and seasonal variations.
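A compact sketch of the four functions follows. OF1 uses the elementwise form given above, the peak position in OF2 is taken from the reference vector, β = 1 is an assumed normalization, and the example vectors are invented (a zero reference gradient in OF3 would need guarding in practice):

    import numpy as np

    def OF1(x, z):                 # eq (1): normalized elementwise deviation
        return np.max(np.abs(x - z) / z)

    def OF2(x, z):                 # eq (2): relative error at the peak of z
        p = np.argmax(z)
        return abs(x[p] - z[p]) / z[p]

    def OF3(x, z, beta=1.0):       # eq (3): normalized gradient deviation
        dx, dz = -np.diff(x), -np.diff(z)
        return np.max(np.abs((dx - dz) / (beta * dz)))

    def OFa(x, z):                 # eq (4): aggregate criterion
        return max(OF1(x, z), OF2(x, z), OF3(x, z))

    x = np.array([3.0, 9.0, 6.0, 4.0])      # invented pattern vector
    z = np.array([2.5, 8.0, 6.5, 4.0])      # invented reference vector
    print(OF1(x, z), OF2(x, z), OF3(x, z), OFa(x, z))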

CLASSIFICATION PROCEDURE FOR STREAMFLOW PATTERNS

The objective functions OF1, OF2, OF3, and OFa are used in the K-mean algorithm for the classification of streamflow patterns. The manner in which the K-mean algorithm is applied for classification is described in the Appendix. Further, the bias in the selection of initial centres of clusters is avoided by using the random initialization of the K-mean algorithm suggested by Ismail and Kamel (1986).

STRUCTURAL RELATIONSHIPS IN MULTIVARIATE DATA

For processes obeying a linear transition among observed data at different time periods, one would obtain the same number of clusters (or reference vectors) containing the same number of pattern vectors within a specified time period. Most hydrologic processes are inherently non-linear, and as a result, in an actual process one may obtain a different number of clusters, or the clusters may have different combinations of pattern vectors. The structural relationships among various clusters within a process or among processes are evaluated through the concept of goodness of fit. Secondly, one defines the conditional probability of occurrence, p(j/j'), of a reference vector [Suzuki (1973)] as follows:

    p(j/j') = n(j/j') / Σ_{j=1}^{k(j)} n(j/j')               (5)

where n(j/j') is the number of pattern vectors associated with the cluster j given that j' has occurred, and k(j) is the number of clusters considered for the analysis. It is advantageous to develop structural relationships among clusters exhibiting higher correlations within a process or among processes. Such structural relationships are, in turn, utilized in the prediction or simulation of streamflow patterns. The Markovian structure among clusters is obtained as follows:

    p(j/j') > Σ_{u, u≠j}^{k(u)} p(u/u')                      (6)

i.e.,

    n(j/j') / Σ_{j=1}^{k(j)} n(j/j') > Σ_{u, u≠j}^{k(u)} n(j,u) / Σ_{j=1}^{k(j)} Σ_{u, u≠j}^{k(u)} n(j,u)     (7)
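The conditional probabilities of equation 5 can be estimated by simple transition counting; in the sketch below the sequence of cluster labels is invented:

    import numpy as np

    labels = [0, 1, 1, 2, 0, 1, 2, 2, 0, 1, 0, 2, 1, 1, 0]   # invented cluster sequence
    kc = max(labels) + 1
    n = np.zeros((kc, kc))
    for jp, j in zip(labels[:-1], labels[1:]):
        n[jp, j] += 1                        # n(j/j'): transitions from cluster j' to j
    p = n / n.sum(axis=1, keepdims=True)     # eq (5): each row sums to one
    print(np.round(p, 2))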

DEVEWPMENT OF SIMULATION AND PREDICTION ALGORITHMS

Simulation algorithm

A process having no correlation structure among measurement points, events, and/or


seasons can be simulated independently. However, the streamflow patterns exhibit
correlation among them and therefore, can be synthesized by following a procedure
suggested by Panu and Unny (1980a, 1980b), where the conditional probability of
occurrence of pattern vectors and the normal distribution of the intra distance are
utilized. In this paper, streamflow patterns are considered to belong to two seasons and
are simulated as follows:

Step 1: Generate a sequence of clusters according to the Markovian probability of
        their occurrence.

Step 2: Synthesize each cluster to its pattern vector by using a multivariate normal
        distribution.

Step 3: Test whether the elements of a synthesized pattern vector lie within their
        specified limits. If not, synthesize another pattern vector until its elements
        are found within limits.

Step 4: Return to Step 2 until an acceptable pattern vector corresponding to each
        cluster in Step 1 is found.
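A minimal Python sketch of Steps 1 to 4 is given below, assuming the Markovian
transition matrix and the per-cluster multivariate-normal parameters have already been
estimated; it illustrates the procedure as described and is not the authors'
implementation.

```python
# Sketch of the simulation algorithm (Steps 1-4); illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def simulate(p, means, covs, lo, hi, n_seasons, j0):
    """p[j'][j]: Markovian transition matrix; means[j], covs[j]:
    multivariate-normal parameters of cluster j; lo, hi: limits on
    the elements of a pattern vector."""
    patterns, j = [], j0
    for _ in range(n_seasons):
        j = rng.choice(len(p), p=p[j])                # Step 1
        while True:                                   # Steps 2 and 4
            x = rng.multivariate_normal(means[j], covs[j])
            if np.all((x >= lo) & (x <= hi)):         # Step 3
                break
        patterns.append(x)
    return patterns
```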

Prediction algorithm

Assuming the membership functions to be exponentially distributed and utilizing the
concept of fuzzy inference, the pattern vectors are predicted [Kojiri and Ikebuchi
(1988)]. Beyond the observed season, the serial sequences are predicted by combining
the fuzzy inference with the expectation method. In general, a real-time prediction of
a pattern vector is used for forecasting flood or drought events. A pattern vector is
forecast based on the value of OF1 between the actual observed pattern vector and its
representative reference vector as follows:
(8)

(9)

Further, assuming that the fuzzy membership function of each cluster has the same
weights as the frequency of occurrence of a cluster, the membership function is
represented as follows:

$$ V_j = \exp\left\{ -\alpha_j\, h_j\, D_j^{observed} \Big/ \sum_{i}^{k(i)} h(i) \right\} \qquad (10) $$

where h_j denotes the frequency gained in the classification procedure and α_j is a
constant depending on the logic situations of the distance (i.e., large, medium, and
small) related to D_j^{observed}. One can then predict the pattern vector based on the
fuzzy inference technique [Kojiri et al. (1988)] as follows:

$$ \text{Predicted Pattern Vector} = \sum_{j}^{k(j)} V_j\, X_j^{predicted} \Big/ \sum_{j}^{k(j)} V_j \qquad (11) $$
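As an illustration of equations (10) and (11), the following sketch computes the fuzzy
membership weights and the resulting forecast; alpha, the cluster frequencies h, the
OF1 distances d_obs, and the per-cluster predictions x_pred are all assumed inputs.

```python
# Sketch of the fuzzy-inference forecast of equations (10)-(11);
# illustrative only.
import math

def fuzzy_forecast(d_obs, h, alpha, x_pred):
    """d_obs[j]: OF1 distance between the observed pattern and
    reference vector j; h[j]: cluster frequency; alpha[j]: constant
    of equation (10); x_pred[j]: predicted pattern of cluster j."""
    total_h = sum(h)
    v = [math.exp(-a * hj * d / total_h)              # equation (10)
         for a, hj, d in zip(alpha, h, d_obs)]
    n = len(x_pred[0])
    return [sum(vj * xp[t] for vj, xp in zip(v, x_pred)) / sum(v)
            for t in range(n)]                        # equation (11)
```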

APPLICATION OF THE METHODOLOGY: A CASE STUDY

The Thames river basin, covering a 4,300 km² area at Thamesville, was selected to test the
applicability of the proposed pattern synthesis and forecasting procedures. The monthly
discharge and precipitation records are available from October 1952 to September 1967.
The mean monthly discharge values are used in the analysis. Based on the correlogram
and spectral analysis, the discharge data was divided into two seasons: a dry season from
October to March, and a wet season from April to September. In general, every seasonal
segment appears to be different from the rest, and the variation in standard deviation for
some months is very large.
The seasonal segments (or pattern vectors) are now clustered into groups to derive
the structural relationships among them. The K-mean algorithm is used for grouping the
seasonal segments. A random initialization technique [Ismail and Kamel (1986)] is used
to achieve the global optimum, because the behaviour of the K-mean algorithm is
influenced by several factors such as the choice of initial cluster centres, the number
of cluster centres, the order in which seasonal segments are considered in the
clustering process, and the geometrical properties of seasonal segments. Several test
runs indicated that four clusters would be adequate to capture the relationships among
and within various seasonal segments. In general, there exist ₁₅C₄ combinations to group 15
seasonal segments into four clusters in each season. To find out the minimum possible

run of the K-mean algorithm for optimal cluster configurations, 200 runs of the K-mean
algorithm were made to group 15 seasonal segments into four clusters. The value of OF1
was evaluated for each run [Figure 1]. From this figure it is apparent that a significantly
small value of OF1 has occurred twice in 200 initial runs of the K-mean algorithm.
These significantly small values can be attributed to a situation when four clusters have
attained optimal cluster configuration, i.e., a condition when the intra distance DK(K)
is minimum and inter distance EK(K) is maximum. Therefore, the number of initial
conditions could be appreciably less [Table 1] for various combinations. Further, this
table also contains the values of the intra distance DK(K), inter distance EK(K), and the
Akaike Information Criterion (AIC) [see Appendix] for OF1. The values of DK(K),
EK(K) and AIC are plotted against the number of clusters in Figure 2. An examination
of the figure and the table indicates that for a case of four clusters, and a reasonable
number of 100 initialization runs, the value of AIC is minimum, the intra distance is
continuously decreasing up to four clusters and the rate of decrease is very small from
four to eight clusters, and the inter distance is fluctuating but is maximum for the four-
cluster case. Based on such considerations, it was assumed reasonable that four clusters
sufficiently describe the variability of pattern vectors in both the seasons. Considerations
of intra and inter distances and the values of AIC provide a useful but somewhat
inflexible method of obtaining optimal number of clusters for a set of given pattern
vectors.

Figure 1. Sequence of Values of Objective Function One [OF1]. [Figure: objective
function value (0 to 3.5) plotted against run number (0 to 200).]



Table 1. Summary of AIC, Intra, and Inter Distances Using OF1

Number of    Run      Intra      Inter
Clusters     Number   Distance   Distance   AIC
    1          2      2.811      0.000      n/a
    2         50      2.373      0.399      20.966
    3         70      2.007      0.735      16.199
    4        100      1.250      0.876      14.084
    5        150      0.942      0.716      15.124
    6        250      0.942      0.686      16.383
    7        300      0.942      0.472      17.489
    8        300      0.942      0.681      18.792
    9        250      0.436      0.967      18.880
   10        150      0.436      0.433      20.741
   11        100      0.436      0.433      22.406
   12         70      0.299      0.433      24.243
   13         50      0.203      0.344      26.128
   14          2      0.181      0.344      28.059
   15          1      0.000      0.344      30.000

Figure 2. Optimal Number of Clusters as Function of DK(K), EK(K), and AIC. [Figure:
distances (left axis) and AIC (right axis) plotted against number of clusters, 0 to 16.]



Another method of obtaining the optimal number of clusters is through the multi-
optimization technique. In this technique, the target (goal) intra distance DK(K) is
defined as the minimum, and the target inter distance EK(K) as the maximum, of the
intra and inter distances of all the feasible clusters. The values of the intra and
inter distances and the associated target distances, when plotted [Figure 3], define
the transformation curve (TC). Because all the conditions for the multiplication
factor (γ) are not satisfied, the indifference curve (IC) becomes a straight line
parallel to the line passing through the points defined by cluster-1 and cluster-15.
Incidentally, these points also lie at the ends of the transformation curve. The
optimum solution again lies at the four-cluster case.

Figure 3. Optimal Number of Clusters as Function of Multi-Optimization Criterion.
[Figure: inter distance vs. intra distance, showing the transformation curve (TC)
through points A (d1, e1) and B (d2, e2), the indifference curve (IC), the goal point
C (d0, e0), and the optimal point at the four-cluster case.]

Based on the results of the above considerations, the constraints for the K-mean
algorithm are obtained [Table 2]. The values of the constraints are found to be less
than seven-tenths of the maximum value of DK(K) and greater than two-tenths of the
inter distance EK(K). The optimum number of clusters is obtained as the minimum number
satisfying the constraints, subject to the condition that such a number should not be
greater than half the total number of pattern vectors. In cases where neither the
inter distance nor the intra distance satisfies the constraints, the inter distance
takes priority over the intra distance.

Table 2. Constraints Used in AIC, Intra and Inter Distances for Obtaining Optimal
Number of Clusters

Classification Method   Optimal Cluster Number   Constraints

K-means Algorithm       4                        Intra-constraint < 0.7 x max{intra(1-15)},
                                                 and number of clusters not greater than
                                                 half the total number of pattern vectors.

                                                 Inter-constraint > 0.2 x
                                                 [max{inter(2-15)} - min{inter(2-15)}]
                                                 + min{inter(2-15)}

Multi-Optimization      4                        None

AIC                     4                        None

The same optimum number of clusters was obtained using the K-mean algorithm for
various cases of the objective functions [Table 3]. The objective functions OFa and
OF1 render the same structure for the optimal number of clusters because the resulting
values of OFa are strongly influenced by the function OF1. However, the OF2 function
related to peak and the OF3 function related to gradient give a different structure to
the optimal number of clusters. These functions evaluate properties of pattern vectors
such as the occurrence of peak flows or the gradient between successive events and, as
a result, deal with properties which are least correlated. It is in this vein that
these functions will provide an optimal structure of clusters in specific situations
such as flood or drought analysis.

The Markovian transition from one cluster to another is summarized in Table 4. The
cluster centres in each season are exhibited in Figure 4. As each reference vector is
unique, the OFa function has been effective in classifying streamflow data, especially
for peak considerations. It is noted that if one were to consider drought
characteristics, one would replace the OF2 function to reflect the low flow
characteristics.

Table 3. Configuration of Optimal Number of Clusters for Various Objective Functions

Discharge    Objective   Number of
Data Set     Function    Clusters       Cluster Configuration

Dry Season   OFa         4 (optimum)    Cluster-1: 13
                                        Cluster-2: 3,6,8,9
                                        Cluster-3: 1,2,4,5,7,11,14
                                        Cluster-4: 10,12,15

Wet Season   OFa         4 (optimum)    Cluster-1: 13,14,15
                                        Cluster-2: 11
                                        Cluster-3: 1,2,3,5,6,8,9,10,12
                                        Cluster-4: 4

Dry Season   OF1         4 (optimum)    Cluster-1: 13
                                        Cluster-2: 10,12,15
                                        Cluster-3: 3,4,6,8,11
                                        Cluster-4: 1,2,5,7,9,14

Dry Season   OF3         2 (optimum)    Cluster-1: 1,3,4,6,9,10,11,12,13,14,15
                                        Cluster-2: 2,5,7,8

                         4              Cluster-1: 1
                                        Cluster-2: 2,4,6,7,9,10,11,12,14,15
                                        Cluster-3: 3,5,8
                                        Cluster-4: 13

Dry Season   OF2         2 (optimum)    Cluster-1: 1,2,3,4,5,6,7,10,13,15
                                        Cluster-2: 8,9,11,12,14

                         4              Cluster-1: 8,9,12,14
                                        Cluster-2: 11
                                        Cluster-3: 1,2,5,6,10,15
                                        Cluster-4: 3,4,7,13

Based on the above cluster configurations and their intra and inter structural
relationships, streamflow patterns were synthesized for the Thames River at
Thamesville. The observed and synthesized Markovian transition probabilities for
various clusters are summarized in Table 5. In this table, the variation between the
observed and synthesized Markovian structure is less than 5%. In other words, the
Markovian structure is preserved in the synthesized streamflow patterns. A few sample
realizations of synthesized streamflow patterns are exhibited in Figure 5. The
variations in these realizations indicate the flexibility of the proposed procedure in
synthesizing the extreme as well as the normal streamflow characteristics.

The results of the forecast model are given in Figure 6. The forecasts at three
sequential time stages, from April, May and June 1966, are made on the assumption that
these data points are not known. The forecast model needs further improvements.

Figure 4. Representative Reference Vectors for (a) Dry Season [Oct. to March] and
(b) Wet Season [April to September]. [Figure: discharge (m³/s, 0 to 120) plotted
against the six elements of the pattern vectors for the four cluster centres in each
season.]

Table 4. Summary of Historical and Simulated Markovian Probability of Occurrence of
Various Clusters in a Season

Case (a): Probability of Occurrence: Dry Season to Wet Season

Clusters of                      Clusters of Wet Season
Dry Season        1               2               3               4
    1         0.0   (0.0)    0.0   (0.0)    1.0   (1.0)    0.0   (0.0)
    2         0.496 (0.5)    0.0   (0.0)    0.26  (0.25)   0.244 (0.25)
    3         0.135 (0.143)  0.0   (0.0)    0.602 (0.571)  0.263 (0.286)
    4         0.0   (0.0)    0.330 (0.333)  0.316 (0.333)  0.354 (0.334)

Note: Historical values of probabilities are given in parentheses.

Case (b): Probability of Occurrence: Wet Season to Dry Season

Clusters of                      Clusters of Dry Season
Wet Season        1               2               3               4
    1         0.0   (0.0)    0.660 (0.667)  0.340 (0.333)  0.0   (0.0)
    2         0.0   (0.0)    0.0   (0.0)    0.0   (0.0)    1.0   (1.0)
    3         0.134 (0.143)  0.144 (0.143)  0.563 (0.571)  0.159 (0.143)
    4         0.0   (0.0)    0.259 (0.250)  0.502 (0.500)  0.239 (0.250)

Note: Historical values of probabilities are given in parentheses.

Table 5. Summary of Observed and Simulated Probability of Occurrence of Clusters of
Wet Season given the Occurrence of Cluster-3 of Dry Season

Cluster Number   Observed Probability   Simulated Probability   Percent Error
      1                0.143                  0.136                  4.8
      2                0.143                  0.138                  3.5
      3                0.571                  0.577                  1.1
      4                0.143                  0.149                  4.2

Note: The values of simulated probabilities in this table are based on 1000
synthesized realizations of streamflows.

Figure 5. A Sample of Synthesized Realizations of Streamflows. [Figure: discharge
plotted against time in months, with observed flows (months -6 to 0) and simulated
flows (months 0 to 24) labelled by cluster.]

Figure 6. A Sample of Forecasted Streamflow Patterns. [Figure: discharge plotted
against time in months over Season 2 and Season 1; legend: observed flows and
predictions made at times t = 4, 5 and 6.]



CONCLUSIONS

Several objective functions are proposed to improve upon the existing pattern
recognition system (PRS) for streamflow pattern analysis and synthesis. Specifically,
three objective functions considering the properties of shape, peak, and gradient of
streamflow pattern vectors are proposed. Similar objective functions can be formulated
to consider other specific properties of streamflow patterns. The AIC, intra and inter
distance criteria are reasonable means to arrive at the optimal number of clusters for
a set of streamflow patterns. The random initialization technique for the K-mean
algorithm appears superior, especially as it can reduce the number of initialization
runs required to arrive at an optimal structure of clusters by a factor of about 20.
The streamflow synthesis model is adequate in preserving the essential properties of
historical streamflows. However, additional experiments are needed to further examine
the utility of the proposed synthesis model.

REFERENCES

Ismail, M.A. and Kamel, M.S. (1986) Multidimensional Data Clustering Using Hybrid
Search Strategies. Unpublished report, Systems Design Engineering, Univ. of Waterloo.
Kojiri, T., Ikebuchi, S. and Hori, T. (1988) "Real-Time Operation of Dam Reservoir by
Using Fuzzy Inference Theory". Paper presented at the Sixth APD/IAHR Conf., Kyoto,
July 20-22, 1988.
Panu, U.S., Unny, T.E. and Ragade, R.K. (1978) "A Feature Prediction Model in
Synthetic Hydrology Based on Concepts of Pattern Recognition". Water Resources
Research, Vol. 14, No. 2, pp. 335-344.
Panu, U.S. and Unny, T.E. (1980a) "Stochastic Synthesis of Hydrologic Data Based on
Concepts of Pattern Recognition: I. General Methodology of the Approach". Journal of
Hydrology, Vol. 46, pp. 5-34.
Panu, U.S. and Unny, T.E. (1980b) "Stochastic Synthesis of Hydrologic Data Based on
Concepts of Pattern Recognition: II. Application to Natural Watersheds". Journal of
Hydrology, Vol. 46, pp. 197-217.
Suzuki, E. (1973) Statistics in Meteorology, Modern Meteorology No. 5, Chijin-syokan
Co., 4th Edition, pp. 254-261 [in Japanese].
Unny, T.E., Panu, U.S., MacInnes, C.D. and Wong, A.K.C. (1981) "Pattern Analysis
and Synthesis of Time-dependent Hydrologic Data". Advances in Hydroscience, Vol. 12,
Academic Press, pp. 222-244.

APPENDIX

Classification procedure

By using the K-means algorithm, the reference vectors at K cluster centres are obtained
as follows:

(i) define K initial cluster centres, in other words, the tentative reference vectors.
Arbitrary pattern vectors may be used for these centres. The following matrices are
defined to explain the procedure:

$$ Z(j,u) = \begin{bmatrix} z(j,1) \\ z(j,2) \\ \vdots \\ z(j,6) \end{bmatrix} \qquad (12) $$

$$ X(i) = \begin{bmatrix} x(i,1) \\ x(i,2) \\ \vdots \\ x(i,6) \end{bmatrix} \qquad (13) $$

where Z(j,u) is the jth reference vector at the uth iterative step in the K clusters,
and X(i) is the pattern vector consisting of the data points x(i,t), t = 1, 2, ..., 6.

(ii) at the uth iterative step, if

$$ OFa[X(i), Z(j,u)] < OFa[X(i), Z(n,u)] \quad \text{for } n = 1, 2, \ldots, K \text{ and } n \neq j, \qquad (14) $$

then the pattern vector X(i) belongs to cluster j.

(iii) calculate the maximum distance of cluster j as follows:

$$ DK(j,u) = \max_{X(i) \in j} \bigl[\, OFa(X(i), Z(j,u)) \,\bigr] \qquad (15) $$

The new centre of cluster j is decided among the pattern vectors belonging to
cluster j as follows:

$$ Z(j,u+1) = \frac{1}{N(j)} \sum_{i=1}^{N(j)} X(i) \qquad (16) $$
CLUSTER BASED PATTERN RECOGNITION AND ANALYSIS OF STREAMFLOWS 379

where N(j) is the number of pattern vectors in the rearranged cluster j.

(iv) if Z(j,u+1) = Z(j,u), the iteration stops. Otherwise, go back to step (ii) and
iterate.

(v) calculate the maximum intra distance DK(K), and the distance between the centres,
i.e., the inter distance EK(K), at K clusters:

$$ EK(K) = \min_{j,\,j',\; j \neq j'} \bigl[\, OFa(Z_K(j,u), Z_K(j',u)) \,\bigr] \qquad (17) $$

$$ DK(K) = \max_j \bigl[\, DK(j,u) \,\bigr] \qquad (18) $$

(vi) go back to step (i) with the next number of clusters, (K + 1), until all cluster
numbers have been considered; the iterations are then completed.
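The following compact sketch illustrates steps (i) to (iv) in Python, with the
aggregate objective function supplied as a callable distance; it is a simplified
illustration of the procedure as described, not the authors' code.

```python
# Sketch of the K-means classification procedure, steps (i)-(iv).
import numpy as np

def k_means(patterns, k, ofa, init_idx, max_iter=100):
    """patterns: (N, 6) array of pattern vectors; init_idx: indices
    of the K initial centres (step i), e.g. drawn at random."""
    z = patterns[init_idx].astype(float)
    for _ in range(max_iter):
        # step (ii): assign each pattern to its nearest reference vector
        labels = np.array([min(range(k), key=lambda j: ofa(x, z[j]))
                           for x in patterns])
        # step (iii): new centre = mean of the cluster's pattern vectors
        z_new = np.array([patterns[labels == j].mean(axis=0)
                          if np.any(labels == j) else z[j]
                          for j in range(k)])
        if np.allclose(z_new, z):            # step (iv): convergence
            break
        z = z_new
    # intra distance DK(K) of equation (18)
    dk = max(ofa(x, z[j]) for x, j in zip(patterns, labels))
    return z, labels, dk
```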

Upon completion of the above procedure, the optimum number of clusters should be
decided using criteria as described below.

[1] The first criterion is to give the thresholds to the objective function and choose the
minimum number of clusters. For example,

(i) the intra distance, DK(K) which is similar to the objective function at
cluster centres, is less than 3.

(ii) the inter distance, EK(K) is greater than 1.

(iii) the number of clusters is less than half the total number of pattern vectors.

The inter distance is calculated as the mean value of the distances for the same
combinations of the reference vectors, because any reference vector can be the centre
of the objective function [Equation (10)].

[2] The second criterion is to decide the optimum number of clusters through the
multi-optimization technique. The objective function is formulated as a vector as
follows:

$$ \begin{bmatrix} DK(K) \rightarrow \min \\ EK(K) \rightarrow \max \end{bmatrix} \qquad (19) $$

When the restriction on EK(K) is relaxed in the optimization and the value of EK(K) is
calculated according to the value of DK(K) for the same cluster number K, each
combination [DK(K), EK(K)] is plotted as in Figure 3.

Considering the arc AB as the Pareto optimum (also called the transformation curve
(TC)), one can convert this curve to equivalent coordinates. The arc AB represents one
part of the TC, which is defined by the limited number of clusters. The EK(K)
coordinate is multiplied by the ratio ||BC|| / ||AC||, because the points A and B have
the same weight and the same difference from the goal, which has been found to be the
most desirable position [Kojiri et al. (1988)]. The multiplication factor γ is
calculated as follows:

$$ \gamma = \sqrt{ \frac{(e_2 - e_0)^2 - (e_1 - e_0)^2}{(d_1 - d_0)^2 - (d_2 - d_0)^2} } \qquad (20) $$

subject to

$$ e_2 > e_1 \text{ and } d_1 > d_2, \quad \text{or} \quad e_2 < e_1 \text{ and } d_1 < d_2. \qquad (21) $$

where (e0, e1, e2) and (d0, d1, d2) are the EK(K) and DK(K) coordinates of the points
A, B and C. On the equivalent coordinates, the point nearest to the goal becomes the
optimum, and the optimum number of clusters is the corresponding cluster number. The
element d0 is taken as the minimum value among all the probable intra distances, or
the value zero when all pattern vectors are the same. The element e0 is taken as the
maximum value among the probable inter distances, or the intra distance in the case of
cluster number 1, where all pattern vectors are in the significant range of reference
vectors. If the necessary conditions are not satisfied, the indifference curve (IC) is
drawn parallel to the line which passes through points A and B.

[3] The third criterion is to estimate the distribution of the pattern vectors using
the Akaike Information Criterion (AIC). Assuming that the K-means algorithm gives the
optimum value of the objective function for pattern vectors distributed normally
around the centre, the maximum log-likelihood of cluster j is represented as follows:

$$ W(j) = \sum_{i=1}^{N(j)} \left[ (\log 2\pi / 4)\, \bigl( OFa(X(i), Z(j,u)) \bigr)^2 \right] \qquad (22) $$

As the whole information is given by summing W(j) over j, the optimum number of
clusters is decided as the number which gives the minimum value of the following
equation among all the clusters:

$$ AIC = \sum_{j}^{K} W(j) + 2K \;\rightarrow\; \min \qquad (23) $$

where the second term represents the number of parameters treated in the objective
function. With the multi-optimization and AIC criteria, it is not necessary to
consider constraints to arrive at the optimum solution; the K-means algorithm,
however, retains flexibility according to the observed pattern vectors.
REMUS, SOFTWARE FOR MISSING DATA RECOVERY

PERRON, H.1, BRUNEAU, P.2, BOBEE, B.1, PERREAULT, L.1

1 INRS-Eau, University of Quebec, Institut national de la recherche scientifique,
Carrefour Molson, 2800, rue Einstein, Quebec (Quebec), Canada, G1X 4N8
2 Hydro-Quebec, Division Hydrologie, Place Dupuis (17e étage), 855, Ste-Catherine
East, Montreal (Quebec), Canada, H2L 4P5

INTRODUCTION
In order to manage adequately their water resources, Hydro-Quebec often uses
simulation of energy at different points of the hydrological network. This simulation is
based on monthly means computed from daily observed flows. However, it is possible
that some daily values are missing. Hydro-Quebec will then reject the calculated
monthly mean flow when four or more daily observations are not available or if they
seem incorrect. Furthermore, the monthly means may be rejected for many consecutive
months at certain sites. Also, when a basin is large, more than one station is needed to
obtain a good estimation of flows at sites of reservoirs. As very few stations have
complete series on a long period, it is therefore very important to be able to estimate
these missing values in order to obtain reliable estimates from the energy production
simulation models.
The missing values for a given site are estimated by multiple regression using data
from other sites. Until recently, Hydro-Quebec used the software REMUL, which is an
adaptation of HEC-4, developed by the US Army Corps of Engineers (Beard, 1971).
HEC-4 and REMUL suffer from many weaknesses, mainly due to the theoretical
hypotheses which must be verified prior to using the regression models. Some of these
are:

• No correction or alternative method is available to treat multicollinearity problems;
• No multivariate method is available to consider correlations between dependent
  variables;
• No procedure for model validation through residual analysis is available;
• It is assumed, without validation, that observations follow a log-normal
  distribution.
If theoretical assumptions are not fulfilled, data may be incorrectly recreated and the
interpretation of the results may be fallacious. Therefore, in order to overcome those
weaknesses, ReMuS has been developed at INRS-Eau, within a partnership project with
Hydro-Quebec and NSERC.
The first part of this paper shows how it is possible to recover data using multiple
regression. In the second part, we discuss problems caused by multicollinearity and
the solution proposed in ReMuS, namely ridge regression; we also explain the procedure
that gives an optimal value of the ridge parameter k. In the third part, we

present the multivariate regression. A procedure that considers the relation between
dependent variables. We show how the parameters are estimated and the data
reconstituted.
Finally, in the last part, we present the software ReMuS and all its characteristics.

THE MULTIPLE REGRESSION

In order to reconstitute missing data, Hydro-Quebec uses multiple regression on data


from neighboring stations. The main purposes of the regression are:

• To establish a functional relationship between a variable Y and a set of explanatory
  variables X1, X2, ..., Xp.
• To predict the unknown variable Y from known values of X1, X2, ..., Xp.

The estimation of parameters and the reconstitution of data (prediction of Y) are
described in the following two subsections.

Estimation of parameters

In the multiple regression procedure used by Hydro-Quebec for extending data series,
the relation between the dependent variable Y and the explanatory variables X1, X2,
..., Xp is linear, i.e. given by the following expression:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \qquad (1) $$

The parameters β_i must be estimated from observed historical data series. This is
done by the least squares (LS) method, which consists in minimizing the sum of squared
residuals e_i:

$$ \min \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (2) $$

If we let

$$ \mathbf{y}_{n \times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X}_{n \times (p+1)} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad \mathbf{b}_{(p+1) \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad \mathbf{e}_{n \times 1} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (3) $$

it is well known that the LS estimators of β_i are given by:

$$ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y} \qquad (4) $$

Prior to making the data reconstitution, it is necessary to examine the residuals to
verify that the basic assumptions are fulfilled, especially that:

• residuals are normally distributed;
• residuals are independent;
• the variance of the residuals is constant.

If the analysis of residuals indicates that the underlying assumptions are violated,
it is usually possible to correct the problem by an appropriate transformation of the
variables. Two common situations where a transformation is needed are when:

1. the relation between the dependent variable and the explanatory variables is not
linear. In this case, one may transform the explanatory variables to linearize the
relation;

2. the residuals are not normally distributed and/or their variance is not constant.
One may apply a transformation (for instance Box-Cox, 1964) to the dependent variable.

Reconstitution of data

Not only the prediction of the dependent variable is important for Hydro-Quebec, but
also extensions of data series which preserve the mean and the variance of the
observed series. To this end, one first computes the value predicted by the model:

$$ \hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip} \qquad (5) $$

It can be shown that the mean of the estimated values of Y is equal to that of the
observed series. However, the variance is not reproduced by this estimator. Fiering
(1963) proposed to add the random term given by:

$$ \delta_i = u_i\, s_e \qquad (6) $$

where u_i is a standard normal variate and:

$$ s_e^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (7) $$

The data obtained by the equation:

$$ y_i = \hat{y}_i + \delta_i \qquad (8) $$

preserve the mean and the variance of the n observations y_1, y_2, ..., y_n used in
the regression.
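As an illustration, the following numpy sketch strings together equations (4) to (8):
least-squares fitting, prediction, and the addition of Fiering's random term so that
the extended series preserves the observed mean and variance. Variable names are
hypothetical and the snippet is not the ReMuS code.

```python
# Sketch of LS estimation and data reconstitution (eqns 4-8).
import numpy as np

rng = np.random.default_rng(0)

def fit_ls(X, y):
    """X: (n, p) matrix of explanatory data; y: (n,) observations."""
    Xa = np.column_stack([np.ones(len(X)), X])
    b = np.linalg.solve(Xa.T @ Xa, Xa.T @ y)        # eqn (4)
    resid = y - Xa @ b
    s_e = np.sqrt(np.mean(resid ** 2))              # eqn (7)
    return b, s_e

def reconstitute(b, s_e, x_new):
    """x_new: (m, p) explanatory values for the missing periods."""
    y_hat = b[0] + x_new @ b[1:]                    # eqn (5)
    delta = rng.standard_normal(len(x_new)) * s_e   # eqn (6)
    return y_hat + delta                            # eqn (8)
```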

RIDGE REGRESSION

Ridge regression (Hoerl and Kennard, 1970a) is a technique used to circumvent problems
caused by multicollinearity. The estimators obtained by ridge regression are biased,
but more stable than those determined by ordinary regression when multicollinearity is
present. In that case, bias is not necessarily a disadvantage.

Estimation of parameters
The problems of inverting the matrix X'X in the classical regression model (eqn 1)
when multicollinearity is present can be transposed to the correlation matrix r_XX
constructed from the standardized variables X1, ..., Xp. Ridge regression estimators
are obtained by introducing a constant k ≥ 0 in the normal equations of the LS
procedure:

$$ (\mathbf{r}_{XX} + k\, \mathbf{I}_p)\, \mathbf{b}^R = \mathbf{r}_{YX} \qquad (9) $$

where b^R = (b_1^R, b_2^R, ..., b_p^R) is the vector of standardized parameters
estimated by ridge regression, I_p is the p × p identity matrix, and r_YX is the
vector of correlations between Y and the standardized variables X1, ..., Xp. A
constant k is thus added to each element on the diagonal of the matrix r_XX, which
facilitates the inversion of the matrix. The solution of the system of equations now
depends on the constant k:

$$ \mathbf{b}^R = (\mathbf{r}_{XX} + k\, \mathbf{I}_p)^{-1}\, \mathbf{r}_{YX} \qquad (10) $$

The value of k is related to the bias of the estimators. If k = 0, equation (10) is
equivalent to (4), and the ridge estimators correspond to those obtained by ordinary
LS. When k > 0, the estimators are biased, but more stable than the LS estimators. As
for an ordinary regression, the analysis of residuals can be done, the data can be
transformed if necessary, and finally missing data can be reconstituted by ridge
regression following the same procedure as in the case of ordinary LS analysis.
same procedure as in the case of ordinary LS analysis.

Determination of k
It can be shown (Hoerl and Kennard, 1970a) that increasing the value of k increases
the bias of b^R but decreases its variance. In fact, it is always possible to find a
value of k such that the ridge estimators have a smaller mean square error than the
ordinary LS estimators. However, the choice of the optimal value of k is difficult. A
commonly used method for determining k is based on a graphical inspection of the
traces of the estimates of the p parameters as functions of k. Figure 1 is a typical
example of ridge traces for a model with three independent variables.

In general, the values of the estimated parameters can fluctuate considerably when k
is close to zero, and can even change sign. However, as k increases, the values of the
estimated parameters stabilize. In practice, one examines the ridge traces and chooses
graphically the smallest value of k in the zone where all traces show reasonable
stability (Hoerl and Kennard, 1970b). However, Vinod (1976) showed that this procedure
may lead to an overestimation of k, and devised an alternative method by which k is
estimated automatically. This procedure uses the index ISRM defined by:

$$ ISRM = \sum_{i=1}^{p} \left[ \frac{p\, \lambda_i / (\lambda_i + k)^2}{\sum_{j=1}^{p} \lambda_j / (\lambda_j + k)^2} - 1 \right]^2 \qquad (11) $$

where λ1, λ2, ..., λp are the eigenvalues of the matrix r_XX. The index is zero if the
explanatory variables are uncorrelated. Vinod (1976) suggests using the value of k
which corresponds to the smallest value of the index ISRM.
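The following sketch computes the ridge estimates over a grid of k values and selects
the k with the smallest ISRM; since the reconstruction of equation (11) above is
itself uncertain, the ISRM line should be treated as illustrative only.

```python
# Sketch of ridge estimation (eqn 10) and ISRM-based choice of k
# (eqn 11, as reconstructed above); illustrative only.
import numpy as np

def ridge_path(r_xx, r_yx, ks):
    """r_xx: (p, p) correlation matrix of the standardized X's;
    r_yx: (p,) correlations of Y with the X's; ks: candidate values."""
    p = r_xx.shape[0]
    lam = np.linalg.eigvalsh(r_xx)           # eigenvalues of r_XX
    traces, best_k, best_isrm = [], None, np.inf
    for k in ks:
        b_r = np.linalg.solve(r_xx + k * np.eye(p), r_yx)   # eqn (10)
        traces.append(b_r)
        w = lam / (lam + k) ** 2
        isrm = np.sum((p * w / w.sum() - 1.0) ** 2)         # eqn (11)
        if isrm < best_isrm:
            best_k, best_isrm = k, isrm
    return best_k, np.array(traces)           # traces give the ridge plot
```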

Figure 1. Example of ridge traces for a model with 3 variables. [Figure: parameter
estimates plotted against k.]

MULTIVARIA TE REGRESSION

If one is interested in reconstituting monthly means at q neighboring sites (several
dependent variables), then one could perform q independent regression analyses.
Proceeding as in the section on multiple regression, one could thus obtain q values
which preserve the mean and the variance. However, this so-called parallel approach
does not reproduce the correlation that may exist between the q sites. This can lead
to important errors and to loss of information if the results are used for decision
making in a regional context (Bernier, 1971).

To avoid this kind of problem and to extract as much information as possible from
observed data, the reconstituted data should reflect the relationship among them. This
can be done using a multidimensional model which considers the q sites simultaneously.
The multidimensional regression technique is implemented in ReMuS, allowing for
conservation of the structural correlation when data are reconstituted at several
sites in a region. However, the method has two major constraints which may limit its
practical applicability:

• The q variables must be functions of the same set of explanatory variables.
• Only concomitant values of the q dependent variables can be used.

The multivariate regression model

Let Y1, Y2, ..., Yq be a set of q dependent variables, and X1, X2, ..., Xp be p
explanatory variables. Assume that we have n corresponding measurements of y_{1i},
y_{2i}, ..., y_{qi} and x_{1i}, x_{2i}, ..., x_{pi}, i = 1, 2, ..., n (for instance,
discharges measured during n years at p+q sites). Moreover, the values of the
explanatory variables are assumed known exactly. Hence, using matrix notation, the
multidimensional regression model can be written in the following form:

$$ \mathbf{Y}_{q \times n} = \mathbf{b}_{q \times (p+1)}\, \mathbf{X}_{(p+1) \times n} + \mathbf{e}_{q \times n} \qquad (12) $$

where

$$ \mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & & \vdots \\ y_{q1} & y_{q2} & \cdots & y_{qn} \end{bmatrix} = [\,\mathbf{y}_1\; \mathbf{y}_2\; \cdots\; \mathbf{y}_n\,] $$

$$ \mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix} = [\,\mathbf{x}_1\; \mathbf{x}_2\; \cdots\; \mathbf{x}_n\,] $$

$$ \mathbf{b} = \begin{bmatrix} \beta_{10} & \beta_{11} & \cdots & \beta_{1p} \\ \beta_{20} & \beta_{21} & \cdots & \beta_{2p} \\ \vdots & \vdots & & \vdots \\ \beta_{q0} & \beta_{q1} & \cdots & \beta_{qp} \end{bmatrix} = [\,\boldsymbol{\beta}_0\; \boldsymbol{\beta}_1\; \cdots\; \boldsymbol{\beta}_p\,] $$

$$ \mathbf{e} = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1n} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2n} \\ \vdots & \vdots & & \vdots \\ \varepsilon_{q1} & \varepsilon_{q2} & \cdots & \varepsilon_{qn} \end{bmatrix} = [\,\boldsymbol{\varepsilon}_1\; \boldsymbol{\varepsilon}_2\; \cdots\; \boldsymbol{\varepsilon}_n\,] $$

The matrix Y contains n column vectors y_1, y_2, ..., y_n whose q elements correspond
to measurements of the q dependent variables for a given period. The X matrix contains
n column vectors x_1, x_2, ..., x_n with p+1 elements; the first element of each of
these vectors is equal to one, whereas the others correspond to the p explanatory
variables for a given period. The b matrix contains the column vectors β_0, β_1, ...,
β_p. The first vector corresponds to the intercept and each of the following vectors
corresponds to a given

explanatory variable (β_ij, j ≠ 0, is the parameter of the explanatory variable X_j
for the dependent variable Y_i).

Finally, the e matrix contains the n vectors ε_1, ε_2, ..., ε_n of error terms. We
assume that each of the residual vectors follows a multidimensional normal
distribution with zero mean and covariance matrix Σ:

$$ \boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}), \quad \forall i $$

Therefore, in this model there is a non-zero correlation between error terms
corresponding to different dependent variables.

Estimation of parameters

In order to estimate the parameter matrix b, i.e. to determine the matrix

$$ \hat{\mathbf{B}}_{q \times (p+1)} = \begin{bmatrix} \hat{\beta}_{10} & \hat{\beta}_{11} & \cdots & \hat{\beta}_{1p} \\ \hat{\beta}_{20} & \hat{\beta}_{21} & \cdots & \hat{\beta}_{2p} \\ \vdots & \vdots & & \vdots \\ \hat{\beta}_{q0} & \hat{\beta}_{q1} & \cdots & \hat{\beta}_{qp} \end{bmatrix} = [\,\hat{\mathbf{b}}_0\; \hat{\mathbf{b}}_1\; \cdots\; \hat{\mathbf{b}}_p\,] \qquad (13) $$

the LS method is once again invoked. Srivastava and Carter (1983) show that the
estimator of B is given by:

$$ \hat{\mathbf{B}} = [\,\hat{\mathbf{b}}_0\; \hat{\mathbf{b}}_1\; \cdots\; \hat{\mathbf{b}}_p\,] = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1} \qquad (14) $$

The estimators β̂_ij of β_ij are jointly distributed according to a multidimensional
normal distribution with mean and covariance given by:

$$ E(\hat{\beta}_{ij}) = \beta_{ij} \quad \text{and} \quad \mathrm{Cov}(\hat{\beta}_{ij}, \hat{\beta}_{kl}) = a_{jl}\, \sigma_{ik} \qquad (15) $$

where a_{jj} and a_{jl} are elements of the matrix (XX')⁻¹, and σ_{ii} and σ_{ij} are
elements of the matrix Σ. Note that the parameter estimators are unbiased and
correlated, reflecting the correlation that exists between the dependent variables. If
q independent multiple regressions had been made, this correlation would have been
zero; that is, for the two variables Y_i and Y_k, we would have:

$$ \mathrm{Cov}(\hat{\beta}_{ij}, \hat{\beta}_{kj}) = 0, \quad j = 0, 1, \ldots, p \qquad (16) $$

Once the parameters for each dependent variable have been obtained, their significance
should be tested. The software ReMuS permits one to test, for each explanatory
variable, whether the corresponding parameter vector is equal to the null vector. For
a given variable X_k, the following hypotheses are tested:

H0: β_{1k} = β_{2k} = ... = β_{qk} = 0 against H1: at least one β_{ik} ≠ 0.

The test is based on the statistic F_k:

$$ F_k = \frac{(n - p - q + 1)}{q} \cdot \frac{(1 - U_k)}{U_k} \qquad (17) $$

where:

• U_k = |V_k| / |V_k + W_k|, where |·| denotes the determinant of a matrix;
• V_k = Y[I_n − X'(XX')⁻¹X]Y', where I_n is the identity matrix of dimension n;
• W_k = B̂C_k[C_k'(XX')⁻¹C_k]⁻¹C_k'B̂', where C_k is a column vector of dimension p+1
  having the kth element equal to 1 and 0 elsewhere.

For a given significance level α, the null hypothesis is:

• accepted if F_k ≤ F_{q, n−p−q+1}(1 − α)
• rejected if F_k > F_{q, n−p−q+1}(1 − α)

where F_{q, n−p−q+1}(1 − α) is the (1−α)-quantile of an F distribution with q and
n−p−q+1 degrees of freedom. A comprehensive review of this test is given by Srivastava
and Carter (1983). In addition to the F_k (k = 0, 1, ..., p) statistics, ReMuS gives
the exceedance probability associated with those values (p-values).
The test permits one to examine the importance of a given explanatory variable, X_k.
Acceptance of the null hypothesis implies that there is no relation (at an α-level)
between the variable X_k and the q dependent variables. There is thus no reason to use
that variable in the model. On the other hand, if the hypothesis is rejected, there is
a relation between X_k and at least one of the dependent variables, and X_k should be
used in the model.

Reconstitution of data

For a given period, for example a year, where the dependent variables are missing, we
want to reconstitute the vector y_l = (y_{1l}, y_{2l}, ..., y_{ql})' from the observed
vector x_l = (x_{1l}, x_{2l}, ..., x_{pl})' by means of multidimensional regression in
a way that preserves the mean, the variance, and the correlation structure observed in
the series of dependent variables. The same principle as in the case of multiple
regression is used here. In fact, we first compute the prediction, ŷ_l, and then add a
random vector drawn from a multidimensional normal distribution to obtain the
reconstituted data vector, ỹ_l. More precisely, we have the following expression:

$$ \tilde{\mathbf{y}}_l = \hat{\mathbf{B}}\, \mathbf{x}_l + \boldsymbol{\delta}_l = \begin{bmatrix} \hat{\beta}_{10} & \hat{\beta}_{11} & \cdots & \hat{\beta}_{1p} \\ \hat{\beta}_{20} & \hat{\beta}_{21} & \cdots & \hat{\beta}_{2p} \\ \vdots & \vdots & & \vdots \\ \hat{\beta}_{q0} & \hat{\beta}_{q1} & \cdots & \hat{\beta}_{qp} \end{bmatrix} \begin{bmatrix} 1 \\ x_{1l} \\ \vdots \\ x_{pl} \end{bmatrix} + \begin{bmatrix} \delta_{1l} \\ \delta_{2l} \\ \vdots \\ \delta_{ql} \end{bmatrix} \qquad (18) $$

where the vector δ_l has a multidimensional normal distribution with zero mean and
covariance matrix Σ_δ defined by:

$$ \boldsymbol{\Sigma}_\delta = \frac{1}{n-p}\, \mathbf{Y} \left[ \mathbf{I}_n - \mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}\mathbf{X} \right] \mathbf{Y}' \qquad (19) $$

In ReMuS, the vector δ_l is generated using a technique described in detail in Devroye
(1986, pp. 563-566) and in Law and Kelton (1982, p. 262). The introduction of the
random vector (used among others by Bernier, 1971) ensures that the mean, the
variance, and the correlation of the dependent variables are preserved. This, of
course, is only true if the basic assumptions are valid. In fact, the remarks on the
validity of the assumptions made in the section on multiple regression also apply to
the multidimensional case.
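A compact numpy sketch of equations (14), (18) and (19) is given below: estimate B,
form the residual covariance matrix, and reconstitute a missing vector by adding a
correlated normal perturbation. It is an illustration under the stated assumptions,
not the ReMuS implementation.

```python
# Sketch of multivariate estimation and reconstitution (eqns 14, 18, 19).
import numpy as np

rng = np.random.default_rng(0)

def fit_multivariate(Y, X):
    """Y: (q, n) dependent data; X: (p+1, n) with a first row of ones."""
    B = Y @ X.T @ np.linalg.inv(X @ X.T)             # eqn (14)
    resid = Y - B @ X                                # Y[I - X'(XX')^-1 X]
    n, p = X.shape[1], X.shape[0] - 1
    sigma_delta = resid @ resid.T / (n - p)          # eqn (19)
    return B, sigma_delta

def reconstitute(B, sigma_delta, x_new):
    """x_new: (p+1,) vector [1, x1, ..., xp] for the missing period."""
    delta = rng.multivariate_normal(np.zeros(B.shape[0]), sigma_delta)
    return B @ x_new + delta                         # eqn (18)
```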

THE SOFTWARE REMUS

The software ReMuS was developed in order to overcome some of the deficiencies in
REMUL, which was previously used by Hydro-Quebec. The following improvements
have been introduced in the new software:

- Ridge regression. In order to cope with possible multicollinearity between the
independent variables, ReMuS allows the use of ridge regression (Hoerl and Kennard,
1970a). This procedure permits the user to study the value of the parameters as a
function (trace) of a positive constant k, which is added to the diagonal of the
correlation matrix. The user may toggle quickly between the choice of the independent
variables and the graph of the traces. A theoretical help feature is added to guide
the user in the choice of variables.

- Automatic optimal value of "k". If the user wishes so, ReMuS suggests an optimal
value for k (in ridge regression), depending on the chosen variables.

- Multivariate regression. When generating the missing values, a conventional multiple
regression does not take into account the correlation between the dependent variables.
This is why we have included multivariate regression in this software. This type of
regression will simultaneously find q different models corresponding to q dependent
variables. Random numbers generated from a multivariate normal distribution are
subsequently added to the predictions to preserve the correlation between the q
dependent variables.

- Testing the hypotheses. One of the important weaknesses of REMUL and HEC-4 is that
they do not check the following hypotheses:

• residuals follow a normal distribution;
• residuals are independent random variables;
• residuals possess constant variance.

ReMuS includes many graphical tests allowing the user to examine those hypotheses.
- Transformations. Residuals may be normalized by a Box-Cox transformation of the
dependent variable. ReMuS also gives the user a choice of transformations (the most
widely used in practice) of the independent variables in order to obtain a linear
relationship with the dependent variable.

Moreover, ReMuS is a user-friendly software that provides many other tools to help the
user in the modeling phase:

• correlation matrix;
• the Y vs X graphic;
• the graphic of concomitances;
• on-line theoretical and technical help.

Multiple regression in ReMuS


As implemented in ReMuS, multiple regression permits the user to determine the
appropriate model for reconstituting the variable Y. ReMuS provides two different
methods for choosing independent variables:

• manual, where the explanatory variables are chosen by the user;
• automatic, where the explanatory variables are introduced by stepwise regression.

After the estimation of a given model, its adequacy can be visualized graphically. The
output of the regression is:

• the values of the parameters β_i;
• tests of significance of the parameters;
• the analysis of variance;
• the regression model.
It is also possible to perform an analysis of residuals.

Ridge regression in ReMuS


In order to cope with possible multicollinearity between the independent variables, ridge
regression (Hoer! and Kennard, 1970a) has been introduced in ReMuS. Its application in
ReMuS is similar to that of manual multiple regression. The user can at any time
visualize the ridge traces on the screen. This will help him make the appropriate choice
of independent variables and of the ridge parameter, k. The user can, with a single
touch, switch between the menu in which the choice of independent variables is made
and the ridge traces.

ReMuS contains a procedure which permits the user to automatically select the optimal
value of the constant k as a function of the chosen independent variables. Having
chosen the explanatory variables and the constant k, the user can invoke the model
computation procedure, examine the results graphically, and analyse the residuals just
as in the case of multiple regression. If k is set equal to zero, one obtains the same
result as with multiple regression.

Analysis of residuals
In multiple (or ridge) regression, it is important to verify the basic assumptions
concerning the residuals:

• residuals are normally distributed;
• residuals are independent random variables;
• residuals have constant variance.

ReMuS produces graphics of:

• residuals on (normal) probability paper;
• residuals as functions of the predicted variable;
• residuals as a function of time.

It can also show the residuals as a function of an independent variable, which permits
the user to identify the variables that convey information to the model.

Reconstitution of data
The reconstitution of data is based on known explanatory variables. Equation (8)
permits one to preserve the mean and the variance of the observed data series.
However, the user can choose to omit the random term, δ_i. In this case, the
reconstituted data are those obtained directly from the regression.

ReMuS produces the following output, which can be used to verify the quality of the
reconstitution:

• mean of the original data and of the reconstituted data;
• variance of the original data and of the reconstituted data;
• coefficient of variation of the original data and of the reconstituted data.

We have included Wilcoxon's (1945) test in ReMuS, which can be used to verify whether
the means of observed and predicted data are significantly different. Likewise,
Levene's (1960) test for equal variance in two data sets is available in ReMuS.

Multivariate regression in ReMuS


We have introduced a multidimensional regression procedure in ReMuS in order to treat
the case where correlation exists between several dependent variables. This kind of
regression permits the user to estimate several models simultaneously and to preserve
the correlation structure when data at several sites are reconstituted.

The choice of variables is done as in the other types of regression (manually), with
the exception that here the user has the option to introduce several dependent
variables at the same time.

The output produced by ReMuS consists of the parameters corresponding to each
dependent variable and each explanatory variable. We have introduced a test to verify
that at least one of the parameters corresponding to a given explanatory variable is
significantly different from zero.

Some other characteristics of ReMuS


ReMuS contains several tools which can help the user in the choice of variables,
models, etc.:

• the graphic of Y versus X, that is, the relation between the dependent variable and
  each of the explanatory variables;
• the graphic of concomitances, which may help to choose the explanatory variables;
• the correlation matrix;
• the Box-Cox transformation, which permits the user to normalize the residuals;
• other classical transformations (1/X, 1/X², log X, √X, X²), which permit the user
  to linearize the relation between the explanatory variables and the dependent
  variable;
• technical as well as theoretical on-line help.

ReMuS can be used for monthly, annual, weekly (or other time period) means and for
other types of data (e.g. flood flows vs. rain, basin area, etc.).

A note on the implementation


The software ReMuS has been implemented in PASCAL for use in a DOS environment.
Advanced programming techniques (object oriented, graphical librairies, etc ... ) has
permitted to make it highly user-friendly.

CONCLUSIONS
As discussed in the introduction, HEC-4 and REMUL are based on hypotheses which are
not always valid. This may result in incorrect data reconstitution. The software
ReMuS, which includes basic functions similar to those in HEC-4 and REMUL, has been
developed in order to cope with the following problems:

• multicollinearity between explanatory variables (corrected with ridge regression);
• correlation between dependent variables (corrected with multivariate regression);
• model validation (corrected with residual analysis and graphics);
• the assumption that observations follow a log-normal distribution (corrected with
  the possibility to use several transformation methods);
• the difficulty of visualizing relations between variables (corrected with graphics).

These additions make ReMuS a powerful tool to reconstitute missing data and to extend
short data series in hydrology as well as in many other domains.

REFERENCES
Beard, L.R. (1971). HEC-4 Monthly Streamflow Simulation. The Hydrologic Engineering
Center, Corps of Engineers, US Army, Davis, California 95616.
Bernier, J. (1971). Modèles probabilistes à variables hydrologiques multiples et
hydrologie synthétique. International Symposium on Mathematical Models in Hydrology,
Warsaw.
Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations. Journal of the Royal
Statistical Society, Ser. B, 211-252.
Devroye, L. (1986). Non-uniform Random Variate Generation. Springer-Verlag, New York.
Fiering, M.B. (1963). Use of Correlation to Improve Estimates of the Mean and
Variance. United States Geological Survey, Professional Paper 434-C.
Hoerl, A.E. and Kennard, R.W. (1970a). Ridge regression: Biased estimation for
nonorthogonal problems. Technometrics, 12, 55-67.
Hoerl, A.E. and Kennard, R.W. (1970b). Ridge regression: Applications to nonorthogonal
problems. Technometrics, 12, 69-82.
Law, A.M. and Kelton, W. (1982). Simulation Modeling and Analysis. McGraw-Hill, Inc.
Levene, H. (1960). Robust tests for the equality of variances, in Contributions to
Probability and Statistics, ed. I. Olkin. Palo Alto, Stanford University Press,
278-292.
Srivastava, M.S. and Carter, E.M. (1983). An Introduction to Applied Multivariate
Statistics. North-Holland, New York.
Vinod, H.D. (1976). Application of new ridge regression methods to a study of Bell
System scale economies. Journal of the American Statistical Association, 71, 835-841.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80-83.
SEASONALITY OF FLOWS AND ITS EFFECT ON RESERVOIR SIZE

R.M. PHATARFOD1 and R. SRIKANTHAN2

1 Department of Mathematics, Monash University, Clayton, Victoria, Australia 3168
2 Hydrology Branch, Bureau of Meteorology, Melbourne, Victoria, Australia 3000

In this paper we consider the effect of seasonality of streamflows on the reservoir
size for within-year systems. We give two measures to quantify the seasonality
evidenced by the mean seasonal flows, and show how these enable us to group the
various variations in seasonal means into a small number of groups. The probability of
emptiness of the reservoir for different reservoir sizes is determined analytically by
using the technique of the bottomless dam, for a range of Cv's and two draft ratios.
Regression equations linking the probability of emptiness and the two measures of
seasonality are derived.
1. INTRODUCTION
Ever since Rippl's (1883) pioneering work on the determination of the size of a
reservoir able to meet a constant demand of water from it, a substantial amount of
literature has grown on this problem, referred to as the reservoir
size-yield-reliability relationship. Of course, Rippl's method - the Mass Curve Method
- did not consider reliability of supply. The implicit but erroneous assumption was
that with the determined reservoir size, supply was guaranteed all the time, i.e. the
reliability would be a hundred percent. Later on, however, it was realized that
statements about supply would necessarily be probabilistic, and reliability of supply
was explicitly involved, as for example in the works of Hazen (1914) and Sudler
(1927). A considerable amount of progress in this area was made by the Russian
engineers Savarenskiy (1940) and Kritskiy and Menkel (1940), by Moran (1959) and his
followers, and through the idea of synthetic hydrology - Fiering (1967). Currently,
storage size-yield-reliability relationships take many forms. At one end of the
spectrum is the determination of the reliability of supply for a specific reservoir
situation, with specific streamflow characteristics and specific demand values.
Typically, this involves simulation of the water supply system behaviour using either
the historical record or synthetic streamflow traces. These methods are extremely
flexible, and are able to take into account factors such as evaporation, seepage and
sedimentation, as well as draft varying from month to month. However, they do not

impart much knowledge about overall reservoir system behaviour. At the other end are
results - normally analytical and employing probability results pertaining to
cumulative sums of random variables - which are of general applicability but do not
take into account the specific characteristics of the situation in hand; these
methods, generally referred to as "back-of-the-envelope" methods, are useful for a
preliminary estimation of reservoir size as well as for giving an overall view of the
behaviour of the system. One such result is (see Gould (1964)):

$$ S_P = \frac{t_P^2\, C_v^2}{4\,(1 - \alpha)} \qquad (1) $$

where S_P is the storage size in terms of the mean annual flow μ, α is the annual
yield as a fraction of the mean annual flow (also called the draft ratio), C_v is the
coefficient of variation of the annual flows, and t_P is the Pth percentile of a
standard normal variable, with P as the steady-state probability of the reservoir
being empty, i.e. the probability of failure of being able to deliver supply (giving
(1 - P)100 as the percentage reliability of the system). Another result of this type
is (see Phatarfod (1986)):

$$ S(\rho) = \frac{1 + \rho}{1 - \rho}\, S(0) \qquad (2) $$

which gives the inflation factor required when the annual flows have a serial
correlation ρ, as compared to when they are independent. The above two results are
valid for over-year systems where the effect of seasonality of flows is damped out.
There are some situations, however, when we are interested only in a preliminary
estimate but simple and elegant expressions such as (1) and (2) are not available. In
such situations, one takes recourse to charts or tables quantitatively relating the
various quantities of interest. A prime example of such results is the charts of
Svanidze (see Kartvelshvili (1969)) giving the reservoir size against the coefficient
of variation of annual flows, for different draft ratios, reliabilities and annual
serial correlation coefficients, for flows having the general three-parameter gamma
distribution - the Kritskiy and Menkel distribution.

In this paper we consider the effect of seasonality of flows on the reservoir size for
within-year systems, and for reasons given below, our results belong to the middle
category just mentioned, i.e. they are of fairly general applicability yet not simple
enough to be expressed in analytical forms of the type (1) and (2).
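As a numerical illustration of equations (1) and (2), consider the assumed values
α = 0.9, C_v = 0.5, 95% reliability (P = 0.05, t_P ≈ 1.645) and ρ = 0.3, chosen purely
for the example:

```python
# Back-of-the-envelope use of eqns (1) and (2); values are assumed.
alpha, cv, rho = 0.9, 0.5, 0.3
t_p = 1.645                              # standard normal, P = 0.05
S = t_p**2 * cv**2 / (4 * (1 - alpha))   # eqn (1): ~1.69 mean annual flows
S_rho = (1 + rho) / (1 - rho) * S        # eqn (2): inflated to ~3.14
```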
It is known, of course, that a reservoir with flows which have a
seasonal variation would need to be larger than one without seasonal
variation for the same reliability, yield and annual flow

characteristics such as the mean and the coefficient of variation. However, to the
authors' knowledge there has not been any study which quantitatively relates reservoir
size to seasonality of flows.
It is not difficult to see why this should be so. First, methods which have been
successful in deriving relations of the type (1) and (2) cannot be applied here. The
reservoir content process is a random walk process between two barriers - the empty
reservoir and the full reservoir - and for over-year systems these barriers are fairly
wide apart, so that the random walk process is effectively a cumulative sum process,
and thus results such as the Central Limit Theorem are applicable. The t_P in (1) and
the factor (1+ρ)/(1−ρ) in (2) are, indeed, due to this effect. Such is not the case
for within-year systems, where these barriers are rather close together. Secondly, it
is difficult to quantify seasonality, since a measure chosen would depend not only on
the individual seasonal values but also on their order. For rivers with seasonal
flows, the seasonality is reflected not only in their means, but also in their
variances, skewnesses and serial correlations. It is obvious that the means would have
the greatest effect, and in this paper we shall concentrate only on the means.
We simplify the treatment somewhat by considering seasonal, i.e.
three-monthly flows rather than monthly ones. We consider two
measures to quantify seasonality. This is done in section 2.
Assuming, for simplicity, that the seasonal flows have the negative
binomial distribution we calculate for draft ratios of 67% and 50% and
for various values of Cv of the annual flows, the steady-state
probabilities of emptiness of the reservoir by using the method of the
bottomless dam - see Phatarfod (1980). This is done in section 4. We
then fit regression equations for these probabilities against the two
measures. The fits are extremely good. These regression equations
thus form the defming relationships between probabilities of
emptiness, reservoir size and the seasonality of flows.
2. MEASURES OF SEASONALITY
One of the ways to quantify the seasonality of mean flows would be to take the mean
flow of the ith season (say a month), i = 1, 2, 3, ..., 12, as

$$ \mu_i = \bar{\mu} + A \cos(\pi i / 6 + \theta) \qquad (3) $$

The value of the phase θ can be taken to be zero by suitable adjustment of the
water-year. A then remains the sole measure of seasonality of flows. However, very few
mean seasonal flows follow a
sinusoidal pattern. We therefore take another approach in this paper.
It should be stressed that the paper is of an exploratory nature
giving an outline of a possible line of approach, and does not work
out a full solution.

First we take the number of seasons to be four rather than the customary twelve. Since
the method of obtaining the probability of failure used in the paper requires that we
take the constant seasonal draft as the unit of volume (thus making the total annual
draft equal to four units), we have the mean annual flows for the two cases of draft
ratios of 0.67 and 0.50 to be 6 and 8 units respectively.

Let us consider in detail the case of the mean annual flow of 6 units. Let a, b, c, d
be the mean seasonal flows of the four seasons, so that a+b+c+d = 6. Theoretically,
a, b, c, d can assume any non-negative values, subject to the condition a+b+c+d = 6.
However, to keep the number of cases to be considered fairly manageable we need to put
some restrictions on the values of these quantities. Accordingly, we assume that
a, b, c, d can be zero or a multiple of 1/2. In effect, we are assuming, for example,
that if a seasonal mean is less than 1/4 (in terms of mean annual flow equal to 6) we
can take it to be zero. To avoid dealing with fractions, however, we shall assume, in
the discussion dealing with seasonality in this section, that the mean annual flow is
12 units. Thus, calling the four seasonal means A, B, C, D, the means form a sequence
A, B, C, D such that A+B+C+D = 12, where now A, B, C, D can be zero or a positive
integer. It can be shown - see Feller (1967) - that the number of such possible
sequences is ₁₅C₃ = 455. At the two extremes are the cases 3,3,3,3 (the non-seasonal
case) and 12,0,0,0 (the extremely seasonal case). They also include sequences such as
7,5,0,0 and 6,1,3,2 etc. Note that the actual sequences of seasonal means for the four
cases above are (1.5,1.5,1.5,1.5), (6,0,0,0), (3.5,2.5,0,0) and (3,0.5,1.5,1)
respectively. What is required is to reduce the number of these separate cases to
manageable proportions, by grouping them into a smaller number of groups having common
values of suitable measures of seasonality.
First we use a device which effectively reduces the number of cases to
about a quarter of the total number, 455, of cases. From the point of
view of reliability, we need to consider only that season for which
the probability of emptiness of the reservoir at the end of that
season is the maximum, and since the method of derivation of the
probability of emptiness used in the paper is geared to deriving this
probability at the end of the last season, we can effectively
rearrange a sequence of seasonal means such that the last element
corresponds to that season. For example, if in terms of a
conventional water-year the sequences of mean flows are 1,6,2,3 or
2,3,1,6 or 3,1,6,2, we shall, in fact, take our sequence as 6,2,3,1 -
the cyclic permutation of the three sequences. This device reduces
the number of cases to 116.
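As a quick illustration of these counts (our sketch, not part of the original paper; all names are ours), the following Python fragment enumerates the 455 ordered sequences with A+B+C+D = 12 and reduces them to classes that are equivalent under cyclic rotation:

    from itertools import product

    # all ordered sequences (A,B,C,D) of non-negative integers summing to 12
    seqs = [s for s in product(range(13), repeat=4) if sum(s) == 12]
    assert len(seqs) == 455                      # C(15,3) = 455

    def canonical(seq):
        # one representative per cyclic class: the smallest rotation
        return min(seq[i:] + seq[:i] for i in range(4))

    print(len({canonical(s) for s in seqs}))     # 116, as stated in the text

The paper's own reduction picks the rotation that places the critical season last, but any fixed choice of representative yields the same count of 116 classes.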
To devise measures relevant to the probability of emptiness at the end
of the last season, let A',B',C',D' be the excess or depletion of
the seasonal means from the mean value 3, i.e., let A' = A − 3, etc. Let
u1 = A', u2 = A'+B', u3 = A'+B'+C', u4 = A'+B'+C'+D' = 0. Let M
and m be the maximum and the minimum respectively of the sequence
u1,u2,u3,u4. Then the range R' = M − m and T' = C' + D' are two
measures which seem to be relevant to the present problem. For
example, for the sequence 3,3,3,3 we have R' = T' = 0, whereas for the
sequence 12,0,0,0 we have R' = 9, T' = −6. The values of R',T' for
other sequences fall in the ranges (0,9) and (0,−6) respectively. We
show that the 455 different sequences fall into 28 groups, with
sequences in each group having the same values of R' and T'.
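A minimal sketch (ours) of the two measures, applied to the extreme cases quoted above:

    def measures(seq):
        # R' and T' of this section, for a sequence of seasonal means summing to 12
        dev = [x - 3 for x in seq]                    # A', B', C', D'
        u = [sum(dev[:i + 1]) for i in range(4)]      # u1..u4 (u4 = 0)
        return max(u) - min(u), dev[2] + dev[3]       # (R', T')

    print(measures((3, 3, 3, 3)))      # (0, 0): the non-seasonal case
    print(measures((12, 0, 0, 0)))     # (9, -6): the extremely seasonal case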
For reasons of limitation of space, we do not show all the sequences
for all values of R' and T' . Table 1 shows the sequences for values
of R' = 3,4,5 and T' = -2,-3,-4. For each sequence of seasonal means
we work out the probability of emptiness of the reservoir for
different reservoir sizes. This is done in the next section.
Let us now consider the case of draft ratio 50%, i.e. when the mean annual
flow is 8 units. One way to deal with this case is to follow the same
procedure as above, i.e. to take seasonal means A,B,C,D such that
A+B+C+D = 16. This entails considering C(19,3) = 969 separate cases.
Instead, we adhere to the 455 cases considered before. Since the
value 12 (A+B+C+D = 12), taken as the mean annual flow there, actually
means that the mean annual flow is 8 (a+b+c+d = 8), the actual
seasonal means are two-thirds the values shown for any sequence
A,B,C,D, i.e. a = 2A/3, etc. This means that the sequence of
seasonal means shown as (5,3,2,2) actually represents the sequence
(10/3,2,4/3,4/3), giving the total 8.
3. INFLOW MODEL
We assume that the seasonal flows have the negative binomial
distribution:

    Pr[X = r] = C(n+r−1, r) p^n q^r,   r = 0,1,2,...;  q = 1 − p     (4)

with the same value of p for all the seasons. The values of n for
the four seasons are denoted by n1, n2, n3, n4. The annual flow also is
negative binomial with the same value of p and with
n = n1+n2+n3+n4, to be denoted by k. The mean of X for the above
distribution is nq/p and Cv is (nq)^(−1/2). The probability
generating function (p.g.f.) is P(θ) = p^n/(1 − qθ)^n. We first take
values of k and p such that the annual flows have a specific mean
μ and Cv. Table 2 gives the cases considered.

The p.g.f.s of the seasonal flows are

    Pi(θ) = p^ni / (1 − qθ)^ni,   i = 1,2,3,4     (5)

TABLE 1
SEQUENCES OF SEASONAL MEANS FOR VALUES OF R', T' SHOWN

T' = -2:
  R' = 3: 6240, 6231, 6222, 5340, 3540, 4440
  R' = 4: 7130, 7140
  R' = 5: 8040, 8031

T' = -3:
  R' = 3: 6330, 6303, 6321, 3621, 6312, 3612, 4521, 5412, 4512
  R' = 4: 7230, 7221, 2721, 2712
  R' = 5: 8130, 8121

T' = -4:
  R' = 4: 6420, 6402, 4602, 4620, 6411, 4611, 5520, 5502, 5511,
          7320, 3720, 7302, 3702, 7311, 3711
  R' = 5: 8220, 8202, 2820, 2802, 8211, 2811

where the ni are so chosen as to give the required seasonal means. For
example, for the case where p = 0.4, k = 4, if the seasonal means
form the sequence 3,1.5,1,0.5, we have n1 = 2, n2 = 1, n3 = 2/3,
n4 = 1/3.

It should be noted that for the method of determining the probability
of emptiness given in this paper it is not necessary to have the
annual flows or the seasonal flows following a negative binomial
distribution. This distribution was chosen because it is analytically
easy to deal with, as well as being the discrete analogue of the gamma
distribution, a distribution which is commonly chosen to model
streamflows. Note, however, that from the relations μ = kq/p and
kq = 1/Cv², we have Cv = (μp)^(−1/2); thus there are lower bounds for
the value of Cv, namely 0.4082 (for μ = 6) and 0.3536 (for μ = 8).
There is, however, no upper bound.
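Given an annual mean μ and Cv, the corresponding (p, k) pair of Table 2 follows directly from these relations; a small sketch (ours):

    def nb_params(mu, cv):
        # from mu = kq/p and Cv = (kq)**(-0.5), i.e. p = 1/(mu*Cv**2)
        p = 1.0 / (mu * cv ** 2)
        q = 1.0 - p
        return p, mu * p / q        # (p, k)

    print(nb_params(6, 0.5000))     # (0.667, 12), first column of Table 2(i)
    print(nb_params(8, 0.4330))     # (0.667, 16), first column of Table 2(ii)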
4. PROBABILITY OF EMPTINESS OF THE RESERVOIR
The exact steady-state probability of emptiness of the reservoir, of
various sizes, fed by seasonally dependent inflows can be obtained by
a method due to Moran (1959). However, this method, which deals with
matrix operations, involves fairly large scale computations, since these
matrices are different for different reservoir sizes. Instead, we
derive here an approximate value of the probability of emptiness
obtained by assuming the reservoir to be bottomless and taking the
probability of depletion greater than or equal to an integer K to be

TABLE 2
Values of parameters p and k for given μ and Cv

(i) Annual mean μ = 6, draft ratio 67%

p    0.667   0.500   0.400   0.333   0.200   0.100
k    12      6       4       3       1.500   0.667
Cv   0.5000  0.5774  0.6455  0.7069  0.9129  1.2910

(ii) Annual mean μ = 8, draft ratio 50%

p    0.667   0.500   0.385   0.333   0.200   0.111
k    16      8       5       4       2       1
Cv   0.4330  0.5000  0.5701  0.6124  0.7906  1.0607


the probability of emptiness of the reselVoir of size K. It was
shown by Phatarfod (1979) that the probability of emptiness so
obtained is fairly close to the exact value obtained by considering
the reselVoir to be finite. The requisite result - see Phatarfod
(1980) - is as follows:
Suppose we have N seasons. Let the inflow during the ith season
be Xi with probability generating function (p.g.f.), Pi (8),
i = 1,2,...N, and let the release be one unit per season. If the
N
mean annual flow, E(r i Xi) is greater than N, it is known that the
equation Jt =IPi(8) = 8N has N roots 81'8 2,... 8N in the unit
circle. Denote the depletion from the top of the reselVoir at the end
of the Nth season by Y, and let uj = Pr.(Y = j]. Then,
N .
uJ" = r ~8~ , j = 0,1,2,... (5)
k=1
where the C's are uniquely given by,

(6)

An approximate value of the probability of emptiness of a reservoir of
size K is given by

    P_E = 1 − Σ_{j=0}^{K−1} uj
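The root-finding step can be sketched as follows (our illustration; the weights Ck would then follow from (6), which is not reproduced here). Since the seasonal p.g.f.s (5) multiply to the annual one, the equation Π Pi(θ) = θ^N reduces, for N = 4, to the polynomial equation θ^4 (1 − qθ)^k = p^k:

    import numpy as np
    from math import comb

    p, k, N = 0.4, 4, 4          # Table 2(i): mu = 6, Cv = 0.6455; four seasons
    q = 1.0 - p

    # coefficients (lowest degree first) of theta^N * (1 - q*theta)^k - p^k
    coeffs = np.zeros(N + k + 1)
    for j in range(k + 1):
        coeffs[N + j] = comb(k, j) * (-q) ** j
    coeffs[0] -= p ** k

    roots = np.polynomial.polynomial.polyroots(coeffs)
    inside = roots[np.abs(roots) < 1.0 - 1e-9]   # excludes the root at theta = 1
    print(inside)                                # the N roots theta_1..theta_N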

To use the above result for our case (N = 4), with the seasonal
flows having negative binomial distributions, we take, for given
values of p and k, values of n1,n2,n3,n4 such that the seasonal
means form a specified sequence. Using these Pi(θ) in (6), the
probability of emptiness (or failure) of the reservoir was calculated
for all the 116 sequences of seasonal means for all the cases
considered in Table 2. It was found that, within each group with the
same values of R' and T', they were fairly close to each other.
5. EFFECT OF SEASONALITY ON PROBABILITY OF FAILURE
First, to remove the dependence of the two measures on the unit of
measurement, as well as for convenience, we introduce the adjusted
measures R = R'/9 and T = T'/(−6). The range for both R and T is
(0,1). Note that, because of the way we selected the unit of measurement
in Section 2, the above relations are valid for both the cases of
draft values; in general we have R = 4R'/(3μ) and T = −2T'/μ,
with R' and T' calculated as in Section 2. To obtain an effective
relation between P_E, the probability of emptiness or failure of the
reservoir, and R and T (for a given combination of p and k and
reservoir size S), we tried fitting regression equations of P_E on
various functions of R and T. The best fit obtained was for a
relation of the form

    P_E = β + γT + δR²

Table 3 gives the required regression coefficients β, γ, δ for all
the cases considered. Figure 1 gives the probability P_E plotted as
a function of R, for different values of the reservoir size S/μ,
Cv and draft ratio 67%, and Figure 2 gives similar graphs for the
draft ratio 50%.
TABLE 3
Values of parameters β, γ, δ for various cases.

(i) Draft ratio 0.67, S/μ = 1.0
Cv      β       γ       δ
0.5000  0.0249  0.0032  0.0726
0.5774  0.0567  0.0080  0.1004
0.6455  0.0952  0.0127  0.1183
0.7069  0.1360  0.0170  0.1288

(ii) Draft ratio 0.67, S/μ = 1.167
Cv      β       γ       δ
0.5000  0.0134  0.0014  0.0376
0.5774  0.0350  0.0035  0.0602
0.6455  0.0642  0.0056  0.0786
0.7069  0.0975  0.0071  0.0923

(iii) Draft ratio 0.67, S/μ = 1.33
Cv      β       γ       δ
0.5000  0.0071  0.0011  0.0192
0.5774  0.0214  0.0033  0.0345
0.6455  0.0427  0.0058  0.0479
0.7069  0.0690  0.0083  0.0584

(iv) Draft ratio 0.67, Cv = 0.9129
S/μ     β       γ       δ
2.000   0.0801  0.0060  0.0345
2.167   0.0646  0.0047  0.0284
2.333   0.0521  0.0037  0.0230
2.500   0.0421  0.0030  0.0185

(v) Draft ratio 0.67, Cv = 1.2910
S/μ     β       γ       δ
2.833   0.1570  0.0060  0.0283
3.000   0.1400  0.0053  0.0254
3.167   0.1250  0.0047  0.0227
3.333   0.1120  0.0043  0.0201

(vi) Draft ratio 0.5, S/μ = 0.50
Cv      β       γ       δ
0.4330  0.0091  0.0076  0.0530
0.5000  0.0241  0.0127  0.0826
0.5701  0.0508  0.0187  0.1069
0.6124  0.0710  0.0224  0.1200

(vii) Draft ratio 0.5, S/μ = 0.625
Cv      β       γ       δ
0.4330  0.0023  0.0021  0.0254
0.5000  0.0088  0.0049  0.0467
0.5701  0.0229  0.0092  0.0711
0.6124  0.0350  0.0121  0.0899

(viii) Draft ratio 0.5, S/μ = 0.75
Cv      β       γ       δ
0.5701  0.0100  0.0023  0.0398
0.6124  0.0165  0.0036  0.0563
0.7906  0.0740  0.0155  0.0996
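As a usage example (ours), the fitted relation is evaluated with the coefficients of Table 3(i) for Cv = 0.5774:

    beta, gamma, delta = 0.0567, 0.0080, 0.1004   # draft ratio 0.67, S/mu = 1.0

    def prob_empty(R, T):
        # P_E = beta + gamma*T + delta*R**2
        return beta + gamma * T + delta * R ** 2

    print(prob_empty(0.0, 0.0))   # non-seasonal flows: P_E = beta = 0.0567
    print(prob_empty(1.0, 1.0))   # extreme seasonality: P_E = 0.1651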
Fig 1: Probability of failure, P_E, as a function of R (draft ratio = 0.67);
curves shown for Cv = 0.500, 0.577, 0.646, 0.707.
(The upper curve corresponds to T=1, the lower to T=0.)
(i) Reservoir size S/μ = 1.000
(ii) Reservoir size S/μ = 1.167
(iii) Reservoir size S/μ = 1.333
Fig 2: Probability of failure, P_E, as a function of R (draft ratio = 0.50);
curves shown for Cv = 0.433, 0.500, 0.570, 0.612, 0.791.
(The upper curve corresponds to T=1, the lower to T=0.)
(i) Reservoir size S/μ = 0.5
(ii) Reservoir size S/μ = 0.625
(iii) Reservoir size S/μ = 0.75

6. CONCLUSIONS
The seasonality of flows, for the case of four seasons, is quantified
by two measures, R and T; if one considers the situation where we
have more than four seasons, say months, then the measure R would
remain the same, but T would perhaps be modified to be the
depletion of the last three seasons. For different values of these
two parameters, for draft ratios of 50% and 67%, and for six values of
Cv, the probabilities of emptiness of the reservoir are calculated
for different reservoir sizes. The method used for this calculation
was analytical, using the approximating technique of the bottomless
reservoir. These probabilities enabled regression equations to be
formulated, linking the probability of emptiness to R, for various
values of T, Cv, draft ratio, and reservoir size.

REFERENCES
Feller, W. (1967) An Introduction to Probability Theory and its Applications, 3rd ed., Vol. 1, John Wiley, New York.
Fiering, M.B. (1967) Streamflow Synthesis, Harvard Univ. Press, Cambridge, Mass.
Gould, B.W. (1964) Discussion of paper by Alexander, in Water Resources Use and Management, Melbourne University Press, Melbourne, 161-164.
Hazen, A. (1914) "Storage to be provided in impounding reservoirs for municipal supply", Trans. Am. Soc. Civ. Engrs. 77, 1539-1640.
Kartvelishvili, N.A. (1969) Theory of Stochastic Processes in Hydrology and River Runoff Regulation, Israel Program for Scientific Translation, Jerusalem.
Kritskiy, S.N. and Menkel, M.F. (1940) "A generalized approach to streamflow control computations on the basis of mathematical statistics" (in Russian), Gidrotekhn. Stroit., 2, 19-24.
Moran, P.A.P. (1959) The Theory of Storage, Methuen, London.
Phatarfod, R.M. (1979) "The bottomless dam", J. Hydrol., 40, 337-363.
Phatarfod, R.M. (1980) "The bottomless dam with seasonal inputs", Austral. J. Statist., 22, 212-217.
Phatarfod, R.M. (1986) "The effect of serial correlation on reservoir size", Water Resources Research, 22, 927-934.
Rippl, W. (1883) "The capacity of storage-reservoirs for water-supply", Min. Proc. Instn. Civ. Engrs. 71, 270-278.
Savarenskiy, A.D. (1940) "Metod rascheta regulirovaniya stoka", Gidrotekhn. Stroit., 2, 24-28.
Sudler, C. (1927) "Storage required for the regulation of streamflow", Trans. Amer. Soc. Civ. Engrs. 91, 622-660.
ESTIMATION OF THE HURST EXPONENT h AND GEOS DIAGRAMS
FOR A NON-STATIONARY STOCHASTIC PROCESS

GERMAN POVEDA and OSCAR J. MESA


Water Resources Graduate Program
Facultad de Minas, Universidad Nacional de Colombia
Medellin, A.A. 1027
Colombia

The Hurst effect is approached from the hypothesis that there is a fundamental problem
in the estimation of the Hurst exponent, h. The estimators given throughout the literature
are reviewed, and a test is performed for some of those estimators using i.i.d. and
non-stationary stochastic processes. The so-called GEOS diagrams (Rn*/n^0.5 vs. n) are
introduced as very powerful tools to determine whether a given time series exhibits the
Hurst effect, depending on the value of the scale of fluctuation. Various cases of the test
model are presented through both the GEOS and GEOS-H diagrams. Results indicate that
indeed there are problems in estimating h, and in some cases these could be due to an
erroneous estimation when using the classical estimators. A proposed estimator gives
better results, which confirms the pre-asymptotic behavior of the Hurst effect.

INTRODUCTION

The Hurst exponent, h, has become one of the most important scaling exponents in
hydrology, transcending its old presence in hydrology and reaching status in the recent
literature on chaos and fractals (Mandelbrot, 1983; Feder, 1988; Schroeder, 1991). In
hydrology the whole paradox of the Hurst effect (Hurst, 1951) has received a renewed
attention due to the implications and the physical significance of its existence in
geophysical and paleo-hydrological time series (Gupta, 1991; Poveda, 1992), and also
because we have shown that the existence of the Hurst effect is not such a widespread
universal feature of time series, neither geophysical nor anthropogenic (Poveda, 1987;
Poveda and Mesa, 1991; Mesa and Poveda, 1993).
In section 2 we give a brief introduction to the Hurst effect. In section 3 some of the
approaches proposed to explain the paradox are mentioned. In section 4 we review the
estimators of h, and develop the hypothesis of the Hurst effect as the result of an incorrect
estimation of h. Section 5 presents the so-called GEOS and GEOS-H
diagrams for a non-stationary stochastic process, and section 6 presents the conclusions.

THE HURST EFFECT

The Hurst effect has been extensively studied in hydrology since the original paper by
Hurst (1951), and therefore its classical definition will not be developed here (see Mesa
and Poveda, 1993, or Salas et al., 1979, for detailed reviews). Let us define the Hurst
effect as an anomalous behavior of the rescaled adjusted range, Rn*, in a time series of
record length n. For geophysical phenomena and "anthropogenic" time series Hurst
(1951) found the power relation Rn* = a n^h, with a = 0.61 and the mean value of h = 0.72.
For processes belonging to the Brownian domain of attraction it can be shown that the
expected value and the variance of the adjusted range are, asymptotically (Troutman, 1978;
Siddiqui, 1976; Mesa and Poveda, 1993):

    E(Rn*) = (π n θ / 2)^(1/2)     (1)

    Var(Rn*) = n θ (π²/6 − π/2)     (2)

where the scale of fluctuation, θ, is given as (Taylor, 1921; Vanmarcke, 1983)

    θ = 2 ∫₀^∞ ρ(τ) dτ = lim_{T→∞} T Γ(T) = π g(0)     (3)

where ρ is the autocorrelation coefficient, Γ is the variance function of local averages,
and g(0) is the normalized one-sided spectral density function at zero frequency.
The discrepancy between the average value of h = 0.72 obtained by Hurst for
different time series and the asymptotic value of h = 0.5 for i.i.d. processes (θ = 1) is known
as the Hurst effect. There is a more precise definition of the Hurst effect (Bhattacharya
et al., 1983) in terms of the functional central limit theorem that suggests examining the
behavior of sample values of Rn*/n^h with n, which we have called GEOS diagrams,
developed later on.
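For concreteness, a minimal sketch (ours, in Python; the function name is our own) of the sample rescaled adjusted range used throughout this paper:

    import numpy as np

    def rstar(x):
        # rescaled adjusted range R*_n: range of the cumulative departures
        # from the mean, rescaled by the sample standard deviation
        x = np.asarray(x, dtype=float)
        d = np.cumsum(x - x.mean())
        return (d.max() - d.min()) / x.std()

    rng = np.random.default_rng(1)
    x = rng.normal(size=10000)            # iid case, theta = 1
    print(rstar(x) / np.sqrt(len(x)))     # fluctuates around sqrt(pi/2) = 1.2533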

APPROACHES TO THE PROBLEM

Different types of hypotheses have been set forth to explain the paradox are reviewed in
Mesa and Poveda (1993), and a brief review of the models proposed to mimic the Hurst
effect (preserve h>O.5) is made by Boes (1990) and Salas et al.(1979b). Basically, the
problem has been explained as the result of violations of the functional central limit
theorem hypotheses: a) the correlation structure of geophysical processes, b) a pre-
asymptotic transient behavior, c) non-stationarity in the mean of the processes, d) self-
similarity, e) fat tail distributions with infinite second moments. In addition to these, we
have examined the possibility of an incorrect estimation of the Hurst exponent.

Non-stationarity of the mean of geophysical time series has been found in several
phenomena (Potter, 1976). This means that either their central tendency changes in time,
or it exhibits sudden changes (shifting levels). This last idea has re-emerged in the context
of climate dynamics (Demaree and Nicolis, 1990), trying to explain climatic variability
in Central Africa as a result of recurrent aperiodic transitions between two stable states
whose dynamics is governed by a non-linear stochastic differential equation.
Bhattacharya et al. (1983) showed that the Hurst effect is asymptotically exhibited
by a process X(n) formed by weakly dependent random variables perturbed with a small
trend, as the following:

    X(n) = Y(n) + c(m + n)^β     (4)

where Y(n) is a sequence of iid random variables with zero mean and unit variance, and
c and m are integer constants. The value of β is tightly linked to the asymptotic value of
the Hurst exponent, h, in the following way: for −∞ < β ≤ −0.5, h = 0.5; for −0.5 <
β < 0, h = 1 + β; for β = 0, h = 0.5; and for β > 0, h = 1.
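This test model is straightforward to simulate; a sketch (ours, with hypothetical names):

    import numpy as np

    def bhattacharya_series(n_points, c, m, beta, seed=0):
        # X(n) = Y(n) + c*(m + n)**beta, with Y(n) iid N(0,1), as in (4)
        rng = np.random.default_rng(seed)
        n = np.arange(1, n_points + 1)
        return rng.normal(size=n_points) + c * (m + n) ** beta

    x = bhattacharya_series(20000, c=1, m=1000, beta=-0.3)   # asymptotic h = 0.7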
ESTIMATORS OF THE HURST EXPONENT

Different estimators have been proposed in the literature in order to determine the value
of h in finite time series. Each of them has been linked to the hypotheses presented to
explain the paradox. In this section we review these estimators in order to test
their performance for both iid and non-stationary processes (equation 4). In his original
work, Hurst (1951) used three different estimators:
Estimator 1. The slope of the line passing through the point (log 2, log 1) and the
center of gravity of sample values of log Rn* vs. log n.
Estimator 2. For each of the i sample values of Rn*, an estimator Ki is given as

    Ki = log Ri* / log(ni/2)     (5)

Estimator 3. For all sub-samples of length n, the average value of Rn* is used in

    K = log R̄n* / log(n/2)     (6)

Chow (1951) questioned the adequacy of the linear relationship between log Rn* and log
n passing through the point (log 2, log 1) in logarithmic space. He proposed an estimator of h
(our estimator number 4) as the least-squares slope regression for sample values of log
Rn* vs. log n. That procedure applied to Hurst's data led to the relationship Rn* = 0.31 n^0.87.
Estimator 5. Mandelbrot and Wallis (1969) suggested an estimator H defined as
the least-squares slope regression including all subsets of length j, 5 ≤ j ≤ n.
Estimator 6. Wallis and Matalas (1970) proposed a modified version of
estimator 5, using the averaged values of Rn*. Both estimators 5 and 6 are biased in
the sense that they exhibit a positive asymmetrical distribution that diminishes as n
increases, and they also exhibit a large variance.
Estimator 7. In the chronology of the h estimation history, Gomide (1975) marks
a turning point, because his estimator, YH, does not deal with least-squares slope
regression. It is based on the expected value of Rn* for iid processes, in such a way that

    YH = (log Rn* − log √(π/2)) / log n     (7)
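These single-series estimators are simple to implement; a sketch (ours), reusing the rstar function defined in the earlier fragment:

    import numpy as np

    def hurst_K(x):                   # estimator 2, eq. (5)
        return np.log(rstar(x)) / np.log(len(x) / 2)

    def gomide_YH(x):                 # estimator 7, eq. (7)
        return (np.log(rstar(x)) - 0.5 * np.log(np.pi / 2)) / np.log(len(x))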

Estimator 8. Using the functional central limit theorem, Siddiqui (1976) introduced
the asymptotic result for the expected value of Rn* for the ARMA(p,q) process. Based on that
result he suggested the SH estimator

    SH = (log Rn* − log a) / log n     (8)

    a = √(π/2) γ₀^(−1/2) (1 − Σ_{i=1}^{q} θi) (1 − Σ_{j=1}^{p} φj)^(−1)     (9)

where γ₀ is the ratio of the theoretical variance of the process to the noise variance. The
θi's and φj's are the parameters of the corresponding ARMA process. From (8) a similar
estimator, J, can be given as

    J = (log Rn* − log √(θπ/2)) / log n     (10)

Poveda (1987) showed that a large set of geophysical time series exhibit values
of SH which are close to 0.5. This result is in agreement with analyses developed in the
context of Bhattacharya et al.'s (1983) definition: a value of SH = 0.5 (SH < 0.5) (SH > 0.5)
implies that, for the largest value of n, the sample value of Rn*/n^0.5 is exactly (below)
(above) its expected value, which can be derived from (1). As a result, it turns out that
the Hurst effect is not such a widespread feature of geophysical time series. Estimator 8
is also tantamount to the slope SH of the regression line of Rn* vs. n (log space),
imposing a fixed intercept, log (θπ/2)^0.5. Therefore, this seems to confirm our hypothesis
of the incorrect estimation of the exponent as the cause of the Hurst effect. As a matter
of fact, these results confirm a pre-asymptotic behavior in the relation Rn* vs. n before
the asymptotic behavior h = 0.5 settles.
Estimator 9. Anis and Lloyd (1976) introduced an estimator for h in the case of
iid normally distributed processes, as a function of the sampling interval n, as

    h(n) = [log E(R*_{n+1}) − log E(R*_{n−1})] / [log(n+1) − log(n−1)]     (11)

and they showed that, for these processes, the expected value of Rn* is

    E(Rn*) = [Γ((n−1)/2) / (π^0.5 Γ(n/2))] Σ_{r=1}^{n−1} ((n−r)/r)^(1/2)     (12)
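Equation (12) is easily evaluated with log-gammas for numerical stability; a sketch (ours) that also builds estimator 11 (eq. 14, below) from it:

    import numpy as np
    from scipy.special import gammaln

    def anis_lloyd_ER(n):
        # exact E(R*_n) for iid normal summands, eq. (12)
        prefac = np.exp(gammaln((n - 1) / 2) - gammaln(n / 2)) / np.sqrt(np.pi)
        r = np.arange(1, n)
        return prefac * np.sum(np.sqrt((n - r) / r))

    def estimator_11(n):
        return np.log(anis_lloyd_ER(n)) / np.log(n / 2)

    print(estimator_11(10))   # 0.687, the n = 10 entry for estimator 11 in Table 1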

Estimator 10. Sen (1977) developed an analytical procedure to evaluate the
expected value of the estimators 1 and 6, for the case of small samples of normal
independent random variables, as

    E(K) = log E(Rn*) / log(n/2)
         = [1 / log(n/2)] log { [2 Γ((n+1)/2) / ((π n(n−1))^0.5 Γ(n/2))] Σ_{r=1}^{n−1} ((n−r)/r)^(1/2) }     (13)
Estimator 11. McLeod and Hipel (1978) proposed two estimators of h. One is
based on the result obtained for E(Rn*) by Anis and Lloyd (1976), which is

    K' = log E(Rn*) / log(n/2)     (14)

where the value of E(Rn*) is evaluated according to (12).
Estimator 12. The second estimator proposed by McLeod and Hipel consists of a
modified version of Gomide's (1975) YH estimator, as follows

    YH' = (log E(Rn*) − log √(π/2)) / log n     (15)

in this case E(Rn*) is also obtained from (12).


Estimator 13. Salas et al. (1979a, b) introduced an estimator similar to that
of Anis and Lloyd (1976), in the form

    Hn = [log E(R*_{n+j}) − log E(R*_{n−j})] / [log(n+j) − log(n−j)]     (16)

Evaluation of the h estimators for i.i.d. processes

Some of the estimators of h that have been proposed have been evaluated, and results
appear in Table 1. According to those results, the following conclusions can be drawn:
- Sen (1977, p. 973, Table 1) presents an erroneous result for estimator 10 (Table
1, column 3), and the corrected values are shown here in Table 1, column 4. Also,
estimator 12 shows differences in Poveda's (1987) results compared with those of
McLeod and Hipel (1978), as can be seen in columns 6 and 7 of Table 1.
- Note that similar results are obtained with estimators 10 and 11 for values of n
≥ 200. Despite the fact that estimator 10 was introduced for small samples, it produces
the same results as estimator 11 for n large. It is simple to show their analytical
equality as n goes to infinity.
- Results obtained with estimators 9 and 13 differ for n = 250, 500, and 2500. The
differences are due to the simulated samples used to evaluate the latter, as the former
is an exact estimator.

TABLE 1. Evaluation of h estimators for i.i.d. processes

n      Est. 9       Est. 10   Est. 10   Est. 11      Est. 12      Est. 12   Est. 13
       Anis and     Sen       Poveda    McLeod and   McLeod and   Poveda    Salas et al.
       Lloyd (1976) (1977)    (1987)    Hipel (1978) Hipel (1978) (1987)    (1979a, b)
10     0.627        0.69      0.655     0.687        0.432        0.382     0.627
25     0.584        0.65      0.649     0.657        0.481        0.445     0.548
50     0.561        0.63      0.635     0.639        0.497        0.468     0.561
100    0.543        0.62      0.622     0.623        0.5049       0.481     0.543
250    0.528        -         0.606     0.606        -            0.489     0.531
500    0.520        -         0.596     0.596        -            0.493     0.522
1000   0.515        -         0.587     0.587        -            0.496     0.515
2500   0.512        -         0.578     0.578        -            0.498     0.509
5000   0.506        -         0.572     0.572        -            0.499     -
10000  0.506        -         0.566     0.566        -            0.499     -
15000  0.504        -         0.563     0.563        -            0.499     -
20000  0.502        -         0.561     0.561        -            0.499     -

Figure 1. GEOS diagram, Bhattacharya et al. model, c=10, m=25, β=−1.0.

Evaluation of h estimators using a non-stationary stochastic process.

A non-stationary process such as the one given by (4) permits one to know the asymptotic
Hurst exponent, depending on the value of β. We used that model to generate synthetic
time series of 20,000 terms to evaluate some of the aforementioned estimators of h, for
different values of β. The estimators of h used are those numbered 1, 2, 3, 6 and 7, and
three other estimators described as follows.

Estimator 14. A modified version of Gomide's (1975) estimator 7, as

    YH'' = (log Rn* − log b(θ)) / log n     (17)

Estimator 15. The least-squares slope regression of all sample values of log Rm*
vs. log m, taking only values of m larger than or equal to n.
Estimator 16. Analogous to estimator 15, but in this case using averaged values
of Rm* for each value of n.
Equation (4) was used to generate simulated non-stationary sequences, with
different sets of parameters c and m. Detailed analyses were conducted using two
different groups of parameters, the first one for c=1 and m=1,000, and the second for c=3
and m=0. For the trend we used the values β = −1.0, −0.5, −0.4, −0.3, −0.2, −0.1, 0 and
0.5. As an illustration, Tables 2 and 3 show the results of the different h estimators for
the cases c=1, m=1,000, β=−0.3 (h=0.7), and c=3, m=0, β=−0.2 (h=0.8), respectively.

TABLE 2. Estimators of h. β=−0.3, h=0.7, c=1, m=1,000

n       Est. 1   Est. 2   Est. 3   Est. 6        Est. 7   Est. 14  Est. 15  Est. 16
        Hurst    Hurst    Hurst    Wallis and    Gomide   Poveda   Poveda   Poveda
        (1951)   (1951)   (1951)   Matalas(1970) (1975)   (1987)   (1987)   (1987)
5       0.482    0.706    0.717    -             0.134    0.268    0.574    0.541
10      0.680    0.678    0.690    0.654         0.377    0.384    0.553    0.537
25      0.617    0.647    0.657    0.622         0.414    0.445    0.532    0.536
50      0.751    0.628    0.637    0.604         0.560    0.467    0.521    0.536
100     0.704    0.613    0.620    0.589         0.549    0.477    0.521    0.540
250     0.544    0.599    0.605    0.576         0.479    0.488    0.497    0.551
500     0.565    0.590    0.594    0.566         0.466    0.491    0.505    0.576
1,000   0.594    0.572    0.575    0.551         0.502    0.485    0.551    0.626
2,500   0.577    0.554    0.556    0.535         0.497    0.478    0.662    0.709
5,000   0.541    0.547    0.547    0.522         0.471    0.476    0.797    0.800
10,000  0.571    0.563    0.564    0.525         0.504    0.497    0.826    0.792
15,000  0.586    0.586    0.586    0.537         0.520    0.520    0.453    0.453
20,000  0.581    0.581    0.581    0.542         0.518    0.518    -        -

For values of β different from 0 (existence of a trend), the obtained results confirm
a poor performance of all the estimators, except for estimator 16, which reproduces
with good accuracy the asymptotic value of h according to the value of β, although the
pre-asymptotic interval is variable. With the first set of parameters, c=1, m=1000, the
estimator gives very good results for values of n ≥ 2500, and in the second simulation
there is a variable interval of n from which the asymptotic result for h is reached. Again,
these results seem to confirm the hypothesis of the Hurst effect as a pre-asymptotic effect.

GEOS AND GEOS-H DIAGRAMS

As was mentioned before, the more precise definition of the Hurst effect deals with the
convergence in distribution of Rn*/n^h, with h > 0.5 (see Bhattacharya et al., 1983).

Recently, based on that definition, we have introduced the so-called GEOS diagrams
(Rn*/n^0.5 vs. n) and the GEOS-H diagrams (Rn*/n^h vs. n, with h > 0.5) (Poveda, 1987; Mesa
and Poveda, 1993). Based on those diagrams is a statistical test of the existence of the
Hurst effect in a given time series. The asymptotic distribution of Rn*/n^0.5, for processes
belonging to the Brownian domain of attraction, has a mean (μ') and a standard deviation
(σ') derived from (1) and (2) (Siddiqui, 1976; Troutman, 1978; Mesa and Poveda, 1993).
Convergence of sample values of Rn*/n^0.5 into the asymptotic interval given by μ'
± 2σ' permits one to accept the hypothesis of non-existence of the Hurst effect. Thus, the
estimation of θ becomes a fundamental issue for processes with a finite scale of
fluctuation (see Vanmarcke, 1983; Poveda and Mesa, 1991; Mesa and Poveda, 1993). On
the other hand, divergence of sample values of Rn*/n^0.5 from that interval permits
the rejection of the hypothesis of non-existence of the Hurst effect in a time series.
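A sketch of the test (ours, again reusing the rstar function from the earlier fragment): GEOS points are computed for growing subseries and compared with the iid band μ' ± 2σ':

    import numpy as np

    def geos(x, lengths):
        # sample GEOS points (n, R*_n / n**0.5)
        return [(n, rstar(x[:n]) / np.sqrt(n)) for n in lengths]

    mu_p, sig_p = 1.2533, 0.2733          # asymptotic values for theta = 1
    rng = np.random.default_rng(2)
    x = rng.normal(size=20000)
    for n, v in geos(x, [100, 1000, 10000, 20000]):
        print(n, v, mu_p - 2 * sig_p <= v <= mu_p + 2 * sig_p)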

TABLE 3. Estimators of h. β=−0.2, h=0.8, c=3, m=0

n       Est. 1   Est. 2   Est. 3   Est. 6        Est. 7   Est. 14  Est. 15  Est. 16
        Hurst    Hurst    Hurst    Wallis and    Gomide   Poveda   Poveda   Poveda
        (1951)   (1951)   (1951)   Matalas(1970) (1975)   (1987)   (1987)   (1987)
5       0.482    0.706    0.717    -             0.1343   0.2680   0.5739   0.5668
10      0.680    0.678    0.690    0.654         0.3773   0.3843   0.5544   0.5665
25      0.616    0.647    0.657    0.621         0.4137   0.4454   0.5361   0.5688
50      0.750    0.628    0.638    0.604         0.5601   0.4665   0.5273   0.5228
100     0.703    0.613    0.619    0.589         0.5486   0.4772   0.5217   0.5815
250     0.593    0.599    0.604    0.576         0.4780   0.4881   0.5157   0.6027
500     0.564    0.589    0.593    0.5661        0.4652   0.4913   0.5354   0.6407
1,000   0.596    0.572    0.575    0.5517        0.5035   0.4849   0.5996   0.7099
2,500   0.585    0.556    0.558    0.5359        0.5848   0.4800   0.7538   0.8275
5,000   0.539    0.547    0.548    0.5235        0.4687   0.4769   0.9586   0.9588
10,000  0.596    0.574    0.577    0.5320        0.5269   0.5088   0.9909   0.9105
15,000  0.612    0.612    0.613    0.5528        0.5449   0.5449   0.2905   0.2905
20,000  0.602    0.602    0.602    0.5626        0.5374   0.5374   -        -

For the case of the simulated sequences obtained using (4), the estimation of the
scale of fluctuation, θ, makes no sense because its value is a function of time, and the
ergodic property fails to hold. Nevertheless, the qualitative behavior of the sample values
of Rn*/n^0.5 in the GEOS and GEOS-H diagrams was examined, for different values of c
and m. Some of the results obtained are shown in Figures 1 to 5.
Figure 1 shows the GEOS diagram for the case c=10, m=25 and β=−1.0 (h=0.5). It is
clear that sample values of Rn*/n^0.5 converge into the asymptotic interval μ' ± 2σ' for the
case of iid processes (μ'=1.2533, σ'=0.2733). The effect that the trend produces on the iid
process is clearly observed in Figure 2 (GEOS for c=10, m=25, β=−0.3, h=0.7). Sample
values of Rn*/n^0.5 are contained within the asymptotic interval μ' ± 2σ' corresponding to
the underlying iid process, except for a marked bifurcation due to the trend itself. In this
case there is clear evidence of the Hurst effect. This shows the power of GEOS
diagrams.

Figure 2. GEOS diagram, Bhattacharya et al. model, c=10, m=25, β=−0.3.

Figure 3. GEOS diagram, Bhattacharya et al. model, c=1, m=1000, β=−0.3, h=0.7.



The parameter c indicates the relative weight of the trend. The stronger the trend,
the slower the convergence to the asymptotic value of Rn*/n^h. Figures 3 and 4 illustrate
this point: the theoretical limit values are 0.19 and 19, respectively, and the only
difference in the parameters is in the value of c. Notice that in Figure 3 the limit is
already reached, whereas in Figure 4 it is not yet.

Figure 4. GEOS diagram, Bhattacharya et al. model, c=100, m=1000, β=−0.3, h=0.7.


To illustrate the possibility of error in the estimation of h on the POX diagram, Figure
5 represents the same set of data as Figure 4. The least-squares slope is close to 1.0,
whereas the theoretical exponent is 0.7. Evidently, this is due to a pre-asymptotic behavior
that does not allow one to distinguish between the asymptotic slope and the slope of the
pre-asymptotic tendency towards the limit from below.

CONCLUSIONS

The non-stationary model given by (4) facilitates the performance of experiments on the
estimation of the Hurst exponent, h, and allows one to draw conclusions about the existence
of the Hurst effect.
The estimation of the Hurst exponent, h, has been found to be a delicate and
sensitive task. Most of the estimators given in the literature perform poorly
in the experiments developed in this work, especially for the cases where h > 0.5.
This behavior is explained by the treatment of the regression intercept in the relation
Figure 5. POX diagram, Bhattacharya et al. model, c=100, m=1000, β=−0.3.

Rn* vs. n (log space). The intercept is not independent of the slope in the relation Rn* = a n^h.
This consideration is violated in most of the estimators proposed in the literature, and in
the whole approach to the Hurst effect. Indeed, estimator 16 (Poveda, 1987) provides
better results because it is concentrated in the region of values of n where the pre-
asymptotic behavior has disappeared, or at least is less dominant, and therefore the values
of both the intercept and the slope (the exponent) are statistically fitted to the limit values.
This work has shown that the estimation of the Hurst exponent h is not a trivial
problem: there are many difficulties involved, and therefore the tests of hypothesis that
use the GEOS and GEOS-H diagrams constitute a more direct and conclusive tool to
identify the presence of the Hurst effect.
Possible errors in the estimation of h using the least-squares slope in the POX
diagrams are produced by pre-asymptotic effects and the shortness of the record.
Estimation of θ provides an answer to that problem, even in the case of a short record,
for in that case n/θ, a measure of the record length, will be small. This fact is not
apparent in the POX analyses.
The Hurst effect holds its importance in hydrology and geophysical time series
modeling and prediction due to its links with self-similar processes, fractals and
multifractals.

REFERENCES

Anis, A.A. and Lloyd, E.H. (1976) "The expected value of the adjusted rescaled Hurst range of independent normal summands", Biometrika, 63, 111-116.
Bhattacharya, R.N., Gupta, V.K. and Waymire, E. (1983) "The Hurst effect under trends", Jour. Appl. Probab., 20, 3, 649-662.
Boes, D.C. (1988) "Schemes exhibiting Hurst behavior", in J.N. Srivastava (ed.) Essays in Honor of Franklin A. Graybill, Elsevier, 21-42.
Chow, V.T. (1951) "Discussion on 'Long-term storage capacity of reservoirs' by H.E. Hurst", Trans. A.S.C.E. pap. 2447, 800-802.
Demaree, G.R. and Nicolis, C. (1990) "Onset of the Sahelian drought viewed as a fluctuation-induced transition", Q. J. R. Meteorol. Soc., 116, 221-238.
Feller, W. (1951) "The asymptotic distribution of the range of sums of independent random variables", Ann. Math. Stat., 22, 427-432.
Gomide, F.L.S. (1975) "Range and deficit analysis using Markov chains", Hydrol. Pap. No. 79, Colorado State University, Fort Collins, 1-76.
Gupta, V.K. (1991) "Scaling exponents in hydrology: from observations to theory", in Self-similarity: Theory and Applications in Hydrology, Lecture Notes, AGU 1991 Fall Meeting, San Francisco.
Hipel, K.W. and McLeod, A.I. (1978) "Preservation of the rescaled adjusted range, 2. Simulation studies using Box-Jenkins models", Water Res. Res., 14, 3, 509-516.
Hurst, H.E. (1951) "Long-term storage capacity of reservoirs", Trans. ASCE, 116, 776-808.
Mandelbrot, B.B. (1983) The Fractal Geometry of Nature, Freeman and Co., New York.
Mandelbrot, B.B. and Wallis, J.R. (1969) "Some long-run properties of geophysical records", Water Res. Res., 5, 2, 321-340.
Mesa, O.J. and Poveda, G. (1993) "The Hurst phenomenon: the scale of fluctuation approach", Water Res. Res., in press.
McLeod, A.I. and Hipel, K.W. (1978) "Preservation of the rescaled adjusted range, 1. A reassessment of the Hurst phenomenon", Water Res. Res., 14, 3, 491-508.
Potter, K.W. (1976) "Evidence for nonstationarity as a physical explanation of the Hurst phenomenon", Water Res. Res., 12, 5, 1047.
Poveda, G. (1992) "Do paleoclimatic records exhibit the Hurst effect?", Fourth International Conference on Paleoceanography (Abstracts), GEOMAR, Kiel, Germany.
Poveda, G. (1987) "El fenómeno de Hurst" (The Hurst phenomenon, in Spanish), unpublished Master's thesis, Universidad Nacional de Colombia, Medellín, 230 p.
Poveda, G. and Mesa, O.J. (1991) "Estimación de la escala de fluctuación para la determinación del fenómeno de Hurst en series temporales en hidrología" (Estimating the scale of fluctuation to determine the Hurst effect in hydrological time series, in Spanish), II Colombian Congress on Time Series Analysis, Bogotá, Universidad Nacional de Colombia.
Salas, J.D., Boes, D.C., Yevjevich, V. and Pegram, G.G.S. (1979a) "On the Hurst phenomenon", in H.J. Morel-Seytoux (ed.) Modeling Hydrologic Processes, Water Resources Publ., Fort Collins, Colorado.
Salas, J.D., Boes, D.C., Yevjevich, V. and Pegram, G.G.S. (1979b) "Hurst phenomenon as a pre-asymptotic behavior", Jour. of Hydrology, 44, 1-15.
Siddiqui, M.M. (1976) "The asymptotic distribution of the range and other functions of partial sums of stationary processes", Water Res. Res., 12, 6, 1271-1276.
Taylor, G.I. (1921) "Diffusion by continuous movements", Proc. London Math. Soc. (2), 20, 196-211.
Troutman, B.M. (1978) "Reservoir storage with dependent, periodic net inputs", Water Res. Res., 14, 3, 395-401.
Vanmarcke, E. (1983) Random Fields: Analysis and Synthesis, The M.I.T. Press, Cambridge.
Wallis, J.R. and Matalas, N.C. (1970) "Small sample properties of H and K estimators of the Hurst coefficient h", Water Res. Res., 6, 1583-1594.
OPTIMAL PARAMETER ESTIMATION OF CONCEPTUALLY-BASED
STREAMFLOW MODELS BY TIME SERIES AGGREGATION

P. CLAPS¹ and F. MURRONE²

¹Dept. of Environm. Engineering and Physics, University of Basilicata,
Via della Tecnica 3, Potenza 85100, Italy
²Dept. of Hydraul., Wat. Resour. Manag. and Environm. Eng., Univ. of Naples "Federico II",
Via Claudio 21, Napoli 80126, Italy

In the framework of an integrated use, among different scales, of conceptually-based
stochastic models of streamflows, some points related to efficient parameter estimation
are discussed in this paper. Two classes of conceptual-stochastic models, ARMA and
Shot Noise, are taken under consideration as equivalent to a conceptual system
transforming the effective rainfall into runoff. Using these models, the possible benefits
of data aggregation with regard to parameter estimation are investigated by means of a
simulation study. The application, made with reference to the ARMA(1,1) model, shows
advantageous effects of data aggregation, while the same benefits are not found for
estimation of the conceptual parameters with the corresponding Shot Noise model.

INTRODUCTION

Streamflow time series modeling is generally intended as the closest possible


reproduction of the statistical features displayed by the phenomenon under
investigation. This is certainly what is needed in the majority of the practical cases for
which time series are analyzed, for instance planning and management of water
resources systems. Practical needs have led, in the last decades, to a prevailing
"operational" approach to time series modelling, in which little space has been left to the
analysis of physical, observable aspects in riverflow series. On the other hand, a
physically based approach to this problem addresses the reproduction as well as the
interpretation of the features of the phenomenon.
One of the requirements for a correct reproduction of the runoff process is that the
model related to a given scale must be compatible with the models referred to smaller or
aggregated scales. Even outside of a conceptual approach, the problem of determining
stochastic models for aggregated data has so far received limited attention. Among the
few papers in this field, Kavvas et al. (1977), Vecchia et al. (1983), Obeysekera and Salas
(1986) and Bartolini and Salas (1993) are worth mentioning.
With regard to the above requirement, using conceptually based models allows the
basic advantage that the information related to a conceptual parameter can be
transferred from larger to smaller scales, because its conceptual meaning does not
depend on a particular time scale. Therefore, derivation of stochastic models from a
general conceptual representation of the runoff process is a first step towards
integration of models among different scales.

Claps and Rossi (1992), and Murrone et al. (1992) identified stochastic models of
streamflow series over different aggregation scales starting from a conceptual
interpretation of the runoff process. In this conceptual-stochastic framework there are
conceptual parameters common to models related to different scales. The point of view
characterizing this framework, which is summarized in the next section, is that the
analysis of streamflow series should be extended beyond the scale at which data are
collected, taking advantage of information available from models of the aggregated data.
The question arises whether there is a particular time scale (and, consequently, a particular
model) leading to an optimal estimation of a given parameter. The choice of an optimal
time scale is important because aggregation of data tends to reduce correlation effects
due to runoff components with small lag time with respect to the effect produced by
components with large lag time. At the same time, aggregation reduces the number of
data and, consequently, the quality of the estimates.
In the above approach, Claps and Rossi (1992) and Murrone et al. (1992) considered
a limited number of time scales, such as annual, monthly, and T-day (with T ranging
from 1 to 7), and showed that conceptual parameters of models of monthly and T-day
runoff are more efficiently estimated using different scales of aggregation.
An attempt to introduce a more systematic procedure in the selection of the optimal
time scale for the estimation of each parameter is made in this paper. In this direction,
simulation experiments are performed with regard to the ARMA (Box and Jenkins, 1970)
and Shot Noise (Bernier, 1970) stochastic models equivalent to a simple conceptual
model of the runoff process.

CONCEPTUAL-STOCHASTIC MODELS AND TIME SCALES

The rationale of conceptualization

In the approach by Claps and Rossi (1992) and Murrone et al. (1992), formulation of a
conceptual model for river runoff is founded on the "observation" of riverflow series over
different aggregation scales and on the knowledge of the main physical (climatic and
geologic) features of basins.
Considering Central-Southern Italy watersheds, dominated by the hydrogeological
features of the Apennine mountains, distinct components can be recognized in the runoff: (1)
the contribution provided by aquifers located within large carbonate massifs, which has
over-year response time to recharge (deep groundwater runoff); (2) a component, which
is due to both overflow springs and aquifers within non-carbonate geological formations,
which usually run dry by the end of the dry season (seasonal groundwater runoff); (3)
the contribution of soil drainage, having a delay of several days with respect to
precipitation (subsurface runoff); (4) the surface runoff, having a lag-time that depends on
the size of the watershed (for the rivers analyzed by Murrone et al. (1992), this lag
ranges from a few hours to almost two days). In some cases the deep groundwater
component is lacking, reducing the runoff components to three. The snowmelt runoff in the
region considered is negligible. The above runoff components assume different
importance with respect to the time scale of aggregation, leading to conceptual models of
increasing complexity moving from the annual to the daily scale.
The bases for the conceptual-stochastic model building proposed for the monthly scale
(Claps and Rossi, 1992; Claps et al., 1993) and for the daily scale (Murrone et al., 1992)
are essentially: (1) subsurface and groundwater systems are considered as linear
reservoirs, with storage coefficients K1, K2, K3, going from the smallest to the largest; (2)
runoff is the output of a conceptual system made up of the above reservoirs in parallel
with a zero-lag linear channel reproducing the direct runoff component; (3) when a
storage coefficient is small with respect to the time scale considered, the related
groundwater component becomes part of the direct runoff term, which is proportional to
the system input; (4) the effective rainfall, i.e. total precipitation minus
evapotranspiration, is the conceptual input to the system; this variable is not explicitly
accounted for in the models, which are univariate; (5) effective rainfall volumes
infiltrate into the subsurface and groundwater systems at constant rates (recharge
coefficients c1, c2, c3, respectively) over time.
The main issues of model identification for annual, monthly and daily scales are
summarized below.

Annual scale

Rossi and Silvagni (1980) first supported on a conceptual basis the use of the ARMA(1,1)
model for annual runoff series, based on the consideration that the correlation structure
at that scale is determined by the deep groundwater runoff component. The use of this
model for annual runoff modelling was proposed by O'Connell (1971) by virtue of its
capacity to reproduce the long-term persistence displayed by annual runoff data. Salas
and Smith (1981) showed how a conceptual system composed of a linear reservoir in
parallel with a linear channel fed by a white noise input behaves as an ARMA(1,1)
process.
Given an effective rainfall input I_t which infiltrates at a rate c3·I_t and whose part
(1−c3)I_t goes into direct runoff, and based on the hypothesis that the input is concentrated at
the beginning of the interval [t−1, t], the volume balance equations produce

    D_t − e^(−1/K3) D_(t−1) = (1 − c3 e^(−1/K3)) I_t − e^(−1/K3) (1 − c3) I_(t−1)     (1)

where D_t is the runoff in year t. This hypothesis can be removed by considering different
shapes of the within-year input function (Claps and Murrone, 1993). The hypothesis that
I_t is a white noise process leads to the ARMA(1,1) model

    d_t = Φ d_(t−1) + ε_t − Θ ε_(t−1)     (2)
in which d_t equals D_t − E[D_t], Φ and Θ are the autoregressive and moving average
coefficients, respectively, and ε_t is the zero-mean model residual. The conceptual and
stochastic parameters in (1) and (2) are related by:

    Θ = Φ(1 − c3) / (1 − c3 Φ);   K3 = −1 / ln(Φ);   c3 = (Φ − Θ) / [Φ(1 − Θ)]     (3)

The expression of c3 for a uniform within-period distribution of the input is

    c3 = (Φ − Θ) / [(1 − Θ) K3 (1 − e^(−1/K3))]     (4)
The ARMA model residual is proportional to the effective rainfall by means of:

    ε_t = (1 − c3 e^(−1/K3)) i_t     (5)

where i_t is the zero-mean effective rainfall.
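These relations are easy to move through in both directions; a sketch (ours, with the input-at-start-of-year version of c3 from (3)):

    import numpy as np

    def to_conceptual(Phi, Theta):
        K3 = -1.0 / np.log(Phi)
        c3 = (Phi - Theta) / (Phi * (1.0 - Theta))
        return K3, c3

    def to_stochastic(K3, c3):
        Phi = np.exp(-1.0 / K3)
        Theta = Phi * (1.0 - c3) / (1.0 - c3 * Phi)
        return Phi, Theta

    Phi, Theta = to_stochastic(5.0, 0.8)
    print(to_conceptual(Phi, Theta))      # round trip recovers (5.0, 0.8)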
In the absence of significant groundwater runoff, Rossi and Silvagni (1980) showed that
annual runoff in the hydrologic year is an independent process that follows a Box-Cox
transformation of the Normal distribution. The notion of the hydrologic year, which starts at
the end of the dry season, is important because, if a wet season and a dry season can be
distinguished, the absence of significant runoff in the dry season determines the absence of
correlation in the hydrologic-year runoff series.

Monthly scale

The assumptions recalled above on the role of the different components in streamflow
lead to the consideration that correlation effects in monthly runoff are due both to long-term
persistence, due to the deep groundwater runoff, and to short-term persistence, due to
the seasonal groundwater runoff. The conceptual system identified by means of these
considerations consists of two parallel linear reservoirs plus a zero-lag linear channel.
The latter accounts for the sub-monthly response components included in the direct
runoff.
The share c3·I_t of the effective rainfall is the recharge of the over-year groundwater,
with storage coefficient K3, while c2·I_t is the recharge of the seasonal groundwater, with
storage coefficient K2. All c_j and K_j parameters are kept constant. Approximations
determined by the latter assumption are compensated for by parsimony in the number of
parameters and by the significance given to the characteristics of the input I_t,
considered as a periodic-independent process. Periodic variability of the recharge
coefficients c2 and c3 is substantially due to variability in soil moisture, which is a
product of rainfall periodic variability.
Claps and Rossi (1992) and Claps et al. (1993) have shown that the volume balance
equations for the conceptual model under exam are equivalent to an ARMA(2,2)
stochastic process with periodic-independent residual (PIR-ARMA), expressed as

    d_t = Φ1 d_(t−1) + Φ2 d_(t−2) + ε_t − Θ1 ε_(t−1) − Θ2 ε_(t−2)     (6)

with d_t and ε_t having zero mean. The formal correspondence between the stochastic and
conceptual representations is obtained through the relations:

    (7)

    (8)

    (9)

    c2 = [−(Θ1 − Θ2) N + (Φ1 − Φ2) M + (1 + 2 e^(−1/K2))(N − M)] / [2 M (e^(−1/K3) − e^(−1/K2)) Γ2]     (10)

where N = (1 − Φ1 − Φ2), M = (1 − Θ1 − Θ2), Γ3 = K3 (1 − e^(−1/K3)) and Γ2 = K2 (1 − e^(−1/K2)). In
addition, in the conceptual scheme the residual ε_t is proportional to the zero-mean
effective rainfall i_t according to the relation

    (11)

If the over-year groundwater component is negligible, as for instance in practically
impermeable basins, the conceptual system reduces to one reservoir in parallel with a
linear channel, underlying a PIR-ARMA(1,1) stochastic process.
The probability distribution of monthly effective rainfall is assumed by Claps and Rossi
(1992) to be the sum of a Bessel distribution (Benjamin and Cornell, 1970, p. 310), arising
from the sum of a Poissonian number of exponentially distributed events, and a
Gaussian error term. A Box-Cox transformation of the non-zero data was also proposed by
Claps (1992).
To preserve the formal correspondence between the conceptual and stochastic
representations of the process, neither deseasonalization nor transformation procedures
are applied to recorded data.

T-day scale: multiple Shot Noise model

The Shot Noise (Bernier, 1970) is a continuous-time stochastic process representing a
phenomenon whose value, at a certain time, is determined additively by the effects of a
random number of previous point events. This process is determined by knowledge of: (1)
the occurrence times of the events, τ_i; (2) the input impulse intensity related to the events,
Y_i; and (3) the response function of the system, h(·), describing the propagation in time of
the effects of each impulse.
The hypotheses made for this kind of process are: (a) the h(·) function is continuous,
infinitesimal for t tending to infinity, and integrable; (b) the intensities Y_i are random
variables, independent and identically distributed, with finite variance; and (c) the event
occurrence times τ_i are generated by a homogeneous Poisson process. The process is
stationary if its origin tends to −∞, meaning that the origin must be far enough from the
time under consideration. Runoff D can thus be expressed, in continuous time, as

    D(τ) = Σ_{i=N(−∞)}^{N(τ)} Y_i h(τ − τ_i)     (12)

where N(τ) is the counting function of the Poisson process of occurrences.


In the conceptual framework considered (Murrone et al., 1992), the response
function h(·) is a linear combination of the responses of the conceptual elements. If the
surface network is considered to behave as a linear reservoir, h(·) is expressed as

    h(s) = (c0/K0) e^(−s/K0) + (c1/K1) e^(−s/K1) + (c2/K2) e^(−s/K2) + (c3/K3) e^(−s/K3)     (13)

with s = τ − τ_i. The basin response is defined by 8 parameters: the four storage
coefficients, K_j, and the four recharge coefficients, c_j, of which only 7 are to be estimated,
given the volume continuity condition Σc_j = 1. The c_j coefficients represent the share of
runoff produced, on average, by each component. To limit the number of parameters and
to take advantage of the linearity hypotheses, the coefficients c_j and K_j are considered
constant, i.e. the response function h(·) is kept constant.
The process (12) has infinite memory, which represents the current effect of
previous inputs to the system. This effect can be evaluated at a fixed initial time, t0 = 0,
by knowing the groundwater runoff quota at that time. At the beginning of the
hydrological year (October 1 in our case), the seasonal and subsurface groundwater
contributions are negligible relative to the deep groundwater runoff. Therefore the value
D0 of the discharge at that time can be a good preliminary estimate of the groundwater
runoff amount, thus expressing (12) as:

    D(t) = D0 e^(−t/K3) + Σ_{i=N(0)}^{N(t)} Y_i h(t − τ_i)     (14)

The discretized form of the continuous process (14) is obtained by its integration
over the interval [(t−1)T, tT], where t = 1,2,... is the index describing the set of sampling
instants and T is the sampling time interval. If the aggregation occurs on a T-day scale
and the integration is applied according to the linearity and stationarity hypotheses, the
following discretized formulation is obtained:

    D_t = K3 e^(−tT/K3) (e^(T/K3) − 1) D0 + Σ_{s=1}^{t} Y_(t−s+1) h_s     (15)

where Y_t represents the sum of the impulses occurring during the interval [(t−1)T, tT] and
the integrated response is expressed as:

    h_s = Σ_{j=0}^{3} (c_j K_j / T) [e^(T/Kj) + e^(−T/Kj) − 2] e^(−T(s−1)/Kj),   s > 1     (16)

The function h_s represents the response of the system to a unit volume
impulse of effective rainfall, uniformly distributed within the interval.
When the scale of aggregation T is chosen to be considerably larger than the surface
runoff lag-time, the surface runoff component can be considered as the output of a zero-
lag linear channel, which has response function c0·δ(·), with δ(·) the Dirac delta
function. This reduces to six the number of parameters to be estimated.
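A sketch (ours, with parameter values chosen only for illustration) of a generator for the discretized process (15)-(16), with two reservoirs plus the zero-lag channel; the s = 1 term of the response is derived under the same uniform-input assumption, since (16) as printed covers only s > 1:

    import numpy as np

    def h_resp(T, K, c, smax):
        # integrated response (16) of one linear reservoir, s = 1..smax
        s = np.arange(1, smax + 1)
        h = (c * K / T) * (np.exp(T / K) + np.exp(-T / K) - 2.0) \
            * np.exp(-T * (s - 1) / K)
        h[0] = c * (1.0 - (K / T) * (1.0 - np.exp(-T / K)))   # s = 1 term (ours)
        return h

    def simulate(nT, T=7.0, rate=0.3, vol=5.0,
                 comps=((20.0, 0.4), (200.0, 0.3)), c0=0.3, D0=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Y_t: Poisson number of impulses per interval with exponential volumes,
        # so the per-interval total is Bessel-distributed, as in the text
        counts = rng.poisson(rate * T, size=nT)
        Y = np.array([rng.exponential(vol, n).sum() for n in counts])
        h = sum(h_resp(T, K, c, nT) for K, c in comps)   # reservoirs in parallel
        K3 = comps[-1][0]
        t = np.arange(1, nT + 1)
        memory = D0 * K3 * np.exp(-t * T / K3) * (np.exp(T / K3) - 1.0)
        return memory + c0 * Y + np.convolve(Y, h)[:nT]   # eq. (15)

    flows = simulate(520)   # ten years of 7-day flows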
The structure of daily effective precipitation has been represented as uncorrelated,
as in Poisson white noise models (Bernier, 1970), or characterized by a Markovian arrival
process (e.g. Kron et al., 1990), or described by models based on the arrival of clusters of
cells, such as the Neyman-Scott instantaneous pulse model (e.g. Cowpertwait and
O'Connell, 1992). The distribution considered by Murrone et al. (1992) is a Bessel
distribution, corresponding to a Poisson white noise probabilistic model.

SIMULATION STUDY

Prerequisites to the simulation


For the reasons expounded in the introduction, the simulation study undertaken here aims primarily to settle a number of basic points in evaluating theoretically the effects of aggregation on parameter estimation. The problem here is not to identify the most correct model (as, for instance, in Jakeman and Hornberger, 1993) but to understand whether there are peculiar scales for the estimation of the parameters of a given model with pre-determined structure, as in Claps et al. (1993). Simple hypotheses in terms of input and system structure were adopted for the simulation, to grasp the basics of the positive or negative effects of aggregation in time.
A linear system was considered, which consisted of one linear reservoir, with storage coefficient K, and one linear channel, in parallel, with lag zero. As shown with reference to annual runoff, this system, fed by a stochastic input, is equivalent to an ARMA(1,1) model when the input is a continuous process. For input as a point process this system is equivalent to a single Shot Noise model (as compared to the multiple version arising from the presence of more than one reservoir).
For each set of "true" parameters c and K (written in bold) of the linear system, 20000 output data were generated. On the data obtained from Gaussian input, parameters of the ARMA(1,1) model were estimated and expressed, through (3), in terms of the conceptual parameter estimates ĉ and K̂. Shot Noise model parameters were estimated on data generated from Bessel input. The first 10000 synthetic runoff data were not considered in the estimation, as a warm-up length (Salas et al., 1980, p. 356). This length was set well beyond the suggested limits, to definitely eliminate possible "starting condition" effects.
The recharge coefficient c, indicating the amount of input entering the reservoir, ranged from 0.5 to 1. In the model of annual runoff c is less than 1, while the case c = 1 corresponds to the model of a spring (see Claps and Murrone, 1993). The storage coefficient K was set in a range from 2 to 120 time units (t.u.). The 'time unit' is one unit of the time scale at which input and output data are generated and is also called the reference scale. There is no need to express the storage coefficient in terms of hours or days because what is important is to indicate the value of the parameter as a multiple of the scale of generation. Accordingly, time scales at different levels of aggregation are identified in numbers of time units.
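A sketch of the simulation experiment follows (our code, not the authors'; the discrete reservoir recursion is a simple approximation of ours, and the back-transformation K̂ = −T/ln φ̂, the standard relation for a single linear reservoir, stands in for the paper's equation (3), which appears earlier in the paper).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Generate runoff at the reference scale, drop a warm-up, aggregate on
# a T-unit scale, and re-estimate the storage coefficient K.
rng = np.random.default_rng(0)
c_true, K_true, n, warmup = 0.5, 60.0, 20000, 10000

x = rng.normal(1.0, 1.0 / 3.0, n)          # Gaussian input, mean 1, sigma = 1/3
a = np.exp(-1.0 / K_true)                  # reservoir decay per time unit
r = np.zeros(n)
for t in range(1, n):                      # crude discrete linear reservoir
    r[t] = a * r[t - 1] + c_true * (1.0 - a) * x[t]
d = (r + (1.0 - c_true) * x)[warmup:]      # add zero-lag channel, drop warm-up

for T in (1, 7, 15, 30):
    dT = d[: len(d) // T * T].reshape(-1, T).sum(axis=1)
    phi = ARIMA(dT, order=(1, 0, 1)).fit().arparams[0]
    print(T, round(-T / np.log(phi), 1))   # K_hat from phi = exp(-T/K)
```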
In a preliminary set of simulations, the effect of the input standard deviation σ was recognized as null for Gaussian data and practically negligible for Bessel data. For this reason, only one level of input variability was considered for each distribution, namely σ = 1/3 for Gaussian input and σ = 3 for Bessel input. For both cases the mean was set to 1.
To allow comparison of parameter estimates made on data obtained with the same "true" values but in different conditions, the standard errors of the parameters and the explained variance R² were used. R² is defined as 1 − σ_ε²/σ², where σ_ε² indicates the residual variance (taken as the variance of the surface runoff component in the Shot Noise model) and σ² indicates the variance of the synthetic runoff series.

Application

The main points to be clarified with the aid of simulations are: (1) In which manner does the resolution of a linear reservoir depend on the relative mean (coefficient c) of its output with respect to total runoff? (2) Is there a preferential scale for the estimation of the storage coefficient?
Results of parameter estimation on simulated data, reported below, suggest a
number of comments.

ARMA(1,1) model

The following comments arise from the estimation of ĉ and K̂ through the ARMA(1,1) model:
1. Results of parameter estimation, reported in Tables 1a to 1c (referring to c = 0.5, c = 0.8 and c = 1, respectively), show that aggregation reduces the variance of the fraction of input not entering the reservoir, producing higher values of the explained variance R². The obvious exception is the case c = 1, in which there is no pure white noise component. For the case c = 1, the model fitted to the data is still the ARMA(1,1), for it is the most general model of a single linear reservoir with a generic within-period form of the input function (Claps and Murrone, 1993). This adds information in providing an estimate of c, obtained through (4).
2. A progressive increase in the standard error of the estimates also occurs with the aggregation, due uniquely to the decrease in the number of data. Table 2 shows that estimations made on the reference scale (1 t.u.) over limited samples produce standard errors greater than the corresponding standard errors for data aggregated on 7, 15 and 30 t.u.
3. For c < 1, K is clearly underestimated. This tendency becomes more noticeable with increasing K and with decreasing c. The values found for R², which decreases in the same circumstances, reflect the poor estimate of K. A tendency toward a preferential scale for parameter estimation is not recognizable from the results shown in Tables 1a and 1b.
4. More understandable results are obtained by estimating K̂ and ĉ on scales aggregated in unit steps, from 1 to 15 t.u. In this regard, Figures 1-3 clearly show that when K is much greater than the reference scale, aggregation produces better conditions for parameter estimation. The progressive increase in K̂ and ĉ up to a sill (Figure 1) gives sufficient indication of this benefit. Therefore the preferential scale must be the one at which the sill is reached (7 t.u. for this case), as a trade-off between the increase in R² and the increase in the standard error of the estimates.
Based on the results reported above, it seems that the preferential scale decreases with increasing c and with decreasing K (in general one should speak in terms of a non-dimensional preferential scale, i.e. divided by K). Figure 2, with c = 0.5 and K = 15, confirms this tendency, showing a substantial constancy in both K̂ and ĉ, which would indicate that the sill is reached at the reference scale. On the other hand, when c = 1 the reference scale is the best one for estimation regardless of K, since the quality of the estimates degrades with aggregation (see the decrease of ĉ and R² in Table 1c and the decrease of ĉ in Figure 3).
TABLE 1a. Estimations from: ARMA(1,1) model, Gaussian input, c = 0.5 (scale in t.u.)

              K = 120                              K = 90
scale  K̂(t.u.)  ĉ      φ̂      θ̂      R²     K̂(t.u.)  ĉ      φ̂      θ̂      R²
1      10.49   0.227  0.909  0.884  0.002  17.01   0.322  0.943  0.917  0.004
7      104.5   0.490  0.935  0.877  0.029  89.53   0.508  0.925  0.853  0.037
15     96.18   0.508  0.856  0.727  0.063  81.30   0.524  0.832  0.677  0.078
30     83.31   0.528  0.698  0.457  0.100  72.00   0.548  0.659  0.383  0.117
60     84.12   0.538  0.490  0.171  0.111  73.20   0.553  0.441  0.102  0.117
              K = 60                               K = 30
1      29.08   0.461  0.966  0.938  0.008  22.74   0.509  0.957  0.914  0.018
7      67.72   0.522  0.902  0.805  0.051  34.42   0.513  0.816  0.657  0.074
15     61.82   0.537  0.785  0.588  0.098  36.08   0.537  0.660  0.393  0.117
30     57.12   0.565  0.592  0.272  0.135  36.30   0.559  0.438  0.092  0.128
60     57.82   0.557  0.354  0.012  0.111  36.67   0.523  0.195  -0.084 0.069
              K = 15                               K = 7
1      13.14   0.515  0.927  0.855  0.033  7.22    0.529  0.871  0.744  0.059
7      14.78   0.492  0.623  0.380  0.090  6.33    0.491  0.331  0.048  0.083
15     18.63   0.505  0.447  0.154  0.098  7.63    0.471  0.140  -0.083 0.046
30     25.99   0.483  0.315  0.041  0.078

TABLE 1b. Estimations from: ARMA(1,1) model, Gaussian input, c = 0.8 (scale in t.u.)

              K = 120                              K = 90
scale  K̂(t.u.)  ĉ      φ̂      θ̂      R²     K̂(t.u.)  ĉ      φ̂      θ̂      R²
1      83.24   0.784  0.988  0.946  0.067  71.23   0.795  0.986  0.934  0.089
7      119.9   0.799  0.943  0.746  0.273  95.48   0.806  0.929  0.683  0.317
15     120.4   0.811  0.883  0.507  0.398  96.02   0.814  0.855  0.413  0.430
30     109.8   0.816  0.761  0.164  0.456  90.78   0.820  0.719  0.068  0.462
60     109.6   0.791  0.578  -0.079 0.386  87.66   0.781  0.504  -0.141 0.350
              K = 60                               K = 30
1      55.27   0.809  0.982  0.910  0.126  29.45   0.811  0.967  0.835  0.205
7      64.34   0.806  0.897  0.565  0.369  30.44   0.796  0.795  0.289  0.414
15     67.23   0.812  0.800  0.266  0.448  33.87   0.787  0.642  0.017  0.399
30     67.80   0.812  0.643  -0.039 0.436  40.41   0.747  0.476  -0.109 0.303
60     62.46   0.750  0.383  -0.192 0.273  36.44   0.654  0.193  -0.188 0.126
              K = 15                               K = 7
1      15.57   0.813  0.938  0.707  0.300  7.59    0.814  0.877  0.481  0.398
7      14.36   0.780  0.614  -0.008 0.384  6.59    0.743  0.345  -0.207 0.255
15     15.57   0.733  0.381  -0.168 0.255  5.51    0.762  0.066  -0.265 0.093
30     27.79   0.596  0.340  -0.039 0.137

TABLE 1c. Estimations from: ARMA(1,1) model, Gaussian input, c = 1.0 (scale in t.u.)

              K = 120                              K = 90
scale  K̂(t.u.)  ĉ      φ̂      θ̂      R²     K̂(t.u.)  ĉ      φ̂      θ̂      R²
1      128.8   1.000  0.992  -0.981 0.996  98.27   1.000  0.990  -0.980 0.995
7      124.1   0.985  0.945  -0.301 0.937  94.22   0.980  0.928  -0.300 0.917
15     132.7   0.968  0.893  -0.260 0.863  99.47   0.957  0.860  -0.258 0.823
30     135.2   0.938  0.801  -0.252 0.752  104.7   0.920  0.751  -0.242 0.691
60     115.7   0.878  0.596  -0.285 0.546  89.46   0.848  0.511  -0.277 0.458
              K = 60                               K = 30
1      66.11   1.000  0.985  -0.980 0.993  32.74   1.000  0.970  -0.977 0.985
7      62.92   0.971  0.895  -0.299 0.878  30.81   0.943  0.797  -0.299 0.767
15     65.88   0.937  0.796  -0.255 0.747  32.13   0.879  0.627  -0.253 0.555
30     73.47   0.884  0.665  -0.221 0.581  41.82   0.781  0.488  -0.158 0.348
60     63.00   0.791  0.386  -0.253 0.326  36.34   0.663  0.192  -0.196 0.134
              K = 15                               K = 7
1      16.14   1.000  0.940  -0.969 0.969  7.46    1.000  0.875  -0.947 0.933
7      14.97   0.893  0.627  -0.303 0.585  6.82    0.812  0.358  -0.304 0.330
15     15.08   0.793  0.370  -0.266 0.309  4.98    0.858  0.049  -0.304 0.103
30     29.16   0.602  0.358  -0.030 0.144

TABLE 2. ARMA(1,1) model: standard errors of estimates made on different scales compared to standard errors of estimates made on limited samples (c = 0.5, K = 60 t.u.)

                    aggregated                         limited sample
n. of   φ̂      θ̂      std.err  std.err    φ̂       θ̂       std.err  std.err
data                   (φ̂)      (θ̂)                         (φ̂)      (θ̂)
1448   0.902  0.805  0.0309   0.0429    0.6473  0.6061  0.1826   0.1906
666    0.785  0.588  0.0636   0.0838    0.4936  0.4596  0.345    0.3526
333    0.592  0.272  0.1180   0.1407    0.2146  0.1833  0.5067   0.5124

Shot Noise model

The first consideration arising from the observation of Tables 3-4 and Figure 4 is that aggregation has quite different effects on the Shot Noise model estimates than on the ARMA model estimates. With the Shot Noise model there are no evident benefits arising from aggregation, since the best estimates are always obtained at the reference scale. The increasing bias of the estimated values of both parameters with aggregation does not leave much room for other considerations.
This outcome could be due to the alterations that aggregation induces in the impulse occurrence and intensity, and reflects some peculiar characteristics of this class of models. The positive aspect of this behavior is that even large storage constants can be identified (with some bias) at the reference scale.
Another interesting aspect is that the increase of c reduces the negative effect of aggregation on the estimate ĉ. This could be due to the reduced alteration of the white noise component with aggregation occurring when c increases.

[Figure 1. ARMA(1,1) model: parameter estimates on aggregated data (c = 0.5, K = 120 t.u.); plots of ĉ and K̂ versus aggregation scale, 1-16 t.u.]

[Figure 2. ARMA(1,1) model: parameter estimates on aggregated data (c = 0.5, K = 15 t.u.); plots of ĉ and K̂ versus aggregation scale, 1-16 t.u.]

[Figure 3. ARMA(1,1) model: parameter estimates on aggregated data (c = 1, K = 120 t.u.); plots of ĉ and K̂ versus aggregation scale, 1-16 t.u.]

TABLE 3. Estimations from: Shot Noise model, Bessel input, conceptual model of one reservoir with a linear channel (K = 60 t.u.)

              c = 0.5                c = 0.8                c = 1.0
scale   ĉ      K̂       R²      ĉ      K̂        R²     ĉ      K̂        R²
1      0.532  48.52   0.026   0.813  55.19    0.092  0.993  66.39    0.933
7      0.704  49.54   0.210   0.872  77.68    0.386  0.969  110.56   0.796
15     0.756  82.36   0.242   0.896  112.24   0.474  0.953  172.23   0.707
30     0.837  214.76  0.263   0.912  261.86   0.419  0.942  229.29   0.613

TABLE 4. Estimations from: Shot Noise model, Bessel input (c = 0.5, K = 120 t.u.)

scale   ĉ      K̂       R²
1      0.533  87.11   0.018
7      0.695  69.35   0.158
15     0.766  162.39  0.175
30     0.837  339.92  0.216

[Figure 4. Shot Noise model: parameter estimates on aggregated data (c = 0.5, K = 60 t.u.); plots of ĉ and K̂ versus aggregation scale, 1-16 t.u.]

FINAL REMARKS

In a conceptually-based stochastic framework for the analysis of runoff data at different scales, a simulation study was undertaken to assess the possible effects of aggregation on parameter estimation. A simple linear conceptual system, made up of a linear reservoir and a linear channel, was used to generate runoff data from Gaussian and Bessel input, and a conceptually-based ARMA(1,1) model and a single Shot Noise model were respectively fitted to the data, providing estimates of the conceptual parameters.
Analysis of the results emerging from the re-estimation of the "true" parameters by means of these models showed that aggregation plays a significant role in achieving correct estimates for the ARMA(1,1) model. In particular, the optimal aggregation scale is the one at which both estimates of the conceptual parameters attain a "sill" level, which is shown to correspond to the least biased value. On the other hand, aggregation does not produce the same effect on the Shot Noise model, for which the scale of generation was found to be the most significant for parameter estimation.
Although more extensive work is needed to test the effect of aggregation on the estimation of parameters of more complex systems, these results constitute an interesting starting point as theoretical support for the use of integrated conceptually-based models.

REFERENCES

Bartolini, P. and J.D. Salas (1993) "Modeling of streamflow processes at different time scales", Water Resour. Res., 29(8), 2573-2587.
Benjamin, J.R. and C.A. Cornell (1970) Probability, Statistics and Decision for Civil Engineers, McGraw-Hill Book Co., New York.
Bernier, J. (1970) "Inventaire des Modèles des Processus Stochastiques applicables à la Description des Débits Journaliers des Rivières", Rev. Int. Stat. Inst., 38, 49-61.
Box, G.E.P. and G.M. Jenkins (1970) Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco (Revised Edition 1976).
Claps, P. and F. Rossi (1992) "A conceptually-based ARMA model for monthly streamflows", in J.T. Kuo and G.F. Lin (Eds.) Stochastic Hydraulics '92, Proc. of the Sixth IAHR Intl. Symp. on Stochastic Hydraulics, Dept. of Civil Engrg., NTU, Taipei (Taiwan), 817-824.
Claps, P. (1992) "Sulla validazione di un modello dei deflussi a base concettuale", Proc. XXIII Conf. Hydraul. and Hydraul. Struct., Dept. Civil Eng., Univ. of Florence, D.91-D.102.
Claps, P. and F. Murrone (1993) "Univariate conceptual-stochastic models for spring runoff simulation", in M.H. Hamza (Ed.) Modelling and Simulation, Proc. of the XXIV IASTED Annual Pittsburgh Conference, May 10-12, 1993, Pittsburgh, USA, 491-494.
Claps, P., F. Rossi and C. Vitale (1993) "Conceptual-stochastic modeling of seasonal runoff using Autoregressive Moving Average models and different scales of aggregation", Water Resour. Res., 29(8), 2545-2559.
Cowpertwait, P.S.P. and P.E. O'Connell (1992) "A Neyman-Scott shot noise model for the generation of daily streamflow time series", in J.P. O'Kane (Ed.) Advances in Theoretical Hydrology, Part A, chapter 6, Elsevier, The Netherlands.
Jakeman, A.J. and G.M. Hornberger (1993) "How much complexity is warranted in a rainfall-runoff model?", Water Resour. Res., 29(8), 2637-2649.
Kavvas, M.L., L.J. Cote and J.W. Delleur (1977) "Time resolution of the hydrologic time-series models", Journal of Hydrology, 32, 347-361.
Kron, W., E.J. Plate and J. Ihringer (1990) "A model for the generation of simultaneous daily discharges of two rivers at their point of confluence", Stochastic Hydrol. and Hydraul., 4, 255-276.
Obeysekera, J.T.B. and J.D. Salas (1986) "Modeling of aggregated hydrologic time series", Journal of Hydrology, 86, 197-219.
O'Connell, P.E. (1971) "A simple stochastic modeling of Hurst's law", Int. Symp. on Math. Models in Hydrology, Int. Ass. Hydrol. Sci., Warsaw.
Murrone, F., F. Rossi and P. Claps (1992) "A conceptually-based multiple shot noise model for daily streamflows", in J.T. Kuo and G.F. Lin (Eds.) Stochastic Hydraulics '92, Proc. of the Sixth IAHR Intl. Symp. on Stochastic Hydraulics, Dept. of Civil Engrg., NTU, Taipei (Taiwan R.O.C.), 857-864.
Rossi, F. and G. Silvagni (1980) "Analysis of annual runoff series", Proc. Third IAHR Int. Symp. on Stochastic Hydraulics, Tokyo, A-18(1-12).
Salas, J.D. and R.A. Smith (1981) "Physical basis of stochastic models of annual flows", Water Resour. Res., 17(2), 428-430.
Salas, J.D., J.W. Delleur, V. Yevjevich and W.L. Lane (1980) Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado.
Vecchia, A.V., J.T.B. Obeysekera, J.D. Salas and D.C. Boos (1983) "Aggregation and estimation of low-order periodic ARMA models", Water Resour. Res., 19(5), 1297-1306.

Acknowledgments

This work was supported by funds granted by the Italian National Research Council -
Group for Prevention from Hydrogeological Disasters, grant no. 91.02603.42.
ON IDENTIFICATION OF CASCADE SYSTEMS BY
NONPARAMETRIC TECHNIQUES WITH APPLICATIONS TO
POLLUTION SPREAD MODELING IN RIVER SYSTEMS

A. KRZYZAK
Department of Computer Science
Concordia University
1455 de Maisonneuve Blvd. West
Montreal, Quebec, Canada H3G 1M8

In this paper, identification of nonlinear memoryless cascade systems is discussed. An optimal model of a memoryless cascade system is derived and estimated by the kernel regression estimate. Convergence of the identification procedures is investigated. Possible extensions to nonlinear dynamical systems of the Hammerstein and Wiener type are also discussed.

INTRODUCTION

Identification of pollution spreading in river systems is an important problem. In order to simplify the modeling process we assume that a river or canal can be divided into segments connected serially, and each segment is modeled by a nonlinear mapping describing pollution concentration at the beginning and the end of the segment. Segment boundaries may correspond to locations of estuaries or industrial dumping sites. In the present model we only describe the steady state behaviour, that is, we do not incorporate dynamics into our model; however, we will indicate possible extensions in that direction. All measured quantities are assumed to be random with completely unknown distributions. The identified system is assumed to be highly nonlinear. The identification algorithms are nonparametric and are based on kernel regression estimates. Identification of nonlinear systems remains an important and challenging problem (Billings 1980). Methods based on Volterra and Wiener series expansions are available for identification of general nonlinear systems (Banks 1988). These methods are rather complicated and require selecting the number of terms in the expansion. They also result in a high-dimensional parameter estimation problem. Nonlinear difference and differential equations require detailed knowledge of the system and often lead to complicated solutions involving phenomena such as bifurcations and chaos. In this context many authors considered specific nonlinear systems and obtained efficient identification algorithms. The simplest configurations include block oriented systems connected in a cascade. Block oriented systems have been studied by Sandberg

(1991). Identification of a cascade of memoryless subsystems has been studied by Greblicki and Krzyzak (1979) and Krzyzak (1993). The Wiener system consists of a linear dynamic subsystem followed by a nonlinear memoryless subsystem, while the Hammerstein system consists of a zero-memory nonlinearity followed by a linear filter. The Wiener model has been applied to signal detection by Masry and Cambanis (1980). The Hammerstein model has been introduced to system identification by Narendra and Gallman (1966) and further studied by Chung and Sun (1988) and Billings and Fakhouri (1979). It has been applied to adaptive control by Kung and Womack (1984) and to identification of biological systems by Hunter and Korenberg (1986).
In the present paper we study in detail optimal modeling and identification of a cascade of memoryless subsystems and we briefly mention its extension to dynamical cascade systems such as the Hammerstein and Wiener nonlinear systems. A particular version of a cascade memoryless system has been considered by Greblicki and Krzyzak (1979). The authors derived the optimal model of a system consisting of two subsystems in the case when the second subsystem was invertible. They studied weak consistency of identification procedures. We extend these results to a cascade of n not-necessarily-invertible subsystems. We study strong convergence of identification algorithms. Subsequently we apply nonparametric techniques to two configurations of dynamic, nonlinear systems: the Hammerstein system and the Wiener system. We first consider the Hammerstein system and then apply the results to the Wiener system by inverting the estimate of the memoryless nonlinearity using the estimate of the regression function inverse developed for cascade systems. In both systems we identify the linear and nonlinear components simultaneously from observations of the inputs and outputs of the whole system. The linear subsystems are described by the ARMA model, whose coefficients are estimated by the correlation method. The nonlinearities can be identified by the nonparametric kernel regression estimate. The intermediate signals between the nonlinear and linear components are not measured, resulting in the possibility of identifying both components only up to an additive and multiplicative factor. Identification of the Hammerstein system has been studied by many authors. Nonrecursive kernel estimation has been used to identify the system by Greblicki and Pawlak (1986) and Krzyzak (1990). The recursive version has been investigated by Krzyzak (1992). The estimate based on orthogonal expansions has been studied by Krzyzak (1989) and Pawlak (1991). In order to identify the Wiener system we use estimates of the linear and nonlinear components similar to those used in the Hammerstein system. To recover the nonlinear part we utilize the techniques developed for cascade system identification. The convergence results for the kernel estimate of the Wiener model have been given in Greblicki (1992) and Krzyzak (1993).
In nonparametric identification we do not restrict the nonlinearities to a class of functions described by a finite number of parameters, such as polynomials or trigonometric functions. The class of nonlinearities we are capable of recovering

[Figure 1: Identified cascade system - subsystems S_1, S_2, ..., S_n in series, with signals x, x_1, ..., x_n and disturbances z_1, z_2, ..., z_n.]

[Figure 2: Model of identified cascade system - models M_1, M_2, ..., M_n in series, with signals u_1, u_2, ..., u_n.]

by our kernel estimate includes the class of all Borel measurable functions. This class is too large to be finitely parameterized; therefore a nonparametric approach is chosen in this paper. Standard nonlinearities such as dead-zone limiters, hard-limiters and quantizers are included in the class of nonlinearities we can identify by our techniques.

OPTIMAL MODEL OF CASCADE STRUCTURE

We consider a system consisting of a cascade of n (n ≥ 2) memoryless subsystems, as shown in Figure 1. All signals U_1, ..., U_n are random and of the same dimension, say d. The model of the system from Figure 1 has the same structure, shown in Figure 2. Our goal is to find the optimal Mean Integrated Square Error (MISE) model of the system in Figure 1, that is, the model minimizing

    Q = E Σ_{i=1}^{n} ||X_i − Φ_i(U_{i−1})||²    (1)

where X_0 = U_0 = X, U_i = Φ_i(U_{i−1}), E{||X_i||} < ∞, i = 1, ..., n, and ||·|| is a norm in d-dimensional space. The mappings Φ_i, i = 1, ..., n minimizing the MISE are derived in Theorem 1 below.
The equations of the optimal model are given in the theorem below.
The equations of the optimal model are given in the theorem below.

Theorem 1 Suppose that the domain of the function Ψ_i(x) = E{X_i | X = x}, i = 1, ..., n can be divided into areas where the function is invertible. The optimal MISE model in the class of models shown in Figure 2 is given by

    Φ_i*(U_{i−1}) = Ψ_i(Ψ_{i−1}^†(U_{i−1}))    (2)

where Ψ_{i−1}^†(U_{i−1}) is an inverse of a component of Ψ_{i−1} containing U_{i−1}.

Remark 1 If the whole function Ψ is invertible then Ψ^† = Ψ^{−1}, where Ψ^{−1} is the inverse of Ψ. To clarify the stated results consider the following example. Let d = 1 and Ψ_1(u) = u² for −1 ≤ u ≤ 1 and Ψ_1(u) = exp(u − 1) for u ≥ 1. Then Ψ_1^†(u) = −√u (inverse of the branch on −1 ≤ u ≤ 0), Ψ_1^†(u) = √u (inverse of the branch on 0 ≤ u ≤ 1) and Ψ_1^†(u) = 1 + ln u for u ≥ 1.

Proof of Theorem 1. Consider the quality index

    Q̃ = E Σ_{i=1}^{n} ||X_i − Θ_i(X)||²    (3)

Notice that the mapping Θ_i(X) uses X as its input, therefore its input is the input of the whole cascade system. Consequently, minimizing criterion (3) in the class of models Θ_i, i = 1, ..., n is equivalent to minimizing criterion Q̃ in the class of multivariate models, ignoring the cascade structure of the model of Figure 2. On the other hand, the class of models in (1) is constrained to the class of cascade models shown in Figure 2. Clearly, minimization of Q with respect to Φ_i is constrained minimization subject to the cascade structure of the class of models, unlike minimization of Q̃, which is unconstrained minimization in the class of multivariate models. Hence obviously:

    Q̃* = min Q̃ ≤ min Q.    (4)

It is clear that the minimizer of Q̃ is

    Θ_i(x) = Ψ_i(x) = E{X_i | X = x}.

By the definition of Ψ_{i−1}^† and a simple substitution we can see that Ψ_i(Ψ_{i−1}^†(Ψ_{i−1}(x))) = Ψ_i(x). Consequently Q = Q̃* for the particular cascade model Φ_i(u_{i−1}) = Ψ_i(Ψ_{i−1}^†(u_{i−1})). This implies

    min Q ≤ min Q̃.

This together with (4) implies that (1) is minimized by (2), concluding the proof.
From now on, for simplicity but without losing generality, we will concentrate on the system and the model consisting of only two components: S_1, S_2 and M_1, M_2, respectively. Denote X, X_1, X_2 by X, Y, Z; denote Φ_1, Ψ_1 by Φ, Ψ; and denote U_1, U_2 by U and T. Assume moreover that Φ^{−1} exists.

In order to estimate the optimal model (2) we assume that we have a sequence of independent observations (X_1, Y_1, Z_1), ..., (X_n, Y_n, Z_n) of the random vector (X, Y, Z), and that X has distribution μ. We apply the kernel regression estimate to recover Φ, and a combination of the kernel regression estimate and the regression inverse estimate to recover M_2.
IDENTIFICATION OF SUBSYSTEM S1

The following kernel regression estimate will be applied to identify M_1:

    Φ_n(x) = [Σ_{i=1}^{n} Y_i K((x − X_i)/h_n)] / [Σ_{i=1}^{n} K((x − X_i)/h_n)]    (5)

where K is a kernel and {h_n} is a sequence of positive numbers.


Estimate (5) has been investigated by Greblicki and Krzyzak (1980), Devroye (1981), Greblicki et al. (1984), Krzyzak and Pawlak (1987), Krzyzak (1986, 1990) and Stone (1982).
We can regard (5) as a weighted average of the output measurements Y_i, i = 1, ..., n. The weights K((x − X_i)/h_n) / Σ_{j=1}^{n} K((x − X_j)/h_n) are probability weights (that is, they sum to one). They depend nonlinearly on the input point at which we calculate the estimate and on the random input measurements. The measurements close to the input point are generally assigned higher weights than the measurements that are farther away. The weights also depend on the kernel and the smoothing sequence. The idea is to make the weights more concentrated around the input point as the number of observations increases. This is achieved by adjusting the smoothing parameter h_n, which scales the kernel K appropriately. In the estimate we have to select two parameters: a kernel and a smoothing sequence. The choice of the smoothing sequence is more critical than the choice of the kernel; however, a carelessly selected kernel may introduce some rigidity into the estimate, which in turn may adversely affect the rate of convergence. In order to make the estimate converge we must impose some conditions on the kernel and the smoothing sequence (see Theorem 2 below). The best nonparametric estimates currently available have parameters which automatically adapt to the measurements. The theorem below deals with the pointwise consistency of (5). For the proof refer to Krzyzak and Pawlak (1987).
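As an illustration, the following is a direct implementation of (5) with a Gaussian kernel (our sketch; the toy target Φ(x) = x² is an assumption made only for the example):

```python
import numpy as np

def kernel_regression(x, X, Y, h, K=lambda u: np.exp(-0.5 * u**2)):
    """Estimate (5) at a point x: a weighted average of the outputs Y_i
    with probability weights K((x - X_i)/h) / sum_j K((x - X_j)/h).
    X is (n, d), Y is (n,), x is (d,); Gaussian kernel by default."""
    w = K(np.linalg.norm(X - x, axis=1) / h)
    return np.dot(w, Y) / w.sum()

# Toy usage: recover Phi(x) = x**2 from noisy scalar data.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (500, 1))
Y = X[:, 0]**2 + 0.1 * rng.standard_normal(500)
print(kernel_regression(np.array([0.5]), X, Y, h=0.1))   # roughly 0.25
```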
Theorem 2 Let E{||Y||^s} < ∞, s > 1. Suppose that the nonnegative kernel K satisfies the following conditions:

    c_1 H(||x||) ≤ K(x) ≤ c_2 H(||x||)
    c 1_{{||x|| ≤ r}} ≤ K(x)    (6)
    t^d H(t) → 0 as t → ∞

for some positive constants c_1 ≤ c_2 and c and for some nonincreasing Borel function H.
440 A.KRZYZAK

If the smoothing sequence satisfies

    h_n → 0
    n^{(s−1)/s} h_n^d / log n → ∞    (7)

then

    Φ_n(x) → Φ(x) almost surely as n → ∞

for almost all x mod μ.

Remark 2 Convergence in Theorem 2 takes place for all input distributions (that is, for ones having a density, or discrete, or singular, or any combination of the aforementioned) and we impose no restrictions on the regression function Φ. Examples of kernels satisfying (6) are listed below:
a) rectangular kernel
    K(x) = 1/2 for ||x|| ≤ 1, 0 otherwise
b) triangular kernel
    K(x) = 1 − ||x|| for ||x|| ≤ 1, 0 otherwise
c) Gaussian kernel
    K(x) = (1/√(2π)) exp(−||x||²/2)
d) de la Vallée-Poussin kernel
    K(x) = (1/2π)(sin(x/2)/(x/2))² if x ≠ 0, 1/(2π) if x = 0.
If h_n = n^{−α} then (7) is satisfied for 0 < α < 1/d.
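For experimentation, the four example kernels can be coded directly; this is a sketch of ours, written for a scalar argument (for d > 1, replace x by ||x||):

```python
import numpy as np

# The four kernels of Remark 2, vectorized over numpy arrays.
rect = lambda x: 0.5 * (np.abs(x) <= 1)
tri = lambda x: np.maximum(1 - np.abs(x), 0)
gauss = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
# de la Vallee-Poussin: guard the x = 0 case to avoid division by zero.
vp = lambda x: np.where(x == 0, 1 / (2 * np.pi),
                        (np.sin(x / 2) / np.where(x == 0, 1, x / 2))**2 / (2 * np.pi))
```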


The next theorem states sufficient conditions for uniform consistency. This type of consistency is essential for convergence of the estimate of M_2. The result presented here is an extension of Theorem 2 in Devroye (1978).

Theorem 3 Let ess sup_x E{||Y||² | X = x} < ∞. Suppose that there exist positive constants a and b such that

    μ(S_{x,r}) ≥ a λ(S_{x,r}) for all x ∈ A and all 0 < r ≤ b    (8)

where A is a compact subset of R^d, S_{x,r} is the ball with radius r centered at x, and λ is the Lebesgue measure. Let K assume finitely many, say k, values. If

hn -t 0 (9)
nh~jlog n - t 00 (10)
as n - t 00, then
esssup II<l>n(X) - <l>(X)II-t 0 almost surely (ll)
A
as n - t 00.
If K is a bounded kernel satisfying (6) then (ll) follows provided that condition
(10) is replaced by
nh!d / log n -t 00. (12)
Remark 3 Hypothesis (11) follows under condition (10) if K is, for example, a window kernel. The essential supremum is taken with respect to the distribution of the input. Notice that we do not assume that X and Y have densities. Condition (8) guarantees that there is sufficient mass of the distribution in the neighborhood of the point x. The condition imposes restrictions on μ and on the shape of A. It also implies that A is contained in the support of μ, that is, the set of all x such that μ(S_{x,r}) > 0 for all positive r. If X has density f then (8) is equivalent to ess inf_{x∈A} f(x) > 0.
Proof of Theorem 3. Define Ȳ = Y 1_{{||Y|| ≤ n^{1/2}}} and Φ̄(x) = E{Ȳ | X = x}. Let also K_h(x) = K(x/h). Clearly

    Φ(x) − Φ_n(x) = Σ_{i=1}^{n}(Ȳ_i − Y_i) K_h(x − X_i) / Σ_{i=1}^{n} K_h(x − X_i)
      + Σ_{i=1}^{n}(Ȳ_i − Φ̄(X_i)) K_h(x − X_i) / Σ_{i=1}^{n} K_h(x − X_i)
      + Σ_{i=1}^{n}(Φ̄(X_i) − Φ(x)) K_h(x − X_i) / Σ_{i=1}^{n} K_h(x − X_i)
      = I + II + III.

By the moment assumption, I = 0 a.s. for n large enough. By the result of Devroye (1978, p. 183) there exist finite constants c_3, c_4 and c_5 such that for h_n small enough

    P{ess sup_A ||III|| > ε} ≤ P{inf_A Σ_{i=1}^{n} K_h(x − X_i) < c_3 n h_n^d} ≤ c_4 h_n^{−d} exp(−c_5 n h_n^d).

Term II needs special care. Suppose that K takes finitely many different values a_1, ..., a_k. The vector [K((x − X_1)/h), ..., K((x − X_n)/h)], as a function of x, can take at most (2n)^{c(d)k} values, contrary to the intuitive number k^n (Devroye 1978, Theorem 1). The constant c(d) is equal to the VC dimension of the class of d-dimensional balls. We thus obtain

    P{ess sup_A ||II|| > ε} ≤ P{(X_1, ..., X_n) ∉ B}
      + E{1_B(X_1, ..., X_n) ess sup (2n)^{c(d)k} sup_{A_j ∈ A} P{|Σ_{i=1}^{n}(Ȳ_i − m(X_i)) a_{ji} / Σ_{i=1}^{n} a_{ji}| ≥ ε | X_1, ..., X_n}}    (13)

where B is the set of all (x_1, ..., x_n) ∈ R^{dn} such that

    inf_x μ_n(S_{x,h_n}) ≥ c_3 h_n^d

and A_j = (a_{j1}, ..., a_{jn}) is a member of the partition A induced by the vector [K((x − X_1)/h), ..., K((x − X_n)/h)] on the n-dimensional space of multivalued vectors, each component of which can assume k different values. To bound the second probability on the rhs of (13) we are going to use McDiarmid's inequality (Devroye 1991). Let X_1, ..., X_n be independent random variables and assume that

    sup_{x_i, x_i'} |f(x_1, ..., x_i, ..., x_n) − f(x_1, ..., x_i', ..., x_n)| ≤ c_i,  1 ≤ i ≤ n.

Then

    P{|f(X_1, ..., X_n) − E f(X_1, ..., X_n)| ≥ ε} ≤ 2 exp(−2ε² / Σ_{i=1}^{n} c_i²).
Using this inequality we obtain, for the second probability in (13),

    P{|Σ_{i=1}^{n}(Ȳ_i − m(X_i)) a_{ji} / Σ_{i=1}^{n} a_{ji}| ≥ ε | X_1, ..., X_n}
      ≤ 2 exp(−nε² / Σ_{i=1}^{n} 2 max_i |Ȳ_i|² a_{ji}² / (Σ_{i=1}^{n} a_{ji})²)
      ≤ 2 exp(−ε² Σ_{i=1}^{n} a_{ji} / max_i a_{ji})
      ≤ 2 exp(−c_6 n h_n^d)

where the last inequality follows from the fact that on the set B

    Σ_{i=1}^{n} a_{ji} / max_i a_{ji} ≥ const · n h_n^d.

So the second term on the rhs of (13) is not larger than

    2 (2n)^{c(d)k} exp(−c_6 n h_n^d).

The first probability in (13) can be bounded above by

    P{inf_x μ_n(S_{x,h_n}) < c_3 h_n^d} ≤ c_4 h_n^{−d} exp(−c_5 n h_n^d).

Theorem 3 follows from (10) and the Borel-Cantelli lemma. In the case when K assumes infinitely many values we can use a stepwise approximation of K and obtain an upper bound for (13):

    c_7 (2n)^{c(d)} h^{−d} exp(−c_8 n h_n^d).

The theorem follows from (12).



ESTIMATE OF Ψ

In order to obtain a consistent estimate of M_2 we need to estimate the regression Ψ and the regression inverse Φ^{−1}. The estimate of Ψ is given by

    Ψ_n(x) = [Σ_{i=1}^{n} Z_i K((x − X_i)/h_n)] / [Σ_{i=1}^{n} K((x − X_i)/h_n)]    (14)

The convergence of (14) follows from Theorem 2.

ESTIMATE OF THE REGRESSION INVERSE Φ^{−1}

As we shall see later, we need a consistent estimate of Φ^{−1} in order to identify M_2. The estimate of Φ^{−1} will be derived from Φ_n. Since Φ_n may not be invertible even when Φ is, we need to define a pseudoinverse.

Definition 1 Let φ : 𝒳 → 𝒴, where 𝒳 is a complete space, and let s = inf_{x∈𝒳} ||φ(x) − y|| for some y ∈ 𝒴. A function φ⁺ : 𝒴 → 𝒳 is called a pseudoinverse of φ if for any y ∈ 𝒴, φ⁺(y) is equal to any x ∈ 𝒳 in the set

    A(y) = ∩_{n=1}^{∞} cl({x : ||φ(x) − y|| ≤ s + 1/n}) = ∩_{n=1}^{∞} A_n    (15)

where cl(A) denotes the closure of the set A.

Remark 4 Since the sets cl(A_n) are closed and nonempty and 𝒳 is a complete space, the set A(y) is nonempty and φ⁺ is well defined. If φ is continuous and 𝒳 is compact then φ⁺(y) is equal to any x* such that

    min_{x∈𝒳} ||φ(x) − y|| = ||φ(x*) − y||.

If φ is invertible then φ⁺ coincides with φ^{−1}. The pseudoinverse depends on the norm.
Two versions of φ⁺ will be useful in applications in the case of a scalar function:

    φ⁺(y) = inf_{x ∈ A(y)} x    (16)

and

    φ⁺(y) = sup_{x ∈ A(y)} x.    (17)
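A discretized stand-in (ours, not the paper's construction) for the pseudoinverse minimizes |φ_n(x) − y| over a grid covering the compact set of interest; taking the first near-minimizer mimics (16), the last mimics (17):

```python
import numpy as np

def pseudoinverse(phi_n, y, grid, variant="inf"):
    """Grid approximation of Definition 1 for a scalar function:
    among grid points minimizing |phi_n(x) - y|, return the smallest
    ('inf', as in (16)) or the largest ('sup', as in (17))."""
    vals = np.abs(np.array([phi_n(x) for x in grid]) - y)
    idx = np.flatnonzero(vals == vals.min())
    return grid[idx[0]] if variant == "inf" else grid[idx[-1]]
```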

The next theorem deals with the consistency of Φ_n⁺.



Theorem 4 Let Φ : A → B be a continuous function, let A be a compact subset of R^d, and let Φ_A denote the image of A under Φ. If

    sup_{x∈A} ||Φ_n(x) − Φ(x)|| → 0

then

    Φ_n⁺(y) → Φ⁺(y)

as n → ∞, at every y ∈ Φ_A.

The proof will be omitted.

IDENTIFICATION OF SUBSYSTEM S2

Using equation (2), the natural estimate of S_2 is given by

    Ψ_n(Φ_n⁺(u))    (18)

where Φ_n⁺ is the pseudoinverse of Φ_n. The following straightforward result is useful in proving the consistency of (18).

Lemma 1 If f is continuous and

    sup_x ||f(x) − f_n(x)|| → 0

and

    x_n → x

as n → ∞, then f_n(x_n) → f(x).

Lemma 1 and Theorem 3 imply the convergence of the identification algorithm of S_2.

Theorem 5 Assume that Φ and Ψ are continuous on A and that ess sup_x E{||Y||² | X} < ∞ and ess sup_x E{||Z||² | X} < ∞. If K assumes finitely many values and (8)-(10) hold, then

    Ψ_n(Φ_n⁺(u)) → Ψ(Φ^{−1}(u)) almost surely    (19)

as n → ∞, at every u ∈ Φ(A). If K assumes infinitely many values, then (19) follows when, in addition, K satisfies (6) and condition (10) is replaced by (12).
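Putting the pieces together, here is a sketch (ours) of the estimate (18) for scalar signals, reusing the kernel_regression and pseudoinverse sketches given earlier; grid is an assumed discretization of the compact set A:

```python
import numpy as np

def M2_estimate(u, X, Y, Z, h, grid):
    """Estimate of M_2 at u: Psi_n(Phi_n^+(u)), with Phi_n and Psi_n
    the kernel estimates (5) and (14)."""
    phi_n = lambda x: kernel_regression(np.array([x]), X, Y, h)
    x_star = pseudoinverse(phi_n, u, grid)                   # Phi_n^+(u)
    return kernel_regression(np.array([x_star]), X, Z, h)    # Psi_n(Phi_n^+(u))
```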

[Figure 3: Hammerstein system - input X_n passes through the nonlinearity φ, giving W_n, which drives the linear subsystem {k_i} with noise-free output S_n and output Y_n.]

HAMMERSTEIN AND WIENER SYSTEMS

The outline of the MIMO discrete Hammerstein system is given in Figure 3.


The nonlinear memoryless subsystem is described by

    W_n = φ(X_n) + ξ_n    (20)

where X_n is an R^d-valued stationary white noise with distribution μ and ξ_n is a stationary white noise with zero mean and finite variance σ_ξ². No correlation is assumed between ξ_n and X_n. Assume for simplicity that φ is a scalar function. The linear dynamic subsystem is described by the ARMA model (assumed to be stable):

    S_n + a_1 S_{n−1} + ... + a_l S_{n−l} = b_0 W_n + b_1 W_{n−1} + ... + b_l W_{n−l}
    Y_n = S_n

where l is the unknown order of the system and S_n is its noise-free output. The linear subsystem can also be described by state equations:

    X_{n+1} = A X_n + b W_n,  Y_n = c^T X_n + d_1 W_n    (21)

where X_n is an l-dimensional state vector and A is assumed to be asymptotically stable. These conditions imply that X_n and Y_n are weakly stationary as long as W_n is weakly stationary. By (21),

    E{Y_n | X_n} = d_1 φ(X_n) + a = m(X_n)    (22)

where a = E φ(X) · c^T (I − A)^{−1} b.
From equation (21) we obtain a weighting sequence representation

    Y_n = Σ_{j=0}^{∞} k_j W_{n−j}    (23)

[Figure 4: Wiener system - input U_n passes through the linear subsystem, noise η_n is added giving T_n, which drives the nonlinearity ψ producing Z_n.]

where k_0 = d_1 ≠ 0, k_i = c^T A^{i−1} b, i = 1, 2, ..., and Σ_{i=0}^{∞} |k_i| < ∞ guarantees asymptotic stability of the linear subsystem. It follows from (23) that

    E{Y_n | X_n} = k_0 φ(X_n) + β

where β = E φ(X) Σ_{i=1}^{∞} k_i.
It is obvious from the above equation that we can use the kernel regression estimate to estimate m in (22) and consequently recover φ (up to multiplicative and additive factors). The only difference with the problem in Section 3 is that now {X_i, Y_i} is a sequence of dependent random variables. For identification procedures and their asymptotics refer to Greblicki and Pawlak (1986) and Krzyzak (1990); a small simulation sketch is given below.
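As an illustration (our toy example: the tanh nonlinearity and the geometric weighting sequence are assumptions, not the paper's system), the regression of Y_n on X_n recovers the nonlinearity up to the affine factors noted above:

```python
import numpy as np

# Simulate a toy Hammerstein system and recover the nonlinearity
# up to additive/multiplicative factors by kernel regression.
rng = np.random.default_rng(2)
n = 5000
X = rng.uniform(-2, 2, n)                        # i.i.d. input
W = np.tanh(X) + 0.1 * rng.standard_normal(n)    # memoryless part plus noise
k = 0.8 ** np.arange(20)                         # stable weighting sequence, k_0 != 0
Y = np.convolve(W, k)[:n]                        # Y_n = sum_j k_j W_{n-j}

def m_hat(x, h=0.2):
    """Kernel estimate of m in (22) at the point x."""
    w = np.exp(-0.5 * ((X - x) / h) ** 2)
    return np.dot(w, Y) / w.sum()

# m_hat(x) approximates k_0 * tanh(x) + beta, an affine image of tanh.
print(m_hat(1.0), np.tanh(1.0))
```

Let us now consider the Wiener system shown in Figure 4.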
The nonlinear memoryless subsystem is described by

    Z_n = ψ(T_n) + ξ_n    (24)

where for simplicity we assume that T_n is the one-dimensional output of the linear dynamic subsystem and Z_n ∈ R, and ξ_n is a stationary white noise with zero mean and finite variance σ_ξ². No correlation is assumed between ξ_n and U_n. The linear subsystem is described by the ARMA model:

    R_n + c_1 R_{n−1} + ... + c_l R_{n−l} = d_0 U_n + d_1 U_{n−1} + ... + d_l U_{n−l}
    T_n = R_n + η_n    (25)

U_n is a stationary Gaussian noise with distribution μ and η_n is a stationary Gaussian noise with zero mean and finite variance σ_η². No correlation is assumed between η_n and U_n. The linear subsystem can also be described by state equations:

    X_{n+1} = B X_n + e U_n    (26)

where the parameters of the system (26) have a form similar to that in (21) and B is asymptotically stable. From (26) we get a weighting sequence representation:

    T_n = Σ_{j=0}^{∞} l_j U_{n−j} + η_n
    Z_n = ψ(T_n)    (27)

where l_0 = e ≠ 0, l_i = f B^{i−1} e, i = 1, 2, ..., and Σ_{i=0}^{∞} |l_i| < ∞ guarantees asymptotic stability of the linear subsystem. It can be shown that the estimation techniques for inverses of regression functions from Section 5 are applicable to Wiener system identification (see Greblicki (1992) and Krzyzak (1993)). Wiener and Hammerstein systems can be combined into a cascade of memoryless nonlinear systems interconnected with dynamic linear components. Such models are very general but still simple enough to obtain identification algorithms.

CONCLUSION

We considered modeling of pollutants in river and canal systems by interconnected nonlinear systems. Particular attention has been devoted to a cascade of memoryless subsystems. Identification algorithms have been given and their strong convergence properties investigated under very mild restrictions on the measurements and parameters. Possible extensions to dynamic systems such as the Hammerstein and Wiener systems have been explored. The rates of convergence of the algorithms will be addressed in subsequent papers.

ACKNOWLEDGEMENTS

This research was sponsored by NSERC grant A0270 and FCAR grant EQ
2904.

REFERENCES

Banks, S. (1988) Mathematical Theories of Nonlinear Systems, Prentice Hall, New York.
Billings, S.A. (1980) "Identification of nonlinear systems - a survey", Proc. IEE 127, D, 6, 277-285.
Billings, S.A. and Fakhouri, S.Y. (1979) "Non-linear system identification using the Hammerstein model", Int. J. Syst. Sci. 10, 567-578.
Chung, H.Y. and Sun, Y.Y. (1988) "Analysis and parameter estimation of nonlin-
ear systems with Hammerstein model using Taylor series approach", IEEE Trans.
Circuits Syst. CAS-35, 1533-1544.
Devroye, L. (1978) "The uniform convergence of the Nadaraya-Watson regression
function estimator", The Canadian J. Statist. 6, 179-191.
Devroye, L. (1981) "On the almost everywhere convergence of nonparametric re-
gression function estimates" , Ann. Statist. 9, 1310-1319.
Devroye, L. (1991) "Exponential inequalities in nonparametric estimation", In:
Roussas, G. (ed) Nonparametric Functional Estimation, Kluwer, Boston, 31-44.
Greblicki, W. (1992) "Nonparametric identification of Wiener systems", IEEE
Trans. Information Theory IT-38, 1487-1493.

Greblicki, W. and Krzyzak, A. (1979) "Non-parametric identification of a memoryless system with cascade structure", Int. J. Syst. Science 10, 1301-1310.
Greblicki W. and Krzyzak, A. (1980) "Asymptotic properties of kernel estimates
of a regression function", J. Statist. Planning Inference 4, 81-90.
Greblicki, W., Krzyzak A., and Pawlak, M. (1984) "Distribution-free pointwise
consistency of kernel regression estimate", Ann. Statist. 12, 1570-1575.
Greblicki W. and Pawlak, M. (1986) "Identification of discrete Hammerstein sys-
tems using kernel regression estimates", IEEE Trans. Automat. Contr. AC-31,
74-77.
Hunter, I.W. and Korenberg, M.J. (1986) "The identification of nonlinear biological systems: Wiener and Hammerstein cascade models", Biol. Cybern. 55, 135-144.
Krzyzak, A. (1986) "The rates of convergence of kernel regression estimates and
classification rules", IEEE Trans. Information Theory IT-32, 668-679.
Krzyzak, A. (1989) "Identification of discrete Hammerstein systems by the Fourier
series regression estimate" , Int. J. Syst. Science 20, 9, 1729-1744.
Krzyzak, A. (1990) "On estimation of a class of nonlinear systems by the kernel
regression estimate", IEEE Trans. Inform. Theory IT-36, 1, 141-152.
Krzyzak, A. (1992) "Global convergence of the recursive kernel regression esti-
mates with applications in classification and nonlinear system estimation", IEEE
Trans. Inform. Theory IT-38, 1323-1338.
Krzyzak, A. (1993) "Identification of nonlinear block-oriented systems by the re-
cursive kernel estimate", J. of the Franklin Institute, vol. 330, 605-627.
Krzyzak, A. and Pawlak, M. (1987) "The pointwise rate of convergence of the kernel regression estimate", J. Statist. Planning Inference 16, 159-166.
Kung, M. and Womack, B.F. (1984) "Discrete-time adaptive control of linear dynamic systems with a two-segment piecewise-linear asymmetric nonlinearity", IEEE Trans. Automat. Contr. AC-29, 170-172.
Masry, E. and Cambanis, S. (1980) "Signal identification after noisy, nonlinear
transformations", IEEE Trans. Inform. Theory IT-26, 50-58.
Narendra, K.S. and Gallman, P.G. (1966) "An iterative method for the identification of nonlinear systems using the Hammerstein model", IEEE Trans. Automat. Contr. AC-11, 546-550.
Pawlak, M. (1991) "On the series expansion approach to the identification of Ham-
merstein systems", IEEE Trans. Automat. Contr. AC-36, 763-767.
Sandberg, I.W. (1991) "Approximation theorems for discrete-time systems", IEEE Trans. Circuits Syst. CAS-38, 564-566.
Stone, C. (1982) "Optimal global rates of convergence for nonparametric regres-
sion", Ann. Statist. 10, 1040-1053.
PATCHING MONTHLY STREAMFLOW DATA - A CASE STUDY USING THE
EM ALGORITHM AND KALMAN FILTERING

G. G. S. PEGRAM
Department of Civil Engineering
University of Natal
King George V Avenue
4001 Durban, South Africa

Water resource systems in many parts of the world rely almost exclusively on surface water. Streamflow records are however woefully short, patchy and error-prone; therefore a major effort needs to be put into the cleansing, repair and possible extension of the streamflow data-base. Monthly streamflow records display an appreciable amount of serial correlation, due mainly to the effects of storage in the catchment areas, both surface and subsurface. A linear state-space model of the rainfall-runoff process has been developed, with the missing data and the parameters of the model being estimated by a combination of the Kalman Filter and the EM algorithm. Model selection and outlier detection were then achieved by recursively calculating deleted residuals and developing a cross-validation statistic that exploits the Kalman filtering equations. The method used here, which relates several streamflow records to each other and then uses some appropriate rainfall records to increase the available information set in order to facilitate data repair, can be developed if one recasts the above models in a state-space framework.

Experience with real data sets shows that transformation and standardization are not always necessary to obtain good patching. "Good" in this context is defined by the cross-validation statistic derived from the deleted residuals. These in turn are a fair indicator of which data may be in error compared to the remainder, as a result of them being identified as possible outliers. Examples of data patching and outlier detection are presented using data from a river basin in Southern Africa.

INTRODUCTION

Water resource system designs depend heavily on the accuracy and availability of hydrological data, in addition to economic and demand data, the latter being more difficult to obtain. However, system reliability is becoming more commonly based on simulation for its assessment.
for its assessment.
449
K. w. Hipel et al. (eds.),
Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Vol. 3, 449-457.
© 1994 Kluwer Academic Publishers.
450 G. G. S. PEGRAM

In order to simulate, one has to have data to mimic, and when records of streamflow in particular are short or patchy, the reliability of the system as a whole can be severely called into question. In southern Africa there has been a strong emphasis on water resource system development and analysis since the savage drought of the early eighties and again in the early nineties until late 1993. The Lesotho Highlands Scheme for hydro-electric generation for Lesotho, with a spin-off of water supply bought by the Republic of South Africa, bears testimony to the expense and ingenuity required to fuel major economic growth in the region. Part of the water resources analysis programme involved the patching of rainfall data in Lesotho to enable streamflow records to be constructed, and in addition the lengthening and repair of existing streamflow records in South Africa. The methodology discussed in this paper was developed to support this endeavour.

PATCHING STREAMFLOW VIA REGRESSION

Streamflow records are strongly correlated to records of streams which flow in their
vicinity and less strongly correlated to rainfall records where the rain gauges are located
on or around the appropriate catchments. This is especially so with short-term daily
flows but is somewhat better for the monthly data typically used in water resources
studies, which primarily involve over-year storage.

Where data are missing in a streamflow record it is tempting to use other streamflow and raingauge records to infill the missing data. There are several difficulties which arise when one uses conventional regression techniques. The first and most serious is that there are frequently pieces of data missing in the records of the control stations being used to infill the record of the target station. These gaps may, and often do, occur concurrently. A conventional regression requires that special steps be taken, such as re-estimating the regression for the reduced set by best sub-set selection procedures etc., or alternatively abandoning the attempt to infill or patch. A second difficulty arises from the strong dependence structure in time, due to catchment storage, making standard regression difficult to apply to the untransformed flows. The third problem is one of seasonality, which violates the assumption of homoscedasticity. A fourth is the problem of non-linearity of the processes. These problems were addressed by Pegram (1986), but the difficulty of the concurrent missing data was not overcome in that treatment. Most other methods of infilling are defeated by the concurrently missing data problem.

THE EM ALGORITHM IN CONJUNCTION WITH THE KALMAN FILTER

A powerful method of infilling data and simultaneously estimating the parameters of a regression model was suggested by Dempster et al. (1977). The EM algorithm exploits estimates of the parameters to estimate the missing data and then uses the repaired data set to estimate the parameters via maximum likelihood; this alternating estimation procedure is performed recursively until maximization of the likelihood function is achieved. Shumway and Stoffer (1982) combined the EM algorithm with a linear state-space model estimated by the Kalman Filter to estimate parameters in the missing data context, and also, as a by-product, to estimate the missing data. This procedure will be referred to as the EMKF algorithm in this paper.
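To fix ideas, here is a minimal sketch (ours, not the PATCHS software) of the EMKF idea for a scalar state-space model x_t = a x_{t−1} + w_t, y_t = x_t + v_t, with NaN marking missing flows. The filter simply skips the measurement update where y_t is missing, and the M-step shown is a simplified version of Shumway and Stoffer's, omitting the lag-one smoothed covariances for brevity.

```python
import numpy as np

def kalman_smooth(y, a, Q, R, x0=0.0, P0=1e3):
    """Kalman filter plus RTS smoother; NaN entries of y are skipped."""
    n = len(y)
    xf, Pf, xp, Pp = np.zeros(n), np.zeros(n), np.zeros(n), np.zeros(n)
    x, P = x0, P0
    for t in range(n):
        x, P = a * x, a * a * P + Q            # predict
        xp[t], Pp[t] = x, P
        if not np.isnan(y[t]):                 # update only where data exist
            k = P / (P + R)
            x, P = x + k * (y[t] - x), (1 - k) * P
        xf[t], Pf[t] = x, P
    xs, Ps = xf.copy(), Pf.copy()              # backward (RTS) pass
    for t in range(n - 2, -1, -1):
        J = Pf[t] * a / Pp[t + 1]
        xs[t] += J * (xs[t + 1] - xp[t + 1])
        Ps[t] += J * J * (Ps[t + 1] - Pp[t + 1])
    return xs, Ps

def em_step(y, a, Q, R):
    """One simplified EM iteration: smooth, then re-estimate (a, Q, R).
    The smoothed states xs also patch the missing y_t."""
    xs, Ps = kalman_smooth(y, a, Q, R)
    a_new = np.sum(xs[1:] * xs[:-1]) / np.sum(xs[:-1] ** 2 + Ps[:-1])
    Q_new = np.mean((xs[1:] - a_new * xs[:-1]) ** 2 + Ps[1:])
    obs = ~np.isnan(y)
    R_new = np.mean((y[obs] - xs[obs]) ** 2 + Ps[obs])
    return a_new, Q_new, R_new, xs
```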

It was important to ascertain what characteristics of the rainfall-runoff process producing monthly streamflow totals could be maintained without sacrificing the validity of the EMKF approach. Specifically, one is concerned about the seasonal variation of the process, especially in a semi-arid climate where the skewness of the flows during the wet season and the dry season tends to be quite different, as do the mean and variance. The state-space model lends itself to modelling the dependence structure we expect to see in streamflow. Whether the linear aspect of the model is violated can only be tested by examining the residuals of the fitted regression model. Thus the EMKF algorithm has promise in addressing the four difficulties associated with patching monthly streamflow referred to above: the concurrently missing data, the seasonality, the dependence and the non-linearity.

CROSS VALIDATION AND DELETION RESIDUALS

The EMKF algorithm of Shumway and Stoffer (1982) provides estimates of the parameters and the missing data, and some idea of the variance of the state variables. There still remains the problem facing all modellers as to which records to include in the regression and which model to fit, where the model is defined by the number of lags in time and the parameterization. The AIC is a possible answer but presents considerable difficulties in the context of the Kalman Filter when data are missing. An alternative was suggested by Murray (1990) using a method of cross-validation for model selection. This technique has the added advantage that the cross-validation calculation produces a so-called deletion residual, which gives estimates of how good the intact data are in relation to the model, and flags possible outliers for attention or removal. The methodology is fully described with examples in Pegram and Murray (1993).
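For intuition, deletion residuals in an ordinary linear regression have the closed form e_i/(1 − h_ii); the sketch below (ours) computes these as a stand-in for the Kalman-filter-based recursion of Pegram and Murray (1993), which this excerpt does not reproduce.

```python
import numpy as np

def deletion_residuals(X, y):
    """Leave-one-out (deletion) residuals e_(i) = e_i / (1 - h_ii) for a
    linear model y = X beta + e, with h_ii the hat-matrix leverage.
    Large values flag observations poorly explained by the rest."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return e / (1.0 - np.diag(H))
```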

APPLICATIONS OF THE EMKF ALGORITHM WITH CROSS-VALIDATION

To demonstrate the efficacy of this approach it was decided to perform the following experiment:

- take some well-known, intact streamflow and rainfall records
- hide a section of one of the streamflow records
- compare the infilled record with the observed, hidden section
- comment on the appropriate model.

Twenty-eight years (1955 to 1983 inclusive) of the streamflow record into the Vaal Dam in South Africa were selected for the experiment, together with the flow into an upstream dam - Grootdraai - and six rainfall gauges scattered around and in the catchment. Three years of the Vaal Dam flows (1964 to 66) were hidden and re-estimated using the EMKF algorithm with cross-validation. A variety of combinations of lags, varying between 1 and 2 for streamflow and 0 and 1 for the rainfall, with between 4 and 6 rainfall gauges being used, were the constituents of the suggested model. In Figures 1, 2 and 3 appear three attempts at infilling the missing streamflow data. In Figure 1 the maximum number of lags was used with all available raingauges. The difference between the three figures is in the treatment of the data. The first set is subject to complete standardization, i.e. all monthly values were divided by their monthly standard deviations after subtracting the monthly means. In Figure 2, a scaling was performed based on the assumption that the coefficient of variation is reasonably constant throughout the year. Here the series were scaled by their monthly standard deviations without shifting. Figure 3 shows the recorded and estimated flows where the data were untransformed. This version of the model assumes that the parameters in the linear dynamic model are time invariant. It happens that this is the most parsimonious of the three models, being specified by only 40 parameters compared to 88 and 64 parameters for the modelling done for Figures 1 and 2 respectively. Comparing the overall estimating performance, it appears that the model employing untransformed data depicted in Figure 3 performs best.

In Figure 4 are shown the deleted residuals for the 36 months during 1974 to 1976, which were the years that most closely corresponded to the "missing" years (deleted residuals are not estimated for missing data but only for intact data). One of the reasons why this section of data was used is because it includes the largest flow on record - that in February 1974 - which was 2 200 units (millions of cubic metres). The value concerned shows up as a very large deleted residual, skewing the regression above the line. This suggests that the nonlinearities have not been satisfactorily handled; however, the obvious choice of log-transformation does not eliminate the problem but raises others.

CONCLUSIONS

A methodology has been suggested in this paper which should provide a useful tool for the
water resources practitioner in the search for better ways of repairing and extending
streamflow data.

ACKNOWLEDGEMENTS

Thanks are due to the Department of Water Affairs and Forestry, Pretoria, South Africa,
for sponsoring this work and for giving permission to publish it. The software to
accomplish the streamflow patching (PATCHS) and the manual to go with it are available
from them at nominal cost.
[Figure 1. Comparison of patched (Series 1 = standardized) with hidden recorded flows for Vaal Dam during 1964-66; monthly total flows versus months. The flows are patched using the fully standardized flows in both target and controls.]
[Figure 2. Comparison of patched (Series 2 = scaled) with hidden recorded flows for Vaal Dam during 1964-66; monthly flows versus months. The flows are patched using the scaled (unshifted but divided by monthly standard deviation) flows in both target and controls.]
[Figure 3. Comparison of patched (Series 3 = untransformed) with hidden recorded flows for Vaal Dam during 1964-66; monthly flows versus months. The flows are patched using the untransformed flows in both target and controls.]
[Figure 4. Scatter plot of the deleted residuals against recorded Vaal monthly flows for the years 1974-76, which include the largest deleted residual in the record. It is positive; thus the data-point concerned has been under-estimated by the model.]

REFERENCES

Dempster, A. P., N.M. Laird, and D.B. Rubin, (1977) "Maximum likelihood from
incomplete data via the EM algorithm", J. of the Royal Statist. Soc., Ser. B, 39, 1-38.

Murray, M. (1990) "A Dynamic Linear Model for Estimating the Term Structure of
Interest Rates in South Africa", Unpublished Ph.D. thesis in Mathematical Statistics,
University of Natal, Durban.

Pegram, G. G. S. (1986) "Analysis and patching of hydrological data", Report
PC000/00/4285, by BKS WRA for Department of Water Affairs, Pretoria. 124 pages.

Pegram, G. G. S. and M. Murray, (1993) "Cross-validation with the Kalman Filter and
EM algorithm for patching missing streamflow data", Resubmitted to the Journal of the
American Statistical Association in January.

Shumway, R. H. and D. S. Stoffer (1982) "An approach to time series smoothing and
forecasting using the EM algorithm", Journal of Time Series Analysis, Vol. 3, 253-264.
RUNOFF ANALYSIS BY THE QUASI CHANNEL NETWORK MODEL
IN THE TOYOHIRA RIVER BASIN

H. SAGA
Dept. of Civil Eng., Hokkai-Gakuen Univ., Chuo-ku, Sapporo 064, Japan
T. NISHIMURA
Hokkaido E.P. Co., Toyohira-ku, Sapporo 004, Japan
M. FUJITA
Dept. of Civil Eng., Hokkaido Univ., Kita-ku, Sapporo 060, Japan

INTRODUCTION

This paper describes runoff analysis using the quasi channel network model of the
Misumai experimental basin, which is part of the Toyohira River basin. The Toyohira
River flows through Sapporo, which has a population of 1.7 million. Four multi-purpose
dams are located in the Toyohira River basin; thus, it is very important to verify the
runoff process not only analytically but also on the basis of field observations.
In this study, a quasi channel network model and a three-cascade tank model were
adopted as runoff models. Both provide for treatment of the spatial distribution of
rainfall.

OUTLINE OF THE MISUMAI EXPERIMENTAL BASIN

The Misumai experimental basin is located near Sapporo, Hokkaido, Japan. An outline
of the basin is shown in Figure 1. It lies at latitude 42°55' N and longitude 141°16' E.
The catchment area of the basin is 9.79 km² and the altitude ranges from 300 m to
1200 m. The basin contains brown forest soil and is mainly covered with forest. The
basin is equipped with five recording raingauges, an automatic water level recorder,
soil moisture content meters, a turbidimeter and snow-gauges.

Figure 1. The Misumai experimental basin ("No." marks the raingauges).

RUNOFF MODEL

The physical process model adopted in this research is the quasi channel network model
shown in Figure 2. This figure shows flow direction as determined from the elevations
of adjacent grid points.

Figure 2. Quasi channel network model.

The smallest mesh-size is 250 × 250 m. Each link of this model has an independent
sub-catchment area. The tank model transforming rainfall into discharge in the
sub-catchment is adopted because this model includes the mechanism of rainfall loss.
The three-cascade tank model is shown in Figure 3, and the state variables and
parameters of this model are defined in the next section.

Figure 3. Three-cascade tank model.

The data from the five raingauges indicate that the distribution of rainfall intensity
generally depends on altitude. Figure 4 shows the distributions of 9 rainfall events
observed in 1989. In a large-scale rainfall, the amount of rainfall is proportional to the
altitude, though for a small-scale event, the amount is independent of altitude.

Figure 4. Distribution of rainfall: event rainfall totals (mm) against altitude (m) for the
Misumai experimental basin.
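One simple way to carry this altitude dependence down to ungauged sub-catchments is an event-by-event least-squares line of gauge totals against gauge elevation; for a small-scale event the fitted slope is near zero, reproducing the observed independence from altitude. The sketch below is an illustration added here, not a method stated by the authors; the gauge totals are those of the heavy-storm event of Figure 5.

    import numpy as np

    def event_rainfall_at(elev, gauge_elev, gauge_rain):
        # Fit rainfall total = slope * elevation + intercept through the
        # five gauges and evaluate the line at a sub-catchment elevation.
        slope, intercept = np.polyfit(gauge_elev, gauge_rain, 1)
        return slope * elev + intercept

    # Gauge totals (mm) and elevations (m) as read from Figure 5:
    print(event_rainfall_at(550.0, [750, 600, 510, 460, 390],
                            [113.0, 97.5, 94.0, 88.0, 84.5]))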
The combination of a quasi channel network model and a tank model appeared to be
effective for treating the spatial distribution of rainfall, because the quasi channel
network model can treat the rainfall data observed by the five raingauges as
multi-inputs. The estimation of the velocity of propagation of a flood wave is one of the
problems that must be solved in order to apply this model to practical cases of runoff
analysis. In this paper, the velocity V is assumed to be spatially constant along the
channel and not to vary with time.
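Under these assumptions each link simply delays its sub-catchment hydrograph by the channel travel time (distance divided by V) before the contributions are summed at the outlet. A minimal pure-translation sketch of this reading follows; the paper does not spell out the routing arithmetic, so the data layout here is an assumption.

    import numpy as np

    def route_to_outlet(sub_q, dist_to_outlet, V=2.0, dt=600.0):
        # sub_q:          {link id: discharge series at dt-second intervals}
        # dist_to_outlet: {link id: channel distance to the outlet (m)}
        # V:              flood-wave velocity (m/s), constant in space and time
        n = max(len(q) for q in sub_q.values())
        out = np.zeros(n)
        for link, q in sub_q.items():
            lag = int(round(dist_to_outlet[link] / (V * dt)))  # delay in steps
            m = min(n - lag, len(q))
            if m > 0:
                out[lag:lag + m] += np.asarray(q[:m], dtype=float)
        return out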

Figure 5. Comparison between observed and calculated hydrographs for the heavy-storm
event: hyetographs at raingauges No.1-No.5 (event totals 113, 97.5, 94, 88 and 84.5 mm
at elevations 750, 600, 510, 460 and 390 m) and discharge q (mm/10 min) against
time t (hr); V = 2.0 m/sec.

Figure 6. Comparison between observed and calculated hydrographs for the weak-rainfall
event: hyetographs at raingauges No.1-No.5 (event totals 51.5, 53.5, 46.5, 43 and
49.5 mm) and discharge q (mm/10 min) against time t (hr); V = 2.0 m/sec.

Figures 5 and 6 show the hyetographs and hydrographs for a heavy storm and a weak
rainfall, respectively. In this calculation, V was assumed to be 2.0 m/sec and the tank
model parameters were set by trial and error. In spite of using the same values of the
tank parameters and propagation velocity for both events, the calculated results were in
good agreement with the observed ones.

IDENTIFICATION OF TANK PARAMETERS BY THE EXTENDED KALMAN
FILTER TECHNIQUE

Researchers have developed several mathematical optimization methods for estimating
unknown variables, notably the Powell method and the Davidon-Fletcher-Powell
method. In this paper, the unknown parameters are identified by the Extended Kalman
Filter technique, which estimates the unknown parameters automatically and requires
much less computer memory. The state variables are the discharge Q and the storage
depths in each tank, denoted by C1, C2 and C3, respectively. The unknown parameters
are the coefficients of the lateral runoff holes, a1u, a1m, a1l, a2 and a3, and the
coefficients of the permeable holes, b1, b2 and b3. The heights of the holes, h1u, h1m,
h1l, h2 and h3, are constant. The continuity equations of the tank model are as follows.

dC1/dt = f1 = r(t) - q1u Y(C1, h1u) - q1m Y(C1, h1m) - q1l Y(C1, h1l) - i1(t)   (1)

dC2/dt = f2 = i1(t) - q2 Y(C2, h2) - i2(t)   (2)

dC3/dt = f3 = i2(t) - q3 Y(C3, h3) - i3(t)   (3)

where qi = ai × (C - hi), ij = bj × Cj, and

Y(C, h) = (1/π){arctan((C - h)/ε) + π/2},  0 < ε ≪ 1.

Y(C, h) is a smoothed Heaviside function, with ε = 10⁻⁶. By introducing this special
function into the continuity equations, the calculation is considerably eased, as there is
no need to treat the relationship between the runoff hole heights and the storage depth
case by case. The discharge is then calculated from these lateral outflows (equation (4)).
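A minimal explicit-Euler sketch of one time step of equations (1)-(3) is given below for concreteness (an illustration added here; the time stepping, and the use of the summed lateral outflows as the sub-catchment discharge in place of the unreproduced equation (4), are assumptions).

    import numpy as np

    def Y(C, h, eps=1.0e-6):
        # Smoothed Heaviside switch: close to 0 for C < h, close to 1 for C > h.
        return (np.arctan((C - h) / eps) + np.pi / 2.0) / np.pi

    def tank_step(C, r, a, b, h, dt=1.0):
        # C = [C1, C2, C3]: storage depths; r: rainfall over this step;
        # a = (a1u, a1m, a1l, a2, a3): lateral runoff hole coefficients;
        # b = (b1, b2, b3): permeable hole coefficients;
        # h = (h1u, h1m, h1l, h2, h3): hole heights (constant).
        a1u, a1m, a1l, a2, a3 = a
        h1u, h1m, h1l, h2, h3 = h
        q1 = (a1u * (C[0] - h1u) * Y(C[0], h1u)
              + a1m * (C[0] - h1m) * Y(C[0], h1m)
              + a1l * (C[0] - h1l) * Y(C[0], h1l))
        q2 = a2 * (C[1] - h2) * Y(C[1], h2)
        q3 = a3 * (C[2] - h3) * Y(C[2], h3)
        i1, i2, i3 = b[0] * C[0], b[1] * C[1], b[2] * C[2]
        C_new = [C[0] + dt * (r - q1 - i1),
                 C[1] + dt * (i1 - q2 - i2),
                 C[2] + dt * (i2 - q3 - i3)]
        return C_new, q1 + q2 + q3  # updated storages, summed lateral outflow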

The unknown parameters are identified at each time that the discharge is
observed.

(a) State variables and model parameters

state variables: C1 = x1(t), C2 = x2(t), C3 = x3(t), Q = x4(t)
model parameters: a1u = x5(t), a1m = x6(t), a1l = x7(t), a2 = x8(t),
a3 = x9(t), b1 = x10(t), b2 = x11(t), b3 = x12(t)

(b) System equation

In this paper, the system equation is described by a set of non-linear ordinary
differential equations, which are linearized and transformed into discrete equations.
The resulting set of equations in vector form is given as:

X(k+1) = [ X1 ; X2 ](k+1) = [ Φ1  Φ2 ; 0  I ] [ X1 ; X2 ](k) + [ Γ ; 0 ] r(k) + W(k)   (5)

where (X1) = (x1 x2 x3 x4)^T, (X2) = (x5 x6 x7 x8 x9 x10 x11 x12)^T,

Φi, Γ: discrete transforming coefficients (which vary with time)

W(k): vector of system errors

(c) Measurement equation

The measurement equation in the Extended Kalman Filter is given by the following
relationship:

z(k) = ( H1  H2 ) X(k) + v(k)   (6)

where (H1) = ( 0 0 0 1 ), (H2) = ( 0 0 0 0 0 0 0 0 ), z(k) is the discharge observed at
time k (the corresponding model discharge, x4, is computed by eq. (4)) and v(k) is the
measurement error.
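For concreteness, one predict-update cycle of the filter on the augmented state of equations (5) and (6) might look as follows. This is a generic EKF sketch under the stated linearization, not the authors' code; f and F_jac stand for the discretized transition of eq. (5) and its Jacobian.

    import numpy as np

    def ekf_cycle(x, P, z, f, F_jac, H, R, Qsys):
        # x: 12-vector (C1, C2, C3, Q, a1u, a1m, a1l, a2, a3, b1, b2, b3)
        # H: 1x12 matrix (0 0 0 1 0 ... 0) selecting the discharge, eq. (6)
        # R: measurement-error variance; Qsys: system-error covariance
        x_pred = f(x)                          # predicted state
        F = F_jac(x)
        P_pred = F @ P @ F.T + Qsys            # predicted covariance
        S = (H @ P_pred @ H.T).item() + R      # innovation variance
        K = (P_pred @ H.T) / S                 # 12x1 Kalman gain
        innovation = z - (H @ x_pred).item()   # observed minus predicted Q
        x_new = x_pred + K.ravel() * innovation
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new

As described below, the whole record is passed through the filter more than once, the parameter estimates at the end of one pass seeding the next.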

MODEL VERIFICATION ~

Through a simulation method, 00


i
-; (t) O:Discharae by the initial value
under the assumption that the en X:Dischar,e h the first value
true value of the parameters are q (t) O:Oiscahrge h the second value
known, the approach described on ° -;: - :Oischarse by the true value
the preceding section is sub- ~
stantiated. The discharge com-
puted using known values of : j
parameters is assumed to be the
observed data and denoted as a
solid line in Figure 7. The Ini-
o
tial values of the parameters
were set up as being different

"<>"
from the true values; The symbol
denotes the discharge cal- 10 20 t(hr)
culated by using the initial Figure 7.Results of calculation using
values. The identification E.K.F.

Figure 8. Variation of the parameters for the tank model: trajectories of the estimated
coefficients a1u, a1m, a2, a3, b1, b2 and b3 after the first pass (broken lines) and the
second pass (solid lines), with the true values shown in parentheses.

The identification process by the Kalman Filter was repeated to refine the estimates of
the parameters; the parameters obtained by the Extended Kalman Filter were used as
the initial values of the parameters for the next identification process. The curve
denoted by the symbol "x" is the discharge obtained by using the parameters identified
after the first pass through the Extended Kalman Filter, and "O" denotes the discharge
obtained after the second pass. Figure 8 shows the time variations of the parameters for
the tank model identified by the presented method. The values in parentheses and the
dot-dash lines show the true values of the parameters. The broken lines denote the
estimated parameters after the first pass; the solid lines show those after the second
pass. It is clear that the estimates of the unknown parameters converge to the true
values.

There is another problem in the application of this technique to the quasi channel
network model. A sub-catchment area is so small that its discharge cannot be observed.
Consequently, the model parameters have to be identified by using data observed in a
very small basin. This small basin is situated near the Misumai experimental basin; its
catchment area is 1.7 km² and its topography is similar to that of the Misumai
experimental basin. Figure 9 depicts the hyetograph and hydrograph observed in the
small basin. The symbol "x" in this figure denotes the discharge calculated by using the
parameters identified by the Kalman Filter. Figure 10 depicts the observed data in the
Misumai experimental basin; the curve denoted by "x" is the discharge calculated by
the quasi channel network model and tank model using the parameters identified in
Figure 9. There is a slight difference during the recession period; however, good
agreement is obtained during the rising period.

Figure 9. Comparison between observed and calculated hydrographs using E.K.F. (the
hyetograph and hydrograph of the small basin used for parameter identification).

CONCLUSION

A model based on the combination of a quasi channel network model and a tank model
has been shown to be effective for treating the spatial distribution of rainfall. The
estimates of the unknown parameters of the model converge well toward the true values
in successive, automatic iterations of an Extended Kalman Filter.

Figure 10. Comparison of the observed hydrograph with the results of the quasi channel
network model: hyetographs at raingauges No.1-No.5 (event totals 39.5, 42.5, 45.5, 48.5
and 43.5 mm at elevations 750, 600, 510, 460 and 390 m) and discharge q (mm/10 min)
against time t (hr); V = 3.0 m/sec.

REFERENCES

S. Kobatake and Y. Ishihara (1983): "Synthetic runoff model for flood forecasting",
Proc. of JSCE, Vol. 337/II, pp. 129-135. (in Japanese)
K. Hoshi (1985): "Fundamental study on flood forecasting study (2)", Monthly Report
of C.E.R.I., No. 386, pp. 48-68. (in Japanese)
M. J. D. Powell (1964): "An efficient method for finding the minimum of a function of
several variables without calculating derivatives", Comp. Jour., pp. 155-162.
R. Fletcher and M. J. D. Powell (1963): "A rapidly convergent descent method for
minimization", Comp. Jour., 6, pp. 163-168.
Chao-lin Chiu (1978): "Application of Kalman filter to hydrology and water resources",
T. Yasunaga, K. Jinno and A. Kawamura (1992): "Change in the runoff process into an
irrigation pond due to land alteration", Proc. of H.E., JSCE, Vol. 36, pp. 629-634.
(in Japanese)
AUTHOR INDEX

A
Alpaslan, N. 135, 177

B
Baciu, G. 149
Bardossy, A. 19
Bass, B. 33
Benzaquen, F. 99
Bobee, B. 381
Bodo, B. A. 271, 285
Bogardi, I. 19
Bosworth, B. 301
Bruneau, P. 381

C
Chiu, C.-L. 121
Claps, P. 421
Cohen, J. 33
Corbu, I. 99

D
Dillon, P. J. 285
Druce, D. J. 63
Duckstein, L. 19

F
Fujita, M. 205, 459
Fuller, J. D. 229

G
Goodier, C. 191

H
Harmancioglu, N. B. 135, 163
Hashimoto, N. 205
Hayashi, S. 347
Hipel, K. W. 245, 271
Huang, G. 33

K
Kapur, J. N. 149
Kesavan, H. K. 149
Kojiri, T. 363
Krzyzak, A. 435

L
Lachtermacher, G. 229
Lai, L. 99
Lall, U. 47, 301
Lee, T.-Y. 87
Lennox, W. C. 77
Lettenmaier, D. P. 3
Lewandowski, A. 333
Liu, C.-L. 87

M
Matsubayashi, U. 347
McLeod, A. I. 245, 271
Mesa, O. J. 409
Murrone, F. 421
Muster, H. 19

N
Nishimura, T. 459

P
Panu, U. S. 191, 363
Pegram, G. G. S. 449
Penn, R. 99
Pereira, B. de B. 105
Perreault, L. 381
Perron, H. 381
Phatarfod, R. M. 395
Poveda, G. 409

R
Rajagopalan, B. 47, 317

S
Saga, H. 459
Sales, P. R. H. 105
Satagopan, J. 317
Singh, V. P. 135
Srikanthan, R. 395

T
Takagi, F. 347
Tao, T. 77, 99
Tarboton, G. 47

U
Unny, T. E. 363

V
Vieira, A. M. 105

W
Watanabe, H. 217

Y
Yamada, R. 217
Yin, Y. 33
Yu, P.-S. 87

Z
Zhang, S. P. 217
Zhu, M.-L. 205
SUBJECT INDEX

A C
Abelson-Tukey trend test 245, 248 Canada - U.S. Great Lakes Water Quality
Akaike information criterion 369, 380, 451 Agreement 272, 282
aggregation of data 422 Central Southern Italy watersheds 422
Aras river basin, Eastern Turkey 168, 171 change of information 175
atmospheric variables 3 circulation patterns (CPs) and classifica-
autocorrelation 279-280 tions 19-21
autocorrelation function (ACF) 349, 352, circular pipe 121, 124-129
358,264,267 Clearwater Lake, Ontario 285
autoregressive (AR) 340-341 climatic change 1
autoregressive integrated moving average clustering/clustering analysis 194,196,363
(ARIMA) 223 Cold Lake, Alberta 38, 44
autoregressive moving average (ARMA) 106- Columbia River Basin, Washington and
107,110,116,230,234,239,334- Oregon 317,322-323
conceptual-stochastic models 422-427, 434
335, 340, 342, 422, 424, 427-431,
434,445-446 convergence of identification algorithms
436,440,448
autoregressive moving average exogenous
cost effectiveness of monitoring network
(ARMAX) 106-107, 111, ll6
138, 140, 142-143, 147
auto-series (AS) model 193-194
cross series (CS) model 193, 195
cross validation 451
B cumulative periodogram 263, 265, 266, 268
backfitting 276 curse of dimensionality 301
'back-of-the-envelope' methods 396
back propagation 229, 231-232 D
backward propagation 207-208 DAD analysis 350,354,358
data vs. information 137, 141, 147, 164
Bayes/Bayesian 11-12, 149-155, 161
de-acidification trends 285
Bayesian decision theory 142
decision-making 80-81
bin filter 276
decision support system for time series anal-
bode plot 338-340
ysis 263
Box-Jenkins model identification 334
dynamic efficiency index (DEI) 179, 187
Box-Jenkins models 229-231, 237
Box-Cox transformation llO, ll6 E
Brillinger trend test 245-248 East River watershed, China 82-83
British Columbia (B.C.) Hydro 63, 65-66, efficiency of water and wastewater treat-
68-69,72 ment plants 177-178
Butternut Creek, N.Y. 205, 215-216 ELETROBRAS 105, 109, ll6
471
472 SUBJECT INDEX

EM algorithm 450-451 hidden Markov model (HMM) 11


emptiness of reservoirs 400-402 heuristic forecast 99-102
English River at Sioux Lookout, Ontario Hurst effect 409-411,415-416, 418-419
196 Hurst exponent 409-415, 418-419
entropy 119 hybrid backpropagation (HBP) 237
error back propagation algorithm 221-222 hydraulics 121
estimators of hurst exponent 409-415 Hydro-Quebec, Canada 381-383
European circulation patterns 21-23, 27 hysteresis effects 280
expected forecast 99
exploratory data analysis 334 I
extended Kalman filter 462-463, 465 identification of nonlinear systems 435
infilling of missing data 191
F inflow forecasting 99
Fei-Tsui reservoir, Taiwan 90, 93 information based design 142
fluctuation analysis 224-225 information content 139
fluid-mechanics principle 121, 133 interval runoff prediction 215
forecasting 61, 229-230
J
forward-propagation 206-208
joint probability density function
fourier series 347-348, 350
(joint p.d.f.) 301
fourier spectrum 348-349, 351-352, 355-
356,358 K
frequency transfer function 335 Kalman filter 449,465
frequentist 150 Kalman filtering theory 223
fuzzy inference 205, 213-215, 367 k-d tree 304, 306
fuzzy rule (FR) 19, 21, 24, 27-30 kernel density estimator (k.d.e.) 301-302,
304-305, 307, 310
G
kernel probability density estimators 48-
Gaussian kernel 304
54
general circulation models (GCMs) 34, 38, kernel regression estimate 435-437, 439
44
k-mean algorithm 366, 368-369, 372, 378
generalized additive models 276 knowledge based classification 23
GEOS diagrams 409-410, 419
Kolmogorov's power law 348, 352-353, 358
GEOS-H diagrams 409, 415-419
kriging 317-319, 322, 324-325, 329
global atmospheric general circulation mod-
els (GCMs) 4-6, 9, 12-15 L
global circulation model (GCM) 20 'lag components' 230, 234
Grand and Saugeen Rivers, Ontario 273, learning 229, 231, 233, 235
275-276, 279-282 learning process 221
Great Lakes precipitation 263-265 linearity and nonlinearity 436
Great Salt Lake, U.S.A. 302-303, 306 linearized spectrum 338-344
grey prediction model (GPM) 34, 39-40, linearized transfer functions 335,337-338
44 locally weighted regression (lowess) 317,
grey theory 34-38 321-322, 324-325, 329

H M
Hammerstein systems 436, 445-447 Mann-Kendall trend test 245, 248, 249

marginal cost model 67 open-channel 121, 129-133


Markov process 73, 78-79 optimality 138
Markovian 194 'order' and 'disorder' 183
Markovian arrival process 426-427
p
Markovian probability 367, 372, 375
maximum entropy principle 149, 151, 160- parameter estimation 421
161 patching streamflow 449
mean integrated square error (MISE) 437 pattern recognition system 364
memoryless cascade system 436-439 periodic independent residual ARMA (PI-
minimum cross-entropy principle 149, 151, RARMA) 424-425
155, 161 pH 287-288, 291, 295
minimum euclidean distance 364 phase angle 348, 351, 353, 355-356
minimum phase property 336 pollution spreading in rivers 435
Misumai experiment basin, Japan 459 Porsuk river basin, Turkey 167-168
monotonic trend 245 precipitation 47, 317, 347
Monte Carlo method 222, 317 prediction 237, 367-368, 383
moving average 338-339 prediction of daily water demands 217
multicollinearity 381, 384 probabilistic forecast 99, 102
multiple regression 382-383 Q
multivariate autoregressive moving aver- quasi channel network model 459
age
(MARMA) 106-107,114-116 R
multivariate autoregressive moving aver- rainfall 347
age exogenous (MARMAX) 107 rainfall forecasting model 87, 92
multivariate regression 385-389, 391 rainfall-runoff forecasting model 87-89, 92-
93
N real-time operations 80-81
Nagoya Meteorological Observatory, Aichi, redundancy of information 164, 167, 169,
Japan 217 175
network efficiency and flexibility 141-142 regression equation 402, 406
neural network (NN) 19,21,25,27-30,203 ReMus 381-382, 385, 387-392
Nile River flows 266 reservoirs i Brazil 109
NKERNEL 302, 307 reservoir size 395
nonhomogeneous hidden Markov model (NHMM) 8, 11-12 ridge regression 384-385
risk 33
non linearity 217,222 Rocky Island Lake, the Mississagi River
nonparametric estimation 48 103
nonparametric statistical estimation 301 runoff analysis 459
non point source water pollution 271 runoff prediction 205
non stationary stochastic processes 409,
414-415 S
numerical evaluation 78, 84 Schwartz's Bayesian criterion 90
nutrient phosphorus 272 seasonality 395
seemingly unrelated autoregressive mov-
o ing average (SURARMA) 106-107,
Ontario Hydro 99 112-114, 116

Seka Dalaman Paper Factory, Turkey 184 W


sensitivity analysis 226, 234, 236 water budget modelling 41
simulation 54-57, 64, 347, 357-358, 367, water quality monitoring networks 135
422,427-434,463 weather classification schemes 6-8
shot noise 422, 425, 427, 431-434 Wiener systems 436,445-447
smoothing spline ANOVA (ss-ANOVA) 317, wet/dry spell 48-49
319-320, 322, 324-325, 329 Willamette River Basin, U.S.A. 322-323
SO₄²⁻ 290-291, 295-296 Williston Lake, British Columbia 63-69,
Southwestern Ontario Great Lakes tribu- 74
taries 273-274 Woodruff, Utah 48, 51, 54
spatial analysis 299
spatial distribution 350, 355-356
Z
spatial distribution of rainfall 460
z transform theory 335
spectral analysis 331
spectral transfer function 335
station discontinuance 163
stochastic dynamic programming (SDP)
64,66-67,69-72,74
stochastic precipitation models 3, 6, 9
storage forecasting model 93
storage-runoff forecasting model 88-89, 92-
93
streamflow modelling 361
streamflow forecasting 78-80, 85

T
thermodynamic efficiency 179, 185, 187
Thames river at Thamesville 368
three-cascade tank model 460
transfer function models 88
trend analysis 243
two-reservoir system 78, 82

U
uncertainties 121
univariate autoregressive moving average
(Univariate ARMA) 110
universal law 125, 128
user interface 99

V
Vaal Dam, South Africa 451
variogram fitting 319
velocity distribution 121-125, 129-131, 133
Water Science and Technology Library
1. AS. Eikum and R.W. Seabloom (eds.): Alternative Wastewater Treatment.
Low-Cost Small Systems, Research and Development. Proceedings of the
Conference held in Oslo, Norway (7-10 September 1981). 1982
ISBN 90-277-1430-4
2. W. Brutsaert and G.H. Jirka (eds.): Gas Transfer at Water Surfaces. 1984
ISBN 90-277-1697-8
3. D.A Kraijenhoff and J.R. Moll (eds.): River Flow Modelling and Forecasting.
1986 ISBN 90-277-2082-7
4. World Meteorological Organization (ed.): Microprocessors in Operational
Hydrology. Proceedings of a Conference held in Geneva (4-5 September
1984). 1986 ISBN 90-277-2156-4
5. J. Nemec: Hydrological Forecasting. Design and Operation of Hydrological
Forecasting Systems. 1986 ISBN 90-277-2259-5
6. V.K. Gupta, I. Rodriguez-Iturbe and E.F. Wood (eds.): Scale Problems in
Hydrology. Runoff Generation and Basin Response. 1986
ISBN 90-277-2258-7
7. D.C. Major and H.E. Schwarz: Large-Scale Regional Water Resources
Planning. The North Atlantic Regional Study. 1990 ISBN 0-7923-0711-9
8. W.H. Hager: Energy Dissipators and Hydraulic Jump. 1992
ISBN 0-7923-1508-1
9. V.P. Singh and M. Fiorentino (eds.): Entropy and Energy Dissipation in Water
Resources. 1992 ISBN 0-7923-1696-7
10. K.W. Hipel (ed.): Stochastic and Statistical Methods in Hydrology and
Environmental Engineering. A Four Volume Work Resulting from the
International Conference in Honour of Professor T. E. Unny (21-23 June
1993). 1994
10/1: Extreme values: floods and droughts ISBN 0-7923-2756-X
10/2: Stochastic and statistical modelling with groundwater and surface water
applications ISBN 0-7923-2757-8
10/3: Time series analysis in hydrology and environmental engineering
ISBN 0-7923-2758-6
10/4: Effective environmental management for sustainable development
ISBN 0-7923-2759-4
Set 10/1-10/4: ISBN 0-7923-2760-8
11. S. N. Rodionov: Global and Regional Climate Interaction: The Caspian Sea
Experience. 1994 ISBN 0-7923-2784-5
12. A. Peters, G. Wittum, B. Herrling, D. Meissner, C.A. Brebbia, W.G. Gray and
G.F. Pinder (eds.): Computational Methods in Water Resources X. 1994
Set 12/1-12/2: ISBN 0-7923-2937-6

Springer-Science+Business Media, B.V.
