
M5 Model Trees and Neural Networks: Application to Flood Forecasting in the Upper Reach of the Huai River in China


Dimitri P. Solomatine1 and Yunpeng Xue2

Abstract: The applicability and performance of the so-called M5 model tree machine learning technique is investigated in a flood forecasting problem for the upper reach of the Huai River in China. In one of the configurations this technique is compared to a multilayer perceptron artificial neural network (ANN). It is shown that model trees, being analogous to piecewise linear functions, have certain advantages compared to ANNs: they are more transparent and hence more acceptable to decision makers, are very fast in training, and always converge. The accuracy of M5 trees is similar to that of ANNs. Improved accuracy in predicting high floods was achieved by building a modular model (mixture of models), in which the flood samples with special hydrological characteristics are split into groups for which separate M5 and ANN models are built. The hybrid model combining a model tree and an ANN gives the best prediction result.
DOI: 10.1061/(ASCE)1084-0699(2004)9:6(491)
CE Database subject headings: Hydrologic models; Hydrologic data; Flood forecasting; Artificial intelligence; China; Neural
networks; Multiple regression models; Data analysis.

Introduction

Artificial neural network (ANN) models have become a popular choice among the nonlinear flood forecasting methods (Hsu et al. 1995; Minns and Hall 1996; Solomatine and Torres 1996; Dawson and Wilby 1998; See and Openshaw 1998; Govindaraju and Rao 2000; Dibike and Solomatine 2001; Bhattacharya and Solomatine 2002a; Birikundavyi et al. 2002). Although it is an accurate predictive tool, the ANN technique has a disadvantage that often limits its acceptance in practice: ANN models are not transparent ("black box") and do not help us to understand the nature of the solution. The arbitrary nature of the internal representation means that there may be dramatic variations between networks of identical architecture trained on the same data (Witten and Frank 2000). Recently some attempts were made to extract understandable insights from the structure of neural networks, such as saliency analysis (Abrahart et al. 2001) and the methods of recovering rules reported by Setonio et al. (2002). The latter method starts from building an ANN as the "right" tool that then needs better interpretability.

There are, however, approaches that, instead of constructing a single complex model, use a number of simpler "local" models specialized in a particular area of input space (called mixtures of experts). Such models were developed already in the 1980s; see, for example, a paper on multilinear models by Becker and Kundzewicz (1987). Another method of this type comes from the "statistics" world: the approach by Friedman (1991) in his multivariate adaptive regression splines algorithm. Yet another one, the subject of this paper, is the M5 model tree (Quinlan 1992; Witten and Frank 2000), a method attributed to the area of machine learning. An earlier method, the classification and regression tree (CART) of Breiman et al. (1984), should also be mentioned; it generates, however, zero-order models (constant output values for subsets of input data) rather than first-order (linear) models.

The M5 algorithm combines the features of classification and regression: tree-structured regression is built on the assumption that the functional dependency is not constant in the whole domain, but can be approximated as such on smaller subdomains (Fig. 1). For continuous variables, these subdomains are searched for and characterized by the average value (regression trees) or by a linear regression function (model trees) of the dependent variable (on Fig. 1, for example, for the domain [x_2 > 2, x_1 > 2.5] Model 3 is used and its form is y = a_0 + a_1 x_1 + a_2 x_2). The most attractive advantage is that by dividing the function being induced into linear patches, M5 model trees provide a representation that is reproducible and comprehensible by practitioners.

Still, the M5 model tree is not a very popular method: to our knowledge, after the paper of Kompare et al. (1997), published in Slovene, applications of M5 model trees in water-related problems have been reported only by Solomatine (2002), by Solomatine and Dulal (2003) (for rainfall-runoff modeling), and by Bhattacharya and Solomatine (2002b) (for modeling the stage–discharge relationship).

In this study, which took place in 2000–2001, a rather complex catchment, the upper reach of the Huai River, was considered as the study area, and the performance of various M5 model trees was investigated. In two of the five cases the M5 model tree is also compared to an ANN.

1 Associate Professor, UNESCO-IHE Institute for Water Education (IHE Delft), P.O. Box 3015, 2601 DA Delft, The Netherlands (corresponding author). E-mail: [email protected]
2 Yellow River Conservancy Commission, 11 Jinshui Rd., 450003 Zhengzhou, China. E-mail: [email protected]
Note. Discussion open until April 1, 2005. Separate discussions must be submitted for individual papers. To extend the closing date by one month, a written request must be filed with the ASCE Managing Editor. The manuscript for this paper was submitted for review and possible publication on October 29, 2002; approved on February 20, 2004. This paper is part of the Journal of Hydrologic Engineering, Vol. 9, No. 6, November 1, 2004. ©ASCE, ISSN 1084-0699/2004/6-491–501/$18.00.

JOURNAL OF HYDROLOGIC ENGINEERING © ASCE / NOVEMBER/DECEMBER 2004 / 491


Fig. 1. Example of M5 model tree. Models 1–6 are linear regression models
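
To make the structure in Fig. 1 concrete, the following Python sketch shows how a model tree could produce a prediction: a sample is routed through threshold tests until it reaches a leaf, and that leaf's linear model is evaluated. Only the root split (x2 at 2.0) and the rule that x2 > 2 and x1 > 2.5 selects Model 3 are taken from the paper. The class names, the reduction to three leaves, and all coefficients are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    """Leaf node holding a linear model y = a0 + a1*x1 + a2*x2."""
    intercept: float
    coefs: Sequence[float]

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(a * v for a, v in zip(self.coefs, x))


@dataclass
class Split:
    """Internal node: samples with x[feature] > threshold go to `high`."""
    feature: int          # index into x (0 for x1, 1 for x2)
    threshold: float
    low: Union["Split", Leaf]
    high: Union["Split", Leaf]

    def predict(self, x: Sequence[float]) -> float:
        branch = self.high if x[self.feature] > self.threshold else self.low
        return branch.predict(x)


# Hypothetical three-leaf tree; every coefficient below is made up.
model_3 = Leaf(intercept=0.0, coefs=[1.0, 2.0])
tree = Split(
    feature=1, threshold=2.0,                    # root: split on x2 at 2.0
    low=Leaf(intercept=1.0, coefs=[0.5, 0.0]),   # region x2 <= 2
    high=Split(
        feature=0, threshold=2.5,                # then split on x1 at 2.5
        low=Leaf(intercept=2.0, coefs=[0.0, 1.0]),
        high=model_3,                            # region x2 > 2 and x1 > 2.5
    ),
)

print(tree.predict([3.0, 3.0]))  # routed to model_3
print(tree.predict([1.0, 1.0]))  # routed to the x2 <= 2 leaf
```

Because every prediction is a handful of comparisons followed by one linear formula, the path taken by any sample can be read off directly, which is the transparency argument made for model trees.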

Introduction to M5 Model Trees and Artificial Neural Networks

M5 Model Trees

The M5 model tree algorithm was originally developed by Quinlan (1992); we used the software implementing its variation M5′ provided by Witten and Frank (2000). Model trees combine a conventional decision tree with the possibility of generating linear regression functions at the leaves. This representation is relatively perspicuous because the decision structure is clear and the regression functions do not normally involve many variables. The M5 tree is a piecewise linear model, so it takes an intermediate position between linear models such as ARIMA and truly nonlinear models such as ANNs.

The construction of a model tree is similar to that of decision trees. Fig. 1(a) illustrates how the splitting of space is done. First, the initial tree is built, and then it is pruned (reduced) to overcome the overfitting problem (a model that is very accurate on the training data set but fails on the test set). Finally, the smoothing process is employed to compensate for the sharp discontinuities between adjacent linear models at the leaves of the pruned tree (this operation is not needed in building a decision tree).

Building Model Trees

Different decision tree induction algorithms used to solve classification problems employ the divide-and-conquer approach. First, an attribute is selected to be placed at the root node and one branch is made for each possible value; then the example set is split up into subsets, one for every value of the attribute. The process is then repeated recursively for each branch using only those samples that actually reach the branch. If at any time all samples at a node have the same classification, the development of that part of the tree is stopped. The attribute chosen for a split for a given set of samples is determined by a statistical property called a splitting criterion. For decision trees the splitting is based on trying to minimize the entropy in the resulting subsets; in other words, trying to filter as many samples from the same class into one subset as possible.

The M5 model tree is a numerical prediction algorithm, and its splitting criterion is based on the standard deviation of the values in the subset T of the training data that reaches a particular node (which is an analogue of entropy). It is used as a measure of the error at that node, and the attribute that maximizes the expected error reduction is chosen for splitting at the node. Accordingly, on Fig. 1 the attribute X2 is selected for the root node with the split value 2.0.

The splitting process terminates when the output values of the samples that reach a node vary only slightly, that is, when their standard deviation is just a small fraction (say, less than 5%) of the standard deviation of the original sample set. Splitting also terminates when just a few samples remain in a subset. The linear regression models are then built for each subset of samples associated with the terminating (leaf) nodes.

Pruning and Smoothing Model Trees

Pruning. If a generated tree has too many leaves, it may be "too accurate" and hence overfit and be a poor generalizer. It is possible to make a tree more robust by simplifying it, i.e., by pruning, that is, by merging some of the lower subtrees into one node.

Smoothing. This process is used to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned trees. This is a particular problem for models constructed from a small number of training samples. Smoothing can be accomplished by producing linear models for each internal node, as well as for the leaves, at the time the tree is built. Experiments show that smoothing substantially increases the accuracy of prediction.

Fig. 4(c) presents a tree combining seven linear regression models at the leaves. In parentheses, the first number is the number of samples in the subset sorted to this leaf and the second



one is the root mean squared error (RMSE) of the corresponding linear model divided by the standard deviation of the subset of samples for which it is built (expressed in percent).

Fig. 2. Sketch map of study area of Huai River (I, II, III, IV, V represent five subcatchment areas)

Fig. 3. Performance of M5 model using full-year data in (a) training (fragment) and (b) testing (fragment)

Artificial Neural Networks

The ANN is a powerful machine learning method widely used in problems of numerical prediction and classification. A network is made up of a number of interconnected nodes (processing elements), arranged into three basic layers: input, hidden, and output. The links represent weighted connections between the nodes. A processing element simply multiplies its inputs by a set of weights, and linearly or nonlinearly transforms the result into an output value. By adapting its weights, the neural network works towards generating an output that is close to the measured (target) output. There is a similarity between an ANN and multiple nonlinear regression, where coefficients are found by solving an optimization problem. Detailed coverage of ANNs can be found in many books (e.g., Haykin 1999). The references given in the Introduction cover applications of ANNs in hydrology, especially in rainfall-runoff modeling. A three-layer feed-forward multilayer perceptron (MLP) ANN based on the back propagation algorithm is a popular choice in hydrology in general and in runoff analysis in particular.

Study Area

The Huai River is one of the seven largest rivers in China, and also one that is frequently threatened by floods: approximately once every 5 years, sometimes leading to catastrophes such as that of 1931, when 75,000 people lost their lives. In order to protect the densely populated areas and the property in the flat flood plain, many flood detention and diversion areas have to be built along the river dike to store the excess water. The situation is complicated by the fact that about 180,000 people live in the flood detention and diversion areas, so accurate flood forecasting is critical for flood management and optimal control of the flood control projects.

The Xixian subcatchment, with a drainage area of 10,190 km², is located in the upper reach of the Huai River and is characterized by frequent storms, with the highest annual rainfall reaching 1,500 mm. It is a major flood source in the Huai River basin. Most of the area is mountainous, with the highest peak reaching 1,140 m. The river system and the distribution of the monitoring stations are shown in Fig. 2. The discharge of the main trunk is monitored along the river at three hydrological stations, denoted as QC, QD, and QX; the QX station (Xixian) is the downstream station. Nine tributaries flow into the main trunk of the Huai River, but only the two main tributaries, namely the Shihe River (QN) and the Zhugan River (QZ), are gaged. Thus the data of the 17 rainfall stations and three evaporation stations in this area can be used for flood forecasting. Since the land use has not significantly changed after the construction of the reservoir on the Shihe River at the end of the 1950s, it is possible to apply data-driven modeling to flood forecasting.

Rainfall in the Huai River region is uneven in time and in spatial distribution, and also varies from season to season and from year to year. The flood season of the Huai River is from May to October, and the precipitation events during the flood season can be classified according to their cause as low pressure troughs or as cyclones. Rainfall due to low pressure troughs covers large areas and has a long duration and a large amount, which leads to a long flood duration. On the other hand, rainfall due to cyclones usually has higher intensities, shorter durations, and lower total amounts than the former, and this leads to floods with a high peak and short duration. The discharges of the Huai River and its tributaries are normally relatively low, and the rivers even nearly dry out in the dry season; the flood peak is, however, relatively large, reaching 50–100 times the mean discharge. Due to the regulation of the Nanwan Reservoir, the maximum discharge (QN) of the Shihe River



is relatively low (only 415 m³/s), which does not have much impact on the peaks downstream at Xixian (QX).

Objective and Methodology

In the flood control practice of the Huai River, the traditional hydrological methods (unit hydrograph and gage-to-gage correlation method) are the main flood forecasting methods. They are complemented by real-time adjustment by experts, in which the prediction accuracy relies mainly on, and is limited by, the expert's experience. The forecasting performance of the Xinanjiang model (Zhao and Liu 1995), a semidistributed conceptual rainfall-runoff model widely used in China, is not always adequate due to the limitation or unavailability of the input data and the difficulties of calibration.

The main objective of this study was to build a model for predicting the flood discharge of the Xixian station 1 day ahead (QX_{t+1}) using machine learning methods based on the known hydrological system data (e.g., discharge, rainfall, and evaporation) on the current day and the days before. Another objective was to investigate the applicability of the M5 model tree method and to compare it, at least on some data sets, to ANN. The possibilities of combining M5 and ANN in a hybrid model were also seen as one of the items of interest.

Identification of Data Sets and Variables

There were 21 years of time series data available: (1) the daily discharge time series of 21 years (1976–1996), although discharges of the Zhugan River (denoted as QZ) were available only for a period of 10 years (1987–1996); (2) the daily rainfall time series of 21 years (1976–1996) from 17 stations; and (3) the daily evaporation time series of 14 years (1976–1989) at three stations. The time series of 1976–1989 was used as a training data set, and the remaining data (1990–1996) was used for testing and cross validation. The training set was constructed in such a way that it would cover both low and high flow conditions and include both the maximum and minimum values of discharge.

Any modeling exercise requires accurate identification of input and system variables, in this case those of the physical process of rainfall-runoff yield and runoff routing. All the relevant system state parameters should be considered, such as rainfall, evaporation, soil characteristics, and upstream discharges at the main trunk and tributaries. However, due to the data limitations, only the rainfall and discharge data at the main trunk are used to predict the downstream discharge. The reason is that the evaporation data series (14 years) is shorter than the others (21 years), and the discharge of the tributary Zhugan River (QZ) was recorded only for 10 years. In order to reduce the number of input variables, the areal average rainfall calculated by the Thiessen method was used instead of the 17 distributed rainfall data sources.

Traditional (physically based) approaches to hydrological modeling require identification of the various types of rainfall-runoff generation, baseflow separation techniques, etc. However, data-driven models that work with the total rainfalls and total flows do not require this information. The rainfall losses could be taken into account by assuming that the daily rainfall data can express the rainfall intensity and that the moving average of rainfall (the antecedent rainfall) can implicitly express the soil moisture content.

Taking into account the time lags for the input variables is the way to bring the catchment characteristics into a data-driven model. This was done by analyzing the physical properties of the catchment, by data analysis and transformation, and by correlation analysis. The daily areal average rainfall (Pa), the moving average of areal average daily rainfall (PaMov), and the discharges of the predicted station (QX) and the upstream station (QC) with different corresponding time lags were used as input variables. Different models required different lags. It was found that the 4 day moving average of the areal rainfall with 1 day time lag (PaMov4_t) has the maximum correlation coefficient with the predicted discharge QX_{t+1}. For the flood season data the most highly correlated variable is the 2 day moving average of areal rainfall with 1 day time lag (PaMov2_t). In various modeling experiments different combinations of variables were used.

Experiments and Results

In total ten models were built that can be classified into five different types: full-year global (overall) model, high flows global model, flood season global model, flood season modular model, flood season hybrid model, and flood season subarea rainfall model. In two cases both the M5 model tree and the ANN model were built and compared using the same input and output variables, and each experiment was designed after the analysis of the results of previous ones. In some cases the results were compared to the naïve ("no-change") model, that is, the model QX_{t+1} = QX_t, and to the three-point linear regression model QX_{t+1} = a_0 + a_1 QX_t + a_2 QX_{t-1} + a_3 QX_{t-2}.

Training (calibration) of every model was done using the training data set. For the global model, for example, it consisted of 5,109 samples, each characterized by 11 measurements of the present and past rainfalls and discharges or their moving averages (considered as input model variables) and one output variable QX_{t+1}. The trained model was tested on another data set (for the global model, 2,565 samples). In both training and testing the trained model was run to predict the discharge QX one time step (1 day) ahead for each of the samples separately, and then the corresponding hydrographs were plotted and the overall model errors calculated. The plots and errors for most of the models built, together with the analysis of the results, are presented below.

Full-Year Global (Overall) Model

First, the whole continuous 21 year data set was used, so the model was called a full-year global model. After a number of experiments aimed at finding the relevant inputs, the following 11 input variables were selected:
1. three precipitation values for the current day and the previous 2 days (Pa_t, Pa_{t-1}, Pa_{t-2});
2. 4 day precipitation moving averages calculated for the current and the previous day (PaMov4_t, PaMov4_{t-1});
3. three values of the upstream discharge for the current and the previous 2 days (QC_t, QC_{t-1}, QC_{t-2}); and
4. discharge downstream for the current and the previous 2 days (QX_t, QX_{t-1}, QX_{t-2}).

The M5 model tree with 35 leaf nodes was generated, each leaf node corresponding to a linear equation predicting the discharge of the target station (it is not shown due to its large size). The nodal splitting rules indicate the rainfall and flow condition associated with the predicted discharge, and this gives an indication of the catchment hydrological characteristics. The topmost



Table 1. Comparison of Artificial Neural Network and M5 Model Trees Prediction Results (Full Year and Flood Season)

| Performance | Full-year M5, training | Full-year M5, testing | Full-year naïve, testing | Full-year linear, testing | FS-M5, training | FS-M5, testing | FS-ANN, training | FS-ANN, cross-valid | FS-ANN, testing |
|---|---|---|---|---|---|---|---|---|---|
| Years | 76–89 | 90–96 | 90–96 | 90–96 | 76–89 | 90–96 | 76–89 | 90–93 | 94–96 |
| Number of equations in M5 or hidden nodes in ANN | 35 | 35 | n/a | n/a | 7 | 7 | 8 | 8 | 8 |
| Number of samples | 5,109 | 2,565 | 2,565 | 2,565 | 2,625 | 1,525 | 2,625 | 653 | 872 |
| Root mean square error (m³/s) | 69.6 | 84.5 | 183.0 | 160.0 | 98 | 87 | 100 | 79 | 96 |
| Mean absolute error (m³/s) | 18.7 | 18.9 | 37.1 | 39.9 | 31.4 | 24.2 | 33.0 | 25.1 | 24.9 |
| Maximum absolute error (m³/s) | 1,695.5 | 2,208.3 | 3,009.0 | 3,008.2 | 1,766 | 1,651 | 1,498 | 1,130 | 1,446 |
| Correlation coefficient | 0.97 | 0.95 | 0.76 | 0.79 | 0.97 | 0.97 | 0.97 | 0.98 | 0.95 |
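
The error measures in Table 1 can be reproduced from a pair of observed and predicted discharge series. The Python sketch below is a minimal illustration, not the authors' code: the function names and the short discharge series are hypothetical, and the correlation coefficient is assumed to be the Pearson coefficient, which the paper does not state explicitly. The last helper expresses one model's error as a percentage reduction relative to a reference model. Applied to the testing RMSE values in Table 1 (84.5 for M5 against 183.0 for the naïve model and 160.0 for the linear model), it gives roughly 54% and 47%.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def max_ae(obs, pred):
    """Maximum absolute error."""
    return max(abs(o - p) for o, p in zip(obs, pred))


def correlation(obs, pred):
    """Pearson correlation coefficient (assumed definition)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (dev_o * dev_p)


def naive_forecast(series):
    """No-change model: the forecast for day t+1 is the value on day t.

    Returns (observed, predicted) aligned on the forecast days.
    """
    return series[1:], series[:-1]


def relative_reduction(err_model, err_reference):
    """Error reduction of a model relative to a reference, in percent."""
    return 100.0 * (err_reference - err_model) / err_reference


# Hypothetical daily discharge series (m3/s), for illustration only.
qx = [120.0, 150.0, 400.0, 900.0, 600.0, 300.0]
obs, pred = naive_forecast(qx)
print(rmse(obs, pred), mae(obs, pred), max_ae(obs, pred), correlation(obs, pred))

# Testing RMSE values taken from Table 1.
print(round(relative_reduction(84.5, 183.0)))  # M5 versus naive model
print(round(relative_reduction(84.5, 160.0)))  # M5 versus linear model
```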

splitting attribute is QC_t, the upstream discharge on the current day; it has the maximum correlation with the predicted discharge QX_{t+1}. The attributes at lower levels are QX_t, PaMov4_t, and PaMov4_{t-1}; they also appear frequently in the subbranches. The attributes Pa_t, Pa_{t-1}, Pa_{t-2}, and QC_{t-1} are less important and appear only at or near leaf nodes in the trees and are thus indicative of some special situations.

As shown in Figs. 3(a and b) and Table 1, the M5 model tree can predict the low flow correctly, but has higher errors in predicting some of the flood peaks: RMSE was 69.6 m³/s in training and 84.5 m³/s in testing. Nevertheless, the M5 model tree error was 54% smaller than that of the naïve "no-change" model and 47% smaller than that of the three-point linear regression model.

The high error in flood forecasting was attributed to the fact that the number of samples corresponding to high flow was much smaller than the number corresponding to low flow in the full-year data set. As a result, out of the 35 rules that the M5 model generated, there was only one linear model for the samples with QX_t > 721 m³/s corresponding to the flood situation.

Zooming-In: Better Models for Extreme Flows

In order to reproduce the extreme-flow situations better, two other models were built: one for the selected high flows only (filtered by the value of QX), and the other one for the data collected during the flood season (filtered by the time constraints).

High-Flows Global Model

A separate model for the flows QX_{t+1} > 500 m³/s was set up, with 234 samples used for training and 80 for testing. The same 11 inputs were used and a model tree with 11 equations was generated. Most of the equations, however, have rather high error, with only one rule having an error smaller than 10%. RMSE was 281 m³/s in training and 411 m³/s in testing.

Interestingly, in the nodes corresponding to higher discharge values, instead of QX_t the rainfall on the previous day Pa_{t-1} begins to appear at the top layers of the generated M5 model tree; this means that this attribute became the most important one for predicting the discharge QX_{t+1}. The physical explanation of this fact is that in the flood season the rapid increase in discharge occurs after intensive rainfall (Pa_{t-1}); this is different from low flow conditions, when rainfall has little influence. So, in spite of the errors, the M5 model has correctly suggested that the flood discharge has characteristics different from those of the low flow: this is consistent with the physics of the hydrological processes.

Flood Season Global Model

This model dealt only with the flood season (FS) data from May to October across the 21 year time series, and the 2 day moving average of areal rainfall was used instead of the 4 day average (since it has higher correlation with QX_{t+1}). Correlation analysis led to the selection of 16 input attributes (Pa_t, Pa_{t-1}, Pa_{t-2}, Pa_{t-3}, PaMov2_t, PaMov2_{t-1}, PaMov2_{t-2}, PaMov2_{t-3}, PaMov2_{t-4}, QC_t, QC_{t-1}, QC_{t-2}, QX_t, QX_{t-1}, QX_{t-2}, and QX_{t-3}). Two versions of model trees were built: one with all 16 attributes (the model had 11 regression equations) and a simpler version with seven attributes (Pa_t, Pa_{t-1}, PaMov2_t, PaMov2_{t-1}, QC_t, QC_{t-1}, and QX_t) and with seven equations. Accuracy of prediction was very similar.

An ANN model with the same input and output variables was built as well. The popular three-layer feed-forward ANN topology was employed, and linear activation functions were used in the output layer since they delivered better performance than sigmoid or hyperbolic tangent ones. The classical backpropagation training method of ANN was adopted. The Neural Machine (Neural Machine 2003) and NeuroSolutions (NeuroSolutions 2003) software packages were used.

The performance of the models is shown in Figs. 4(a and b) and Table 1, and the induced tree in Fig. 4(c). The overall ANN prediction result is similar to that of the M5 model trees: its RMSE is 10% higher than that of M5, the mean absolute error (MAE) is the same, and the maximum absolute error (MaxAE) is 12% lower. Fig. 4(b) shows that the prediction of high flows by the M5 model has improved. However, both M5 and ANN still have a high error in predicting some flood events, and the maximum error occurs during the same flood events. This means that the input data have to be processed more efficiently and some new attributes should be added to improve the prediction accuracy.

Modular Models (Mixtures of Models): Combining Expert Rules with M5 Trees

More accurate analysis of the hydrological processes in the catchment and the error analysis of the models reported above led us to the conclusion that the conditions used so far were too superficial and did not allow the data-driven models to classify various flood conditions into physically interpretable classes.

In order to improve the effectiveness of the predictive model, expert-generated rules were used to build modular models. The whole flood season data was split into subsets using domain



knowledge, and then a set of models using the M5 model tree or ANN was built. In total three modules were constructed (Fig. 5).

Fig. 4. (a) M5 models, flood season data (FS-M5) versus full-year data (global-M5) (testing, fragment); (b) M5 and artificial neural network models using flood season data (testing, fragment); and (c) M5 model tree (FS-M5) trained on flood season data with nine input variables

Fig. 5. Modular approach in prediction of flood forecasting

Module 1 (FS-m1-M5 Model). This model was built for the samples in which the discharge on the day before, QX_{t-1}, was higher than 1,000 m³/s. Figs. 6(a and b) show that the model is very accurate; Fig. 6(c) presents the M5 model, which is very concise and easy to understand. Table 2 shows that the prediction error of this model is much lower than that of the flood season global model FS-M5 calculated only for the samples with QX_{t-1} > 1,000 m³/s.

It was found that many samples processed by Module 1 are still associated with low-flow predictions, which are not so interesting for flood forecasting and which are already predicted well by the global model anyway. This prompted further filtering of the data and consideration of the next local model, Module 2.

Fig. 6. FS-m1-M5 model performance (module 1, samples with QX_{t-1} > 1,000 m³/s) in (a) training; (b) testing; and (c) M5 model tree for FS-m1-M5 model (module 1, samples with QX_{t-1} > 1,000 m³/s)

Module 2 (FS-m2-M5 Model). Data not included in Module 1 (i.e., data with QX_{t-1} ≤ 1,000 m³/s) was additionally filtered by the rule QX_t > 200 m³/s, and an M5 model was built. As shown in Figs. 7(a and b) and Table 3, there are still some erroneous



predictions, and its prediction performance is close to that of the global model. Fig. 7(c) presents the resulting model tree.

Fig. 7. Performance of FS-m2-M5 (module 2, QX_{t-1} ≤ 1,000 m³/s and QX_t > 200 m³/s), (a) training; (b) testing; (c) FS-m2-M5 model (module 2, QX_{t-1} ≤ 1,000 and QX_t > 200)

Analysis of Errors for Modules 1 and 2. Figs. 8(a and b) present the hydrographs of individual flood events with the measured and calculated discharges. It can be seen that the data points of Module 1 lie either in the peak and recession part of the flood events, or in the rising limb of a flood with long duration. Thus the soil is saturated and the prediction is not affected by a flash flood at a tributary, so Module 1 gives a good prediction. However, the situation corresponding to the samples of Module 2 is more complex. If the point lies in the relatively low flow part, the prediction is still good. However, the points that are close to the peaks of flood events of short duration are not predicted well. This can be explained by the flood effect of the antecedent rainfall, and by the heterogeneous distribution of rainfall that is not accounted for due to averaging.

From the generated M5 model tree of Module 2 [Fig. 7(c)], it can be seen that the intensive floods have been classified reasonably well (PaMov2_t > 40.5), and even the distribution of the rainfall duration is modeled correctly. The data filtered into Module 2 does not exhaust the possibilities of building more accurate local models. Consider, for example, equation LM8, which is responsible for modeling the short duration of heavy rainfall in the middle reach or downstream part (QX_t < 870, QC_{t-2} ≤ 8). The boundary of 870 for QX_t that corresponds to the antecedent rainfall (soil moisture) is somewhat misleading: the value 870 is too



Fig. 8. Performance of M5 modular model (a) in training (fragment
shown is 1982 flood) and (b) for testing (fragment shown is 1996
flood)

high for that. If, for example, there is small rainfall of long dura-
tion the discharge does not rise too much, but the soil becomes Fig. 9. Performance of FS-m3-M5 model in (a) training and (b)
saturated. Based on these considerations a new class called Mod- testing
ule 3 was constructed.

Module 3

Possibilities for improving accuracy lie in the further analysis of the physical characteristics of the catchment. Module 3 was used to represent the flood data due to short but intensive rainfall after a period of dry weather, which is mainly included in Module 2. This type of flood is filtered out by the following rule: Pa_{t−1} > 50 AND PaMov2_{t−2} < 5 AND PaMov3_{t−4} < 5. There are only 23 samples in the 21-year daily time series: 18 in training and 5 in testing. The M5 method generates the following single-equation model:

LM1: QX_{t+1} = −43.2 + 75.6 PaMov2_{t−4} + 16.4 Pa_t + 1.06 QC_t + 1.52 QX_t

As shown in Table 4 and Figs. 9(a and b), this formula works quite well both in training and testing. Even in extrapolation, the sample in the testing data set whose discharge of 3,970 m³/s is higher than the maximum value of the training data is still predicted correctly. An exception is the June 30, 1996 flood sample with QX_{t+1} = 1,800 m³/s, which is the only case with the short duration of extreme rainfall (Pa_{t−1} = 108, Pa_t = 0) in the whole catchment; by adding an additional condition, such as Pa_t > 5, this sample can be easily filtered out.

Hybrid Model: Combining Artificial Neural Network and M5 Model Tree

An insignificant number of samples for Module 2 and the need to validate the results of linear regression models prompted the use of an alternative model, an ANN. So, in addition to the M5 model built for Module 2 (FS-m2-M5), an ANN model (FS-m2-ANN) was also built. Such an approach of using different model types for different modules makes it possible to characterize the overall model as a hybrid model. The three models for Module 2 compared in Table 5 are: (1) the FS-ANN model trained using the whole flood season data, but for which the error is calculated only for the samples complying with the constraints of Module 2; (2) an ANN trained on the samples of Module 2 (FS-m2-ANN); and (3) an M5 model tree trained on the samples of Module 2 (FS-m2-M5).

From Table 5, it is clear that in comparison with the global ANN model, the hybrid model approach improves the prediction accuracy. The FS-m2-ANN model is slightly better than the modular model FS-m2-M5, and is far better than the ANN flood season model FS-ANN. Another merit of the hybrid approach is that smaller ANN models are easier to train. It is also possible to combine (e.g., by averaging) the predictions of models of various types making predictions for the same subdomain of input space, thus creating a committee machine.

In practice, the presented models constituting the mixture are trained, and when new input data arrive, they are first filtered to an appropriate model and then the prediction is made.

Other Experiments: Changes in Discharge and Rainfall, Distributed Rainfall

Some of the previous studies indicated that using the changes in discharge and rainfall (or their derivatives) along with their values as inputs may increase the performance of ANN. Several ANN and M5 models were built for Module 2, and the main conclusion is that the performance of M5 models got worse, with mixed results for ANN (RMSE decreased, but the MAE and MaxAE increased).
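The filtering of a new sample into a module, the LM1 equation for Module 3, and the averaging of several models for the same subdomain can be sketched as follows. This is only an illustrative sketch, not the writers' implementation: the names `Inputs`, `select_module`, `lm1`, and `predict` are hypothetical, the Module 1 and Module 2 thresholds are taken from the captions of Tables 2 and 3, the Module 3 rule follows the text (which writes PaMov3_{t−4}, whereas the caption of Table 4 writes PaMov2_{t−4}), and checking Module 3 before the other modules is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence


@dataclass
class Inputs:
    """Daily inputs available at time t (field names are illustrative)."""
    qx_t: float        # QX_t
    qx_tm1: float      # QX_{t-1}
    qc_t: float        # QC_t
    pa_t: float        # Pa_t
    pa_tm1: float      # Pa_{t-1}
    pamov2_tm2: float  # PaMov2_{t-2}
    pamov3_tm4: float  # PaMov3_{t-4}, as written in the filtering rule
    pamov2_tm4: float  # PaMov2_{t-4}, the term that appears in LM1


def select_module(x: Inputs) -> str:
    """Route a sample to a module; the order of the checks is an assumption."""
    # Module 3: short, intensive rainfall after a dry period.
    if x.pa_tm1 > 50 and x.pamov2_tm2 < 5 and x.pamov3_tm4 < 5:
        return "module3"
    # Module 1: threshold from the caption of Table 2.
    if x.qx_tm1 > 1000:
        return "module1"
    # Module 2: thresholds from the caption of Table 3.
    if x.qx_tm1 < 1000 and x.qx_t > 200:
        return "module2"
    return "unassigned"


def lm1(x: Inputs) -> float:
    """Single-equation Module 3 model LM1; returns the forecast QX_{t+1}."""
    return (-43.2 + 75.6 * x.pamov2_tm4 + 16.4 * x.pa_t
            + 1.06 * x.qc_t + 1.52 * x.qx_t)


def predict(x: Inputs,
            models: Dict[str, Sequence[Callable[[Inputs], float]]]) -> float:
    """Filter the sample to its module, then average that module's models."""
    members = models[select_module(x)]
    return mean([model(x) for model in members])
```

With a single model per module, `predict` reduces to the plain modular model; registering both an M5 tree and an ANN under the same key gives a simple committee by averaging.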



Table 2. M5 Model Performance for Module 1 (QX_{t−1} > 1,000, Using Flood Season Data)

| Performance | FS-m1-M5 model, training | FS-m1-M5 model, testing | FS-M5 model (extracted samples), training | FS-M5 model (extracted samples), testing |
|---|---|---|---|---|
| Years | 76–89 | 90–96 | 76–89 | 90–96 |
| Number of samples | 96 | 32 | 96 | 32 |
| Mean absolute error | 83.0 | 126.8 | 97.3 | 114.1 |
| Maximum absolute error | 548.0 | 509.1 | 1,173.5 | 1,183.6 |
| Root mean squared error | 127.0 | 176.1 | 176.6 | 222.2 |
| Correlation coefficient | 0.99 | 0.98 | 0.96 | 0.95 |

Table 3. M5 Model Performance for Module 2 (Flood Season Data with QX_{t−1} < 1,000 and QX_t > 200)

| Performance | FS-m2-M5 model, training | FS-m2-M5 model, testing | FS-M5 model (extracted samples), training | FS-M5 model (extracted samples), testing |
|---|---|---|---|---|
| Years | 76–89 | 90–96 | 76–89 | 90–96 |
| Number of samples | 433 | 167 | 433 | 167 |
| Mean absolute error | 83.9 | 103.0 | 81.3 | 109.3 |
| Maximum absolute error | 1,345 | 2,016 | 1,125 | 1,616 |
| Root mean squared error | 175.8 | 251.7 | 159.7 | 264.6 |
| Correlation coefficient | 0.961 | 0.938 | 0.968 | 0.933 |

Table 4. M5 Model Performance for Module 3 (Pa_{t−1} > 50 and PaMov2_{t−2} < 5 and PaMov2_{t−4} < 5, Using Flood Season Data)

| Performance | FS-m3-M5 model, training | FS-m3-M5 model, testing | FS-M5 model (extracted samples), training | FS-M5 model (extracted samples), testing |
|---|---|---|---|---|
| Years | 76–89 | 90–96 | 76–89 | 90–96 |
| Number of samples | 17 | 5 | 17 | 5 |
| Mean absolute error | 123.6 | 188.5 | 193.6 | 260.5 |
| Maximum absolute error | 311.0 | 437.5 | 417.6 | 511.0 |
| Root mean squared error | 145.8 | 241.5 | 222.7 | 316.4 |
| Correlation coefficient | 0.950 | 0.995 | 0.888 | 0.985 |

Table 5. M5 Model Trees and Artificial Neural Network (ANN) for Module 2

| Performance | Training 1976–1989: FS-ANN, extracted samples | Training 1976–1989: FS-m2-ANN | Training 1976–1989: FS-m2-M5 | Testing 1995–1996: FS-ANN, extracted samples | Testing 1995–1996: FS-m2-ANN | Testing 1995–1996: FS-m2-M5 |
|---|---|---|---|---|---|---|
| Mean absolute error | 125.2 | 91.7 | 97.3 | 121.8 | 24.5 | 20.8 |
| Maximum absolute error | 1,519 | 1,135 | 1,173 | 1,519 | 1,258 | 1,460 |
| Root mean squared error | 229.7 | 154.7 | 176.6 | 266.6 | 253.6 | 254.8 |
| Correlation coefficient | 0.929 | 0.968 | 0.96 | 0.893 | 0.925 | 0.91 |
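The four performance measures reported in Tables 2–5 can be computed as in the sketch below. It is a minimal illustration under stated assumptions: the function name `performance` is hypothetical, the standard definitions of MAE, maximum absolute error, and RMSE are used, and the correlation coefficient is assumed to be Pearson's r between observed and predicted discharges (the tables do not state which definition was applied).

```python
import math
from typing import Dict, Sequence


def performance(observed: Sequence[float],
                predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, maximum absolute error, RMSE and Pearson correlation."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    max_ae = max(abs(e) for e in errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation between observed and predicted values
    # (undefined, and raising ZeroDivisionError, if either series is constant).
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_o * var_p)

    return {"MAE": mae, "MaxAE": max_ae, "RMSE": rmse, "r": corr}
```

Because RMSE squares the errors, a few badly predicted flood peaks raise it much more than they raise MAE, which is why the tables report both together with the maximum absolute error.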

Another way to improve the modeling performance would be to use the distributed rainfall as input. Several experiments were conducted, but the lack of detailed data did not allow for drawing reliable conclusions.

Conclusions and Recommendations

1. Data-driven (machine learning) models are capable of performing rainfall-runoff forecasting, even for a rather complex catchment system. The performance of M5 model trees is comparable to that of the widely used MLP ANNs.
2. The advantageous features of M5 model trees compared with ANNs are:
   • the generated tree-like structure of linear models is reproducible and easy to understand for decision makers. It makes it possible for a hydrologist to have a good overview of the relationships between the hydrological characteristics;
   • the M5 algorithm allows one to easily generate a family of interpretable models with different numbers of component models/leaves and hence different robustness and accuracy;
   • training of M5 model trees is much faster than that of ANNs and always converges; and
   • the knowledge encapsulated in a model tree may also help in parameter selection and assessing their relationships



for other models, such as a conceptual hydrological model or an ANN.
3. The general prediction performance of the M5 model trees and ANN is good; the inaccuracies for the peak of some special flood events are mainly due to data-related problems, which include:
   • the unavailability of discharge data in a tributary, the Zhuguan River, that can generate flash floods with a higher peak discharge;
   • 24 h daily averaged data are too coarse to capture the rapid changes of rainfall and discharge; and
   • the heterogeneity of rainfall distribution in rather large catchments and the improper accounting for the subcatchment rainfall.
4. The inputs for M5 model trees are mainly selected according to the correlation analysis, which works very well. The prediction can be improved by using hydrological knowledge to refine the selection of inputs further and by a modular model approach that uses the rules offered by a hydrology expert. Such an approach would make it possible to filter out the flood samples with special hydrological characteristics, and then to use the M5 model trees to classify the samples into more refined classes. The experiments with the so-called M5flex algorithm are reported by Solomatine and Siek (2003).
5. Using a hybrid model approach combining the M5 model trees and ANN allowed for further accuracy improvements. First the M5 model trees were used to classify the data into different classes and make predictions for most of them, and then ANN was used to forecast using the classified data set as input, to find the nonlinear relation in the data.

The following recommendations could be given:
1. When data-driven models are used to predict flash floods, data of higher frequency are needed.
2. Since the relation between the flood peak discharge and the preceding rainfall is highly nonlinear, it would be useful to use nonlinear models like ANNs or support vector machines (Dibike et al. 2001) in some of the branches of the M5 trees; this requires modification of the M5 algorithm. The possibility of a better smoothing algorithm between the linear models of M5, for example using fuzzy methods, should be investigated as well.
3. Machine learning techniques like M5 model trees and ANNs can complement more traditional physically based models and expert judgments, but they cannot be used when a catchment undergoes considerable changes, for example due to urbanization. The combination of both types of models is a recommended approach to flood modeling.

More information on the use of machine learning tools in water-related issues can be found on the Web site on data-driven modeling (“Data-driven modeling” 2003).

Acknowledgments

The writers acknowledge the role of the Huai River Water Resource Commission of the Minister of Water Resource, China for the provision of the data in this research, and the Dutch Embassy in China for its financial contribution to the group training of Chinese participants at IHE Delft in 1999–2001. Part of this work was performed in the framework of the project “Data mining, knowledge discovery and data-driven modelling” of the Delft Cluster research program supported by the Dutch government. The writers are also grateful to the anonymous reviewers for the useful comments.

References

Abrahart, R. J., See, L., and Kneale, P. E. (2001). “Applying saliency analysis to neural network rainfall-runoff modelling.” Comput. Geosci., 27, 921–928.
Becker, A., and Kundzewicz, Z. W. (1987). “Nonlinear flood routing with multilinear models.” Water Resour. Res., 23, 1043–1048.
Bhattacharya, B., and Solomatine, D. P. (2002a). “Application of artificial neural network in reconstructing stage-discharge relationship.” Proc., 4th Int. Conf. on Hydroinformatics, Iowa.
Bhattacharya, B., and Solomatine, D. P. (2002b). “Application of artificial neural networks and M5 model trees to modelling stage-discharge relationship.” Proc., 2nd Int. Symp. on Flood Defence, Beijing. B. S. Wu, Z. Y. Wang, G. Q. Wang, G. H. Huang, H. W. Fang, and J. C. Huang, eds., Science Press, New York.
Birikundavyi, S., Labib, R., Trung, H. T., and Rousselle, J. (2002). “Performance of neural networks in daily streamflow forecasting.” J. Hydrologic Eng., 7(5), 392–398.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and regression trees, Wadsworth, Belmont, Calif.
“Data-driven modelling.” (2003). <http://datamining.ihe.nl> (Aug. 22, 2003).
Dawson, C. W., and Wilby, R. (1998). “An artificial neural network approach to rainfall-runoff modelling.” Hydrol. Sci. J., 43(1), 47–66.
Dibike, Y. B., and Solomatine, D. P. (2001). “River flow forecasting using artificial neural network.” Phys. Chem. Earth, 26(1), 1–7.
Dibike, Y. B., Velickov, S., Solomatine, D. P., and Abbott, M. B. (2001). “Model induction with support vector machines: Introduction and applications.” J. Comput. Civ. Eng., 15(3), 208–216.
Friedman, J. H. (1991). “Multivariate adaptive regression splines.” Ann. Stat., 19, 1–141.
Govindaraju, R. S., and Rao, A. R., ed. (2000). Artificial neural networks in hydrology, Kluwer Academic, Dordrecht, The Netherlands.
Haykin, S. (1999). Neural networks: a comprehensive foundation, 2nd Ed., Prentice-Hall, Upper Saddle River, N.J.
Hsu, K., Gupta, H. V., and Sorooshian, S. (1995). “Artificial neural network modeling of the rainfall-runoff process.” Water Resour. Res., 31(10), 2517–2530.
Kompare, B., Steinman, F., Cerar, U., and Dzeroski, S. (1997). “Prediction of rainfall runoff from catchment by intelligent data analysis with machine learning tools within the artificial intelligence tools.” Acta Hydrotechnica, 16, 16 (in Slovene).
Minns, A. W., and Hall, M. J. (1996). “Artificial neural networks as rainfall-runoff models.” Hydrol. Sci. J., 41(3), 399–417.
“NeuralMachine.” (2003). Tools for neural networks modeling and global and evolutionary optimization. <http://www.data-machine.com> (Mar. 8, 2004).
“NeuroSolutions.” (2003). Neural Dimension Inc., <http://www.nd.com> (Mar. 8, 2004).
Quinlan, J. R. (1992). “Learning with continuous classes.” Proc., 5th Australian Joint Conf. on Artificial Intelligence, Adams & Sterling, eds., World Scientific, Singapore, 343–348.
See, L., and Openshaw, S. (1998). “Using soft computing techniques to enhance flood forecasting on the river Ouse.” Proc., 3rd Int. Conf. on Hydroinformatics, Copenhagen, Balkema, Rotterdam, The Netherlands.
Setiono, R., Leow, W. K., and Zurada, J. M. (2002). “Extraction of rules from artificial neural networks for nonlinear regression.” IEEE Trans. Neural Netw., 13(3), 564–577.
Solomatine, D. P. (2002). “Applications of data-driven modelling and machine learning in control of water resources.” Computational intelligence in Control, M. Mohammadian, R. A. Sarker, and X. Yao eds., Idea Group Publishing, 197–217.



Solomatine, D. P., and Dulal, K. (2003). “Model tree as an alternative to neural network in rainfall-runoff modeling.” Hydrol. Sci. J., 48(3), 399–411.
Solomatine, D. P., and Siek, M. B. (2003). “Flexibility and optimality in M5 model trees.” Proc., 3rd Int. Conf. on Hybrid Intelligent Systems (HIS’03), Melbourne, Australia.
Solomatine, D. P., and Torres, L. A. (1996). “Neural network approximation of a hydrodynamic model in optimizing reservoir operation.” Proc., 2nd Int. Conf. on Hydroinformatics, Zurich, Switzerland, 201–206.
Witten, I. H., and Frank, E. (2000). Data mining, Morgan Kaufmann, San Francisco.
Zhao, R. J., and Liu, X. R. (1995). “The Xinanjiang model.” Computer models of watershed hydrology, V. P. Singh, ed., Water Resources Publications, Littleton, Colo., 215–232.
