Hydro Soil
1029/2011WR010489, 2012
W01535 WESTRA ET AL.: CONTINUOUS RAINFALL: REGIONALIZED DISAGGREGATION W01535
from instrumental data to form new stochastic rainfall sequences.
[4] A limitation of many of these approaches is the need for long, high-quality subdaily rainfall records, either as the basis for parameter estimation in the case of the multifractal and Poisson classes of models, or for drawing the subdaily "fragments" in the case of the nonparametric algorithms described above. This is particularly unfortunate given that the absence of long continuous rainfall records provides one of the principal justifications for continuous simulation. The solution to this problem usually involves the development of "regionalized" approaches that make use of subdaily data within a broader spatial domain in the vicinity of the location of interest.
[5] The majority of work on such regionalized approaches has focused on the Poisson cluster family of models. For example, Cowpertwait et al. [1996] and Cowpertwait and O'Connell [1997] developed a regionalized Neyman-Scott Rectangular Pulse (NSRP) model for generating sequences of hourly rainfall data across the UK, by regressing the NSRP parameters on site variables obtained from a relief map of the UK (including elevation, north-south distance, east-west effect, and distance to coast). Cowpertwait et al. [1996] also developed a disaggregation model that allows historical or generated hourly data to be disaggregated into totals for shorter time intervals. An alternative approach was proposed by Gyasi-Agyei [1999], who developed a regionalized version of the Gyasi-Agyei and Willgoose hybrid model based on the nonrandomized Bartlett-Lewis rectangular pulse and an autoregressive jitter [Gyasi-Agyei and Willgoose, 1997, 1999]. This approach uses observed daily statistics (namely dry probability, mean, and variance) and two regionalized subdaily parameter estimates, with promising results found in simulating subdaily rainfall in central Queensland, Australia. This model was extended to Australia-wide data by Gyasi-Agyei and Parvez Bin Mahbub [2007], and was found to be successful in simulating a range of statistics including extreme rainfall.
[6] In our two articles, we present an alternative regionalized framework for generating continuous subdaily rainfall sequences, drawing on the nonparametric resampling approaches developed by Lall and Sharma [1996] and a novel approach to defining regional similarity. Specifically, in this paper a nonparametric disaggregation approach is described, in which subdaily rainfall "fragments" are randomly sampled from nearby pluviograph stations conditional on daily rainfall amounts at the location of interest. This is one of the first regionalized extensions to the method of fragments logic, and substantially expands the applicability of the method of fragments due to the relative abundance of high-quality daily rainfall data compared to subdaily rainfall records. We also modify the method of fragments logic to consider previous- and next-day wetness states. This modification improves the continuity of the resampled subdaily fragments, and is made possible by the greater sample size brought about by using multiple nearby records.
[7] In the second paper, an algorithm is developed for generating daily rainfall sequences at ungaged locations, once again being informed by data from nearby gaged locations. The combination of these two algorithms allows for a complete regionalized framework for generating point-based continuous rainfall sequences at any desired location. Detailed testing of these algorithms is conducted with an emphasis on evaluating the extent to which the methods capture both the distribution of extreme rainfall and the antecedent rainfall leading up to the extreme event, reflecting the likely applicability of these techniques for flood estimation practice.
[8] The remainder of this paper is structured as follows. In section 2 we provide an overview of Australia's continuous rainfall record. This is followed in section 3 by a description of the proposed methodology, including the statistics used to determine the similarity between daily/subdaily rainfall relationships at any two locations. Results are presented in section 4, including a preliminary analysis of the viability of the method at Sydney Airport, Australia, as well as more detailed results for five case study locations distributed throughout Australia. Finally, a discussion and conclusions are provided in section 5.
2. Data
[9] Continuous subdaily rainfall data were obtained from the Australian Bureau of Meteorology (www.bom.gov.au) at 1397 stations, in increments of 6 minutes. The location of each gaging station is shown in Figure 1, together with an indication of the length of record. The median record length of all stations was 9 yr, with only 101 stations having records longer than 40 yr, and an additional 331 stations having records of between 20 and 40 yr. Furthermore, the spatial distribution of the gaging stations is not homogeneous, with a high density of gages in the populated regions, particularly along the eastern coastal fringe of Australia, and lower density elsewhere. In contrast, there are 17,451 daily-read gaging stations in Australia, of which 2708 stations have records longer than 20 yr, and 1768 stations have more than 40 yr of record. This asymmetry in data availability between daily and subdaily records highlights the potential benefits of developing a regionalized disaggregation approach using the conditional relationship between daily and subdaily rainfall.
[10] The number of gaging stations with continuous rainfall records is plotted against the year of record in Figure 2. As can be seen, only a small number of gaging stations were available in the early twentieth century (the longest available record in Australia being from Melbourne Regional Office, gage number 086071, with data from 1873), with significant increases in recording density apparent in the 1960s. To limit the effects of possible temporal variability in the daily/subdaily characteristics, the remainder of the paper only considers records between 1970 and 2005 with less than 20% of the record classified as "missing," with a total of 232 stations meeting this criterion. "Missing" data were defined as data which were flagged as either missing or presented as an accumulation over previous time steps, and in these cases the full day of record was removed from the analysis. As will be discussed further below, the proposed method is relatively insensitive to missing data.
3. Methodology
3.1. Regionalized Method of Fragments Algorithm
[11] The method of fragments is a well-known resampling algorithm for generating continuous rainfall sequences
Figure 1. Spatial coverage and record length of the Australian subdaily pluviograph record.
Figure 2. Number of Australia-wide pluviograph records against year of record, plotted from 1900.
[Lall and Sharma, 1996; Nowak et al., 2010; Sharma and Srikanthan, 2006; Sharma et al., 1997; Svanidze, 1977; Tarboton et al., 1998]. In this paper, we make two modifications to enable the method to be applied in a regionalized setting. The first and most important modification involves the development of a regionalized version in which, conditional on daily rainfall at the location of interest, fragments are sampled from a range of "nearby" locations. The second modification involves including a "state-based" logic in which fragments are drawn not only conditional on daily rainfall amounts, but also on whether the previous and next day are wet or dry. As will be discussed later, this second modification partially overcomes an issue with the conventional method of fragments related to continuity of the resampled subdaily rainfall fragments when there are successive wet days, and is made possible here because of the greater sample size due to use of a larger number of nearby stations.
[12] The algorithm for the adjusted method of fragments is presented here. The approach is also illustrated in Figure 3, with the steps in the algorithm matching the steps highlighted in the figure. The algorithm is as follows:
Step 1: Obtain a sequence of daily rainfall $R^o_i$ at the location of interest, where subscript $i$ indexes time and the superscript "o" refers to the target location (the location at which the continuous rainfall sequences are sought). The daily rainfall sequence can be obtained either from a historical record of daily rainfall at the target location, or alternatively from a daily stochastic generation algorithm such as the one described in our second paper.
Step 2: Obtain daily rainfall sequences at a range of "nearby" subdaily rainfall gages, via:
$$R^s_i = \sum_m X^s_{i,m}, \tag{1}$$
where $X^s_{i,m}$ represents the rainfall depth on day $i$ and at subdaily time step $m$, at nearby station $s$. For the present study we have subdaily rainfall available in increments of 6 min, such that $m \in \{1, \ldots, 240\}$. We also obtain the subdaily fragments given by $fr^s_{i,m} = X^s_{i,m} / R^s_i$, which is a dimensionless version of the subdaily rainfall record.
Step 3: For each wet day $R^o_t > 0$, search for days with similar daily rainfall depth across all nearby stations $s = 1, \ldots, S$, where $S$ represents the total number of nearby stations, across every year for which subdaily data is
Figure 3. Illustration of the state-based method of fragments algorithm. The indicated steps correspond
to the steps of the algorithm in section 3.1.
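The steps illustrated in Figure 3 can be sketched in code. The following is a minimal, non-authoritative Python sketch of Steps 2 to 5 for a single wet target day, not the implementation used in the paper. The record layout (dictionary keys `doy`, `rain`, `prev_wet`, `next_wet`, `subdaily`), the function names, the wrap-around of the seasonal window at the end of the year, and the choice to return `None` when no neighbor lies within the 10% tolerance are all assumptions made for illustration.

```python
import random

def fragments(subdaily):
    """Step 2: daily total (equation (1)) and dimensionless fragments for one wet day."""
    total = sum(subdaily)
    return total, [x / total for x in subdaily]

def rank_weights(k):
    """Step 5 (equation (2)): probability of picking rank j is (1/j) / sum_{i=1..k} 1/i."""
    norm = sum(1.0 / i for i in range(1, k + 1))
    return [(1.0 / j) / norm for j in range(1, k + 1)]

def candidate_days(target, pool, half_window=15):
    """Step 3: pooled wet days inside the seasonal window with matching wetness state."""
    out = []
    for day in pool:
        gap = abs(day["doy"] - target["doy"])
        gap = min(gap, 365 - gap)  # assumption: window wraps around the year end
        if (gap <= half_window
                and day["prev_wet"] == target["prev_wet"]
                and day["next_wet"] == target["next_wet"]
                and sum(day["subdaily"]) > 0):
            out.append(day)
    return out

def disaggregate_day(target, pool, k_max=10, tol=0.10, rng=random):
    """Steps 3-5 for one wet target day; returns subdaily depths, or None."""
    r_target = target["rain"]
    # Step 4: rank candidates by absolute deviation in daily depth.
    ranked = sorted(candidate_days(target, pool),
                    key=lambda d: abs(sum(d["subdaily"]) - r_target))
    # Keep neighbors within 10% of the at-site depth, up to k_max of them.
    close = [d for d in ranked
             if abs(sum(d["subdaily"]) - r_target) < tol * r_target][:k_max]
    if not close:
        return None  # assumption: the fallback for this case is left open
    # Step 5: draw one neighbor with rank-based probabilities.
    chosen = rng.choices(close, weights=rank_weights(len(close)))[0]
    _, frag = fragments(chosen["subdaily"])
    # Rescale the fragments so the day sums to the target daily rainfall.
    return [f * r_target for f in frag]
```

In this sketch `pool` would hold the wet days from all S nearby stations and all years, so the candidate count n in Step 4 is simply the length of the filtered list.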
available. For example, if we consider 20 nearby stations which each have an average length of record of 9 yr, this would amount to a total of 180 yr of record. To preserve seasonality, we only look within a moving window of ±15 days centered on day $t$. In other words, if $t = 45$ (14 February), we search for days from $t = 30$ to $t = 60$. Furthermore, to account for continuity across the boundaries we only look at wet days with the same previous- and next-day wetness state (i.e., $I[R^s_{i-1}] = I[R^o_{t-1}]$ and $I[R^s_{i+1}] = I[R^o_{t+1}]$), where $I(\cdot)$ represents a binary indicator function defined as $I(R) = 1$ for a wet day and $I(R) = 0$ for a dry day.
Step 4: We use an index $j = 1, \ldots, n$ to refer to days which are within the moving window and have the same previous- and next-day wetness state, with the total number of days $n$ being calculated across all $S$ nearby stations and across all years of record at each station. These days are ranked by absolute deviation in rainfall depth $|R^s_j - R^o_t|$, to construct a sorted series $R^s_{(j)}$ from the smallest absolute deviation to the largest, where the use of parentheses indicates that the data has been sorted. We find the $k$ nearest neighbors $(j) = 1, \ldots, (k)$, with the value of $k$ selected to ensure all the neighbors have an absolute deviation in rainfall depth of less than 10% of the at-site rainfall, up to a maximum of 10 nearest neighbors.
Step 5: Randomly draw from $R^s_{(j)}$ with probability:
$$P(j) = \frac{1/(j)}{\sum_{i=1}^{k} 1/i}, \tag{2}$$
where $P(j)$ represents the probability of selecting neighbor $(j)$ [Lall and Sharma, 1996; Mehrotra and Sharma, 2006]. The selected fragment is then inserted into day $R^o_t$ via $X^o_{t,m} = fr^s_{(j),m} R^o_t$.
[13] This completes the algorithm for the regionalized method of fragments. In total, there are three "tuning" parameters: the number of nearby stations $S$ to include in the model, the number of nearest neighbors $k$, and the width of the moving window. Given the large amount of variability in record length from one subdaily rainfall station to the next, we let $S$ vary such that sufficient stations were selected to have at least 250 yr of record from which to sample. The value of $k = 10$ was chosen as this ensured the daily total rainfall for each fragment was within a relatively small tolerance of $R^o_i$, while still ensuring a significant amount of induced sampling variability. Sensitivity to each of these tuning parameters was evaluated and found to be fairly limited. Finally, the width of the moving window was selected so as to ensure that samples were all drawn from the same time of year.
[14] Although the overall approach is conceptually simple, the challenge is to define the neighborhood from which to sample the $S$ pluviograph records. The basis for identifying whether the daily-to-subdaily scaling at two locations is similar and thus substitutable is described below.
3.2. Daily-to-Subdaily Scaling
[15] To enable substitution of subdaily fragments from one station to another, one needs to ensure that for any day $t$, the conditional relationship between the daily rainfall amount $R_t$ and the full sequence of subdaily rainfall $X_{t,m}$ is statistically similar at both the target station and the nearby stations. This can be expressed as
$$f(X^s_{t,m} \mid R^s_t) = f(X^o_{t,m} \mid R^o_t) \tag{3}$$
for all $m$ and $t$, where $f(\cdot \mid \cdot)$ is used to express a conditional probability density function. Given the difficulty of constructing
separate conditional density functions for 240 separate increments of subdaily rainfall, as well as the fact that for any wet day $R_t$ there is a high probability that any subdaily rainfall increment $X_{t,m}$ has no rainfall, we modify equation (3) as follows:
$$f(Y^s_t \mid R^s_t) = f(Y^o_t \mid R^o_t), \tag{4}$$
where $Y^s_t$ and $Y^o_t$ represent scalar attributes of $X^s_{t,m}$ and $X^o_{t,m}$ for each day of record, respectively. The attributes to be considered include:
[16] Maximum intensity: for each wet day, what is the maximum 6-, 12-, 30-, 60-, 120-, 180-, and 360-min duration storm burst expressed as a fraction of the total rainfall amount for that day?
[17] Fraction of zeros: for each day, what is the fraction of 6-min time steps with no rainfall?
[18] Maximum intensity timing: for each wet day, what is the time of day when the maximum 6-, 12-, 30-, 60-, 120-, 180-, and 360-min duration storm burst occurs?
[19] In combination, these scalar attributes are expected to cover most of the information on the scaling and timing behavior between daily rainfall and the fragments.
[20] To illustrate these concepts, we present in Figure 4 the joint probability plot of daily rainfall and the maximum 12-min storm burst at three locations in Australia: Hobart, Sydney, and Darwin. These locations were selected as they have distinctly different climatology, with Hobart located in the south of Tasmania being one of the most southerly pluviograph records, Darwin in the Northern Territory being one of the most northerly pluviograph records, and Sydney being situated along the Australian east coast.
[21] As can be seen in the daily rainfall histogram (Figure 4, lower panel), the marginal probabilities of daily rainfall at each station are distinctly different. For example, Darwin has a high probability of high daily rainfall amounts (the majority of rain days having >10 mm rainfall), whereas Hobart has a large number of rain days with relatively little rainfall, with most days having significantly less than 10 mm over the entire day. It should be emphasized, however, that our interest here is not on this marginal
Figure 4. Scatterplot with daily rainfall and an attribute of subdaily rainfall (the maximum 12-min
storm burst expressed as a fraction of the total daily rainfall) at three locations in Australia : Hobart
(blue), Sydney (green), and Darwin (red). Histograms of daily rainfall and the maximum 12-min storm
burst are provided in the bottom and left figure panels, respectively, for each of the three locations. The
solid lines are loess smoothers of the observations, and are provided for visualization purposes only.
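The attribute plotted in Figure 4, together with the other scalar attributes listed in section 3.2, could be computed from one day of 6-min data roughly as follows. This is an illustrative Python sketch only: the function names and dictionary keys are hypothetical, and reporting the timing as the start of the burst (in minutes after the start of the day) is an assumption, since the text does not say whether the start, middle, or end of the burst is used.

```python
def max_burst(subdaily, steps):
    """Largest depth over any `steps` consecutive increments, and its start index."""
    window = sum(subdaily[:steps])
    best, best_start = window, 0
    for start in range(1, len(subdaily) - steps + 1):
        # Slide the window one increment to the right.
        window += subdaily[start + steps - 1] - subdaily[start - 1]
        if window > best:
            best, best_start = window, start
    return best, best_start

def daily_attributes(subdaily, step_minutes=6,
                     durations=(6, 12, 30, 60, 120, 180, 360)):
    """Scalar attributes of one wet day of subdaily rainfall."""
    total = sum(subdaily)
    if total <= 0:
        raise ValueError("attributes are defined for wet days only")
    attrs = {
        # Fraction of time steps with no rainfall.
        "fraction_zeros": sum(1 for x in subdaily if x == 0) / len(subdaily),
    }
    for minutes in durations:
        depth, start = max_burst(subdaily, minutes // step_minutes)
        # Maximum burst as a fraction of the daily total.
        attrs[f"max_fraction_{minutes}min"] = depth / total
        # Assumption: timing is the burst start, in minutes after the day begins.
        attrs[f"max_start_{minutes}min"] = start * step_minutes
    return attrs
```

Applied to every wet day at a station, each attribute paired with the daily total gives one bivariate sample of the kind compared between stations in equation (4).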
distribution; rather, we wish to know, conditional on some daily rainfall amount, whether the subdaily rainfall properties are the same at any two locations. To determine whether this is the case, we started by plotting a loess smoother [Hastie et al., 2009] with support of 25% of the sample to represent the conditional expected value of the maximum 12-min storm burst as a function of daily rainfall.
[22] It is evident that the fraction of daily rainfall con-
The two-dimensional K-S statistic $D$ is the maximum difference (ranging over both data points and quadrants) of the integrated probabilities, and is given by [Press et al., 1992]:
$$\Pr(D > \text{observed}) = Q_{KS}\left(\frac{\sqrt{N}\,D}{1 + \sqrt{1 - r^2}\,\left(0.25 - 0.75/\sqrt{N}\right)}\right), \tag{6}$$
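For readers who wish to evaluate equation (6) numerically, a minimal Python sketch follows. It assumes the standard series for the Kolmogorov-Smirnov function, $Q_{KS}(\lambda) = 2\sum_{j \ge 1} (-1)^{j-1} e^{-2 j^2 \lambda^2}$. The function names are hypothetical, and the effective sample size `n_eff` and correlation term `r` are taken as inputs because their definitions are not reproduced in this excerpt (in the two-sample form of this test the effective size is commonly $N_1 N_2 / (N_1 + N_2)$).

```python
import math

def q_ks(lam, terms=100):
    """Kolmogorov-Smirnov function Q_KS(lambda), evaluated by its alternating series."""
    if lam < 0.2:
        # Q_KS is 1 to within about 1e-12 here, and the series converges slowly.
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))

def ks2d_significance(d, n_eff, r):
    """Approximate Pr(D > observed) for the two-dimensional K-S statistic, equation (6)."""
    root_n = math.sqrt(n_eff)
    lam = root_n * d / (1.0 + math.sqrt(1.0 - r * r) * (0.25 - 0.75 / root_n))
    return q_ks(lam)
```

A small returned probability indicates that the two bivariate samples are unlikely to share a distribution; the threshold used to classify a station pair as similar is not specified by this sketch.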
Table 1. Predictors Used for the Logistic Regression Model Described in Equations (9) and (10)^a
Predictor | Units | Description/Comments
Diff_lat | Degrees (expressed as a decimal) | Difference in latitude between each station pair, calculated as abs(Lat1 - Lat2)
Diff_lon | Degrees (expressed as a decimal) | Difference in longitude between each station pair, calculated as abs(Lon1 - Lon2)
Diff_lat × Diff_lon | Degrees (expressed as a decimal) | Interaction term, which would be greater than zero if it is the distance between stations, rather than the sum of the latitude and longitude, which is the dominant predictor.
Diff_dist_coast (normalized) | Dimensionless | Difference in distance to coast between each station pair, normalized by the average distance to coast for the station pair, calculated as abs(dist1 - dist2)/mean(dist1, dist2).
Diff_elev | Meters | Difference in elevation between each station pair, calculated as abs(Elev1 - Elev2)
^a The prefix "Diff_" emphasizes that it is the difference in each of the predictors between stations that is considered, rather than the absolute value.
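The predictors in Table 1 can be computed for a station pair as in the following Python sketch. The station dictionary layout (`lat`, `lon`, `dist_coast`, `elev`), the key name used for the interaction term, and the handling of a pair with zero mean distance to coast are assumptions for illustration.

```python
def pair_predictors(a, b):
    """Table 1 predictors for one station pair (units as in the table)."""
    diff_lat = abs(a["lat"] - b["lat"])
    diff_lon = abs(a["lon"] - b["lon"])
    mean_dist = (a["dist_coast"] + b["dist_coast"]) / 2.0
    # Assumption: a pair with both stations exactly on the coast gets zero difference.
    diff_dist = abs(a["dist_coast"] - b["dist_coast"]) / mean_dist if mean_dist > 0 else 0.0
    return {
        "Diff_lat": diff_lat,
        "Diff_lon": diff_lon,
        "Diff_lat_x_Diff_lon": diff_lat * diff_lon,  # interaction term
        "Diff_dist_coast": diff_dist,                # normalized, dimensionless
        "Diff_elev": abs(a["elev"] - b["elev"]),
    }
```

Because every predictor is an absolute difference, the result is symmetric in the two stations.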
Figure 5. Logistic regression results against a single predictor (difference in latitude) and four responses representing different subdaily attributes. The responses have been calculated using the two-sample two-dimensional Kolmogorov-Smirnov test statistic.
length 26,796 (where $u \in \{0, 1\}$ represents the cases where the scaling between daily and subdaily rainfall at two stations is statistically different and similar, respectively, as calculated by the Kolmogorov-Smirnov test described in section 3.2). This relationship can be modeled using a logistic regression, in which
$$\Pr(u = 1) = \mathrm{logit}(z) = \frac{e^z}{e^z + 1} \tag{9}$$
transforms the continuous predictor variables to the range [0,1] as required when modeling a binomial response. In this equation, $z$ is defined as
$$z = \beta_0 + \beta_1 v_1 + \ldots + \beta_5 v_5, \tag{10}$$
with $\beta$ representing the regression coefficients. The results of the logistic regression model are shown in Figure 5, plotted against the difference in latitude. The results are presented for four attributes of subdaily rainfall: 6-min maximum storm burst, 1-h maximum storm burst, fraction of day with no rainfall, and time of day with the maximum 6-min storm burst. Note that for the time attribute, we are only considering the marginal distribution of the time of day when the maximum 6-min storm burst occurs, rather than a joint density.
[30] As can be seen in Figure 5, with the exception of the fraction of zeros measured by the K-S statistic, there is a chance between 40% and 60% that the joint distribution of daily rainfall and each of the attributes is statistically similar provided that the difference in latitude is small, with the probability decreasing rapidly with increasing difference in latitude. This is interesting, as no account is made of any other physiographic information, so that stations may be located on opposite sides of the continent, or at very different elevations, and yet still have close to a 50% chance of having the same scaling between daily and subdaily rainfall, provided the latitude is the same. The joint distribution of daily rainfall and fraction of zeros has the lowest probability of being statistically similar between station pairs, with a chance of 22% that two stations will have the same joint dependence, assuming they are at the same latitude.
[31] Considering just a single covariate, the difference in latitude, as the only factor influencing the similarity between stations ignores other physiographic information which may be important. As such, we extend this model to a multivariate logistic regression setting to consider the influence of each of the plausible predictors mentioned above. The conceptual basis for this approach is illustrated in Figure 6. Given a target location of interest, we wish to define a zone for which the probability that the daily-to-subdaily scaling at two stations is statistically similar is greater than a predefined threshold. This zone is described by contours of equal probability, with the probability decreasing linearly (in the logistic transformed space) in each of the dimensions of the regression model. The shapes of the contours are defined by the logistic regression coefficients. In the idealized example in Figure 6, we represent the case where the probability of two stations being statistically similar decreases at a faster rate in the latitude dimension compared to the longitude dimension. Furthermore, the location of the target station is slightly offset from the center of the contours, this being governed by the influence of the relative difference in distance to the coast.
[32] The results of this multivariate regression are presented in Table 2, and once again plotted for the summer
Figure 6. Diagrammatic representation of logistic regression results. The response is the probability
that the joint distribution of daily rainfall amount and some attribute of subdaily rainfall at a ‘‘nearby’’
station is statistically similar to the target station. The predictors are the difference in latitude, longitude, latitude × longitude, elevation, and a normalized distance to coast, with the logistic regression coefficients determining the relative decrease in the probability that two stations are similar in each of these dimensions.
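The probability surface sketched in Figure 6 follows from equations (9) and (10). A minimal Python sketch is given below; the function name, the predictor keys, and the coefficient values in the usage note are hypothetical placeholders and are not the fitted coefficients reported in Table 2.

```python
import math

def similarity_probability(predictors, coefficients, intercept):
    """Pr(u = 1) from equations (9) and (10) for one station pair."""
    # Equation (10): linear predictor z = beta_0 + sum_i beta_i * v_i.
    z = intercept + sum(coefficients[name] * value
                        for name, value in predictors.items())
    # Equation (9): e^z / (e^z + 1), written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For example, with an intercept of 0.5 and a coefficient of -0.8 on a single latitude-difference predictor (both invented for illustration), the probability falls from about 0.62 at zero difference toward zero as the difference grows, which is the qualitative behavior described for the contours.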
months against latitude in Figure 7, with the remaining predictors held at zero. As can be seen, the results in Figure 7 show notable improvements in the probability that two stations are similar compared to Figure 5, because we are now plotting the influence of latitude assuming that differences in longitude, elevation, and relative distance to coast are all zero. In fact, with the exception of the fraction of zeros, the results show that for small values of each of the predictors there is between a 60% and 70% probability that the daily-to-subdaily joint probability distributions are statistically similar. Once again, the fraction of zeros is the most challenging statistic in terms of maintaining similarity, with only a chance of 40% that two stations have the same scaling, assuming all the predictors are zero.
[33] It should be emphasized that this is in many ways a conservative estimate as we consider the subdaily attributes (e.g., fraction of zeros, 6-min rainfall intensity), which are the most challenging to capture from daily data alone. Even more importantly, as can be seen in the example of Figure 4, the number of samples in each bivariate distribution is large (30 years of data, 90 days per season, and 30% of days being wet days yields approximately 800 wet days), such that the 95% confidence intervals are very narrow (as the width of the confidence intervals is governed by sample size).
Figure 8. Sydney Airport (large red dot) and nearby pluviograph stations (blue and brown dots). The highest-ranked 13 pluviograph stations (totaling 250 yr of pluviograph data) based on the full logistic regression model are shown as brown dots, with the associated ranking.
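The selection shown in Figure 8, in which stations are ranked by their modeled probability of similarity and added until at least 250 yr of pluviograph record is pooled, could be sketched as follows. This is an illustrative Python sketch; the function name and the `probability` and `years` keys are hypothetical, and how several attribute-specific probabilities are combined into one ranking is not addressed here.

```python
def select_stations(candidates, min_years=250.0):
    """Pick the highest-ranked stations until the pooled record reaches min_years."""
    ranked = sorted(candidates, key=lambda s: s["probability"], reverse=True)
    chosen, total_years = [], 0.0
    for station in ranked:
        if total_years >= min_years:
            break
        chosen.append(station)
        total_years += station["years"]
    return chosen, total_years
```

The number of selected stations S therefore varies with local record lengths, consistent with letting S vary so that at least 250 yr of record is available for sampling.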
to Sydney Airport) are those which are most proximate to this station, generally within a small distance to coast, and all are at low coastal elevations. In this case, therefore, the stations appear to be selected over a wide range of latitudes, which is probably due to the strong increases in elevation and relative distance to the coast with changing longitude.
4.2. Model Evaluation
[43] We now repeat the process of identifying nearby stations at five locations across Australia each having more than 50 yr of pluviograph data, representing a diversity of climate zones. These stations are shown in Table 3. Having identified the pool of nearby stations from which to draw the fragments, we apply the approach described in algorithm 1 to draw subdaily rainfall fragments from nearby stations conditional on at-site daily rainfall, and compare these sequences to the at-site pluviograph records. For comparison purposes, we also generated results using the algorithm but with at-site data only, and present these alongside the regionalized results.
[44] It is emphasized that the use of a disaggregation model derived using observed daily rainfall sequences implies that the daily- and longer-timescale statistics will be identical to the observational data set. As such, the selection of evaluation statistics should focus on the capacity of the model to simulate rainfall at subdaily timescales. Reflecting the likely application of this model for flood estimation, the statistics considered here are based on whether the model is capable of reproducing the extreme rainfall intensity, and whether the model captures the antecedent rainfall prior to the flood-producing rainfall event. In addition, several statistics have been calculated to determine the connectivity of rainfall events between successive wet days.
[45] Considering first the annual maxima statistics, we present in Figure 9 a plot of the annual maximum 6-min rainfall against the exceedance probability for both the observed data at the target location and the results of 100 simulation runs with the same length of series as the original target pluviograph time series. The left column represents the results using at-site data as the basis for resampling, while the right column represents results using data from nearby records. The median and the 5th and 95th percentiles are calculated empirically from these 100 simulation runs, with the 5th and 95th percentile values measuring the degree of sampling variability induced by the stochastic generation algorithm.
[46] As can be seen, the observed data is generally within the sampling interval for most of the stations, with the exception of Alice Springs, for which the generated sequences tend to overestimate rainfall for all exceedance probabilities, and Hobart, for which the simulated sequences underestimate the low exceedance probability rainfall events. Interestingly, this is observed for both results using at-site data and nearby station data, highlighting that the issue is unlikely to be related to the regionalization procedure. In fact, a more thorough examination indicates that the annual maxima of the daily rainfall obtained using the daily rainfall record are on average slightly higher than the annual maxima of daily rainfall obtained from the subdaily rainfall record, due to the daily record being more complete (i.e., with fewer missing days) than the subdaily record. The issue is particularly notable for Alice Springs and Hobart, which both have a significant percentage of the pluviograph record classified as missing, such that resampling the subdaily fragments conditional on the daily rainfall record would be expected to yield simulated series which on average have higher annual maximum rainfall at both daily and subdaily durations.
[47] In addition to this issue, it was noted that the largest 6-min value for the Hobart Airport record was well in excess of the simulated results, with this being noticeable for both the at-site and regionalized results. In particular, the maximum recorded 6-min storm burst was 23.14 mm occurring on 24 April 1972, representing a very intense storm burst for such a high latitude. Aggregating the pluviograph record for that full day showed 192.2 mm falling, which contrasted with the daily station at the same location recording only 42.2 mm for that day. We also examined the nearest pluviograph and daily-read station pairing, namely gage number 94029 located 15.6 km from the Hobart Airport gage, and found the aggregated daily rainfall from the pluviograph to be 27.94 mm, compared with 27.9 mm from the daily rain gage at that same location. Furthermore, the maximum 6-min increment rainfall intensity was found to be 1.74 mm, substantially smaller than that recorded at Hobart Airport. This therefore indicates that a recording error probably occurred at the pluviograph gage at Hobart Airport.
[48] For both reasons, we suggest that the simulated results in this case may be more likely to reflect the precipitation patterns at each location compared with the observed subdaily record at those same locations, although this conclusion is unlikely to apply everywhere. Comparing the sampling intervals for the at-site and regionalized results, it can be seen that the regionalized intervals are generally smoother, and tend to widen for higher events. In contrast, the at-site results tend to have narrower sampling intervals for the events with the lowest exceedance probabilities, reflecting the small sample size from which to draw subdaily fragments for these large events. Therefore, rather
Figure 9. Six-minute annual maximum rainfall against exceedance probability for (a) Sydney, (b) Perth, (c) Alice Springs, (d) Cairns, and (e) Hobart. Black dots represent observed data, the black solid line represents the median of 100 simulations, and black dotted lines represent the 5th- and 95th-percentile simulated values.
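The quantities plotted in Figure 9 can be derived as in the following Python sketch. It is a hedged illustration rather than the plotting procedure used in the paper: the function names are hypothetical, the Weibull plotting position rank/(n + 1) is an assumption (the plotting position is not stated in the text), and percentiles are taken by linear interpolation between order statistics.

```python
def exceedance_points(annual_maxima):
    """Pair each annual maximum with an empirical exceedance probability."""
    ranked = sorted(annual_maxima, reverse=True)
    n = len(ranked)
    # Assumption: Weibull plotting position rank / (n + 1).
    return [(rank / (n + 1), value) for rank, value in enumerate(ranked, start=1)]

def percentile(values, q):
    """Empirical q-th percentile with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def simulation_bounds(simulated_maxima):
    """5th, 50th, and 95th percentiles across runs at each rank.

    simulated_maxima is a list of runs, each a list of annual maxima
    with the same number of years as the observed record.
    """
    ranked_runs = [sorted(run, reverse=True) for run in simulated_maxima]
    bounds = []
    for rank in range(len(ranked_runs[0])):
        at_rank = [run[rank] for run in ranked_runs]
        bounds.append((percentile(at_rank, 5),
                       percentile(at_rank, 50),
                       percentile(at_rank, 95)))
    return bounds
```

With 100 simulated runs, each rank has 100 values from which the median and the 5th- and 95th-percentile bounds are read, giving the sampling interval against which the observed annual maxima are compared.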
than resulting in a deterioration in performance, it is likely that the regionalized version actually provides a better representation of the sampling intervals for these very large events.
[49] In addition to the results presented in Figure 9 for the 6-min duration annual maxima, we also tabulated the results for other durations up to 12 h, presented in Table 4. Once again, the observed and simulated sequences are generally similar, with the median-sampled value within 10% of the observed value, and no obvious systematic under- or overestimation biases. The exception here is Hobart, in which the annual maxima are typically undersimulated by 10%–20%. It should be noted, however, that the observed rainfall often falls outside of the 5th- and 95th-percentile simulation bounds, highlighting that the simulation bounds may underestimate the true level of variance. Finally, although the results are only presented in Table 4 for the regionalized method of fragments, the results from the at-site implementation are comparable, again highlighting that the regionalized method of fragments does not result in any notable deterioration in model performance.
[50] We next consider the antecedent rainfall prior to the design storm burst event, plotted in Figure 10. We focus on the antecedent rainfall exceedance probability plot because of the often important relationship between the "flood-producing" rainfall event and the catchment wetness prior to the event [Kuczera et al., 2006]. We only focus on the 6-h antecedent rainfall depth, as antecedent conditions for longer durations (particularly multiday antecedent rainfall depth) will be correctly captured as we are using observed daily rainfall data at the location of interest.
[51] As can be seen, the simulated data appear to follow the observed data reasonably well, although there are several points outside the 90% sampling interval. Importantly, no systematic biases could be identified, with performance varying depending on the location. This is also shown in the lower half of Table 4 with the antecedent rainfall of different durations prior to the 1-h storm burst. The observed antecedent rainfall is generally within the 90% sampling interval, with the exception of Cairns, in which antecedent rainfall is underestimated for the 6-h depth prior to the 1-h storm burst and overestimated for longer durations. Once again, the main outlier is Hobart Airport; however, as discussed in the context of the annual maxima, this is likely due to a recording error in the pluviograph record.
[52] Finally, we present results addressing the connectivity in rainfall events between successive wet days. This is a potential issue with the conventional method of fragments logic described in the literature [Lall and Sharma, 1996; Nowak et al., 2010; Sharma et al., 1997; Snavidze, 1977; Tarboton et al., 1998], as the subdaily fragments are, in effect, randomly reordered such that by definition all of the within-day rainfall characteristics will be preserved, but the between-day characteristics will be lost other than ensuring that the daily total rainfalls are maintained. This was one of the primary justifications for using the state-based method of fragments, in which the fragments are selected conditional on both the current-day wetness and the previous- and next-day wetness state.
[53] The results are presented in Table 5 for all five test locations. For each location, the first four rows represent the probability that the last hour of day t (represented as X_{t,24}) is wet or dry given that the next day is wet or dry.
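The state-based selection described in paragraph [52] can be illustrated with a short Python sketch that disaggregates a single wet day. This is a minimal sketch under stated assumptions, not the implementation used in this study: the DonorDay container, the disaggregate_day function, the restriction to the k donor days closest in daily total, and the uniform random choice among them are all hypothetical choices made for illustration. Seasonal windows and the regional selection of donor stations are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class DonorDay:
    """One wet day taken from a pluviograph record (hypothetical container)."""
    total: float       # daily rainfall total
    prev_wet: bool     # was the previous day wet?
    next_wet: bool     # is the next day wet?
    fragments: list    # subdaily fractions of the daily total, summing to 1


def disaggregate_day(target_total, prev_wet, next_wet, donors, k=3, rng=random):
    """Draw subdaily rainfall amounts for one wet target day."""
    # Keep only donor days that share the target day's wetness state.
    same_state = [d for d in donors
                  if d.prev_wet == prev_wet and d.next_wet == next_wet]
    if not same_state:
        raise ValueError("no donor day with a matching wetness state")
    # Assumption: consider the k donors closest in daily total, pick one uniformly.
    nearest = sorted(same_state, key=lambda d: abs(d.total - target_total))[:k]
    donor = rng.choice(nearest)
    # Rescale the donor's fractions so that the target daily total is preserved.
    return [f * target_total for f in donor.fragments]
```

Because the fragments are stored as fractions of the donor day's total, the disaggregated amounts always sum to the target daily rainfall, which is the property the method of fragments is designed to maintain.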
Table 4. Comparison of Observed and Simulated Results for Median Annual Maxima for Different Storm Burst Durations and Antecedent Rainfall Prior to 1-h Storm Burst^a
Duration  Sydney            Perth             Alice Springs     Cairns            Hobart
          Obs.   Sim.       Obs.   Sim.       Obs.   Sim.       Obs.   Sim.       Obs.   Sim.
Annual Maxima
6 min     8.9    8.8        6.2    6.2        5.5    6.8        11.6   11.8       4.5    3.8
          (8.14–9.32)       (5.77–6.81)       (6.32–7.2)        (11.15–12.65)     (3.4–4.12)
30 min    25.7   23.7       14.7   14.0       16.7   18.2       34.9   35.3       11.3   8.9
          (21.95–25.92)     (13.08–15.26)     (17.09–19.45)     (33.96–37.05)     (8.22–9.55)
1 h       35.4   32.6       18.8   18.4       22.1   24.2       51.7   51.9       14.6   12.0
          (30.04–35.45)     (16.95–19.72)     (22.5–25.75)      (49.76–54.79)     (11.26–12.85)
3 h       55.4   49.4       29.0   27.9       32.6   33.6       83.5   85.1       22.9   19.5
          (46.46–52.47)     (26.18–29.89)     (31.42–35.16)     (81.33–89)        (18.54–20.56)
6 h       72.3   64.0       36.3   35.4       39.6   39.8       113.0  110.8      30.3   26.5
          (61.08–67.09)     (34.09–37.53)     (37.65–41.42)     (106.12–114.22)   (25.69–27.75)
12 h      91.8   84.8       45.4   44.5       48.2   46.5       147.4  140.7      39.6   35.3
          (81.92–87.2)      (43.46–45.63)     (45.42–47.65)     (137.24–144.39)   (34.58–36.26)
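The quantities compared in Table 4 rest on two simple operations: finding the largest burst of a given duration within a year of subdaily data, and summarizing a set of simulated values by their median and 5th- and 95th-percentile bounds. The following Python sketch shows one way these could be computed. The function names and the linear-interpolation percentile rule are assumptions made for illustration and are not taken from the study.

```python
def annual_max_burst(series, steps):
    """Largest rainfall total over any `steps` consecutive increments.

    `series` holds one year of rainfall increments (e.g., 6-min depths,
    so that a 1-h burst corresponds to steps=10).
    """
    window = sum(series[:steps])
    best = window
    for i in range(steps, len(series)):
        window += series[i] - series[i - steps]   # slide the window by one step
        best = max(best, window)
    return best


def percentile(values, q):
    """Percentile of `values` for q in [0, 100], using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def simulation_summary(simulated_values):
    """Median and 5th/95th-percentile bounds across a set of simulations."""
    return (percentile(simulated_values, 50),
            percentile(simulated_values, 5),
            percentile(simulated_values, 95))
```

In such a workflow, annual_max_burst would be applied to each year and duration, the median across years taken for each simulated sequence, and simulation_summary applied across the 100 simulated sequences.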
Figure 10. Six-hour antecedent rainfall prior to the 6-min annual maximum storm burst plotted against
exceedance probability for (a) Sydney, (b) Perth, (c) Alice Springs, (d) Cairns, and (e) Hobart. Black
dots represent observed data, the black solid line represents the median of 100 simulations, and the black dotted
lines represent the 5th- and 95th-percentile simulated values.
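The antecedent rainfall statistic plotted in Figure 10 can be sketched in Python as follows: locate the largest storm burst of a given duration in a year of subdaily data, then sum the rainfall over a fixed window immediately before the burst begins. This is a minimal sketch; the function name, the tie-breaking rule (the first occurrence of the maximum is used), and the truncation of the window at the start of the series are assumptions made for illustration.

```python
def antecedent_rainfall(series, burst_steps, antecedent_steps):
    """Return (burst_total, antecedent_total) for the largest burst in `series`.

    With 6-min increments, a 6-min burst is burst_steps=1 and a 6-h
    antecedent window is antecedent_steps=60.
    """
    if len(series) < burst_steps:
        raise ValueError("series is shorter than the burst duration")
    best_total, best_start = float("-inf"), 0
    for start in range(len(series) - burst_steps + 1):
        total = sum(series[start:start + burst_steps])
        if total > best_total:          # strict '>' keeps the first maximum
            best_total, best_start = total, start
    # Antecedent window ends where the burst begins; truncate at the series start.
    first = max(0, best_start - antecedent_steps)
    return best_total, sum(series[first:best_start])
```

Repeating this for every year, and ranking the resulting antecedent totals, gives the points needed for an exceedance probability plot of the kind shown in Figure 10.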
                                                         Observed   Conventional MoF^b   State-Based MoF^b
Sydney Airport
Pr(X_{t,24} > 0, R_{t+1} > 0 | R_t > 0)                  19.9%      13.1%                18.3%
Pr(X_{t,24} > 0, R_{t+1} = 0 | R_t > 0)                  5.2%       11.2%                6.4%
Pr(X_{t,24} = 0, R_{t+1} > 0 | R_t > 0)                  28.0%      34.9%                29.7%
Pr(X_{t,24} = 0, R_{t+1} = 0 | R_t > 0)                  46.9%      40.8%                45.6%
Pr(X_{t,24} > 0, X_{t+1,1} > 0 | R_t > 0, R_{t+1} > 0)   31.7%      6.7%                 14.7%
Perth Airport
Pr(X_{t,24} > 0, R_{t+1} > 0 | R_t > 0)                  21.6%      15.9%                21.3%
Pr(X_{t,24} > 0, R_{t+1} = 0 | R_t > 0)                  5.2%       11.7%                7.4%
Pr(X_{t,24} = 0, R_{t+1} > 0 | R_t > 0)                  30.9%      36.5%                31.1%
Pr(X_{t,24} = 0, R_{t+1} = 0 | R_t > 0)                  42.3%      35.9%                40.2%
Pr(X_{t,24} > 0, X_{t+1,1} > 0 | R_t > 0, R_{t+1} > 0)   26.7%      6.8%                 14.3%
15590
Pr(X_{t,24} > 0, R_{t+1} > 0 | R_t > 0)                  16.3%      6.5%                 11.3%
Pr(X_{t,24} > 0, R_{t+1} = 0 | R_t > 0)                  6.7%       7.9%                 4.0%
Pr(X_{t,24} = 0, R_{t+1} > 0 | R_t > 0)                  24.8%      34.6%                29.7%
Pr(X_{t,24} = 0, R_{t+1} = 0 | R_t > 0)                  52.2%      51.0%                55.0%
Pr(X_{t,24} > 0, X_{t+1,1} > 0 | R_t > 0, R_{t+1} > 0)   28.6%      2.4%                 8.3%
31011
Pr(X_{t,24} > 0, R_{t+1} > 0 | R_t > 0)                  21.2%      15.4%                19.8%
Pr(X_{t,24} > 0, R_{t+1} = 0 | R_t > 0)                  3.6%       6.4%                 4.3%
Pr(X_{t,24} = 0, R_{t+1} > 0 | R_t > 0)                  44.0%      49.9%                45.4%
Pr(X_{t,24} = 0, R_{t+1} = 0 | R_t > 0)                  31.2%      28.4%                30.5%
Pr(X_{t,24} > 0, X_{t+1,1} > 0 | R_t > 0, R_{t+1} > 0)   20.9%      5.3%                 8.9%
94008
Pr(X_{t,24} > 0, R_{t+1} > 0 | R_t > 0)                  14.0%      9.6%                 14.5%
Pr(X_{t,24} > 0, R_{t+1} = 0 | R_t > 0)                  4.4%       10.4%                5.6%
Pr(X_{t,24} = 0, R_{t+1} > 0 | R_t > 0)                  30.1%      34.5%                29.6%
Pr(X_{t,24} = 0, R_{t+1} = 0 | R_t > 0)                  51.4%      45.5%                50.2%
Pr(X_{t,24} > 0, X_{t+1,1} > 0 | R_t > 0, R_{t+1} > 0)   23.6%      4.5%                 10.9%
^a The first four rows provide the probability that the last hour of day t (X_{t,24}) is wet/dry given that the next day t + 1 is wet/dry, with the probabilities summing to 100%. The fifth row is the probability that the last hour of day t and the first hour of day t + 1 are both wet.
^b MoF, Method of Fragments.
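The connectivity statistics reported in Table 5 can be computed from any sequence of hourly rainfall, whether observed or simulated. The Python sketch below shows one way to do this. It is a minimal sketch: the function name and the representation of each day as a list of 24 hourly totals are assumptions made for illustration.

```python
def connectivity_stats(days):
    """Connectivity statistics for `days`, a date-ordered list of days,
    each given as a list of 24 hourly rainfall totals.

    Returns (first_four, fifth):
      first_four maps (last_hour_wet, next_day_wet) to its probability
      given that day t is wet; fifth is the probability that the last hour
      of day t and the first hour of day t + 1 are both wet, given that
      both days are wet.
    """
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    wet_pairs = 0
    joined = 0
    for today, tomorrow in zip(days, days[1:]):
        if sum(today) <= 0:
            continue                          # condition on R_t > 0
        last_hour_wet = today[-1] > 0         # X_{t,24} > 0
        next_day_wet = sum(tomorrow) > 0      # R_{t+1} > 0
        counts[(last_hour_wet, next_day_wet)] += 1
        if next_day_wet:
            wet_pairs += 1
            if last_hour_wet and tomorrow[0] > 0:   # X_{t+1,1} > 0
                joined += 1
    n = sum(counts.values())
    first_four = {state: c / n for state, c in counts.items()} if n else {}
    fifth = joined / wet_pairs if wet_pairs else float("nan")
    return first_four, fifth
```

Applying the same function to the observed record and to sequences generated by each variant of the method of fragments would give the three columns of values for each location.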
This has been calculated for the observed record as well as for the conventional and state-based implementations of the method of fragments logic. As can be seen, the state-based logic yields a significant improvement compared with the conventional method of fragments. In particular, the probability that the last hour of the day is wet is underestimated by the conventional method of fragments when the next day is wet, and overestimated when the next day is dry, for all five locations.
[54] The fifth row then summarizes the probability that the last hour of day t and the first hour of day t + 1 are both wet for successive wet days. As can be seen, this is dramatically underestimated for both the conventional and state-based method of fragments, highlighting that the temporal patterns on the boundary between wet days are likely to be less continuous for the simulated data compared with the observations. Nevertheless, the state-based method of fragments provides a significant improvement compared with the conventional algorithm, highlighting the advantages of moving to the state-based logic.
5. Discussion and Conclusions
[55] In this paper, a framework was described where continuous (6-min increment) rainfall can be generated at any location of interest provided that daily data is either available or can be synthetically generated. The basis of this approach is to randomly draw subdaily fragments from nearby pluviograph stations conditional on the daily rainfall amount and the previous- and next-day wetness state at the target station. The identification of nearby stations is based on a distance metric which considers latitude and longitude as well as elevation and distance to coast, with the relative importance of each variable determined by looking at the similarity in the daily-to-subdaily scaling at 232 long pluviograph stations across Australia.
[56] The approach sought to address several important limitations associated with the Australian pluviograph record. First, compared to daily rainfall data, there are approximately one order of magnitude fewer pluviograph stations, and the records at each station are usually much shorter than their daily-read counterparts. Thus, by combining longer, more abundant, and more reliable daily data at the target location with the information contained in a number of pluviograph records in the neighborhood of the target location, it is possible to make the best use of both types of data. Second, by drawing records from multiple nearby pluviograph records rather than relying on a single record, it is also possible to consider information from records only several years long, which would usually be discarded as being too short for meaningful analysis. Finally, pluviograph data flagged as missing or unreliable can simply be discarded from the analysis, even for cases where there is a systematic bias in the missing data (e.g., pluviograph recording tends to
fail during major storm events). This is because, provided the daily rainfall data are reliable, and there are sufficient data at other pluviograph stations to capture a diversity of rainfall events across a range of magnitudes, such possible systematic pluviograph recording biases are unlikely to be translated into the final synthetically generated sequences.
[57] The evaluation of the method on a range of statistics which are relevant for flood estimation, notably the annual maximum statistics and the antecedent rainfall prior to the flood-producing storm burst, suggests that the method compares reasonably well with at-site data for the five test locations considered. In particular, no significant deterioration in the results could be observed when moving from the at-site method of fragments to the regionalized version, suggesting that the regionalized version properly represents the at-site variability. Furthermore, the sampling intervals for the regionalized version are likely to reflect the true variability of the data more reasonably, with widening sampling intervals for lower exceedance probability (and thus higher magnitude) events, although, as discussed in the context of the results of Table 4, this variability may still be underestimated. This also highlights that the regionalized method is able to provide a much greater diversity of extreme rainfall sequences (and associated temporal patterns) than what has been observed at any one point location, with this in turn likely to yield more robust flood-frequency results when the continuous rainfall sequences are run through a continuous rainfall-runoff model.
[58] We also looked at the connectivity in the temporal patterns between successive wet days, which represents one of the most obvious limitations of the method of fragments logic. In general, the state-based logic proposed here results in a notable improvement in connectivity, although it is clear that the method is unable to reproduce observed connectivity exactly. Nevertheless, the implications for applications such as flood estimation are unclear. For example, if the method is able to reproduce within-day temporal patterns, preserves annual maximum rainfall and associated antecedent conditions, and maintains the daily total rainfall depths, then the effect of some discontinuities on flood estimates is unlikely to be large. The use of these generated sequences as an input for continuous rainfall-runoff modeling would be one way to test this issue, and is an area which we plan to investigate further.
[59] We note that, like most continuous simulation algorithms, the objective of our method is to preserve various statistics of historical rainfall variability. We have addressed nonstationarity in the daily-to-subdaily scaling as a result of seasonal fluctuations by selecting fragments from within the same season. We do not expect nonstationarity issues due to interannual variability of rainfall to result in major distortions to the fidelity of the generated continuous sequences, since much of this variability results in changes to wet day occurrences and daily rainfall amounts rather than subdaily temporal patterns [Pui et al., 2011b]. Thus, this should be accounted for by the daily rainfall simulation algorithm rather than the daily-to-subdaily disaggregation approach. Finally, there is an increased interest in nonstationarity of rainfall (and other hydroclimatic sequences) as a result of anthropogenic climate change [e.g., Milly et al., 2008]. In particular, the associated increases in temperature may be expected to yield more intense rainfall bursts for a given daily rainfall amount [e.g., see Hardwick-Jones et al., 2010; Lenderink and van Meijgaard, 2008; Lenderink et al., 2011; Westra and Sisson, 2011]; however, explicitly addressing this issue is reserved for future research.
[60] Although daily data is much more abundant than pluviograph data across Australia, in many regions the length or reliability of daily rainfall may not be sufficient for the stochastic generation of rainfall sequences. This is the subject of the next paper, in which the approach presented here is generalized to any location in Australia, regardless of the availability of daily or pluviograph data.
[61] Acknowledgments. This study was supported by an Australian Research Council Discovery grant as well as a research grant from the Institution of Engineers, Australia to help develop continuous rainfall sequences for design flood estimation. The daily and continuous rainfall records used were obtained from the Australian Bureau of Meteorology. Finally, we wish to thank Geoff Pegram and two anonymous reviewers, whose comments and suggestions have greatly improved the quality of the manuscript.
References
Blazkova, S., and K. Beven (2002), Flood frequency estimation by continuous simulation for a catchment treated as ungauged (with uncertainty), Water Resour. Res., 38(8), 1139, doi:10.1029/2001WR000500.
Boughton, W., and O. Droop (2003), Continuous simulation for design flood estimation – a review, Environ. Model. Software, 18(4), 309–318.
Cameron, D., K. Beven, J. Tawn, and P. Naden (2000), Flood frequency estimation by continuous simulation (with likelihood based uncertainty estimation), Hydrol. Earth Syst. Sci., 4(1), 23–34.
Cowpertwait, P. S. P., and P. E. O'Connell (1997), A regionalised Neyman-Scott model of rainfall with convective and stratiform cells, Hydrol. Earth Syst. Sci., 1, 71–80.
Cowpertwait, P. S. P., P. E. O'Connell, A. V. Metcalfe, and J. A. Mawdsley (1996), Stochastic point process modelling of rainfall. II. Regionalisation and disaggregation, J. Hydrol., 175, 47–65.
Cowpertwait, P. S. P., V. Isham, and C. Onof (2007), Point process models of rainfall: Developments for fine-scale structure, Proc. R. Soc. A and B, 463(2086), 2569–2588.
Fasano, G., and A. Franceschini (1987), A multidimensional version of the Kolmogorov-Smirnov test, Monthly Notices of the Royal Astronomical Society, 225, 155–170.
Frost, A. J., R. Srikanthan, and P. S. P. Cowpertwait (2004), Stochastic generation of rainfall data at subdaily timescales: A comparison of DRIP and NSRP, Rep. 04/9, pp. 1813–1819, CRC, Salisbury South, Australia.
Gupta, V. K., and E. C. Waymire (1993), A statistical analysis of mesoscale rainfall as a random cascade, J. Appl. Meteorol., 32, 251–267.
Gyasi-Agyei, Y. (1999), Identification of regional parameters of a stochastic model for rainfall disaggregation, J. Hydrol., 223, 148–163.
Gyasi-Agyei, Y., and S. M. Parvez Bin Mahbub (2007), A stochastic model for daily rainfall disaggregation into fine time scale for a large region, J. Hydrol., 347, 358–370.
Gyasi-Agyei, Y., and G. R. Willgoose (1997), A hybrid model for point rainfall modelling, Water Resour. Res., 33(7), 1699–1706.
Gyasi-Agyei, Y., and G. R. Willgoose (1999), Generalisation of a hybrid model for point rainfall, J. Hydrol., 219(3–4), 218–224.
Hardwick-Jones, R., S. Westra, and A. Sharma (2010), Observed relationships between extreme sub-daily precipitation, surface temperature and relative humidity, Geophys. Res. Lett., 37, L22805, doi:10.1029/2010GL045081.
Hastie, T., R. Tibshirani, and J. Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, 763 pp., Springer, Berlin, Germany.
Koutsoyiannis, D., and C. Onof (2001), Rainfall disaggregation using adjusting procedures on a Poisson cluster model, J. Hydrol., 246(1–4), 109–122.
Kuczera, G., M. Lambert, T. M. Heneker, S. Jennings, A. J. Frost, and P. J. Coombes (2006), Joint probability and design storms at the crossroads, Aust. J. Water Resour., 10(1), 63–80.
Lall, U., and A. Sharma (1996), A nearest neighbour bootstrap for resampling hydrological timeseries, Water Resour. Res., 32, 679–693.
Lamb, R., and A. L. Kay (2004), Confidence intervals for a spatially generalized, continuous simulation flood frequency model for Great Britain, Water Resour. Res., 40(7), W07501, doi:10.1029/2003WR002428.
Lenderink, G., and E. van Meijgaard (2008), Increase in hourly precipitation extremes beyond expectations from temperature changes, Nat. Geosci., 1, 511–514.
Lenderink, G., H. Y. Mok, T. C. Lee, and G. J. Van Oldenborgh (2011), Scaling and trends of hourly precipitation extremes in two different climate zones – Hong Kong and the Netherlands, Hydrol. Earth Syst. Sci., 8, 4701–4719.
Lovejoy, S., and D. Schertzer (1990), Multifractals, universality classes, and satellite and radar measurements of cloud and rain fields, J. Geophys. Res., 95, 2021–2031.
Marshak, A., A. Davis, R. Cahalan, and W. Wiscombe (1994), Bounded cascade models as nonstationary multifractals, Phys. Rev. E, 49(1), 55–69.
Mehrotra, R., and A. Sharma (2006), Conditional resampling of hydrologic time series using multiple predictor variables: A K-nearest neighbour approach, Adv. Water Resour., 29, 987–999.
Menabde, M., D. Harris, A. W. Seed, G. Austin, and D. Stow (1997), Multiscaling properties of rainfall and bounded random cascades, Water Resour. Res., 33(12), 2823–2830.
Milly, P. C. D., J. Betancourt, M. Falkenmark, R. M. Hirsch, Z. W. Kundzewicz, D. P. Lettenmaier, and R. J. Stouffer (2008), Stationarity is dead: Whither water management?, Science, 319, 573–574.
Nowak, K., J. Prarie, B. Rajagopalan, and U. Lall (2010), A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow, Water Resour. Res., 46, W08529, doi:10.1029/2009WR008530.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992), Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd ed., 963 pp., Cambridge Univ. Press, Cambridge, Mass.
Pui, A., A. Lall, and A. Sharma (2011a), How does the Interdecadal Pacific Oscillation affect design floods in Australia?, Water Resour. Res., 47, W05554, doi:10.1029/2010WR009420.
Pui, A., S. Westra, A. Santoso, and A. Sharma (2011b), Impact of the El Niño Southern Oscillation, Indian Ocean Dipole, and Southern Annular Mode on daily to sub-daily rainfall characteristics in East Australia, Monthly Weather Review, in press.
Rodriguez-Iturbe, I., D. Cox, and V. Isham (1987), Some models for rainfall based on stochastic point processes, Proc. R. Soc. A and B, 410, 269–288.
Rodriguez-Iturbe, I., D. Cox, and V. Isham (1988), A point process model for rainfall: Further developments, Proc. R. Soc. A and B, 417, 283–298.
Schertzer, D., and S. Lovejoy (1987), Physical modelling and analysis of rain and clouds by anisotropic scaling multiplicative processes, J. Geophys. Res., 92, 9693–9714.
Sharma, A., and R. Srikanthan (2006), Continuous rainfall simulation: A nonparametric alternative, in 30th Hydrology and Water Resour. Symp., Launceston, Tasmania.
Sharma, A., D. G. Tarboton, and U. Lall (1997), Streamflow simulation: a nonparametric approach, Water Resour. Res., 33(2), 291–308.
Snavidze, G. G. (1977), Mathematical Modeling of Hydrologic Series, 314 pp., Water Resour. Publ., Littleton, Colo.
Tarboton, D. G., A. Sharma, and U. Lall (1998), Disaggregation procedures for stochastic hydrology based on nonparametric density estimation, Water Resour. Res., 34(1), 107–119.
Verhoest, N., P. Troch, and F. D. Troch (1997), On the applicability of Bartlett-Lewis rectangular pulse models in the modelling of design storms at a point, J. Hydrol., 202, 108–120.
Westra, S., and S. A. Sisson (2011), Detection of non-stationarity in precipitation extremes using a max-stable process model, J. Hydrol., 406, 119–128.

R. Mehrotra and A. Sharma, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW 2052, Australia. ([email protected])
R. Srikanthan, Water Division, Australian Bureau of Meteorology, G.P.O. Box 1289, Melbourne, Victoria 3001, Australia.
S. Westra, School of Civil, Environmental, and Mining Engineering, University of Adelaide, SA 5005, Australia.