Kavish Daya and Darmikah Pather Final IP Report
CIVN4005A-Investigational Project Report
Table of Contents
Abstract
List of figures
1. Introduction
1.2 Aim and objectives
1.3 Research Questions
1.4 Organization of the report
2. Literature Review
2.1 Introduction
2.2 Hydrological Stochastic generator applications
2.3 Non-Parametric and Parametric models
2.4 Climate change and variability modelling
2.5 Variable Length Bootstrap method
2.6 Daily stochastic rainfall data
2.7 Daily VLB stochastic generator
2.8 Limitations of subjective parameters in the daily stochastic rainfall VLB generator
3. Research Method
3.1 Introduction
3.2 Tasks that will be carried out
3.3 Daily VLB stochastic generation procedure
3.4 Analysis of historical observed data
3.4.1 Gauteng rainfall stations
3.4.2 Kwa-Zulu Natal rainfall stations
4. Performance Evaluation
Gauteng Evaluation
Kwa-Zulu Natal Evaluation
5. Conclusions and Recommendations
6. References
Abstract
This research project investigates how sensitive the performance of the daily VLB
stochastic generator is to variations in two parameters that are subjectively selected to
disaggregate stochastic rainfalls from an annual to a daily timestep. The two subjective
parameters are the number of pairs of nearest neighbours to select and the number of
years out of these to use for the disaggregation. Fifty-seven years of daily rainfall data from
the South African Weather Service were used for two stations in Gauteng (Johannesburg
Leeukop and Johannesburg Turffontein) and two stations in Kwa-Zulu Natal (Durban Heights
and Ngome-Bos). Using the daily VLB stochastic generator, 100 stochastic series were
generated for a number of parameter pairs, where a pair consists of the number of pairs of
nearest neighbours selected and the number of years out of these used for the disaggregation.
These pairs are the subjective parameter values on which this research is based. Three main
performance evaluations were used. The first is the percentage deviation of the median
stochastic values from the historic values for statistical measures such as the mean, standard
deviation and skewness. The second is based on the deviation of the historic statistic
from the median of the 100 stochastic values. The third is an average composite deviation.
In addition to these performance evaluators, boxplots were used to evaluate the replication
of certain statistical measures: when the historic statistic lies within the interquartile range
of the stochastic data, the replication was deemed acceptable. It was found that no pair of
disaggregation parameters is optimal for a specific climate and that the daily VLB stochastic
generator is not sensitive to changes in the subjective parameters. The main limitation of the
daily VLB stochastic generator is that it cannot be used to generate stochastic data for an arid
region where the occurrence of rainfall is too low for some of the computational steps of the
method.
List of figures
Table 1: Rainfall station characteristics
Figure 1: Historic daily rainfall.txt
Figure 2: VLBSettingsandfiles.txt file
Figure 3: RainAndRainDays.txt
Figure 4: Stochastic monthly generated rainfall
Figure 5: Stochastic daily generated rainfall
Figure 6: Stochastic daily generated rainfall with updated neighbouring pairs
Figure 7: Summary of VLB generator process
Figure 8: Leeukop Station obtained from Google Earth Pro
Figure 9: Average and daily rainfall for Leeukop
Figure 10: Daily rainfalls for the 57-year period in Leeukop
Figure 11: Average daily rainfall per month in Leeukop
Figure 12: Turffontein Station obtained from Google Earth Pro
Figure 13: Average daily rainfall depths observed from the historic data for Turffontein
Figure 14: Daily rainfalls for the 57-year period in Turffontein
Figure 15: Average daily rainfall per month in Turffontein
Figure 16: Ngome-Bos station in Kwa-Zulu Natal province
Figure 17: Mean historic daily rainfall per year
Figure 18: Historic daily rainfall
Figure 19: Monthly mean daily rainfall box plots over a 57-year period for Ngome-Bos
Figure 20: Google Earth Pro image of Durban Heights station
Figure 21: Mean historic daily rainfall per year
Figure 22: Daily historic rainfalls over a 57-year period
Figure 23: Mean daily rainfall per month over the 57-year period in Durban Heights
Figure 24: Leeukop performance based on historic value
Figure 25: Leeukop performance based on deviation from median
Figure 26: Skewness boxplot for Leeukop
Figure 27: Composite average deviation for Leeukop
Figure 28: Turffontein performance based on historic value
Figure 29: Turffontein performance based on deviation from median
Figure 30: Mean boxplot for Turffontein
Figure 31: Composite average deviation for Turffontein
Figure 32: Durban Heights performance based on historic deviation
Figure 33: Durban Heights performance based on deviation from median
Figure 34: Proportion of rainfall days boxplot for Durban Heights
Figure 35: Composite average deviation for Durban Heights
Figure 36: Ngome Bos performance based on historic deviation
Figure 37: Ngome Bos performance based on deviation from median
Figure 38: Standard deviation boxplot for Ngome Bos
Figure 39: Composite average deviation for Ngome Bos
1. Introduction
Rainfall is the main input to the hydrological cycle and provides the primary source of water.
The hydrological cycle allows for the continuous movement of water across the Earth. Because
rainfall is such an important component of the hydrological system, several studies have
sought to predict rainfall patterns (Chapman, 1994). Variability in rainfall patterns
affects human activities such as farming and mining. The unpredictability of rainfall
patterns on a global scale due to climate change has made it crucial to incorporate
uncertainty into the prediction of future rainfall patterns for use in hydrological evaluations
and planning.
Stochastic data generation allows historical data records to be extended beyond the current
records and is particularly useful in areas with sparse monitoring networks (Benoit, 2022).
Furthermore, stochastically generated data can be used to study the impact of climate change
on rainfall patterns, which are expected to vary in duration and intensity in different regions
worldwide (Seifossadat, 2022). By simulating these changes, hydrologists can assess the
potential effects on local and regional water resources, which affect infrastructure planning
and water management decisions. It is important to incorporate stochastic data generation
into a hydrological rainfall model, as it helps with the simulation of rainfall events. By using
probability distribution functions, the synthetic data can represent uncertainty while
remaining accurate.
Despite the vast amount of research conducted globally on stochastic rainfall generation,
there has been limited use of observed rainfall data from African regions for such development
(Pegram and Clothier, 2002). Within South Africa, climate change is a significant issue that is
currently overlooked by the Department of Water Affairs.
One model used in South Africa that accounts for climate change is the SERGE tool, which is
designed for a semi-arid environment (Wiegand, 2008). SERGE is a parametric daily
rainfall generator that employs the Zucchini rainfall model (Zucchini et al., 1992) and
focuses on examining the impact of spatial rainfall variability using several randomly located
rain clouds. It has 16 parameters and has been calibrated on daily rainfall data
from across South Africa. The model has been shown to depict temporal rainfall
patterns accurately (Wiegand, 2008).
Southern Africa experiences large temporal and spatial variability in rainfall. This
variability influences water resources, agriculture, and the economy, and gives rise to
extreme floods and droughts (Tjebane, 2022). A few daily stochastic rainfall generation
models have been used to study Southern African rainfall data. Three comparable models
that utilize Southern African rainfall data are the WGEN (Weather Generator) method, the
Transition Probability Matrix (TPM) method and the Variable Length Bootstrap (VLB) method.
The WGEN method was developed by Richardson and Wright (1984). It is a parametric model
that provides daily values of maximum and minimum temperature, solar radiation, and
precipitation (Tjebane, 2022), and it accounts for each variable's persistence, dependency,
and seasonality. WGEN's precipitation model is a Markov chain-gamma model: a first-order
Markov chain produces the occurrence of wet or dry days, and a gamma distribution describes
the rainfall depth on wet days. The method tends to overestimate daily values when
compared to historical data, while the monthly average precipitation values are relatively
accurate. WGEN produced lower maximum values for daily, monthly, and annual averages
than the historic data. Because WGEN requires a long time series of daily rainfall data, it
tends to either over- or underestimate daily precipitation values; it is much more accurate
for monthly rainfall generation.
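To make the Markov chain-gamma idea concrete, the following is a minimal sketch of a two-state, first-order occurrence model combined with gamma-distributed wet-day depths. It is not the WGEN code itself: the function name, the single set of parameters for the whole year and the parameter values are assumptions made for illustration, whereas a calibrated generator would typically estimate such parameters from the historic record.

```python
import random


def generate_daily_rain(n_days, p_wet_given_dry, p_wet_given_wet,
                        gamma_shape, gamma_scale, seed=None):
    """Generate daily rainfall depths (mm).

    Occurrence follows a two-state first-order Markov chain; the depth on
    a wet day is drawn from a gamma distribution.
    """
    rng = random.Random(seed)
    series = []
    wet = False  # assumption: the day before the series starts was dry
    for _ in range(n_days):
        # Probability of rain today depends only on yesterday's state
        p_wet = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p_wet
        depth = rng.gammavariate(gamma_shape, gamma_scale) if wet else 0.0
        series.append(depth)
    return series


# Illustrative (not calibrated) parameter values
rain = generate_daily_rain(365, p_wet_given_dry=0.2, p_wet_given_wet=0.6,
                           gamma_shape=0.8, gamma_scale=10.0, seed=1)
wet_days = sum(1 for depth in rain if depth > 0)
print(f"{wet_days} wet days, annual total {sum(rain):.1f} mm")
```

Setting `p_wet_given_wet` higher than `p_wet_given_dry` is what gives the series its persistence, since a wet day is then more likely to follow a wet day than a dry one.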
The STOMSA parametric model (Phakula et al., 2018; Pegram and Clothier, 2002) is
commonly used in South African water resource planning and yield analysis. In 2011,
Professor John Ndiritu created the Variable Length Bootstrap (VLB) method for generating
monthly streamflow data (Ndiritu, 2011). Ndiritu and Nyaga adapted this method in 2014 to
generate monthly stochastic rainfall data. The effectiveness of this approach is assessed by
comparing the upper and lower bounds of the VLB-generated annual flows. A comparison of
VLB with STOMSA showed that both approaches reproduce the mean monthly flows and
standard deviations well (Tjebane, 2022), although STOMSA tends to underestimate the
standard deviation, whereas VLB does not (Tjebane, 2022). The VLB
stochastic generation model has enabled a more comprehensive and dependable
incorporation of risk and uncertainty in hydrological modelling and analysis. Data
produced by VLB are generally overestimated relative to the historic data and have
a higher variability than data produced by WGEN. VLB did not generate maximum values
greater than those in the historic data, implying that it is rather conservative. During both
dry and wet seasons, the synthetic data were representative of the historic data.
Furthermore, the Transition Probability Matrix (TPM) method is a multi-state Markov chain
model that generates synthetic sequences of daily rainfall. The model takes the rainfall
state on one day and determines the probabilities of each state on the next day. Data are
collected for several days and then collated into a transition probability matrix. For annual
data, 12 TPMs are required, one for each month (Tjebane, 2022). TPM tends to underestimate
large variances in annual data and is therefore not suitable for areas prone to drought or
flooding (Boughton, 1999). The TPM method generates daily rainfall depths that are closer to
historic values and has a higher variability than the Weather Generator (WGEN) method. It is
recommended that VLB or TPM be used for areas prone to floods and droughts, as the data
produced by these models are more conservative (Tjebane, 2022).
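The way a transition probability matrix is collated from a daily record can be sketched as follows. This is a simplified illustration rather than the TPM method as implemented in the cited studies: the state boundaries (`upper_bounds`), the function names and the sample depths are hypothetical, and a full application would build a separate matrix from the days of each calendar month.

```python
from collections import Counter


def rainfall_state(depth_mm, upper_bounds):
    """Return the state index of a daily depth.

    State 0 is a dry day; upper_bounds holds the (hypothetical) upper
    limits of the wet states, and the last state is open-ended.
    """
    if depth_mm <= 0:
        return 0
    for index, bound in enumerate(upper_bounds, start=1):
        if depth_mm <= bound:
            return index
    return len(upper_bounds) + 1


def transition_matrix(depths, upper_bounds):
    """Estimate P(state tomorrow | state today) from consecutive days."""
    n_states = len(upper_bounds) + 2
    states = [rainfall_state(d, upper_bounds) for d in depths]
    counts = Counter(zip(states[:-1], states[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State never observed as a starting state: leave row empty
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total
                           for j in range(n_states)])
    return matrix


# Illustrative depths (mm) with states: dry, 0-5 mm, above 5 mm
daily_depths = [0, 0, 3, 12, 0, 1, 0, 0]
for row in transition_matrix(daily_depths, upper_bounds=[5]):
    print(row)
```

Each row of the matrix sums to one whenever the corresponding state occurs in the record, so a synthetic sequence can be generated by repeatedly sampling the next state from the row of the current state.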
The VLB daily stochastic generator developed by Professor Ndiritu in 2022 generates daily
stochastic rainfall data by disaggregating the annual stochastic rainfalls generated by the
VLB stochastic rainfall generator developed by Ndiritu and Nyaga (2014) (Ndiritu, 2022). The
daily VLB stochastic generator involves identifying the nearest neighbours based on an
annual rainfall magnitude from the historic rainfall sequence. The nearest neighbours are
annual rainfalls whose values are close to the stochastic annual rainfall. The neighbours
1.2 Aim and objectives
This research project aims to investigate how sensitive the performance of the daily VLB
stochastic generator is to variations in the two subjective parameters, namely the number
of pairs of nearest neighbours and the number of years selected out of these pairs to be used
in the disaggregation from an annual to a daily time step.
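The role of the two subjective parameters can be illustrated with a short sketch. It only shows the selection of historic years whose annual totals are closest to a stochastic annual rainfall; it is not the generator's actual code. The function names, the sample annual totals and, in particular, the random choice of `m` years out of the `k` neighbours are assumptions made for illustration.

```python
import random


def nearest_neighbour_years(stochastic_annual, historic_annual, k):
    """Return the k historic years whose annual totals are closest to
    the stochastic annual rainfall.

    historic_annual maps year -> annual rainfall total (mm).
    """
    ranked = sorted(
        historic_annual,
        key=lambda year: abs(historic_annual[year] - stochastic_annual),
    )
    return ranked[:k]


def select_years(neighbours, m, seed=None):
    """Pick m of the neighbouring years for the disaggregation.

    Random sampling is an assumption for illustration; the generator
    may apply a different selection rule.
    """
    rng = random.Random(seed)
    return rng.sample(neighbours, m)


# Hypothetical annual totals (mm)
historic = {1961: 650.0, 1962: 720.0, 1963: 540.0, 1964: 810.0, 1965: 700.0}
neighbours = nearest_neighbour_years(705.0, historic, k=3)
chosen = select_years(neighbours, m=2, seed=0)
print(neighbours, chosen)
```

In this framing, a sensitivity analysis amounts to repeating the generation for different `(k, m)` pairs and comparing how well each pair reproduces the historic statistics.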
1.3 Research Questions
1. How does subjective parameter selection influence the performance of the VLB daily
stochastic generator?
2. Do optimal parameter values of the VLB daily stochastic generator exist?
3. Do optimal parameter values exist for the range of climatic conditions assessed by the
daily VLB generator?
1.4 Organization of the report
Chapter 1 introduces stochastic data generation and the different types of generators
available. It includes a discussion of the generator used to conduct this research.
Chapter 2 presents a literature review on stochastic daily rainfall generation. It covers the
uses and importance of daily stochastic rainfall generation and describes parametric and
non-parametric stochastic generators.
Chapter 3 describes the VLB daily stochastic generator procedure, followed by the
methodology used to carry out the research. The methodology is split into the steps that were
followed during this research and the reasoning behind each step.
Chapter 4 presents the performance evaluation for all stations in Gauteng and Kwa-Zulu
Natal, including graphs obtained for each subjective parameter pair using Microsoft
Excel. The chapter also includes an analysis that discusses the findings from the
performance evaluation.
Chapter 5 provides the conclusions and recommendations based on the findings presented
in Chapter 4.
2. Literature Review
2.1 Introduction
Rainfall is a driving force for many hydrologic processes (Zhao & Nearing, 2019). However,
the lack of reliable rainfall records hinders the development of hydrologic research and
applications. Developments in stochastic hydrological generation can improve our
understanding of water resources and support decision-making in areas such as water
management, agriculture, and mining (Hughes, 2004). Stochastic
generation involves the simulation of the random variability of hydrological variables, such
as rainfall and streamflow, using statistical models and probability theory (Ndiritu & Nyaga,
2014). The stochastic generation of hydrological data is essential for a wide range of
applications, including water resource planning, flood management, and environmental
protection (Ndiritu & Nyaga, 2014). In recent years, advancements in computing power and
data analysis techniques have led to significant improvements in hydrological stochastic
generation methods. These developments have enabled a more comprehensive and
dependable incorporation of risk and uncertainty in hydrological modelling and analysis
(Beven, 2012). However, improvements can be made to make stochastic generators more
robust and reliable.
However, with the advancement of technology, the accuracy of these stochastic models has
been greatly improved. To achieve better accuracy, new generators use much wider
search field criteria and focus on key elements such as spatial topography and the
temporal variability of climate conditions (Huang, Wang, Xiao, 2018).
An accurate and reliable stochastic rainfall generator was developed by Keith Beven and Mike
Kirkby (Beven, 2012). Beven and Kirkby produced a surface rainfall-runoff generator called
the "Topmodel generator", also known as the topography-based hydrological model. This
generator is capable of simulating rainfall inputs for small catchment areas using a
stochastic rainfall representation. The model consists of relationships between the
catchment topography and the rainfall input. It assumes that a catchment behaves as a
large number of interconnected soil columns in series. The model can therefore calculate
the water balance of each soil column by accounting for infiltration or runoff of rainfall
on that particular column of soil (Beven, 2012).
Another accurate rainfall generator is the Long Ashton Research Station Weather Generator
(LARS-WG) (Chisanga, 2017). This generator produces weather data for a single
location and must be calibrated each time before temperature and rainfall data are
obtained. It is capable of simulating three parameters: maximum temperature, minimum
temperature and precipitation. LARS-WG can simulate both present and future climatic
conditions because the generator is based on General Circulation Models (GCMs)
(Chisanga, 2017). GCMs use historic geographical data to help manipulate and produce
different climatic responses for scenarios involving, for example, the ocean, the global
atmosphere, and greenhouse gas emissions (Sellwood, Valdes, 2005). The LARS-WG model and
several other stochastic rainfall models can be used for hydrological design purposes, such
as evaluating the likelihood of peak discharges occurring at a dam with a basin area of
100 km², and thereby the probability and risk of flood occurrence. The rainfall model is
calibrated to obtain hourly precipitation over the basin area. The model then analyses the
hourly precipitation data to identify storm events that occurred in a period of 100 years.
These data are further used to predict future dam water levels in order to anticipate
flooding and overflowing of the dam (Campo-Bescos, Sordo-Ward, 2009). A stochastic rainfall
model can also be used to determine whether a biomass environment is a niche environment,
using the variances in rainfall patterns, water availability, carbon storage and vegetation
competition within that environment (Coletti, 2013).
Furthermore, the use of a stochastic model depends on the type of model. There are two main
types, namely parametric and non-parametric stochastic models. A parametric stochastic
model follows a parametric probability distribution. These include 1-parameter distributions
such as the exponential distribution, and 2-parameter distributions such as the log-normal,
Weibull, gamma, and Gaussian distributions. Lastly, there are 3-parameter distributions
consisting of mixtures of distributions such as the exponential, hybrid exponential and
skewed normal distributions (Pratikasiwi, 2022). The parametric model is used for several
hydrological analyses. In particular, it can simulate extreme rainfall characteristics within
the 95th percentile using the Weibull distribution. Other applications include determining the
variability of flow distribution across a catchment area and estimating minimum, maximum
and average temperatures for a tropical area at an hourly time scale.
The Markov chain stochastic model is a non-parametric model and is further classified as a
comparison type of model. For example, the model can determine whether a day is
considered dry or wet based on the relationship between the current and previous day's
data. These data include parameters such as temperature, humidity, air pressure and air
density. A first-order Markov chain is used in the WGEN model to describe the probability
that a specific day is wet or dry, given whether the previous day was wet or dry
(Zhang, Singh, Gagnon, 2013). The WGEN model was written in Fortran and is used to
generate rainfall data, which are then used to simulate the minimum and maximum
precipitation and air temperature.
Another model is the variable length block (VLB) generator (Ndiritu, Nyaga, 2014).
The VLB was initially a streamflow generator (Ndiritu, 2011) that was adapted for use as a
stochastic rainfall generation model. It is a non-parametric stochastic model designed to
incorporate the historic spatial and temporal characteristics of rainfall. The VLB method
allows for the disaggregation of fragments using the weighted method and perturbing
bootstrapped annual flows. Ndiritu (2011) found that the VLB generator overestimates
minimum flows and that an effort to enhance the reproduction of these flows resulted
in a significant underestimation.
2.3 Non-Parametric and Parametric models
All stochastic rainfall models are based primarily on either parametric or non-parametric
methods. A non-parametric stochastic rainfall model is a statistical model used to reproduce
and predict future rainfall patterns from observed historic rainfall data. A core feature of a
non-parametric model is that it requires no assumptions about the statistical distribution of
the data and only a limited number of parameters. The bootstrap method, for example, may be
used to generate synthetic rainfall data from the originally observed historic rainfall data.
The parameters selected for a non-parametric model nevertheless need to be stochastically
correct and unbiased, because poorly chosen parameters severely affect the estimates that
are generated, resulting in a lack of accuracy and poor performance of the model
(Chapman, 1997). Applications of the non-parametric model include using synthetic rainfall
data for assessing flood risk and overflowing of dam walls, predicting future weather
patterns, and mining hydrological events. A non-parametric model can also be used for
water resource management, which includes determining the behaviour of streamflow and
river channels and generating rainfall time series at annual, monthly, daily, or hourly
time steps.
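As an illustration of how bootstrap resampling produces a synthetic series without assuming a distribution, the sketch below joins randomly positioned blocks of randomly chosen length taken from a historic record. It is a generic block bootstrap written for this illustration and is not the VLB algorithm of Ndiritu (2011); the function name, the uniform choice of block length and the sample values are all assumptions.

```python
import random


def block_bootstrap(historic, length, min_block, max_block, seed=None):
    """Build a synthetic series by joining randomly positioned blocks
    of the historic series, each with a randomly chosen length."""
    rng = random.Random(seed)
    synthetic = []
    while len(synthetic) < length:
        block_len = rng.randint(min_block, max_block)
        start = rng.randrange(len(historic))
        # Slicing truncates a block that would run past the record end
        synthetic.extend(historic[start:start + block_len])
    return synthetic[:length]


# Hypothetical annual rainfall totals (mm)
annual_totals = [650.0, 720.0, 540.0, 810.0, 700.0, 590.0, 760.0, 680.0]
synthetic = block_bootstrap(annual_totals, length=20,
                            min_block=1, max_block=4, seed=42)
print(synthetic)
```

Every value in the output already exists in the historic record, which illustrates why plain resampling cannot produce values more extreme than those observed.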
A parametric model, by contrast, can also reproduce and simulate rainfall data, but it
requires assumptions about the probability distribution of the data, so an exact number of
parameters must be defined for the model to function. These assumptions, including the
definition of the parameters, need to be decided based on the rainfall data and the chosen
parametric distribution. Data input is necessary for the model to function and simulate any
rainfall data (Tjebane, 2022).
Each of the two types of stochastic model has its respective strengths and weaknesses. A
non-parametric model based on the Markov chain is highly efficient in simulating monthly and
annual rainfall events and is generally used in sub-tropical and tropical climate regions
such as South Africa, Mozambique, Angola, and Madagascar (Nix, 2019). A non-parametric
model is much more flexible and efficient than a parametric model: its parameters do not
need to be defined, which allows for a larger range of variability and rainfall patterns.
Because such a model can also use kernel density estimation techniques, several assumptions
are avoided, which helps to increase accuracy (Harrold, 2003). A non-parametric model can
clearly identify outliers in the rainfall data because it does not require defined
parameters, while a parametric model is unable to do so. Because it requires fewer
assumptions, a non-parametric model is more efficient and more widely applied than a
parametric model. However, since fewer assumptions are made, non-parametric models have
less statistical power and can be highly resource intensive, requiring large amounts of
input data, which ultimately results in longer processing times to simulate rainfall data.
Having fewer assumptions can also limit the model and result in uncertain data being
produced, which is evident in the tendency of the model to disregard variation in
low-frequency rainfall (Pratikasiwi, 2022).
Parametric stochastic models are best used to generate rainfall amounts, as they use
parametric probability distributions. With these distributions, results are more accurate
than those of non-parametric models, which do not use this type of distribution (Huth, 2004).
For example, the mixed exponential parametric model is extremely accurate in reproducing
daily rainfall occurrences in subtropical areas, which a non-parametric model is
incapable of doing. In a tropical climate region, a parametric model is better suited to
generating maximum and average rainfall values in a 1-hour time frame (Pratikasiwi, 2022).
A parametric model requires less effort to run, is less resource intensive, and generates
rainfall data much faster than a non-parametric model. Both parametric and non-parametric
models may underestimate extreme values, so outliers remain apparent unless care is taken
in specifying what is required (Chapman, 1998). Parametric models are unable to generate a
weather time series automatically without producing biased data. The non-parametric VLB
method can capture long-term rainfall data accurately: VLB generators reproduced the
historic statistics well, with between 82% and 90% of the historic statistics falling within
the interquartile range of the box plots (Ndiritu and Nyaga, 2014). The VLB generator
reproduces annual statistics more accurately than a parametric model and is a more suitable
model for annual and monthly generation (Ndiritu and Nyaga, 2014).
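The interquartile-range criterion mentioned above can be sketched in a few lines. The example compares one historic statistic with the same statistic computed from a set of stochastic series; the function name, the sign convention of the percentage deviation and the sample values are assumptions made for illustration, and quartile definitions differ slightly between software packages.

```python
import statistics


def evaluate_statistic(historic_value, stochastic_values):
    """Compare a historic statistic with the same statistic computed
    from each stochastic series."""
    median = statistics.median(stochastic_values)
    # Quartiles of the stochastic values (default 'exclusive' method)
    q1, _, q3 = statistics.quantiles(stochastic_values, n=4)
    # Assumed sign convention: positive means the generator overestimates
    deviation_pct = 100.0 * (median - historic_value) / historic_value
    return {
        "median": median,
        "deviation_pct": deviation_pct,
        "within_iqr": q1 <= historic_value <= q3,
    }


# Hypothetical mean daily rainfall (mm) from nine stochastic series
stochastic_means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
result = evaluate_statistic(4.0, stochastic_means)
print(result)
```

Applying such a check to every statistic and counting the proportion that falls within the interquartile range gives a percentage of the kind quoted from Ndiritu and Nyaga (2014).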
The choice of an appropriate type of model therefore requires careful consideration,
and many factors are involved in the decision. These factors include the
climate of the region, whether the model is required for annual, monthly, or daily rainfall,
and whether the model is required for time series generation (Pratikasiwi, 2022). Based on
these factors, the most logical model to use is the variable length block (VLB) generator,
a non-parametric model used for rainfall generation. The method has no
known limitations and can be used for groundwater level, streamflow, precipitation and
radiation (Tjebane, 2022).
2.4 Climate change and variability modelling
Climate change holds significant importance for South Africa, particularly given the
variability in rainfall patterns. These factors must be considered when
utilizing a rainfall model. Few models can account for climate change and its long-term
effects, owing to the future uncertainties that may arise (Ghil et al., 2002; Hughes, 2012).
The General Circulation Model (GCM), however, attempts to incorporate long-term climatic
change into the analysis. The GCM is currently the only physically based approach to
accounting for climate change. It can assess the impact of greenhouse gases on a global
scale, which may help to slow down that impact. Several other models take climate change
into account, but they produce data with a large degree of variability and uncertain
values (Mujumdar and Ghosh, 2008). Since the GCM is one of a kind, the data it reproduces
cannot be validated against another analysed data set, and therefore its accuracy cannot be
validated (Koutsoyiannis et al., 2009). However, the testing that has been carried out shows
that the GCM cannot replicate inter-annual rainfall data. This is supported by Koutsoyiannis
(2011), who stated that long-term persistence needs to be incorporated into the
hydro-climate time series because the GCM underestimates the uncertainties. In 2010,
Kundzewicz and Stakhiv concluded that GCMs are not yet ready to be used for practical water
resource planning because the downscaled data they produce are uncertain and not validated.
2.5 Variable Length Bootstrap method
Variable length bootstrapping is a statistical resampling technique that resamples data using a variable number of samples (Ndiritu, 2011). The number of samples is obtained randomly from the original data set. The traditional bootstrap technique only resamples data from the historic record and consequently produces no new data, which is a significant limitation for stochastic data generation (Ndiritu, 2011). The Variable Length Block (VLB) bootstrap was developed to overcome this limitation and to replicate multi-annual flow variability using blocks of variable length (Ndiritu, 2011). The variable length bootstrapping streamflow generator developed by Ndiritu (2011) was later adapted for rainfall generation by considering the characteristics of rainfall rather than streamflow, including its spatial and temporal dependence (Ndiritu & Nyaga, 2014). The rainfall generator generates monthly and annual stochastic rainfalls by disaggregating stochastic annual values and perturbing the data, and thereafter updates the stochastic annual values after the disaggregation (Ndiritu & Nyaga, 2014).
VLB is a nonparametric method that (Ndiritu, 2011):
i) Produces stochastic values beyond the range of historical values, which is
beneficial in simulating extreme rainfall events different to historical data.
ii) Includes a method for preserving the correlations between monthly values at the
end and beginning of each year, thus ensuring a realistic generated time series.
The main steps of the VLB generator include (Ndiritu & Nyaga, 2014):
blocks.
● Matching each stochastic time series year with a pair of different years from the
historical data.
perturbation is incorporated.
Refer to (Ndiritu & Nyaga, 2014) for the detailed description of the method.
2.6 Daily stochastic rainfall data
Long sequences of daily rainfall are required for hydrological purposes and to provide inputs for models of crop growth, landfills, tailings dams and other environmentally sensitive projects (Srikanthan & McMahon, 2001). It is thus imperative that annual data can be used to obtain daily rainfall data. The newly developed daily VLB stochastic rainfall generator (Ndiritu, 2022) is derived from the disaggregation of the annual stochastic rainfalls obtained using the VLB stochastic rainfall generator (Ndiritu and Nyaga, 2014). Annual to daily
(i) An annual stochastic rainfall magnitude (ASi) is selected, and its nearest neighbours are then identified from the historic rainfall sequence. Nearest neighbours are defined as the historic annual rainfall values that are close to the selected annual stochastic rainfall value. An equal number of neighbours with lower and higher rainfalls are identified, so the number of neighbours is a multiple of two.
(ii) After the nearest neighbours have been selected, a random number is drawn between 1 and the number of neighbours. This is the number of neighbours that will supply the initial daily rainfall amounts for set parts of the year. For example, if 8 nearest neighbours are selected (it must be a multiple of two), a random number between 1 and 8 is drawn, such as 5. The randomly selected number of neighbours (e.g., 5) each fill an equal proportion of the year to create the initial daily rainfall sequence for the year. If 5 neighbours are selected, each neighbour supplies daily rainfalls for a continuous period of 365/5 = 73 days to fill up the year.
(iii) The sum of the rainfalls obtained in step (ii) is unlikely to equal the stochastic annual rainfall ASi selected in step (i). Therefore, the daily stochastic rainfalls from step (ii) are scaled proportionally so that they sum to the annual stochastic rainfall ASi. This adjustment ensures that new daily rainfall data is generated.
(iv) In order to ensure the replication of cross correlations where multiple rainfall stations are involved, the contemporaneous approach described by Ndiritu & Nyaga (2014) and Ndiritu (2011) is applied.
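Steps (i) to (iii) can be illustrated with a minimal Python sketch for a single station. This is not the Fortran implementation of the daily VLB generator; the function name `disaggregate_year`, its arguments, the random choice of which neighbours are used, the integer rounding of segment boundaries, the use of the same calendar window from each neighbour, and the fixed 365-day year are all assumptions made for illustration. The number of neighbours used (`n_used`) is passed in explicitly rather than drawn at random.

```python
import random


def disaggregate_year(annual_target, historic_years, n_pairs, n_used, rng):
    """Disaggregate one stochastic annual rainfall into 365 daily values.

    historic_years: list of (annual_total, daily_values) tuples, where
        daily_values holds 365 daily rainfalls (assumed fixed year length).
    n_pairs: number of pairs of nearest neighbours (step i).
    n_used: number of neighbours that supply daily rainfalls (step ii).
    rng: a random.Random instance, so that runs are repeatable.
    """
    # Step (i): up to n_pairs neighbours below and n_pairs above the target.
    ordered = sorted(historic_years, key=lambda year: year[0])
    lower = [y for y in ordered if y[0] <= annual_target][-n_pairs:]
    higher = [y for y in ordered if y[0] > annual_target][:n_pairs]
    neighbours = lower + higher

    # Step (ii): choose the neighbours (assumed: at random, without
    # replacement) and let each fill an equal, continuous part of the year.
    k = max(1, min(n_used, len(neighbours)))
    chosen = rng.sample(neighbours, k)
    daily = []
    for j, (_, days) in enumerate(chosen):
        start = j * 365 // k          # e.g. k = 5 gives 73-day segments
        end = (j + 1) * 365 // k
        daily.extend(days[start:end])

    # Step (iii): rescale so the daily values sum to the annual target.
    total = sum(daily)
    if total == 0:
        return [0.0] * 365            # assumed handling of an all-dry sequence
    factor = annual_target / total
    return [d * factor for d in daily]
```

Because the rescaling in step (iii) multiplies every day by the same factor, dry days stay dry and the pattern of wet days is inherited from the chosen neighbours; only the magnitudes change.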
2.8 Limitations of subjective parameters in the daily stochastic rainfall VLB generator
Two parameters of the daily generator have been obtained subjectively, namely the number
of pairs of nearest neighbours and number of years selected out of the pairs to use in the
disaggregation from an annual to daily time step. Subjective parameters in a stochastic
generation could lead to sub-optimal generation performance and this investigation
therefore seeks to find out how sensitive the performance of the daily stochastic generator
is to variations in the two subjective parameters and if optimal parameter values exist for
the different climatic conditions the daily VLB generator could be used for.
3. Research Method
3.1 Introduction
The aims of this investigation include determining how sensitive the performance of the
daily VLB stochastic generator is to the variations in the two subjective parameters. These
subjective parameters are the number of pairs of nearest neighbours and number of years
out of the pairs that are selected to disaggregate the annual rainfalls to daily. In addition,
the aim of this study is to determine if optimal parameter values exist for different climatic
conditions the daily VLB generator can be used for. The following tasks will be carried out to
meet these aims and objectives.
- Task 1: Obtain historic rainfall data for different climatic conditions within South Africa, namely arid, semi-arid and humid. Gauteng is the semi-arid region and represents a normal climate, Kwa-Zulu Natal represents the humid region, and the Northern Cape represents the arid region. This is done to determine if optimal values for the subjective parameters exist for different climatic conditions. The historic rainfall data will be obtained for a minimum of 50 years from the South African Weather Service. At least 50 years of historic data is selected to account for climatic variability and to ensure the analysis is thorough. Once the historic data is obtained, quality checks will be carried out to ensure the data is continuous, and basic statistics of the data will be obtained. These will include mean daily rainfall, skewness, standard deviation, and proportion of rainfall days.
- Task 2: A range of values will be defined for the two subjective parameters. For the first subjective parameter, the number of pairs of nearest neighbours, a minimum, middle, and maximum value will be set. The number of pairs of nearest neighbours can range from 1 up to 50% of the number of years of historic data obtained. For example, for 57 years of historic data, the minimum number of pairs of nearest neighbours selected will be 1, the middle number will be 14 and the maximum will be 28. The second parameter is the number of years selected to disaggregate the annual rainfalls to daily, which takes a maximum value of twice the number of pairs of nearest neighbours. For each number of pairs of nearest neighbours selected, a lower and upper bound is defined for the years selected to be used in the disaggregation. For example, if the number of pairs of nearest neighbours selected is 9, the years selected out of these neighbours will be 1 and 18, which will be used to generate the stochastic data. In this case the pairs referred to in this report will be (9,1) and (9,18). A pair is therefore defined as the number of pairs of nearest neighbours selected and the number of years out of these to use in the disaggregation from an annual to a daily time step. Choosing a minimum, middle, and maximum number of nearest neighbour pairs allows a sensitivity analysis to be carried out on the daily VLB stochastic generator’s performance with respect to variations in the subjectively selected values of the two parameters.
- Task 3: The daily VLB stochastic generator will be run while varying the number of pairs of nearest neighbours and the years selected, using the minimum, middle and maximum numbers of pairs of nearest neighbours. This will be done for each station in the respective regions. Since there are six pairs per station and four stations will be analysed, a total of twenty-four sets of 100 stochastic series will be obtained.
- Task 4: Once the daily stochastic data has been generated by varying the subjective parameters for each climatic condition, the performance of the daily VLB stochastic generator will be analysed. The analysis will be carried out across the different climatic conditions to determine whether an optimal subjective parameter value exists in each condition, and to determine how sensitive the generator is to variation in the two subjective parameters. It will use statistical measures such as mean, standard deviation and skewness. The mean and standard deviation will provide information about the central tendency and variability of the data, respectively. The proportion of rainfall days for each set of stochastic generated data will be
- Task 5: To determine whether optimal values exist for different climatic conditions, and how sensitive the VLB daily generator is to altering the parameter values, three performance evaluations will be carried out.
1) The percentage deviation from the historic value will be one of the performance
indicators that will be determined. From the historic data one value will be obtained
for the mean, skewness, standard deviation as well as proportion of rainfall days.
From the stochastic data the median value for all 100 stochastic series will be
obtained for each of the statistical measures mentioned above. To get the deviation
from the historic value the following formula will be applied for each statistical
measure:
\[ \%\ \text{deviation from historic value} = \frac{\text{stochastic median} - \text{historic value}}{\text{historic value}} \qquad \text{(Equation 1)} \]
For a pair (number of pairs of nearest neighbours, years selected from these pairs) to perform well, the deviation of the stochastic median from the historic value should be low for each statistical measure. It is ideal for the median of the stochastic data to be close to the historic value, as this ensures that the stochastically generated data is representative of the historic data.
- Another performance evaluator that will be utilised is based on the deviation of the
historic statistic from the median of the 100 stochastic values. The historic value for
a particular statistical measure will be included in the 100 stochastic values for that
measure and the deviation of the historic value from the stochastic median value will
be determined. It is deemed optimal if the historic value deviates minimally from the
stochastic median value (50th value). If the historic value is near the median
stochastic value, the data is representative of the historic data, whereas large
deviations away from the median stochastic value will indicate a lower performance.
A low deviation from the stochastic median value will therefore lead to a pair
performing well as the data will be representative of the historic data.
- An overall performance evaluator utilised is the composite average deviation. For each pair, the deviation of the historic statistic from the stochastic median is multiplied by the percentage deviation from the historic value for each statistical measure, and the products are then averaged over all statistical measures. A low composite average deviation will indicate that the pair performs well. These three performance evaluators will determine if optimal parameter values exist for the applied rainfall data sets and will be used to assess the sensitivity of the VLB daily generator to changes in the parameter values.
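The three performance evaluators of Task 5 can be sketched in Python as follows. This is an illustrative sketch rather than the spreadsheet calculation used in this investigation. The function names are hypothetical, the percentage deviation is taken as Equation 1 multiplied by 100 and expressed as a magnitude (an assumption), and the position of the historic value is measured against the 50th value, as described above.

```python
from bisect import bisect_left
from statistics import mean, median


def percent_deviation(stochastic_values, historic_value):
    """Equation 1 as a percentage magnitude (x100 and abs() are assumptions)."""
    stochastic_median = median(stochastic_values)
    return abs(100.0 * (stochastic_median - historic_value) / historic_value)


def rank_deviation(stochastic_values, historic_value, median_rank=50):
    """Number of ranked values between the historic value and the median.

    The historic value is placed among the sorted stochastic values and its
    1-based position is compared with the position of the stochastic median
    (taken here as the 50th value).
    """
    ordered = sorted(stochastic_values)
    rank = bisect_left(ordered, historic_value) + 1
    return abs(rank - median_rank)


def composite_average_deviation(percent_devs, rank_devs):
    """Average over the statistical measures of (percent dev x rank dev)."""
    return mean([p * r for p, r in zip(percent_devs, rank_devs)])
```

For one pair, `percent_devs` and `rank_devs` would each hold one entry per statistical measure (mean, standard deviation, skewness and proportion of rainfall days), so that a single composite value is obtained per pair.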
The daily VLB generator is useful for generating daily rainfall data; however, the
performance of the VLB daily generator is dependent on two subjective parameters which
can affect its reliability. Therefore, these tasks will help determine whether optimal values
exist for certain climatic conditions and how the two subjective parameters will influence
the performance of the VLB generator.
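The six parameter pairs per station described in Task 2 can be enumerated with a short Python sketch. The function name `parameter_pairs` is hypothetical, and the rules used for the maximum (50% of the record length, rounded down) and the middle value (half of the maximum, rounded down) are assumptions chosen to reproduce the 1, 14 and 28 quoted for a 57-year record.

```python
def parameter_pairs(n_years):
    """Return the (pairs of nearest neighbours, years selected) combinations.

    For each of the minimum, middle and maximum number of pairs of nearest
    neighbours, the years selected take their lower bound (1) and their
    upper bound (twice the number of pairs).
    """
    max_pairs = n_years // 2      # assumed: 50% of the record, rounded down
    mid_pairs = max_pairs // 2    # assumed: midpoint rule
    pairs = []
    for p in (1, mid_pairs, max_pairs):
        pairs.append((p, 1))
        pairs.append((p, 2 * p))
    return pairs


print(parameter_pairs(57))
# [(1, 1), (1, 2), (14, 1), (14, 28), (28, 1), (28, 56)]
```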
The VLB method was implemented in the Fortran programming language to develop the stochastic generation application. Fortran is a capable and versatile programming language tailored for numeric calculations and scientific computing, which makes it well suited to generating rainfall data. Once the VLB program is written, an executable (.exe) file is created to work alongside a text (.txt) file that holds all the relevant rainfall data. The rainfall data is captured in a Microsoft Excel file, which is then saved as a delimited text file to be read by the executable application. Because the application is run as a compiled executable with inputs prepared in Microsoft Excel, a Fortran compiler is not required.
Three executable files, named DailyToMonthlyRainfallCreator.exe, VLBMonthlyGenerator.exe and AnnualToDailyDisaggregator.exe, and one parameter file, AnnualtoDailyDisaggregationParameters.txt, are used. The executables first convert the daily historic rainfall into monthly rainfall, then produce stochastic monthly data, and finally produce daily stochastic rainfall. This last set is produced using AnnualToDailyDisaggregator.exe, where VLB disaggregates the annual stochastic rainfalls (aggregated from the stochastic monthly rainfalls) to daily stochastic rainfalls. A detailed description of the VLB monthly stochastic generator is provided by Ndiritu and Nyaga (2014).
A summary of the steps is provided below. Although the VLB model created by Ndiritu and Nyaga (2014) can produce stochastic rainfalls on both monthly and annual scales, it is deemed suitable to disaggregate rainfalls directly from an annual to a daily time step without involving the intermediary monthly time step. Annual to daily disaggregation is far less complex than monthly to daily disaggregation, which is why the VLB generator carries out this computation instead.
The following files were produced when running the VLB daily stochastic generator to
generate stochastic daily rainfalls.
Step 1:
Two text files were required to run the application:
a) Historic daily rainfall.txt, consisting of the daily historic rainfall data from the rainfall stations. The format of the data is shown below in Figure 1. The VLB application is capable of producing stochastic data for multiple regions simultaneously, but for simplicity the software was run for each station in each region individually. The software was thus executed 4 times to obtain stochastic monthly data for Gauteng (Leeukop and Turffontein) and Kwa-Zulu Natal (Durban Heights and Ngome-Bos) respectively.
b) VLBSettingsandfiles.txt, which consists of the length of historic data in years, the length of the stochastic sequences to be generated in years and the minimum block length, which are altered according to each station, as shown in Figure 2.
Step 2:
Execute the DailyToMonthlyRainfallCreator.exe program to generate monthly rainfall data
files intended for utilization by the VLB Monthly Generator.exe. Additionally, this program
conducts an analysis that establishes a correlation between annual rainfalls and the count of
annual rainy days, and the results are saved as RainAndRainDays.txt. VLB also produces the
RainfilesNames.txt consisting of names of the rainfall stations and the number of stations
analysed, as seen in figure 3.
Figure 3: RainAndRainDays.txt
Step 3:
Execute the VLB Monthly Generator.exe program, which generates text files containing various analyses of the stations, including historical annual rainfalls, historical monthly statistics, block lengths and averages, and cross-correlations, among others. The only files required are the monthly stochastic generation files, which contain the monthly stochastic rainfall data for the specified station and are named with the station name followed by "generated". Each monthly stochastic generated file consists of 100 stochastic rainfall series, with each stochastic series consisting of 57 years’ worth of monthly rainfall data, as seen in figure 4 below. The columns are arranged from the months of least rainfall to the months of most rainfall.
Step 4:
Execute the AnnualToDailyDisaggregator.exe program, which generates text files containing the annual stochastic data disaggregated into stochastic daily rainfall data for 100 stochastic rainfall series, as seen in figure 5 below. This data is then used in a performance evaluation to determine how well the daily stochastic rainfall data compares to the historic rainfall. The series of performance evaluations to be carried out is discussed in the methodology above.
Step 5:
The last step in the VLB procedure is changing the parameters of the generator to improve the accuracy of the stochastic rainfall that is generated. This is done by executing AnnualToDailyDisaggregator.exe. The parameters selected in the text file are in the form of a pair, e.g. (X,Y), where X represents the number of pairs of nearest neighbours, which should not exceed 50% of the number of years of stochastic data, and Y represents the number of nearest neighbours to use to disaggregate the annual rainfall to daily rainfall. Altering these pairs produces different sets of stochastic data, each consisting of 100 stochastic series.
Figure 6: Stochastic daily generated rainfall with updated neighbouring pairs
Figure 7 shows the summary flow chart of the VLB process followed.
The historical data was obtained from the South African Weather Service. Two stations were selected for each province and 57 years of daily historical data was obtained. The stations were chosen subjectively by the South African Weather Service based on the given provinces and the number of years of data requested, such that the records meet the requirement of being long and reliable. Two stations in the Northern Cape were obtained; however, due to the very low rainfalls the daily VLB stochastic generator could not produce stochastic data for them, and they are not considered in the analysis.
The Gauteng stations include:
- Johannesburg Turffontein (Station number- 0476044 0)
- Johannesburg Leeukop (Station number- 0476031 8).
The Kwa-Zulu Natal stations include:
- Durban Heights station (Station number- 0240738 1)
- Ngome Bos (Station number- 0373680A1).
Once the data was obtained, quality checks and checks for the continuity of the data were carried out. These quality checks include basic statistical analysis such as the average daily and annual rainfall and the number of rain days per year, as well as checking for gaps in the data. The data had missing values and was 96% complete, which was deemed acceptable. The 4% missing values were manually filled in with 0 values to provide a continuous data set. This is a common practice in hydrology, as most stochastic generators require a continuous data set.
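The continuity check and gap filling described above can be sketched in Python. The checks in this investigation were carried out manually, so the sketch is only illustrative: the function names are hypothetical, missing days are assumed to be represented by `None`, and a rain day is assumed to be any day with rainfall greater than 0 mm.

```python
def completeness_and_fill(daily_values):
    """Return (percent_complete, filled_series) for one daily record.

    Missing days (None) are replaced with 0.0 so that the record is
    continuous, as was done for the 4% of missing values.
    """
    n = len(daily_values)
    if n == 0:
        raise ValueError("the daily record is empty")
    missing = sum(1 for v in daily_values if v is None)
    percent_complete = 100.0 * (n - missing) / n
    filled = [0.0 if v is None else float(v) for v in daily_values]
    return percent_complete, filled


def basic_statistics(filled_values, rain_threshold=0.0):
    """Mean daily rainfall and proportion of rain days of a filled record."""
    n = len(filled_values)
    mean_daily = sum(filled_values) / n
    rain_days = sum(1 for v in filled_values if v > rain_threshold)
    return mean_daily, rain_days / n
```

Note that filling gaps with zeros lowers the mean daily rainfall and the proportion of rain days slightly if any of the missing days were in fact wet.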
Table 1 summarises the characteristics of each station for the two provinces.
Province | Representative climatic region | Station | Latitude | Longitude | Record of data | Number of years of data | Mean annual precipitation, MAP (mm)
Gauteng | Semi-arid | Johannesburg Turffontein | -26.23 | 28.04 | 01/01/1965-31/12/2022 | 57 | 759.5
Gauteng | Semi-arid | Johannesburg Leeukop | -26.00 | 28.05 | 01/01/1965-31/12/2022 | 57 | 693.2
Kwa-Zulu Natal | Humid | Durban Heights | -29.80 | 30.93 | 01/01/1965-31/12/2022 | 57 | 790.1
Kwa-Zulu Natal | Humid | Ngome Bos | -27.82 | 31.42 | 01/01/1965-31/12/2022 | 57 | 1450.4
Table 1: Rainfall station characteristics
Johannesburg Leeukop
Johannesburg Leeukop is situated near Leeukop dam and around 1.2km away from the
Jukskei River. This area is largely farmland with agricultural activity. It is situated near the
Kyalami grand prix circuit and is a far less built-up area in Johannesburg. Figure 8 is an image
of Leeukop rainfall station obtained from Google Earth Pro.
The mean daily historic rainfall is calculated to be 1.9mm/day over the 57-year historic period, as illustrated in figure 9. Additionally, from figure 9 it is observed that the wettest year is 1997, with a mean daily rainfall of 2.9mm/day, and the driest year is 1965, with a mean daily rainfall of 1mm/day. The maximum daily observed rainfall occurred on the 5th of February 2022 with a value of 122mm, as shown in figure 10. The wet months are from October to March, as seen in figure 11, which shows the mean historic rainfall per day in each month for the 57-year period.
Figure 9: Average annual daily rainfall for Leeukop for the historic period from 1965-2022
Johannesburg Turffontein
The Johannesburg Turffontein rainfall station is located next to the Robinson Deep gold mine and near the Robinson Deep landfill. Figure 12 shows the Turffontein area where the station is located. Turffontein is developing into a residential and commercial area, and mining activity is decreasing in this area. The Turffontein station is around 30km away from the Leeukop station.
The mean daily historic rainfall is calculated to be 2.1mm/day over the 57-year historic period, as illustrated in figure 13. It is observed that the wettest year is 2010, with a mean daily rainfall of 3.2mm/day, and the driest year is 2003, with a mean daily rainfall of 1.2mm/day. The maximum daily observed rainfall occurred on the 20th of January 1972 with a value of 162mm, as shown in figure 14. The wet months are from October to March, as seen in figure 15, which shows the mean historic rainfall per day in each month for the 57-year period.
Figure 13: Average daily rainfall depths observed from the historic data for Turffontein (annual values)
Figure 15: Box plots of the average daily rainfall per month in Turffontein (mm/day/month) from 1965-2022
Ngome-Bos
This station is located in the Ngome Forest, which is situated 70km east of Vryheid, KwaZulu-Natal, South Africa. The forest is transitional between the Mistbelt Forest and Coastal Scarp Forest. It has been a protected area since 1905 and forms part of the Ntendeka Wilderness Area. The Ngome Forest is situated on the southern slopes of high-altitude mist-belt grasslands and contains a unique combination of coastal and upland plant and bird species. Below is an image of the Ngome-Bos area.
The mean daily historic rainfall is calculated to be 4mm/day over the 57-year historic period, as illustrated in figure 17. It is observed that the wettest year is 1984, with a mean daily rainfall of 6.1mm/day, and the driest year is 2015, with a mean daily rainfall of 2.2mm/day. The maximum daily observed rainfall occurred on the 30th of December 1993 with a value of 320 mm, as shown in figure 18. The wet months are from October to March, as seen in figure 19, which shows the mean historic rainfall per day in each month for the 57-year period.
Figure 17: Mean historic daily rainfall per year from 1965-2022
Figure 18: Historic daily rainfall for Ngome Bos from 1965-2022
Figure 19: Box plots of monthly mean daily rainfall over a 57-year period for Ngome-Bos
Durban Heights
The Durban Heights rainfall station is situated around 400km away from the Ngome-Bos station. It is near Umgeni Water in a suburb called Reservoir Hills. This area is a few kilometres from the Durban CBD, in the western part of Durban, and is characterised by its hilly terrain.
The mean daily historic rainfall is calculated to be 2.2 mm/day over the 57-year historic period, as illustrated in figure 21. It is observed that the wettest year is 1994, with a mean daily rainfall of 4mm/day, and the driest year is 2016, with a mean daily rainfall of 0.6mm/day. The maximum daily observed rainfall occurred on the 9th of April 2022 with a value of 281 mm, as shown in figure 22. The wet months are from October to March, as seen in figure 23, which shows the mean historic rainfall per day in each month for the 57-year period. The mean daily historic rainfall differs from that of Ngome-Bos by 82%, which is attributed to the difference in environmental conditions, as Ngome-Bos is situated in a forest and Durban Heights is a residential area.
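The 82% figure appears to be consistent with expressing the difference between the two mean daily rainfalls relative to the Durban Heights value. This is an assumption about how the percentage was obtained, using the rounded means of 4 mm/day for Ngome-Bos and 2.2 mm/day for Durban Heights:

\[ \frac{4.0 - 2.2}{2.2} \times 100 \approx 82\% \]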
Figure 21: Mean historic daily rainfall per year (annual rainfalls)
Figure 22: Daily historic rainfalls over a 57-year period
Figure 23: Box plots of mean daily rainfall per month over the 57-year period in Durban Heights
4. Performance Evaluation
This chapter will assess the performance of different subjective parameter pairings. Pairs
(1,1) and (1,2) are the minimum pairs, (14,1) and (14,28) are the middle pairs and lastly
(28,1) and (28,56) are the maximum pairs. The first number of the subjective pairs is the
number of pairs of nearest neighbours selected and the second number is the number of
years used to disaggregate from an annual to a daily time step.
This chapter will evaluate the performance of the different subjective pairs mentioned above based on the three performance evaluations stated in chapter 3 (research method). It covers the performance evaluation for Gauteng and Kwa-Zulu Natal and provides an analysis based on the graphs obtained from Excel for each performance evaluator.
Gauteng Evaluation:
from 2.59%-8.6% for Leeukop and for Turffontein the range was 0.11%-3.74%. It can be
concluded that altering the pairs influences skewness as a statistical measure as seen by the
greater deviations but does not influence mean.
Deviation of the historic statistic from the median of the 100 stochastic values
For Gauteng, the deviation of the historic statistic from the stochastic median value varies from 9 to 11 values away from the stochastic median; however, the percentage deviation for the historic mean is less than 1%, as seen in figures 24 and 28. Changing the pairs therefore has no effect on the deviation of the mean away from the stochastic median, i.e., the 50th value. Changing the pairs did affect the performance of the VLB generator for the other measures, as figures 25 and 29 show deviations that are further away from the stochastic median value. It can be noted, however, that the percentage deviations from the historic value (figures 24 and 28) are relatively small compared to the large deviations away from the median stochastic value (figures 25 and 29). Thus, the VLB generator is sensitive to alterations of the pairs with respect to the statistical measures standard deviation and skewness.
As per figure 25 for Leeukop, all subjective parameter pairs have large deviations of the historic skewness from the stochastic median, with the largest deviation, 44 values away from the stochastic median skewness, occurring for pair 28,1. On average the historic value is 3.66% away from being within the interquartile range, the region that is deemed acceptable (figure 26). Pair 14,28 was the only subjective parameter pair where the historic value was in the interquartile range. This implies that the VLB daily stochastic generator may be biased when producing these stochastic rainfalls, as it tends to overestimate the rainfall, resulting in higher rainfalls.
Across all pairs for Turffontein, the central mean remained consistent at a value of 2.06, as seen in figure 30. These stochastic means fared well against the historic mean of 2.08. This is further depicted in figure 29, where all subjective parameter pairs were 11 places away from the stochastic median of the mean. For all subjective pairs, the first quartile and median have values below the historic mean, indicating some form of systematic bias whereby the daily VLB stochastic generator is underestimating rainfall values. This ultimately means that the generator is generating lower rainfall values for the stochastic series.
For the Leeukop station, pair 14,28 performed the best, as seen in figures 24 and 25. The average composite deviation is the lowest for this pair (figure 27), and it performs the best in 3 of the 4 statistical measures in terms of percentage deviation as well as deviation away from the stochastic median.
For the Turffontein station it can be concluded that pair 28,56 performs the best, as it has a significantly lower average composite deviation than the other pairs. In addition, pair 28,56 performs the best in 3 of the 4 statistical measures for the percentage deviation away from the historic median rainfall.
[Figure: bar chart of values per statistical measure (Mean, Standard Deviation, Skewness, Proportion of rainfall days) for subjective pairs 1,2; 1,1; 14,1; 14,28; 28,1; 28,56]
[Figure: deviation from median for each statistical measure (Mean, Standard deviation, Skewness, Proportion of rainfall days) for subjective pairs 1,2; 1,1; 14,1; 14,28; 28,1; 28,56]
[Figure: Turffontein composite average deviation. Vertical axis: average deviation from historic value x deviation from median; horizontal axis: subjective pairs]
Typically for Ngome Bos, all deviations in skewness were substantial, with all pairs exhibiting
a deviation greater than 5%. The same can be said for Durban Heights, with all pairs except
1,1 having deviations larger than 5%. It should be noted that pair 1,1 fared the worst for
Ngome Bos while it performed the best for Durban Heights.
As for the proportion of rainfall days, the pairs utilized by VLB produced stochastic rainfall data with a much larger proportion of rainfall to non-rainfall days for Durban Heights, which is evident in figure 32. This implies that altering the pairs did have an impact on the proportion of rainfall days relative to the historic value, as all percentage deviations were much larger than 5%. For Ngome Bos, altering the pairs did not have a significant impact on the data, with pair 14,28 having fared the best.
Deviation of the historic statistic from the median of the 100 stochastic values
In KwaZulu-Natal, the deviation of the historic statistic from the stochastic median value
varies slightly. For the two stations, Ngome Bos and Durban Heights, the deviation of the
mean away from the stochastic median does not change when the pairs are altered, indicating
that changing the parameter pair does not influence the mean of the stochastic data.
Skewness is the most sensitive measure when the parameter pair values are altered, as shown
by its large deviations away from the stochastic median. For Durban Heights, pair 1,1 is
1 value away from the stochastic median, which is the ideal value and thus the most
representative of the historic data, whereas for pair 14,1 the historic statistic deviates by
37 values from the stochastic median, indicating that it is not representative of the
historic data. For Durban Heights, the proportion of rainfall days deviates by 51 values
from the stochastic median, and this remains constant when the pairs are changed; for
Ngome Bos, however, this deviation varies when the pairs are altered. Standard deviation
varies at both stations by 2 to 24 values away from the stochastic median. For Durban
Heights, most pairs deviate by 2 values from the stochastic median, with the exceptions of
pairs 14,1 and 14,28. For Ngome Bos, the deviations range from 2 to 19 values away from the
stochastic median, with no noticeable trend.
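To make the "values away from the stochastic median" measure concrete, the sketch below counts how many of the stochastic values lie between the stochastic median and the historic statistic. The counting convention is an assumption (values strictly between the two are counted), so results may differ by one or two from the counts quoted in this report; the function name and the data are hypothetical.

```python
from statistics import median

def deviation_from_median(historic, stochastic):
    """Return (count, side) for one statistic.

    count: number of stochastic values strictly between the stochastic
           median and the historic value (assumed convention).
    side:  whether the historic value lies above, below or at the median.
    """
    med = median(stochastic)
    lo, hi = sorted((med, historic))
    count = sum(1 for v in stochastic if lo < v < hi)
    if historic > med:
        side = "above"
    elif historic < med:
        side = "below"
    else:
        side = "at"
    return count, side

# Placeholder statistic values from 100 stochastic sequences (1.0 to 100.0).
stochastic_values = [float(i) for i in range(1, 101)]

near = deviation_from_median(51.5, stochastic_values)      # historic close to the median
outside = deviation_from_median(200.0, stochastic_values)  # historic beyond every sequence
print(near, outside)
```

Under this convention a small count means the historic statistic sits near the centre of the stochastic ensemble, while a count of about half the ensemble size means the historic statistic lies outside the range of all generated sequences.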
The consistent deviation of the historic proportion of rainfall days by 51 values from the
stochastic median (Figure 33) across the varied subjective parameter pairs indicates a
systematic discrepancy between the observed historical data and the central tendency
represented by the stochastic median. This divergence could point to a systematic bias or an
unaccounted-for influencing factor that consistently shifts the historic values away from
the expected median. Figure 34 shows that the historic proportion of rainfall days is greater
than the stochastic median proportion of rainfall days for all pairs, indicating that the
daily VLB stochastic generator may be biased towards generating fewer rainfall days in the
stochastic series.
For Ngome Bos, the differences between the historical standard deviation and the median
stochastic standard deviation range from 2 to 19 values across the different subjective
parameter pairs (Figure 37). As illustrated in Figure 38, the historical value closely aligns
with the stochastic median values, with the largest disparity, 1.7%, observed for pair
14,28. This implies that despite a substantial deviation of 19 values from the stochastic
median, the disparity between the stochastic median and the historical value remains minimal
at 1.7%. Consequently, the stochastic data remain representative of the historical data.
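The point that a large count of values from the median can coincide with a small percentage difference is illustrated below. The sketch assumes the percentage is taken relative to the historic value and uses a hypothetical, tightly clustered set of 100 stochastic standard deviations; none of the numbers come from the report's data.

```python
from statistics import median

def percent_deviation(historic, stochastic):
    """Difference between the stochastic median and the historic value,
    as a percentage of the historic value (assumed definition)."""
    return abs(median(stochastic) - historic) / abs(historic) * 100.0

def values_between(historic, stochastic):
    """Number of stochastic values strictly between the median and the historic value."""
    lo, hi = sorted((median(stochastic), historic))
    return sum(1 for v in stochastic if lo < v < hi)

# Hypothetical standard deviations (mm) from 100 sequences: 10.00, 10.01, ..., 10.99.
stochastic_sd = [10.0 + 0.01 * i for i in range(100)]
historic_sd = 10.305

rank_gap = values_between(historic_sd, stochastic_sd)
pct_gap = percent_deviation(historic_sd, stochastic_sd)
print(rank_gap, round(pct_gap, 2))
```

Because the stochastic values are packed closely together, the historic value can sit many positions away from the median while differing from it by less than 2%.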
Overall, the performance evaluation for all stations in Gauteng and KwaZulu-Natal found no
specific pair of disaggregation parameters that performed best by having the lowest
percentage deviation from the historic value for the statistical measures of central
tendency and the lowest deviations from the stochastic median.
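The four statistical measures used in this evaluation, and the percentage deviation of a stochastic value from the historic value, can be computed along the following lines. This is a sketch under stated assumptions: statistics are taken over all days including dry days, a rainfall day is any day with more than 0 mm, skewness is the population moment coefficient, and the two short series are invented for illustration.

```python
from statistics import mean, pstdev

def rainfall_statistics(daily_mm, wet_threshold=0.0):
    """Mean, standard deviation, skewness and proportion of rainfall days."""
    n = len(daily_mm)
    mu = mean(daily_mm)
    sigma = pstdev(daily_mm)
    # Population skewness: third central moment divided by sigma cubed.
    skew = sum((x - mu) ** 3 for x in daily_mm) / (n * sigma ** 3) if sigma > 0 else 0.0
    wet = sum(1 for x in daily_mm if x > wet_threshold) / n
    return {"mean": mu, "std_dev": sigma, "skewness": skew, "wet_day_proportion": wet}

def percent_deviation(stochastic_value, historic_value):
    """Absolute deviation as a percentage of the historic value."""
    return abs(stochastic_value - historic_value) / abs(historic_value) * 100.0

# Invented ten-day series (mm); not data from any station in this report.
historic = [0, 0, 5, 0, 12, 0, 0, 3, 0, 0]
synthetic = [0, 4, 0, 0, 10, 0, 2, 0, 0, 0]

hist_stats = rainfall_statistics(historic)
syn_stats = rainfall_statistics(synthetic)
deviations = {k: percent_deviation(syn_stats[k], hist_stats[k]) for k in hist_stats}
print(deviations)
```

In practice the same calculation would be repeated for each of the 100 stochastic sequences and each parameter pair before the deviations are compared.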
For Gauteng, no optimal parameter values could be identified because no trends emerged
from the analysis. When an arid region, namely the Northern Cape, was analysed, it was found
that the VLB stochastic generator could not produce stochastic data because rainfall
occurrences were too few for the VLB methodology to function. As such, the arid region
(Northern Cape) was excluded from the research. For the humid region (KwaZulu-Natal), the
stochastic generator was able to produce stochastic data, yet no optimal parameter values
were identified. A key limitation of the VLB daily stochastic generator was its inability to
generate stochastic data for arid regions. Moreover, where historical data include missing
values, the data would need to be in-filled for the generator to operate. This is common
practice in hydrological analysis, as most stochastic generators require a continuous data
set to operate.
From this research it can be concluded that no optimal pair exists for the regions analysed.
However, if additional subjective parameter pairings are examined beyond just the minimum,
median, and maximum pairs, there is a possibility that an optimal pair of subjective
parameters may exist.
For future research, it is advised that a minimum of three weather stations with comparable
climatic conditions per region be examined to identify whether an optimal pair of subjective
parameters exists. Adaptations to the VLB daily stochastic generator are necessary to
accommodate the analysis of arid regions. In addition, the performance of the daily VLB
stochastic generator can be compared to other daily stochastic generators, such as the
Weather Generator Method (WGEN) and the Transition Probability Matrix (TPM) method.
This would assess how well each model captures the statistical characteristics of observed
rainfall patterns.
6. References
–Acharya, S. (2021, May 14). What are RMSE and MAE? Retrieved from Towards Data
science: https://towardsdatascience.com/what-are-rmse-and-mae-e405ce230383
–Benoit, L., Sichoix, L., Nugent, A. D., Lucas, M. P., & Giambelluca, (2022, April 27).
Stochastic daily rainfall generation on tropical islands with complex topography. Retrieved
from European Geosciences Union: https://doi.org/10.5194/hess-26-2113-2022
-Beven, K. (2012). Rainfall-Runoff Modelling. Sussex: John Wiley & Sons, Ltd.
–Boughton, W. C. (1999). A daily rainfall generating model for water yield and flood studies.
Australia: Cooperative Research Centre for Catchment Hydrology (99/9).
-Campo, M. (2009, March 8). Application of a stochastic rainfall model in flood risk
assessment. Retrieved from Research Gate:
https://www.researchgate.net/publication/234517790_Application_of_a_stochastic_rainfall
_model_in_flood_risk_assessment.
-Chapman, T. (1999, January 5). Stochastic modelling of daily rainfall: the impact of adjoining
wet days on the distribution of rainfall amounts Retrieved from Science Direct:
https://www.sciencedirect.com/science/article/pii/S136481529800036X?via%3Dihub.
-Edwin, A. I., & Martins, O. Y. (2014). Stochastic Characteristics and Modelling of Monthly
Rainfall Time Series of Ilorin, Nigeria. Modern Hydrology, 67-79.
-Ghil M (2002). Natural Climate Variability. Vol. 1, The Earth system: physical and chemical
dimensions of global environmental change, Eds., MacCracken M C and Perry J S., in
Encyclopedia of Global Environmental Change, Ed. -in-Chief, Munn T. John Wiley. P 544-549.
53
CIVN4005A-Investigational Project Report
-Harold, T. (2003, December 12). A nonparametric model for stochastic generation of daily
rainfall amounts. Retrieved from Research Gate:
https://www.researchgate.net/publication/248808540_A_nonparametric_model_for_stoch
astic_generation_of_daily_rainfall_amounts.
-Helton, J. (1995). Uncertainty and sensitivity analysis in the presence of stochastic and
subjective uncertainty. Journal of Statistical Computation and Simulation , 3-76.
-Huang, Y. (2018, March 11). Spatial and Temporal Variability in the Precipitation
Concentration in the Upper Reaches of the Hongshui River Basin, Southwestern China.
Retrieved from Hinawi: https://www.hindawi.com/journals/amete/2018/4329757/.
-Hughes, D. (2004). Three decades of hydrological modelling research in South Africa. South
African Journal of Science, 638-642.
-Kundzewicz ZW, Mata lJ, Arnell NW, Döll P, Jimenez B, Miller K, Oki T, Şen Z and
Shiklomanov I (2008). The implications of projected climate change for freshwater resources
and their management, Hydrol.Sci. J., 53:1, 3-10.
-Kundzewicz, Z. (2010, October 13). Are climate models “ready for prime time” in water
resources management applications, or is more research needed? Retrieved from Taylor
Francis Online: https://www.tandfonline.com/doi/full/10.1080/02626667.2010.513211.
-LePan, N. (2023, March 22). The world’s wettest mines: Measuring precipitation at mine
sites. Retrieved from Datamine: https://www.mining.com/the-worlds-wettest-mines-
measuring-precipitation-at-mine-sites/
-Mirzaei, M. (2022, August 9). A Novel Framework for Nonparametric Rainfall Generator
Based on Deep Convolutional Wasserstein Generative Networks (DC-WGANs). Research
Square, 6-9.
-Mujumdar PP and Ghosh S (2008). Modeling GCM and scenario uncertainty using a
possibilistic approach: Application to the Mahanadi River, India, Water Resour. Res., 44,
W06407, doi:10.1029/2007WR006137.
54
CIVN4005A-Investigational Project Report
-Nix, S. (2019, July 8). The Territory and Current Status of the African Rainforest. Retrieved
from ThoughtCo.: https://www.thoughtco.com/african-rainforest-1341794.
-Pegram, G. , & Clothier, A. N. (2002). Space time modelling of rainfall using the string of
beads model: Integration of radar and rain gauge data. Durban, South Africa: Water
Research Commission Report No 1010/1/02.
-Pratikasiwi, H. (2022, July 14). Stochastics Modelling of Rainfall Process in Asia Region: A
Systematics Review. Retrieved from environmental sciences proceedings:
file:///D:/Downloads/environsciproc-19-00022.pdf.
–Richardson, C. W., & Wright, D. A. (1984). WGEN: A model for generating daily weather
variables. U. S. Department of Agriculture, 8 (83p).
-Srikanthan, R., & McMahon, T. (2001). Stochastic generation of annual, monthly and daily
climate data: A review. Hydrology and Earth System Sciences, 653–670.
-Wiegand, K.(2008, February 2). SERGE: a spatially explicit generator of local rainfall in
southern Africa. Retrieved from SciELO: http://www.scielo.org.za/scielo.php?
script=sci_arttext&pid=S0038-23532008000100010.
–Ye, Lei. (2018, December 17). The probability distribution of daily precipitation at the point
and catchment scales in the United States. Retrieved from European Geosciences Union:
https://hess.copernicus.org/articles/22/6519/2018/
-Zhao, Y., & Nearing, M. A. (2019). A daily spatially explicit stochastic rainfall generator for a
semi-arid climate. Elsevier, 181-192.
-Zhao, Y., Nearing, M. A., & Guertin, D. P. (2019). A daily spatially explicit stochastic rainfall
generator for a semi-arid climate. Journal of Hydrology, 181-192.
-Zhou, Y., Yu, Z. J., Li, J., Huang, Y., & Zhang, G. (2017). The Effect of Temporal Resolution on
the Accuracy of Predicting Building Occupant Behaviour based on Markov Chain Models.
ScienceDirect: 10th International Symposium on Heating, Ventilation and Air Conditioning,
ISHVAC2017 , Procedia Engineering 205 (2017) 1698 1704.
55
CIVN4005A-Investigational Project Report
-Zucchini, W., Adamson, P. T., & McNeill, L. (1992). A model of southern African Rainfall. S.
Afr. J. Sci, 103-109.
56