Applied Modeling Techniques and Data Analysis 2: Financial, Demographic, Stochastic and Statistical Models and Methods
Volume 8
Edited by
Yannis Dimotikalis
Alex Karagrigoriou
Christina Parpoula
Christos H. Skiadas
First published 2021 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Preface
Part 2 covers the area of applied stochastic and statistical models and methods
and comprises eight chapters: Chapter 10, “The Double Flexible Dirichlet: A
Structured Mixture Model for Compositional Data”, by Roberto Ascari, Sonia
Migliorati and Andrea Ongaro; Chapter 11, “Quantization of Transformed Lévy
Measures”, by Mark Anthony Caruana; Chapter 12, “A Flexible Mixture Regression
Model for Bounded Multivariate Responses”, by Agnese M. Di Brisco and Sonia
Migliorati; Chapter 13, “On Asymptotic Structure of the Critical Galton–Watson
Branching Processes with Infinite Variance and Allowing Immigration”, by Azam A.
Imomov and Erkin E. Tukhtaev; Chapter 14, “Properties of the Extreme Points of the
Joint Eigenvalue Probability Density Function of the Wishart Matrix”, by Asaph
Keikara Muhumuza, Karl Lundengård, Sergei Silvestrov, John Magero Mango and
Godwin Kakuba; Chapter 15, “Forecast Uncertainty of the Weighted TAR
Predictor”, by Francesco Giordano and Marcella Niglio; Chapter 16, “Revisiting
Transitions Between Superstatistics”, by Petr Jizba and Martin Prokš; Chapter 17,
“Research on Retrial Queue with Two-Way Communication in a Diffusion
Environment”, by Viacheslav Vavilov.
We wish to thank all the authors for their insights and excellent contributions to
this book. We would like to acknowledge the assistance of all those involved in the
reviewing process of this book, without whose support this could not have been
successfully completed. Finally, we wish to express our thanks to the secretariat and,
of course, the publishers. It was a great pleasure to work with them in bringing to
life this collective volume.
Yannis DIMOTIKALIS
Crete, Greece
Alex KARAGRIGORIOU
Samos, Greece
Christina PARPOULA
Athens, Greece
Christos H. SKIADAS
Athens, Greece
December 2020
PART 1
1.1. Introduction
Fraud detection systems are designed to automate and help reduce the manual parts
of a screening/checking process (Phua et al. 2005). Data mining plays an important
role in fraud detection as it is often applied to extract fraudulent behavior profiles
hidden behind large quantities of data and, thus, may be useful in decision support
systems for planning effective audit strategies. Indeed, huge amounts of resources
(to put it bluntly, money) may be recovered from well-targeted audits. This explains
the increasing interest and investments of both governments and fiscal agencies
in intelligent systems for audit planning. The Italian Revenue Agency (hereafter,
IRA) itself has been studying data mining application techniques in order to detect
tax evasion, focusing, for instance, on the tax credit system, supposed to support
investments in disadvantaged areas (de Sisti and Pisani 2007), on fraud related to
credit mechanisms, with regard to value-added tax – a tax that is levied on the price
of a product or service at each stage of production, distribution or sale to the end
consumer, except where a business is the end consumer, in which case it can reclaim
the input tax (Basta et al. 2009) – and on income indicator audits (Barone et al. 2017).
In this context, all the taxpayers are in some way “unfaithful”, since all of them
have received a tax notice that somehow rectified the tax return they had filed. Thus,
the predictive analysis tool we develop is designed to find patterns in data that may
help tax offices recognize only the riskiest taxpayers’ profiles.
Evidence on data at hand shows that our first model, which is described in detail
later, is able to distinguish the taxpayers who are worthy of closer investigation from
those who are not.²
However, by defining the class value as a function of the higher due taxes, we
satisfy the need of focusing on the taxpayers who are more likely to be “significant”
tax evaders, but we do not ensure an efficient collection of their tax debt. Indeed, data
shows that as the tax bill increases, the number of coercive collection procedures put
in place also increases. Unfortunately, these procedures are highly inefficient, as they
are able to only collect about 5% of the overall credits claimed against the audited
taxpayers (Italian Court of Auditors 2016). As a result, the tax authorities’ ability to
collect the due taxes may be jeopardized.
1 A tax notice is a formal written act through which tax authorities assess a higher due taxable
income with respect to the declared one.
2 Data analyses are performed using WEKA – the data mining workbench developed at Waikato
University in Hamilton, New Zealand, released under the GNU GPL license.
Data Mining Application Issues in the Taxpayers Selection Process 5
Activities against tax evasion are crucial from the State budget point of view, because public
expenditures (i.e. public services) strictly depend on the amount of public revenue.
Of course, fraud and other incorrect fiscal behaviors may be tackled, even though no
tax collection is guaranteed, in order to reach the maximum tax compliance. Such
extra activities may also be jointly conducted with the Finance Guard or the Public
Prosecutor if tax offenses arise.
Therefore, to tackle our second problem, i.e. to guarantee a certain degree of due
tax collection, a trivial fact that we start from is that a taxpayer with no properties will
not be willing to pay his dues, whereas if he had something to lose (a home or a car
that could be seized), then, if the IRA’s claim is right, it is more probable that he might
reach an agreement with the tax authorities.
The key feature of our procedure is the twofold selection process target, needed to
maximize the IRA’s audit processes’ effectiveness. The methodology we suggest will
soon be validated in real cases, i.e. a sample of taxpayers will be selected according to
the classification criteria developed in this chapter and will be subsequently involved
in some audit processes.
1.2.1. Data
Just for descriptive purposes, we can depict the statistical distribution of the
revenues achieved by the businesses in our sample, grouped in classes (in thousands
of euros), in Figure 1.1.
3 The IRA sent a total of 59,269 tax notices concerning fiscal year 2012 to self-employed
individuals allowed to keep simplified registers, so we can manage a quite significant sample.
For each taxpayer in the dataset, both his tax notice status and the additional due
taxes (i.e. the additional requested tax amount) are known.
Here comes the first problem that needs to be tackled: the additional due tax is
a numeric attribute which measures the seriousness of the taxpayer’s tax evasion,
whereas our algorithms, as we will show later on, need categorical values in order to
predict. Thus, we cannot directly use the additional due taxes, but we need to define a
class variable and decide both which values it will take and how to map each numeric
value referred to the additional due taxes into such categorical values.
We must define a function f(x) which associates, to each element x in the dataset,
a categorical value that shows its fraud risk degree and represents the class our
first model will try to predict. Of course, a function that labels all the taxpayers in
the dataset as tax evaders would be useless. Thus, a distinction needs to be drawn
between serious tax evasion cases and those that are less relevant. To this purpose,
we somehow follow (Basta et al. 2009) and choose to divide the taxpayers into two
groups, the interesting ones and the not interesting ones, from the tax administration
point of view (to a certain extent, interesting stands for “it might be interesting
for the tax administration to go and check what’s going on ...”), based on two
criteria: profitability (i.e. the ability to identify the most serious cases of tax evasion,
independently from all other factors) and fairness (i.e. the ability to identify the most
serious cases of tax evasion, with respect to the taxpayer’s turnover).
Honest taxpayers are treated as not interesting taxpayers, even though this label
is used to indicate moderate tax evasion cases. We are somehow forced to use this
approximation since we only have data on taxpayers who received a tax notice, and not
on taxpayers for which an audit process may have been closed without qualifications,
or may have not even been started.
Therefore, in order to take the profitability issue into account, we define a new
variable, called the tax claim, which represents the higher assessed taxes if the tax
notice stage is still open, or the higher settled taxes if the stage status is definitive. Note
that the higher assessed tax could be different from the higher settled tax, because
the IRA and the taxpayer, while reaching an agreement, can both reconsider their
positions. The tax claim distribution grouped in classes (again, in thousands of euros)
is shown in Figure 1.2.
Figure 1.2. Tax claim distribution. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
The left vertical axis is related to the tax claim distribution, grouped in the classes
shown on the horizontal axis; the right vertical axis, instead, sums up the
monetary tax claim amount that arises from each group (in thousands of euro).
Therefore, as can easily be seen, the 331 most profitable tax notices (12% of the
total) account for almost half of the tax revenue arising from our dataset.
The fairness criterion is then introduced to direct the audit process even towards
smaller firms (which are usually charged smaller amounts of due income taxes), and
it is useful as it allows the tax authorities to not discriminate against taxpayers on the
basis of their turnover and introduces a deterrent effect which improves the overall tax
compliance.
Therefore, we define another variable, called Z, which takes into account, for each
taxpayer, both his turnover and revenues, and compares them to the tax claim. More
formally, both of the ratios tax claim/turnover and tax claim/revenues are computed.
Then, the minimum between these two ratios and 1 is taken: that is the value of the
variable Z, which thus ranges from 0 to 1.
Now, for both tax claim (TC) and Z, we calculate the 25th percentile (Q1), the
median value (Q2) and the 75th percentile (Q3). We then state that a taxpayer may be
considered interesting if he satisfies one of the following conditions:
– Q1 ≤ TC < Q2 and Z ≥ Q3;
– TC ≥ Q2 and Z ≥ Q2;
– TC ≥ Q3 and Z < Q2.
The three above-mentioned rules can be represented as in Figure 1.3.
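The labeling scheme above can be sketched in a few lines of code. This is an illustrative NumPy version written by us (the chapter's actual analyses are performed with WEKA, and all function and variable names here are our own):

```python
import numpy as np

def label_interesting(tax_claim, turnover, revenues):
    """Label taxpayers as interesting following the three rules above.
    All arguments are 1-D arrays of equal length."""
    tc = np.asarray(tax_claim, dtype=float)
    # Z = min(tax claim / turnover, tax claim / revenues, 1), so Z lies in [0, 1]
    z = np.minimum.reduce([
        tc / np.asarray(turnover, dtype=float),
        tc / np.asarray(revenues, dtype=float),
        np.ones_like(tc),
    ])
    # Quartiles of TC and of Z over the dataset
    q1, q2, q3 = np.percentile(tc, [25, 50, 75])
    z_q2, z_q3 = np.percentile(z, [50, 75])
    # A taxpayer is interesting if any one of the three rules holds
    rule1 = (q1 <= tc) & (tc < q2) & (z >= z_q3)
    rule2 = (tc >= q2) & (z >= z_q2)
    rule3 = (tc >= q3) & (z < z_q2)
    return rule1 | rule2 | rule3
```

Applied to the whole dataset, this returns the binary class that the first model is trained to predict.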
Once the population of our dataset is entirely divided into interesting and not
interesting taxpayers, we can see from Table 1.1 that the interesting ones are far more
profitable than the others (tax claim values are in thousands of euros). A machine
learning tool able to distinguish these two kinds of taxpayers fairly well would then
be very useful.
Our first model task will then be that of identifying, with a certain confidence
degree, the taxpayers who are more likely to have evaded (both in absolute terms and
as a percentage of revenues or turnover).
The literature on tax fraud detection, although using different methods and
algorithms, is usually only concerned about this issue, i.e. in finding the best way
to identify the most relevant cases of tax evasion (Bonchi et al. 1999; Wu et al. 2012;
González and Velásquez 2013; de Roux et al. 2018).
There is another crucial issue that has to be taken into account, i.e. the effective
tax authorities’ ability to collect the tax debt arising from the tax notices sent to all of
the unfaithful taxpayers.
What happens if a taxpayer does not spontaneously pay the additional tax amount
he is charged? Well, after a while, coercive collection procedures will be deployed
by the tax authorities. However, as we have seen above, these procedures are highly
ineffective, as they only collect about 5% of the overall credits claimed against the
audited taxpayers.
Indeed, data shows that coercive procedures take place in almost 40% of cases,
although their distribution is not uniform: they are more frequent when the tax bill is high,
as reported in Table 1.2 (again, tax claim values are in thousands of euros).
Table 1.2 is actually a double frequency table, which can be used to investigate the
existing relationship between the two categorical variables, Coercive procedures and
Tax claim (they both take on values that are labels). Recall that, given two variables X and
Y, X is independent of Y if, for all values of Y, the relative distribution of X does not
change. Therefore, a quick glance at Table 1.2 shows that Coercive procedures depend
on the values taken by Tax claim.
In a more formal way, following the Openstax (2013) notation, we could also
perform a test of independence for these variables, by using the well-known test
statistic for a test of independence:
χ² = Σ(i,j) (O − E)² / E

where the sum runs over all cells of the table, O being the observed and E the expected
frequencies.
Given the values in Table 1.2, the test would let us reject the hypothesis of the two
variables being independent at a 1% level of significance: therefore, from the data,
there is sufficient evidence to conclude that Coercive procedures are dependent on the
Tax claim level.
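The test statistic can be reproduced directly from the formula above. The sketch below uses NumPy and a hypothetical contingency table of our own making, since Table 1.2's actual counts are not reproduced in the text:

```python
import numpy as np

# Hypothetical contingency table: rows = tax claim classes (low/medium/high),
# columns = coercive procedure (yes/no). Table 1.2's real counts would go here.
observed = np.array([
    [ 50, 450],
    [120, 280],
    [140,  60],
])

# Expected counts under independence: E_ij = (row total * column total) / N
expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0) / observed.sum()

# chi^2 = sum over all cells of (O - E)^2 / E
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)  # = 2 here

# The 1% critical value for 2 degrees of freedom is about 9.21
print(f"chi2 = {chi2:.1f}, dof = {dof}, reject independence: {chi2 > 9.21}")
```

With counts as skewed as these, chi² lands far above the 9.21 critical value, so the hypothesis of independence would be rejected at the 1% level, in line with the conclusion drawn in the text.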
From Table 1.2, it is easy to calculate, for each tax claim interval, the rate of total
coercive procedures, the rate of tax notices and the rate of coercive procedures within
that tax claim interval (all of these ratios are depicted in Figure 1.4).
A close look at Figure 1.4 shows that as long as the tax claim is “low” (less than
€10,000; please note that the intervals are in thousands of euros), the blue line, i.e.
the percentage of tax notices, is above the purple one, i.e. the percentage of coercive
procedures, while for higher values of tax claim, the blue line is under the purple one.
This is quite strong evidence that coercive procedures are not independent from tax
claim.
As a result, the red line shows that the higher the tax claim, the higher the
percentage of procedures within the tax claim range itself, up to over 70% in the last
and, apparently, most desirable range.
Therefore, with just one model in place, whose task is to recognize interesting
taxpayers, the tax authorities would risk facing many cases of coercive procedures.
Thus their ability to ensure tax collection may be seriously jeopardized.
We therefore need to find a way to discover, among the most interesting taxpayers,
the most solvent ones, the most willing to pay.
Figure 1.4. Coercive procedures and tax claim. For a color version of
this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
We can start by observing that a taxpayer with no properties will probably not be
willing to pay his dues. Therefore, a second model only focusing on a few features
indicating whether the taxpayer owned some kind of assets or not is built, in order to
predict if a tax notice will end in an enforced recovery proceeding or not.
Once both models are available, the taxpayer selection process is held in such a
way that undertakings will only be audited if judged worthy by both models.
Our selection strategy needs to take into account two competing demands: on one
hand, tax notices must be profitable, i.e. they have to address serious tax fraud or the
tax evasion phenomena; on the other, tax collectability must be guaranteed in order to
justify all of the tax authorities’ efforts.
To this purpose, we develop two models, both in the form of classification trees:
the first one predicts whether a taxpayer is interesting or not, while the second predicts
the final stage of a tax notice, distinguishing between those ending with an enforced
recovery proceeding and the others, where such enforced recovery proceedings do not
take place.
The first one’s attributes are taken from several datasets run by the IRA and are
related to the taxpayers’ tax returns and their annexes (such as the sector studies), their
properties details, their customers and suppliers lists and their tax notices, whereas the
second one only focuses on a set of features concerning taxpayers’ assets.
In the taxpayer selection process, models that are easier to interpret are preferred to
more complex models. Typically, decision trees meet this requirement,
so both of our models take that form.
In both cases, instead of considering just one decision tree, both practical and
theoretical reasons (Breiman 1996) lead us towards a more sophisticated technique,
known as bagging, which stands for bootstrap aggregating, with which many base
classifiers are computed (in our case, many trees).
Moreover, a cost matrix is used while building the models. Indeed, in our context,
to classify an actual not interesting taxpayer as interesting is a much more serious error
than that of classifying an actual interesting taxpayer as not interesting, based on the
fact that, generally, tax offices’ human resources are barely sufficient to perform all of
the audits they are assigned. Therefore, as long as offices audit interesting taxpayers,
everything is fine, even though many interesting taxpayers may not be considered. In
the same way, to predict that a tax notice will not end in a coercive procedure when
it actually does, is a much more serious error than that of classifying a tax notice
final stage the other way round. Therefore, different weights are given to different
misclassification errors.
Finally, Ross Quinlan’s C4.5 decision tree algorithm is used to build the base
classifiers within the bagging process.
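A rough equivalent of this setup can be sketched with scikit-learn. Note the assumptions: the chapter uses WEKA's C4.5 and a cost matrix, whereas scikit-learn grows CART trees and the sketch expresses the asymmetric misclassification costs through class weights; the data and all weight values below are synthetic stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: X = taxpayer features, y = 1 for "interesting"
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.7, 0.3], random_state=0)

# Class weights play the role of the cost matrix: misclassifying an actual
# not interesting taxpayer (class 0) as interesting is the costlier error
base_tree = DecisionTreeClassifier(class_weight={0: 5, 1: 1}, random_state=0)

# Bagging: many base trees fit on bootstrap samples, predictions aggregated
model = BaggingClassifier(base_tree, n_estimators=50, random_state=0)
model.fit(X, y)

# Class probabilities also induce a ranking of taxpayers (exploited later on)
proba_interesting = model.predict_proba(X)[:, 1]
```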
1.3. Results
Our first model predicts, on the basis of the available features, 415 taxpayers to
be interesting (i.e. 15.5% of the entire test set), with a precision rate of about 80%, as
shown in Figure 1.6.
In terms of tax claim amounts, the model appears to perform quite well, since the
selected taxpayers’ average due additional taxes amount to €49,094, whereas the
average on the entire test set is equal to €22,339.
So far, we have shown that our model, on average, is able to distinguish serious tax
evasion phenomena from the less significant ones. But what about the tax collection
issue? To deal with this matter, we should investigate what kind of taxpayers we have
just selected. For this purpose, Table 1.3 shows that the majority of the taxpayers the
model would select would also be subject to coercive procedures (as we can see, the
sum of the values of each column is 100%).
Thus, many of the selected taxpayers have a debt payment issue. This jeopardizes
the overall selection process efficiency and effectiveness. As pointed out by the Italian
Court of Auditors, coercive procedures, on average, are able to collect only about 5%
of the overall claimed credits.
To evaluate the problem extent, we can replace the actual tax claim value
corresponding to the problematic taxpayers with the estimated collectable tax, which
is the tax claim discounted by 95% (i.e. multiplied by 0.05), and compare the two
scenarios, as in Figures 1.7 and 1.8, where we depict both the total tax claim and the
average tax claim arising from the taxpayers’ notices in the entire test set.
Figure 1.7. Total tax claim and discounted tax claim. For a color version
of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Taxpayers are ordered, from left to right, according to their probability of being
interesting, as calculated by our model. Figure 1.7, for instance, depicts the cumulative
tax claim charged up to a certain taxpayer: the red line values refer to the additional
taxes requested with the tax notices, while the black line is drawn by considering
the discounted values. The dashed vertical line indicates the levels corresponding to
the last selected taxpayer according to the model (in our case, the 415th). Recall that
when associating a class label with a record, the model also provides a probability,
which highlights how confident the model is about its own prediction. Therefore, to
a certain extent, it sets a ranking among taxpayers, which we can exploit to draw
Figures 1.7 and 1.8. As we can easily observe, the overall tax claim charged to the
selected taxpayers plummets from €20 million to €5 million, and the average tax
claim, depicted in Figure 1.8, from €49,000 to €12,000. Thus, the selection process,
which relied on our data mining model and at first sight seemed to be very efficient,
shows some important flaws that we need to face. In fact, tax collectability is not
adequately guaranteed.
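The curves behind Figures 1.7 and 1.8 can be reconstructed as follows (an illustrative NumPy sketch; the function name and interface are our own):

```python
import numpy as np

def cumulative_claims(prob_interesting, tax_claim, coercive, discount=0.95):
    """Sort taxpayers by the model's probability of being interesting and
    accumulate claimed vs. estimated collectable amounts.
    coercive: boolean array, True where a coercive procedure takes place."""
    order = np.argsort(prob_interesting)[::-1]      # most likely first
    claim = np.asarray(tax_claim, dtype=float)[order]
    # Under a coercive procedure only ~5% of the claim is collected
    collectable = np.where(np.asarray(coercive)[order],
                           claim * (1 - discount), claim)
    return claim.cumsum(), collectable.cumsum()
```

Dividing each cumulative amount by the number of taxpayers considered so far yields the average curves of Figure 1.8.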
Figure 1.8. Average total tax claim and discounted tax claim. For a
color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
A second model may then help us by predicting which taxpayers would not be
subject to coercive procedures, by focusing on a set of features concerning their assets.
Again, with a precision rate of about 80%, as shown in Figure 1.9, the model
appears to be successful.
This second model could be useful for our purposes, even though it comes with some
caveats. First, most of the taxpayers that the model classifies as unlikely to face a
coercive procedure are also not interesting, as shown in Table 1.4. Again, the
sum of the values of each column is 100%.
In fact, this second model’s performance in terms of tax claim appears to have
worsened with respect to the first, since the no procedure taxpayers’ average due
additional tax, calculated on the first 415 taxpayers (according to the ranking set by
this model, which is, obviously, dramatically different from the one set by the first
model we have seen), is equal to €20,388. However, the average collectable tax claim
is equal to €13,493, which is a little better than the one we have seen before.
We point out that throughout this chapter, we have compared sets of selected
taxpayers with the same cardinality, for two kinds of considerations: first, tax
authorities, reasonably, have a fixed budget of audits to perform, so comparisons
between models should be done subject to a given number of audits; second, for
comparability reasons, since smaller sets tend to perform better (see Figure 1.8, where
the average tax claim decreases while the number of selected taxpayers increases).
Therefore, in this second model we have developed, the high rate of not interesting
taxpayers, on one hand, causes a drop in the average tax claim (from €49,000 to
€20,000), but, on the other, it contributes to the slight enhancement of the discounted
average tax claim (from €12,000 to €13,000), since only a few of the not interesting
taxpayers pass through a coercive procedure. Figure 1.10 compares, for each number
of selected taxpayers, the different coercive procedures rates arising from the two
models.
What we can do, then, is use the two models “together”. For instance, we could
exploit the first model in order to sort the taxpayers eligible to be selected and the
second one to discard the ones likely to be subject to coercive procedures.
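This combination can be sketched as follows (again a Python illustration of ours; the 0.5 cut-off on the second model's prediction is our assumption, not a value given in the text):

```python
import numpy as np

def select_taxpayers(p_interesting, p_no_coercive, n_audits=415, cutoff=0.5):
    """First model ranks candidates; second model screens out those
    predicted to end in a coercive procedure. Returns selected indices."""
    p_int = np.asarray(p_interesting)
    # Keep only taxpayers the second model expects to pay without coercion
    eligible = np.flatnonzero(np.asarray(p_no_coercive) >= cutoff)
    # Rank the survivors by the first model's probability, best first
    ranked = eligible[np.argsort(p_int[eligible])[::-1]]
    return ranked[:n_audits]
```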
In such a way, if we imagine that we select our 415 taxpayers again, on one
hand, we would select both interesting and not interesting taxpayers (we would have
selected only interesting taxpayers only if the second model had predicted that no
interesting taxpayer would go through a coercive procedure), but, on the other, we
would also select the taxpayers who are more likely to pay their tax debts.
This is just an example and it is not the only way we can combine the two models.
Indeed, there is space for policymakers to exploit the two models in different ways,
depending on the kind of tradeoff choices they may want to reach, concerning the two
goals of the audit process: its profitability and its tax collectability. For instance, a
selection process could only be targeted towards interesting taxpayers and taxpayers
without payment issues.
Figures 1.11–1.13 can shed some light on our ensemble model’s performance.
As usual, the dashed vertical line shows the values corresponding to the number of
taxpayers we wish to select.
In our case, thus, with the ensemble model, we would claim, on average, €26,219
from the selected taxpayers and we would hopefully collect, on average, €17,542
from each of them, of whom only 25% are predicted to incur coercive procedures.
Figure 1.11. Total tax claim. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
Figure 1.12. Average tax claim. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
In a hypothetical selection process, the winning strategy would then be to use the
ensemble model, since it maximizes the collectable tax claim.
What we might be interested in is knowing whether the ensemble model is always
the best option. This may depend on the coercive procedures’ rate that characterizes
the two sets of auditable taxpayers selected by the two models. Unfortunately, once
we build the models, before applying them to the test set, we do not exactly know
what kind of taxpayers will be selected. Therefore, we do not even know these rates;
however, we can consider them as unknown parameters, say θ and θ′. From this point
of view, the rates we have observed within the two selected sets can be considered as
two values of such parameters, say θ = 70% and θ′ = 25% (see Table 1.5).
To satisfy our interest, we should depict the two models’ behavior as a function
of the unknown parameters, θ and θ′, respectively; that is, we should calculate the
expected tax revenue amounts for any value of θ and θ′. Unfortunately, this cannot
be done. To understand why, suppose that for both models, only one of the selected
taxpayers turns out to be subject to coercive procedures. If this taxpayer’s debt is high,
the amount of money that is difficult to collect would be high, but if his debt is low,
then the uncollected tax would also be low.
What can be done, instead, is to calculate, for any given value of θ and θ′,
the maximum and minimum collectable taxes arising from each model. Indeed, the
maximum collectable taxes scenario is the one where coercive procedures are first
applied to the less unfaithful taxpayers, while the minimum collectable taxes scenario
refers to a situation in which coercive procedures are first applied to the most
unfaithful taxpayers, as shown in Figure 1.14.
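These two bounding scenarios can be computed directly (an illustrative sketch; `theta` is the coercive-procedure rate and, as above, only 5% of a claim under coercive procedure is assumed collectable):

```python
import numpy as np

def collectable_bounds(tax_claim, theta, discount=0.95):
    """Max/min collectable taxes when a fraction theta of the taxpayers
    undergoes coercive procedures (which recover only ~5% of the claim)."""
    claim = np.sort(np.asarray(tax_claim, dtype=float))  # ascending order
    n = claim.size
    k = int(round(theta * n))          # number of coercive procedures
    if k == 0:
        return claim.sum(), claim.sum()
    factor = 1 - discount
    # Best case: procedures hit the k smallest claims (least money lost)
    max_coll = factor * claim[:k].sum() + claim[k:].sum()
    # Worst case: procedures hit the k largest claims (most money lost)
    min_coll = factor * claim[n - k:].sum() + claim[:n - k].sum()
    return max_coll, min_coll
```

Evaluating these bounds over a grid of rates gives, for each model, the pair of lines of Figure 1.14.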
Figure 1.14. Models’ maximum and minimum collected tax. For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
The first model’s maximum and minimum values are represented by the red and
orange lines, while the ensemble model’s are the blue and purple ones. Any point
within the red and orange lines represents a possible outcome for the first model and
any point within the blue and purple lines represents one possible outcome for the
ensemble model. For instance, points A and B represent the outcomes of our models
(the first and the ensemble, respectively), given our training and test sets.
Having to deal with two areas means that the models’ behavior is determined
not only by θ and θ′, but also by the kind of taxpayers that go through a coercive
procedure. If we could shrink the areas between the red and orange lines and the blue
and purple ones, we would be in better shape.
How could we do this? Well, if we turn back to points A and B in Figure 1.14, and
we draw two dashed vertical lines from them, we can see that the first is nearer to the
minimum line of its model (since line AD is shorter than line CA), while the other is
nearer to the maximum one (since line EB is shorter than line BF ).
If we assume that, for each value of θ and θ′ and for each corresponding point
A (and, also, lines AD and CA) and B (and, also, lines EB and BF), the ratios
CA/AD and BF/EB are always the same, we could draw a single line for each model,
which would only be a function of θ and θ′, respectively, as shown in Figure 1.15.
Based on our data, these functions intersect at two points, where θ and θ′ are,
respectively, equal to α and β. Moreover:
– γfirst(0) > γens(0), i.e. if all taxpayers were to pay their debts, the first model
would be better than the ensemble one.
– γfirst(1) > γens(1), since if all taxpayers were to undergo a coercive procedure,
these functions’ values would be 0.05 times γfirst(0) and γens(0), respectively (recall
that in the case of coercive procedures, the collectable tax is assumed to be equal to
the tax claim multiplied by a discount factor of 95%).
– γfirst(θ) < γens(θ′), for α < θ < φ and θ′ < α.
– γfirst(θ) < γens(θ′), for β < θ < φ and α < θ′ < β.
– γfirst(θ) > γens(θ′), for β < θ < φ and θ′ > φ.
– There is a ψ such that γfirst(θ) ≥ γens(θ′), for θ ≤ ψ and for any θ′.
– There is a φ such that γfirst(θ) ≥ γens(θ′), for θ′ ≥ φ and for any θ.
Figure 1.16 depicts, in a θ × θ′ space, the regions where the two models represent
the best choice (the dark gray region is where the first model is the best option, while
in the light gray one, the ensemble model is better).
Figure 1.16. Values of θ and θ′ determining the best model. For a
color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
In the three white regions, the exact combinations of θ and θ′ that determine
whether one model is better than the other depend on the relative slopes of γfirst(θ)
and γens(θ′).
1.4. Discussion
The ensemble model seems to tackle both of the above-mentioned issues quite
well.
Given that the whole test set’s average claim is €22,339, while the average
collectable taxes are equal to €10,194, our procedure increases the first figure by
17% (€26,219) and the second by 72% (€17,542).
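These gains can be checked with a line of arithmetic (values taken from the text):

```python
avg_claim_all, avg_claim_sel = 22_339, 26_219  # average claim: test set vs. selection
avg_coll_all, avg_coll_sel = 10_194, 17_542    # average collectable taxes

claim_gain = avg_claim_sel / avg_claim_all - 1   # about +17%
coll_gain = avg_coll_sel / avg_coll_all - 1      # about +72%
print(f"claim: +{claim_gain:.0%}, collectable: +{coll_gain:.0%}")
```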
With respect to the scenario in which only the first model is put in place, by
developing the twofold selection process as described above, the presence of coercive
procedures dramatically plummets from 70% to 25%. Moreover, the selection of not
interesting taxpayers, while causing a drop in the average tax claim (from €49,094
to €26,219), is more than compensated for by the procedure’s capability of efficiently
collecting the additional taxes charged to the selected taxpayers (from €12,187 to
€17,542).
Table 1.5 summarizes the most significant results reached by the three models that have been built: the first model looks for interesting taxpayers; the second model searches for solvent taxpayers; and the third model, called the ensemble model, is a combination of the first two. To put the models' figures in context, the same information is also shown for the entire test set.
This result can be generalized: the best selection strategy depends on our estimates of θ and θ in the sets of the selected taxpayers.
1.5. Conclusion
The data analysis framework designed in this chapter gives an effective learning
scheme aimed at improving the IRA’s ability to identify non-compliant taxpayers.
It involves two C4.5 decision trees predicting two different class values, based on two different sets of predictive attributes. That is, the first model is built to identify the most likely non-compliant taxpayers, while the second one identifies those who are most likely to pay the additional tax bill. This twofold selection process is designed to maximize the overall audit effectiveness: businesses are audited only if suggested by both models.
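The twofold selection rule can be sketched as follows. The two predicate functions are illustrative stand-ins for the trained C4.5 trees; their attribute names and thresholds are invented for the example and are not the chapter's actual models.

```python
# Sketch of the twofold (ensemble) selection rule: audit a business only
# if BOTH models suggest it. The two predicates are hypothetical
# stand-ins for the trained C4.5 trees.

def predict_interesting(taxpayer):
    """First model: flags likely non-compliant ('interesting') taxpayers."""
    return taxpayer["estimated_evasion"] > 10_000

def predict_solvent(taxpayer):
    """Second model: flags taxpayers likely to pay the additional bill."""
    return taxpayer["assets"] >= taxpayer["estimated_evasion"]

def select_for_audit(taxpayers):
    """Ensemble rule: intersection of the two models' positive sets."""
    return [t for t in taxpayers
            if predict_interesting(t) and predict_solvent(t)]
```

The intersection is what trades a lower average tax claim for a much higher collectability, as discussed above.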
Tax evasion is a topic that has been studied extensively in the past (starting from Allingham and Sandmo 1972) and is still a hot topic. Most models are mainly concerned with finding the best way to identify the most relevant cases of tax evasion. In this chapter, we go further, analyzing the overall effectiveness of the tax authorities' activity, which has to take into account both the tax notices' profitability and the collectability of the additional requested taxes.
The latter issue cannot be tackled without knowing the final stage of the tax notices. In fact, it is very difficult to have this kind of information at hand, not least because a tax notice can come to an end years after it was sent to the taxpayer (especially when the case is brought before a tax court).

By ignoring the collectability aspect of the audit process, the selection process may not be correctly targeted or, at least, may not satisfy the tax authorities' needs, i.e. relevant evasion phenomena may be discovered, but only little money may be collected.
Of course, the fight against tax evasion is not only a matter of collecting money; it should also serve other purposes, such as promoting taxpayers' compliant behavior. Nonetheless, efficient tax bill collection is crucial from the state budget point of view, because public expenditures are strictly connected to public revenues.
The methodology we suggest here will soon be validated in real cases, i.e. a sample of taxpayers will be selected according to the classification criteria developed in this chapter and will subsequently be involved in audit processes.
1.6. References
Agenzia delle Entrate e Ministero dell’Economia e delle Finanze (2018). Convenzione triennale
per gli esercizi 2018–2020 [Online]. Available at: https://www.finanze.it/export/sites/finanze/
.galleries/Documenti/Varie/DF_CONVENZIONE-MEF_ADE_2018.2020_FIRMATA-28_
11_2018.pdf.
Allingham, M.G. and Sandmo, A. (1972). Income tax evasion: A theoretical analysis. Journal of Public Economics, 1, 323–338.
Barone, M., Pisani, S., Spingola, A. (2017). Data mining application issues in income indicators
audits. Argomenti di discussione – Agenzia delle Entrate, 2.
Data Mining Application Issues in the Taxpayers Selection Process 25
Basta, S., Fassetti, F., Guarascio, M., Manco, G., Giannotti, F., Pedreschi, D., Spinsanti, L., Papi, G., Pisani, S. (2009). High quality true positive prediction for fiscal fraud detection. 2009 IEEE International Conference on Data Mining Workshops.
Bonchi, F., Giannotti, F., Mainetto, G., Pedreschi, D. (1999). A classification-based
methodology for planning auditing strategies in fraud detection. Proc. of SIGKDD99,
175–184.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Corte dei Conti (2016). Il sistema della riscossione dei tributi erariali al 2015. Deliberazione 20
ottobre 2016, 11/2016/G.
Gonzalez, P.C. and Velasquez, J.D. (2013). Characterization and detection of taxpayers with
false invoices using data mining techniques. Expert Systems with Applications, 40(5),
1427–1436.
OpenStax (2013). Introductory Statistics. OpenStax, 19 September [Online]. Available at:
http://cnx.org/content/col11562/latest/.
Phua, C., Lee, V., Smith, K., Gayler, R. (2005). A comprehensive survey of data mining-based
fraud detection research. Artificial Intelligence Review, submitted.
de Roux, D., Perez, B., Moreno, A., del Pilar Villamil, M., Figueroa, C. (2018). Tax fraud
detection for under-reporting declarations using an unsupervised machine learning approach.
KDD 2018, 215–222.
de Sisti, P. and Pisani, S. (2007). Data mining e analisi del rischio di frode fiscale: il caso dei
crediti d’imposta. Documenti di lavoro dell’Ufficio Studi – Agenzia delle Entrate, 4.
Wu, R., Ou, C.S., Lin, H., Chang, S., Yen, D. (2012). Using data mining technique to enhance
tax evasion detection performance. Expert Systems with Applications, 39, 8769–8777.
2

Asymptotics of Implied Volatility in the Gatheral Double Stochastic Volatility Model
2.1. Introduction
The history of implied volatility can be traced back at least to Latané and
Rendleman (1976), where it appeared under the name “implied standard deviation”,
i.e. the standard deviation of asset returns, which are implied in actual European call
option prices when investors price options according to the Black–Scholes model. For
a recent review of different approaches to determine implied volatility, see Orlando
and Taglialatela (2017). To give exact definitions, we use Pagliarani and Pascucci
(2017).
In order to briefly explain our contribution to the subject, we will introduce some notation. Let d ≥ 2 be a positive integer, let T0 > 0 be a time horizon, let T ∈ (0, T0], and let {Zt : 0 ≤ t ≤ T} be a continuous R^d-valued adapted Markov stochastic process on a probability space (Ω, F, P) with a filtration {Ft : 0 ≤ t ≤ T}. Assume that the first
Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
coordinate St of the process Zt represents the risk-neutral price of a financial asset, and
the d − 1 remaining coordinates Yt represent stochastic factors in a market with zero
interest rate and no dividends.
On the one hand, the time-t no-arbitrage price of a European call option with strike price K > 0 and maturity T is Ct,T,K = v(t, St, Yt, T, K), where

v(t, s, y, T, K) = E[(ST − K)+ | St = s, Yt = y],

and where (t, s, y) ∈ [0, T] × (0, ∞) × R^{d−1}. We change to logarithmic variables and define the option price by

u(t, x, y, T, k) = v(t, e^x, y, T, e^k),

where x is the time-t log price of the underlying asset, k is the log strike of the option, and (t, x, y) ∈ [0, T] × R × R^{d−1}.
On the other hand, the Black–Scholes price expressed in the same logarithmic variables is

uBS(σ, τ, x, k) = e^x N(d+) − e^k N(d−), [2.1]

where d± = (x − k)/(σ√τ) ± σ√τ/2, τ = T − t, and N is the standard normal cumulative distribution function. The implied volatility σ = σ(t, x, y, T, k) is then defined as the unique positive solution of the equation uBS(σ, T − t, x, k) = u(t, x, y, T, k).
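As a quick numerical companion, equation [2.1] transcribes directly into code (a minimal sketch; the function and argument names are ours):

```python
from math import exp, sqrt, erf

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def u_bs(sigma, tau, x, k):
    """Black-Scholes call price in logarithmic variables, equation [2.1]:
    x = log spot, k = log strike, tau = T - t, zero rates and dividends."""
    st = sigma * sqrt(tau)
    d_plus = (x - k) / st + st / 2.0
    d_minus = d_plus - st
    return exp(x) * norm_cdf(d_plus) - exp(k) * norm_cdf(d_minus)
```

Inverting u_bs in σ (e.g. by bisection) against a model or market price u(t, x, y, T, k) yields the corresponding implied volatility.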
REMARK 2.1.– In the literature on option pricing, there are concepts of model implied
volatility and market implied volatility. If the right-hand side of the above equation,
i.e. u(t, x, y, T, k), refers to the European option price under a given model, then σ =
σ (t, x, y, T, k) is called the model implied volatility. If u(t, x, y, T, k) is replaced by the
observed market option price, then we have the so-called market implied volatility.
Here, we work with the (model) implied volatility.
Pagliarani and Pascucci (2012) derived a fully explicit approximation for the
implied volatility at any given order N ≥ 0 for the scalar case. Lorig et al. (2017)
extended this result to the multidimensional case. Denote the above approximation by
σ N (t, x, y, T, k).
Asymptotics of Implied Volatility in the Gatheral Double Stochastic Volatility Model 29
Pagliarani and Pascucci (2017) proved that, under some mild conditions, the following limits exist:

(∂^q/∂T^q)(∂^m/∂k^m) σ̄N(t, x) = lim_{(T,k)→(t,x)} (∂^q/∂T^q)(∂^m/∂k^m) σN(t, x, T, k),

where the limit is taken as (T, k) approaches (t, x) within the parabolic region

Pλ = { (T, k) ∈ (0, T0] × R : |x − k| ≤ λ√(T − t) },

and that the implied volatility satisfies

σ(t, x, y, T, k) = Σ_{2q+m≤N} (1/(q! m!)) (∂^q/∂T^q)(∂^m/∂k^m) σ̄N(t, x) (T − t)^q (k − x)^m + o((T − t)^{N/2} + |k − x|^N). [2.2]
In this chapter, we study the Gatheral double-mean-reverting (DMR) stochastic volatility model

dSt = St √νt dWt^1,
dνt = κ1(ν̄t − νt) dt + ξ1 νt^{α1} dWt^2, [2.3]
dν̄t = κ2(θ − ν̄t) dt + ξ2 ν̄t^{α2} dWt^3,

where the Wiener processes Wt^i are correlated: E[Ws^i Wt^j] = ρij min{s, t}, and where the parameters κ1, κ2, θ, ξ1, ξ2, α1, α2 are positive real numbers. Note that while S0 is observable in the market, ν0 and ν̄0 are usually not observable and may be calibrated from the market data on options.

In this model, the variance νt mean-reverts with rate κ1 to a level ν̄t, which itself moves over time to the level θ at a (usually slower) rate κ2; hence the name double-mean-reverting. Here, the parameters α1, α2 ∈ [1/2, 1]. In the case α1 = α2 = 1/2, we have the so-called double Heston model; in the case α1 = α2 = 1, the double lognormal model; and finally, in the general case, the double CEV model (Gatheral 2008).
The DMR model can be consistently calibrated to both the SPX options and the
VIX options. However, due to the lack of an explicit formula for both the European
option price and the implied volatility, the calibration is usually done using time-consuming methods such as Monte Carlo simulation or the finite difference method. In this chapter, we provide an explicit asymptotic approximation of the implied volatility under this model.
In section 2.2, we formulate three theorems that give the asymptotic expansions of the implied volatility of orders 0, 1 and 2. Detailed proofs of Theorems 2.1 and 2.2, as well as a short proof of Theorem 2.3 without technicalities, are given in section 2.3.
Put xt = ln St .
THEOREM 2.1.– The asymptotic expansion of order 0 of the implied volatility has the form

σ(t, T) = √ν0 + o(1).
THEOREM 2.2.– The asymptotic expansion of order 1 of the implied volatility has the form

σ(t, x0, ν0, ν̄0; T, k) = √ν0 + (1/4) ρ12 ξ1 ν0^{α1−1} (k − x0) + o(√(T − t) + |k − x0|).
THEOREM 2.3.– The asymptotic expansion of order 2 of the implied volatility has the form

σ(t, x0, ν0, ν̄0; T, k) = √ν0 + (1/4) ρ12 ξ1 ν0^{α1−1} (k − x0)
− (3/16) ρ12^2 ξ1^2 ν0^{2α1−5/2} (k − x0)^2
+ (1/32) [8κ1(ν̄0 − ν0)/√ν0 + 2ρ12 ξ1 ν0^{α1} + 3ρ12^2 ξ1^2 ν0^{2α1−3/2}] (T − t)
+ o(T − t + (k − x0)^2). [2.4]
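For reference, the right-hand side of [2.4] (with the o(·) remainder dropped) transcribes term by term into the following sketch; the argument names are ours:

```python
from math import sqrt

def implied_vol_order2(t, T, k, x0, nu0, nubar0, kappa1, xi1, rho12, alpha1):
    """Order-2 implied volatility approximation of Theorem 2.3, eq. [2.4],
    with the o(T - t + (k - x0)^2) remainder dropped."""
    dk = k - x0
    tau = T - t
    return (sqrt(nu0)
            + 0.25 * rho12 * xi1 * nu0 ** (alpha1 - 1.0) * dk
            - (3.0 / 16.0) * rho12 ** 2 * xi1 ** 2
              * nu0 ** (2.0 * alpha1 - 2.5) * dk ** 2
            + (1.0 / 32.0) * (8.0 * kappa1 * (nubar0 - nu0) / sqrt(nu0)
                              + 2.0 * rho12 * xi1 * nu0 ** alpha1
                              + 3.0 * rho12 ** 2 * xi1 ** 2
                                * nu0 ** (2.0 * alpha1 - 1.5)) * tau)
```

At T = t and k = x0, the formula collapses to √ν0, in agreement with Theorem 2.1.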
2.3. Proofs
where the terms on the right-hand side of equation [2.5] are the values of the functions σn^{(x̄,ȳ,z̄)}(t, x, y, z; T, k) given by (Pagliarani and Pascucci 2017, Equation 3.15), when x̄ = x, ȳ = y and z̄ = z. (Pagliarani and Pascucci 2017, Equation 3.15) is recursive, and we define the above functions for n = 0 first.
Let uBS(σ; τ, x, k) be the Black–Scholes price [2.1]. (Pagliarani and Pascucci 2017, Equation 3.15) has the form

σn^{(z̄)}(t, x, y, z; T, k) = un^{(z̄)} [(∂/∂σ) uBS(σ0^{(z̄)})]^{−1}
− (1/n!) Σ_{h=2}^{n} B_{n,h}(1! σ1^{(z̄)}, 2! σ2^{(z̄)}, . . . , (n − h + 1)! σ_{n−h+1}^{(z̄)}) [2.6]
× (∂^h/∂σ^h) uBS(σ0^{(z̄)}) [(∂/∂σ) uBS(σ0^{(z̄)})]^{−1}, n ≥ 1.

For the sake of simplicity, we have omitted the last three arguments of the function uBS and all arguments of the functions σi^{(z̄)}, 0 ≤ i ≤ n, and un^{(z̄)}.
To define un^{(z̄)}, consider the differential operator

An(z) = Σ_{i,j=1}^{3} a_{ij,n}(z) ∂^2/(∂zi ∂zj) + Σ_{i=1}^{3} a_{i,n}(z) ∂/∂zi,

where a_{ij,n}(z) and a_{i,n}(z) are the terms of the Taylor expansions of the functions a_{ij}(z) and a_i(z) around the point z̄.

Following Pagliarani and Pascucci (2017), define the vector m^{(z̄)}(t, s) by

m_i^{(z̄)}(t, s) = (s − t) a_i(z̄), 1 ≤ i ≤ 3,

and the operator Gn^{(z̄)}(t, s, z) by

Gn^{(z̄)}(t, s, z) = An(z − z̄ + m^{(z̄)}(t, s) + C^{(z̄)}(t, s)∇z). [2.8]
The function un^{(z̄)} in equation [2.6] is defined by (Pagliarani and Pascucci 2017, Equation D.1):

un^{(z̄)}(t, z; T, k) = Ln^{(z̄)}(t, T, z) uBS(σ0^{(z̄)}; τ, x, k). [2.9]

Here, we wrote all the arguments of the function uBS(σ0^{(z̄)}; τ, x, k) to show that it does not depend on y and z.
where the operator L̃n^{(z̄)}(t, T, z) is given by (Lorig et al. 2017, Equation 3.14) as

L̃n^{(z̄)}(t, T, z) = Σ_{h=1}^{n} ∫_t^T ∫_{s1}^T · · · ∫_{s_{h−1}}^T Σ_{i∈I_{n,h}} G_{i1}^{(z̄)}(t, s1, z) · · · G_{i_{h−1}}^{(z̄)}(t, s_{h−1}, z) [2.10]
× a_{11,i_h}(z − z̄ + m^{(z̄)}(t, s_h) + C^{(z̄)}(t, s_h)∇z) ds_h · · · ds1.
It is well known that

(∂/∂σ) uBS(σ0^{(z̄)}; τ, x, k) = σ0^{(z̄)} τ (∂^2/∂x^2 − ∂/∂x) uBS(σ0^{(z̄)}; τ, x, k).
The first term on the right-hand side of equation [2.6] takes the form of (Lorig et al. 2017, Equation 3.13):

un^{(z̄)} [(∂/∂σ) uBS(σ0^{(z̄)})]^{−1} = L̃n^{(z̄)}(t, T, z) (∂^2/∂x^2 − ∂/∂x) uBS(σ0^{(z̄)}; τ, x, k)
× [σ0^{(z̄)} τ (∂^2/∂x^2 − ∂/∂x) uBS(σ0^{(z̄)}; τ, x, k)]^{−1}.
It follows that there exist functions χm^{(n)}(z̄; t, T) such that

un^{(z̄)} [(∂/∂σ) uBS(σ0^{(z̄)})]^{−1} = Σ_m χm^{(n)}(z̄; t, T) (∂^m/∂x^m)(∂^2/∂x^2 − ∂/∂x) uBS(σ0^{(z̄)}; τ, x, k)
× [(∂^2/∂x^2 − ∂/∂x) uBS(σ0^{(z̄)}; τ, x, k)]^{−1}

(see Lorig et al. (2017), Equation 3.15). This is because the function uBS(σ0^{(z̄)}; τ, x, k) does not depend on y and z.
Moreover,

(∂^m/∂x^m)(∂^2/∂x^2 − ∂/∂x) uBS(σ0) × [(∂^2/∂x^2 − ∂/∂x) uBS(σ0)]^{−1} = (−1/(σ0√(2τ)))^m Hm(ζ), [2.11]

where

ζ = (x − k − σ0^2 τ/2)/(σ0√(2τ)),

and where

Hm(ζ) = (−1)^m exp(ζ^2) (∂^m/∂ζ^m) exp(−ζ^2)

is the mth Hermite polynomial.
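The Hermite polynomials appearing in [2.11] can be generated numerically with the standard three-term recursion H_{m+1}(ζ) = 2ζHm(ζ) − 2mH_{m−1}(ζ), which is equivalent to the Rodrigues-type formula above (a small sketch):

```python
def hermite(m, z):
    """Physicists' Hermite polynomial H_m(z) via the recursion
    H_{m+1}(z) = 2 z H_m(z) - 2 m H_{m-1}(z), with H_0 = 1, H_1 = 2z."""
    h_prev, h = 1.0, 2.0 * z
    if m == 0:
        return h_prev
    for n in range(1, m):
        h_prev, h = h, 2.0 * z * h - 2.0 * n * h_prev
    return h
```

The proofs below use H0(ζ) = 1, H1(ζ) = 2ζ and H2(ζ) = 4ζ^2 − 2, which the recursion reproduces.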
We must still calculate the expression in the third line of equation [2.6] for h ≥ 2 (it is equal to 1 when h = 1). From Lorig et al. (2017), Proposition 3.5, we have

χ0^{(1)}(z̄; t, T) = (y − ν0)/(2√ν0) + (T − t) κ1 (ν̄0 − ν0)/(4√ν0),

χ1^{(1)}(z̄; t, T) = (T − t) ρ12 ξ1 ν0^{α1+1/2}/(4√ν0).

Hence,

σ1^{(z̄)}(t, z; T, k) = (y − ν0)/(2√ν0) + (T − t) κ1 (ν̄0 − ν0)/(4√ν0)
− ρ12 ξ1 ν0^{α1+1/2} (x − k − σ0^2 (T − t)/2)/(4 ν0^{3/2}).

Then,

σ̄1(t, z; T, k) = √ν0 + (T − t) κ1 (z − y)/(4√ν0) − ρ12 ξ1 y^{α1+1/2} (x − k − σ0^2 (T − t)/2)/(4 ν0^{3/2}).
The sets I2,h are I2,1 = {(2)} and I2,2 = {(1, 1)}. We have a_{11,2}(x, y, z) = 0. It follows that equation [2.10] with n = 2 includes only the summation over the set I2,2 and takes the form

L̃2^{(z̄)}(t, T, z) = ∫_t^T ∫_{t1}^T G1^{(z̄)}(t, t1, z) a_{11,1}(z − z̄ + m^{(z̄)}(t, t2) + C^{(z̄)}(t, t2)∇z) dt2 dt1.
While calculating the operator G1^{(z̄)}(t, t1, z) using equation [2.8], we need to calculate only the coefficients of the three partial derivatives with respect to the variable x. We obtain

G1^{(z̄)}(t, t1, z) = (1/2)(t1 − t) ρ12 ξ1 ν0^{α1+1/2} ∂^3/∂x^3
+ [(1/2)(y − ν0) + (1/2)(t1 − t) κ1 (ν̄0 − ν0) − (1/4)(t1 − t) ρ12 ξ1 ν0^{α1+1/2}] ∂^2/∂x^2
− (1/2)[(y − ν0) + (t1 − t) κ1 (ν̄0 − ν0)] ∂/∂x + · · · .
The following integrals are important for the calculations:

∫_t^T ∫_{t1}^T (t1 − t)(t2 − t) dt2 dt1 = (T − t)^4/8,

∫_t^T ∫_{t1}^T (t1 − t) dt2 dt1 = (T − t)^3/6,

∫_t^T ∫_{t1}^T (t2 − t) dt2 dt1 = (T − t)^3/3,

∫_t^T ∫_{t1}^T dt2 dt1 = (T − t)^2/2.
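These four identities are elementary iterated integrals; they can be checked numerically, for instance with a composite midpoint rule (a throwaway verification sketch):

```python
def iterated_integral(f, t, T, n=200):
    """Approximate the iterated integral  int_t^T int_{t1}^T f(t1, t2) dt2 dt1
    with a composite midpoint rule on both levels."""
    h1 = (T - t) / n
    total = 0.0
    for i in range(n):
        t1 = t + (i + 0.5) * h1
        h2 = (T - t1) / n
        inner = sum(f(t1, t1 + (j + 0.5) * h2) for j in range(n)) * h2
        total += inner * h1
    return total

t, T = 0.0, 1.3
tau = T - t
cases = [
    (lambda t1, t2: (t1 - t) * (t2 - t), tau ** 4 / 8),
    (lambda t1, t2: (t1 - t),            tau ** 3 / 6),
    (lambda t1, t2: (t2 - t),            tau ** 3 / 3),
    (lambda t1, t2: 1.0,                 tau ** 2 / 2),
]
```

Each pair agrees to roughly the O(n^{-2}) accuracy of the midpoint rule.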
The operator L̃2^{(z̄)}(t, T, z) takes the form

L̃2^{(z̄)}(t, T, z) = (1/32)(T − t)^4 ρ12^2 ξ1^2 ν0^{2α1+1} ∂^4/∂x^4
+ (1/32)[2(T − t)^4 ρ12 ξ1 ν0^{α1+1/2} κ1(ν̄0 − ν0) + 4(T − t)^3 ρ12 ξ1 ν0^{α1+1/2} (y − ν0) − (T − t)^4 ρ12^2 ξ1^2 ν0^{2α1+1}] ∂^3/∂x^3
+ (1/32)[(T − t)^4 κ1^2 (ν̄0 − ν0)^2 − 2(T − t)^4 ρ12 ξ1 ν0^{α1+1/2} κ1(ν̄0 − ν0) + 4(T − t)^3 κ1(ν̄0 − ν0)(y − ν0) − 4(T − t)^3 ρ12 ξ1 ν0^{α1+1/2} (y − ν0) + 4(T − t)^2 (y − ν0)^2] ∂^2/∂x^2
− (1/32)[(T − t)^4 κ1^2 (ν̄0 − ν0)^2 + 4(T − t)^3 κ1(ν̄0 − ν0)(y − ν0) + 4(T − t)^2 (y − ν0)^2] ∂/∂x + · · · .
Calculation of the first term on the right-hand side of equation [2.13] using
equation [2.11] may be left to the reader.
Next, we calculate the left-hand side of equation [2.12] for h = 2. Using the Hermite polynomials H0(ζ) = 1, H1(ζ) = 2ζ and H2(ζ) = 4ζ^2 − 2, we obtain

(∂^2/∂σ^2) uBS(σ̄0) [(∂/∂σ) uBS(σ̄0)]^{−1} = 2(T − t)ζ + 2 σ̄0^{−1} ζ^2.
Combining everything together, we obtain the formula for σ̄2(t, x, y, z; T, k):

σ̄2(t, x, y, z; T, k) = √ν0 + κ1(ν̄0 − ν0)(T − t)/(4√ν0) − (1/4) ρ12 ξ1 ν0^{α1−1} (x0 − k)
+ (1/16) ρ12 ξ1 ν0^{α1} (T − t) − (3/32) ρ12^2 ξ1^2 ν0^{2α1−2} (x0 − k)^2 [2.14]
+ (3/32) ρ12^2 ξ1^2 ν0^{2α1−3/2} (T − t) + · · · ,

where the ellipsis denotes the terms satisfying the following condition: the limits of the term, of its first partial derivative with respect to T and of its first two partial derivatives with respect to k, as (T, k) approaches (t, x) within Pλ, are all equal to 0.
On the right-hand side of equation [2.14], the first term, the partial derivatives with respect to T of the second, fourth and sixth terms, the first partial derivative with respect to k of the third term, and the second partial derivative with respect to k of the fifth term give nonzero contributions to the right-hand side of the asymptotic expansion [2.4].
2.4. References
Gatheral, J. (2008). Consistent modelling of SPX and VIX options. The Fifth World Congress
of the Bachelier Finance Society, London.
Latané, H.A. and Rendleman Jr., R.J. (1976). Standard deviations of stock price ratios implied
in option prices. J. Finance, 31(2), 369–381.
Lorig, M., Pagliarani, S., Pascucci, A. (2017). Explicit implied volatilities for multifactor
local-stochastic volatility models. Math. Finance, 27(3), 926–960.
Orlando, G. and Taglialatela, G. (2017). A review on implied volatility calculation. J. Comput.
Appl. Math., 320, 202–220.
Pagliarani, S. and Pascucci, A. (2012). Analytical approximation of the transition density in a
local volatility model. Cent. Eur. J. Math., 10(1), 250–270.
Pagliarani, S. and Pascucci, A. (2017). The exact Taylor formula of the implied volatility.
Finance Stoch., 21(3), 661–718.
3

New Dividend Strategies
We will consider two insurance models with dividend payments. The first one is the Cramér–Lundberg model with exponentially distributed claims. We will study a barrier dividend strategy with a Parisian implementation delay. This means that if the company surplus stays above the barrier for a prescribed time interval h, the overshoot is immediately paid out as a dividend. The expected discounted dividends paid before the Parisian ruin are chosen as the objective function. Numerical results are provided. The second model is the dual Cramér–Lundberg model with exponentially distributed gains. Instead of a barrier strategy, we deal with a threshold one, meaning that dividends are paid at a constant rate as long as the surplus stays above the threshold.
3.1. Introduction
It is well known that insurance models are of the input–output type. We have to
specify the premiums inflow P (t) and claim payments to customers (outflow) S(t), as
well as the planning horizon T ≤ ∞. Thus, the company surplus (capital or reserve)
X(t) at time t ≤ T has the form X(t) = x + P (t) − S(t). Here, x is the initial
surplus. To accomplish the optimization of the insurance company performance, we
need to introduce the set of feasible controls and an objective function. It is possible
to use different objective functions (criteria, targets or risk measures) in order to
evaluate an insurance company’s performance. The most popular one in non-life
insurance (since 1903) was the company ruin probability for the classical (collective
risk) Cramér–Lundberg model. In other words, the main goal was to achieve the high
reliability of the company. In practice, it turned out that the negative surplus level
may not always lead to bankruptcy, since the company may use, for example, a bank
loan to avoid insolvency. Therefore, notions such as "absolute ruin", "Parisian ruin" and "Omega models" were defined and studied in the framework of the reliability approach (for the definitions, see Bulinskaya (2017) and the references therein). New problems
have arisen in actuarial sciences during the last 20 years. This period is characterized
by the interplay of insurance and finance, the unification of reliability and the cost
approaches (see, for example, Bulinskaya (2017)), as well as the consideration of
complex systems. Sophisticated mathematical tools are used for the analysis and
optimization of insurance systems including dividends, reinsurance and investment.
A large number of papers have been devoted to dividends payment (see, for
example, Avanzi (2009), Albrecher and Thonhauser (2009) for the survey of the results
published before 2009). Expected discounted dividends paid before ruin are usually
taken as the objective function (see the classical textbooks by Bühlman (1970) and
Gerber (1979), as well as the paper by Sethi et al. (1991)). The barrier strategy is the
most popular, although in Azcue and Muler (2005), it was established that the optimal
dividend strategy is not always the barrier strategy. Many ramifications of this strategy
were proposed (see, for example, Drekic et al. (2018) and the references therein).
Below, we investigate two models with dividends. In section 3.2, we treat the
classical Cramér–Lundberg insurance model with exponential claims and the barrier
dividend strategy generalizing those introduced in Dassios and Wu (2009) and
Bulinskaya and Shigida (2018). The attention is focused on the calculation of the
objective function and simulation. In section 3.3, we study the dual Cramér–Lundberg
model with exponential gains and the threshold dividend strategy. Integro-differential
equations for the expected discounted dividends are established. Using the explicit
form of the objective function, we obtained the optimal threshold. Section 3.4 contains
the conclusion and further research directions.
New Dividend Strategies 41
3.2. Model 1

The surplus process has the form

Xt = X0 + ct − Σ_{i=1}^{Nt} Ci, t ≥ 0, [3.1]

where Nt is a Poisson process with intensity λ, c > 0 is the premium rate, and the claim amounts Ci are independent and exponentially distributed with parameter β.
We will use the fact that Xt has independent increments and is translation invariant; therefore, the strong Markov property is applicable. Also, Xt − EXt is a martingale (being a process with independent increments and a constant mean value), and the optional stopping theorem is applicable as well.
If the condition

c ≤ λ/β

holds, then for any initial capital x > 0, ruin happens with probability 1. Therefore, further on, we suppose the net profit condition c > λ/β to be fulfilled.
We also need some results that were established in Bulinskaya and Shigida (2019). Let r > 0 be the force of interest, and let vr+ and vr− be the positive and negative roots, respectively, of the equation

−r + c vr + λ (β/(β + vr) − 1) = 0.
Let Ui be the ith excursion of the process Xt above l1 and Vi be the ith excursion below l1. All of these random variables are independent (by the strong Markov property). If x ≥ l1, U1 has a distribution different from that of the other {Ui}, i ≥ 2, which are identically distributed, and so are the {Vi}, i ≥ 1. If x < l1, V1 has a distribution different from that of the other {Vi}, i ≥ 2, which are identically distributed, and so are the {Ui}, i ≥ 1. In any case, let p1 be the density of U2 and p2 be the density of V2 (p1 and p2 do not depend on l1). If x ≥ l1, let g1^{x−l1} be the density of U1 (this way, g1^x does not depend on l1); otherwise, let g2^{x−l1} be the density of V1 (g2^x also does not depend on l1).
For x < l1,

Ee^{−r τ_{x,l2}} 1{τ_{x,l2} < T_{x,l1,d}} = (vr+ − vr−)/[(β + vr+) e^{vr+(l2−l1)} − (β + vr−) e^{vr−(l2−l1)}]
× [β e^{vr+(l1−l2)}/(β + vr+) − β e^{vr−(l1−l2)}/(β + vr−)]
× [∫_0^d e^{−rs} g2^{x−l1}(s) ds] / [β e^{vr+(l1−l2)}/(β + vr+) − β e^{vr−(l1−l2)}/(β + vr−) − (e^{vr+(l1−l2)} − e^{vr−(l1−l2)}) ∫_0^d e^{−rs} p2(s) ds].
where

A = [β e^{vr+(l1−l0)} − β e^{vr−(l1−l0)}]/[(β + vr+) e^{vr+(l1−l0)} − (β + vr−) e^{vr−(l1−l0)}]
+ (vr+ − vr−)/[(β + vr+) e^{vr+(l1−l0)} − (β + vr−) e^{vr−(l1−l0)}] × [β/(β + vr+) − β/(β + vr−)]
× [∫_0^d e^{−rs} p2(s) ds] / [β e^{vr+(l0−l1)}/(β + vr+) − β e^{vr−(l0−l1)}/(β + vr−) − (e^{vr+(l0−l1)} − e^{vr−(l0−l1)}) ∫_0^d e^{−rs} p2(s) ds].
REMARK.– This theorem generalizes Theorem 3.1 in Bulinskaya and Shigida (2018).
THEOREM 3.2.– The function Ee^{−r F_{l1,l1,h}} X_{F_{l1,l1,h}} 1{F_{l1,l1,h} < T_{l1,l0,d}} is given by

(1/P̄1(h)) (c − λ/β) ∫_0^h s p1(s) ds + l1 + ch − (λh + 1)/β.
The expectation of the total dividend payments until the Parisian ruin, under the barrier strategy with the Parisian implementation delay, is our next goal.
Let τ0^X be the first time the process {Xt} hits the barrier b, and let

τi^X = inf{ t ≥ τ_{i−1}^X : 1{Xt > X_{τ_{i−1}^X}} (t − g^X_{X_{τ_{i−1}^X}, t}) ≥ h }

be the first time after τ_{i−1}^X when the length of the excursion above X_{τ_{i−1}^X} reaches h.
We assume that dividends are paid only if the surplus stayed above the barrier b during
the time interval of length h. Then, the excess is immediately paid out and the surplus
starts from level b.
Obviously, EV(x, b) depends not only on the initial surplus x and the dividend barrier b; however, the other parameters (d, h, r, λ, β) are omitted in order to simplify the notation.
Also, denote by NR[t1, t2) the event that there is no moment t1 ≤ t < t2 at which the surplus has stayed below zero for at least d. Then, we can rewrite [3.3] as

V(x, b) = Σ_{i=1}^{∞} e^{−r τi^X} (X_{τi^X} − X_{τ_{i−1}^X}) 1_{NR[0, τi^X)}.
By the strong Markov property, the equality

EV(x, b) = Ee^{−r τ0^X} 1_{NR[0, τ0^X)} EV(b) [3.5]

is true. Furthermore,

EV(b) = Ee^{−r(τ1^X − τ0^X)} (X_{τ1^X} − X_{τ0^X}) 1_{NR[τ0^X, τ1^X)}
+ Ee^{−r(τ1^X − τ0^X)} 1_{NR[τ0^X, τ1^X)} Σ_{i=2}^{∞} Ee^{−r(τi^X − τ1^X)} (X_{τi^X} − X_{τ_{i−1}^X}) 1_{NR[τ1^X, τi^X)}
= Ee^{−r(τ1^X − τ0^X)} (X_{τ1^X} − X_{τ0^X}) 1_{NR[τ0^X, τ1^X)} + Ee^{−r(τ1^X − τ0^X)} 1_{NR[τ0^X, τ1^X)} EV(b).

Hence,

EV(b) = Ee^{−r(τ1^X − τ0^X)} (X_{τ1^X} − X_{τ0^X}) 1_{NR[τ0^X, τ1^X)} / (1 − Ee^{−r(τ1^X − τ0^X)} 1_{NR[τ0^X, τ1^X)}).

Recalling [3.5], we establish the desired result, thus ending the proof.
The explicit expression of the function EV(x, b) can be obtained as follows. The three terms in [3.4] are given by the first expression in Lemma 3.2 with l1 = 0, l2 = b, and by Theorems 3.2 and 3.3 with l0 = 0, l1 = b.
Our task is to find the optimal barrier b∗ maximizing the expectation EV (x, b).
Analysis of EV (x, b) also provides the following result.
Thus, it is possible to establish that the expected present value of dividends until
the Parisian ruin (d > 0) is greater than that until the classical ruin (d = 0).
The explicit expression [3.4] of the function EV (x, b) seems very complicated
for analytical investigation. Therefore, the numerical results were obtained first.
An analysis of the model under consideration was conducted using the Python
programming language. In Figure 3.1, we provide six graphs of the expected
discounted total dividend payment as a function of b (for c = 10.0, λ = 5.4, β = 1.0,
d = 0.1, h = 0.3, r = 0.2 and x = 0, 1, 2, 3, 4, 5).
It can be seen from those graphs that the optimal barrier b∗ maximizing the
expectation (see the vertical yellow line) does not depend on x. The function was
analyzed in Python using the scipy library, and the graphs were obtained with the
matplotlib library.
Using the first expression in Lemma 3.2, it can be shown that Ee^{−r τ0^X} 1_{NR[0, τ0^X)} has the form C(x) f(b), where the function f(b) does not depend on x and the factor C(x) does not depend on b.
Thus, by taking the derivative of the function EV (x, b) with respect to b and by
finding the point b∗ where it equals zero, we can find the global maximum of this
function.
Due to [3.4], this means that the partial derivative with respect to b of the expected total dividend payment has such a form as well:

(∂/∂b) EV(x, b) = C(x) τ(b),

where τ(b) does not depend on x. This means that the optimal barrier b∗, satisfying τ(b∗) = 0, does not depend on x either.
If b∗ > 0 (which does not follow from the net profit condition), then for all 0 ≤ x < b∗ it is the optimal barrier, that is, the barrier that maximizes the expected total dividend payment.
Also, a simulation of the process Yt itself was carried out. We generated a large
sample of independent exponential random variables with parameter λ, which are
treated as intervals between claims, and of independent exponential random variables
with parameter β, which represent claim amounts. Random samples were generated
using the standard module random in Python.
The formulas [3.2] and [3.1], translated into code, were applied directly in order to obtain the simulation of our model. Also, formula [3.3] (with Td ∧ t instead of Td) was used to calculate the total dividend payment (discounted to the moment t = 0) up to time t.
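The core of the simulation just described can be sketched as follows. This is illustrative code, not the authors' implementation: it reproduces the surplus [3.1] at claim epochs, but omits the dividend and Parisian-ruin bookkeeping of [3.2]–[3.3].

```python
import random

def simulate_surplus(x, c, lam, beta, horizon, seed=1):
    """Sample the Cramér-Lundberg surplus X_t = x + c*t - sum of claims
    (formula [3.1]) just after each claim, up to the given horizon.
    Inter-claim times are Exp(lam), claim amounts are Exp(beta)."""
    rng = random.Random(seed)
    t, total_claims = 0.0, 0.0
    path = [(0.0, x)]
    while True:
        t += rng.expovariate(lam)          # next claim epoch
        if t > horizon:
            break
        total_claims += rng.expovariate(beta)
        path.append((t, x + c * t - total_claims))
    return path
```

With the chapter's parameters (λ = 5.4, c = 10.0, β = 1.0), the net profit condition c > λ/β holds, so the simulated paths drift upwards on average.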
The simulation of the two processes (Yt itself and the process Vt (x, b) of dividend
payments up to t) is shown in Figure 3.2 (note that the upper picture is a magnification
of the lower left corner of the lower picture).
The horizontal line (which is close to zero) denotes the dividend barrier b = 10,
the blue curve fluctuating around it is Yt .
The upper horizontal line shows the expectation of the total dividend payment, and the vertical yellow line marks the time of Parisian ruin. It is clear that Vt(x, b) (the yellow step function) is close to EV(x, b) at the time t = Td of Parisian ruin.

The parameters here are as follows: λ = 5.4, c = 10.0, β = 1.0, x = 0.0, d = 0.5, h = 0.3, r = 0.01.
Figure 3.2. Simulation of the process Yt and dividends Vt (x, b). For a
color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
3.3. Model 2
There exist other possible interpretations for a dual Cramér–Lundberg model. One
can treat the surplus as the amount of capital of a business engaged in research and
development (see, for example, Avanzi et al. (2007)). The company pays continuous
expenses for research, and the occasional profit of random amounts (such as the award
of a patent or a sudden increase in sales) arises according to the Poisson process.
A similar model was used to model the functioning of a venture capital investment
company in Bayraktar and Egami (2008).
Expected discounted dividends paid until the ruin time Tu (the first time that the surplus becomes negative) are given by V(x; b) = E(∫_0^{Tu} e^{−δt} dD(t)). Here, D(t) is the aggregate amount of dividends paid up to time t and δ > 0 is a constant discount rate. It was established in Avanzi et al. (2007) that a simple barrier strategy is optimal for the model under consideration, and it is possible to calculate the optimal barrier in some particular cases.
Here, we consider a threshold strategy. The modified process (with dividends paid at the rate α after crossing the threshold b) has the form

U(t) = u − ct + S(t), t ≥ 0, for u ≥ 0,

where c = c1 for u ≤ b, and c = c2 = c1 + α for u > b.

Figure 3.3. Surplus and dividends under barrier strategy. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Figure 3.4. Surplus and dividends under threshold strategy. For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
THEOREM 3.5.– For exponential jumps, namely, p(y) = β e^{−βy}, y > 0, the expected discounted dividends are

V(u; b) = A(e^{y1 u} − e^{y2 u}) for 0 ≤ u ≤ b, and V(u; b) = C e^{x1 u} + α/δ for u > b, [3.6]

where

A = (−α x1/(βδ)) · (β − y2)(β − y1)/[(y2 − x1)(β − y1) e^{y2 b} − (y1 − x1)(β − y2) e^{y1 b}] > 0.
In the same way, we proceed in the case 0 < u ≤ b and obtain the corresponding differential equation. Its characteristic equation also has two roots, y1 < 0 and y2 > 0. Solving the two differential equations of the second order, we obtain the stated result.
The optimal threshold is

b∗ = (1/(y2 − y1)) ln[ (y1 − x1)(β − y2) y1 / ((y2 − x1)(β − y1) y2) ].
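Given numerical values of the roots x1, y1, y2 (obtained from the characteristic equations of the two differential equations above; the values used in the sketch below are purely illustrative, not calibrated), the optimal threshold is a one-line computation:

```python
from math import log

def optimal_threshold(x1, y1, y2, beta):
    """b* = (1/(y2 - y1)) * ln[(y1 - x1)(beta - y2) y1 /
    ((y2 - x1)(beta - y1) y2)], assuming the logarithm's argument
    is positive. The roots x1, y1 < 0 < y2 must be supplied."""
    num = (y1 - x1) * (beta - y2) * y1
    den = (y2 - x1) * (beta - y1) * y2
    return log(num / den) / (y2 - y1)
```

A non-positive b∗ would simply mean that paying dividends from the start is optimal.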
3.4. Conclusion

For the dual process with the dividend threshold strategy, we only formulated the obtained results, without proofs. The next steps are the study of Parisian ruins (instead of the usual ones), the investigation of the optimal policy's dependence on the gains distribution in terms of probability metrics, as well as parameter estimation.
3.5. Acknowledgments
Many thanks to the anonymous reviewer for reading the chapter and making
suggestions to improve the presentation.
3.6. References
Albrecher, H. and Thonhauser, S. (2009). Optimality results for dividend problems in insurance.
RACSAM Rev. R. Acad. Cienc. Ser. A Mat., 103(2), 295–320.
Avanzi, B. (2009). Strategies for dividend distribution: A review. N. Am. Actuar. J., 13(2),
217–251.
Avanzi, B., Gerber, H.U., Shiu, E.S.W. (2007). Optimal dividends in the dual model. Insur.
Math. Econ., 41(1), 111–123.
Azcue, P. and Muler, N. (2005). Optimal reinsurance and dividend distribution policies in the
Cramér-Lundberg model. Math. Finance, 15(2), 261–308.
Bayraktar, E. and Egami, M. (2008). Optimizing venture capital investment in a jump diffusion
model. Math. Methods Oper. Res., 67(1), 21–42.
Bühlmann, H. (1970). Mathematical Methods in Risk Theory. Springer-Verlag, Heidelberg.
Bulinskaya, E. (2017a). New research directions in modern actuarial sciences. In Modern
Problems of Stochastic Analysis and Statistics – Selected Contributions in Honor of Valentin
Konakov, Panov, V. (ed.). Springer, Cham.
Bulinskaya, E.V. (2017b). Cost approach versus reliability. Proceedings of International
Conference DCCN-2017, Technosphera, Moscow.
Bulinskaya, E.V. and Shigida, B.I. (2018). Sensitivity analysis of some applied probability
models (in Russian). Fundam. Appl. Math., 22(3), 19–34.
Bulinskaya, E.V. and Shigida, B.I. (2019). Modeling and asymptotic analysis of insurance
company performance. Communications in Statistics – Simulation and Computation
[Online]. DOI: 10.1080/03610918.2019.1612911.
Dassios, A. and Wu, S. (2009). On barrier strategy dividends with Parisian implementation
delay for classical surplus processes. Insur. Math. Econ., 45, 195–202.
Drekic, S., Woo, J.-K., Xu, R. (2018). A threshold-based risk process with a waiting period to
pay dividends. J. Ind. Manag. Optim., 14(3), 1179–1201.
de Finetti, B. (1957). Su un’impostazione alternativa della teoria collettiva del rischio.
Transactions of the XV-th International Congress of Actuaries, 2, 433–443.
Gerber, H.U. (1979). An Introduction to Mathematical Risk Theory. Huebner Foundation,
Philadelphia.
Mikosch, T. (2006). Non-life Insurance Mathematics. Springer-Verlag, Berlin.
Sethi, S.P., Derzko, N.A., Lehoczky, J. (1991). A stochastic extension of Miller-Modigliani
framework. Math. Finance, 1, 57–76.
4
Introduction of Reserves in Self-adjusting Steering
4.1. Introduction
Applied Modeling Techniques and Data Analysis 2: Financial, Demographic, Stochastic and Statistical Models and Methods,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
gone from 7.7 in 1962 to 2.49 in 2016 and is forecast to decrease further in the coming years. As long as it remains below the population renewal threshold, there will be fewer and fewer active workers to finance more and more pensions: in 2016, the number of active workers per pensioner barely reached 2.23, whereas this ratio was 6 in 2000.
Added to this demographic problem is an unfavorable economic situation. The sustainability of pay-as-you-go financing rests on the contributions, levied on the wage bill, exceeding the benefits paid; but unemployment is significant (10.6%), and the nature of employment is no longer what it used to be. Fixed-wage employment is tending to give way to entrepreneurship, so the number of employee contributors continues to decline, which weakens the scheme. In addition, the informal sector accounted for 40% of employment in 2016 (see the CESE annual report 2017).
Faced with this challenge of longevity, many countries around the world have started (or will begin) reforms of their mandatory pension schemes. These reforms maintained defined benefits and adjusted various parameters (postponement of the retirement age, reduction of benefit rates, tightening of the conditions for early retirement, etc.). These are known as parametric reforms. Such one-off measures can, up to a point, restore some short-term viability, but given the scale of the actual challenges, they fall short.
This chapter focuses on the transformation of the current defined-benefit pay-as-you-go (PAYG) system into a point-managed pension system, together with the introduction of a rule for automatically steering the scheme's parameters over time: the Musgrave rule. To do this, we will briefly present the architecture of the Moroccan pension system as a whole before focusing on the largest public pension fund, presenting its characteristics and parameters.
In the second part of the chapter, we present the theoretical framework of the Musgrave rule for managing and steering the points-based scheme, as well as the effect of its introduction on the extinction date of the reserves. We then simulate the transformation of the fund into a point-managed plan by applying the Musgrave rule and introducing reserves. Finally, we compare the current system with the new simulated system by measuring the impact of this transformation on the level of benefits and contributions through contribution rates and replacement rates.
Our study focuses on the Moroccan pension fund (CMR); in what follows, we present the characteristics of the fund and its different parameters.
The CMR public scheme is compulsory for three categories of employees: the civil
and military personnel of the State, permanent and trainee agents of local authorities
and the staff of public institutions.
The plan is funded on a PAYG basis. The contribution rate is set at 28% of
base salary, bonuses and other allowances. The contribution rate is equally shared
between employees and the state employer. The plan is based on the principle of the
laddered premium which sets an equilibrium contribution rate for a minimum period
of 10 years.
P = N × A × SR [4.1]
where:
– N is the number of years contributed;
– A is the annuity rate;
– SR is the reference salary.
Before 2016, the annuity rate was 2.5%, the reference salary was the last salary, and the legal retirement age was 60 years. This made the scheme much too generous: for 40 years of contributions, it offered a replacement rate of 100%. The parametric reform tried to correct this generosity. Thus, the maximum contribution period within the fund is now 40 years, and the reference salary for the calculation of the pension is the average of the last eight (8) salaries preceding the retirement date. The annuity rate is now 2%, so the fund offers a maximum replacement rate of 80% for a complete career. The legal retirement age is 63 years.
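As a quick numerical check of formula [4.1], the pre- and post-reform parameters reproduce the replacement rates quoted above (the reference salary is normalized to 1; the function name is ours):

```python
def pension(n_years, annuity_rate, reference_salary):
    """P = N * A * SR, equation [4.1]."""
    return n_years * annuity_rate * reference_salary

# Full 40-year career with the reference salary normalized to 1:
before_2016 = pension(40, 0.025, 1.0)   # annuity rate 2.5%: 100% replacement
after_reform = pension(40, 0.020, 1.0)  # annuity rate 2.0%: 80% replacement
```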
The fund's surplus position in the past has allowed it to accumulate significant reserves. Today, these reserves make it possible to cover the technical deficit. However, the declining demographic ratio has accelerated their depletion. The evolution of the demographic ratio is presented here.
The main assumption used in this projection is that retiring workers are replaced, so that their number remains constant over the projection horizon. We used a deterministic projection for the retiree population.
We will use five professional categories, representing typical career trajectories, for our simulations. We present the average wages by age for the following categories (wages are in Moroccan Dirham):
– administrators;
– engineers;
– grade A professors;
– secretaries;
– grade B teachers.
The selected categories are representative of wage developments within the fund.
Administrators and grade B teachers represent more than 80% of the members of
the scheme.
Using the current parameters, we will simulate, for each trajectory, the contributions paid throughout the career, and determine the associated replacement rates for the average contribution period, as well as the ratio between expected benefits and contributions over the career.
After describing the functioning of the Moroccan Pension Fund, we will present
in what follows the theoretical framework of the rule of piloting the new regime that
we will put in place.
for the decrease. We will present this mechanism when the fund is managed in defined
benefits and then we will introduce a new management mode driven by the Musgrave
rule.
Musgrave (1981) proposed another invariant leading to a form of sharing of the risk
between the two generations. Let us define the Musgrave ratio as the ratio between the
pension and the salary net of pension contributions:
\[ M_1 = \frac{P}{S(1-\pi_1)} = \frac{\delta_1}{1-\pi_1} \tag{4.5} \]
Using the previous situation, when D1 becomes D2 , we stabilize this coefficient:
\[ M_1 = \frac{\delta_1}{1-\pi_1} = \frac{\delta_2}{1-\pi_2} = M_2 \tag{4.6} \]
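A minimal sketch of how the Musgrave invariant [4.5]–[4.6] can be applied. We assume the usual PAYG balance π = δ·D, where D denotes the dependency ratio (pensioners per contributor); this balance equation and the names below are our assumptions, not the chapter's notation.

```python
def musgrave_adjust(delta1, d1, d2):
    """Given a replacement rate delta1 balanced at dependency ratio d1,
    return (delta2, pi2) keeping the Musgrave ratio M = delta / (1 - pi)
    invariant when the dependency ratio moves to d2, under the assumed
    PAYG balance pi = delta * d."""
    pi1 = delta1 * d1                 # balanced contribution rate before
    m = delta1 / (1.0 - pi1)          # Musgrave ratio, held constant
    delta2 = m / (1.0 + m * d2)       # solves m = delta2 / (1 - delta2*d2)
    pi2 = delta2 * d2                 # balanced contribution rate after
    return delta2, pi2
```

When d2 > d1, the contribution rate rises while the replacement rate falls, so the demographic deterioration is shared between contributors and pensioners.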
Introduction of Reserves in Self-adjusting Steering 59
It can be seen that the wage contribution rate that balances the plan from an actuarial point of view goes from 28% to 35% when the dependency ratio is at its most deteriorated, while the replacement rate drops to 69% over the same period. The evolution of the rates follows the tendency of the dependency ratio: when it deteriorates, contribution rates rise and the replacement rate falls (see Figure 4.4). In the simulation, the deterioration of the demographic ratio is thus absorbed by both active workers and retirees. In the next section, we will apply this steering rule to the parameters and transform the scheme.
Each year, the payment by the affiliate of his/her contribution entitles him/her to a
certain number of retirement points. The number of points given is the ratio between
the salary of the affiliate and an identical reference salary for all, called the acquisition
value of the point.
The monetary counterpart of these points is only known on the liquidation date,
depending on the value of service of the point on that date.
The number of points earned at the time of retirement is the sum of the points
earned during the career. The pension at retirement age is given by the formula:
\[ P = N_T \cdot V_T \cdot \sigma_T \tag{4.9} \]
where N_T represents the number of points earned at retirement age T, V_T is the value of the point and σ_T is an actuarial coefficient that depends on the length of the career and the generation.
salary. The actuarial coefficient equals 1 for this individual. The amount of the pension P_T can be written according to the replacement rate δ and the reference wage S_T^r:
\[ P_T = \delta \cdot S_T^{r} \tag{4.10} \]
and according to the number of points and the value of one point:
\[ P_T = N \cdot V_T \tag{4.11} \]
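The points mechanism described above can be sketched as follows (illustrative function names; the acquisition and service values are inputs, not the fund's actual figures):

```python
def points_earned(salaries, acquisition_values):
    """Each year the affiliate earns salary / acquisition-value points;
    the total at retirement is the sum over the career."""
    return sum(s / av for s, av in zip(salaries, acquisition_values))

def pension_from_points(n_points, service_value, actuarial_coeff=1.0):
    """P = N_T * V_T * sigma_T, equation [4.9]."""
    return n_points * service_value * actuarial_coeff
```

For example, two years of salary 100 and 110 with an acquisition value of 10 give 21 points; the pension then depends only on the service value of the point at liquidation.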
We simulate through the five wage trajectories presented in section 4.2 the
transformation of the current scheme into a new one managed with the point system
we have just introduced. For individuals joining the plan today, we calculate their
pension entitlements and the ratio of the present value between the benefits and
the contributions. We will compare those indicators in the current system and after
transformation.
                             Administrators  Engineers  A professors  B teachers  Secretaries
Replacement rate                  78%           55%          69%          60%          99%
Benefit/contribution ratio       0.9199        1.0110       0.8843       0.9832       0.8858

Table 4.3. Replacement rates and contribution ratio in the new system
Indexing the pension to the evolution of the scheme's average salary has the immediate effect of improving the value of the pension for "flat" trajectories; trajectories whose wages grow faster than the average wage obtain replacement rates below the scheme's target replacement rate. The redistribution of wealth within the scheme is thus more uniform across the types of trajectories.
Replacement rates are not capped; thus, trajectories with pay decreases at the end of the career obtain better replacement rates. They are also better for long careers (contribution periods greater than the reference period), and a flat salary evolution is not penalized. The scheme is more generous for trajectories with little revaluation throughout the career; this is the case for secretaries.
The second indicator measures the performance of the scheme: for each monetary unit paid in as a contribution, the Moroccan Pension Fund pays out an average of 0.96. This is 0.19 less than in the current system, so the new system is on average less generous.
The new system benefits contributors with wage developments that are lower than the
average wage in the scheme, so there is a different distribution of wealth in the scheme.
After transforming the pension plan, we are interested in the impact on the level of the reserves as well as on the viability horizon. The reserves are modeled as follows: the rate of return is assumed constant, corresponding to the average value observed over the last 10 years. The increase in contribution rates and the decline in replacement rates should slow the depletion of the fund, as the level of implicit debt is very high because the fund operated in the past with generous parameters.
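A minimal reserve-projection sketch. The recursion below (constant return applied to the opening balance, plus the annual net cash flow) is our assumption of the standard model, not the chapter's exact specification:

```python
def project_reserves(r0, contributions, benefits, rate):
    """Year-by-year reserves: R_{t+1} = R_t * (1 + rate) + C_t - B_t."""
    reserves = [r0]
    for c, b in zip(contributions, benefits):
        reserves.append(reserves[-1] * (1.0 + rate) + c - b)
    return reserves

def extinction_year(reserves):
    """Index of the first year with non-positive reserves, or None."""
    for t, r in enumerate(reserves):
        if r <= 0.0:
            return t
    return None
```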
Before transforming the scheme, we present a projection of the reserve fund with no adjustment applied. We observe that, with the fund's current operating parameters, a positive level of reserves can be maintained only until 2027. The management is not financially sustainable in the long term: the deficit continues to grow until the projection horizon.
4.5. Conclusion
Faced with the failing situation of defined-benefit, PAYG-financed pension plans in Morocco, we examined in this chapter a management model, the points system, and a steering mechanism that would make it possible to overcome the shortcomings of the current system. Our purpose was to determine whether the
new scheme is financially sustainable and its impact on the standard of living of
the contributors and retirees. We first drew a portrait of the current situation of
the Moroccan retirement system. This analysis allowed us to identify the diverging
parameters from one fund to another as well as the problems related to the defined
benefit pension management method. We also highlighted the actions taken to solve
these problems. Then, we presented the theoretical model of the points system and Musgrave's rule for controlling the value of the point and the contribution rates, and for ensuring equity in the distribution of wealth between active workers and retirees, and across generations.
Under pressure from the deteriorating demographic ratio, the options for maintaining the PAYG pension system are becoming fewer and fewer. The points system allows this burden to be distributed equitably.
4.6. References
Blanchard, M. (2017). Pilotage et gestion d’un régime de retraite et impact sur sa situation
financière suite au décret. Report, Institut des actuaires, 2017-887.
Caisse marocaine des retraites (2016). Activity report.
Commission 2020–2040 (2014). Un contrat social performant et fiable [Online]. Available:
http://pension2040.belgium.be/fr/.
Conseil Économique, Social et Environnemental (2017). Annual report, Morocco.
Cour des comptes (2018). Rapport sur caisse marocaine des retraites. Report, Cour des comptes,
Morocco.
Devolder, P. (2010). Perspectives pour nos régimes de pension légale. Revue belge de sécurité
sociale, 4, 597–614.
Devolder, P. (2015). Pension reform in Belgium, a new points system between DB and DC.
[Online]. Available: http://www.actuaries.org/oslo2015/papers/PBSS-Devolder.pdf.
Musgrave, R. (1981). A reappraisal of social security finance. In Social Security Financing. MIT Press, Cambridge, MA, 89–127.
Palier, B. (2012). La réforme des retraites. Presses Universitaires de France, Paris.
5
5.1. Introduction
When you move to a country that uses a different currency from yours, you need to exchange your currency for that of the country you are moving to. The rate at which you buy is called the exchange rate (Hull 2006); it is greater or less than one depending on whether your currency is lower or higher in value than the currency you are buying.
Considering, for example, the currency of Sweden (the Swedish Krona, SEK) and the currency of Rwanda (the Rwandan Franc, RWF), the exchange rate for the pair SEK/RWF is greater than 1 because 1 SEK is nowadays nearly equivalent to 100 RWF. When the exchange rate is equal to one, the two currencies are equal in value, but this rarely happens.
In the following, we will use S_t to denote the exchange rate between two currencies at a date t. The variation of the returns of an exchange rate over a given period is known as volatility (Hull 2006). When hedging risk, risk managers are interested in knowing how volatile a specified currency pair is. Financial engineering thus offers different techniques for modeling volatility in financial markets. By modeling volatility here, we mean deriving a model that can best forecast the volatility of future returns.
The exponentially weighted moving average (EWMA) is one of the models used to estimate the volatility of financial returns. It can be written as follows:
\[ \sigma_t^2 = \lambda \sigma_{t-1}^2 + (1-\lambda) r_{t-1}^2 \tag{5.1} \]
where λ ∈ [0, 1] is the decay factor, σ_t^2 is the variance at time t and r_t is the log-return at time t.
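Equation [5.1] translates directly into code. Here is a small sketch; seeding the recursion with the first squared return is our choice, not a prescription from the chapter:

```python
def ewma_variance(returns, lam=0.94):
    """EWMA recursion [5.1]: sigma2_t = lam*sigma2_{t-1} + (1-lam)*r_{t-1}^2.
    The recursion is seeded with the first squared return."""
    sigma2 = [returns[0] ** 2]
    for t in range(1, len(returns)):
        sigma2.append(lam * sigma2[-1] + (1.0 - lam) * returns[t - 1] ** 2)
    return sigma2
```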
In this chapter, two important questions are investigated, the first being the best
value of the decay factor in the EWMA model when forecasting volatility of exchange
rates and the second being the optimal out-of-sample forecasting period. Before
reviewing the EWMA model used (Winters 1960; J.P. Morgan 1996; Bollen 2015),
let us have a look at the data analyzed in this chapter.
5.2. Data
We deal with five currencies: EUR (Euro), USD (US Dollar), SEK (Swedish
Krona), KES (Kenyan Shilling) and RWF (Rwandan Franc). These data have been
Forecasting SV for Exchange Rates using EWMA 67
collected from the website of the National Bank of Rwanda (BNR 2019). The collected data are four time series of daily exchange rates covering the period from January 1, 1995 to December 31, 2018. Because some information related to EUR and SEK is missing for the early years, we have equalized the sample ranges and chosen to work with the period from January 1, 2003 to December 31, 2018.
The four currency pair series used (EUR/RWF, KES/RWF, SEK/RWF and
USD/RWF) have 5,844 observations each, which makes 23,376 observations in total.
One of the novelties in this chapter is a form of extrapolation in the collected data: the missing values for weekends and holidays have been filled in by carrying forward the corresponding previous values. We introduced this method based on the fact that the returns around the missing values are stationary. This allows us to consider a year with 365.25 trading days instead of the 252 generally used. The descriptive statistics are given in Table 5.1.
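The filling procedure can be sketched with a simple carry-forward (a hypothetical helper; a pandas `ffill` would do the same job):

```python
def forward_fill(quotes):
    """Replace each missing quote (None) by the last observed value,
    mirroring the weekend/holiday treatment described above."""
    filled, last = [], None
    for q in quotes:
        if q is not None:
            last = q
        filled.append(last)
    return filled
```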
             EUR/RWF      USD/RWF     SEK/RWF    KES/RWF
Size           5,844        5,844       5,844      5,844
Minimum      533.2172     509.1101     58.2864     5.7595
Maximum    1,065.2064     879.1009    108.0807     8.8294
Range        531.9892     369.9908     49.7943     3.0699
Mean         803.4626     638.4323     85.8304     7.5951
St. dev.     104.4771     104.3494     10.9031     0.4832
Kurtosis       2.9081       2.6037      2.2614     3.4667
Skewness       0.1560       0.9853     -0.1927     0.0671

Table 5.1. Descriptive statistics of the exchange rate data
In Figure 5.1, we have plotted the normalized raw data. In this figure, the pair
SEK/RWF has been multiplied by 10, while the pair KES/RWF has been multiplied
by 100. This is to allow a better visualization of the data in the figure.
In 1994, J.P. Morgan, a financial services company, introduced procedures to quantify financial risk in what has been called RiskMetrics. The EWMA volatility model was added to RiskMetrics in 1996 (J.P. Morgan 1996). From the generalized autoregressive conditional heteroscedasticity GARCH(1,1) model (Hull 2006), we have
\[ \sigma_t^2 = \gamma + \beta \sigma_{t-1}^2 + \alpha r_{t-1}^2. \tag{5.2} \]
The EWMA model (Winters 1960) is then given as:
\[ \sigma_t^2 = \lambda \sigma_{t-1}^2 + (1-\lambda) r_{t-1}^2 \tag{5.3} \]
Figure 5.1. Exchange rate data; SEK/RWF is scaled by 10 and KES/RWF by 100. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
When using the EWMA model, the main goal is to estimate the next-period or next-day volatility in a time series and to closely observe the way volatility changes (Andersen et al. 2005). The EWMA model uses two parameters: time and λ, which governs the sensitivity of the forecasted volatility to the historical data.
The parameter λ satisfies 0 < λ < 1, and RiskMetrics (J.P. Morgan 1996) suggests the use of λ = 0.94 for daily data and λ = 0.97 for monthly data. For a broader analysis, we choose to use λ1 = 0.97, λ2 = 0.75, λ3 = 0.50 and λ4 = 0.25, as suggested in Bollen (2015).
Let V1 be the rolling historical volatility and V2 be the EWMA volatility, described by the following formulae:
\[ V_1 = \sigma_{1,t} = \sqrt{\frac{365.25}{n-1}\sum_{i=1}^{n}\left(r_{t-i} - \bar{r}_t\right)^2} = \sqrt{\frac{365.25}{n-1}\sum_{i=1}^{n} r_{t-i}^2} \tag{5.6} \]
and
\[ V_2 = \sigma_{2,t}, \qquad \sigma_{2,t}^2 = \lambda \sigma_{2,t-1}^2 + (1-\lambda) r_{t-1}^2 \tag{5.7} \]
where r_t is the logarithmic return for each pair and \bar{r}_t is the related mean return, with t ∈ [1, T], i ∈ [1, n] and T = 5,843, where n represents the window size. Three values of n are used: n = 7, n = 30 and n = 90, for one-week, one-month and one-quarter window sizes, respectively.
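Formula [5.6] can be implemented as follows, using the zero-mean simplification on its right-hand side (the function and parameter names are ours):

```python
import math

def rolling_vol(returns, t, n, trading_days=365.25):
    """Annualized rolling historical volatility [5.6] over the n returns
    preceding time t, using the zero-mean form sum(r^2) / (n - 1)."""
    window = returns[t - n:t]
    return math.sqrt(trading_days / (n - 1) * sum(r * r for r in window))
```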
We have chosen to work with 365.25 trading days instead of the common 252
because exchange bureaux operate each day of the year; thus, in this market, we need
to consider all days of the year. Table 5.4 presents the descriptive statistics for different
returns, and Figure 5.2 shows how the logarithmic returns behave.
We apply four different values for the decay factor in the EWMA: λ1 = 0.97, λ2 = 0.75, λ3 = 0.50 and λ4 = 0.25.
Table 5.3. Errors (RMSE and MAPE) for different decay factors λi
and out-of-sample periods
In Figures 5.4 to 5.6, we plot the real data versus the forecasted ones for the three out-of-sample periods. The results in Figures 5.4–5.6, related to λ1 = 0.97, and others not presented, show that the EWMA with a larger decay factor λ (i.e. closer to 1) is better for forecasting exchange rates. We note that, in the EWMA model, a large decay factor makes the weights on past squared returns decay slowly, so the forecast relies on a longer history and reacts smoothly to the most recent observations (Andersen et al. 2005).
5.5. Conclusion
Observing the four series, the decay factor has no appreciable effect on the two errors (RMSE and MAPE): as λ varies, the errors barely change. Recall that, in the EWMA model, a larger λ places more weight on the volatility history and less on the most recent squared return. Among the four values of λ used, we find that λ = 0.97 is the best, and for the three out-of-sample periods considered, we obtain better results over a wider out-of-sample period. This shows that it is also better to forecast exchange rate volatility using a wider in-sample period. We advise using 365.25 trading days, based on our results and the fact that Forex markets operate even on weekends and holidays.
5.6. Acknowledgments
Jean-Paul Murara would like to thank the International Science Programme (ISP,
Uppsala University) and the Wimas Group for the financial support, allowing this
research paper to be written. Thanks also go to the Division of Applied Mathematics,
School of Education, Culture and Communication, Mälardalen University, for creating
an excellent environment for research in mathematics and applied mathematics.
5.7. References
Andersen, T.G., Bollerslev, T., Christoffersen, P.F., and Diebold, F.X. (2005). Volatility
forecasting. Working Paper. National Bureau of Economic Research, Cambridge, MA.
Bollen, B. (2015). What should the value of lambda be in the exponentially weighted moving
average volatility model? Applied Economics, 47(8), 853–860.
Hull, J. (2006). Options, Futures, and Other Derivatives. Pearson Prentice Hall, Englewood
Cliffs.
J.P. Morgan (1996). RiskMetrics. Technical Document. J.P. Morgan/Reuters, New York.
National Bank of Rwanda (2019). Exchange rate [Online]. Available at: https://www.bnr.rw/footer/quick-links/exchange-rate/?txbnrcurrencymanagermaster%5Baction%5D=archive&txbnrcurrencymanagermaster%5Bcontroller%5D=Currency&cHash=9b3b8a3170a02e5876e4a1be17720fec [Accessed 3 January 2019].
Winters, P.R. (1960). Forecasting sales by exponentially weighted moving averages.
Management Science, 6(3), 324–342.
6
An Arbitrage-free Large Market Model for Forward Spread Curves
Before the financial crisis started in 2007, the forward rate agreement contracts
could be perfectly replicated by overnight indexed swap zero coupon bonds. After the
crisis, the simply compounded, risk-free, overnight indexed swap forward rate became
less than the forward rate agreement. Using an approach proposed by Cuchiero, Klein
and Teichmann, we construct an arbitrage-free market model, where the forward
spread curves for a given finite tenor structure are described as a mild solution to a
boundary value problem for a system of infinite-dimensional stochastic differential
equations. The constructed financial market is large: it contains infinitely many
overnight indexed swap zero coupon bonds and forward rate agreement contracts, with
all possible maturities. We also investigate the necessary assumptions and conditions
which guarantee existence, uniqueness and non-negativity of solutions to the obtained
boundary value problem.
In the last decades of the previous century and into the third millennium, financial derivatives have significantly affected finance and the global market. In terms of underlying assets, the derivative market is massive: it is much larger than the stock market and, in terms of value, amounts to several times the world gross domestic product. In addition, derivative markets have been considered to be at the core of the financial crisis that began in 2007: many derivative products constructed from portfolios of risky mortgages became worthless after house prices decreased in the United States. Since then, many banks and financial institutions have changed their proxies for the term "risk-free" interest rate (Hull
2015). The importance and impact of proper and accurate studies in this vast field of mathematics are therefore vital. We attempt to develop an algebraic method for pricing financial contracts in the post-crisis financial market.
This will also include calculating the forward rates. In this chapter, our objectives will
be to:
– review the theories of constructing a large financial market model;
– construct an equivalent separating measure and prove its uniqueness;
– prove the existence, uniqueness and non-negativity of solutions to the system of
SDEs describing the dynamics of a constructed large market.
We should emphasize that a large financial market model can include infinitely
many assets (or more specifically, bonds). Furthermore, we will focus on the Heath–
Jarrow–Morton (HJM) framework that describes no-arbitrage conditions that must be
satisfied by a model of yield curves. Let us start with some preliminaries.
To begin with, in derivative security models the underlying assets are securities, whereas in term-structure models the underlying assets are interest rates. In term-structure models, the current value/price of a default-free (risk-free) discount bond for different maturities is called the term structure of interest rates (Kijima 2013). Furthermore, interest rate derivatives are designed to protect an investor from huge losses caused by dramatic changes in interest rates. Bond options, swap options (swaptions), cap options (portfolios of caplets) and floor options (portfolios of floorlets) are important interest rate derivatives, used to secure borrowing and lending against such changes. Several different interest rate models have been developed to price interest rate derivatives.
First, we denote the money market account by B(t), the market price of a default-free
discount bond by P(t, T ), the instantaneous interest rate (spot rate) by r(t) and the
instantaneous forward rate by f (t, T ), where
\[ r(t) = -\left.\frac{\partial}{\partial T}\ln P(t,T)\right|_{T=t}, \qquad f(t,T) = -\frac{\partial}{\partial T}\ln P(t,T), \qquad t \le T. \]
Moreover, we have the following relations between bonds and forward rates, for a stochastic and a deterministic spot rate respectively:
\[ P(t,T) = \exp\left(-\int_t^T f(t,s)\,ds\right), \qquad P(t,T) = \exp\left(-\int_t^T r(s)\,ds\right) = \frac{B(t)}{B(T)}. \]
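The bond-forward relation above can be checked numerically; here is a sketch using a simple trapezoidal quadrature (the function names are ours):

```python
import math

def bond_price(forward, t, T, steps=1000):
    """P(t,T) = exp(-integral_t^T f(t,s) ds), via the trapezoidal rule;
    `forward` is a callable s -> f(t,s)."""
    h = (T - t) / steps
    integral = 0.5 * h * (forward(t) + forward(T))
    integral += h * sum(forward(t + i * h) for i in range(1, steps))
    return math.exp(-integral)
```

For a flat forward curve f ≡ r, this reduces to exp(-r(T - t)), matching the deterministic spot-rate formula.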
Now, following Hull (2015) and Kijima (2013), we mention and categorize some
of the most noteworthy and frequently used interest rate models in the following
groups:
An Arbitrage-free Large Market Model for Forward Spread Curves 77
The models mentioned above have different characteristics and might be useful
for specific applications. For example, in the Vasicek model, the spot rate can become
negative with positive probability. The Black model is easy to work with for plain vanilla (relatively simple derivative) options, whereas HJM and LMM are more suitable for working with exotic (relatively complicated derivative) options. However, a problem
with equilibrium models is that they do not fit today’s term structure of interest
rates (i.e. the term structure of interest rates is an output), whereas no-arbitrage
models are designed to be consistent with today’s term structure of interest rates (the
term structure of interest rates is an input). The spot rate models in general have two important limitations. First, they usually involve only one factor or source of uncertainty. Second, they do not allow the volatility structure to be chosen freely. The HJM
and LMM, however, can be used to involve several factors and sources of uncertainty
(Hull 2015). The HJM and LMM models also allow us to specify more realistic
volatility structures to construct an interest rate model. That is, these models can be
used in the evaluation of two (or more) yield curves. These curves can, for example, be
LIBOR zero curves and overnight indexed swap (OIS) curves. For our purposes, we
will focus on the HJM framework, which we briefly describe in the following section.
In 1990 and 1992, David Heath, Bob Jarrow and Andy Morton (HJM) introduced
a new framework in interest rate models (Heath et al. 1990, 1992). The defined
framework describes no-arbitrage conditions which must be fulfilled by a yield curve
model for some ultimate maturity τ (usually 20 or 30 years hence). The HJM model
explains the dynamics of the forward rate curve { f (t, T, τ ), 0 ≤ t ≤ T ≤ τ }.
In the HJM framework, the evolution of the forward curve satisfies the following stochastic differential equation (SDE) (Glasserman 2004):
\[ df(t,T) = \mu(t,T)\,dt + \sigma(t,T)^{\top} dW(t), \tag{6.1} \]
where W is the standard d-dimensional Brownian motion and d represents the number
of factors (sources of uncertainty). μ (t, T ) is the drift structure and σ (t, T ) is the
volatility structure. Moreover, the drift and volatility structures are Rd -valued and can
be either stochastic, or can depend on the current and past level of the forward rate.
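As an illustration of these dynamics, the sketch below discretizes a one-factor version of the forward rate SDE with the classical HJM risk-neutral drift condition μ(t, T) = σ(t, T) ∫ₜᵀ σ(t, u) du (a standard result that is not derived in this chapter). The maturity grid, volatility function and parameters are illustrative assumptions, not part of the text:

```python
import math
import random

def hjm_euler(f0, sigma, maturities, dt, n_steps, seed=1):
    """Euler scheme for a one-factor HJM forward curve,
        df(t, T) = mu(t, T) dt + sigma(T) dW_t,
    with the classical risk-neutral drift condition
        mu(t, T) = sigma(T) * integral_t^T sigma(u) du.
    f0 holds the initial forward rates at the given maturities;
    sigma is a deterministic volatility as a function of maturity.
    """
    rng = random.Random(seed)
    f = list(f0)
    dT = maturities[1] - maturities[0]   # assume an equidistant maturity grid
    for step in range(n_steps):
        t = step * dt
        dW = rng.gauss(0.0, math.sqrt(dt))
        for j, T in enumerate(maturities):
            if T <= t:
                continue                 # this maturity has already passed
            # crude left-endpoint approximation of integral_t^T sigma(u) du
            integral = sum(sigma(u) * dT for u in maturities if t < u <= T)
            f[j] += sigma(T) * integral * dt + sigma(T) * dW
    return f
```

With zero volatility the drift vanishes and the curve is unchanged, which is a useful sanity check on the discretization.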
Before the financial crisis that started in 2007, LIBOR rates were commonly used as
risk-free rates, whereas in the post-crisis market, OIS rates are considered the new
proxies for risk-free rates. An OIS is a swap contract that exchanges cash
flows at a fixed rate (called the OIS rate) for the geometric average of the overnight
rates during the same period. For a fixed period (e.g. three months), the OIS rates are
generally lower than LIBOR rates, which yields the so-called LIBOR–OIS spread
(Hull 2015).
In order to price financial instruments under collateral and forward rate agreement
(FRA) rates, we follow the approach considered in Filipović and Trolle (2013) and
Cuchiero et al. (2016a). That is, OIS zero coupon bonds are considered to be the basic
traded instruments, and they play the role that default-free zero coupon bonds played
in the old setting. Thus, B(t) = exp(∫₀ᵗ r(s) ds) is now the OIS (risk-free) bank account, with r
representing the OIS short rate.
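For a given short-rate path, the OIS bank account B(t) = exp(∫₀ᵗ r(s) ds) can be evaluated numerically. A minimal sketch, assuming a deterministic (purely illustrative) short-rate function:

```python
import math

def ois_bank_account(r, t, n=1000):
    """B(t) = exp(integral_0^t r(s) ds), computed with the trapezoidal rule.

    r is the OIS short rate as a function of time (deterministic here,
    for illustration only); n is the number of integration panels.
    """
    h = t / n
    integral = 0.5 * h * (r(0.0) + r(t)) + h * sum(r(i * h) for i in range(1, n))
    return math.exp(integral)
```

For a constant rate r the result reduces to the familiar e^{rt}, and 1/B(t) is the corresponding OIS discount factor.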
Now, we will follow the HJM framework and modify equation [6.1] such that we
will have a set of initial forward spread curves as well as a set of forward spread
curves with different maturities. The set of forward spread curves will, of course,
have different sources of uncertainty. In other words, we construct an arbitrage-free
market model, where the forward spread curves for a given finite tenor structure
are described as a mild solution to a boundary value problem (BVP) for a system
of infinite-dimensional stochastic differential equations. The constructed financial
market is large: it contains infinitely many OIS zero coupon bonds and FRA contracts
with all possible maturities.
An Arbitrage-free Large Market Model for Forward Spread Curves 79
In summary, we pursue the following objectives in this
chapter. In section 6.2, we go through the definitions of small and large financial
markets and of no asymptotic free lunch conditions in a large market, present the
fundamental theorem of asset pricing (FTAP) for a large market, and construct a
unique risk-neutral probability measure (equivalent martingale measure).
Then, in section 6.3, we construct a system of stochastic partial differential equations
and discuss the conditions and assumptions which guarantee existence,
uniqueness and non-negativity of solutions to the obtained BVP. Finally, we close
this chapter with a conclusion and future works section.
Some of the important studies in the theory of market models with infinitely many
assets have been conducted by Björk et al. (1997), De Donno and Pratelli (2005), and
Ekeland and Taflin (2005). However, according to Taflin (2011), using such a theory
might lead to some difficulties: the construction of such a market does not
imply that the market is complete. Recall that a market without any arbitrage
opportunities is complete if and only if there exists a unique risk-neutral probability
measure (RNPM) (Kijima 2013).
5) C = (K₀ − L⁰_{≥0}) ∩ L^∞ denotes the convex cone of bounded claims at price 0, where:
– K₀ is the set of terminal values at t = 1 of admissible generalized portfolios;
– L⁰ is the set of all measurable functions;
– L⁰_{≥0} is the set of all non-negative measurable functions;
– L^∞ is the space of bounded functions;
– L^∞_{≥0} is the space of non-negative bounded functions.
Now, we follow the CKT approach. Émery (1979) defined the metric on S by

d(X, Y) = sup_{|K| ≤ 1} E[1 ∧ sup_{0 ≤ t ≤ 1} |(K · (X − Y))_t|], [6.2]

where the outer supremum is taken over the set of all predictable processes
K bounded by 1 in absolute value (not only over all simple predictable processes).
Mémin (1980) proved that taking the supremum over the set of simple predictable
strategies bounded by 1 in absolute value,

K(t) = Σ_{i=0}^{ℓ} k_i 1_{(τ_i, τ_{i+1}]}(t), k_i ∈ K,

where ℓ is a positive integer and k_i is an F_{τ_i}-measurable random variable, yields a
metric equivalent to the metric [6.2].
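The elementary stochastic integral of such a simple predictable strategy against a path reduces to a finite sum, which the following sketch computes. The path X and the times τ_i are illustrative assumptions, not taken from the text:

```python
def elementary_integral(k, tau, X, t):
    """(K . X)_t for the simple predictable strategy
        K(u) = sum_i k[i] * 1_{(tau[i], tau[i+1]]}(u),
    with X a path given as a function of time:
        (K . X)_t = sum_i k[i] * (X(min(tau[i+1], t)) - X(min(tau[i], t))).
    len(tau) must equal len(k) + 1.
    """
    total = 0.0
    for i in range(len(k)):
        a, b = min(tau[i], t), min(tau[i + 1], t)
        total += k[i] * (X(b) - X(a))
    return total
```

With k_i ≡ 1 over a partition of [0, 1] the integral telescopes to X(1) − X(0), the simplest consistency check.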
Now, for a (parameter space) I ⊆ [0, ∞) and each positive integer n, we define a
family A^n of subsets of I satisfying the following conditions:
i) each set in A^n contains exactly n elements;
ii) if A₁, A₂ ∈ ∪_{n=1}^∞ A^n, then A₁ ∪ A₂ ∈ ∪_{n=1}^∞ A^n.
Here, we can present and formulate the mathematical definitions of small and large
financial markets.
DEFINITION 6.1 (Small financial market).– A small financial market indexed by a set
A ∈ ∪_{n=1}^∞ A^n is a set X₁^A ⊂ S which satisfies the following conditions:
i) Bounded: each element of X₁^A is bounded from below by −1.
ii) Monotonicity: if A₁, A₂ ∈ ∪_{n=1}^∞ A^n with A₁ ⊂ A₂, then X₁^{A₁} ⊂ X₁^{A₂}.
iii) Concatenation property: if G and H are bounded predictable processes with
G ≥ 0, H ≥ 0, GH = 0, then for all X, Y ∈ X₁^A such that

Z = G · X + H · Y ≥ −1,

we have Z ∈ X₁^A.
The set X₁^A is called the set of one-admissible portfolio wealth processes in the
small financial market A. Moreover, the set of all one-admissible portfolio wealth
processes with respect to strategies that include at most n assets is given by

X₁^n = ∪_{A ∈ A^n} X₁^A,

and the set of all one-admissible portfolio wealth processes is

X₁ = ∪_{n=1}^∞ X₁^n.
DEFINITION 6.3 (NAFLVR (Cuchiero et al. 2016b)).– Let C be the set of all bounded
random variables in the convex cone C₀ defined by

C₀ := K₀ − L⁰_{≥0}, C := C₀ ∩ L^∞,

where the minus operation here means C₀ = {Y : Y ≤ X for some X ∈ K₀}. Also, let C̄
be the closure of C in L^∞. Then, by definition, the set X satisfies no asymptotic free
lunch with vanishing risk (NAFLVR) if

C̄ ∩ L^∞_{≥0} = {0}.
E_Q[X(1)] ≤ 0, for all X ∈ X.
THEOREM 6.1 (FTAP for a large market (Cuchiero et al. 2016b)).– The set X satisfies
no asymptotic free lunch with vanishing risk if and only if it satisfies the equivalent
separating measure property.
An obstacle for us is that in CKT's result, I ⊆ [0, ∞). In our case, we have m
tenors, so our desired parameter space has the following form:

I = [0, ∞)^m. [6.3]

Therefore, we would like to prove that Theorem 6.1 remains true when the set I
(parameter space) has the form [6.3].
THEOREM 6.2.– For the m-dimensional parameter space, i.e. I = [0, ∞)^m, Theorem 6.1
(CKT's FTAP for a large market) remains true.
Careful analysis of Theorem 3.1 and its proof in Cuchiero et al. (2016b, Section 7)
shows that no special properties of the one-dimensional parameter space [0, ∞) have
been used. This completes the proof.
{η_t(T) : 0 ≤ t ≤ T < ∞}.
10) κ^i : H^λ_{m+1} → H^λ_1, and κ^i(θ_t) are the drift coefficients/functions.
11) ζ^i : H^λ_{m+1} → L(R^d, H^λ_1), where L(·, ·) is the space of linear operators, and
ζ^i(θ_t) are the diffusion coefficients.
12) (S_s)_{s≥0} is the shift semi-group on H^λ_{m+1}, i.e. S_s h = h(s + ·).
13) d/ds is the infinitesimal operator (generator) of the strongly continuous
semi-group of shifts (S_s)_{s≥0}.
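Items 12) and 13) can be illustrated on real-valued curves: the shift semi-group moves the curve to the left, and its generator is the derivative d/ds, here approximated by a finite difference. A toy sketch (the curve h and the step size are illustrative):

```python
def shift(s, h):
    """Shift semi-group (S_s h)(x) = h(s + x), acting on curves h: [0, inf) -> R."""
    return lambda x: h(s + x)

def generator(h, x, eps=1e-6):
    """Central finite-difference approximation of the generator of (S_s)_{s>=0},
    i.e. (d/ds)(S_s h)(x) at s = 0, which equals h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2.0 * eps)
```

For h(x) = x², the shift gives (S_1 h)(2) = h(3) = 9 and the generator returns h'(x) = 2x, matching the semi-group identities in items 12) and 13).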
REMARK 6.3.– Equations [6.4] and [6.5] in Cuchiero et al. (2016a) and Filipović
et al. (2010) have one additional term accounting for dramatic changes (jumps) in
the dynamics of θ_t^i, which we omit.
Following Cuchiero et al. (2016a) (see also Filipović et al. (2010, Assumption
3.1)), we introduce the following assumptions to establish the existence and
uniqueness of a mild solution to equation [6.5].
ASSUMPTION 6.1 (Existence and uniqueness (Cuchiero et al. 2016a)).– The growth
and Lipschitz continuity conditions on the volatility functions ζ^i, for all i = 1, . . . , m,
can be formulated by the following axioms:
(E1) For all i = 1, . . . , m, we have ζ^i : H^λ_{m+1} → L(R^d, H^{λ,0}_1), where
H^{λ,0}_k := {h ∈ H^λ_k : lim_{s→∞} ||h(s)||_k = 0}, for k ∈ {1, d}.
(E2) For all i = 1, . . . , m, there exist positive constants K_i, L_i, M_i such that:
(E2.1) ||∫₀^s ζ^i(h)(u) du||_d ≤ K_i, for all h ∈ H^λ_{m+1}, s ∈ [0, ∞);
Now, following Cuchiero et al. (2016a, Proposition 4.4), we present the following
proposition (see also Filipović et al. (2010, Assumption 4.11)).
PROPOSITION 6.1.– If the axioms in Assumption 6.1 are met, then for every i ∈
{0, . . . , m}, κ^i(H^λ_{m+1}) ⊆ H^{λ,0}_1 holds. Furthermore, for all h₁, h₂ ∈ H^λ_{m+1}, there exist
constants Q_i > 0 such that

||κ^i(h₁) − κ^i(h₂)||_{λ,1} ≤ Q_i ||h₁ − h₂||_{λ,m+1}.
Finally, the existence and uniqueness of the solution to the system of SPDEs [6.5] is
established by the following theorem.
THEOREM 6.3 (Existence and uniqueness (Cuchiero et al. 2016a)).– If the axioms in
Assumption 6.1 are met, then for all T ∈ R₊ and each initial curve θ₀^i ∈ H^λ_{m+1}, there
exists a unique adapted, càdlàg, mean-square continuous mild H^λ_{m+1}-valued solution
(θ_t^i)_{t≥0} such that

E[sup_{t∈[0,T]} ||θ_t^i||²_{λ,m+1}] < ∞.
In this section, we investigate the necessary and sufficient conditions that give
non-negative forward spread curves, i.e. θ_t^i ≥ 0 for all t ∈ [0, T], for given non-negative
initial curves, i.e. θ₀^i ≥ 0.
Step 2. Put A = d/ds and rewrite equation [6.5] in vector form. That is,

dθ_t = (Aθ_t + κ(θ_t)) dt + ζ(θ_t) dW_t, θ₀ = η₀, [6.6]

where θ_t ∈ H^λ_{m+1}, κ : H^λ_{m+1} → H^λ_{m+1} and ζ : H^λ_{m+1} → L(R^d, H^λ_{m+1}).
In this notation, Nakayama (2004) requires ζ to map H^λ_{m+1} to the linear space of
Hilbert–Schmidt operators from U₀ to H^λ_{m+1}. In our case, U = R^d and Q is the identity
operator on U; then U₀ = U. For any linear operator from the finite-dimensional space
U₀ to H^λ_{m+1}, the sum of squares of its singular values is obviously finite, and this
operator is Hilbert–Schmidt.
Now, we can claim that the conditions for a Lipschitz-continuous bounded mapping
in Nakayama (2004) are equivalent to the conditions in Assumption 6.1 and
Proposition 6.1.
(N2) Let n ∈ N₊, ρ_n : H^λ_{m+1} → H^λ_{m+1} and ρ_n(θ) = (1/2) Σ_{j=1}^n Dζ^j(θ) ζ^j(θ) for
θ ∈ H^λ_{m+1}. We assume that there exists a map ρ : H^λ_{m+1} → H^λ_{m+1} such that, for all
θ ∈ H^λ_{m+1}, the following holds:

lim_{n→∞} ||ρ_n(θ) − ρ(θ)|| = 0.

(N3) Assume that for any h₁, h₂ ∈ H^λ_{m+1} and all n ∈ N₊, there exists a constant M
such that

||ρ_n(h₁) − ρ_n(h₂)|| ≤ M ||h₁ − h₂||.
Equivalently, denoting ξ^i(t; ·) = ξ^i(t; η₀^i, g), for t ∈ [0, T] we have

ξ^i(t; η₀^i, g) = S_t η₀^i + ∫₀ᵗ S_{t−s} (κ^i − ρ)(ξ^i(s; ·)) ds + ∫₀ᵗ S_{t−s} ζ^i(ξ^i(s; ·)) dg(s).
Step 5. Let us rewrite V = {Σ_{j=1}^∞ b_j β_j : Σ_{j=1}^∞ b_j² < ∞}. Furthermore, let us define
Z = {θ_t ∈ H^λ_{m+1} : θ_t^i ≥ 0, t ≥ 0, 0 ≤ i ≤ m}. Now, we are able to see the
non-negativity of solutions to our SPDEs in the following proposition (see
Proposition 1.1 in Nakayama (2004) for details and proof, as well as Proposition 4.19
in Filipović et al. (2010)).
operator and the solution to the SPDE [6.5] exists and is unique and non-negative.
6.5. References
Bayer, C. and Teichmann, J. (2008). Cubature on Wiener space in infinite dimension. Proc. R.
Soc. Lond. Ser. A Math. Phys. Eng. Sci., 464(2097), 2493–2516.
Björk, T., Di Masi, G., Kabanov, Y., Runggaldier, W. (1997). Towards a general theory of bond
markets. Finance Stoch., 1(1), 141–174.
Black, F. (1976). The pricing of commodity contracts. J. Financ. Econom., 3(1), 167–179.
Black, F. and Karasinski, P. (1991). Bond and option pricing when short rates are lognormal.
Financial Analysts J., 47(4), 52–59.
Black, F., Derman, E., Toy, W. (1990). A one-factor model of interest rates and its application
to treasury bond options. Financial Analysts J., 46(1), 33–39.
Brace, A., Gatarek, D., Musiela, M. (1997). The market model of interest rate dynamics. Math.
Finance, 7(2), 127–155.
Cox, J.C., Ingersoll, J.E., Ross, S.A. (1985). A theory of the term structure of interest rates.
Econometrica, 53(2), 385–407.
Cuchiero, C., Fontana, C., Gnoatto, A. (2016a). A general HJM framework for multiple yield
curve modeling. Finance Stoch., 20(2), 267–320.
Cuchiero, C., Klein, I., Teichmann, J. (2016b). A new perspective on the fundamental theorem
of asset pricing for large financial markets. Theory Probab. Appl., 60(4), 561–579.
Da Prato, G. and Zabczyk, J. (2014). Stochastic Equations in Infinite Dimensions. Cambridge
University Press, Cambridge.
De Donno, M. and Pratelli, M. (2005). A theory of stochastic integration for bond markets. Ann.
Appl. Probab., 15(4), 773–791.
Ekeland, I. and Taflin, E. (2005). A theory of bond portfolios. Ann. Appl. Probab. 15(2),
1260–1305.
Émery, M. (1979). Une topologie sur l’espace des semimartingales. In Séminaire de
Probabilités, XIII (Univ. Strasbourg, Strasbourg, 1977/78), vol. 721 of Lect. Notes Math.,
260–280. Springer, Berlin.
Filipović, D. (2001). Consistency Problems for Heath–Jarrow–Morton Interest Rate Models,
vol. 1760 of Lect. Notes Math., Springer-Verlag, Berlin.
Filipović, D. and Trolle, A.B. (2013). The term structure of interbank risk. J. Financ. Econ.,
109(3), 707–773.
Filipović, D., Tappe, S., Teichmann, J. (2010). Term structure models driven by Wiener
processes and Poisson measures: Existence and positivity. SIAM J. Financ. Math., 1(1),
523–554.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. Springer, New York.
Heath, D., Jarrow, R., Morton, A. (1990). Bond pricing and the term structure of interest rates:
A discrete time approximation. J. Financ. Quantitative Analysis, 25(4), 419–440.
Heath, D., Jarrow, R., Morton, A. (1992). Bond pricing and the term structure of interest rates:
A new methodology for contingent claims valuation. Econometrica, 60(1), 77–105.
Ho, T.S. and Lee, S.B. (1986). Term structure movements and pricing interest rate contingent
claims. J. Finance, 41(5), 1011–1029.
Hull, J.C. (2015). Options, Futures, and Other Derivatives. Pearson Prentice Hall, New Jersey.
Hull, J.C. and White, A. (1990). Pricing interest-rate-derivative securities. Rev. Financ. Studies,
3(4), 573–592.
Hull, J.C. and White, A. (1994). Numerical procedures for implementing term structure models
II: Two-factor models. J. Derivatives, 2(2), 37–48.
Kijima, M. (2013). Stochastic Processes with Application to Finance. Chapman & Hall/CRC,
Florida.
Klein, I., Schmidt, T., Teichmann, J. (2016). No arbitrage theory for bond markets. In Advanced
Modelling in Mathematical Finance, Kallsen, J. and Papapantoleon, A. (eds), vol. 189 of
Springer Proc. Math. & Statist., pp. 381–421, Springer, Berlin.
Kotelenez, P. (1992). Comparison methods for a class of function valued stochastic partial
differential equations. Probab. Theory Related Fields, 93(1), 1–19.
Longstaff, F.A. and Schwartz, E.S. (1992). Interest rate volatility and the term structure: A
two-factor general equilibrium model. J. Finance, 47(4), 1259–1282.
Malyarenko, A., Nohrouzian, H., Silvestrov, S. (2020). An algebraic method for pricing
financial instruments on post-crisis market. In Algebraic Structures and Applications,
Silvestrov, S., Malyarenko, A., Rančić, M. (eds). Springer Nature, Berlin.
Mémin, J. (1980). Espaces de semi martingales et changement de probabilité. Z. Wahrsch. Verw.
Gebiete, 52(1), 9–39.
Milian, A. (2002). Comparison theorems for stochastic evolution equations. Stochastics Stoch.
Reports, 72(1–2), 79–108.
Musiela, M. (1993). Stochastic PDEs and term structure models. Journées internationales de
finance, IGR–AFFI, La Baule, France.
Nakayama, T. (2004). Viability theorem for SPDE’s including HJM framework. J. Math. Sci.
Univ. Tokyo, 11(1), 313–324.
Oertel, F. and Owen, M. (2007). On utility-based super-replication prices of contingent claims
with unbounded payoffs. J. Appl. Probab., 44(4), 880–888.
Rendleman, R.J. and Bartter, B. (1980). The pricing of options on debt securities.
J. Financ. Quantitative Analysis, 15(1), 11–24.
Taflin, E. (2011). Generalized integrands and bond portfolios: Pitfalls and counter examples.
Ann. Appl. Probab., 21(1), 266–282.
Vasicek, O. (1977). An equilibrium characterization of the term structure. J. Financ. Econ., 5(2),
177–188.
7
Healthy life expectancy (HLE) estimates are the outcome of systematic work
done by a large group of researchers all over the world over the last few decades.
The most successful estimate, termed HALE, is provided by the World Health
Organization (WHO) on its related website. Having established a methodology of
data collection and handling, the HLE can be estimated and provided to researchers
and policy makers.

However, there remains an unexplored period over the last few centuries where
life expectancy (LE) data exists along with the appropriate life tables, but not enough
information for HLE estimates was collected and stored. This problem has now been
solved by a methodology that estimates the HLE from the life tables after first
estimating the healthy life years lost (HLYL).

Our methodology for direct HLYL estimation from life tables is tested and
verified via a series of additional methods, including a Weibull parameter test, a
Gompertz parameter alternative and, of course, a comparison with the HALE
estimates from the WHO.
Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
The full life tables are used to estimate not only the LE but also the HLE, based
on an existing methodology (Skiadas and Skiadas 2020a).
Based on the data series from 1900 to 2016 for males and females in Sweden,
estimates until 2016 and forecasts to 2060 are produced. The logistic model, selected
for its suitability for fitting and long-range forecasting, is fitted to the data series to
calculate the three parameters of the model, and forecasts to 2060 are then made.
According to Sundin and Willner (2007), in the years just before 1900, several
important milestones were reached in Sweden, including:
– 1862: local government reform and establishment of the Landsting (county
councils, which take over the responsibility for hospitals);
– 1878: the National Medical Board (Medicinalstyrelsen) is founded;
– 1890: a chief provincial doctor is appointed in every county;
– 1891: the first sanatorium for lung disease sufferers in Sweden is opened.
The development of the first healthcare systems of modern history started with
the policies introduced by Otto von Bismarck's social legislation (1883–1911). The
introduction of such systems in many countries came after important discoveries by
scientists such as Pasteur, Chamberland and Descomby in France, von Behring
in Germany, Kitasato in Japan and many others. The 1901 Nobel Prize in
Physiology or Medicine, the first in that field, was awarded to von Behring for
his discovery of a diphtheria antitoxin.
It appears that the healthcare systems and methodologies already set in place by
1900 follow a rather systematic trend until today. See Figure 7.1, where the LE data
series is provided by the Human Mortality Database (HMD), and the HLE is estimated
with our direct methodology (Skiadas and Skiadas 2018a, 2018b, 2020a, 2020b,
2020c). The LE series from 1751 to 1875 fluctuates strongly, mainly due to
health causes. The fluctuations become smaller after this period, with a clear
stabilization from 1900 until now, except for the strong decline during the 1918
influenza pandemic, which was followed by a fast recovery. The period starting from
1950 shows a rather smooth trend as a result of the improvement of the
health system structure, financing, technology and pharmaceutical discoveries and
production.
Figure 7.1. Life expectancy (LE) and healthy life expectancy (HLE) in Sweden,
females (1751–2016). For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
Figure 7.2. Healthy life years lost (HLYL) in Sweden, females (1751–2016)
94 Applied Modeling Techniques and Data Analysis 2
The calculated healthy life years lost (HLYL) data series is illustrated in
Figure 7.2. The HLYL trend grows slowly until 1850, followed by faster growth,
from 5.69 years in 1751 to 11.35 years in 2016.
g(T) = F / (1 + (F/g(0) − 1) exp(−b(T − T(0)))),

where b is the trend or diffusion parameter, F is the upper level of the sigmoid
logistic process and g(0) is the value at time T(0) = 1900.
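The logistic relation above can be sketched numerically. The sketch below anchors the curve at two HLYL values quoted in this chapter (7.40 years in 1900 and 10.19 in 2016, with upper level F = 13.74) and solves for b from those two points alone; the chapter's actual fit uses the full 1900–2016 series, so the b obtained here is only indicative:

```python
import math

def logistic(T, F, g0, b, T0=1900):
    """Sigmoid logistic curve g(T) = F / (1 + (F/g0 - 1) * exp(-b * (T - T0)))."""
    return F / (1.0 + (F / g0 - 1.0) * math.exp(-b * (T - T0)))

# Anchor at g(1900) = 7.40 and g(2016) = 10.19 with upper level F = 13.74
# (values from this chapter), and solve for the trend parameter b.
F, g0 = 13.74, 7.40
b = -math.log((F / 10.19 - 1.0) / (F / g0 - 1.0)) / (2016 - 1900)
```

By construction, the curve passes through both anchor points and approaches F = 13.74 asymptotically.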
Figure 7.3. Logistic model fit and forecasts to 2060 for females in Sweden.
For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
(Series shown: LE; HLE, our estimates; LE fit and forecasts to 2060; HLE fit and
forecasts to 2060; HALE, WHO estimates.)
Estimating the Healthy Life Expectancy (HLE) in the Far Past 95
The latest WHO estimates for healthy life expectancy, called HALE, are provided
for the years 2000, 2005, 2010, 2015 and 2016. These estimates fit very well with
our HLE calculations and with the fit obtained using the logistic model.
Our HLE calculations are based on the direct estimates of the HLYL from the life
tables, using a formula provided in recent publications (Skiadas and Skiadas 2020a,
2020b, 2020c).
The logistic model is applied to the LE data and to our HLE estimates from 1900
to 2016. The selected parameters appear in Table 7.1.
The HLYL is 7.40 years in 1900, 10.19 in 2016 and 12.15 in 2060, with a
maximum difference of F = 13.74 years.
Table 7.2 summarizes the three HLE estimates: HALE from the WHO, our
direct estimates and the logistic fit. All three methodologies provide close results.
Table 7.2. HALE and healthy life expectancy direct estimates and logistic fit
7.4. Conclusion
We have solved the problem of finding the HLE in the far past. The case of
Sweden (females, 1751–2016), with forecasts to 2060 and comparisons with HALE,
has been explored. The selected logistic model provides a good fit, while the HALE
estimates from the WHO compare very well with our estimates, both from the direct
method and from the logistic fit.
7.5. References
Skiadas, C.H. and Skiadas, C. (2018a). Exploring the Health State of a Population by
Dynamic Modeling Methods. Springer, Cham, Switzerland.
Skiadas, C.H. and Skiadas, C. (2018b). Demography and Health Issues: Population Aging,
Mortality and Data Analysis. Springer, Cham, Switzerland.
Skiadas, C.H. and Skiadas, C. (2020a). Demography of Population Health, Aging and Health
Expenditures. Springer, Cham, Switzerland [Online]. Available at: https://www.springer.com/
gp/book/9783030446949.
Skiadas, C.H. and Skiadas, C. (2020b). Relation of the Weibull Shape Parameter with the
Healthy Life Years Lost Estimates: Analytical Derivation and Estimation from an
Extended Life Table. Springer, Cham, Switzerland.
Skiadas, C.H. and Skiadas, C. (2020c). Direct Healthy Life Expectancy Estimates from Life
Tables with a Sullivan Extension. Bridging the Gap Between HALE and Eurostat
Estimates. Springer, Cham, Switzerland.
Sundin, J. and Willner, S. (2007). Social Change and Health in Sweden: 250 Years of Politics
and Practice. Swedish National Institute of Public Health, Solna, Sweden.
8
8.1. Introduction
The rationale for influenza vaccination is based on the need to protect the
health of workers and vulnerable patients, as well as to ensure the proper functioning
of health services during influenza seasons. Acceptance by health care staff helps
build confidence in vaccination and better prepares the health system for the next
influenza pandemic (Ministry of Health 2019a, 2019b, 2019c).
The sample of respondents comes from medical and other staff in Health Centers
(HC) and Local Health Units (LHU) in the prefecture of Chania, surveyed from
February to March 2020; 80% of the questionnaires were answered. Of the 156
respondents, 63 (40.4%) work in the first and second Chania LHU and the first and
second Chania HC (structures within the city of Chania), while the remaining 93
(59.6%) work in the Kissamos HC, the Kandanos HC and the Vamos HC (structures
outside the city of Chania) (Figure 8.1).
Vaccination Coverage Against Seasonal Influenza of Workers 99
Among all participants, women made up 69.2% of the sample, with the largest age
group being 45–54 years (52, 33.3%) and the smallest being 55+ (27, 17.3%).
Regarding the distribution, 59 women constituted 63.4% of the sample in the out-of-town
HC/LHU, while 49 women constituted 77.8% in the inner city, with the gender
distribution not showing a statistically significant difference (p = 0.057) (Table 8.1).
                          HC/LHU location
                Outside city (93)   Inside city (63)   Total          p
Gender
  Male          34 (36.6%)          14 (22.2%)         48 (30.8%)     0.057
  Female        59 (63.4%)          49 (77.8%)         108 (69.2%)
Age group
  <=34          24 (25.8%)          14 (22.2%)         38 (24.4%)     0.351
  35–44         20 (21.5%)          19 (30.2%)         39 (25.0%)
  45–54         35 (37.6%)          17 (27.0%)         52 (33.3%)
  55+           14 (15.1%)          13 (20.6%)         27 (17.3%)

Table 8.1. Age and gender distribution in terms of HC/LHU inside/outside the city
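The p-values in Table 8.1 can be reproduced. The chapter does not state which test was used, but a plain Pearson chi-square without continuity correction on the gender-by-location counts gives p ≈ 0.057, matching the reported value. A sketch for the 2×2 case:

```python
import math

def chi2_pvalue_2x2(table):
    """Pearson chi-square test (df = 1, no continuity correction) for a
    2x2 contingency table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            exp = rows[i] * cols[j] / n          # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))
    return math.erfc(math.sqrt(chi2 / 2.0))

# Gender (male/female) by the two HC/LHU location groups (n = 93 and n = 63)
p = chi2_pvalue_2x2([[34, 14], [59, 49]])
```

A continuity-corrected (Yates) test would give a slightly larger p-value, so the match with 0.057 suggests the uncorrected statistic was used.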
In terms of education, the largest percentage had a university degree (84, 53.8%),
both in total and at the HC/LHU outside (51, 54.8%) and inside the city (33, 52.4%),
while of the 25 people who had a postgraduate degree, 20 (80.0%) held a master's
degree (Table 8.2).
                                        HC/LHU location
                               Outside city (93)   Inside city (63)   Total         p
Staff
  Medical                      49 (52.7%)          18 (29.0%)         67 (43.2%)    0.011
  Nurses                       18 (19.4%)          15 (24.2%)         33 (21.3%)
  Others                       26 (28.0%)          29 (46.8%)         55 (35.5%)
Work experience (years)
  <1                           11 (11.8%)          5 (7.9%)           16 (10.3%)    0.08
  1–5                          15 (16.1%)          14 (22.2%)         29 (18.6%)
  6–10                         16 (17.2%)          5 (7.9%)           21 (13.5%)
  11–15                        12 (12.9%)          7 (11.1%)          19 (12.2%)
  16–20                        11 (11.8%)          6 (9.5%)           17 (10.9%)
  21–25                        11 (11.8%)          3 (4.8%)           14 (9.0%)
  26–30                        4 (4.3%)            10 (15.9%)         14 (9.0%)
  >30                          13 (14.0%)          13 (20.6%)         26 (16.7%)
Education
  High school                  14 (15.1%)          4 (6.3%)           18 (11.5%)    0.225
  Occupational training
  institute                    10 (10.8%)          7 (11.1%)          17 (10.9%)
  Technological educational
  institute                    18 (19.4%)          19 (30.2%)         37 (23.7%)
  University                   51 (54.8%)          33 (52.4%)         84 (53.8%)
8.3. Results
The vaccination coverage in total for the periods 2017–2018, 2018–2019 and
2019–2020 is presented in Figure 8.2. There is an increasing trend in the vaccination
of employees, with 65 respondents (41.7%) for 2017–2018, 74 respondents (48.4%)
for 2018–2019 and 89 respondents (57.1%) for 2019–2020. Also, three respondents
(1.9%) chose "I do not know/I do not want to answer" for the 2017–2018 vaccination,
four (2.6%) for 2018–2019 and none for 2019–2020.
Figure 8.2. Vaccination frequency (%) of employees by period
(2017–2018: 41.7%; 2018–2019: 48.4%; 2019–2020: 57.1%)
In all three study periods, the highest vaccination rates were observed in the age
group <=34 years (2017–2018: 50.0%; 2018–2019: 63.9%; 2019–2020: 76.3%);
however, the difference was statistically significant only in the period 2019–2020
(p = 0.012), not in 2017–2018 (p = 0.399) or 2018–2019 (p = 0.132).
Figure 8.3 shows the vaccination frequency (%) for the period 2017–2020 in all
three occupational categories of participants. For the period 2017–2018, doctors show
a frequency of 52.2% (40.4%–63.9%), similar to nurses at 51.5% (34.9%–67.8%),
while the vaccination frequency for the rest of the staff was low, at 23.6%
(13.9%–36.0%). There was a statistically significant difference between these
percentages (p = 0.003).
For the period 2018–2019, although the vaccination frequency of nurses increased
to 60.6% (43.6%–75.8%) and that of the rest of the staff to 35.8% (24.0%–49.2%),
the frequency for doctors remained close to that of the previous period, at 53.0%
(41.4%–64.7%). No statistically significant difference was observed (p = 0.053).
Table 8.5 presents, by type of staff and in total, the distribution of the
"impulses" (motivations, measures, views) for vaccination, as well as the reasons for
avoiding it. Family protection (96, 65.3%) and the need for self-protection (93,
63.7%) are the two most common "impulses" recorded for vaccination, with no
significant variations between the types of staff. A statistically significant difference
(p = 0.037) occurred in the "need to protect patients", to which the rest of the staff
responded positively at a rate of 30.0% (n = 15).
Finally, in terms of reasons for avoiding the vaccine, the most common are fears
about the safety of the vaccine and its side effects (53, 38.7%), inertia about
vaccination (35, 25.5%) and the belief that the respondent will not become ill
(29, 21.2%). Three of the remaining staff (6.5%) were "anti-vaccinators", which
differentiated the results (p = 0.048). Another statistical differentiation, with
p = 0.005, was observed regarding the availability of the vaccine in the workplace
in the HC/LHU (eight doctors, 13.1%).
Belief that I will not get sick   13 (21.3%)   8 (26.7%)   8 (17.4%)   29 (21.2%)   p = 0.626
8.4. Discussion
Similar surveys have been conducted in primary and secondary health facilities
in Greece and abroad.
In a study by Maltezou et al. (2007) for the period 2005–2006, conducted in 132
hospitals in Greece, the influenza vaccination rate of health workers was 16.36%,
compared with only 1.72% the previous year. In 2006–2007 (Maltezou et al.
2008), the average influenza vaccination rate was 5.8%; of those vaccinated, 89.1%
did so in order to protect themselves, 59.1% to protect their family and 55.2% their
patients. The main reasons for refusing vaccination were the perception of not being
at risk of disease (43.2%), the perceived ineffectiveness of the vaccine (19.2%) and
the fear of its side effects (33.4%). In 2009, researchers (Maltezou et al. 2010)
recorded the intention of health professionals in 92 hospitals and 60 Health Centers
in Greece to receive influenza vaccination: 21.8% stated that they intended to be
vaccinated, while the main reasons for refusing vaccination were fear about the
safety of the vaccine (43.1%), insufficient information (27.8%) and the perception
that there is no risk of influenza (10.7%).
In the study conducted by Durando et al. (2016) in Italy during the period
2013–2014, almost half of the study population had never been vaccinated against
influenza between 2008 and 2014. In the study conducted by Petek and Kamnik-Jug
(2018) in PHC units in Slovenia, only 12% of health professionals were vaccinated
during the period 2014–2015. Motivations for vaccination were the fear of the risk of
infection in the workplace, self-protection and the protection of family and
colleagues. The main obstacles were doubt about the effectiveness of the vaccine,
fear of side effects and the belief of not being at high risk of infection from
influenza.
The research showed that the vaccination coverage against seasonal influenza of
PHC employees in the prefecture of Chania shows an increasing trend over the last
three years. However, there is room for improvement, so health structures should
implement policies to encourage and promote influenza vaccination. The role of the
administration and the scientific community is important in order, through proper
guidance and education, to change the mentality in favor of prevention and to
overturn the dangerous anti-vaccination culture based on the phobia of possible side
effects of the vaccine.
8.5. References
Dominguez, A., Godoy, P., Castilla, G. (2013). Knowledge of and attitudes to influenza
vaccination in healthy primary healthcare workers in Spain, 2011–2012. PLoS ONE,
8(11), e81200.
Durando, P., Alicino, C., Dini, G. (2016). Determinants of adherence to seasonal influenza
vaccination among healthcare workers from an Italian region: Results from a
cross-sectional study. BMJ Open, 6(5), e010779.
Jianxing, Y., Xiang, R., Chuchu, Y. (2019). Influenza vaccination coverage among registered
nurses in China during 2017–2018. An Internet Panel Survey. Vaccines, 7(4), 134.
Maltezou, H.C., Maragos, A., Halharapi, T. (2007). Factors influencing influenza vaccination
rates among healthcare workers in Greek hospitals. Journal of Hospital Infection, 66(2),
156–159.
Maltezou, H.C., Maragos, A., Katerelos, P. (2008). Influenza vaccination acceptance among
health-care workers: A nationwide survey. Vaccine, 26(11), 1408–1410.
Maltezou, H.C., Dedoukou, X., Patrinos, S. (2010). Determinants of intention to get
vaccinated against novel (pandemic) influenza A H1N1 among health-care workers in a
nationwide survey. Journal of Infection, 61(3), 252–258.
Ministry of Health (2019a). Influenza vaccination of health care personnel [Online]. Available at:
https://eody.gov.gr/wp-content/uploads/2019/01/antigripikos-emvoliasmos-prosopikou-yy.pdf
[Accessed April 2020].
Ministry of Health (2019b). Instructions for seasonal influenza 2019–2020. Influenza
vaccination [Online]. Available at: https://www.moh.gov.gr/articles/health/
dieythynsh-dhmosias-ygieinhs/metadotika-kai-mh-metadotika-noshmata/c388-egkyklioi/
6474-odhgies-gia-thn-epoxikh-griph-2019-2020-ndash-antigripikos-emboliasmos?fdl=15434
[Accessed April 2020].
108 Applied Modeling Techniques and Data Analysis 2
Ministry of Health (2019c). Influenza vaccination action plan [Online]. Available at:
https://eody.gov.gr/wp-content/uploads/2019/01/antigripikos-emvoliasmos-prosopikou-yy.pdf
[Accessed April 2020].
National Public Health Organization (2020). Influenza vaccination of health services
personnel influenza season 2019–2020 [Online]. Available at: https://eody.gov.gr/
wp-content/uploads/2020/05/ekthesi_emvoliasmos_ergazomenon_gripi_2019-2020.pdf
[Accessed June 2020].
Petek, D. and Kamnik-Jug, K. (2018). Motivators and barriers to vaccination of health
professionals against seasonal influenza in primary healthcare. BMC Health Services
Research, 18(1), 853.
9
Some Remarks on the Coronavirus Pandemic in Europe
9.1. Introduction
Applied Modeling Techniques and Data Analysis 2: Financial, Demographic, Stochastic and Statistical Models and Methods,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H. Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
been characterized by the emergence of fatal viruses that led to both epidemics and
pandemics such as MERS-CoV (Middle East respiratory syndrome coronavirus) and
SARS-CoV (severe acute respiratory syndrome coronavirus) and the Ebola virus.
Their most prominent common characteristics were the large-scale outbreaks that
followed their initial emergence, as well as both the high fatality and basic
reproduction rates (R0) (Callaway et al. 2020; Kaswa and Govender 2020; Petersen
et al. 2020).
In the early months of 2020, humanity was rocked by the emergence of the novel
severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that initially induced
various types of pneumonia but that ultimately led to a severe public health crisis
worldwide. This is despite the fact that, according to the epidemiological data
gathered thus far, the latter virus, responsible for COVID-19, presents lower fatality
and R0 rates (i.e. the estimated number of new infections from a single case)
compared to the former pathogens. Its severity therefore lies in the fact that it seems
to be able to spread very easily (Abebe et al. 2020; Callaway et al. 2020; Khalili et al.
2020). Soon after its detection, COVID-19 was characterized as a highly contagious
disease, leading to a global effort to investigate it in order to inhibit its spread.
9.2. Background
Lungs contain a large number of the ACE-2 enzyme and, therefore, can be very
vulnerable to a SARS-CoV-2 infection (Prajapati et al. 2020). This might be the
reason why manifestations in the respiratory system are the most common. For
example, according to worldwide studies, the largest percentage of patients present
the following: dry cough, shortness of breath, intense dyspnea and tachypnea,
sputum, as well as fever (Guan et al. 2020; Huang et al. 2020; Tabata et al. 2020;
Wang et al. 2020; Zhang et al. 2020). In addition, headache, loss of smell, nasal
obstruction, rhinorrhea and sore throat have also been reported (European Center for
Disease Prevention and Control 2020).
Neurological problems have also been reported in various cases. Issues related to
the sensory system, such as anosmia, hypogeusia and dysgeusia are among the most
common (Aghagoli et al. 2020; Fiani et al. 2020). Fewer, but rather more severe, are
the cases that involve paresthesia, altered mental status and encephalopathy which
so far appear to be associated with previous health problems (Demirci Otluoglu et
al. 2020; Poyiadji et al. 2020).
9.2.3. Diagnosis
COVID-19 has spread throughout the world, yet a number of countries, such as North Korea, Turkmenistan and the Solomon Islands, have reported no cases. Since the public release of the COVID-19 genetic sequence on 12 January
2020, studies are focused on an understanding of its genome sequence, its ability to
replicate and its probable vulnerabilities. Mutation is one of the most prominent characteristics by which a virus secures its propagation. Even though the mutational extent of SARS-CoV-2 was not well established at the time of writing, substantial mutation seems probable, given that it is the basis of the virus's survival mechanism (Chan et al. 2020;
Pachetti et al. 2020). Recent genome-wide studies show frequent mutations in
residues of protein structures, suggesting a probable correlation to the virus’s
adaptability and transmission (Kaushal et al. 2020; Laha et al. 2020).
On 14 January 2020, the World Health Organization (WHO) confirmed human-
to-human transmission of COVID-19 (Figure 9.1). It rapidly became clear that the
significant characteristic of this coronavirus is its strong ability to transmit, mainly
through respiratory droplets produced by the coughing or sneezing of an infected
individual. These droplets can easily find their way into the lungs of the new host
via inhalation. Contact transmission via a contagious object has also been reported
as a potential route. Finally, there is aerosol transmission, the process by which
respiratory droplets mix into the air, become aerosols and lead to an infection when
inhaled in large quantities (Adhikari et al. 2020; Xu et al. 2020).
The basic reproduction number (R0), in other words the average number of
infections generated by one individual, plays a substantial role in quantifying
transmission of the virus. According to research works, during the outbreak, China
presented an average R0 value of 3.28, ranging between 2.0 and 6.49 depending on the analyses and sample sizes (Chen and Wang 2020; Liu et al. 2020; Zhao et al. 2020). During the same period, European countries presented a mean R0 of 4.22, with Romania, Germany, the Netherlands and Spain showing the highest values
(5.19–6.06) (Linka et al. 2020).
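As a worked illustration of how such R0 estimates arise, the sketch below uses a common first-order relation between the early exponential growth rate r and the mean generation time T, namely R0 ≈ 1 + rT. The case series and the five-day generation time are hypothetical round numbers, not values from the studies cited above.

```python
import math

def r0_from_growth(daily_cases, generation_time_days):
    """Approximate R0 as 1 + r*T, where r is the exponential growth rate
    obtained by a least-squares fit to log(daily cases). Illustrative only."""
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log(c) for c in daily_cases]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope of log(cases) on the day index gives the growth rate r.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    return 1.0 + r * generation_time_days

cases = [100, 130, 169, 220, 286]  # hypothetical counts, roughly 30% daily growth
print(round(r0_from_growth(cases, generation_time_days=5.0), 2))  # → 2.31
```

Under this simplification, a faster growth rate or a longer generation time both push the estimate upward, which is one reason published R0 values for the same outbreak vary so widely.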
The sudden outbreak of COVID-19 and the extent of the deaths worldwide – that
soon surpassed those of SARS-CoV – raised essential questions concerning the
Some Remarks on the Coronavirus Pandemic in Europe 115
nature of the virus and the probable populations at high risk. Until now, older age
and underlying diseases (such as cardiovascular diseases, chronic respiratory
problems and diabetes), as well as obesity, are considered to be positively correlated
to a rapid and severe evolution of the infection (Davies et al. 2020; Fang et al. 2020;
Huang et al. 2020; Khalili et al. 2020). In addition, the biological sex of the
individual has been identified as a key epidemiological factor in the case of COVID-19.
In almost all of the published case studies worldwide, males seem to be generally more susceptible to the virus, with fatality numbers presenting a male-to-female ratio
of 2:1 (Falahi and Kenarkoohi 2020; Garcia 2020; Jin et al. 2020; Ryan et al. 2020;
Sun et al. 2020).
The outbreak of the COVID-19 pandemic provoked a dynamic response from the
medical community and the consequent global adoption of public measures,
restrictions and surveillance strategies. The central aim of public measures was
focused on containing the virus. Therefore, the initial focus globally was to shield
the vulnerable populations and health workers, gaining, at once, the essential time
needed to understand the disease and to build up therapeutic strategies.
Some of the most widely adopted measures included daily official (governmental
and medical communities) communication to the public of current findings related
to the pandemic, as well as guidance urging all citizens to self-monitor and to
prevent interpersonal spread. In addition, there was the communication of protocols
concerning hand hygiene, the use of masks, home disinfection, and the most
up-to-date list of “suspicious” symptoms related to COVID-19, which preoccupied
all forms of social media. The establishment of 24-hour health hotlines also became
a global phenomenon.
Table 9.1. COVID-19 suspected case criteria (adapted from the WHO:
https://www.who.int/emergencies/diseases/novelcoronavirus-2019)
Finally, there was variability between countries in terms of testing policies. This
was related both to the extent of the applied procedures (i.e. the number of tested
individuals per community) and of the applied protocols regarding the criteria
according to which someone should be considered as a COVID-19 patient.
3) It has been observed that the economic status of a country played an essential
role in the type of policy adopted in the face of the pandemic. This is not only related,
for example, to the economic costs incurred by PCR tests or by medical/hospital
treatments, but also to the fact that poorer countries are expected to have limited
resources in consumables, antivirus drugs, up-to-date medical information and so on.
Above all, the economic cost generated by the pandemic is mainly related to the
degree of economic recession undergone in each of these countries.
If a person falls ill, do they have the same access to testing as in all the other
European countries? Thus far, and according to the aforementioned issues, the
answer appears to be negative.
9.2.6. The role of statistical research in the case of COVID-19 and its
challenges
So far, analysis tools, indexes for the quantification of the disease’s characteristics,
forecasting models to observe the evolution of the virus, as well as epidemiological
schemes that demonstrate transmission, contagiousness and vulnerability of
populations and groups, have been used as a support to medical research and
governmental strategies. However, despite this enormous contribution, the certainty
and objectivity of generated results raise an important issue. As always, issues related
to sample sizes and to the objectivity of the implemented parameters are severe
limitations that might drastically alter all prognoses and outcomes.
For our research, we collected data from all the daily updates from official online resources: the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC).
According to the official data provided up until May 2020, we investigated the
number of tests applied, as well as the reported COVID-19 cases and deaths for each
of the European countries. A simple way to report such data is by presenting a time
series of the absolute number of cases observed in each country by gender and
applying a mathematical model to it. However, such an illustration – while very
useful for recording registered cases (hospitalized or not) on a daily basis and for the
development of appropriate patient institutionalization policies – is of little
importance for the comprehension of the prevalence of the SARS-CoV-2 virus and
the COVID-19 disease in each population and for intra/inter-population
comparisons. The reason is simple: different countries include different populations.
One solution could be the calculation of rates, i.e. cases or deaths per 100,000 or
1,000,000 population per gender.
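The normalization described above can be sketched in a few lines; the populations and case counts below are invented round numbers, used only to show why absolute counts mislead when countries differ in size.

```python
def rate_per_100k(cases, population):
    """Convert an absolute case count into cases per 100,000 population."""
    return cases / population * 100_000

# Hypothetical countries with equal absolute counts but unequal populations.
countries = {
    "A": {"cases": 50_000, "population": 10_000_000},
    "B": {"cases": 50_000, "population": 2_000_000},
}
for name, d in countries.items():
    print(name, rate_per_100k(d["cases"], d["population"]))
# Equal absolute counts, yet B's prevalence rate is five times A's.
```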
The last question is the most important concerning the actual prevalence of
COVID-19 within a specific population. It is well known that most younger people
are asymptomatic during the carrier stage. Then reasonably enough, these people
will never be examined for this disease even though they can transmit it to other
people. The same of course holds for people of other ages who are either asymptomatic carriers or have mild symptoms. In the latter case, whether one is tested for the coronavirus depends on the country. Therefore, it is expected that most of the
asymptomatic carriers in a country would be unknown and the same occurs with
many patients who have mild symptoms.
From this point of view, it is actually difficult and complex to estimate the prevalence of the virus, since all available numbers correspond only to the individuals that have been tested. Is it possible to consider these people a random sample from which to infer the real prevalence of the virus within the population, and thus of the disease itself? The answer is not simple, since not all COVID-19 carriers are equally likely to be included in these samples.
To conclude, it is questionable whether the available data can describe the real
prevalence of COVID-19. Even if such data were available, inter- and intra-population comparisons would require disregarding any differentiation in social and population structures. It is self-evident that a population
mainly consisting of aged individuals would respectively present higher numbers of
COVID-19 deaths or elevated COVID-19 cases than others, a well-known
phenomenon in demography.
With this in mind, the following section presents some of the preliminary statistical analyses.
Statistical analyses indicate that the average number of total COVID-19 cases
per 100,000 population and country is 220.2 ± 303.8. Until May 2020, the largest percentage of the countries examined reported more than 100 cases per 100,000
individuals. The classification of the European countries on the basis of the standard
deviation of the above-mentioned mean “prevalence rate” revealed a four-scale
scheme (Figures 9.4a, b and c): 25 countries belong to a differentiated group in
which the number of cases per 100,000 population is low (the -0.50 Std. Dev. group), 19
lie in the range -0.50 to 0.50 standard deviations, whereas only nine lie in the range
between 0.50 and 1.50. In a small number of countries, the prevalence of the disease
was more intense.
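The standard-deviation classification described above can be sketched as follows: each country's rate is converted into a distance from the mean in standard-deviation units (a z-score) and assigned to one of the four bands (below -0.50, -0.50 to 0.50, 0.50 to 1.50, above 1.50). The rates below are invented for illustration, not the chapter's data.

```python
import statistics

def sd_band(rate, mean, sd):
    """Assign a prevalence rate to a band by its z-score."""
    z = (rate - mean) / sd
    if z < -0.5:
        return "low"
    if z <= 0.5:
        return "medium"
    if z <= 1.5:
        return "high"
    return "very high"

rates = [30, 60, 150, 220, 400, 900, 1100]  # hypothetical cases per 100,000
mean, sd = statistics.mean(rates), statistics.pstdev(rates)
for r in rates:
    print(r, sd_band(r, mean, sd))
```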
As shown in Figure 9.4a, the low prevalence countries are those of Eastern Europe and the Balkans. Because many of these countries are among the poorer in Europe, an
open question remains to be addressed. Does this finding represent a smaller or
delayed development of the pandemic, or might it also be attributed to a partial or
total inability to identify the intensity of the pandemic in some of these countries?
Obviously enough, a lower prevalence rate characterizes this part of the continent,
although health infrastructure, economic and other types of problems may have
played an important role in the quantification of the pandemic. This question
remains open for further investigation.
Figure 9.4a. Total cases per 100,000 population with low prevalence. For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Figure 9.4b. Total cases per 100,000 population with medium prevalence. For a
color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Many countries lie very close to the European average (Figure 9.4b). These are
spatially, politically, economically and culturally diversified, spread all over the continent.
Figure 9.4c. Total cases per 100,000 population with a high prevalence. For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
[Scatter plot: European countries by tests and cases per 100,000 persons; fitted line y = 79.75 + 0.03x.]
Figure 9.5. Tests and cases per 100,000 persons. For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
[Scatter plot: European countries by cases per 1,000 tests, clustered into Groups A, B and C; fitted line y = 74.05 - 0.00351x.]
Figure 9.6. Cases per 1,000 tests. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
In any case, Figure 9.7 illustrates the number of deaths per 100,000 positive
COVID-19 recorded cases. However, it must be stressed that this is not a measure of
mortality. Such an estimation requires age standardization of mortality rates in order
to become comparable between the countries. Also, it requires detailed death by age
data, which are either partially known or absent, as well as detailed information of
the age structure of each population for the year 2020, which is unavailable for the
moment. It is also not a measure of fatality. Such a measure requires detailed,
accurate data of COVID-19 prevalence, which is largely problematic too, as
discussed previously. Thus, the result presented in Figure 9.7 must be understood as
a crude and limited estimation of the pandemic.
Keeping in mind the above, the European countries were classified in Figure 9.7
on the basis of standard deviation of the mean rate. The most diverse country is
France, where the estimations show the vast effect of the COVID-19 pandemic.
However, it is important to bear in mind that France was also among the countries that tested most sparingly, considering the relationship between the tests conducted and the recorded COVID-19 cases. The second group of countries is formed by
Belgium, the United Kingdom, Italy, Hungary, the Netherlands and Sweden. Spain
ranks in eighth position. All the other countries cluster into two groups. They are
either very close to the European average (22 countries) or lower (22 countries).
Thus, the COVID-19 pandemic is evident but diverse among the countries of the
European continent, given the problems of the data, of course.
The results we will obtain from the promising vaccinations will surely broaden our knowledge of this pandemic, offering at the same time new insights concerning the quantification of large-scale epidemiological data.
9.6. References
Abdin, S.M., Elgendy, S.M., Alyammahi, S.K., Alhamad, D.W., Omar, H.A. (2020). Tackling
the cytokine storm in COVID-19, challenges, and hopes. Life Sciences, 257, 118054.
Abebe, E.C., Dejenie, T.A., Shiferaw, M.Y., Malik, T. (2020). The newly emerged COVID-19
disease: A systemic review. Virology Journal, 17(1), 96.
Adhikari, S.P., Meng, S., Wu, Y.J., Mao, Y.P., Ye, R.X., Wang, Q.Z., Sun, C., Sylvia, S.,
Rozelle, S., Raat, H., Zhou, H. (2020). Epidemiology, causes, clinical manifestation and
diagnosis, prevention and control of coronavirus disease (COVID-19) during the early
outbreak period: A scoping review. Infectious Diseases of Poverty, 9(1), 29.
Agarwal, S. and Agarwal, S.K. (2020). Endocrine changes in SARS-CoV-2 patients and
lessons from SARS-CoV. Postgraduate Medical Journal, 96(1137), 412–416.
Aghagoli, G., Gallo Marin, B., Katchur, N.J., Chaves-Sell, F., Asaad, W.F., Murphy, S.A.
(2020). Neurological involvement in COVID-19 and potential mechanisms: A review.
Neurocritical Care, July 13, 1–10.
Banerjee, A., Kulcsar, K., Misra, V., Frieman, M., Mossman, K. (2019). Bats and
coronaviruses. Viruses, 11(1), E41.
Bi, Q., Wu, Y., Mei, S., Ye, C., Zhou, X., Zhang, Z. (2020). Epidemiology and transmission
of Covid-19 in 391 cases and 1286 of their close contacts in Shenzhen, China:
A retrospective cohort study. The Lancet Infectious Diseases, 20(8).
Callaway, E., Cyranoski, D., Mallapaty, S., Stoye, E., Tollefsom, J. (2020). Coronavirus by
the numbers. Nature, 579, 482–483.
Chan, J.F., Kok, K., Zhu, Z., Chu, H., To, K.K., Yuan, S., Yuen, K.Y. (2020a). Genomic
characterisation of the 2019 novel human-pathogenic coronavirus isolated from a patient
with atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections, 9,
221–236.
Chan, A.P., Choi, Y., Schork, N.J. (2020b). Conserved genomic terminals of SARS-CoV-2 as
co-evolving functional elements and potential therapeutic targets. BioRxiv: The Preprint
Server for Biology, 7(6), 190–207.
Dasari, C.M. and Bhukya, R. (2020). Comparative analysis of protein synthesis rate in
COVID-19 with other human coronaviruses. Infection, Genetics and Evolution: Journal
of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases, 85, 104432.
Davies, N.G., Klepac, P., Liu, Y., Prem, K., Jit, M., CMMID COVID-19 working group, Eggo, R.M. (2020). Age-dependent effects in the transmission and control of COVID-19 epidemics. Nature Medicine, 26, 1205–1211.
Demirci Otluoglu, G., Yener, U., Demir, M.K., Yilmaz, B. (2020). Encephalomyelitis
associated with Covid-19 infection: Case report. British Journal of Neurosurgery, 1–3.
European Centre for Disease Prevention and Control (2020). Clinical characteristics of
COVID-19 [Online]. Available at: https://www.ecdc.europa.eu/en/covid-19/latest-
evidence/clinical.
Falahi, S. and Kenarkoohi, A. (2020). Sex and gender differences in the outcome of patients
with COVID-19. Journal of Medical Virology, 93(1), 151–152.
Fang, B. and Meng, Q.H. (2020). The laboratory’s role in combating COVID-19. Critical
Reviews in Clinical Laboratory Sciences, 1–15.
Fang, X., Li, S., Yu, H., Wang, P., Zhang, Y., Chen, Z., Li, Y., Cheng, L., Li, W., Jia, H., Ma, X.
(2020). Epidemiological, comorbidity factors with severity and prognosis of COVID-19:
A systematic review and meta-analysis. Aging, 12(13), 12493–12503.
Fiani, B., Covarrubias, C., Desai, A., Sekhon, M., Jarrah, R. (2020). A contemporary review
of neurological sequelae of COVID-19. Frontiers in Neurology, 11, 640.
Garcia, L.P. (2020). Sex, gender and race dimensions in COVID-19 research. Dimensões de
sexo, gênero e raça na pesquisa sobre COVID-19. Epidemiologia e servicos de saude:
revista do sistema unico de saude do brasil, 29(3), e20202207.
Guan, W.J., Ni, Z.Y., Hu, Y. (2020). Clinical characteristics of coronavirus disease 2019 in
China. New England Journal of Medicine, 382, 1708–1720.
Gupta, A., Madhavan, M.V., Sehgal, K., Nair, N., Mahajan, S., Sehrawat, T.S., Bikdeli, B.,
Ahluwalia, N., Ausiello, J.C., Wan, E.Y., Freedberg, D.E., Kirtane, A.J., Parikh, S.A.,
Maurer, M.S., Nordvig, A.S., Accili, D., Bathon, J.M., Mohan, S., Bauer, K.A., Leon,
M.B., Landry, D.W. (2020). Extrapulmonary manifestations of COVID-19. Nature
Medicine, 26(7), 1017–1032.
Gulati, A., Pomeranz, C., Qamar, Z., Thomas, S., Frisch, D., George, G., Summer, R.,
De Simone, J., Sundaram, B. (2020). A comprehensive review of manifestations of novel
coronaviruses in the context of deadly COVID-19 global pandemic. The American
Journal of the Medical Sciences, 360(1), 5–34.
Hampton, T. (2005). Bats may be SARS reservoir. JAMA, 294(18), 2291.
Hardenberg, J.H. and Luft, F.C. (2020). Covid-19, ACE2, and the kidney. Acta Physiologica
(Oxford, England), 230(1), e13539.
Hoffmann, M., Kleine-Weber, H., Krüger, N., Müeller, M.A., Drosten, C., Pöhlmann, S.
(2020). The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor
ACE2 and the cellular protease TMPRSS2 for entry into target cells [Online]. Available
at: https://doi.org/10.1101/2020.01.31.929042.
Honigsbaum, M. (2020). Revisiting the 1957 and 1968 influenza pandemics. Lancet (London,
England), 395(10240), 1824–1826.
Huang, C., Wang, Y., Li, X. (2020). Clinical features of patients infected with 2019 novel
coronavirus in Wuhan, China. Lancet, 395(10223), 497–506.
Jang, Y. and Seo, S.H. (2020). Gene expression pattern differences in primary human
pulmonary epithelial cells infected with MERS-CoV or SARS-CoV-2. Archives of
Virology, 165, 2205–2211.
Jin, J.M., Bai, P., He, W., Wu, F., Liu, X.F., Han, D.M., Liu, S., Yang, J.K. (2020). Gender
differences in patients with COVID-19: Focus on severity and mortality. Frontiers in
Public Health, 8, 152.
Kaswa, R. and Govender, I. (2020). Novel coronavirus pandemic: A clinical overview. South
African Family Practice (2004), 62(1), e1–e5.
Kaushal, N., Gupta, Y., Goyal, M., Khaiboullina, S.F., Baranwal, M., Verma, S.C. (2020).
Mutational frequencies of SARS-CoV-2 genome during the beginning months of the
outbreak in U.S.A. Pathogens (Basel, Switzerland), 9(7), E565.
Khalili, M., Karamouzian, M., Nasiri, N., Javadi, S., Mirzazadeh, A., Sharifi, H. (2020).
Epidemiological characteristics of COVID-19: A systematic review and meta-analysis.
Epidemiology and Infection, 148, e130.
Kilbourne, E.D. (2006). Influenza pandemics of the 20th century. Emerging Infectious
Diseases, 12(1), 9–14.
Kreitmann, L., Monard, C., Dauwalder, O., Simon, M., Argaud, L. (2020). Early bacterial
co-infection in ARDS related to COVID-19. Intensive Care Medicine, July 13, 1–3.
Laha, S., Chakraborty, J., Das, S., Manna, S.K., Biswas, S., Chatterjee, R. (2020).
Characterisations of SARS-CoV-2 mutational profile, spike protein stability and viral
transmission. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and
Evolutionary Genetics in Infectious Diseases, 85, 104445.
Lai, M.M.C. and Holmes, K.V. (2001). The viruses and their replication. In Coronaviridae,
4th edition, Knipe, D.M. and Howley, P.M. (eds). Lippincott, Williams & Wilkins,
Philadelphia.
Lam, T.T., Jia, N., Zhang, Y. (2020). Identifying SARS-CoV-2-related coronaviruses in
Malayan pangolins. Nature, 583, 282–285.
Li, F. (2016). Structure, function, and evolution of coronavirus spike proteins. Annual Review
of Virology, 3, 237–261.
Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J.H. (2005). Bats are natural reservoirs
of SARS-like coronaviruses. Science, 310(5748), 676–679.
Liao, D., Zhou, F., Luo, L., Xu, M., Wang, H., Xia, J., Gao, Y., Cai, L., Wang, Z., Yin, P.,
Wang, Y., Tang, L., Deng, J., Mei, H., Hu, Y. (2020). Haematological characteristics and
risk factors in the classification and prognosis evaluation of COVID-19: A retrospective
cohort study. The Lancet. Haematology, S2352-3026(20)30217-9.
Linka, K., Peirlinck, M., Kuhl, E. (2020). The reproduction number of COVID-19 and its
correlation with public health interventions. Computational Mechanics, July 28, 1–16.
Liu, J.W., Bi, Y., Wang, D., Gao, G.F. (2018). On the centenary of the Spanish flu: Being
prepared for the next pandemic. Virologica Sinica, 33, 463–466.
Liu, Z., Xiao, X., Wei, X., Li, J., Yang, J., Tan, H. (2020a). Composition and divergence of
coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts
of SARS-CoV-2. Journal of Medical Virology, 92(6), 595–601.
Liu, Y., Gayle, A.A., Wilder-Smith, A., Rocklöv, J. (2020b). The reproductive number of
COVID-19 is higher compared to SARS coronavirus. Journal of Travel Medicine, 27(2),
taaa021.
Lu, R., Zhao, X., Li, J., Niu, P., Yang, B., Wu, H., Wang, W., Song, H., Huang, B., Zhu, N.,
Bi, Y., Ma, X., Zhan, F., Wang, L., Hu, T., Zhou, H., Hu, Z., Zhou, W., Zhao, L.,
Chen, J., Meng, Y., Wang, J., Lin, Y., Yuan, J., Xie, Z., Ma, J., Liu, W.J., Wang, D.,
Xu, W., Holmes, E.C., Gao, G.F., Wu, G., Chen, W., Shi, W., Tan, W. (2020). Genomic
characterisation and epidemiology of 2019 novel coronavirus: Implications for virus
origins and receptor binding. Lancet, 395(10224), 565–574.
Malik, Y.S., Sircar, S., Bhat, S., Sharun, K., Dhama, K., Dadar, M., Tiwari, R., Chaicumpa, W.
(2020). Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary
perspective based on genome analysis and recent developments. The Veterinary
Quarterly, 40(1), 68–76.
Marra, M.A., Jones, S.J., Astell, C.R., Holt, R.A., Brooks-Wilson, A., Butterfield, Y.S.,
Khattra, J., Asano, J.K., Barber, S.A., Chan, S.Y., Cloutier, A., Coughlin, S.M. (2003).
The genome sequence of the SARS-associated coronavirus. Science, 1399–1404.
Masters, S.P. (2006). The molecular biology of coronaviruses. Advances in Virus Research,
66, 193–292.
Momtazmanesh, S., Shobeiri, P., Hanaei, S., Mahmoud-Elsayed, H., Dalvi, B., Malakan Rad, E.
(2020). Cardiovascular disease in COVID-19: A systematic review and meta-analysis of
10,898 patients and proposal of a triage risk stratification tool. The Egyptian Heart
Journal: (E.H.J.): Official Bulletin of the Egyptian Society of Cardiology, 72(1), 41.
Nepal, G., Rehrig, J.H., Shrestha, G.S., Shing, Y.K., Yadav, J.K., Ojha, R., Pokhrel, G.,
Tu, Z.L., Huang, D.Y. (2020). Neurological manifestations of COVID-19: A systematic
review. Critical Care (London, England), 24(1), 421.
Neuman, B.W., Kiss, G., Kunding, A.H., Bhella, D., Baksh, M.F., Connelly, S., Droese, B.,
Klaus, J.P., Makino, S., Sawicki, S.G. (2011). A structural analysis of M protein in coronavirus assembly and morphology. Journal of Structural Biology, 174(1), 11–22.
Oliviero, A., de Castro, F., Coperchini, F., Chiovato, L., Rotondi, M. (2020). COVID-19
pulmonary and olfactory dysfunctions: Is the chemokine CXCL10 the common
denominator? The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology
and Psychiatry, July 13, 1073858420939033.
Pachetti, M., Marini, B., Benedetti, F., Giudici, F., Mauro, E., Storici, P., Masciovecchio, C.,
Angeletti, S., Ciccozzi, M., Gallo, R.C., Zella, D., Ippodrino, R. (2020). Emerging
SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase
variant. Journal of Translational Medicine, 18(1), 179.
Pan, L., Mu, M., Yang, P. (2020). Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: A descriptive, cross-sectional, multicenter study.
American Journal of Gastroenterology, 115(5), 766–773.
Petersen, E., Koopmans, M., Go, U., Hamer, D.H., Petrosillo, N., Castelli, F., Storgaard, M.,
Al Khalili, S., Simonsen, L. (2020). Comparing SARS-CoV-2 with SARS-CoV and
influenza pandemics. The Lancet. Infectious Diseases, S1473-3099(20)30484-9.
Phipps, W.S., SoRelle, J.A., Li, Q.Z., Mahimainathan, L., Araj, E., Markantonis, J., Lacelle, C.,
Balani, J., Parikh, H., Solow, E.B., Karp, D.R., Sarode, R., Muthukumar, A. (2020).
SARS-CoV-2 antibody responses do not predict COVID-19 disease severity. American
Journal of Clinical Pathology, 154(4), 459–465.
Poyiadji, N., Shahin, G., Noujaim, D., Stone, M., Patel, S., Griffith, B. (2020). COVID-19-associated acute hemorrhagic necrotising encephalopathy: CT and MRI features
[Online]. Available at: https://doi.org/10.1148/radiol.2020201187.
Prajapati, S., Sharma, M., Kumar, A., Gupta, P., Narasimha Kumar, G.V. (2020). An update
on novel COVID-19 pandemic: A battle between humans and virus. European Review for
Medical and Pharmacological Sciences, 24(10), 5819–5829.
Rajendran, D.K., Rajagopal, V., Alagumanian, S., Santhosh Kumar, T., Sathiya Prabhakaran,
S.P., Kasilingam, D. (2020). Systematic literature review on novel corona virus
SARS-CoV-2: A threat to human era. Virusdisease, 31(2), 161–173.
Rota, P.A., Oberste, M.S., Monroe, S.S., Nix, W.A., Campagnoli, R., Icenogle, J.P.,
Penaranda, S., Bankamp, B., Maher, K., Chen, M.H., Tong, S., Tamin, A. (2003).
Characterization of a novel coronavirus associated with severe acute respiratory
syndrome. Science, 1394–1399.
Ruch, T.R. and Machamer, C.E. (2012). The coronavirus e protein: Assembly and beyond.
Viruses, 4(3), 363–382.
Ryan, N.E. and El Ayadi, A.M. (2020). A call for a gender-responsive, intersectional
approach to address COVID-19. Global Public Health, 1–9.
Salazar de Pablo, G., Vaquerizo-Serrano, J., Catalan, A., Arango, C., Moreno, C., Ferre, F.,
Shin, J.I., Sullivan, S., Brondino, N., Solmi, M., Fusar-Poli, P. (2020). Impact of
coronavirus syndromes on physical and mental health of health care workers: Systematic
review and meta-analysis. Journal of Affective Disorders, 275, 48–57.
Schoeman, D. and Fielding, B.C. (2019). Coronavirus envelope protein: Current knowledge.
Virology Journal, 16(1), 69.
Schwartz J.L. (2018). The spanish flu, epidemics, and the turn to biomedical responses.
American Journal of Public Health, 108(11), 1455–1458.
132 Applied Modeling Techniques and Data Analysis 2
Shereen, M.A., Khan, S., Kazmi, A., Bashir, N., Siddique, R. (2020). COVID-19 infection:
Origin, transmission, and characteristics of human coronaviruses. Journal of Advanced
Research, 24, 91–98.
Shyu, D., Dorroh, J., Holtmeyer, C., Ritter, D., Upendran, A., Kannan, R., Dandachi, D.,
Rojas-Moreno, C., Whitt, S.P., Regunath, H. (2020). Laboratory tests for COVID-19: A
review of peer-reviewed publications and implications for clinical uIse. Missouri
Medicine, 117(3), 184–195.
Spiteri, G., Fielding, J., Diercke, M., Campese, C., Enouf, V., Gaymard, A., Bella, A.,
Sognamiglio, P., Sierra Moros, M.J., Riutort, A.N., Demina, Y.V., Mahieu, R.,
Broas, M., Bengnér, M., Buda, S., Schilling, J., Filleul, L., Lepoutre, A., Saura, C.,
Mailles, A., Ciancio, B.C. (2020). First cases of coronavirus disease 2019 (COVID-19) in
the WHO European region, 24 January to 21 February 2020. Euro Surveillance: Bulletin
Europeen sur les maladies transmissibles = European communicable disease bulletin,
25(9), 2000178.
Sui, J., Deming, M., Rockx, B., Liddington, R.C., Zhu, Q.K., Baric, R.S., Marasco, W.A.
(2014). Effects of human anti-spike protein receptor binding domain antibodies on severe
acute respiratory syndrome coronavirus neutralisation escape and fitness. Journal of
Virology, 88(23), 13769–13780.
Tabata, S., Imai, K., Kawano, S., Ikeda, M., Kodama, T., Miyoshi, K., Obinata, H., Mimura,
S., Kodera, T., Kitagaki, M., Sato, M., Suzuki, S., Ito, T., Uwabe, Y., Tamura, K. (2020).
Clinical characteristics of COVID-19 in 104 people with SARS-CoV-2 infection on the
diamond princess cruise ship: A retrospective analysis. The Lancet. Infectious Diseases,
S1473-3099(20)30482-5.
Tort, F.L., Castells, M., Cristina, J. (2020). A comprehensive analysis of genome composition
and codon usage patterns of emerging coronaviruses. Virus Research, 283, 197976.
Wang, D., Hu, B., Hu, C. (2020a). Clinical characteristics of 138 hospitalised patients with
2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA, 323(11), 1061–1069.
Wang, W., Chen, Y., Wang, Q., Cai, P., He, Y., Hu, S. (2020b). The transmission dynamics
of SARSCOV-2 in China: Modeling study and the impact of public health interventions
[Online]. Available at: 10.1101/2020.03.24.20036285.
Wege, H., Siddell, S., ter Meulen, V. (1982). The biology and pathogenesis of coronaviruses.
In Current Topics in Microbiology and Immunology, Cooper, M., Henle, W.,
Hofschneider, P.H., Koprowski, H., Melchers, F., Rott, R., Schweiger, H.G., Vogt,
P.K., Zinkernagel, R. (eds). Springer, Berlin, Heidelberg.
Weiss, S.R. and Navas-Martin, S. (2005). Coronavirus pathogenesis and the emerging
pathogen severe acute respiratory syndrome coronavirus. American Society for
Microbiology Journals, 69(4), 635–664.
Wu, C. (2020). Risk factors associated with acute respiratory distress syndrome and death in
patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Internal
Medicine, 180(7), 934–943.
Some Remarks on the Coronavirus Pandemic in Europe 133
Wu, F., Zhao, S., Yu, B., Chen, Y.M., Wang, W., Song, Z.G., (2020). A new coronavirus
associated with human respiratory disease in China. Nature, 579, 265–269.
Xu, X., Chen, P., Wang, J., Feng, J., Zhou, H., Li, X, (2020). Evolution of the novel
coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk
of human transmission. Science China Life Sciences, 63, 457–460.
Ye, Q., Wang, B., Zhang, T., Xu, J., Shang, S. (2020). The mechanism and treatment of
gastrointestinal symptoms in patients with COVID-19. American Journal of Physiology.
Gastrointestinal and Liver Physiology, 319(2), G245-G252.
Zaki, A.M., van Boheemen, S., Bestebroer, T.M., Osterhaus, A.D., Fouchier, R.A. (2012).
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. New
England Journal of Medicine, 8, 1814–1820.
Zhang, W., Zhao, Y., Zhang, F., (2020). The use of anti-inflammatory drugs in the treatment
of people with severe coronavirus disease 2019 (COVID19): The experience of clinical
immunologists from China. Clinical Immunology, 214, 108393–108393.
Zhao, S., Musa, S.S., Lin, Q., Ran, J., Yang, G., Wang, W., Lou, Y., Yang, L., Gao, D.,
He, D., Wang, M.H. (2020). Estimating the unreported number of novel coronavirus
(2019-nCoV) cases in China in the first half of January 2020: A data-driven modelling
analysis of the early outbreak. Journal of Clinical Medicine, 9(2), 388.
Zheng, Y.Y., Ma, Y.T., Zhang, J.Y. (2020). COVID-19 and the cardiovascular system.
Nature Reviews Cardiology, 17, 259–260.
Zhou, P., Yang, XL., Wang, X.G., Hu, B., Zhang, L., Zhang, W., Si, H.R., Zhu, Y., Li, B.,
Huang, C.L., Chen, H.D., Chen, J., Luo, Y., Guo, H., Jiang, R.D., Liu, M.Q., Chen, Y.,
Shen, X.R., Wang, X., Zheng, X.S., Zhao, K., Chen, Q.J., Deng, F., Liu, L.L., Yan, B.,
Zhan, F.X., Wang, Y.Y., Xiao, G.F., Shi, Z.L. (2020). A pneumonia outbreak associated
with a new coronavirus of probable bat origin. Nature, 579, 270–273.
PART 2
Applied Modeling Techniques and Data Analysis 2: Financial, Demographic, Stochastic and Statistical Models and Methods,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H. Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
10
The Double Flexible Dirichlet: A Structured Mixture Model for Compositional Data
10.1. Introduction
$$f_{\mathrm{Dir}}(\mathbf{x}; \boldsymbol{\alpha}) = \frac{\Gamma(\alpha^{+})}{\prod_{j=1}^{D} \Gamma(\alpha_{j})} \prod_{j=1}^{D} x_{j}^{\alpha_{j}-1}, \qquad [10.3]$$

where $\alpha^{+} = \sum_{r=1}^{D} \alpha_{r}$. The first two order moments of this distribution are:

$$\mathrm{E}[X_{j}] = \frac{\alpha_{j}}{\alpha^{+}} \qquad [10.4]$$

$$\mathrm{Var}(X_{j}) = \frac{\mathrm{E}[X_{j}]\,(1 - \mathrm{E}[X_{j}])}{\alpha^{+} + 1} \qquad [10.5]$$

$$\mathrm{Cov}(X_{j}, X_{l}) = -\frac{\mathrm{E}[X_{j}]\,\mathrm{E}[X_{l}]}{\alpha^{+} + 1}, \qquad j \neq l. \qquad [10.6]$$
It is easy to see that, once the mean vector of X is chosen, only one parameter,
namely α+ , is devoted to modeling the entire covariance matrix. In particular, if two
elements of the composition have the same expected value, then they also have the
same variance. Furthermore, covariances are strictly proportional to the product of the
expectations of the corresponding elements. This poor parameterization does not allow
for either positive covariances or multimodality, which can be an important limitation.
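As a quick numerical illustration of [10.4]–[10.6], the sketch below (our own example; the α vector is arbitrary) computes the Dirichlet moments and checks the two limitations just described: equal means force equal variances, and all covariances are negative.

```python
# Moments of the Dirichlet distribution, equations [10.4]-[10.6].
# Illustrative sketch: alpha is an arbitrary example parameter vector.

def dirichlet_moments(alpha):
    a_plus = sum(alpha)
    mean = [a / a_plus for a in alpha]
    var = [m * (1 - m) / (a_plus + 1) for m in mean]
    cov = {(j, l): -mean[j] * mean[l] / (a_plus + 1)
           for j in range(len(alpha)) for l in range(len(alpha)) if j != l}
    return mean, var, cov

mean, var, cov = dirichlet_moments([2.0, 2.0, 6.0])

# Components with the same expected value are forced to share the same variance...
assert mean[0] == mean[1] and var[0] == var[1]
# ...and every covariance is negative: no positive dependence is possible.
assert all(c < 0 for c in cov.values())
```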
Even though the unit-sum constraint naturally induces a negative dependence among
Yj = Wj + U · Zj , j = 1, . . . , D, [10.7]
$$\mathrm{FD}(\mathbf{x}; \boldsymbol{\alpha}, \tau, \mathbf{p}) = \sum_{r=1}^{D} p_{r}\, \mathrm{Dir}(\mathbf{x}; \boldsymbol{\alpha} + \tau e_{r}), \qquad [10.8]$$
where er is the vector with elements equal to zero except for the r-th that is equal to 1.
It is well known that each mixture component can be thought of as a sub-population
(cluster) in the population (Frühwirth-Schnatter 2006); therefore, the FD can include a
number k ≤ D of different modes, one for each cluster, even if its components do not
allow for multimodality. The FD also includes the Dirichlet as a special case if τ = 1
and pr = αr /α+ , r = 1, . . . , D. The rich parameterization of this distribution allows
for flexible modeling of the covariance matrix of a composition, overcoming the
drawbacks highlighted in section 10.1, even if covariances are still always negative.
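The mixture representation [10.8] can be evaluated directly; the sketch below (our own illustration) also verifies numerically the special case just mentioned, in which the FD density collapses to the Dirichlet density for τ = 1 and p_r = α_r/α⁺.

```python
import math

def dir_pdf(x, alpha):
    # Dirichlet density [10.3]
    a_plus = sum(alpha)
    coef = math.gamma(a_plus) / math.prod(math.gamma(a) for a in alpha)
    return coef * math.prod(xj ** (aj - 1) for xj, aj in zip(x, alpha))

def fd_pdf(x, alpha, tau, p):
    # Flexible Dirichlet density as the finite mixture [10.8]
    dens = 0.0
    for r in range(len(alpha)):
        shifted = list(alpha)
        shifted[r] += tau
        dens += p[r] * dir_pdf(x, shifted)
    return dens

alpha = [2.0, 3.0, 5.0]
x = [0.2, 0.3, 0.5]
# Special case: tau = 1 and p_r = alpha_r / alpha_plus recovers the Dirichlet.
p = [a / sum(alpha) for a in alpha]
assert abs(fd_pdf(x, alpha, 1.0, p) - dir_pdf(x, alpha)) < 1e-9
```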
In this chapter, we extend the FD distribution to obtain an even more flexible cluster
structure and a model for the covariance matrix that allows for positive linear
dependence.
The vector p can be thought of as the row sum (or column sum) of a symmetric
matrix P whose generic element is prh . Then, the basis Y is said to have a double
flexible gamma (DFG) distribution with parameters α, τ and P. The first two order
moments of this distribution are:
$$\mathrm{E}[Y_{j}] = \alpha_{j} + 2\tau p_{j\cdot} \qquad [10.10]$$

$$\mathrm{Var}(Y_{j}) = \alpha_{j} + 2\tau p_{j\cdot} + 2\tau^{2}\left(p_{j\cdot} - 2p_{j\cdot}^{2} + p_{jj}\right) \qquad [10.11]$$

$$\mathrm{Cov}(Y_{j}, Y_{l}) = 2\tau^{2}\left(p_{jl} - 2p_{j\cdot}p_{l\cdot}\right), \qquad j \neq l, \qquad [10.12]$$

where $p_{j\cdot} = \sum_{l=1}^{D} p_{jl}$. Unlike the bases characterizing both the Dirichlet and the FD
distributions, the DFG allows for positively correlated elements; indeed:
$$\mathrm{Cov}(Y_{j}, Y_{l}) \geq 0 \iff \frac{p_{j\cdot}\, p_{l\cdot}}{p_{jl}} \leq \frac{1}{2}. \qquad [10.13]$$
$$\mathrm{DFD}(\mathbf{x}; \boldsymbol{\alpha}, \tau, \mathbf{P}) = \sum_{r=1}^{D} \sum_{h=1}^{D} p_{rh}\, \mathrm{Dir}(\mathbf{x}; \boldsymbol{\alpha} + \tau(e_{r} + e_{h})). \qquad [10.15]$$
From equation [10.16] and Figure 10.1, it is easy to see that the parameter τ
regulates the distance between each cluster barycenter and ᾱ: increasing τ , we obtain
cluster barycenters closer to the simplex boundary. While the above structure is
somewhat rigid, it is similar to that of the FD distribution (Migliorati et al. 2017)
allowing for more clusters. Furthermore, thanks to the fact that some prh can be equal
to 0, this model allows for a variety of cluster configurations that cannot be obtained
with the FD model. For example, in Figure 10.2, it is possible to see some cluster
configurations that cannot be reached by simpler models. Please note that joining the
cluster means in these two panels produces a diamond and an inverse triangle shape,
respectively.
Figure 10.1. DFD cluster means structure. α = (5, 13, 5) . Left: τ = 15,
right: τ = 5. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
Figure 10.2. DFD cluster means with α = (5, 5, 5) and τ = 10.
Left: p11 = p22 = 0. Right: p11 = p22 = p33 = 0. For a color version of
this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
Thanks to the mixture representation, computing the first two order moments of
the DFD distribution is easy:
$$\mathrm{E}[X_{j}] = \frac{\alpha_{j} + 2\tau p_{j\cdot}}{\alpha^{+} + 2\tau} \qquad [10.17]$$

$$\mathrm{Var}(X_{j}) = \frac{\mathrm{E}[X_{j}]\,(1 - \mathrm{E}[X_{j}])}{\alpha^{+} + 2\tau + 1} + \frac{2\tau^{2}\left[p_{j\cdot}(1 - 2p_{j\cdot}) + p_{jj}\right]}{(\alpha^{+} + 2\tau + 1)(\alpha^{+} + 2\tau)} \qquad [10.18]$$

$$\mathrm{Cov}(X_{j}, X_{l}) = -\frac{\mathrm{E}[X_{j}]\,\mathrm{E}[X_{l}]}{\alpha^{+} + 2\tau + 1} + \frac{2\tau^{2}\left(p_{jl} - 2p_{j\cdot}p_{l\cdot}\right)}{(\alpha^{+} + 2\tau + 1)(\alpha^{+} + 2\tau)}, \qquad j \neq l. \qquad [10.19]$$
The first element is always negative and it is due to the closure of a gamma-related
basis, whereas the second term is exactly the covariance of the corresponding basis
elements divided by a constant. Since this last term can assume both positive and
negative values according to the difference (pjl − 2pj· pl· ), it influences the negative
linear dependence which is typical of the Dirichlet. In particular, thanks to this new
term, the covariance among two components can assume values greater than zero,
allowing for positive dependence.
Even if the analytical expression for Pearson’s correlation coefficient ρjl of two
arbitrary components Xj and Xl (j, l = 1, . . . , D, j ≠ l) of X is hardly tractable, it is
easy to show that it may take high positive values.
E XAMPLE 10.1.– Let us consider the matrix P satisfying the following constraints:
$$\begin{cases}
p_{jl} = p_{lj} = \frac{1}{4} \\
p_{j\cdot} = p_{l\cdot} = \frac{1}{4} \\
p_{jj} = p_{ll} = 0
\end{cases} \qquad [10.21]$$
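For D = 3 and, say, j = 1, l = 2, these constraints force the matrix P used in the sketch below (α and τ are our own illustrative values, not values from the chapter); plugging it into [10.17]–[10.19] confirms that Cov(X₁, X₂) becomes positive for a moderately large τ.

```python
# DFD moments [10.17]-[10.19] under the Example 10.1 constraints (j = 1, l = 2).
# alpha and tau are illustrative choices.
alpha = [1.0, 1.0, 1.0]
tau = 20.0
P = [[0.0, 0.25, 0.0],   # row sums: p_1. = p_2. = 1/4, p_3. = 1/2,
     [0.25, 0.0, 0.0],   # so the matrix total is 1 as required
     [0.0, 0.0, 0.5]]

a_plus = sum(alpha)
row = [sum(r) for r in P]

def mean(j):
    return (alpha[j] + 2 * tau * row[j]) / (a_plus + 2 * tau)   # [10.17]

def cov(j, l):                                                   # [10.19]
    c = a_plus + 2 * tau
    return (-mean(j) * mean(l) / (c + 1)
            + 2 * tau ** 2 * (P[j][l] - 2 * row[j] * row[l]) / ((c + 1) * c))

# p_12 = 1/4 > 2 * p_1. * p_2. = 1/8, so by [10.13] the basis covariance is
# positive, and for large tau the composition inherits a positive covariance.
assert cov(0, 1) > 0
```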
In the previous section, the DFD distribution has been introduced and some
theoretical properties have been listed. In this section, the interest is in providing an
estimation procedure for the parameters α, τ and P. To this end, it is useful to define
a cluster-code matrix:
where $e(k) = \sum_{r=1}^{D} \sum_{h \geq r} (e_{r} + e_{h}) \cdot I(c_{rh} = k)$, $\boldsymbol{\pi} = (\pi_{1}, \dots, \pi_{D^{*}})$ and

$$\pi_{k} = \begin{cases}
p_{kk} & \text{if } k = 1, \dots, D \\
2p_{\{rh :\, c_{rh} = k\}} & \text{if } k = D + 1, \dots, D^{*}.
\end{cases}$$
This new notation makes the definition of the cluster barycenters easier:
$$\boldsymbol{\mu}_{k} = \frac{\alpha^{+}}{\alpha^{+} + 2\tau}\,\bar{\boldsymbol{\alpha}} + \frac{\tau}{\alpha^{+} + 2\tau}\, e(k), \qquad k = 1, \dots, D^{*}. \qquad [10.22]$$
where zik is a component indicator that is equal to 1 if observation xi has arisen from
cluster k.
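The barycenters [10.22] are easy to tabulate; the sketch below is our own illustration for D = 3 (so D* = 6), with a cluster-code convention (diagonal codes first, then the off-diagonal pairs) that is an assumption consistent with Table 10.1.

```python
# Cluster barycenters mu_k from [10.22], illustrated for D = 3 (so D* = 6).
# The cluster-code ordering below (diagonal pairs first) is our assumption.
alpha = [5.0, 13.0, 5.0]   # same alpha as Figure 10.1
tau = 15.0
D = 3
codes = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]

a_plus = sum(alpha)
for k, (r, h) in enumerate(codes, start=1):
    e_k = [float(j == r) + float(j == h) for j in range(D)]
    mu = [(a_plus / (a_plus + 2 * tau)) * (a / a_plus)
          + (tau / (a_plus + 2 * tau)) * e
          for a, e in zip(alpha, e_k)]
    assert abs(sum(mu) - 1.0) < 1e-12   # each barycenter lies on the simplex
    print(k, [round(m, 3) for m in mu])
```

For the first D clusters the largest component of μ_k is the k-th one, which is exactly the property the labeling scheme of section 10.3.1 exploits.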
It is well known that the EM algorithm is not robust with respect to the choice
of the initial values (Diebolt and Ip 1996; Biernacki et al. 2003; O’Hagan et al.
2012). For this reason, an ad hoc initialization procedure has been implemented.
It requires a partition of the sample x = (x1 , . . . , xn ) into D∗ groups and
thus a clustering method. A hierarchical clustering based on the Aitchison metric
(Pawlowsky-Glahn and Egozcue 2002) and four k-means algorithms based on
different transformations of the compositions have been compared. An exploratory
simulation study has highlighted that the k-means algorithm based on the entire
untransformed compositions works better in most parameter configurations. Although
in the DFD context there exists a clear cluster structure, the k-means algorithm (as any
clustering method) labels clusters in a random way. Thus, a labeling scheme has been
ad hoc constructed to assign the “correct” label to each cluster. Suppose, without loss
of generality, that D = 3 so that D∗ = 6. Remembering that the component-specific
distribution is Dir(α + τ e(k)), the mean vector for each cluster can be expressed as
in Table 10.1.
Cluster k | μ_k1 | μ_k2 | μ_k3
1 | (α1 + 2τ)/(α+ + 2τ) | α2/(α+ + 2τ) | α3/(α+ + 2τ)
2 | α1/(α+ + 2τ) | (α2 + 2τ)/(α+ + 2τ) | α3/(α+ + 2τ)
3 | α1/(α+ + 2τ) | α2/(α+ + 2τ) | (α3 + 2τ)/(α+ + 2τ)
4 | (α1 + τ)/(α+ + 2τ) | (α2 + τ)/(α+ + 2τ) | α3/(α+ + 2τ)
5 | (α1 + τ)/(α+ + 2τ) | α2/(α+ + 2τ) | (α3 + τ)/(α+ + 2τ)
6 | α1/(α+ + 2τ) | (α2 + τ)/(α+ + 2τ) | (α3 + τ)/(α+ + 2τ)

Table 10.1. Mean vectors stratified by cluster. μ_kj refers
to the j-th element of μ_k. For a color version of this table,
see www.iste.co.uk/dimotikalis/analysis2.zip
It is easy to note that the highest value of μkj (the j-th element of μk ) is reached
when k = j, as we can see from the red fractions in Table 10.1. If we estimate each μkj
with the cluster sample mean x̄kj , we can label the cluster associated with the greatest
x̄kj (j = 1, . . . , D) as cluster j. To label the remaining clusters, let us consider the set
of indices Ur :
Then, the cluster that maximizes μ·r and μ·h is labeled as k = crh . If multiple label
schemes occur, the estimation procedure is applied to every single label permutation
compatible with the observed structure. Given a data partition obtained with the above
method, an initialization for π is the percentage of data points allocated to each cluster:
$$\boldsymbol{\pi}^{(0)} = \left(\pi_{1}^{(0)}, \dots, \pi_{D^{*}}^{(0)}\right),$$

where $\pi_{k}^{(0)} = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_{ik}$ and $\hat{z}_{ik}$ is the sample version of $z_{ik}$, i.e. it is an indicator
that is equal to 1 if observation $x_{i}$ has been allocated to cluster $k$.
2) Given that

$$\frac{2\tau}{\alpha^{+} + 2\tau} =
\begin{cases}
\dfrac{\alpha_{r} + 2\tau}{\alpha^{+} + 2\tau} - \dfrac{\alpha_{r}}{\alpha^{+} + 2\tau}, & \text{if } r = 1, \dots, D \\[1.5ex]
\dfrac{\alpha_{l} + \tau}{\alpha^{+} + 2\tau} - \dfrac{\alpha_{l}}{\alpha^{+} + 2\tau} + \dfrac{\alpha_{w} + \tau}{\alpha^{+} + 2\tau} - \dfrac{\alpha_{w}}{\alpha^{+} + 2\tau}, & \text{if } r = D + 1, \dots, D^{*}
\end{cases}$$

where $l$ and $w$ are two indices such that $c_{lw} = r$ or $c_{wl} = r$, then the initialization of $\frac{2\tau}{\alpha^{+} + 2\tau}$ is the weighted mean of the $D^{*}$ quantities:

$$\begin{cases}
\bar{x}_{rr} - \dfrac{\alpha_{r}}{\alpha^{+} + 2\tau}, & \text{if } r = 1, \dots, D \\[1.5ex]
\bar{x}_{rl} - \dfrac{\alpha_{l}}{\alpha^{+} + 2\tau} + \bar{x}_{rw} - \dfrac{\alpha_{w}}{\alpha^{+} + 2\tau}, & \text{if } r = D + 1, \dots, D^{*}
\end{cases}$$

3) Similarly, the quantity $\alpha^{+} + 2\tau$ can be initialized through the $D^{*}$ cluster-wise quantities

$$\frac{1 - \sum_{j=1}^{D} \bar{x}_{rj}^{2}}{\sum_{j=1}^{D} s_{rj}^{2}} - 1, \qquad r = 1, \dots, D^{*}.$$
We can use π (0) as weights in steps 2 and 3. Table 10.2 reports the means of
500 initializations for α and τ . These initializations have been obtained with samples
of size 300 generated from a subset of the parameter configurations reported in
Table 10.3.
      | α1     | α2     | α3     | τ
True  | 10     | 10     | 10     | 10
Init. | 9.427  | 9.286  | 9.526  | 9.703
True  | 10     | 10     | 10     | 40
Init. | 9.706  | 9.397  | 9.874  | 37.745
True  | 2      | 23     | 12     | 17
Init. | 1.888  | 20.615 | 10.861 | 15.836
True  | 100    | 40     | 40     | 15
Init. | 92.450 | 37.019 | 37.261 | 13.209
True  | 10     | 100    | 14     | 8
Init. | 10.034 | 98.212 | 13.570 | 7.800
True  | 12     | 0.900  | 30     | 20
Init. | 12.478 | 0.892  | 32.036 | 21.476

Table 10.2. Mean of 500 initializations for α and τ, obtained from samples of
size 300 under a subset of the configurations in Table 10.3
ID | α1  | α2   | α3 | τ  | π1   | π2   | π3   | π4   | π5   | π6
1  | 10  | 10   | 10 | 15 | 0.11 | 0.11 | 0.11 | 0.22 | 0.22 | 0.22
2  | 10  | 10   | 10 | 40 | 0.11 | 0.11 | 0.11 | 0.22 | 0.22 | 0.22
3  | 2   | 23   | 12 | 17 | 0.08 | 0.16 | 0.18 | 0.10 | 0.40 | 0.08
4  | 40  | 20   | 30 | 25 | 0.00 | 0.16 | 0.26 | 0.40 | 0.00 | 0.18
5  | 40  | 20   | 30 | 50 | 0.00 | 0.16 | 0.26 | 0.40 | 0.00 | 0.18
6  | 100 | 40   | 40 | 15 | 0.22 | 0.17 | 0.15 | 0.15 | 0.10 | 0.20
7  | 40  | 20   | 30 | 18 | 0.00 | 0.00 | 0.00 | 0.30 | 0.19 | 0.51
8  | 10  | 100  | 14 | 8  | 0.10 | 0.15 | 0.15 | 0.10 | 0.40 | 0.10
9  | 12  | 0.90 | 30 | 20 | 0.08 | 0.16 | 0.18 | 0.10 | 0.40 | 0.08

Table 10.3. Parameter configurations used in the simulation studies
CEM+EM and SEM+EM consist of initializing the CEM/SEM with the proposed
procedure and then using their results as initial values for the standard EM. In this way,
we give the EM algorithm a chance to move away from a path of convergence to a
local maximizer. Table 10.4 reports the proportion of simulations (column “%”) where
each method provided the highest log-likelihood and the mean of the log-likelihoods
evaluated at the obtained final estimates. From these results, one can conclude that
the SEM + EM combination is the one providing the best values in most cases. The
presence of an EM step is fundamental: the CEM and the SEM are not able to find
the global maximizer by themselves (look at columns “%” for the CEM and SEM
methods).
   | EM             | CEM        | SEM        | CEM+EM         | SEM+EM
ID | %     Mean l̂   | %  Mean l̂  | %  Mean l̂  | %     Mean l̂   | %     Mean l̂
1  | 0.287 128.6434 | 0 104.5987 | 0 104.5987 | 0.330 128.6434 | 0.383 129.5926
2  | 0.005 191.0407 | 0 189.1948 | 0 189.1948 | 0.000 191.0407 | 0.995 191.0408
3  | 0.285 169.1768 | 0 166.2929 | 0 166.2929 | 0.278 169.1768 | 0.437 169.1559
4  | 0.280 220.7419 | 0 211.3169 | 0 211.3169 | 0.330 220.7419 | 0.390 220.7137
5  | 0.018 261.2633 | 0 251.4802 | 0 251.4802 | 0.028 261.2633 | 0.953 261.2633
6  | 0.358 307.6959 | 0 279.7145 | 0 279.7145 | 0.325 307.6959 | 0.317 307.6959
7  | 0.337 216.5530 | 0 184.9596 | 0 184.9596 | 0.447 218.6098 | 0.217 216.2881
8  | 0.318 344.0424 | 0 305.5102 | 0 305.5102 | 0.330 344.0424 | 0.352 344.0383
9  | 0.295 260.0346 | 0 258.3536 | 0 258.3536 | 0.270 260.0346 | 0.435 260.0347

Table 10.4. Proportion of simulations ("%") in which each method attained the
highest log-likelihood, and mean of the log-likelihoods at the final estimates
The second simulation study regards the evaluation of the performance of the
final estimation procedure, composed of the initialization followed by the SEM+EM
algorithm. For each configuration reported in Table 10.3, 1000 samples of size
n = 150 have been generated. For each of them, the parameters of the DFD
model have been estimated according to the initialization and estimation procedures
described in section 10.3.1. Table 10.5 reports the results of the simulation for two
particular scenarios (third and fourth ID, Figure 10.3). This table contains the true
value of the parameters, the mean of the 1000 estimates for each parameter and the
absolute relative bias (ARB, defined as the mean of the absolute differences between
the true parameter and its estimate, divided by the true value of the parameter).
Finally, we reported two quantities: the first one is the standard deviation of the 1000
estimates for each parameter, which can be thought of as the bootstrap approximation
of the standard error (SE) of the estimator and, therefore, it has been called “Boot.
SE”. The last quantity is the coverage of the approximated 95% confidence intervals
(CI, computed as θ̂ ± z.975 · SEBoot ), which is the percentage of times that the
approximated 95% CI contains the true value of the parameter. In general, it is reported
that estimating parameters of a finite mixture model through the EM algorithm can
encounter several issues, particularly when the sample size is small (McLachlan and
Peel 2004; Frühwirth-Schnatter 2006). In the considered simulations, the relatively
small sample size (fixed and equal to n = 150) seems to be large enough to produce
very good results. In most of the scenarios, the coverage level of approximated
confidence intervals is very close to the 95% nominal one.
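The coverage computation just described can be sketched as follows (a synthetic illustration with a normally distributed stand-in estimator; the true value, standard deviation and seed are our own choices, not values from the chapter):

```python
import random

random.seed(42)

theta_true = 2.0
n_rep = 1000

# Simulate 1000 estimates of theta (stand-in for the DFD parameter estimates).
estimates = [theta_true + random.gauss(0.0, 0.1) for _ in range(n_rep)]

# Bootstrap-style SE: standard deviation of the estimates.
mean_est = sum(estimates) / n_rep
se_boot = (sum((e - mean_est) ** 2 for e in estimates) / (n_rep - 1)) ** 0.5

# Coverage of the approximate 95% CI: theta_hat +/- z_.975 * SE_boot.
z = 1.959964
covered = sum(1 for e in estimates
              if e - z * se_boot <= theta_true <= e + z * se_boot)
coverage = covered / n_rep
assert 0.90 < coverage < 0.99   # close to the 95% nominal level
```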
Figure 10.3. Ternary diagrams with isodensity contour plot of the true
density function of scenarios ID = 3 (left) and ID = 4 (right). For a color
version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
As remarked in section 10.2.1, setting some null mixing weights could lead to
a very interesting and particular configuration of clusters’ barycenters. For example,
scenario 4 is characterized by two weights equal to zero (π1 = π5 = 0) and joining
the clusters’ means produces an oblique and rotated “L”. Scenario 3 has clusters very
close to one of the edges of the simplex; this means that many observations have at
least one component close to 0. This can be a problem in compositional data analysis
Since the simulation study presented here was aimed only at evaluating the
performances of the estimation procedure, future works will compare the DFD
distribution with other popular simplex distributions in terms of fit to real and
simulated data.
10.4. References
Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press,
London.
Barndorff-Nielsen, O.E. and Jørgensen, B. (1991). Some parametric models on the simplex.
Journal of Multivariate Analysis, 39(1), 106–116.
Biernacki, C., Celeux, G., Govaert, G. (2003). Choosing starting values for the EM algorithm
for getting the highest likelihood in multivariate Gaussian mixture models. Computational
Statistics & Data Analysis, 41, 561–575.
Celeux, G. and Govaert, G. (1992). A classification EM algorithm for clustering and
two stochastic versions. Computational Statistics & Data Analysis – Special Issue on
Optimization Techniques in Statistics, 14(3), 315–332.
Celeux, G., Chauveau, D., Diebolt, J. (1995). On stochastic versions of the EM algorithm on
stochastic versions of the EM algorithm. Technical report, INRIA.
Connor, R. and Mosimann, J.E. (1969). Concepts of independence for proportions with a
generalization of the Dirichlet distribution. Journal of the American Statistical Association,
64(325), 194–206.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society, Series B: Methodological,
39(1), 1–38.
Diebolt, J. and Ip, E.H.S. (1996). Stochastic EM: Method and application. Markov Chain Monte
Carlo in Practice, 259–273.
Favaro, S., Hadjicharalambous, G., Prünster, I. (2011). On a class of distributions on the
simplex. Journal of Statistical Planning and Inference, 141(9), 2987–3004.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer,
New York.
Gupta, R.D. and Richards, D.St.P. (1987). Multivariate Liouville distributions. Journal of
Multivariate Analysis, 23, 233–256.
McLachlan, G. and Peel, D. (2004). Finite Mixture Models. John Wiley & Sons, New York.
Migliorati, S., Ongaro, A., Monti, G.S. (2017). A structured Dirichlet mixture model for
compositional data: Inferential and applicative issues. Statistics and Computing, 27(4),
963–983.
O’Hagan, A., Murphy, T.B., Gormley, I.C. (2012). Computational aspects of fitting mixture
models via the expectation–maximization algorithm. Computational Statistics and Data
Analysis, 56(12), 3843–3864.
Ongaro, A. and Migliorati, S. (2013). A generalization of the Dirichlet distribution. Journal of
Multivariate Analysis, 114(1), 412–426.
Pawlowsky-Glahn, V. and Egozcue, J.J. (2002). BLU estimators and compositional data.
Mathematical Geology, 34(3), 259–274.
11
Quantization of Transformed
Lévy Measures
11.1. Introduction
Recently, there has been a sharp rise of interest in the study of Lévy processes. This
is because their applications are far-reaching. These processes have been applied in
various fields of research which include telecommunications, quantum theory, extreme
value theory, insurance and finance. Lévy processes can be defined as stochastic
processes that are stochastically continuous, with increments that are independent and
stationary. Moreover, it is possible to find a version of such a process that is almost
surely right continuous with left limits.
The relation between Lévy processes and the family of infinitely divisible
distributions has been studied extensively and has been well documented. In fact,
where γ is the drift term and G is a non-decreasing function of bounded variation such
that G(−∞) = 0 and G(∞) < ∞. Furthermore, we have that
$$\lim_{u \to 0} \left[\exp(itu) - 1 - \frac{itu}{1 + |u|^{2 - |u|^{\beta}}}\right] \frac{1 + |u|^{2 - |u|^{\beta}}}{|u|^{2 - |u|^{\beta}}} = \frac{-t^{2}}{2}. \qquad [11.2]$$
The parameter γ and the function G together completely determine the infinitely
divisible distribution, and thus, they allow us to identify a Lévy process. This
representation is similar to the so-called Lévy–Khintchine canonical representation
in which ϕ(t) can be expressed as follows:
$$\varphi(t) = \exp\left\{ i\delta t + \int_{\mathbb{R}} \left[\exp(itu) - 1 - \frac{itu}{1 + u^{2}}\right] \frac{1 + u^{2}}{u^{2}}\, dH(u) \right\}. \qquad [11.3]$$
Like G, the function H is also non-decreasing and with bounded variation such
that H(−∞) = 0 and H(∞) < ∞. Sant and Caruana (2017) also discuss the
relation between the functions H, G and the Lévy measure which features in the
Lévy–Khintchine representation and is usually denoted by v(.). In this chapter, we
will primarily focus our efforts on the estimation of the measure associated with
the function G and which is defined in [11.1]. In particular, we will assume that
G is continuous except at the origin. At this point, the function G and also the
function H both experience a jump. This jump is caused by the Brownian motion
component. The literature related to the parameter estimation of Lévy processes
is primarily divided into two approaches: the parametric and the non-parametric.
where Xij represents the j th increment within the ith time interval. Other authors,
which include Basawa and Brockwell (1982) and Gegler and Stadmuller (2010), also
proposed estimators for the function H. The former only considered non-decreasing
Lévy processes and proposed three estimators for the function H, the second of
which is identical to [11.4]. Moreover, these authors show that this estimator enjoys
asymptotic normality. Gegler and Stadmuller (2010) apply the estimator of Rubin and
Tucker only over the jump part of a Lévy process. As a result, their estimator cannot
be defined over intervals close to and including 0.
Sant and Caruana (2017) proposed an estimator for the function G which is defined
as follows:
$$\hat{G}(u) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n} \frac{|X_{ij}|^{2 - |X_{ij}|^{\beta}}}{1 + |X_{ij}|^{2 - |X_{ij}|^{\beta}}}\, 1_{\{X_{ij} \leq u\}}. \qquad [11.5]$$
The authors prove that [11.5] converges P-almost surely to G at all points of
continuity of the said function. Through a transformation of this estimator, which
is discussed in the said paper, an estimator for the function H was also obtained.
Simulations revealed that this proposed estimator converged faster than the Rubin and
Tucker estimator.
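A direct implementation of the estimator [11.5] is straightforward; in the sketch below the increments and the value of β are synthetic illustrative choices, not values from the chapter:

```python
# Estimator G_hat of [11.5] from observed increments x_ij, normalized by the
# number of time intervals N. Synthetic increments; beta = 1.0 is arbitrary.

def g_hat(u, increments, n_intervals, beta):
    total = 0.0
    for x in increments:          # flattened x_ij over all i, j
        if x <= u:
            a = abs(x) ** (2.0 - abs(x) ** beta)   # weight numerator
            total += a / (1.0 + a)
    return total / n_intervals

increments = [-0.8, -0.1, 0.05, 0.3, 1.2, -0.4, 0.7, 0.02]
N = 2  # number of time intervals

# G_hat is a non-decreasing step function of u, as an estimator of G must be.
grid = [-2.0, -0.5, 0.0, 0.5, 2.0]
values = [g_hat(u, increments, N, beta=1.0) for u in grid]
assert all(v1 <= v2 for v1, v2 in zip(values, values[1:]))
```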
The non-parametric estimators discussed above all make use of discrete measures.
However, they all have a significant disadvantage in that the points Xij and the
corresponding masses are not chosen in an optimal way. Indeed, the points Xij simply
correspond to the increments of an observed path of a Lévy process. Hence,
the main aim of this chapter is to propose an estimator of the measure Γ that can be
defined in terms of G as follows:
$$\Gamma((a, b]) = G(b) - G(a), \qquad [11.6]$$

where a, b ∈ ℝ. This estimator will make use of the theory of discrete measures where
the masses as well as their position are such that they minimize the objective function
of a stochastic program which is described in the next section.
The rest of this chapter is organized as follows: in section 11.2, we will introduce
the estimation strategy; in section 11.3, we will discuss some of the statistical
properties of the estimator of Γ; section 11.4 presents some simulation results; and
finally, section 11.5 contains some concluding remarks.
Let Ω be the set which contains all the possible paths of a Lévy process (Lt )t≥0 .
Given a specific path ω ∈ Ω, let {Xij (ω)}1≤i≤N,1≤j≤n denote a set of nN increments
obtained from ω. As before, N denotes the number of time intervals, while n
denotes the number of increments within each time interval. This double indexing
of the increments is normal within the so-called high-frequency setting. Using these
increments, we estimate the previously defined measure Γ that is associated with the
distribution G. The estimator of Γ, which we denote by Γ̂, is a random measure
supported on a finite number of points, and is of the form
$$\hat{\Gamma} = \sum_{k=1}^{P} \hat{m}_{k}\, 1_{\hat{y}_{k}}, \qquad [11.7]$$
where P ≤ nN and m̂k are the estimates of the masses mk associated with the points
yk . The estimates of the latter are denoted by ŷk .
Since we have assumed that G/c has a finite second moment, the optimal points yk
and their associated masses mk can be found by solving the following:
$$\inf\left\{ W\!\left( \sum_{k=1}^{P} \frac{m_{k}}{c}\, 1_{y_{k}},\ \frac{g(x)}{c}\, dx \right) \,\middle|\, m_{1}, \dots, m_{P} \geq 0,\ \sum_{k=1}^{P} m_{k} = c \right\}, \qquad [11.8]$$
From standard theory found in Graf and Luschgy (2000), Iacobelli (2015) and
Caglioti et al. (2016), it was shown that the objective function in [11.8] can be written
as follows:
$$\inf\left\{ W\!\left( \sum_{k=1}^{P} \frac{m_{k}}{c}\, 1_{y_{k}},\ \frac{g(x)}{c}\, dx \right) \,\middle|\, m_{1}, \dots, m_{P} \geq 0,\ \sum_{k=1}^{P} m_{k} = c \right\} = \mathrm{E}\left[ \min_{1 \leq k \leq P} |y_{k} - x| \right]. \qquad [11.9]$$
In [11.9], we note that the right-hand side (RHS), i.e. $\mathrm{E}\big[\min_{1 \leq k \leq P} |y_{k} - x|\big]$, is
indeed a stochastic program as discussed in various sources which include among
others (Shinji 1962; Shapiro et al. 2009; Sueishi and Nishiama 2005). In stochastic
programming, there are two main reformulations: the wait-and-see and the here-
and-now. The stochastic program just defined in [11.9] belongs to the here-and-now
reformulation. In the context of our problem, this stochastic program may be
re-written as follows:
$$\mathrm{E}\left[ \min_{1 \leq k \leq P} |y_{k} - x| \right] = \frac{1}{c} \int_{-\infty}^{\infty} \min_{1 \leq k \leq P} |y_{k} - x|\, g(x)\, dx. \qquad [11.10]$$
Moreover, in Theorem 7.5 in Graf and Luschgy (2000), it was shown that the
optimal set of P points yk that minimize the RHS of [11.9] has the property that as
In this case, r has to satisfy the property that $\int_{-\infty}^{\infty} |z|^{r+1} \frac{g(z)}{c}\, dz < \infty$. Since we
have assumed that G/c has a finite second moment, we can take r = 1. By considering
the RHS of [11.9], it can be shown from Iacobelli (2015, pp. 43 and 77) that
$$\frac{1}{c} \int_{-\infty}^{\infty} \min_{1 \leq k \leq P} |y_{k} - x|\, g(x)\, dx = \frac{1}{c} \sum_{k=1}^{P} \int_{y_{lk}}^{y_{uk}} |y_{k} - x|\, g(x)\, dx, \qquad [11.12]$$

where $y_{l1} = -\infty$, $y_{u1} = y_{l2}$, $y_{uP} = \infty$, $y_{uk} = \frac{y_{k} + y_{k+1}}{2}$ and $y_{lk} = \frac{y_{k-1} + y_{k}}{2}$.
From Iacobelli (2015), the optimal points yk can be found through [11.12]. Through
differentiation, this problem boils down to solving the following system of P equations
in P unknowns:

$$\frac{\partial}{\partial y_{k}} \left[ \frac{1}{c} \sum_{k=1}^{P} \int_{y_{lk}}^{y_{uk}} |y_{k} - x|\, g(x)\, dx \right] = 0, \qquad [11.13]$$
for k = 1, . . . , P . This can be re-written as follows:
$$2G(y_{k}) - G\!\left( \frac{y_{k} + y_{k-1}}{2} \right) - G\!\left( \frac{y_{k} + y_{k+1}}{2} \right) = 0, \qquad [11.14]$$
for k = 1, . . . , P . Given the positions yk , the best choice of the masses mk that
minimize [11.8] is explicit and is discussed in Lemmas 3.1 and 3.4 in Graf and
Luschgy (2000). Adapting the results of these lemmas to the context of this chapter,
we find that the value of mk is given as follows:
\[
m_k = \int_{y_{lk}}^{y_{uk}} g(z)\,dz = G(y_{uk}) - G(y_{lk}). \qquad [11.15]
\]
Furthermore, since we have assumed that G/c has a finite second moment, it was
shown in Lemma 6.1 in Graf and Luschgy (2000) that, given the points $y_k$ and the
corresponding weights $m_k$, the following holds:
\[
\lim_{P\to\infty} W\!\left(\sum_{k=1}^{P} \frac{m_k}{c}\, 1_{y_k},\ \frac{g(x)}{c}\,dx\right) = 0. \qquad [11.16]
\]
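The system [11.14], together with the masses [11.15], can be solved numerically by a Lloyd-type fixed-point iteration: hold the cell boundaries fixed at the midpoints, move each $y_k$ to the median of its cell (which is exactly the condition $2G(y_k) = G(y_{lk}) + G(y_{uk})$), and repeat. The sketch below is illustrative only and assumes, purely as an example, that G is the standard normal cdf; it is not code from the chapter.

```python
import numpy as np
from scipy.stats import norm

def lloyd_w1(P, n_iter=1000):
    # Fixed-point iteration for the stationarity system [11.14]:
    # boundaries are midpoints of neighboring points, and each y_k is
    # moved to the median of its cell, i.e. 2 G(y_k) = G(y_lk) + G(y_uk).
    y = norm.ppf((np.arange(P) + 0.5) / P)               # quantile initialization
    for _ in range(n_iter):
        mid = (y[:-1] + y[1:]) / 2
        lo = np.concatenate(([-np.inf], mid))            # y_lk
        hi = np.concatenate((mid, [np.inf]))             # y_uk
        y = norm.ppf((norm.cdf(lo) + norm.cdf(hi)) / 2)  # cell medians
    mid = (y[:-1] + y[1:]) / 2
    lo = np.concatenate(([-np.inf], mid))
    hi = np.concatenate((mid, [np.inf]))
    m = norm.cdf(hi) - norm.cdf(lo)                      # masses, as in [11.15]
    return y, m
```

By symmetry of the normal distribution, the resulting points are symmetric around zero and the masses sum to one.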
We start this section by considering the issue of estimating the optimal points yk .
In the previous section, we proposed to replace G by Ĝ. As a result, [11.14] can be
re-written as follows:
\[
2\hat G(y_k) - \hat G\!\left(\frac{y_k + y_{k-1}}{2}\right) - \hat G\!\left(\frac{y_k + y_{k+1}}{2}\right) = 0, \qquad [11.17]
\]
for k = 1, . . . , P . Once the ŷk ’s are computed, it is easy to estimate the masses. This
can be done by replacing G, ylk and yuk by Ĝ, ŷlk and ŷuk , respectively, in [11.15].
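The plug-in step just described (replace G by the empirical cdf $\hat G$ in [11.17], then read off the masses via [11.15]) can be sketched as follows. This is illustrative code, not from the chapter; the sample stands in for the data from which $\hat G$ is built.

```python
import numpy as np

def quantize_empirical(sample, P, n_iter=100):
    # Solve the plug-in system [11.17]: with G replaced by the empirical
    # cdf G_hat, each y_k becomes the G_hat-median of its cell, whose
    # boundaries are the midpoints of neighboring points.
    x = np.sort(np.asarray(sample, dtype=float))
    G = lambda t: np.searchsorted(x, t, side="right") / len(x)
    y = np.quantile(x, (np.arange(P) + 0.5) / P)     # initial grid
    for _ in range(n_iter):
        mid = (y[:-1] + y[1:]) / 2
        G_lo = np.concatenate(([0.0], G(mid)))
        G_hi = np.concatenate((G(mid), [1.0]))
        y = np.quantile(x, (G_lo + G_hi) / 2)        # empirical cell medians
    mid = (y[:-1] + y[1:]) / 2
    G_lo = np.concatenate(([0.0], G(mid)))
    G_hi = np.concatenate((G(mid), [1.0]))
    return y, G_hi - G_lo                            # masses, as in [11.15]
```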
Hence, the Wasserstein distance between the resulting discrete measure and $\frac{g(x)}{c}\,dx$
converges almost surely to 0. This step is not trivial and is presented in Theorem
11.4. However, in order to prove this theorem, we use a number of results presented
in Theorems 11.1, 11.2 and 11.3. We start by defining the following discrete random
measure:
\[
H^*(S) = \sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k}(S), \qquad [11.20]
\]
where m∗k = G(ŷuk ) − G(ŷlk ) and S ⊆ R. This random measure will be used in
certain proofs below.
We observe that (3.3) in [11.21] has already been discussed in [11.16]. Hence,
we proceed to consider (1.1) and (2.2) in [11.21]. However, before we consider these
expressions, we present Theorem 11.1 below. This result will be frequently used in
the following pages.
THEOREM 11.1.– If $\hat y_{uk}$, $\hat y_{lk}$ are continuity points of G for $1 \le k \le P$, then $|\hat m_k - m_k^*|$
converges almost surely to 0 as $n, N \to \infty$.

PROOF.– From Sant and Caruana (2017), we know that both $\hat G(\hat y_{uk}) - G(\hat y_{uk})$ and
$\hat G(\hat y_{lk}) - G(\hat y_{lk})$ converge almost surely to 0, since we know that $\hat y_{uk}$, $\hat y_{lk}$ are
continuity points of G.
Before we state Theorem 11.2, we recall that the Wasserstein distance between
two discrete measures $H^*$ and $\hat H$, both of which are defined on the atoms $\hat y_1, \dots, \hat y_P$,
can be expressed as follows:
\[
W(H^*, \hat H) = \inf_{q} \sum_{i=1}^{P} \sum_{j=1}^{P} q_{ij}\, |\hat y_i - \hat y_j|,
\]
where q is the space of all possible matrices which satisfy the following conditions:
\[
\sum_{i} q_{ij} = \frac{\hat m_j}{c} \ \text{for each } j, \qquad \sum_{j} q_{ij} = \frac{m_i^*}{c} \ \text{for each } i,
\]
where $q_{ij} \in [0, 1]$. For further details, we refer the interested reader to Nguyen
(2011).
THEOREM 11.2.– Given $P \le nN$ observations, if $\hat y_k$ is a point of continuity of
G, then $W\!\left(\sum_{k=1}^{P} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k}\right)$ converges almost surely to 0 as $n, N \to \infty$.
PROOF.– We show that this result is true for the case when we have two masses and
the case when we have three masses. We then move to the general case when we have
P masses. The general case follows in a similar way to the previous two cases.
In this case, we have the two measures $\sum_{k=1}^{2} \frac{\hat m_k}{c}\, 1_{\hat y_k}$ and $\sum_{k=1}^{2} \frac{m_k^*}{c}\, 1_{\hat y_k}$, and
\[
W\!\left(\sum_{k=1}^{2} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{2} \frac{m_k^*}{c}\, 1_{\hat y_k}\right) = \inf_{q} \big[(q_{12} + q_{21})\, |\hat y_1 - \hat y_2|\big], \qquad [11.23]
\]
where $q_{ij} \in [0, 1]$. Hence, the above can be solved by linear programming.
The optimal solution to this problem is $|\hat y_1 - \hat y_2|\, |m_1^* - \hat m_1|/c$. Hence,
\[
W\!\left(\sum_{k=1}^{2} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{2} \frac{m_k^*}{c}\, 1_{\hat y_k}\right) = \frac{|\hat y_1 - \hat y_2|}{c}\, |\hat m_1 - m_1^*|. \qquad [11.24]
\]
Moreover, we know, from Theorem 11.1 above, that |m̂1 − m∗1 | converges almost
surely to 0. Hence, the result follows.
\[
W\!\left(\sum_{k=1}^{3} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{3} \frac{m_k^*}{c}\, 1_{\hat y_k}\right) \le \inf_{q} \big[(q_{12} + q_{21})\, |\hat y_1 - \hat y_2| + (q_{13} + q_{31})\, |\hat y_1 - \hat y_3| + (q_{32} + q_{23})\, \rho_m\big],
\]
where $\rho_m = \max\{|\hat y_1 - \hat y_2|, |\hat y_1 - \hat y_3|, |\hat y_2 - \hat y_3|\} \ge 0$. The above inequality holds
because $q_{ij} \ge 0$ for all $i, j$. Moreover, the above elements of q must satisfy the following
constraints:
\[
\begin{aligned}
q_{11} + q_{21} + q_{31} &= m_1^*/c, &\qquad q_{11} + q_{12} + q_{13} &= \hat m_1/c,\\
q_{12} + q_{22} + q_{32} &= m_2^*/c, &\qquad q_{21} + q_{22} + q_{23} &= \hat m_2/c,\\
q_{13} + q_{23} + q_{33} &= m_3^*/c, &\qquad q_{31} + q_{32} + q_{33} &= \hat m_3/c.
\end{aligned}
\]
Through some algebraic manipulation using the above six constraints, it can be
shown that $q_{11}$ may be written as $q_{11} = \lambda_1(\hat m_1/c) + (1 - \lambda_1)(m_1^*/c)$, where
$\lambda_1 = (cq_{11} - m_1^*)/(\hat m_1 - m_1^*)$. Similarly, $q_{22} = \lambda_2(\hat m_2/c) + (1 - \lambda_2)(m_2^*/c)$ and
$q_{33} = \lambda_3(\hat m_3/c) + (1 - \lambda_3)(m_3^*/c)$, for $\lambda_2, \lambda_3 \in \mathbb{R}$, which can be defined in a way
similar to $\lambda_1$. Using these results, it follows that:
\[
W\!\left(\sum_{k=1}^{3} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{3} \frac{m_k^*}{c}\, 1_{\hat y_k}\right) \le \frac{2\rho_m}{c}\big(\lambda_1 |\hat m_1 - m_1^*| + \lambda_2 |\hat m_2 - m_2^*| + \lambda_3 |\hat m_3 - m_3^*|\big). \qquad [11.25]
\]
As before, using Theorem 11.1, we have that |m̂k − m∗k | converges almost surely
to 0. Hence, the result follows.
Note that the above inequality holds because $q_{ij} \ge 0$ for all $i, j$. Moreover, the above
elements of q must satisfy the following constraints:
\[
\sum_{i=1}^{P} q_{ij} = \frac{m_j^*}{c} \ \text{for each } j, \qquad \sum_{j=1}^{P} q_{ij} = \frac{\hat m_i}{c} \ \text{for each } i.
\]
From Theorem 11.1, we know that $|\hat m_k - m_k^*|$ converges almost surely to 0 for all
k as $n, N \to \infty$. Moreover, $\sum_{k=1}^{P} |\hat m_k - m_k^*|$ converges almost surely to 0
as $n, N \to \infty$. Hence, $W\!\left(\sum_{k=1}^{P} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k}\right)$ also converges almost surely to
0 as $n, N \to \infty$.
We next consider the term (2.2) in [11.21], i.e. $W\!\left(\sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k}{c}\, 1_{y_k}\right)$.
In this case, we note that both the positions of the masses and the masses themselves
are different, unlike in the previous case. Nevertheless, the following expression still
holds:
\[
W\!\left(\sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k}{c}\, 1_{y_k}\right) = \inf_{q} \sum_{i=1}^{P} \sum_{j=1}^{P} q_{ij}\, |\hat y_i - y_j|, \qquad [11.28]
\]
where 1 ≤ i, j ≤ P .
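As noted earlier for the two-mass case, the infimum in [11.28] is a linear program in the entries $q_{ij}$. A small numerical sketch, with c = 1 so that the masses are probabilities, is given below; this is illustrative code, not part of the chapter.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_discrete(y_hat, m_star, y, m):
    # [11.28]: infimum over couplings q of sum_{i,j} q_ij |y_hat_i - y_j|,
    # subject to row sums q_i. = m_star_i and column sums q_.j = m_j.
    P = len(y_hat)
    cost = np.abs(y_hat[:, None] - y[None, :]).ravel()
    A_eq = np.zeros((2 * P, P * P))
    for i in range(P):
        A_eq[i, i * P:(i + 1) * P] = 1.0   # row-sum constraints
        A_eq[P + i, i::P] = 1.0            # column-sum constraints
    b_eq = np.concatenate((m_star, m))
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun
```

For two atoms at the same positions, the value reduces to [11.24]: with atoms (0, 1) and masses (0.5, 0.5) versus (0.6, 0.4), the distance is $|\hat y_1 - \hat y_2|\,|\hat m_1 - m_1^*| = 0.1$.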
THEOREM 11.3.– If $\hat y_{uk}$ and $\hat y_{lk}$ are points of continuity of G, then
\[
W\!\left(\sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k}{c}\, 1_{y_k}\right) \qquad [11.29]
\]
converges almost surely to 0 as $n, N \to \infty$.
[11.30]
where, as before, $\rho_m = \max\{|y_i - y_j|,\ 1 \le i, j \le P,\ i \ne j\} \ge 0$.
Moreover,
\[
|m_k^* - m_k| = \big|(G(\hat y_{uk}) - G(\hat y_{lk})) - (G(y_{uk}) - G(y_{lk}))\big| \le |G(y_{uk}) - G(\hat y_{uk})| + |G(y_{lk}) - G(\hat y_{lk})|.
\]
Since the derivative of G/c is a density function, it is bounded, and thus G is
Lipschitz. This implies that the following inequalities hold:
\[
|G(y_{uk}) - G(\hat y_{uk})| \le \gamma\, |y_{uk} - \hat y_{uk}| \quad \text{and} \quad |G(y_{lk}) - G(\hat y_{lk})| \le \sigma\, |y_{lk} - \hat y_{lk}|,
\]
for some finite constants $\gamma$ and $\sigma$. Moreover, $|\hat y_{uk} - y_{uk}|$ and $|\hat y_{lk} - y_{lk}|$ both go
to zero almost surely, since $\hat G$ converges to G almost surely at the continuity points
$\hat y_{uk}$ and $\hat y_{lk}$. This implies that $W\!\left(\sum_{k=1}^{P} \frac{m_k^*}{c}\, 1_{\hat y_k},\ \sum_{k=1}^{P} \frac{m_k}{c}\, 1_{y_k}\right)$ converges almost surely to 0 as
$n, N \to \infty$.
THEOREM 11.4.– If $\hat y_{uk}$ and $\hat y_{lk}$ are points of continuity of G, then
\[
W\!\left(\sum_{k=1}^{P} \frac{\hat m_k}{c}\, 1_{\hat y_k},\ \frac{g(x)}{c}\,dx\right) \qquad [11.31]
\]
converges almost surely to 0 as $P, n, N \to \infty$.
PROOF.– To prove this result, we just have to combine Theorems 11.2 and 11.3 with [11.21] and [11.16].
[Figure 11.1. Comparison of Γ[0, u] and Γ̂[0, u] for the gamma process; horizontal axis u from 0 to 8.]
For the gamma process, we chose the shape and scale parameters to be both equal
to 1. In both cases, the number N of unit time intervals was chosen to be equal to 10.
Moreover, in each interval, we had a total of 1000 observations.
In Figure 11.1, we compare Γ[0, u] and Γ̂[0, u]. We can observe that with just
31 atoms, Γ̂[0, u] is a good estimator of Γ[0, u]. We next consider the Cauchy
process with location and scale parameters being equal to 1 and 0.05, respectively.
In this simulation, we took the same number of unit time intervals and the same
number of observations within each interval. In Figure 11.2, we compare Γ[−∞, u]
and Γ̂[−∞, u]. As in the previous simulation, we observe that with just 31 atoms,
Γ̂[−∞, u] is a good estimator of Γ[−∞, u].
[Figure 11.2. Comparison of Γ[−∞, u] and Γ̂[−∞, u] for the Cauchy process; horizontal axis u from −200 to 150.]
11.5. Conclusion
Simulations have shown that, with just 31 points, good estimates were obtained for
the gamma process and for the Cauchy process. Moreover, the diagrams reveal that
the points are not evenly spaced out: there are only a few masses where the curve
is relatively flat, in sharp contrast with other areas. The same cannot be said
about the other estimators, such as the Rubin and Tucker estimator discussed in Rubin
and Tucker (1959) and its variants discussed in Sant and Caruana (2017), where the
positions of the masses simply coincide with the sizes of the increments obtained from
an observed Lévy process.
11.6. References
Applebaum, D. (2004). Lévy Processes and Stochastic Calculus. Cambridge University Press.
Basawa, I. and Brockwell, P. (1982). Non-parametric estimation for non-decreasing Lévy
processes. Journal of the Royal Statistical Society, Series B (Methodological), 44, 262–269.
Bertoin, J. (1996). Lévy Processes. Cambridge University Press.
Caglioti, E., Golse, F., Iacobelli, M. (2016). Quantization of measures and gradient flows:
A perturbative approach in the 2-dimensional case. HAL.
Chan, N.H., Chen, S., Peng, L., Yu, C.L. (2009). Empirical likelihood methods based on
characteristic functions with applications to Lévy processes. Journal of the Royal Statistical
Society, Series B (Methodological), 104(448), 1612–1630.
Gegler, A. and Stadtmüller, U. (2010). Estimation of the characteristics of a Lévy process.
Journal of Statistical Planning and Inference, 140, 1481–1496.
Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions.
Springer-Verlag, Berlin, Heidelberg.
Heathcote, C. (1977). The integrated squared error estimation of parameters. Biometrika, 64,
255–264.
Iacobelli, M. (2015). Dynamics of large particle systems. PhD Thesis, University of Rome.
Kyprianou, A.E. (2006). Introductory Lectures on Fluctuations of Lévy Processes with
Applications. Springer, Berlin.
Nguyen, X. (2011). Wasserstein distance for discrete measures and convergence in
nonparametric mixture models. Technical Report 527, University of Michigan.
Rubin, H. and Tucker, H.G. (1959). Estimating the parameters of a differential process. Annals
of Mathematical Statistics, 30, 641–658.
Sant, L. and Caruana, M.A. (2012). Products of characteristic functions in Lévy processes
parameter estimation. SMTDA Conference Proceedings, Crete.
Sant, L. and Caruana, M.A. (2015a). Estimation of Lévy processes through stochastic
programming. In Stochastic Modelling, Data Analysis and Statistical Applications, 1st
edition, Filus, L., Oliviera, T., Skiadas, C.H. (eds). ISAST.
Sant, L. and Caruana, M.A. (2015b). Incorporating the stochastic process setup in parameter
estimation. Methodology and Computing in Applied Probability, 17(4), 1029–1037.
Sant, L. and Caruana, M.A. (2017). Choosing tuning instruments for generalized Rubin-Tucker
Lévy measure estimators. 17th ASMDA Conference Proceedings, London.
Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University
Press.
Shapiro, A., Dentcheva, D., Ruszczyński, A. (2009). Lectures on Stochastic Programming.
MPS-SIAM, Philadelphia.
12

A Flexible Mixture Regression Model for Bounded Multivariate Responses
Compositional data are defined as vectors with strictly positive elements subject to
a unit-sum constraint. The aim of this contribution is to propose a regression model for
multivariate continuous variables with bounded support by taking into consideration
the flexible Dirichlet (FD) distribution that can be interpreted as a special mixture of
Dirichlet distributions. The FD distribution is an extension of the Dirichlet one, which
is contained as an inner point, and it enables a greater variety of density shapes in
terms of tail behavior, asymmetry and multimodality. A convenient parameterization
of the FD is provided which is variation independent and facilitates the interpretation
of the mean vector of each mixture component as a piecewise increasing linear
function of the overall mean vector. A multivariate logit strategy is adopted to regress
the vector of means, which is itself constrained to add up to one, onto a vector
of covariates. Intensive simulation studies are performed to evaluate the fit of the
proposed regression model, particularly in comparison with the Dirichlet regression
model. Inferential issues are dealt with by a (Bayesian) Hamiltonian Monte Carlo
algorithm.
12.1. Introduction
Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
The rest of this chapter is organized as follows. Section 12.2 introduces the
Dirichlet and the FD distributions. Moreover, it shows a convenient parameterization
of the latter for regression purposes, and it describes the FDReg model for
compositional data. Section 12.3 provides details on a Bayesian approach to inference
suitable for the FDReg model. Section 12.4 illustrates several simulation studies that
have been performed to evaluate the behavior of the proposed regression model.
Finally, section 12.5 is devoted to our final comments.
\[
f_D(y;\, \alpha_1, \dots, \alpha_D) = \frac{1}{B(\alpha)} \prod_{j=1}^{D} y_j^{\alpha_j - 1}, \qquad [12.1]
\]
where $y \in \mathcal S^D$, $B(\alpha) = \prod_{j=1}^{D} \Gamma(\alpha_j) \big/ \Gamma\!\left(\sum_{j=1}^{D} \alpha_j\right)$ and $\alpha_1, \dots, \alpha_D > 0$. In a
regression perspective, the following mean-precision parameterization proves to be
convenient:
\[
\alpha^+ = \sum_{j=1}^{D} \alpha_j, \qquad \bar\alpha_j = \frac{\alpha_j}{\alpha^+}, \quad j = 1, \dots, D, \qquad [12.2]
\]
with $\sum_{j=1}^{D} \bar\alpha_j = 1$.
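As a quick computational companion to [12.1] and [12.2], the Dirichlet log-density and the mean-precision parameterization can be coded directly (a sketch, not from the chapter; the parameter values are illustrative):

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_logpdf(y, alpha):
    # log of [12.1]: -log B(alpha) + sum_j (alpha_j - 1) log y_j,
    # with B(alpha) = prod_j Gamma(alpha_j) / Gamma(sum_j alpha_j).
    y, alpha = np.asarray(y), np.asarray(alpha)
    log_B = gammaln(alpha).sum() - gammaln(alpha.sum())
    return -log_B + ((alpha - 1) * np.log(y)).sum()

# mean-precision parameterization [12.2] (illustrative alpha)
alpha = np.array([2.0, 3.0, 5.0])
a_plus = alpha.sum()        # precision alpha^+
a_bar = alpha / a_plus      # mean vector, which sums to 1
```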
\[
f_{FD}(y;\, \alpha_1, \dots, \alpha_D,\, p_1, \dots, p_D,\, \tau) = \sum_{h=1}^{D} p_h\, f_D(y;\, \alpha + \tau e_h), \qquad [12.3]
\]
Let us define $w = \frac{\tau}{\alpha^+ + \tau}$; then the mean of the j-th component of vector Y is:
\[
\mu_j = E(Y_j) = \sum_{h=1}^{D} p_h \lambda_{hj} = \bar\alpha_j (1 - w) + p_j w, \qquad [12.5]
\]
where $\bar\alpha_j = \alpha_j/\alpha^+$ for $j = 1, \dots, D$. It is worth noting that $0 < \bar\alpha_j < 1$, from
which, after some algebra, it follows that $0 < w < \min\!\left\{1, \min_j \frac{\mu_j}{p_j}\right\}$. Thus,
the normalized version of w, denoted by $0 < w^* < 1$, takes the form:
\[
w^* = \frac{w}{\min\!\left\{1, \min_j \frac{\mu_j}{p_j}\right\}}.
\]
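The identity [12.5] and the constraint on w can be verified numerically by sampling from the mixture [12.3]; the parameter values below are illustrative and not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative FD parameters
alpha = np.array([2.0, 3.0, 5.0])
p = np.array([0.2, 0.5, 0.3])
tau = 4.0

a_plus = alpha.sum()                  # precision alpha^+
a_bar = alpha / a_plus                # normalized mean part
w = tau / (a_plus + tau)
mu = a_bar * (1 - w) + p * w          # mean vector, as in [12.5]

# Monte Carlo check: draw from the mixture of Dirichlets in [12.3]
n = 100_000
h = rng.choice(3, size=n, p=p)        # mixture component labels
y = np.empty((n, 3))
for k in range(3):
    idx = h == k
    y[idx] = rng.dirichlet(alpha + tau * np.eye(3)[k], size=idx.sum())
```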
Let us now focus on regression issues. To such an end, let us consider a response
vector yi = (yi1 , . . . , yiD ) on the simplex and a corresponding vector of covariates
xi = (xi0 , xi1 , . . . , xik ) for subject i = 1, . . . , n. Furthermore, let us assume that
Yi is FD distributed and that we aim to regress its mean vector onto covariates. Since
μi belongs to the simplex too, a GLM-type regression model (McCullagh and Nelder
1989) for the mean has to take into account the constraints of positivity and unit-sum.
In this regard, we take advantage of a multinomial logit strategy defining:
\[
\log\left(\frac{\mu_{ij}}{\mu_{iD}}\right) = \mathbf x_i^{\top} \beta_j, \qquad j = 1, \dots, D-1, \qquad [12.6]
\]
\[
L(\eta\,|\,y) = \prod_{i=1}^{n} f^*_{FD}(y_i;\, \beta_1, \dots, \beta_{D-1},\, \alpha^+,\, p,\, w^*),
\]
where $f^*_{FD}(\cdot)$ is the density function of the FD distribution under the new parameterization,
depending on the vector of unknown parameters $\eta = (\beta_1, \dots, \beta_{D-1}, \alpha^+, p, w^*)$.
With respect to the prior choice, we favor non- or weakly informative priors with the
purpose of inducing a minimum impact on the posteriors. Since the parametric space
is variation independent, we may further assume prior independence.
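The link [12.6] can be inverted in closed form: set the linear predictor of the baseline category D to zero and apply a softmax, which returns a mean vector on the simplex. A sketch (illustrative, not the authors' code):

```python
import numpy as np

def mu_from_covariates(x, beta):
    # Inverse of the multinomial logit link [12.6]:
    # log(mu_j / mu_D) = x' beta_j for j = 1, ..., D-1, baseline category D.
    # x: covariate vector of shape (k+1,); beta: matrix of shape (k+1, D-1).
    eta = np.concatenate((np.asarray(x) @ np.asarray(beta), [0.0]))
    e = np.exp(eta - eta.max())        # numerically stable softmax
    return e / e.sum()
```

With all coefficients equal to zero, the link returns the uniform mean vector (1/D, ..., 1/D).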
As a first baseline scenario (i), we consider a data generating process that follows
a DirReg model. The sample size is n = 250, and the mean vector of the Dirichlet
distributed multivariate response with D=3 is regressed (see equation [12.7]) onto a
quantitative covariate x uniformly distributed in (−0.5, 0.5). Regression coefficients
are set equal to β10 = 1, β11 = 2, β20 = 0.5, β21 = −3, and the precision parameter
is α+ = 50. From Table 12.1, it emerges that both models provide accurate estimates
of regression coefficients and of the precision parameter. Thus, the fitted regression
lines of the DirReg and FDReg models are almost entirely overlapping. Furthermore,
it is worth noting that the true values of all parameters are included in the CIs of both
models. The FDReg model fits the Dirichlet structure of data very well by estimating
three equally weighted mixture components, i.e. pj ≈ 1/3 for j = 1, 2, 3; moreover,
the estimate of the parameter w∗ , which measures the distance between component
means, is near zero.
A fully Bayesian criterion that balances the goodness of fit of a model against
its complexity, and that is properly defined for mixture models, is the widely applicable
information criterion (WAIC) (the lower, the better) (Vehtari et al. 2017). Note
that the WAIC values, obtained as averages over the 500 replications, are similar for
the two models (see Table 12.1), suggesting that both adapt well to the data, although
the Dirichlet is the favored one, being the "true" data generating model.
These simulation results suggest that the FDReg model, despite guaranteeing
greater flexibility and a richer parameterization than the DirReg model, is also capable
of accommodating simpler scenarios without the risk of over-fitting or of
penalization due to its higher number of parameters.
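WAIC is computed from the matrix of pointwise log-likelihood values over posterior draws. A minimal sketch of the lppd-minus-penalty form in Vehtari et al. (2017); this is illustrative code, not the authors' implementation:

```python
import numpy as np

def waic(log_lik):
    # log_lik: array of shape (S, n), S posterior draws x n observations.
    # WAIC = -2 * (lppd - p_waic); lower values indicate a better fit.
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```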
\[
y \oplus \delta = \mathcal C\{y_1 \cdot \delta_1, \dots, y_D \cdot \delta_D\} \in \mathcal S^D, \qquad [12.8]
\]
where $\mathcal C\{\cdot\}$ is the closure operation, $\mathcal C\{q\} = \left(\frac{q_1}{\sum_{j=1}^{D} q_j}, \dots, \frac{q_D}{\sum_{j=1}^{D} q_j}\right)$, with
$q_j > 0$ for all $j = 1, \dots, D$. The vector resulting from the perturbation operation in [12.8]
lies on the simplex as well. The neutral element of the perturbation operation is
$(1/D, \dots, 1/D)$, and so if element $y_j$ is perturbed by a $\delta_j$ greater (lower) than $1/D$,
the perturbation is upward (downward).
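The closure and perturbation operations in [12.8] are straightforward to implement (an illustrative sketch):

```python
import numpy as np

def closure(q):
    # C{q}: rescale a positive vector so that it lies on the simplex
    q = np.asarray(q, dtype=float)
    return q / q.sum()

def perturb(y, delta):
    # y (+) delta = C{y_1 * delta_1, ..., y_D * delta_D}, as in [12.8]
    return closure(np.asarray(y) * np.asarray(delta))
```

Perturbing by the neutral element (1/D, ..., 1/D) leaves y unchanged, while a component of delta above 1/D pushes the corresponding component of y upward.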
In all scenarios (see Tables 12.2, 12.3 and 12.4), the FDReg model provides a
better fit (lower WAIC value) than the DirReg. Nevertheless, neither model produces
robust estimates of the regression coefficients of the mean vector. For example, in
scenario (iii), the 25 randomly selected yi2 are perturbed upward and, as a result,
both the DirReg and the FDReg regression curves are flattened, with estimated
regression coefficients β̂11 and β̂21 closer to zero than the true value. Therefore, it
is necessary to deepen the analysis to understand the reason why the FDReg model
provides a better fit than the DirReg model despite not determining an increase in
point estimate robustness. It is worth noting that the mixture structure of the FDReg
model provides the required flexibility to cluster the response values into outlying and
non-outlying values. Indeed, in all scenarios, one component of the mixture is
dedicated to describing the majority of observations (about 90%), another component
is dedicated to the 10% of outlying values, and the remaining component accounts for
a residual share of fewer than 1% of the observations. The parameter $w^*$, which measures the
distance between the component means, is high (about 0.6) in all scenarios.
Figure 12.1. Clockwise from top-left panel: ternary plots at baseline (i) and in case
of perturbations (ii), (iii) and (iv) of one simulated sample. The perturbed response
values are in light blue. For a color version of this figure, see www.iste.co.uk/dimotikalis/
analysis2.zip
Figure 12.2. Logarithm of CPO values for the DirReg (left panels) and FDReg (right
panels) models in scenarios (ii) (top panel), (iii) (middle panel) and (iv) (bottom panel).
CPO values associated with perturbed response values are in red. For a color version
of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
The FDReg model provides the best fit to data (see WAIC values in Table 12.5)
thanks to the flexibility of the FD distribution to describe bimodal shapes (see
the ternary plot in the top-left panel of Figure 12.3). The estimates of regression
coefficients are similar across models; nevertheless, the superiority of the FDReg
model emerges from the analysis of the behavior of the component means. Figure 12.3
shows, clockwise from top-right panel, the scatterplots of the quantitative covariate x
with respect to each element of the composition yij , j = 1, 2, 3. The fitted regression
curves of the DirReg (solid lines) and FDReg (dashed lines) models are quite similar.
Nevertheless, the FDReg has the component means λh (dotted curves) as an additional
element of flexibility since they are capable of adapting to the clusters induced by the
mixture structure of the data generating process. Please note that two dotted curves are
represented in each scatterplot since the j-th element, for $j = 1, 2, 3$, of $\lambda_j$ is equal
to $\frac{\alpha_j + \tau}{\alpha^+ + \tau}$, while the j-th elements of the remaining $\lambda_h$, for $h \ne j$, are equal to each
other and equal to $\frac{\alpha_j}{\alpha^+ + \tau}$.
Figure 12.3. Top-left panel: ternary plot of one simulated sample from simulation
study 2. Scatterplots of x versus yi1 (top-right panel), yi2 (bottom-left panel) and
yi3 (bottom-right panel). Fitted regression curves for the mean vector of the DirReg
model (solid black lines) and the FDReg model (dashed lines). In dotted lines, the
regression curves λh of the FDReg model. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
Last, we simulate sample data with n = 250 from an FDReg model. The mean
vector is regressed onto a quantitative covariate in a similar way to that specified
in baseline scenario (i) of simulation study 1. Additional parameters of the FD
distribution are α+ = 100, w∗ = 0.6 and p = (1/3, 1/3, 1/3) .
The FDReg model ensures the best fit to data and by far the lowest WAIC
measure (see Table 12.6). The DirReg model provides acceptable estimates of the
regression coefficients since all CIs, apart from the one of β21 , contain the true value.
Nonetheless, it completely fails to grasp the mixture structure of data as is clear from
the graphical representations in Figure 12.4. The λh component means of the FDReg
model perfectly describe the clusters within each element of the composition (dotted
lines). Conversely, the mean elements of the DirReg (solid lines) turn out to be almost
outside the point clouds. The only element of flexibility available to the DirReg model
lies in the modulation of the precision parameter. Indeed, the estimate of the precision
parameter α+ of the DirReg is highly biased downward to induce high variability and
allow for describing the separated clusters.
Figure 12.4. Top-left panel: ternary plot of one simulated sample from simulation
study 3. Scatterplots of x versus yi1 (top-right panel), yi2 (bottom-left panel) and
yi3 (bottom-right panel). Fitted regression curves for the mean vector of the DirReg
model (solid black lines) and the FDReg model (dashed lines). In dotted lines, the
regression curves λh of the FDReg model. For a color version of this figure, see
www.iste.co.uk/dimotikalis/analysis2.zip
                 FDReg                          DirReg
          Mean       CI                   Mean       CI
β10       0.365      (0.241; 0.477)       0.350      (0.311; 0.388)
β20       0.509      (0.322; 0.679)       0.461      (0.426; 0.498)
β11       1.745      (1.524; 1.953)       2.008      (1.861; 2.147)
β21      -2.540      (-3.119; -2.008)    -2.559      (-2.702; -2.438)
α+       21.780      (19.916; 24.311)    11.569      (10.798; 12.385)
p1        0.628      (0.589; 0.673)
p2        0.203      (0.023; 0.387)
p3        0.169      (0.009; 0.340)
w*        0.506      (0.455; 0.558)
WAIC  -1067.373                         -893.853
                     FDReg                          DirReg
              Mean       CI                   Mean       CI
β10 = 1       0.973      (0.840; 1.103)       0.998      (0.821; 1.177)
β20 = 0.5     0.480      (0.388; 0.581)       0.329      (0.092; 0.543)
β11 = 2      -1.964      (-2.104; -1.817)    -1.912      (-2.165; -1.676)
β21 = −3     -2.961      (-3.106; -2.823)    -2.622      (-2.966; -2.275)
α+ = 100     91.653      (83.674; 96.691)     9.425      (8.996; 9.876)
p1 = 1/3      0.324      (0.259; 0.385)
p2 = 1/3      0.331      (0.299; 0.367)
p3 = 1/3      0.345      (0.304; 0.382)
w* = 0.6      0.601      (0.574; 0.626)
WAIC      -1489.116                         -818.006
12.5. Discussion
The FDReg proves to be a flexible model for compositional data. In addition to its
good theoretical properties, we show its adaptability to several scenarios. If data come
from a simpler model, such as the DirReg, it provides adequate fit without the risk of
over-fitting. Conversely, if data have a clear bimodal structure, the DirReg performs
poorly, while the FDReg adapts well thanks to its mixture structure. Although not
designed as a model to cope with outliers, the FDReg is shown to adapt to a variety of
perturbation schemes that induce artificial outlying observations. It provides a better
fit and a lower sensitivity (meaning that perturbed observations are less influential)
than the DirReg model. Moreover, the FDReg is computationally very tractable and it
has runtimes similar to the DirReg ones despite its greater complexity. It follows that
the FDReg should be the preferable model in the presence of a possible bimodality or
of influential observations as well as in the absence of a clear mixture structure.
The main limitation of the FDReg lies in the (possibly rigid) assumption that the
component means have to be equally far away from each other and with an equal
distance proportional to w∗ . Possible extensions in this direction will be addressed in
future works.
12.6. References
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall,
London.
Campbell, G. and Mosimann, J.E. (1987). Multivariate analysis of size and shape: Modelling
with the Dirichlet distribution. ASA Proceedings of Section on Statistical Graphics, 93–101.
Di Brisco, A.M. and Migliorati, S. (2017). A special Dirichlet mixture model for multivariate
bounded responses. Cladag 2017 Book of Short Papers. Universitas Studiorum, Mantova.
Duane, S., Kennedy, A., Pendleton, B.J., and Roweth, D. (1987). Hybrid Monte Carlo. Physics
Letters B, 195(2), 216–222.
Filzmoser, P. and Hron, K. (2008). Outlier detection for compositional data using robust
methods. Mathematical Geosciences, 40(3), 233–248.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Science
& Business Media, New York.
Gelfand, A.E. and Dey, D.K. (1994). Bayesian model choice: Asymptotics and exact
calculations. Journal of the Royal Statistical Society: Series B (Methodological), 56(3),
501–514.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2014). Bayesian Data Analysis 2. Taylor
& Francis, New York.
Gueorguieva, R., Rosenheck, R., and Zelterman, D. (2008). Dirichlet component regression
and its applications to psychiatric data. Computational Statistics & Data Analysis, 52(12),
5344–5355.
Hijazi, R.H. (2003). Analysis of Compositional Data Using Dirichlet Covariate Models.
The American University, Washington, DC, USA.
Maier, M.J. (2014). DirichletReg: Dirichlet regression for compositional data in R. Report,
Department of Statistics and Mathematics, University of Economics and Business, Vienna.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models 37. CRC Press, Boca Raton.
Migliorati, S., Ongaro, A., and Monti, G.S. (2017). A structured Dirichlet mixture model
for compositional data: Inferential and applicative issues. Statistics and Computing, 27(4),
963–983.
Migliorati, S., Di Brisco, A.M., and Ongaro, A. (2018). A new regression model for bounded
responses. Bayesian Analysis, 13(3), 845–872.
Neal, R.M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm.
Journal of Computational Physics, 111(1), 194–203.
Ongaro, A. and Migliorati, S. (2013). A generalization of the Dirichlet distribution. Journal of
Multivariate Analysis, 114, 412–426.
Pawlowsky-Glahn, V., Egozcue, J.J., and Tolosana-Delgado, R. (2015). Modeling and Analysis
of Compositional Data. John Wiley & Sons, New York.
Stan Development Team (2016). Stan Modeling Language Users Guide and Reference Manual.
CreateSpace Independent Publishing Platform, Scotts Valley, CA.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using
leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
13

On Asymptotic Structure of the Critical Galton–Watson Branching Processes with Infinite Variance and Allowing Immigration
13.1. Introduction
initial state is empty, that is, $X_0 = 0$, and the process starts due to immigrants.
Each individual at time n produces j progeny with probability $p_j$, independently
of the others, so that $p_0 > 0$. Simultaneously, i immigrants arrive in the population
with probability $h_i$ at each moment $n \in \mathbb N$. These individuals undergo further
transformation, obeying the reproduction law $\{p_j\}$, and the n-step transition probabilities
$p_{ij}^{(n)} := P\big\{X_{n+k} = j \mid X_k = i\big\}$ for any $k \in \mathbb N$ are given by
\[
\sum_{j \in \mathcal S} p_{ij}^{(n)} s^j = \big(f_n(s)\big)^i \prod_{k=0}^{n-1} h\big(f_k(s)\big), \qquad [13.1]
\]
where $f_n(s)$ is the n-fold iteration of the PGF f(s); see, for example, Pakes (1979). Note
that the function $f_n(s)$ generates the distribution law of the number of individuals at the
time n in the process without immigration (see section 13.2). Thus, the transition
probabilities $p_{ij}^{(n)}$ are completely defined by the probabilities $p_j$ and $h_j$.
Classification of the states of the chain $\{X_n\}$ is one of the fundamental problems in the
theory of GWPI. Direct differentiation of [13.1] gives
\[
E\big[X_n \mid X_0 = i\big] =
\begin{cases}
an + i, & \text{when } m = 1,\\[4pt]
\left(\dfrac{a}{m-1} + i\right) m^n - \dfrac{a}{m-1}, & \text{when } m \ne 1,
\end{cases}
\]
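The two branches of the mean formula agree with the one-step recursion $E[X_{n+1} \mid X_0 = i] = m\,E[X_n \mid X_0 = i] + a$, which offers a quick numerical check (a sketch, not from the chapter):

```python
def gwpi_mean(i, n, m, a):
    # Closed form for E[X_n | X_0 = i] in a Galton-Watson process with
    # immigration: offspring mean m, immigration mean a.
    if m == 1.0:
        return a * n + i
    return (a / (m - 1) + i) * m ** n - a / (m - 1)

def gwpi_mean_recursive(i, n, m, a):
    # one-step recursion E[X_{k+1}] = m * E[X_k] + a
    e = float(i)
    for _ in range(n):
        e = m * e + a
    return e
```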
$\Gamma^{-1}(\lambda)\, x^{\lambda - 1} e^{-x}$, provided that $0 < \lambda < \infty$, where $x > 0$ and $\Gamma(\cdot)$ is Euler's gamma
function. This result was also established by Pakes (1971a) without reference to
Seneta. Afterwards, Pakes (1975, 1979) obtained principally new results for all
cases $m < \infty$ and $b = \infty$.
Throughout this chapter, we keep to the critical case only, with $b = \infty$. Our
reasoning will be bound up with elements of slow variation theory in the sense
of Karamata; see Seneta (1972). We recall that a real-valued, positive and
measurable function L(x) is said to be slowly varying (SV) at infinity if $L(\lambda x)/L(x) \to 1$
as $x \to \infty$ for each $\lambda > 0$. For more information, see Seneta (1972), Asmussen (1983)
and Bingham (1987).
Consider the PGF $f_n(s) := E\big[s^{Z_n} \mid Z_0 = 1\big]$ and write $R_n(s) := 1 - f_n(s)$. Evidently,
$Q_n := R_n(0)$ is the survival probability of the process. By using Slack's arguments
(1968), we can show that if the condition $[f_\nu]$ holds, then
\[
Q_n^{\nu}\, L\!\left(\frac{1}{Q_n}\right) \sim \frac{1}{\nu n} \quad \text{as } n \to \infty. \qquad [13.2]
\]
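Relation [13.2] can be checked numerically in the classical example $f(s) = s + \frac{(1-s)^{1+\nu}}{1+\nu}$, a standard illustration not taken from the chapter, for which $L \equiv \frac{1}{1+\nu}$ is constant and [13.2] gives $Q_n \sim \left(\frac{1+\nu}{\nu n}\right)^{1/\nu}$. Iterating $R_{n+1} = 1 - f(1 - R_n) = R_n - \frac{R_n^{1+\nu}}{1+\nu}$:

```python
nu = 0.5
n = 200_000

# Q_n = 1 - f_n(0); iterate R_{k+1} = R_k - R_k^(1+nu)/(1+nu) from R_0 = 1
R = 1.0
for _ in range(n):
    R -= R ** (1 + nu) / (1 + nu)

predicted = ((1 + nu) / (nu * n)) ** (1 / nu)   # asymptote from [13.2]
ratio = R / predicted
```

For this n, the ratio is already within a fraction of a percent of 1, with the slowly decaying logarithmic correction predicted by Lemma 13.5.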
Slack (1968) has also shown that
\[
U_n(s) := \frac{f_n(s) - f_n(0)}{f_n(0) - f_{n-1}(0)} \longrightarrow U(s) \qquad [13.3]
\]
for $s \in [0, 1)$, where the limit function U(s) satisfies the Abel equation,
so that U(s) is the PGF of an invariant measure for the GW process $\{Z_n\}$. Combining
$[f_\nu]$, [13.2] and [13.3] and considering the properties of the process $\{Z_n\}$, we have
\[
U_n(s) \sim \widetilde U_n(s) := 1 - \nu n\, \frac{R_n(s)}{Q_n} \quad \text{as } n \to \infty.
\]
So we have proved the following lemma. Furthermore, writing
\[
\Lambda(y) := \frac{f(1-y) - (1-y)}{y} = y^{\nu}\, L\!\left(\frac{1}{y}\right),
\]
we establish the following important assertion.
$f(s) \le \psi(s) \le 1$;
The statements of the last lemma will play an important role in the proof of
Theorem 13.1.
where $\kappa_n(s) = O(1/n)$ uniformly in $s \in [0, 1)$.
REMARK 13.2.– Along with all applications, the second statement of Lemma 13.2,
in combination with formula [13.9], provides an opportunity to find an asymptotic
representation of the transition probability $P_{11}(n) := P\big\{Z_n = 1 \mid Z_0 = 1\big\}$ as $n \to \infty$,
since $f_n'(0) = P_{11}(n)$. In fact, we obtain
\[
P_{11}(n) \sim \frac{\psi(0)}{p_0} \cdot \frac{N(n)}{(\nu n)^{1 + 1/\nu}} \quad \text{as } n \to \infty,
\]
Now, using [13.5]–[13.9] and considering Lemma 13.2, we track down an explicit
form of the PGF U(s) and the asymptotics of its derivative.
In this section, we consider the GWPI. First, we recall the following theorem,
which was proved by Pakes (1975).
then
$$p_{00}^{(n)} \sim K_1 \exp\left\{ \int_{0}^{f_n(0)} \frac{\ln h(y)}{f(y) - y}\, dy \right\} \quad \text{as } n \to \infty.$$
From this point on, we consider the case where the immigration PGF h(s)
has the following form:
$$1 - h(s) = (1 - s)^{\delta}\, \ell\!\left(\frac{1}{1-s}\right), \qquad [h_\delta]$$
where $0 < \delta < 1$ and $\ell(x)$ is SV at infinity. The assumption $[h_\delta]$ implies that the
mean of the immigration distribution is infinite, i.e. $\sum_{j \in \mathcal{S}} j h_j = \infty$, but
$\sum_{j \in \mathcal{S}} j^{\delta} h_j < \infty$.
Our results are obtained provided that conditions $[f_\nu]$ and $[h_\delta]$ hold and $\delta > \nu$. As has
been shown in Pakes (1975), in this case $\mathcal{S}$ is ergodic. Namely, we improve the statements
of Theorem P1. Here, we put forward an additional requirement concerning L(x) and
$\ell(x)$. Since L(x) is SV, we can write
$$\frac{L(\lambda x)}{L(x)} = 1 + \alpha(x) \qquad [L_\alpha]$$
for each $\lambda > 0$, where $\alpha(x) \to 0$ as $x \to \infty$. Henceforth, we suppose that some positive
function g(x) is given so that $g(x) \to 0$ and $\alpha(x) = o(g(x))$ as $x \to \infty$. In this case,
L(x) is called SV with remainder $\alpha(x)$; see Bingham (1987, p. 185, condition SR3).
Wherever we exploit the condition $[L_\alpha]$, we will suppose that
$$\alpha(x) = o\!\left(\frac{L(x)}{x^{\nu}}\right) \quad \text{as } x \to \infty. \qquad [13.10]$$
Moreover, of necessity, we suppose the analogous condition
$$\frac{\ell(\lambda x)}{\ell(x)} = 1 + \beta(x) \qquad [\beta]$$
for each $\lambda > 0$, where $\beta(x) \to 0$ as $x \to \infty$.
Since $f_n(s) \uparrow 1$ for all $s \in [0, 1)$ by virtue of [13.1], it is sufficient to consider the
case i = 0 as $n \to \infty$. Denote
$$P_n(s) := P_n^{(0)}(s).$$
The next result follows directly from Theorem 13.2 by setting x = 0 there.
Further, we need the following result, which is an improved analog of the basic
lemma of the theory of critical GW processes.
LEMMA 13.5 (Imomov and Tukhtaev 2019).– Let conditions $[f_\nu]$, $[L_\alpha]$ and [13.10]
hold. Then
$$\frac{1}{\Lambda\!\left(R_n(s)\right)} - \frac{1}{\Lambda(1 - s)} = \nu n + \frac{1 + \nu}{2} \cdot \ln\!\bigl(1 + \nu n \Lambda(1 - s)\bigr) + \rho_n(s),$$
where $\rho_n(s) = o(\ln n) + \sigma_n(s)$ and $\sigma_n(s)$ is bounded uniformly in $s \in [0, 1)$ and
converges to a limit $\sigma(s)$ as $n \to \infty$, which is a bounded function for $s \in [0, 1)$.
We make sure that under the conditions of the second part of Theorem 13.2, the PGF $P_n(s)$
converges to a limit $\pi(s)$, which we write as the power series
$$\pi(s) = \sum_{j \in \mathcal{S}} \pi_j s^j.$$
Now, using Lemma 13.5, we can establish the rate of this convergence in the
following theorem.
THEOREM 13.3.– Let conditions $[f_\nu]$ and $[h_\delta]$ hold and $\delta > \nu$. Then, $P_n(s)$ converges
to $\pi(s)$, which generates the invariant measures $\{\pi_j\}$ for the GWPI. The convergence is
uniform over compact subsets of the open unit disc. If, in addition, the conditions $[L_\alpha]$,
[13.10] and $[\beta]$ are fulfilled, then
$$P_n(s) = \pi(s)\left[1 + \Delta_n(s)\, \mathcal{N}_{\delta}\!\left(\frac{1}{R_n(s)}\right)\right],$$
where
$$\Delta_n(s) = \left[\frac{1}{\delta - \nu} \cdot \frac{1}{\nu_n(s)^{\delta/\nu - 1}} - \frac{1 + \nu}{2\nu} \cdot \frac{\ln \nu_n(s)}{\nu_n(s)^{\delta/\nu}}\right]\bigl(1 + o(1)\bigr).$$
REMARK 13.3.– An analogous result to Theorem 13.2 was proved in
Imomov (2015) for $\delta = 1$ and $f''(1-) < \infty$.
13.4. Conclusion
In this chapter, we have considered the model of the evolution of the population
size of homogeneous individuals, called the branching process allowing immigration.
The main goal of the work is to study the asymptotic properties of the process
trajectory, in terms of transition probabilities, under minimal moment
conditions.
In the monograph (Harris 1963, pp. 29–31), part of the treatment of gene fixation
was interpreted in terms of invariant (stationary) measures. We hope that the properties
of the invariant measures of the simple GW branching process established in
Theorem 13.1, and the asymptotic formulas for transition probabilities for the GWPI
(Theorem 13.2 and Theorem 13.3), showing approximations to invariant measures,
can be useful in theoretical aspects of applied problems similar to those described in
Harris (1963).
13.5. References
We will examine some properties of the extreme points of the joint eigenvalue
probability density function of the Wishart matrix, using properties of the
Vandermonde determinant, and show examples of applications of these properties.
14.1. Introduction
Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
The aim of this chapter is to illustrate the significance of the extreme points of
the Vandermonde matrix, in order to optimize its condition number as a measure of
sensitivity and stability of the given system. This will be discussed in section 14.5,
but first we give a brief outline of the background of the problem setup, based on
polynomial regression models and the close relation between the Vandermonde matrix
and the random Wishart matrix.
14.2. Background
For convenience, we will use the following notation for the Vandermonde matrix
of size $N \times N$:
$$X = V_N(\mathbf{x}) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_N \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{N-1} & x_2^{N-1} & \cdots & x_N^{N-1} \end{bmatrix}.$$
Here, $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ are N distinct data points or nodes. Note that the
Vandermonde matrix has a simple expression for its determinant, given by
$$v_N(\mathbf{x}) \equiv |X| = \prod_{1 \le i < j \le N} (x_j - x_i). \qquad [14.1]$$
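The product formula [14.1] is easy to verify numerically on a small example; the node values below are an arbitrary illustrative choice.

```python
import numpy as np

# Check [14.1]: with rows 1, x, x^2, ..., det X equals the product of
# pairwise differences prod_{i<j} (x_j - x_i).
x = np.array([0.3, -1.2, 2.0, 0.9])
N = len(x)
X = np.vander(x, N, increasing=True).T   # row i holds the powers x**i

prod_formula = np.prod([x[j] - x[i] for i in range(N) for j in range(i + 1, N)])
print(np.isclose(np.linalg.det(X), prod_formula))  # True
```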
Since the entries of the Vandermonde matrix are monomials of the form
$x_i^{j-1}$, $i, j = 1, \ldots, N$, the Wishart or moment matrix $W = X^{\top}X$
has entries that are power sums of the nodes, $w_{jk} = \sum_{i=1}^{N} x_i^{j+k-2}$, which can
also be taken as entries of a Hankel matrix (Ljung et al. 2012). Applying the
usual Newton–Girard symmetric functions (Abramowitz and Stegun 1965; Macdonald
1979), the matrices X and W can be decomposed, and their inverses,
eigenvalues, determinants, matrix norms and condition numbers evaluated directly; this also explains
some characteristic properties of the extreme points of their determinants and other
applications. By extreme points, we refer to those points of the Vandermonde matrix
that maximize its determinant (as fully discussed in Muhumuza et al. (2018a) and
Lundengård and Silvestrov (2013)). Knowing the extreme values of the Vandermonde
determinant can assist us in estimating the condition number of the Vandermonde
matrix and the Wishart matrix. In sections 14.3 and 14.4, we describe how some
properties of the Vandermonde matrix and Wishart matrix relate to one another, and in
section 14.5 we apply these relations when computing the condition number of the
two types of matrices.
and
$$H(t) = \frac{1}{E(t)} = \sum_{\tau \ge 0} h_{\tau}(x_0, x_1, \ldots, x_N)\, t^{\tau} = \prod_{i=0}^{N} (1 - x_i t)^{-1},$$
where $H(t)E(t) = 1$, so that $\sum_{\tau=0}^{m} (-1)^{m-\tau} e_{m-\tau}(\mathbf{x})\, h_{\tau}(\mathbf{x}) = 0$ for $m \ge 1$.
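The relation $H(t)E(t) = 1$ gives a recurrence for the complete homogeneous symmetric polynomials in terms of the elementary ones, which the following sketch checks on three sample nodes (the node values are illustrative).

```python
from itertools import combinations
from math import prod

x = [1.0, 2.0, 3.0]  # sample nodes

def e(k, xs):
    """Elementary symmetric polynomial e_k(xs)."""
    if k == 0:
        return 1.0
    return sum(prod(c) for c in combinations(xs, k))

def h(k, xs):
    """Complete homogeneous symmetric polynomial h_k via the recurrence
    implied by H(t)E(t) = 1: h_k = -sum_{r=1}^{k} (-1)^r e_r h_{k-r}."""
    if k == 0:
        return 1.0
    return -sum((-1) ** r * e(r, xs) * h(k - r, xs) for r in range(1, k + 1))

# Identity: sum_{tau=0}^{m} (-1)^(m-tau) e_{m-tau} h_tau = 0 for m >= 1
m = 3
val = sum((-1) ** (m - t) * e(m - t, x) * h(t, x) for t in range(m + 1))
print(abs(val) < 1e-9)  # True
```

For instance, $h_2(1,2,3) = 25$, the sum of all degree-2 monomials in three variables.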
N
Let X = VN (x) = xij i,j=1 , be an N × N where x = (x1 , x2 , . . . , xN ) are
pairwise distinct points. Setting PN [x] to be a vector space of polynomials in x over
the field R of degree, at most, N , then we define the sets B1 = {1, x, x2 , . . . , xN }
and B2 = {[x]0 , [x]1 , . . . , [x]N }; both form the bases of PN [x], where [x]k = (x −
x0 )(x − x1 ) . . . (x − xk ) for all 1 ≤ k ≤ N and [x]0 = 1. Thus, the entries of the
Vandermonde matrix X can be expressed in terms of symmetric functions; the lemma
below is taken from Oruç and Phillips (2000).
LEMMA 14.1.– The entries of the Vandermonde matrix $X = V_N(\mathbf{x}) = \left(x_i^j\right)_{i,j=0}^{N}$
can be expressed in the form
$$x_i^j = \sum_{k=0}^{i} h_{i-k}(x_0, x_1, \ldots, x_k)\,[x_j]_k, \qquad i, j = 0, 1, \ldots, N.$$
Recall that the Newton interpolating polynomial $P_N(x)$ for a function f(x) at
distinct points $x_0, x_1, \ldots, x_N$ can be expressed in the form (Muhumuza et al. 2018a)
$$P_N(x) = \sum_{j=0}^{N} f[x_0, x_1, \ldots, x_j]\,[x]_j,$$
where $f[x_0, x_1, \ldots, x_j]$ is the jth divided difference of f(x) with respect to the points $x_0, x_1, \ldots, x_j$;
when f(x) is a polynomial of degree at most N, then $P_N(x) = f(x)$. This forms
the basis for the LU factorization of the Vandermonde matrix and its inverse, as fully
discussed in Gautschi (1981) and Gautschi and Inglese (1987). The LU can be directly
transformed to the LDU decomposition, as discussed in Oruç and Phillips (2000) and
Oruç and Akmaz (2004):
$$l_{ij} = \prod_{k=0}^{j-1} \frac{x_i - x_{j-k-1}}{x_j - x_{j-k-1}}, \qquad 1 \le j < i \le N, \qquad [14.6]$$
$$d_{ij} = \prod_{k=0}^{j-1} (x_i - x_{j-k-1}), \qquad i = j. \qquad [14.7]$$
A detailed discussion and proofs of some of the results above can
be found in Oruç and Phillips (2000, 2007) and Oruç and Akmaz (2004).
Based on the above results of LDU factorization and applying the general
properties of matrices, we directly evaluate the determinant of the Vandermonde
matrix X, as expressed in the following theorem.
$$\det(X) = \prod_{j=1}^{N} d_{jj} = \prod_{j=1}^{N} \prod_{k=0}^{j-1} (x_j - x_{j-k-1}) = \prod_{1 \le k < j \le N} (x_j - x_k). \qquad [14.9]$$
Since
$$d_{jj} = \prod_{k=0}^{j-1} (x_j - x_{j-k-1}) \quad \text{for all } j,$$
and $\det(D) = \prod_{j=1}^{N} d_{jj}$, we can write
$$\det(X) = \det(D) = \prod_{j=1}^{N} d_{jj} = \prod_{j=1}^{N} \prod_{k=0}^{j-1} (x_j - x_{j-k-1}) = \prod_{1 \le i < j \le N} (x_j - x_i).$$
The above matrix decomposition techniques can also be directly applied to the
Wishart matrix W, since $W = X^{\top}X$ (a detailed discussion of this can be found
in Yang and Qiao (2003) and Yang (2005, 2007a, b)). In the following sections, we
will use this result to show an interesting relation between the extreme points of the
Vandermonde determinant and the condition number of the Wishart matrix.
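The key consequence of $W = X^{\top}X$ used below is $\det(W) = \det(X)^2$, which ties the Wishart determinant back to the Vandermonde one; a quick numerical check (with illustrative nodes):

```python
import numpy as np

# Since W = X^T X, det(W) = det(X)^2.
x = np.array([0.5, -0.7, 1.3])
X = np.vander(x, len(x), increasing=True).T  # Vandermonde matrix, rows 1, x, x^2
W = X.T @ X                                  # Wishart (moment) matrix
print(np.isclose(np.linalg.det(W), np.linalg.det(X) ** 2))  # True
```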
Matrix norms and spectral norms are of great importance in giving bounds for the
spectrum of a matrix (von Neumann et al. 1963; Gautschi 1990; Muhumuza et al.
2019).
where $\rho(\cdot)$ denotes the spectral radius, $\operatorname{tr}(\cdot)$ is the trace, $\|\cdot\|_2$ is the natural $L_2$-norm
and $\|\cdot\|_F$ is the Frobenius norm,
$$\|A\|_F = \left(\sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2\right)^{\frac{1}{2}}.$$
Using this important definition, we can express the matrix norms of the Wishart
matrix. We will also use the following lemma from Muhumuza et al. (2018b).
LEMMA 14.2.– For any symmetric $n \times n$ matrix A with eigenvalues $\{\lambda_i,\ i = 1, \ldots, n\}$
that are all distinct, and any polynomial P:
$$\sum_{k=1}^{n} P(\lambda_k) = \operatorname{tr}\left(P(A)\right).$$
and thus $P(\lambda)$ is an eigenvalue of $P(A)$. For any matrix A, the sum of the eigenvalues
is equal to the trace of the matrix,
$$\sum_{k=1}^{n} \lambda_k = \operatorname{tr}(A),$$
when multiplicities are taken into account. For the matrices considered in
Lemma 14.2, all eigenvalues are distinct. Thus, applying this property to the matrix
$P(A)$ gives the desired statement.
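Lemma 14.2 is easy to check numerically for a small symmetric matrix and a concrete polynomial; the matrix and $P(t) = t^2 + 3t + 1$ below are illustrative choices.

```python
import numpy as np

# Check: sum_k P(lambda_k) = tr(P(A)) for symmetric A with distinct eigenvalues.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
P = lambda t: t**2 + 3*t + 1
eigs = np.linalg.eigvalsh(A)
lhs = sum(P(lam) for lam in eigs)
rhs = np.trace(A @ A + 3 * A + np.eye(2))  # P applied to the matrix A
print(np.isclose(lhs, rhs))  # True
```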
THEOREM 14.3.– Let $X = V_N(\mathbf{x}) = \left(x_i^j\right)_{i,j=0}^{N}$ be a Vandermonde matrix and
$W = X^{\top}X$ be the Wishart matrix, where W is diagonalizable. Then, the matrix
norm of W can be expressed as
$$\|W\|_F^2 = \sum_{j=1}^{N} \eta_j^4 = \operatorname{tr}(W^2) \qquad [14.11]$$
and
$$\|W^{-1}\|_F^2 = \det(X)^{-4} \cdot \sum_{j=1}^{N} \left( \prod_{\substack{k=1 \\ k \ne j}}^{N} \eta_k^4 \right). \qquad [14.12]$$
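Both identities can be verified numerically, with the $\eta_j$ obtained as the singular values of X (the nodes below are an illustrative choice):

```python
import numpy as np

# Check [14.11] and [14.12] for a small Vandermonde matrix.
x = np.array([0.4, -1.1, 1.7])
X = np.vander(x, len(x), increasing=True).T
W = X.T @ X
eta = np.linalg.svd(X, compute_uv=False)   # singular values eta_j of X

# [14.11]: ||W||_F^2 = sum eta_j^4 = tr(W^2)
print(np.isclose(np.sum(eta**4), np.trace(W @ W)))              # True

# [14.12]: ||W^{-1}||_F^2 = det(X)^{-4} * sum_j prod_{k != j} eta_k^4
detX4 = np.prod(eta) ** 4                   # |det X|^4 = (prod eta_k)^4
rhs = sum(np.prod(np.delete(eta, j) ** 4) for j in range(len(eta))) / detX4
print(np.isclose(np.linalg.norm(np.linalg.inv(W), 'fro') ** 2, rhs))  # True
```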
$$X = U D V^{\top}. \qquad [14.13]$$
$$X^{\top}X = V D U^{\top} U D V^{\top} = V D^2 V^{\top}, \qquad U^{\top}U = I.$$
Since V is an orthogonal matrix, each column must have unit length; in other
words, $\sum_{j=1}^{N} v_{ji}^2 = 1$. Thus, we can express the norm as follows:
$$\|X\|_F^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} x_{ij}^2 = \sum_{j=1}^{N} \eta_j^2 \sum_{i=1}^{N} v_{ji}^2 = \sum_{j=1}^{N} \eta_j^2.$$
Since $W = X^{\top}X$, applying the SVD [14.13], following the steps above and using the
fact that $\|W\|_F = \|X^{\top}X\|_F \le \|X^{\top}\|_F \|X\|_F$, we have
$$\|W\|_F^2 = \|V D^4 V^{\top}\|_F^2 = \sum_{j=1}^{N} \eta_j^4 \sum_{i=1}^{N} |v_{ji}|^2 = \sum_{j=1}^{N} \eta_j^4, \qquad [14.15]$$
so that
$$\|W\|_F = \sqrt{\sum_{j=1}^{N} \eta_j^4} = \sqrt{\operatorname{tr}(W^2)}.$$
Next, we will prove the expression for the norm of $W^{-1}$. Since U is a unitary
matrix,
$$\operatorname{tr}(D^{-4}) = \sum_{j=1}^{N} \eta_j^{-4} = \det(X)^{-4} \sum_{j=1}^{N} \tilde{\eta}_j^4, \qquad \text{where } \tilde{\eta}_j = \prod_{\substack{k=1 \\ k \ne j}}^{N} \eta_k.$$
Since D is a diagonal matrix, $\det(D) = \prod_{j=1}^{N} \eta_j$, and the expression [14.12] follows.
The concept of the matrix norm is closely related to the condition number that we
directly apply to the case of the Vandermonde matrix and the Wishart matrix using
extreme points, as discussed in the next section.
The concept of the condition number for the Vandermonde matrix has previously
been investigated in connection with the stability of polynomial interpolation and least
squares, as discussed in Cheney (1966), Beckermann and Labahn (2000), Dunkl and
Xu (2001), Beckermann et al. (2007), Kaltofen et al. (2012), von Neumann et al.
(1963), Smale (1985), Demmel (1988, 1997), Dubiner (1991), Brutman (1997),
Calvi and Levenberg (2008) and Bos et al. (2010a, 2010b).
Our aim in this section is to illustrate the significance of the extreme points of
the Vandermonde matrix in order to optimize its condition number as a measure of
sensitivity and stability of the given system. Condition numbers can explain the best
possible accuracy of the solution of, say, a linear system Ax = b in the presence
of approximations made by the computation (Bos et al. 2010a, 2010b). The condition
number can also bound the rate of convergence of iterative methods, measure the distance
of an instance to singularity and/or shed light on preconditioning (Hoel 1958; Golub
and Van Loan 1996; Higham 2002).
Since the condition number of a matrix is expressed in terms of the norm of the
matrix and the norm of its inverse, we can express the condition number in
terms of the matrix trace and determinant. We demonstrate this for the case of the
Vandermonde matrix and the Wishart matrix.
THEOREM 14.4.– Let $X = V_N(\mathbf{x}) = \left(x_i^j\right)_{i,j=0}^{N}$ be a Vandermonde matrix, and
$W = X^{\top}X$ be the Wishart matrix. Then, the condition number $\kappa(W)$ of W
can be minimized by maximizing the determinant of X.
PROOF.– Applying Theorem 14.3 from the previous section, together with the ideas of the
Vandermonde determinant, we can express the matrix norms in [14.11] and
[14.12]; since X is the Vandermonde matrix, it follows that
$$\|X\|_F^2 = \operatorname{tr}(X^{\top}X) = \sum_{j=1}^{N} \eta_j^2, \qquad \|X^{-1}\|_F^2 = \det(X)^{-2} \cdot \sum_{j=1}^{N} \tilde{\eta}_j^2,$$
where $\tilde{\eta}_j = \prod_{\substack{k=1 \\ k \ne j}}^{N} \eta_k$, so that $\eta_j \tilde{\eta}_j = \prod_{k=1}^{N} \eta_k = \det(X)$.
Since the product term in the denominator, which happens to be the determinant
of the Vandermonde matrix, is a dominant term compared to the partial sums in
the numerator, then the value of the condition number, κ(X), can be minimized by
maximizing the Vandermonde determinant.
Similarly, since the Wishart matrix is $W = X^{\top}X$, its condition number $\kappa(W)$
follows immediately from the facts that
$$\|W\|_F^2 = \sum_{j=1}^{N} \eta_j^4$$
and
$$\|W^{-1}\|_F^2 = \det(X)^{-4} \cdot \sum_{j=1}^{N} \tilde{\eta}_j^4.$$
Table 14.1. Different points on the three-dimensional sphere and the squares of the
eigenvalues of the corresponding Vandermonde matrix. Here, $p_{max}$ is a point that
maximizes the Vandermonde determinant and $\lambda_i = \eta_i^2$, where $\eta_i$ is the ith eigenvalue
of the Vandermonde matrix
To illustrate the above results, we chose a few points on the sphere and computed
the condition number using [14.17] and [14.18], as well as from the definition. The
results are shown in Tables 14.1 and 14.2. From these tables, we note that the
condition number of both the Vandermonde matrix, $\kappa(X)$, and the Wishart matrix,
$\kappa(W)$, with respect to the Frobenius norm, is highly dependent on and sensitive to the
Vandermonde determinant $|X|$. That is, $\kappa(X) \propto |X|^{-1}$ and $\kappa(W) \propto |X|^{-2}$, or simply
$\kappa(W) \propto \kappa(X)^2$. Therefore, the extreme points that maximize the Vandermonde
determinant minimize the condition number. These points are often referred to
as Fekete points, as discussed in Muhumuza et al. (2018a). In the next section, we
demonstrate that these extreme points are actually the eigenvalues of the Wishart
matrix, whose joint eigenvalue probability density function is a Gaussian ensemble,
as discussed in Muhumuza et al. (2018b).
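The qualitative relation can be reproduced with two node sets of equal Euclidean norm (i.e. two points on the unit sphere in $\mathbb{R}^3$): a spread-out set close to the determinant maximizer and a clustered one. The specific coordinates below are illustrative, not the chapter's $p_i$ points.

```python
import numpy as np

def stats(x):
    """Return |det X|, kappa(X) and kappa(W) in the Frobenius norm."""
    X = np.vander(x, len(x), increasing=True).T
    W = X.T @ X
    return (abs(np.linalg.det(X)),
            np.linalg.cond(X, 'fro'),
            np.linalg.cond(W, 'fro'))

spread = np.array([-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)])  # unit norm, well spread
clustered = np.array([0.1, 0.2, np.sqrt(1 - 0.05)])        # unit norm, clustered

d1, kx1, kw1 = stats(spread)
d2, kx2, kw2 = stats(clustered)
# Larger Vandermonde determinant -> smaller condition numbers.
print(d1 > d2 and kx1 < kx2 and kw1 < kw2)  # True
```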
                  κ(X)F                          κ(W)F
        |X|       Using [14.17]  Using definition  Using [14.18]  Using definition
pmax    0.7071    6.000          6.000             23.74          23.74
p1      0.4958    8.898          8.898             58.56          58.56
p2      0.3958    12.81          14.97             119.7          163.7
p3      0.1360    23.23          9.226             422.7          66.66
p4      0.1094    23.30          16.53             476.8          240.0
p5      0.0739    37.24          37.24             1257           1257
pmin    0.05198   85.14          85.14             5999           5999
14.6. Conclusion
We have also been able to illustrate that the extreme points of the Vandermonde
determinant are indeed related to the eigenvalues of the Wishart matrix, and these extreme
points have a joint eigenvalue density function that is a Gaussian ensemble. These
points, which are also zeros of classical orthogonal polynomials, provide the most
stable and economical interpolating points.
Our future plan is to apply these results to optimal control theory, especially in
finance and high-dimensional data analysis.
14.7. Acknowledgments
14.8. References
Forrester, P.J. (2010). Log-Gases and Random Matrices. London Mathematical Society
Monographs, Princeton University Press, London.
Gautschi, W. (1981). A survey of Gauss–Christoffel quadrature formulae. EB Christoffel,
Birkhäuser, Basel.
Gautschi, W. (1990). How (un)stable are Vandermonde systems? Asymptotic and Computational
Analysis, 124, 193–210.
Gautschi, W. and Inglese, G. (1987). Lower bounds for the condition number of Vandermonde
matrices. Numerische Mathematik, 52(3), 241–250.
Golub, G.H. and Van Loan, C.F. (1996). Matrix Computations, 3rd edition. Johns Hopkins
University Press, Baltimore.
Higham, N.J. (2002). Accuracy and Stability of Numerical Algorithms, 2nd edition. SIAM,
Philadelphia.
Hoel, P.G. (1958). Efficiency problems in polynomial estimation. The Annals of Mathematical
Statistics, 29(4), 1134–1145.
Kaltofen, E.L., Lee, W.S., Yang, Z. (2012). Fast estimates of Hankel matrix condition numbers
and numeric sparse interpolation. Proceedings of the 2011 International Workshop on
Symbolic-Numeric Computation, 130–136.
Karlin, S. and Studden, W.J. (1966). Tchebycheff Systems: With Applications in Analysis and
Statistics. Interscience Publishers, John Wiley & Sons, New York.
König, W. (2005). Orthogonal polynomial ensembles in probability theory. Probability Surveys,
2, 385–447.
Ljung, L., Pflug, G., Harro, W. (2012). Applied Stochastic Approximation and Optimization of
Random Systems 17. Birkhäuser, Basel.
Lundengård, K., Österberg, J., Silvestrov, S. (2013). Extreme points of the Vandermonde
determinant on the sphere and some limits involving the generalized Vandermonde
determinant. arXiv, eprint arXiv:1312.6193.
Macdonald, I.G. (1979). Symmetric Functions and Hall Polynomials. Oxford University Press,
Oxford.
Martinez, J.J. and Pena, J.M. (1998). Factorization of Cauchy–Vandermonde matrices. Linear
Algebra Appl., 284, 229–237.
Mehta, M.L. (1967). Random Matrices and the Statistical Theory of Energy Levels. Academic
Press, New York, London.
Muhumuza, A.K., Lundengård, K., Österberg, J., Silvestrov, S., Mango, J.M., Kakuba,
G. (2018a). The generalized Vandermonde interpolation polynomial based on divided
differences. In Proceedings of the SMTDA2018 Conference, Skiadas, C.H. (ed.). Crete.
Muhumuza, A.K., Lundengård, K., Österberg, J., Silvestrov, S., Mango, J.M., Kakuba,
G. (2018b). The multivariate Wishart distribution based on generalized Vandermonde
determinant. Submission, Methodology and Computing in Applied Probability, IWAP2018
Conference.
Properties of the Extreme Points of the Joint Eigenvalue Probability Density Function 209
Muhumuza, A.K., Lundengård, K., Österberg, J., Silvestrov, S., Mango, J.M., Kakuba, G.
(2019). Notes on the extreme points of the Vandermonde determinant on surfaces implicitly
determined by a univariate polynomial. In Algebraic Structures and Applications. SPAS2017,
Västerås and Stockholm, Sweden, October 4 – 6, 2017, Silvestrov, S., Malyarenko, A.,
Rančić, M. (eds). Springer, Cham.
von Neumann, J., Taub, A.W., Taub, A.H. (1963). The Collected Works of John von Neumann:
6-Volume Set. Reader’s Digest Young Families, Pleasantville, New York.
Oruç, H. and Akmaz, H.K. (2004). Symmetric functions and the Vandermonde matrix.
J. Comput. Appl. Math., 172, 49–64.
Oruç, H. and Phillips, G.M. (2000). Explicit factorization of the Vandermonde matrix. Linear
Algebra Appl., 315, 113–123.
Oruç, H. and Phillips, G.M. (2007). LU factorization of the Vandermonde matrix and its
applications. Appl. Math. Lett., 20, 892–897.
Phillips, G.M. (2003). Interpolation and Approximation by Polynomials. Springer-Verlag,
New York.
Smale, S. (1985). On the efficiency of algorithms of analysis. Bull. New. Ser. Am. Math. Soc.,
13(2), 87–121.
Spivey, M.Z. and Zimmer, A.M. (2008). Symmetric polynomials, Pascal matrices, and Stirling
matrices. Linear Algebra Appl., 428(4), 1127–1134.
Szegő, G. (1939). Orthogonal Polynomials. American Mathematics Society, Rhode Island.
Tang, W.P. and Golub, G.H. (1981). The block decomposition of a Vandermonde matrix and its
applications. BIT, 21, 505–517.
Wigner, E.P. (1951). On the statistical distribution of the widths and spacings of nuclear
resonance levels. Mathematical Proceedings of the Cambridge Philosophical Society, 47(4).
Yang, S.-L. (2005). On the LU factorization of the Vandermonde matrix. Discrete Appl. Math.,
146(2), 102–105.
Yang, S.-L. (2007a). On a connection between the Pascal, Stirling and Vandermonde matrices.
Discrete Appl. Math., 155(2), 2025–2030.
Yang, S.-L. (2007b). Generalized Leibniz functional matrices and factorization of some
well-known matrices. Linear Algebra Appl., 430(1), 511–531.
Yang, S.-L. and Qiao, Z.-K. (2003). Stirling matrix and its property. Int. J. Appl. Math., 14(2),
145–157.
15

Forecast Uncertainty of the Weighted TAR Predictor
15.1. Introduction
In time series analysis, forecast generation is often confined to point
forecasts, which undoubtedly have high relevance from the empirical point of view but do
not give any information on the uncertainty of the predictor. In fact, point forecasts
are usually evaluated using indices that show how "far" the predicted value
is from the observed data or, in other cases, how the predictor makes it possible
to obtain forecasts that are more (or less) accurate than those obtained from other
predictors, but no information is given on their likely accuracy1.
In more detail, we consider the PI of a new predictor proposed for a widely
known nonlinear time series model, the Threshold AutoRegressive (TAR) model (see Tong
1978, 1983, 1990), whose structure is briefly presented here.
The generation of point forecasts from this class of models has been largely
investigated and different predictors have been proposed in the literature (see, for
example, Clements and Smith (1997), Clements et al. (2003) and Boero and Marrocu
(2004)).
In this context, a new proposal has been given in Niglio (2019), which introduces a
predictor based on a weighted average of past observations, whose weights are
obtained from the minimization of the Mean Square Forecast Error (MSFE).
In more detail, starting from model [15.1], in the following we consider the SETAR
parametrization with two regimes (k = 2), which we shortly denote as SETAR(2; p, p):
$$X_t = \left(\phi_0^{(1)} + \phi_1^{(1)} X_{t-1} + \ldots + \phi_p^{(1)} X_{t-p}\right) I_{\{X_{t-d} \in R_1\}} + \left(\phi_0^{(2)} + \phi_1^{(2)} X_{t-1} + \ldots + \phi_p^{(2)} X_{t-p}\right) \left(1 - I_{\{X_{t-d} \in R_1\}}\right) + \epsilon_t, \qquad [15.2]$$
which can be written in compact form as
$$X_t = \mathbb{I}_{t-d}^{\top} \Phi\, \mathbf{X}_{t-1} + \epsilon_t, \qquad [15.3]$$
where
$$\mathbb{I}_{t-d} = \begin{pmatrix} I_{\{X_{t-d} \in R_1\}} \\ 1 - I_{\{X_{t-d} \in R_1\}} \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \phi_0^{(1)} & \phi_1^{(1)} & \cdots & \phi_p^{(1)} \\ \phi_0^{(2)} & \phi_1^{(2)} & \cdots & \phi_p^{(2)} \end{pmatrix}, \qquad \mathbf{X}_{t-1} = \begin{pmatrix} 1 \\ X_{t-1} \\ X_{t-2} \\ \vdots \\ X_{t-p} \end{pmatrix},$$
and $\mathbb{I}_{t-d}^{\top}$ is the transpose of the vector $\mathbb{I}_{t-d}$.
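A SETAR(2; 1, 1) path with d = 1 and threshold region $R_1 = (-\infty, 0]$ can be simulated in a few lines; the regime coefficients and sample size below are illustrative choices, not the chapter's models.

```python
import numpy as np

# Minimal simulation of model [15.2] with p = d = 1 and threshold 0.
rng = np.random.default_rng(0)
phi1, phi2, T = 0.6, -0.4, 500       # illustrative regime coefficients
X = np.zeros(T)
for t in range(1, T):
    eps = rng.standard_normal()      # epsilon_t ~ N(0, 1)
    if X[t - 1] <= 0:                # X_{t-1} in R1 = (-inf, 0]
        X[t] = phi1 * X[t - 1] + eps
    else:
        X[t] = phi2 * X[t - 1] + eps
print(X.shape)  # (500,)
```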
$$\hat{X}_{T+1}^{w} = \sum_{t=\max\{p,d\}+1}^{T} w_t X_t = \sum_{t=\max\{p,d\}+1}^{T} w_t \left(\mathbb{I}_{t-d}^{\top} \Phi\, \mathbf{X}_{t-1} + \epsilon_t\right), \qquad [15.4]$$
Niglio (2019) shows that the main advantage of the predictor [15.4] is that
it takes into account all the observed data (and hence the whole data generating
process), differently from other predictors, such as the conditional expectation, which
only involve the last observed values. From the computational point of view, the
minimization and, therefore, the derivation of the weights is not an easy task; however,
when the forecasting performance of [15.4] is compared to the SETAR predictor
obtained from the conditional expectation of $X_{T+1}$, or to other linear predictors (such
as the AR(p) or the random walk predictors), it often outperforms its competitors.
This is especially evident when the nonlinearity and the persistence of the
generating process grow and when the difference between the parameters of the two
regimes increases. On the other hand, when the number of observations that belong
to a single regime is limited and it is difficult to discriminate between a linear AR
and a nonlinear SETAR model, the weighted predictor does not outperform the linear
competitor.
The weighted predictor [15.4] has been discussed and evaluated in Niglio (2019);
however, its distribution and, in particular, the assessment of its uncertainty have not
been addressed.
Starting from this point, in this chapter, we evaluate the uncertainty of the predictor
[15.4] through the generation of PI’s. In section 15.2, in more detail, we introduce the
main results given in the literature on the generation of bootstrap PI’s in the SETAR
domain and we adapt those approaches to the weighted predictor constructed for
XT +1 . The theoretical details given in section 15.2 are then evaluated in section 15.3
where the coverage and the length of the PI are investigated through a Monte Carlo
study.
The generation of PI's for SETAR models has often attracted interest in the
literature (for recent contributions, see Li (2011) and Staszewska-Bystrova and Winker
(2016)).
The nonlinear dynamics of the SETAR model, the variability caused by
parameter estimation and even the estimators' bias in the presence of small samples make the
distribution of the predictor "non-standard". This has led to the use of computationally
intensive techniques that make it possible to construct PI's even in the presence of
complex nonlinear structures.
In both contributions, the proposed PI's are related to the SETAR predictor
obtained as the conditional expected value of $X_{T+1}$, $\hat{X}_{T+1}^{ce} = E[X_{T+1} \mid \mathcal{F}_T]$, with
$\mathcal{F}_T$ being the set of information available until time T, which, in the presence of model
[15.2] with d = 1, becomes
$$\hat{X}_{T+1}^{ce} = \left(\phi_0^{(1)} + \phi_1^{(1)} X_T + \ldots + \phi_p^{(1)} X_{T-p+1}\right) I_{\{X_T \in R_1\}} + \left(\phi_0^{(2)} + \phi_1^{(2)} X_T + \ldots + \phi_p^{(2)} X_{T-p+1}\right) \left(1 - I_{\{X_T \in R_1\}}\right). \qquad [15.5]$$
Among the bootstrap approaches developed in the time series context (for two
interesting reviews, see Bühlmann (2002) and Kreiss and Paparoditis (2011)), Li
(2011) and Staszewska-Bystrova and Winker (2016) make use of the residual
bootstrap and define proper procedures that are detailed here and then modified to
take into account the predictor [15.4].
L2. computes the residuals $\hat{\epsilon}_t = X_t - \hat{X}_t$ from the fitted values, for $t = p+1, \ldots, T$, and generates
the bootstrap replicate $X_t^*$:
$$X_t^* = \left(\hat{\phi}_0^{(1)} + \hat{\phi}_1^{(1)} X_{t-1}^* + \ldots + \hat{\phi}_p^{(1)} X_{t-p}^*\right) I_{\{X_{t-1}^* \in \hat{R}_1\}} + \left(\hat{\phi}_0^{(2)} + \hat{\phi}_1^{(2)} X_{t-1}^* + \ldots + \hat{\phi}_p^{(2)} X_{t-p}^*\right) \left(1 - I_{\{X_{t-1}^* \in \hat{R}_1\}}\right) + \epsilon_t^*, \qquad [15.6]$$
where $X_t^* = X_t$ for $t \le p$, $\epsilon_t^*$ is drawn with replacement from the $\hat{\epsilon}_t$'s and $\hat{R}_1$ is the
real subset $(-\infty, \hat{r}_1]$.
L3. estimates the model parameters using the bootstrap series $X_t^*$ and so computes
the one-step-ahead forecast $\hat{X}_{T+1}^{ce*}$, fixing $X_t^* = X_t$ for $t = T-p+1, \ldots, T$:
$$\hat{X}_{T+1}^{ce*} = \left(\hat{\phi}_0^{(1)*} + \hat{\phi}_1^{(1)*} X_T^* + \ldots + \hat{\phi}_p^{(1)*} X_{T-p+1}^*\right) I_{\{X_T^* \in \hat{R}_1^*\}} + \left(\hat{\phi}_0^{(2)*} + \hat{\phi}_1^{(2)*} X_T^* + \ldots + \hat{\phi}_p^{(2)*} X_{T-p+1}^*\right) \left(1 - I_{\{X_T^* \in \hat{R}_1^*\}}\right) + \epsilon_{T+1}^*, \qquad [15.7]$$
where, even in this case, $\epsilon_{T+1}^*$ is randomly drawn from the $\hat{\epsilon}_t$'s and $\hat{R}_1^*$ is the subset
$(-\infty, \hat{r}_1^*]$;
L4. repeats steps L2 and L3 B times, so obtaining the set of bootstrap forecasts
$\{\hat{X}_{T+1,1}^{ce*}, \hat{X}_{T+1,2}^{ce*}, \ldots, \hat{X}_{T+1,B}^{ce*}\}$; the $(1-\alpha)\%$ PI is given by $[\hat{X}_{T+1}^{ce*}(\alpha/2),\ \hat{X}_{T+1}^{ce*}(1-\alpha/2)]$, with $\hat{X}_{T+1}^{ce*}(\alpha/2)$ and $\hat{X}_{T+1}^{ce*}(1-\alpha/2)$ being the $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap forecast distribution.
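The steps L1–L4 above can be sketched as follows, under strong simplifying assumptions made only for brevity: p = d = 1, no intercepts, a known threshold $\hat r_1 = 0$ (the procedure in the text also estimates the threshold), per-regime least squares for the parameters, and B = 500.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_setar(X):
    """Per-regime least-squares estimates of a SETAR(2; 1, 1) with threshold 0."""
    lag, cur = X[:-1], X[1:]
    low = lag <= 0
    phi1 = np.sum(lag[low] * cur[low]) / np.sum(lag[low] ** 2)
    phi2 = np.sum(lag[~low] * cur[~low]) / np.sum(lag[~low] ** 2)
    return phi1, phi2

def simulate(T, phi1=0.6, phi2=-0.4):
    X = np.zeros(T)
    for t in range(1, T):
        X[t] = (phi1 if X[t-1] <= 0 else phi2) * X[t-1] + rng.standard_normal()
    return X

X = simulate(300)                               # step L1: observed series + fit
p1, p2 = fit_setar(X)
resid = X[1:] - np.where(X[:-1] <= 0, p1, p2) * X[:-1]   # step L2: residuals

boot_fc = []
for _ in range(500):                            # steps L2-L4, B = 500
    Xb = np.zeros_like(X)
    eps = rng.choice(resid, size=len(X))        # resample residuals
    for t in range(1, len(X)):
        Xb[t] = (p1 if Xb[t-1] <= 0 else p2) * Xb[t-1] + eps[t]
    b1, b2 = fit_setar(Xb)                      # step L3: re-estimate on Xb
    phi = b1 if X[-1] <= 0 else b2              # forecast anchored at observed X_T
    boot_fc.append(phi * X[-1] + rng.choice(resid))

L, U = np.percentile(boot_fc, [2.5, 97.5])      # step L4: 95% percentile PI
print(L < U)  # True
```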
In this context, another recent contribution is given by Pan and Politis (2016), which
addresses the generation of PI's for autoregressive (linear, nonlinear and nonparametric)
processes, considering a six-step procedure that they detail, in the parametric case,
for linear autoregressive models. In the following, the bootstrap
procedure of Pan and Politis (2016) is detailed for SETAR models and, to give more emphasis to the
content of each step, the procedure is expanded to ten steps.
Note that Pan and Politis (2016) consider, for linear autoregressive models,
forward and backward procedures to generate the bootstrap series. The irreversibility
of nonlinear time series automatically excludes the backward procedures, that is, those
based on the assumption, not feasible in the nonlinear domain, that the pseudo-data
are generated starting from the last p observations.
In the so-called forward bootstrap domain, Pan and Politis (2016) distinguish two
approaches: the first considers the fitted residuals, whereas the second considers the
predictive residuals that, following Politis (2013), should be favored to limit the
finite-sample under-coverage of the former approach.
The steps of the forward bootstrap with fitted residuals are as follows:
PP1. the same as step L1;
PP2. compute the residuals $\hat{\epsilon}_t = X_t - \hat{X}_t$, for $t = p+1, \ldots, T$;
PP3. center the residuals $\hat{\epsilon}_t$ and draw the bootstrap residuals $\epsilon_t^*$, extracting with
replacement from the centered $\hat{\epsilon}_t$;
PP4. generate T + m artificial data from model [15.2], using as the first p
pseudo-observations the vector $(X_k, X_{k+1}, \ldots, X_{k+p-1})$, for $k = 1, \ldots, T-p+1$,
randomly selected from the observed series, such that
$$X_t^* = \left(\hat{\phi}_0^{(1)} + \hat{\phi}_1^{(1)} X_{t-1}^* + \ldots + \hat{\phi}_p^{(1)} X_{t-p}^*\right) I_{\{X_{t-1}^* \in \hat{R}_1\}} + \left(\hat{\phi}_0^{(2)} + \hat{\phi}_1^{(2)} X_{t-1}^* + \ldots + \hat{\phi}_p^{(2)} X_{t-p}^*\right) \left(1 - I_{\{X_{t-1}^* \in \hat{R}_1\}}\right) + \epsilon_t^* \qquad [15.8]$$
for $t = p+1, \ldots, T$, where $\hat{R}_1$ is the real subset $(-\infty, \hat{r}_1]$. Then, discard the first m
artificial data from the pseudo-series;
PP5. estimate the parameters of the SETAR(2; p, p) model using the pseudo-data
and then compute the bootstrap forecasts:
$$\hat{X}_{T+1}^{ce*} = \left(\hat{\phi}_0^{(1)*} + \hat{\phi}_1^{(1)*} X_T^* + \ldots + \hat{\phi}_p^{(1)*} X_{T-p+1}^*\right) I_{\{X_T^* \in \hat{R}_1^*\}} + \left(\hat{\phi}_0^{(2)*} + \hat{\phi}_1^{(2)*} X_T^* + \ldots + \hat{\phi}_p^{(2)*} X_{T-p+1}^*\right) \left(1 - I_{\{X_T^* \in \hat{R}_1^*\}}\right), \qquad [15.9]$$
where $X_t^* = X_t$ for $t = T-p+1, \ldots, T$ and $\hat{R}_1^*$ is the real subset $(-\infty, \hat{r}_1^*]$;
PP6. generate the future bootstrap observations
$$X_{T+1}^* = \left(\hat{\phi}_0^{(1)} + \hat{\phi}_1^{(1)} X_T^* + \ldots + \hat{\phi}_p^{(1)} X_{T-p+1}^*\right) I_{\{X_T^* \in \hat{R}_1\}} + \left(\hat{\phi}_0^{(2)} + \hat{\phi}_1^{(2)} X_T^* + \ldots + \hat{\phi}_p^{(2)} X_{T-p+1}^*\right) \left(1 - I_{\{X_T^* \in \hat{R}_1\}}\right) + \epsilon_{T+1}^*. \qquad [15.10]$$
The main difference of the forward bootstrap with predictive residuals, with respect
to the fitted residuals, is the following: given the time series $X_t$, delete the single
observation $X_t$, for $t = p+1, \ldots, T$, from the series and then estimate the parameters
$\hat{\phi}_i^{(k,t)}$, $\hat{r}^{(t)}$, for $k = 1, 2$ and $i = 1, \ldots, p$.
The fitted values described in step PP2 are now obtained using the $\hat{\phi}_i^{(k,t)}$, $\hat{r}^{(t)}$
parameters, and so the predictive residuals become $\hat{\epsilon}_t^{(t)} = X_t - \hat{X}_t^{(t)}$, for $t = p+1, \ldots, T$.
In the previous algorithm, the predictive residuals replace the fitted ones starting
from step PP3; at the same time, even though their computation makes the
algorithm heavier, they make it possible to address some coverage problems that
are encountered in the fitted residual case when T is small. The algorithms of Li
(2011), Pan and Politis (2016) and Staszewska-Bystrova and Winker (2016) have been
implemented for the predictor [15.4], where $\hat{X}_{T+1}^{ce*}$ in steps L3 and PP5 (and,
obviously, in all subsequent steps that refer to this predictor), respectively, is replaced
with the predictor $\hat{X}_{T+1}^{w*}$, whose weights are computed using the bootstrap series.
L3′. estimates the model parameters using the bootstrap series $X_t^*$ and computes
$$\hat{X}_{T+1}^{w*} = \sum_{t=p+1}^{T} \hat{w}_t^* X_t^* + \epsilon_{T+1}^*,$$
where the weights $\hat{w}_t^*$, $t = p+1, \ldots, T$, are estimated from the bootstrap series.
PP5′. estimate the parameters of the SETAR(2; p, p) model using the pseudo-data
and then compute the bootstrap forecasts
$$\hat{X}_{T+1}^{w*} = \sum_{t=p+1}^{T} \hat{w}_t^* X_t^*,$$
where the weights $\hat{w}_t^*$, $t = p+1, \ldots, T$, are estimated from the bootstrap series.
The four bootstrap procedures described here (Li (2011), Staszewska-Bystrova and Winker
(2016), and the fitted and predictive approaches of Pan and Politis (2016))
show an increasing computational effort and one clear main difference: Li (2011)
and Staszewska-Bystrova and Winker (2016) build the PI's using the percentiles of
the bootstrap distribution of the predictor, whereas Pan and Politis (2016) consider the
percentiles of the bootstrap roots (defined in PP7), taking advantage of some results of
Politis (2013) in the regression domain.
To evaluate the performance of the bootstrap procedures in obtaining the PIs of the weighted predictor, we considered the approaches of Li (2011), Pan and Politis (2016) and Staszewska-Bystrova and Winker (2016), adapted to the predictor [15.4] discussed in section 15.2. We implemented a Monte Carlo simulation study in which we considered three different SETAR(2; 1,1) models, with $R_1$ being the subset of non-positive real numbers and $\epsilon_t \sim N(0,1)$.
We considered 2000 Monte Carlo replications and, in each replicate, $B = 2000$ bootstrap pseudo-series, which make it possible to obtain, for the weighted predictor, the PI whose lower and upper bounds are denoted by $L_i$ and $U_i$, respectively, for $i = 1, \ldots, 2000$. The empirical coverage of the interval $[L_i, U_i]$ has then been assessed by generating 2000 values from the SETAR(2; 1,1) model

$$X_{T+1,j} = \hat\phi_1^{(1)} X_T I_{\{X_T \in \hat R_1\}} + \hat\phi_1^{(2)} X_T \left(1 - I_{\{X_T \in \hat R_1\}}\right) + \epsilon^*_j$$

where $X_T$ is the observation at time $T$ and $(\hat\phi_1^{(1)}, \hat\phi_1^{(2)}, \hat r_1)$ is the vector of the estimated parameters, both obtained from the series generated at each Monte Carlo iteration. The threshold estimate $\hat r_1$ is obtained by defining a grid of values, delimited by the 15th and 85th percentiles of $X_t$, and choosing the value for which $\sum_{t=p+1}^{n} \hat\epsilon_t^2$ is minimized. Finally, $\epsilon^*_j$ is the
Forecast Uncertainty of the Weighted TAR Predictor 219
bootstrap error randomly selected (with replacement) in the jth bootstrap replicate,
for j = 1, . . . , 2000, of the Monte Carlo iterations.
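The grid search for the threshold described above can be sketched in Python. This is a minimal illustration under stated assumptions: a SETAR(2; 1,1) model without intercepts, a hypothetical grid of 50 candidate values between the 15th and 85th percentiles, and least-squares slope fits within each candidate split:

```python
import numpy as np

rng = np.random.default_rng(42)

def estimate_setar1(series, n_grid=50):
    """Grid search for a SETAR(2;1,1) threshold: for each candidate r between
    the 15th and 85th percentiles of the series, fit the two regime slopes by
    least squares and keep the r minimising the residual sum of squares."""
    x, y = series[:-1], series[1:]
    best = None
    for r in np.quantile(x, np.linspace(0.15, 0.85, n_grid)):
        low = x <= r
        if low.sum() < 3 or (~low).sum() < 3:
            continue  # skip candidates leaving a regime almost empty
        phi1 = (x[low] @ y[low]) / (x[low] @ x[low])
        phi2 = (x[~low] @ y[~low]) / (x[~low] @ x[~low])
        rss = np.sum((y[low] - phi1 * x[low]) ** 2) \
            + np.sum((y[~low] - phi2 * x[~low]) ** 2)
        if best is None or rss < best[0]:
            best = (rss, r, phi1, phi2)
    return best[1:]

# simulate a SETAR(2;1,1) series with true threshold r = 0
x = np.zeros(2001)
for t in range(2000):
    phi = 0.5 if x[t] <= 0.0 else -0.4
    x[t + 1] = phi * x[t] + rng.normal()
r_hat, phi1_hat, phi2_hat = estimate_setar1(x[1:])
```

With a long series and well-separated regimes, the minimum-RSS candidate lands close to the true threshold and the regime slopes are recovered accurately.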
Given this, the empirical coverage of the bootstrap prediction interval is evaluated by first computing

$$CVR_i = \frac{1}{2000} \sum_{j=1}^{2000} I_{\{X_{T+1,j} \in [L_i, U_i]\}}$$

and then

$$CVR = \frac{1}{2000} \sum_{i=1}^{2000} CVR_i, \qquad LEN = \frac{1}{2000} \sum_{i=1}^{2000} (U_i - L_i).$$
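The coverage and length indices above can be computed directly. The snippet below is an illustration with stand-in inputs (Gaussian future values and a fixed interval, which are assumptions for the example, not the chapter's simulated SETAR data):

```python
import numpy as np

def coverage_and_length(future_draws, lower, upper):
    """Empirical coverage CVR_i and length for one Monte Carlo interval:
    the fraction of simulated future values X_{T+1,j} falling in [L_i, U_i]."""
    inside = (future_draws >= lower) & (future_draws <= upper)
    return inside.mean(), upper - lower

rng = np.random.default_rng(1)
cvr_i, len_i = [], []
for i in range(5):                       # 5 Monte Carlo iterations for brevity
    draws = rng.normal(size=2000)        # stand-in for the X_{T+1,j}
    lo, hi = -1.96, 1.96                 # stand-in for [L_i, U_i]
    c, l = coverage_and_length(draws, lo, hi)
    cvr_i.append(c)
    len_i.append(l)

CVR, LEN = np.mean(cvr_i), np.mean(len_i)   # the summary indices of the text
```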
The CVR and LEN indices for the four bootstrap approaches adapted to the
predictor [15.4] (Li (2011) and Staszewska-Bystrova and Winker (2016) and the
fitted and predictive bootstrap of Pan and Politis (2016)) are reported in Table 15.1,
considering time series of length T = {100, 200} at two different nominal coverages,
1−α = 0.95 and 1−α = 0.90. The variability of the length of the 2000 PI’s, obtained
from the Monte Carlo iterations, is further evaluated with the standard errors (s.e.).
If we examine the results in Table 15.1, we can observe that, with model M1 and T = 100, the PI obtained from the Li (2011) and Staszewska-Bystrova and Winker (2016) algorithms is characterized by an empirical coverage (CVR) greater than the nominal level. This over-coverage could be due to the structure of the generating process, whose asymmetry in the distribution of the observations between the two regimes affects not only the weighted point predictor (as pointed out in section 15.2) but also its accuracy. By contrast, such a high over-coverage is not observed in the PIs obtained from the fitted and predictive approaches of Pan and Politis (2016), nor in any of the cases with T = 200, with the exception of the Staszewska-Bystrova and Winker (2016) algorithm.
The results of models M2 and M3 show the main differences when T = 100.
In this case, the empirical coverage of the fitted bootstrap outperforms the other approaches, whereas the predictive bootstrap approach always produces the widest PIs, with the highest variability in their length. It can be further
noted that all bootstrap approaches benefit from the presence of the intercepts in the
SETAR model, as in the M3 case, where the length of the PI’s and its variability
are smaller than in the M1 and M2 models. This has different explanations: first, the
presence of the intercepts makes it possible to better discriminate between the two
regimes and so even the estimation of the SETAR model takes advantage of it; second,
as said before, the weighted predictor has better performance when the discrimination
among regimes is more marked.
Model M3 (the rows for models M1 and M2 are not reproduced here):

                 1-alpha = 0.95                          1-alpha = 0.90
          T = 100               T = 200            T = 100               T = 200
          CVR    LEN    s.e.    CVR    LEN    s.e. CVR    LEN    s.e.    CVR    LEN    s.e.
w_fit     0.9509 5.1458 0.4444  0.9495 4.6117 0.2509  0.9010 4.2817 0.3458  0.9000 3.8756 0.2069
w_pred    0.9580 5.3809 0.5834  0.9507 4.7108 0.2595  0.9110 4.4637 0.4265  0.9014 3.9489 0.2161
w_Li      0.9344 5.0661 0.4633  0.9389 4.5630 0.2558  0.8829 4.2109 0.3251  0.8901 3.8332 0.2123
w_SW      0.9438 5.2547 0.4105  0.9456 4.6650 0.2245  0.8975 4.3703 0.3238  0.8998 3.9094 0.2078
Table 15.1. Evaluation of the PI’s of the weighted predictor for the models M1, M2 and M3, at two
nominal coverages, 0.95 and 0.90, and two different series lengths, T = {100, 200}. w_fit and w_pred:
fitted bootstrap and predictive bootstrap PI’s based on Pan and Politis (2016); w_Li: bootstrap PI
based on Li (2011); w_SW bootstrap PI based on Staszewska-Bystrova and Winker (2016)
The results obtained from model M3 have been further investigated. The deviation of the empirical coverage of the PIs from the nominal one (mainly for the Li (2011) and Staszewska-Bystrova and Winker (2016) algorithms) has led us to construct skewness-adjusted confidence intervals. In particular, following Grabowski et al. (2020), we mirror the bootstrap distribution of the weighted predictor [15.4] using the mirrored bootstrap prediction
The results of model M3 with T = 100 are shown in Table 15.2, where it can be noted that the coverage of the PIs, their length and its variability do not benefit much from the correction of Grabowski et al. (2020) for one-step-ahead forecasts. This is in line with their results, but we expect their approach to give clearer results as the number of steps ahead increases.
From the computational point of view, the burden of the four algorithms is quite heavy. All simulations were carried out on an Intel Core i7 quad-core processor at 3.3 GHz and, on average, the computing times (in seconds) of each Monte Carlo iteration for the four bootstrap methods, with series length T = 100, were: fitted bootstrap 33.51, predictive bootstrap 34.45, the Li (2011) method 29.68 and the Staszewska-Bystrova and Winker (2016) method 29.60. As expected, the last two approaches are computationally lighter, while the better coverage of the Pan and Politis (2016) algorithms comes at the cost of a longer computation time.
Finally, we remark that the results given here can be further expanded by evaluating how the distribution of the errors impacts the predictive accuracy (and the PI coverage) of the predictor [15.4]. This point, and further evaluations of the bootstrap PIs, are left for future research.
15.4. References
Boero, G. and Marrocu, M. (2004). The performance of SETAR models: A regime conditional
evaluation of point, interval and density forecasts. International Journal of Forecasting, 20,
305–320.
Bühlmann, P. (2002). Bootstraps for time series. Statistical Science, 17, 52–72.
Clements, M.P. and Smith, J. (1997). The performance of alternative forecasting methods for
SETAR models. International Journal of Forecasting, 13, 463–475.
Clements, M.P., Franses, J.S., van Dijk, D. (2003). On SETAR non-linearity and forecasting.
Journal of Forecasting, 22, 359–375.
Grabowski, D., Staszewska-Bystrova, A., Winker, P. (2020). Skewness-adjusted bootstrap
confidence intervals and confidence bands for impulse response functions. AStA Advances
in Statistical Analysis, 104, 5–32.
Kreiss, J.P. and Paparoditis, E. (2011). Bootstrap methods for dependent data: A review. Journal
of Korean Statistical Society, 40, 357–378.
Li, J. (2011). Bootstrap prediction intervals for SETAR models. International Journal of
Forecasting, 27, 320–332.
Niglio, M. (2019). SETAR forecasts with weighted observations. 39th International Symposium
on Forecasting, Thessaloniki, Greece.
Pan, L. and Politis, D.N. (2016). Bootstrap prediction intervals for linear, nonlinear and
nonparametric autoregressions. Journal of Statistical Planning and Inference, 177, 1–27.
Politis, D. (2013). Model-free model-fitting and predictive distributions (with discussions). Test,
22, 183–250.
Staszewska-Bystrova, A. and Winker, P. (2016). Improved bootstrap prediction intervals for
SETAR models. Statistical Papers, 57, 89–98.
Tong, H. (1978). On a threshold model. In Pattern Recognition and Signal Processing, Chen,
C.H. (ed.). Sijthoff and Noordhoff, Amsterdam.
Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Springer-Verlag,
New York.
Tong, H. (1990). Non-linear Time Series: A Dynamical System Approach. Oxford University
Press, New York.
16
Revisiting Transitions
Between Superstatistics
This work aims to provide an accurate method for the detection of a transition
between superstatistics. A slight improvement over the currently published method is
achieved. The superstatistics framework is briefly recalled and a rather new concept
of the transition of superstatistics, introduced by Xu and Beck (2016), is re-examined.
In addition, an original synthetic model for superstatistical transition, suggested by
Beck, is discussed. It is shown that its modified version, which takes into account a
stochastic nature of the transition, better reflects empirically observed transitions.
16.1. Introduction
Applied Modeling Techniques and Data Analysis 1: Computational Data Analysis Methods and Tools,
First Edition. Edited by Yannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula and Christos H Skiadas.
© ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
Superstatistics describes a whole system composed of small subsystems in local equilibrium by means of a generalized Boltzmann factor

$$B(E) = \int_0^{+\infty} f(\beta)\, e^{-\beta E}\, d\beta. \qquad [16.1]$$

In general, the only restriction on $f(\beta)$ is that it allows only positive values of $\beta$. However, experience has shown that three distributions fit various empirical data especially well. Therefore, Beck (2009) suggested the so-called universality classes. These contain a gamma distribution

$$f(\beta) = \frac{1}{\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{n}{2\beta_0}\right)^{\frac{n}{2}} \beta^{\frac{n}{2}-1} \exp\left(-\frac{n\beta}{2\beta_0}\right), \qquad [16.2]$$
a log-normal distribution

$$f(\beta) = \frac{1}{\beta\sqrt{2\pi s^2}} \exp\left(-\frac{(\log\beta - \log\mu)^2}{2s^2}\right) \qquad [16.3]$$

and an inverse $\chi^2$ distribution. The last distribution is disregarded here because it was shown in Jizba et al. (2018) that it gives poor results for the data at our disposal.
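The two universality classes [16.2] and [16.3] can be sampled directly; note that [16.2] is a gamma distribution with shape $n/2$ and scale $2\beta_0/n$, so its mean is $\beta_0$. A minimal sketch (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_gamma_class(n, beta0, size):
    """Draw beta from the gamma universality class [16.2]:
    shape n/2, scale 2*beta0/n, hence mean beta0."""
    return rng.gamma(shape=n / 2, scale=2 * beta0 / n, size=size)

def sample_lognormal_class(mu, s, size):
    """Draw beta from the log-normal class [16.3] with median mu
    and log-scale s."""
    return rng.lognormal(mean=np.log(mu), sigma=s, size=size)

betas_g = sample_gamma_class(n=4, beta0=1.0, size=100_000)
betas_l = sample_lognormal_class(mu=1.0, s=0.5, size=100_000)
```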
Originally in the pioneering paper (Xu and Beck 2016), the transition was
only assessed by looking at the histogram of β at two time scales (minutes and
days). Unfortunately, it is impossible to reliably detect the transition merely from
the histogram, see Figure 16.1. Therefore, in Jizba et al. (2018), a method based
on statistical distances was employed. It allowed us to see a change from one
superstatistic to another in a quantitative way. However, this still lacked a level of
significance because the fact that statistical distance is smaller for one distribution
than for another, may just be a manifestation of a random error. A slight improvement
described in the following attempts to address this issue.
The main difference from the method in Jizba et al. (2018) is that we try to assign
a probability distribution to each time scale according to all three statistical distances.
the Kolmogorov–Smirnov distance

$$D_n = \sup_x |F_n(x) - F(x)|, \qquad [16.4]$$

where $F(x)$ is the fully specified distribution function and $F_n(x)$ is the empirical distribution function

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(u_i \le x). \qquad [16.7]$$
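Combining [16.4] and [16.7], the Kolmogorov–Smirnov distance against a fully specified $F$ can be computed exactly by evaluating both one-sided deviations at the order statistics, where the step function $F_n$ attains its extremes. A minimal sketch:

```python
import numpy as np

def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance [16.4]: sup_x |F_n(x) - F(x)|, with the
    empirical distribution function F_n as in [16.7]."""
    x = np.sort(sample)
    n = len(x)
    F = cdf(x)
    # F_n jumps at each order statistic: compare both one-sided deviations
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(0, n) / n)
    return max(d_plus, d_minus)
```

For example, `ks_distance(np.array([0.1, 0.2, 0.9]), lambda x: x)` evaluates the distance of a three-point sample to the uniform distribution function.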
Apart from the Lilliefors test, the method of recognizing the probability distribution was inspired by Marshall et al. (2001), where only the Kolmogorov–Smirnov distance was used to discriminate between two-parametric distribution families. It was shown there that the distance measure provides a reliable discriminating criterion.
4) Use the decision criterion for choosing the gamma or log-normal distribution and mark the trial as successful if the chosen distribution matches the one generated in step 2.

5) Repeat steps 2–4 $10^5$ times and estimate the probability of successfully selecting the probability distribution by relative frequencies.
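The Monte Carlo estimation of the success probability can be sketched as follows. As a stand-in for the statistical-distance criterion of the chapter, this example uses a simple likelihood comparison between a moment-fitted gamma and a fitted log-normal (the decision rule, sample size and trial count are assumptions for the illustration):

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def loglik_gamma(x, k, theta):
    # log of prod x^(k-1) exp(-x/theta) / (Gamma(k) theta^k)
    return (np.sum((k - 1) * np.log(x) - x / theta)
            - len(x) * (math.lgamma(k) + k * math.log(theta)))

def loglik_lognormal(x, m, s):
    lx = np.log(x)
    return (-np.sum((lx - m) ** 2) / (2 * s * s) - np.sum(lx)
            - len(x) * math.log(s * math.sqrt(2 * math.pi)))

def classify_gamma(x):
    """Stand-in decision rule: moment-fit both families and return True
    if the gamma family fits better."""
    mean, var = x.mean(), x.var()
    k, theta = mean * mean / var, var / mean        # gamma moment fit
    m, s = np.log(x).mean(), np.log(x).std()        # log-normal fit
    return loglik_gamma(x, k, theta) > loglik_lognormal(x, m, s)

# estimate the success probability when the truth is gamma (step 5 idea,
# with far fewer trials than the 10^5 of the text)
trials, hits = 200, 0
for _ in range(trials):
    sample = rng.gamma(shape=2.0, scale=0.5, size=500)
    hits += classify_gamma(sample)
p_success = hits / trials
```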
The dataset used for testing this method is the same as the one used in Xu and Beck
(2016), i.e. stock prices of seven US companies from different sectors recorded on the
minute-tick basis during a period from January 2, 1998, to May 22, 2013. The output
of the simulation is depicted in Figure 16.5. It confirms the conclusion from Jizba
et al. (2018) about a transition for the companies Alcoa Inc. (AA) and Wal-Mart
Stores Inc. (WMT). Moreover, it is seen that the time series for Bank of America
does not exhibit a transition of superstatistics. The key point to note is a relatively
high probability of successfully discriminating the two distributions. Therefore, it
may be concluded that the transition, especially for AA, is not smooth, but oscillates
between the two distributions around the transition point. This statistically significant
observation is examined in the next section.
In the original mention of the superstatistical transition (Xu and Beck 2016), Beck and Xu suggested the so-called synthetic model

$$\beta_\tau = \kappa_\tau L_{\tau_0} + (1 - \kappa_\tau)\, G_{\tau_\infty}, \qquad [16.8]$$
where $L_{\tau_0}$ and $G_{\tau_\infty}$ are two random variables with log-normal and gamma distributions, respectively. The suffixes $\tau_0$ and $\tau_\infty$ denote the small and large time scales, respectively, at which the distribution of $\beta$ is log-normal and gamma. For the data at hand, $\tau_0 = 20$ minutes and $\tau_\infty \approx 500$ minutes. $L_{\tau_0}$ and $G_{\tau_\infty}$ may be thought of as asymptotic distributions. The parameter $\kappa \in [0, 1]$ is a function of the time scale $\tau$ and is responsible for a smooth transition from a region dominated by the log-normal distribution to one with the gamma distribution on larger time scales. A reasonable functional form for $\kappa$ which may reproduce the observed transition is
$$\kappa(\tau) = \frac{1}{2}\left(\tanh\left(a(\tau - b)\right) + 1\right). \qquad [16.9]$$
The parameter $a$ controls the sharpness of the transition and $b$ selects the time scale at which the transition occurs. See Figure 16.2 for a demonstration of the sharpness parameter.
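The deterministic synthetic model [16.8]–[16.9] can be sketched as follows. The asymptotic distribution parameters and the sharpness values are assumptions for the illustration, and the direction of the transition depends on the sign convention chosen for $a$:

```python
import numpy as np

rng = np.random.default_rng(4)

def kappa(tau, a, b):
    """Deterministic mixing weight [16.9]; a sets the sharpness of the
    transition, b the time scale at which it occurs."""
    return 0.5 * (np.tanh(a * (tau - b)) + 1.0)

def synthetic_beta(tau, a, b, size):
    """Synthetic model [16.8] with illustrative asymptotic distributions:
    L (log-normal) and G (gamma); their parameters are assumptions."""
    k = kappa(tau, a, b)
    L = rng.lognormal(mean=0.0, sigma=0.5, size=size)
    G = rng.gamma(shape=2.0, scale=0.5, size=size)
    return k * L + (1.0 - k) * G

beta_samples = synthetic_beta(tau=100.0, a=0.05, b=60.0, size=10_000)
```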
Figure 16.3. Transition for the deterministic model. For a color version
of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
The explanation for this discrepancy is rather simple. Since the distance measures between probability distributions used for discriminating the two regions have significant statistical power (due to the large sample size, especially at small time scales), the decision will, at a certain level of $\kappa$ (likely $\kappa \approx \frac{1}{2}$), flip from the log-normal distribution to the gamma distribution and will never oscillate between those two states (as seen in Figure 16.3).
In this chapter, we propose a model that better captures the observed behavior of
real transitions. As can be seen in Figure 16.5, the transitions possess a stochastic
nature. For example, time series for Wal-Mart Stores Inc. shows the transition from
log-normal to the gamma region around a time scale of 60 minutes. Nevertheless, a quick unpredictable transition happens much sooner, and also at higher time scales, where an occasional flip to the log-normal region is observed. The stochastic nature is more pronounced for Alcoa Inc., where the transition again occurs around a time scale of 60 minutes but, unlike for WMT, is very slow (corresponding to a small sharpness parameter in [16.9]). Even at $\tau \ge 300$ minutes, an occasional flip back to
the log-normal distribution is observed.
Revisiting Transitions between Superstatistics 229
The suggested modification, which incorporates a random element into the model, is to consider $\kappa$ as a random variable (strictly speaking, a stochastic process, since $\kappa$ is parametrized by the time scale $\tau$). $\kappa$ is a parameter in $[0, 1]$; therefore, it is necessary to use a probability distribution with compact support. A well-known distribution with this property is the beta distribution
$$p(x) = \frac{x^{\gamma-1}(1-x)^{\delta-1}}{B(\gamma, \delta)}, \qquad \gamma, \delta > 0. \qquad [16.10]$$
The original parametrization is not the best choice here; therefore, an alternative one is used, which contains the mode of the distribution $\mu$ and the so-called concentration $\nu$:

$$\gamma = \mu(\nu - 2) + 1, \qquad \delta = (1 - \mu)(\nu - 2) + 1.$$
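With this reparametrization, the mode of the resulting beta distribution is exactly $\mu$ (for $\nu > 2$). A minimal sketch of drawing the stochastic $\kappa$ this way, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)

def beta_mode_concentration(mode, nu, size):
    """Beta distribution [16.10] reparametrised by mode mu and concentration
    nu: gamma = mu*(nu-2)+1, delta = (1-mu)*(nu-2)+1 (requires nu > 2)."""
    g = mode * (nu - 2) + 1
    d = (1 - mode) * (nu - 2) + 1
    return rng.beta(g, d, size=size)

# stochastic kappa: at a given time scale, draw kappa around the
# deterministic value [16.9] taken as the mode (illustrative numbers)
k_samples = beta_mode_concentration(mode=0.7, nu=20, size=10_000)
```

Larger $\nu$ concentrates the draws around the mode, recovering the deterministic model as a limit.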
Figure 16.5. Transitions for companies Alcola Inc. (AA), Bank of America Corporation
(BAC), Wal-Mart Stores Inc. (WMT) and the synthetic model incorporating
randomness. For a color version of this figure, see www.iste.co.uk/dimotikalis/
analysis2.zip
It should be noted that even though the model reflects the empirical transition well, a suitable estimator for the corresponding parameters of the model has not yet been found.
16.5. Conclusion
Beck's synthetic model for the superstatistical transition was revisited. It was shown that its modified version, which involves a random element, is able to correctly reproduce the observed transitions and may therefore serve as a suitable model. The modification is made by incorporating a random element into the transition parameter $\kappa$. Namely, $\kappa$ is considered to be a stochastic process (parametrized by the time scale) with a beta distribution. Moreover, a better method for assessing the transition of superstatistics was provided, which assigns the probability of successfully discriminating between the two superstatistical regions. These probabilities need to be obtained by Monte Carlo simulations.
16.6. Acknowledgments
This work was supported by the Grant Agency of the Czech Technical University
in Prague (grant no. SGS19/239/OHK4-009/19).
16.7. References
17

Research on Retrial Queue with Two-Way
Communication in a Diffusion Environment

17.1. Introduction
states was carried out by Yechiali and Naor (1971). This result was soon generalized
by Yechiali (1973), who used the Markov chain with an arbitrary finite number of
states as the controlling process. Purdue (1974) obtained the condition for the existence of a stationary mode in such a queuing system.
Neuts (1971, 1978) reduced the task of studying this queuing system in a random
environment for a particular case, to solving a matrix equation.
There are models in which the random environment affects not only the
operation of the device, but also the parameters of the input request stream (Nazarov
and Phung-Duc 2019).
In this chapter, the main research method is the asymptotic analysis method, which makes it possible to find the main probabilistic characteristics of the system under the asymptotic condition of a long delay in orbit. The most important characteristic is the probability distribution of the device states.
Consider a retrial queue with an incoming Poisson flow with parameter $\lambda$. An application that finds the device free begins to be serviced; the service time is exponential with parameter $\mu_1$, and the serviced application then leaves the system. An application that finds the device occupied is transferred to the orbit. Requests from the orbit are resubmitted to the device after a random delay, whose duration has an exponential distribution with parameter $\gamma$. The number of applications in orbit is $i$. In addition, an unoccupied device takes outgoing calls from the outer orbit with intensity $\theta$; the service time of such an application has an exponential distribution with parameter $\mu_2$.
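The dynamics just described can be sketched with a small event-driven simulation. All numerical parameter values below are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_retrial_queue(lam, mu1, mu2, gamma, theta, horizon):
    """Event-driven sketch of the retrial queue with two-way communication:
    Poisson arrivals (rate lam), primary service rate mu1, outgoing-call
    service rate mu2, per-customer retrial rate gamma from the orbit, and
    outgoing calls taken by the idle device at rate theta.
    Device state k: 0 free, 1 serving a primary call, 2 serving an
    outgoing call."""
    t, k, orbit = 0.0, 0, 0
    orbit_trace = []
    while t < horizon:
        rates = [lam,                                   # primary arrival
                 orbit * gamma,                         # retrial attempt
                 theta if k == 0 else 0.0,              # outgoing call taken
                 mu1 if k == 1 else (mu2 if k == 2 else 0.0)]  # service end
        total = sum(rates)
        t += rng.exponential(1.0 / total)
        event = rng.choice(4, p=np.array(rates) / total)
        if event == 0:                     # primary arrival
            if k == 0:
                k = 1
            else:
                orbit += 1                 # busy device: join the orbit
        elif event == 1:                   # retrial from the orbit
            if k == 0:
                k, orbit = 1, orbit - 1    # attempt succeeds only if free
        elif event == 2:                   # idle device starts outgoing call
            k = 2
        else:                              # service completion
            k = 0
        orbit_trace.append(orbit)
    return np.array(orbit_trace)

orbit = simulate_retrial_queue(lam=1.0, mu1=2.0, mu2=3.0,
                               gamma=0.5, theta=0.8, horizon=2000.0)
```

Such a simulation can serve as a numerical check of the asymptotic results derived below.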
$$\sum_{k=0}^{2} \sum_{i=0}^{\infty} \int_{-\infty}^{+\infty} P_k(i, s, t)\, ds = 1$$
$$\frac{\partial P_0(i,s,t)}{\partial t} + (\lambda + i\gamma + \theta) P_0(i,s,t) = \mu_1(s) P_1(i,s,t) + \mu_2(s) P_2(i,s,t) - \frac{\partial}{\partial s}\{\alpha(s) P_0(i,s,t)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) P_0(i,s,t)\},$$

$$\frac{\partial P_1(i,s,t)}{\partial t} + (\lambda + \mu_1(s)) P_1(i,s,t) = \lambda P_0(i,s,t) + (i+1)\gamma P_0(i+1,s,t) + \lambda P_1(i-1,s,t) - \frac{\partial}{\partial s}\{\alpha(s) P_1(i,s,t)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) P_1(i,s,t)\},$$

$$\frac{\partial P_2(i,s,t)}{\partial t} + (\lambda + \mu_2(s)) P_2(i,s,t) = \theta P_0(i,s,t) + \lambda P_2(i-1,s,t) - \frac{\partial}{\partial s}\{\alpha(s) P_2(i,s,t)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) P_2(i,s,t)\}.$$
Let

$$\gamma = \varepsilon^2, \qquad \tau = \varepsilon^2 t, \qquad \varepsilon^2 i = x + \varepsilon y, \qquad P_k(i,s,t) = \frac{1}{\varepsilon} H_k(y, s, \tau, \varepsilon).$$
$$\varepsilon^2 \frac{\partial H_0}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_0}{\partial y} + (\theta + \lambda + x + \varepsilon y) H_0 = \mu_1(s) H_1 + \mu_2(s) H_2 - \frac{\partial}{\partial s}\{\alpha(s) H_0\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_0\},$$

$$\varepsilon^2 \frac{\partial H_1}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_1}{\partial y} + (\lambda + \mu_1(s)) H_1 = \lambda H_1(y-\varepsilon, s, \tau, \varepsilon) + (x + \varepsilon(y+\varepsilon)) H_0(y+\varepsilon, s, \tau, \varepsilon) + \lambda H_0 - \frac{\partial}{\partial s}\{\alpha(s) H_1\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_1\},$$

$$\varepsilon^2 \frac{\partial H_2}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_2}{\partial y} + (\lambda + \mu_2(s)) H_2 = \theta H_0 + \lambda H_2(y-\varepsilon, s, \tau, \varepsilon) - \frac{\partial}{\partial s}\{\alpha(s) H_2\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_2\}, \qquad [17.1]$$

where the functions $H_k$ are evaluated at $(y, s, \tau, \varepsilon)$ unless indicated otherwise.
$$(\theta + \lambda + x) H_0(y,s,\tau) = \mu_1(s) H_1(y,s,\tau) + \mu_2(s) H_2(y,s,\tau) - \frac{\partial}{\partial s}\{\alpha(s) H_0(y,s,\tau)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_0(y,s,\tau)\},$$

$$\mu_1(s) H_1(y,s,\tau) = (\lambda + x) H_0(y,s,\tau) - \frac{\partial}{\partial s}\{\alpha(s) H_1(y,s,\tau)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_1(y,s,\tau)\},$$

$$\mu_2(s) H_2(y,s,\tau) = \theta H_0(y,s,\tau) - \frac{\partial}{\partial s}\{\alpha(s) H_2(y,s,\tau)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_2(y,s,\tau)\}. \qquad [17.2]$$
The solution $H_k(y,s,\tau)$ of system [17.2] can be written in the following form:

$$H_k(y,s,\tau) = Q_k(x,s) H(y,\tau), \qquad [17.3]$$

where the functions $Q_k(x,s)$ satisfy the system

$$(\theta + \lambda + x) Q_0(x,s) = \mu_1(s) Q_1(x,s) + \mu_2(s) Q_2(x,s) - \frac{\partial}{\partial s}\{\alpha(s) Q_0(x,s)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) Q_0(x,s)\},$$

$$\mu_1(s) Q_1(x,s) = (\lambda + x) Q_0(x,s) - \frac{\partial}{\partial s}\{\alpha(s) Q_1(x,s)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) Q_1(x,s)\},$$

$$\mu_2(s) Q_2(x,s) = \theta Q_0(x,s) - \frac{\partial}{\partial s}\{\alpha(s) Q_2(x,s)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) Q_2(x,s)\} \qquad [17.4]$$
$$\sum_{k=0}^{2} \int_{-\infty}^{+\infty} Q_k(x,s)\, ds = 1. \qquad [17.5]$$
We denote

$$\sum_{k=0}^{2} Q_k(x,s) = r(s), \qquad \int_{-\infty}^{+\infty} Q_k(x,s)\, ds = R_k(x), \qquad [17.6]$$

$$\int_{-\infty}^{+\infty} r(s)\, ds = 1, \qquad \sum_{k=0}^{2} R_k(x) = 1. \qquad [17.7]$$
We sum the equations of system [17.4] over $k$ and take the notation [17.6] into account, in order to obtain the following equation:

$$-\frac{\partial}{\partial s}\{\alpha(s) r(s)\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) r(s)\} = 0. \qquad [17.8]$$
$$\int_{-\infty}^{+\infty} \mu_1(s) Q_1(x,s)\, ds = \psi R_1(x), \qquad \int_{-\infty}^{+\infty} \mu_2(s) Q_2(x,s)\, ds = \varphi R_2(x), \qquad [17.10]$$

$$\left[-\alpha(s) Q_k(x,s) + \frac{1}{2}\frac{\partial}{\partial s}\{\beta^2(s) Q_k(x,s)\}\right]_{s=-\infty}^{+\infty} = 0,$$
$$(\theta + \lambda + x) R_0(x) = \psi R_1(x) + \varphi R_2(x), \qquad \psi R_1(x) = (\lambda + x) R_0(x), \qquad \varphi R_2(x) = \theta R_0(x), \qquad [17.11]$$

$$R_0(x) = \frac{\psi\varphi}{\theta\psi + (\lambda+x)\varphi + \psi\varphi}, \qquad R_1(x) = \frac{(\lambda+x)\varphi}{\theta\psi + (\lambda+x)\varphi + \psi\varphi}, \qquad R_2(x) = \frac{\theta\psi}{\theta\psi + (\lambda+x)\varphi + \psi\varphi}. \qquad [17.12]$$
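The stationary device-state distribution [17.12] is easy to evaluate and to check against the balance relations [17.11]. A minimal sketch, with illustrative parameter values:

```python
def device_state_probabilities(lam, theta, psi, phi, x):
    """Stationary device-state distribution [17.12]: R0 (free), R1 (serving
    a primary call), R2 (serving an outgoing call) at normalised orbit size
    x; psi and phi are the averaged service rates of [17.10]."""
    denom = theta * psi + (lam + x) * phi + psi * phi
    R0 = psi * phi / denom
    R1 = (lam + x) * phi / denom
    R2 = theta * psi / denom
    return R0, R1, R2

# illustrative numbers (assumptions for the example)
R0, R1, R2 = device_state_probabilities(lam=1.0, theta=0.8,
                                        psi=2.0, phi=3.0, x=0.5)
```

By construction, the three probabilities sum to 1 and satisfy the balance equations $\psi R_1 = (\lambda + x) R_0$ and $\varphi R_2 = \theta R_0$ of [17.11].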
$$-\varepsilon x'(\tau) \frac{\partial H_0}{\partial y} + (\theta + \lambda + x + \varepsilon y) H_0 = \mu_1(s) H_1 + \mu_2(s) H_2 - \frac{\partial}{\partial s}\{\alpha(s) H_0\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_0\},$$

$$-\varepsilon x'(\tau) \frac{\partial H_1}{\partial y} + \mu_1(s) H_1 = (\lambda + x + \varepsilon y) H_0 - \varepsilon\lambda \frac{\partial H_1}{\partial y} + \varepsilon x \frac{\partial H_0}{\partial y} - \frac{\partial}{\partial s}\{\alpha(s) H_1\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_1\} + o(\varepsilon),$$

$$-\varepsilon x'(\tau) \frac{\partial H_2}{\partial y} + \mu_2(s) H_2 = \theta H_0 - \varepsilon\lambda \frac{\partial H_2}{\partial y} - \frac{\partial}{\partial s}\{\alpha(s) H_2\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_2\} + o(\varepsilon), \qquad [17.13]$$

where the functions $H_k$ are evaluated at $(y, s, \tau, \varepsilon)$.
$$-\varepsilon x'(\tau) \frac{\partial}{\partial y} \sum_{k=0}^{2} \int_{-\infty}^{+\infty} H_k(y,s,\tau,\varepsilon)\, ds = \varepsilon x \frac{\partial}{\partial y} \int_{-\infty}^{+\infty} H_0(y,s,\tau,\varepsilon)\, ds - \varepsilon\lambda \frac{\partial}{\partial y} \int_{-\infty}^{+\infty} H_1(y,s,\tau,\varepsilon)\, ds - \varepsilon\lambda \frac{\partial}{\partial y} \int_{-\infty}^{+\infty} H_2(y,s,\tau,\varepsilon)\, ds + o(\varepsilon).$$
We divide both sides of the obtained equation by $\varepsilon$, perform the limit transition and take [17.3] into account, in order to obtain

$$-x'(\tau) \sum_{k=0}^{2} \int_{-\infty}^{+\infty} Q_k(x,s)\, ds\; \frac{\partial H(y,\tau)}{\partial y} = \left( x \int_{-\infty}^{+\infty} Q_0(x,s)\, ds - \lambda \int_{-\infty}^{+\infty} Q_1(x,s)\, ds - \lambda \int_{-\infty}^{+\infty} Q_2(x,s)\, ds \right) \frac{\partial H(y,\tau)}{\partial y}.$$
Research on Retrial Queue with Two-Way Communication in a Diffusion Environment 241
$$\left\{ x'(\tau) + x R_0(x) - \lambda \left( R_1(x) + R_2(x) \right) \right\} \frac{\partial H(y,\tau)}{\partial y} = 0.$$
Let us now consider the process $y(\tau) = \lim_{\varepsilon \to 0} \left( \varepsilon^2 i(\tau/\varepsilon^2) - x(\tau) \right)/\varepsilon$. This process characterizes the deviation of the number of applications in the system from its asymptotic average. We prove that it is a diffusion autoregression process.
Let us denote the right-hand side of the differential equation [17.15] as $A(x)$:

$$A(x) = \lambda - (\lambda + x) R_0(x). \qquad [17.16]$$
We find the form of the functions $h_k(y,s,\tau)$. The system [17.13] can be written in the form

$$-(\theta + \lambda + x) H_0 + \mu_1(s) H_1 + \mu_2(s) H_2 - \frac{\partial}{\partial s}\{\alpha(s) H_0\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_0\} = \varepsilon y H_0 - \varepsilon x'(\tau) \frac{\partial H_0}{\partial y} + o(\varepsilon),$$

$$-\mu_1(s) H_1 + (\lambda + x) H_0 - \frac{\partial}{\partial s}\{\alpha(s) H_1\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_1\} = -\varepsilon y H_0 - \varepsilon \frac{\partial}{\partial y}\left\{ (x'(\tau) - \lambda) H_1 + x H_0 \right\} + o(\varepsilon),$$

$$-\mu_2(s) H_2 + \theta H_0 - \frac{\partial}{\partial s}\{\alpha(s) H_2\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_2\} = -\varepsilon \frac{\partial}{\partial y}\left\{ (x'(\tau) - \lambda) H_2 \right\} + o(\varepsilon),$$

where the functions $H_k$ are evaluated at $(y, s, \tau, \varepsilon)$.
By substituting the decomposition [17.17] into this system, taking [17.4] into account and dividing all the equations by $\varepsilon$, the resulting system can be written in the following form:

$$-(\theta + \lambda + x) h_0 + \mu_1(s) h_1 + \mu_2(s) h_2 - \frac{\partial}{\partial s}\{\alpha(s) h_0\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_0\} = Q_0(x,s)\, y H(y,\tau) - x'(\tau) Q_0(x,s) \frac{\partial H(y,\tau)}{\partial y},$$

$$-\mu_1(s) h_1 + (\lambda + x) h_0 - \frac{\partial}{\partial s}\{\alpha(s) h_1\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_1\} = -Q_0(x,s)\, y H(y,\tau) - \left( (x'(\tau) - \lambda) Q_1(x,s) + x Q_0(x,s) \right) \frac{\partial H(y,\tau)}{\partial y},$$

$$-\mu_2(s) h_2 + \theta h_0 - \frac{\partial}{\partial s}\{\alpha(s) h_2\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_2\} = -(x'(\tau) - \lambda) Q_2(x,s) \frac{\partial H(y,\tau)}{\partial y}, \qquad [17.18]$$

where the functions $h_k$ are evaluated at $(y, s, \tau)$.
$$h_k(y,s,\tau) = h_k^{(1)}(x,s) \frac{\partial H(y,\tau)}{\partial y} + h_k^{(2)}(x,s)\, y H(y,\tau). \qquad [17.19]$$
We substitute [17.19] into [17.18] and present the system in the form of two systems:

$$-(\theta + \lambda + x) h_0^{(1)} + \mu_1(s) h_1^{(1)} + \mu_2(s) h_2^{(1)} - \frac{\partial}{\partial s}\{\alpha(s) h_0^{(1)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_0^{(1)}\} = -x'(\tau) Q_0(x,s),$$

$$-\mu_1(s) h_1^{(1)} + (\lambda + x) h_0^{(1)} - \frac{\partial}{\partial s}\{\alpha(s) h_1^{(1)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_1^{(1)}\} = -(x'(\tau) - \lambda) Q_1(x,s) - x Q_0(x,s),$$

$$-\mu_2(s) h_2^{(1)} + \theta h_0^{(1)} - \frac{\partial}{\partial s}\{\alpha(s) h_2^{(1)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_2^{(1)}\} = -(x'(\tau) - \lambda) Q_2(x,s) \qquad [17.20]$$
and

$$-(\theta + \lambda + x) h_0^{(2)} + \mu_1(s) h_1^{(2)} + \mu_2(s) h_2^{(2)} - \frac{\partial}{\partial s}\{\alpha(s) h_0^{(2)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_0^{(2)}\} = Q_0(x,s),$$

$$-\mu_1(s) h_1^{(2)} + (\lambda + x) h_0^{(2)} - \frac{\partial}{\partial s}\{\alpha(s) h_1^{(2)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_1^{(2)}\} = -Q_0(x,s),$$

$$-\mu_2(s) h_2^{(2)} + \theta h_0^{(2)} - \frac{\partial}{\partial s}\{\alpha(s) h_2^{(2)}\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) h_2^{(2)}\} = 0. \qquad [17.21]$$
$$h_k^{(2)}(x,s) = \frac{\partial Q_k(x,s)}{\partial x}. \qquad [17.22]$$
By taking [17.22] and [17.19] into account, the decomposition [17.17] takes the form

$$H_k(y,s,\tau,\varepsilon) = Q_k(x,s) H(y,\tau) + \varepsilon h_k^{(1)}(x,s) \frac{\partial H(y,\tau)}{\partial y} + \varepsilon \frac{\partial Q_k(x,s)}{\partial x}\, y H(y,\tau) + o(\varepsilon). \qquad [17.23]$$
We now find the form of the function $H(y,\tau)$. The functions on the right-hand side of system [17.1] are expanded in series in increments of the argument $y$ up to $o(\varepsilon^2)$, in order to obtain

$$\varepsilon^2 \frac{\partial H_0}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_0}{\partial y} + (\theta + \lambda + x + \varepsilon y) H_0 = \mu_1(s) H_1 + \mu_2(s) H_2 - \frac{\partial}{\partial s}\{\alpha(s) H_0\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_0\},$$

$$\varepsilon^2 \frac{\partial H_1}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_1}{\partial y} + (\lambda + \mu_1(s)) H_1 = \lambda H_1 - \varepsilon\lambda \frac{\partial H_1}{\partial y} + \lambda \frac{\varepsilon^2}{2} \frac{\partial^2 H_1}{\partial y^2} + (\lambda + x + \varepsilon y) H_0 + \varepsilon \frac{\partial}{\partial y}\left\{ (x + \varepsilon y) H_0 \right\} + x \frac{\varepsilon^2}{2} \frac{\partial^2 H_0}{\partial y^2} - \frac{\partial}{\partial s}\{\alpha(s) H_1\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_1\} + o(\varepsilon^2),$$

$$\varepsilon^2 \frac{\partial H_2}{\partial \tau} - \varepsilon x'(\tau) \frac{\partial H_2}{\partial y} + (\lambda + \mu_2(s)) H_2 = \theta H_0 + \lambda H_2 - \varepsilon\lambda \frac{\partial H_2}{\partial y} + \lambda \frac{\varepsilon^2}{2} \frac{\partial^2 H_2}{\partial y^2} - \frac{\partial}{\partial s}\{\alpha(s) H_2\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\{\beta^2(s) H_2\} + o(\varepsilon^2), \qquad [17.24]$$

where the functions $H_k$ are evaluated at $(y, s, \tau, \varepsilon)$.
We substitute the decomposition [17.23] into this system and sum the equations, in order to obtain

$$\varepsilon^2 r(s) \frac{\partial H(y,\tau)}{\partial \tau} - \varepsilon x'(\tau) r(s) \frac{\partial H(y,\tau)}{\partial y} - \varepsilon^2 x'(\tau) \frac{\partial}{\partial x} \sum_{k=0}^{2} Q_k(x,s)\, \frac{\partial \{y H(y,\tau)\}}{\partial y} - \varepsilon^2 x'(\tau) \sum_{k=0}^{2} h_k^{(1)}(x,s)\, \frac{\partial^2 H(y,\tau)}{\partial y^2} =$$
$$= -\varepsilon \left( -x Q_0(x,s) + \lambda Q_1(x,s) + \lambda Q_2(x,s) \right) \frac{\partial H(y,\tau)}{\partial y} - \varepsilon^2 \left( -Q_0(x,s) - x \frac{\partial Q_0(x,s)}{\partial x} + \lambda \frac{\partial Q_1(x,s)}{\partial x} + \lambda \frac{\partial Q_2(x,s)}{\partial x} \right) \frac{\partial \{y H(y,\tau)\}}{\partial y} +$$
$$+ \frac{\varepsilon^2}{2} \left[ x Q_0(x,s) + \lambda Q_1(x,s) + \lambda Q_2(x,s) + 2\left( x h_0^{(1)}(x,s) - \lambda h_1^{(1)}(x,s) - \lambda h_2^{(1)}(x,s) \right) \right] \frac{\partial^2 H(y,\tau)}{\partial y^2} - \frac{\partial}{\partial s}\left\{ \alpha(s) \sum_{k=0}^{2} H_k \right\} + \frac{1}{2}\frac{\partial^2}{\partial s^2}\left\{ \beta^2(s) \sum_{k=0}^{2} H_k \right\} + o(\varepsilon^2). \qquad [17.25]$$
We integrate the left- and right-hand sides of equation [17.25] over $s$, use the condition [17.7] and the notation [17.6], and also denote

$$\int_{-\infty}^{+\infty} h_k^{(1)}(x,s)\, ds = h_k^{(1)}(x), \qquad \sum_{k=0}^{2} h_k^{(1)}(x) = h^{(1)}(x). \qquad [17.26]$$
By taking [17.14] and [17.15] into account, as well as dividing both sides of the equation by $\varepsilon^2$, we obtain

$$\frac{\partial H(y,\tau)}{\partial \tau} = -\left( -R_0(x) - x \frac{\partial R_0(x)}{\partial x} + \lambda \frac{\partial}{\partial x}\{R_1(x) + R_2(x)\} \right) \frac{\partial \{y H(y,\tau)\}}{\partial y} + \frac{1}{2}\left[ x R_0(x) + \lambda R_1(x) + \lambda R_2(x) + 2\left( x h_0^{(1)}(x) - (\lambda + x) R_0(x) h^{(1)}(x) \right) \right] \frac{\partial^2 H(y,\tau)}{\partial y^2}. \qquad [17.27]$$
We have derived the Fokker–Planck equation for the probability density $H(y,\tau)$. The drift coefficient of equation [17.27] is the derivative of the right-hand side of the differential equation [17.15]:

$$A'_x(x) = -R_0(x) - x \frac{\partial R_0(x)}{\partial x} + \lambda \frac{\partial}{\partial x}\{R_1(x) + R_2(x)\} = \frac{\partial}{\partial x}\left\{ -x R_0(x) + \lambda (R_1(x) + R_2(x)) \right\} = \frac{\partial}{\partial x}\left\{ \lambda - (\lambda + x) R_0(x) \right\}. \qquad [17.28]$$
We differentiate $z(\tau)$ with respect to $\tau$:

$$dz(\tau) = \left[ -x R_0(x) + \lambda (R_1(x) + R_2(x)) \right] d\tau + \varepsilon y \frac{\partial}{\partial x}\left\{ -x R_0(x) + \lambda (R_1(x) + R_2(x)) \right\} d\tau + \varepsilon B(x)\, dw(\tau),$$

$$dz(\tau) = \left[ -(x + \varepsilon y) R_0(x + \varepsilon y) + \lambda \left( R_1(x + \varepsilon y) + R_2(x + \varepsilon y) \right) \right] d\tau + \varepsilon B(z - \varepsilon y)\, dw(\tau).$$
$$\frac{\partial F(z,\tau)}{\partial \tau} = -\frac{\partial}{\partial z}\{A(z) F(z,\tau)\} + \frac{\varepsilon^2}{2}\frac{\partial^2}{\partial z^2}\{B^2(z) F(z,\tau)\}.$$
Consider the functioning of the process $z(\tau)$ in a stationary mode, $F(z,\tau) \equiv F(z)$. The stationary distribution can be found from the equation

$$0 = -\frac{\partial}{\partial z}\{A(z) F(z)\} + \frac{\varepsilon^2}{2}\frac{\partial^2}{\partial z^2}\{B^2(z) F(z)\}.$$
This is a homogeneous differential equation, which has the solution

$$F(z) = C \frac{1}{B^2(z)} \exp\left\{ \int_0^z \frac{2 A(u)}{\varepsilon^2 B^2(u)}\, du \right\}, \qquad C^{-1} = \int_0^{\infty} \frac{1}{B^2(z)} \exp\left\{ \int_0^z \frac{2 A(u)}{\varepsilon^2 B^2(u)}\, du \right\} dz. \qquad [17.32]$$
17.6. Conclusion
In this chapter, for the presented model of the retrial queuing system in a diffusion environment, we found the asymptotic average of the normalized number of calls in the system in the form [17.15], the probability distribution of the device states [17.12] and the deviation from the average, which is determined by the stochastic equation [17.30]. The number of applications in the system was approximated by a homogeneous diffusion process, and the probability density of the process values was found in the form [17.32]. The results can be used in service systems, such as call centers, in order to increase their efficiency.
17.7. References
Neuts, M.P. (1971). A queue subject to extraneous phase changes. Advances in Applied
Probability, 3, 78–119.
Neuts, M.P. (1978). Further results of the M/M/1 queue with randomly varying rates.
Opsearch, 15, 139–157.
Purdue, P. (1974). The M/M/1 queue in a Markovian environment. Operations Research, 22,
562–569.
Yechiali, U. (1973). A queuing-type birth-and-death process defined on a continuous-time
Markov Chain. Operations Research, 21, 604–609.
Yechiali, U. and Naor, P. (1971). Queuing problems with heterogeneous arrivals and services.
Operations Research, 19, 722–734.
List of Authors
Other titles from ISTE in Innovation, Entrepreneurship and Management
2021
BOBILLIER CHAUMON Marc-Eric
Digital Transformations in the Challenge of Activity and Work:
Understanding and Supporting Technological Changes
(Technological Changes and Human Resources Set – Volume 3)
2020
ACH Yves-Alain, RMADI-SAÏD Sandra
Financial Information and Brand Value: Reflections, Challenges and
Limitations
ANDREOSSO-O’CALLAGHAN Bernadette, DZEVER Sam, JAUSSAUD Jacques,
TAYLOR Robert
Sustainable Development and Energy Transition in Europe and Asia
(Innovation and Technology Set – Volume 9)
BEN SLIMANE Sonia, M’HENNI Hatem
Entrepreneurship and Development: Realities and Future Prospects
(Smart Innovation Set – Volume 30)
CHOUTEAU Marianne, FOREST Joëlle, NGUYEN Céline
Innovation for Society: The P.S.I. Approach
(Smart Innovation Set – Volume 28)
CORON Clotilde
Quantifying Human Resources: Uses and Analysis
(Technological Changes and Human Resources Set – Volume 2)
CORON Clotilde, GILBERT Patrick
Technological Change
(Technological Changes and Human Resources Set – Volume 1)
CERDIN Jean-Luc, PERETTI Jean-Marie
The Success of Apprenticeships: Views of Stakeholders on Training and
Learning
(Human Resources Management Set – Volume 3)
DELCHET-COCHET Karen
Circular Economy: From Waste Reduction to Value Creation
(Economic Growth Set – Volume 2)
DIDAY Edwin, GUAN Rong, SAPORTA Gilbert, WANG Huiwen
Advances in Data Science
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 4)
DOS SANTOS PAULINO Victor
Innovation Trends in the Space Industry
(Smart Innovation Set – Volume 25)
GASMI Nacer
Corporate Innovation Strategies: Corporate Social Responsibility and
Shared Value Creation
(Smart Innovation Set – Volume 33)
GOGLIN Christian
Emotions and Values in Equity Crowdfunding Investment Choices 1:
Transdisciplinary Theoretical Approach
GUILHON Bernard
Venture Capital and the Financing of Innovation
(Innovation Between Risk and Reward Set – Volume 6)
LATOUCHE Pascal
Open Innovation: Human Set-up
(Innovation and Technology Set – Volume 10)
LIMA Marcos
Entrepreneurship and Innovation Education: Frameworks and Tools
(Smart Innovation Set – Volume 32)
MACHADO Carolina, DAVIM J. Paulo
Sustainable Management for Managers and Engineers
MAKRIDES Andreas, KARAGRIGORIOU Alex, SKIADAS Christos H.
Data Analysis and Applications 3: Computational, Classification, Financial,
Statistical and Stochastic Methods
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 5)
Data Analysis and Applications 4: Financial Data Analysis and Methods
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 6)
MASSOTTE Pierre, CORSI Patrick
Complex Decision-Making in Economy and Finance
MEUNIER François-Xavier
Dual Innovation Systems: Concepts, Tools and Methods
(Smart Innovation Set – Volume 31)
MICHAUD Thomas
Science Fiction and Innovation Design
(Innovation in Engineering and Technology Set – Volume 6)
MONINO Jean-Louis
Data Control: Major Challenge for the Digital Society
(Smart Innovation Set – Volume 29)
MORLAT Clément
Sustainable Productive System: Eco-development versus Sustainable
Development
(Smart Innovation Set – Volume 26)
SAULAIS Pierre, ERMINE Jean-Louis
Knowledge Management in Innovative Companies 2: Understanding and
Deploying a KM Plan within a Learning Organization
(Smart Innovation Set – Volume 27)
2019
AMENDOLA Mario, GAFFARD Jean-Luc
Disorder and Public Concern Around Globalization
BARBAROUX Pierre
Disruptive Technology and Defence Innovation Ecosystems
(Innovation in Engineering and Technology Set – Volume 5)
DOU Henri, JUILLET Alain, CLERC Philippe
Strategic Intelligence for the Future 1: A New Strategic and Operational
Approach
Strategic Intelligence for the Future 2: A New Information Function
Approach
FRIKHA Azza
Measurement in Marketing: Operationalization of Latent Constructs
FRIMOUSSE Soufyane
Innovation and Agility in the Digital Age
(Human Resources Management Set – Volume 2)
GAY Claudine, SZOSTAK Bérangère L.
Innovation and Creativity in SMEs: Challenges, Evolutions and Prospects
(Smart Innovation Set – Volume 21)
GORIA Stéphane, HUMBERT Pierre, ROUSSEL Benoît
Information, Knowledge and Agile Creativity
(Smart Innovation Set – Volume 22)
HELLER David
Investment Decision-making Using Optional Models
(Economic Growth Set – Volume 2)
HELLER David, DE CHADIRAC Sylvain, HALAOUI Lana, JOUVET Camille
The Emergence of Start-ups
(Economic Growth Set – Volume 1)
HÉRAUD Jean-Alain, KERR Fiona, BURGER-HELMCHEN Thierry
Creative Management of Complex Systems
(Smart Innovation Set – Volume 19)
LATOUCHE Pascal
Open Innovation: Corporate Incubator
(Innovation and Technology Set – Volume 7)
LEHMANN Paul-Jacques
The Future of the Euro Currency
LEIGNEL Jean-Louis, MÉNAGER Emmanuel, YABLONSKY Serge
Sustainable Enterprise Performance: A Comprehensive Evaluation Method
LIÈVRE Pascal, AUBRY Monique, GAREL Gilles
Management of Extreme Situations: From Polar Expeditions to Exploration-
Oriented Organizations
MILLOT Michel
Embarrassment of Product Choices 2: Towards a Society of Well-being
N’GOALA Gilles, PEZ-PÉRARD Virginie, PRIM-ALLAZ Isabelle
Augmented Customer Strategy: CRM in the Digital Age
NIKOLOVA Blagovesta
The RRI Challenge: Responsibilization in a State of Tension with Market
Regulation
(Innovation and Responsibility Set – Volume 3)
PELLEGRIN-BOUCHER Estelle, ROY Pierre
Innovation in the Cultural and Creative Industries
(Innovation and Technology Set – Volume 8)
PRIOLON Joël
Financial Markets for Commodities
QUINIOU Matthieu
Blockchain: The Advent of Disintermediation
RAVIX Joël-Thomas, DESCHAMPS Marc
Innovation and Industrial Policies
(Innovation between Risk and Reward Set – Volume 5)
ROGER Alain, VINOT Didier
Skills Management: New Applications, New Questions
(Human Resources Management Set – Volume 1)
SAULAIS Pierre, ERMINE Jean-Louis
Knowledge Management in Innovative Companies 1: Understanding and
Deploying a KM Plan within a Learning Organization
(Smart Innovation Set – Volume 23)
SERVAJEAN-HILST Romaric
Co-innovation Dynamics: The Management of Client-Supplier Interactions
for Open Innovation
(Smart Innovation Set – Volume 20)
SKIADAS Christos H., BOZEMAN James R.
Data Analysis and Applications 1: Clustering and Regression, Modeling-
estimating, Forecasting and Data Mining
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 2)
Data Analysis and Applications 2: Utilization of Results in Europe and
Other Topics
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 3)
UZUNIDIS Dimitri
Systemic Innovation: Entrepreneurial Strategies and Market Dynamics
VIGEZZI Michel
World Industrialization: Shared Inventions, Competitive Innovations and
Social Dynamics
(Smart Innovation Set – Volume 24)
2018
BURKHARDT Kirsten
Private Equity Firms: Their Role in the Formation of Strategic Alliances
CALLENS Stéphane
Creative Globalization
(Smart Innovation Set – Volume 16)
CASADELLA Vanessa
Innovation Systems in Emerging Economies: MINT – Mexico, Indonesia,
Nigeria, Turkey
(Smart Innovation Set – Volume 18)
CHOUTEAU Marianne, FOREST Joëlle, NGUYEN Céline
Science, Technology and Innovation Culture
(Innovation in Engineering and Technology Set – Volume 3)
CORLOSQUET-HABART Marine, JANSSEN Jacques
Big Data for Insurance Companies
(Big Data, Artificial Intelligence and Data Analysis Set – Volume 1)
CROS Françoise
Innovation and Society
(Smart Innovation Set – Volume 15)
DEBREF Romain
Environmental Innovation and Ecodesign: Certainties and Controversies
(Smart Innovation Set – Volume 17)
DOMINGUEZ Noémie
SME Internationalization Strategies: Innovation to Conquer New Markets
ERMINE Jean-Louis
Knowledge Management: The Creative Loop
(Innovation and Technology Set – Volume 5)
GILBERT Patrick, BOBADILLA Natalia, GASTALDI Lise,
LE BOULAIRE Martine, LELEBINA Olga
Innovation, Research and Development Management
IBRAHIMI Mohammed
Mergers & Acquisitions: Theory, Strategy, Finance
LEMAÎTRE Denis
Training Engineers for Innovation
LÉVY Aldo, BEN BOUHENI Faten, AMMI Chantal
Financial Management: USGAAP and IFRS Standards
(Innovation and Technology Set – Volume 6)
MILLOT Michel
Embarrassment of Product Choices 1: How to Consume Differently
PANSERA Mario, OWEN Richard
Innovation and Development: The Politics at the Bottom of the Pyramid
(Innovation and Responsibility Set – Volume 2)
RICHEZ Yves
Corporate Talent Detection and Development
SACHETTI Philippe, ZUPPINGER Thibaud
New Technologies and Branding
(Innovation and Technology Set – Volume 4)
SAMIER Henri
Intuition, Creativity, Innovation
TEMPLE Ludovic, COMPAORÉ SAWADOGO Eveline M.F.W.
Innovation Processes in Agro-Ecological Transitions in Developing
Countries
(Innovation in Engineering and Technology Set – Volume 2)
UZUNIDIS Dimitri
Collective Innovation Processes: Principles and Practices
(Innovation in Engineering and Technology Set – Volume 4)
VAN HOOREBEKE Delphine
The Management of Living Beings or Emo-management
2017
AÏT-EL-HADJ Smaïl
The Ongoing Technological System
(Smart Innovation Set – Volume 11)
BAUDRY Marc, DUMONT Béatrice
Patents: Prompting or Restricting Innovation?
(Smart Innovation Set – Volume 12)
BÉRARD Céline, TEYSSIER Christine
Risk Management: Lever for SME Development and Stakeholder
Value Creation
CHALENÇON Ludivine
Location Strategies and Value Creation of International
Mergers and Acquisitions
CHAUVEL Danièle, BORZILLO Stefano
The Innovative Company: An Ill-defined Object
(Innovation between Risk and Reward Set – Volume 1)
CORSI Patrick
Going Past Limits To Growth
D’ANDRIA Aude, GABARRET Inés
Building 21st Century Entrepreneurship
(Innovation and Technology Set – Volume 2)
DAIDJ Nabyla
Cooperation, Coopetition and Innovation
(Innovation and Technology Set – Volume 3)
FERNEZ-WALCH Sandrine
The Multiple Facets of Innovation Project Management
(Innovation between Risk and Reward Set – Volume 4)
FOREST Joëlle
Creative Rationality and Innovation
(Smart Innovation Set – Volume 14)
GUILHON Bernard
Innovation and Production Ecosystems
(Innovation between Risk and Reward Set – Volume 2)
HAMMOUDI Abdelhakim, DAIDJ Nabyla
Game Theory Approach to Managerial Strategies and Value Creation
(Diverse and Global Perspectives on Value Creation Set – Volume 3)
LALLEMENT Rémi
Intellectual Property and Innovation Protection: New Practices
and New Policy Issues
(Innovation between Risk and Reward Set – Volume 3)
LAPERCHE Blandine
Enterprise Knowledge Capital
(Smart Innovation Set – Volume 13)
LEBERT Didier, EL YOUNSI Hafida
International Specialization Dynamics
(Smart Innovation Set – Volume 9)
MAESSCHALCK Marc
Reflexive Governance for Research and Innovative Knowledge
(Responsible Research and Innovation Set – Volume 6)
MASSOTTE Pierre
Ethics in Social Networking and Business 1: Theory, Practice
and Current Recommendations
Ethics in Social Networking and Business 2: The Future and
Changing Paradigms
MASSOTTE Pierre, CORSI Patrick
Smart Decisions in Complex Systems
MEDINA Mercedes, HERRERO Mónica, URGELLÉS Alicia
Current and Emerging Issues in the Audiovisual Industry
(Diverse and Global Perspectives on Value Creation Set – Volume 1)
MICHAUD Thomas
Innovation, Between Science and Science Fiction
(Smart Innovation Set – Volume 10)
PELLÉ Sophie
Business, Innovation and Responsibility
(Responsible Research and Innovation Set – Volume 7)
SAVIGNAC Emmanuelle
The Gamification of Work: The Use of Games in the Workplace
SUGAHARA Satoshi, DAIDJ Nabyla, USHIO Sumitaka
Value Creation in Management Accounting and Strategic Management:
An Integrated Approach
(Diverse and Global Perspectives on Value Creation Set – Volume 2)
UZUNIDIS Dimitri, SAULAIS Pierre
Innovation Engines: Entrepreneurs and Enterprises in a Turbulent World
(Innovation in Engineering and Technology Set – Volume 1)
2016
BARBAROUX Pierre, ATTOUR Amel, SCHENK Eric
Knowledge Management and Innovation
(Smart Innovation Set – Volume 6)
BEN BOUHENI Faten, AMMI Chantal, LEVY Aldo
Banking Governance, Performance and Risk-Taking: Conventional Banks
vs Islamic Banks
BOUTILLIER Sophie, CARRÉ Denis, LEVRATTO Nadine
Entrepreneurial Ecosystems
(Smart Innovation Set – Volume 2)
BOUTILLIER Sophie, UZUNIDIS Dimitri
The Entrepreneur
(Smart Innovation Set – Volume 8)
BOUVARD Patricia, SUZANNE Hervé
Collective Intelligence Development in Business
GALLAUD Delphine, LAPERCHE Blandine
Circular Economy, Industrial Ecology and Short Supply Chains
(Smart Innovation Set – Volume 4)
GUERRIER Claudine
Security and Privacy in the Digital Era
(Innovation and Technology Set – Volume 1)
MEGHOUAR Hicham
Corporate Takeover Targets
MONINO Jean-Louis, SEDKAOUI Soraya
Big Data, Open Data and Data Development
(Smart Innovation Set – Volume 3)
MOREL Laure, LE ROUX Serge
Fab Labs: Innovative User
(Smart Innovation Set – Volume 5)
PICARD Fabienne, TANGUY Corinne
Innovations and Techno-ecological Transition
(Smart Innovation Set – Volume 7)
2015
CASADELLA Vanessa, LIU Zeting, UZUNIDIS Dimitri
Innovation Capabilities and Economic Development in Open Economies
(Smart Innovation Set – Volume 1)
CORSI Patrick, MORIN Dominique
Sequencing Apple’s DNA
CORSI Patrick, NEAU Erwan
Innovation Capability Maturity Model
FAIVRE-TAVIGNOT Bénédicte
Social Business and Base of the Pyramid
GODÉ Cécile
Team Coordination in Extreme Environments
MAILLARD Pierre
Competitive Quality and Innovation
MASSOTTE Pierre, CORSI Patrick
Operationalizing Sustainability
MASSOTTE Pierre, CORSI Patrick
Sustainability Calling
2014
DUBÉ Jean, LEGROS Diègo
Spatial Econometrics Using Microdata
LESCA Humbert, LESCA Nicolas
Strategic Decisions and Weak Signals
2013
HABART-CORLOSQUET Marine, JANSSEN Jacques, MANCA Raimondo
VaR Methodology for Non-Gaussian Finance
2012
DAL PONT Jean-Pierre
Process Engineering and Industrial Management
MAILLARD Pierre
Competitive Quality Strategies
POMEROL Jean-Charles
Decision-Making and Action
SZYLAR Christian
UCITS Handbook
2011
LESCA Nicolas
Environmental Scanning and Sustainable Development
LESCA Nicolas, LESCA Humbert
Weak Signals for Strategic Intelligence: Anticipation Tool for Managers
MERCIER-LAURENT Eunika
Innovation Ecosystems
2010
SZYLAR Christian
Risk Management under UCITS III/IV
2009
COHEN Corine
Business Intelligence
ZANINETTI Jean-Marc
Sustainable Development in the USA
2008
CORSI Patrick, DULIEU Mike
The Marketing of Technology Intensive Products and Services
DZEVER Sam, JAUSSAUD Jacques, ANDREOSSO Bernadette
Evolving Corporate Structures and Cultures in Asia: Impact
of Globalization
2007
AMMI Chantal
Global Consumer Behavior
2006
BOUGHZALA Imed, ERMINE Jean-Louis
Trends in Enterprise Knowledge Management
CORSI Patrick et al.
Innovation Engineering: The Power of Intangible Networks