COVID-19 Epidemic SIR-based Model
⋄ Università degli Studi di Milano, Italy
⋆ University of Limerick, Ireland
Abstract
Calibration of a SIR (Susceptibles-Infected-Recovered) model with official data at the international level for the COVID-19 pandemic provides a good example of the difficulties inherent in the solution of inverse problems. Inverse modeling is set up in a framework of discrete inverse problems, which explicitly considers the role and the relevance of data. Together with a physical vision of the model, this is very useful to discuss the uncertainties in the data and how they influence the reliability of calibrated model parameters and, ultimately, of model predictions.
1 Introduction
Epidemic modeling is usually performed with compartmental models, often called SIR models, which are said to go back to the works by Ronald Ross and Hilda P. Hudson more than a century ago [13, 14] and by Anderson Gray McKendrick and William Ogilvy Kermack ten years later [9, 10]. This class of models shares several characteristics with models of population dynamics and with conceptual lumped models in hydrology. These models simulate the temporal evolution of some compartments of the population, which is normally subdivided into Susceptibles (i.e., those persons who have not yet been affected by the virus and could be infected), Infected (i.e., those persons who have been infected by the virus) and Recovered (i.e., those persons who have recovered after having been infected). For this reason these models are usually referred to as SIR models. They are based on simple laws that describe the transfer of individuals from one class to another.
These models have found wide application both in life sciences, mostly in epidemiology, and in economic, political and social sciences, e.g., to assess the costs of different policies to block epidemics and the diffusion of viruses. Most papers, however, address academic issues, and the models are rarely calibrated against real data.
Giudici et al., Inversion of a SIR-based model: application to COVID-19 epidemic
Model calibration is a common problem in geophysical and environmental modeling. A general framework to handle discrete inverse problems for model calibration is proposed in [7]; it can be useful to discuss some characteristics of SIR models and the role of data, also following the discussion in [6].
The large amount of data collected during the COVID-19 pandemic, caused by the spread of the SARS-CoV-2 virus (also called “coronavirus”), provides an exceptional basis to perform modeling exercises and to test different calibration methods.
The objectives of this paper are to clarify some concepts about SIR models and their calibration and to discuss the relevance of data for the reliability of model outcomes.
The paper is designed to advance the knowledge of the functioning, potentialities and limitations of epidemic models. It is also expected to provide further insights into epidemic model calibration. It is not, however, designed to provide forecasts of the pandemic evolution. In the authors’ opinion, the quality of the available data does not permit reliable forecasts, and model outcomes should be used with great caution.
The paper is organized as follows. The next section is devoted to the description of the model, in the continuous and discrete cases, of the methods used for the calibration of the numerical model, and of the data for the application to the COVID-19 pandemic; in particular, inverse modeling, i.e., model calibration, is set up and discussed within the framework proposed by [7]. Some results of the model with reference to the COVID-19 pandemic are shown in the third section, whereas the fourth section is devoted to a discussion of several topics: the assumptions at the basis of the SIR model, some remarks about model calibration, and some remarks about data uncertainty. The concluding section also includes some hints for future developments of this work.
The coefficients β and δ denote the birth and death rates, respectively, under normal conditions, i.e., without considering deaths caused by the epidemic. These coefficients are rarely considered in epidemic modeling, as the variation due to the normal evolution of the population is negligible, or at least smoother than the variations caused by epidemics.
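The following equations, based on historical papers [13, 14, 9, 10], are used to describe the time evolution of S, I, D and R:
$$
\begin{aligned}
\frac{dS}{dt} &= \beta S - \gamma \frac{IS}{P} - \delta S,\\
\frac{dI}{dt} &= \gamma \frac{IS}{P} - \rho I - \varphi I + \beta I,\\
\frac{dD}{dt} &= \varphi I,\\
\frac{dR}{dt} &= \beta R + \rho I - \delta R.
\end{aligned}
\tag{1}
$$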
The second term on the right-hand side of the first equation of (1) represents the number of individuals who are infected per unit time. It is based on the assumption that each infected person has contacts with a given number of persons in a certain time interval and that the fraction of them who are susceptible is S/P, whereas (I + R)/P is the fraction of those who cannot be infected, under the assumption that recovered people are immune. The coefficient γ is the infection coefficient, i.e., the rate of potential infection.
The coefficients ρ and φ represent the recovery and fatality rates, respectively. The coefficients β, γ, δ, ρ and φ are assumed to be constant, and their dimension is [time⁻¹]. The assumptions behind this model are discussed thoroughly in section 4.
The initial conditions of the model are S(0) = P(0), I(0) = 1, R(0) = 0 and D(0) = 0. This means that t = 0 corresponds to the time at which the first individual is infected.
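Notice that, with P = S + I + R the living population, summing the equations for S, I and R in (1) gives
$$
\frac{dP}{dt} = \beta P - \delta (S + R) - \varphi I = \beta P - \delta P - (\varphi - \delta) I. \tag{2}
$$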
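As an illustration, the right-hand side of system (1) can be coded directly. The following Python sketch is ours, not the authors’ implementation; the function name `sir_d_rhs` and all numerical values are hypothetical. It assumes P = S + I + R and transcribes the I equation exactly as written in (1). With β = δ = 0 the four derivatives sum to zero: the only loss of living population, −φI, reappears as dD/dt, in agreement with (2).

```python
def sir_d_rhs(S, I, D, R, beta, gamma, delta, rho, phi):
    """Right-hand side of system (1), returned as (dS/dt, dI/dt, dD/dt, dR/dt).

    D is accepted for a uniform state signature but does not enter the equations.
    """
    P = S + I + R                   # living population (assumption: P = S + I + R)
    infections = gamma * I * S / P  # new infections per unit time, gamma*I*S/P
    dS = beta * S - infections - delta * S
    dI = infections - rho * I - phi * I + beta * I
    dD = phi * I
    dR = beta * R + rho * I - delta * R
    return dS, dI, dD, dR


# Initial conditions S(0) = P(0), I(0) = 1, D(0) = R(0) = 0, with a hypothetical P(0).
state0 = (1.0e6, 1.0, 0.0, 0.0)  # (S, I, D, R)

# Hypothetical daily coefficients; beta = delta = 0 isolates the epidemic terms.
derivs = sir_d_rhs(*state0, beta=0.0, gamma=0.25, delta=0.0, rho=0.09, phi=0.01)
print(derivs)
```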
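$$
\begin{aligned}
S(t'+1) &= \left[1 + \left(\beta - \gamma \frac{I(t')}{P(t')} - \delta\right)\Delta t\right] S(t'),\\
I(t'+1) &= \left[1 + \left(\gamma \frac{S(t')}{P(t')} - \rho - \varphi + \beta\right)\Delta t\right] I(t'),
\end{aligned}
\tag{3}
$$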
N_obs denotes the number of time steps for which data are available. Notice that, in the discrete case, t ∈ ℤ is used as the time index, which denotes days, starting from the reference date t = 0, i.e., the first day for which epidemic data are available.
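The update (3) is an explicit (forward) Euler step of (1). A minimal Python sketch, assuming a daily step Δt = 1: the S and I updates follow (3) as given, whereas the D and R updates are not shown in (3) and are our assumption, obtained by applying the same forward Euler step to the last two equations of (1). The names `euler_step` and `simulate` and the parameter values are hypothetical.

```python
def euler_step(S, I, D, R, beta, gamma, delta, rho, phi, dt=1.0):
    """One explicit Euler step of system (1), i.e. the discrete update (3).

    The S and I updates follow (3); the D and R updates are an assumed
    forward Euler discretization of the last two equations of (1).
    """
    P = S + I + R  # living population at the current step
    S_next = (1.0 + (beta - gamma * I / P - delta) * dt) * S
    I_next = (1.0 + (gamma * S / P - rho - phi + beta) * dt) * I
    D_next = D + phi * I * dt
    R_next = (1.0 + (beta - delta) * dt) * R + rho * I * dt
    return S_next, I_next, D_next, R_next


def simulate(P0, n_days, beta, gamma, delta, rho, phi):
    """Iterate the discrete model daily from S = P0, I = 1, D = R = 0."""
    state = (float(P0), 1.0, 0.0, 0.0)
    history = [state]
    for _ in range(n_days):
        state = euler_step(*state, beta, gamma, delta, rho, phi)
        history.append(state)
    return history


# Hypothetical daily coefficients, not taken from the paper.
traj = simulate(P0=1e6, n_days=365, beta=0.0, gamma=0.25, delta=0.0, rho=0.09, phi=0.01)
S, I, D, R = traj[-1]
print(f"day 365: S={S:.0f} I={I:.0f} D={D:.0f} R={R:.0f}")
```

With β = δ = 0 the total S + I + D + R is conserved step by step, and D/R stays equal to φ/ρ because both compartments are fed only by I.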
The model parameters are included in an array p:
where t0 represents the day on which the first individual of the population is infected and P0 is the total population on that day.
The discrete model given by equations (3) and (4) can be written as the following system of equations:
$$
f(p, s) = 0. \tag{8}
$$
If the numbers of model parameters and state variables are $N^{(p)}$ and $N^{(s)}$, respectively, then $p \in \mathcal{P} \subseteq \mathbb{R}^{N^{(p)}}$ and $s \in \mathcal{S} \subseteq \mathbb{R}^{N^{(s)}}$, where the subset $\mathcal{P}$ takes into account some conditions on the parameters, e.g., that the parameters ρ, φ, γ and P0 must satisfy the following constraints:
$$
0 \le \rho \le 1 - \varphi \le 1, \quad 0 \le \varphi \le 1 - \rho \le 1, \quad 0 \le \gamma, \quad 0 \le P_0. \tag{9}
$$
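Read together, the two chains in (9) are equivalent to ρ ≥ 0, φ ≥ 0 and ρ + φ ≤ 1. One plausible interpretation (ours, not stated in the text) is that, with a daily step Δt = 1 in (3), the factor 1 − (ρ + φ)Δt that multiplies I in the absence of new infections and births then stays non-negative. A direct transcription of (9) as a Python check; the function name `is_admissible` is hypothetical.

```python
def is_admissible(rho, phi, gamma, P0):
    """True if (rho, phi, gamma, P0) satisfy the constraints (9)."""
    return (0.0 <= rho <= 1.0 - phi <= 1.0
            and 0.0 <= phi <= 1.0 - rho <= 1.0
            and gamma >= 0.0
            and P0 >= 0.0)


print(is_admissible(rho=0.09, phi=0.01, gamma=0.25, P0=1e6))  # rho + phi <= 1
print(is_admissible(rho=0.80, phi=0.30, gamma=0.25, P0=1e6))  # rho + phi > 1
```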
The unique solution of the forward problem, i.e., of (8), can be expressed as
$$
s = g(p). \tag{10}
$$
The array p can be subdivided into two sub-arrays: p(fix), which includes the model parameters whose values are fixed before the simulation, and
$$
p^{(cal)} = \{\rho, \varphi, \gamma, t_0, P_0\}, \tag{12}
$$
which includes the model parameters whose values are obtained from the solution of an inverse problem. Therefore
$$
p = \left(p^{(fix)\,T}, p^{(cal)\,T}\right)^T. \tag{13}
$$
The misfit between model predictions and the target values is computed by means of the following objective function:
$$
O_X\left(p^{(cal)}\right) = \left\{ \frac{1}{t_2 - t_1} \sum_{t=t_1}^{t_2-1} \left[ \frac{X(t - t_0) - X_{obs}(t)}{\max\left(\xi, X_{obs}(t)\right)} \right]^2 \right\}^{1/2}, \tag{18}
$$
with X ∈ {I, R, D}, X_obs being the corresponding element of {I_obs, R_obs, D_obs}, and ξ ≥ 1 a threshold. In other words, O is the sum of three functions, each of which considers one of the three observed quantities separately:
$$
O\left(p^{(cal)}\right) = O_I\left(p^{(cal)}\right) + O_R\left(p^{(cal)}\right) + O_D\left(p^{(cal)}\right).
$$
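A minimal Python sketch of (18) and of O, assuming that simulated series are stored as lists indexed by days since the first infection and observations as lists indexed by the data day t, so that the datum on day t is compared with the model value on day t − t0 (t0 < 0 means that the infection started before the first datum). The function names are hypothetical, and treating model values before t0 as zero is our assumption.

```python
import math


def objective_x(model, observed, t0, t1, t2, xi=1.0):
    """O_X of (18): relative RMS misfit of one quantity over data days t1 <= t < t2.

    model[k] is the simulated value k days after the first infection, so the value
    compared with datum observed[t] is model[t - t0].
    """
    total = 0.0
    for t in range(t1, t2):
        k = t - t0
        x_model = model[k] if k >= 0 else 0.0  # assumption: no cases before t0
        total += ((x_model - observed[t]) / max(xi, observed[t])) ** 2
    return math.sqrt(total / (t2 - t1))


def objective(models, observations, t0, t1, t2, xi=1.0):
    """O = O_I + O_R + O_D; both arguments are dicts keyed by "I", "R" and "D"."""
    return sum(objective_x(models[x], observations[x], t0, t1, t2, xi)
               for x in ("I", "R", "D"))


# Toy check with hypothetical numbers: the model is the data shifted by t0 = 1 day.
observed = [0.0, 1.0, 2.0, 4.0, 8.0]
model = [1.0, 2.0, 4.0, 8.0, 16.0]
print(objective_x(model, observed, t0=1, t1=0, t2=5))      # perfect fit
print(objective_x([2.0] * 5, observed, t0=0, t1=1, t2=3))  # constant model
```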
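The model calibration is performed by solving the following inverse problem: given $p^{(fix)}$ and $d$, given the solution $s = g(p)$ to (8), and given the functions $y(d, g(p), p)$ and $t$, find $p^{(cal)\star} \in \mathcal{P}^{(cal)}$ such that
$$
\begin{aligned}
p^{(cal)\star} &= \arg\min_{p^{(cal)} \in \mathcal{P}^{(cal)}} O\left(p^{(cal)}\right),\\
\text{i.e.,}\quad O\left(p^{(cal)\star}\right) &\le O\left(p^{(cal)}\right), \quad \forall\, p^{(cal)} \in \mathcal{P}^{(cal)},
\end{aligned}
\tag{19}
$$
where $\mathcal{P}^{(cal)} = \left\{ p^{(cal)} : \left(p^{(fix)\,T}, p^{(cal)\,T}\right)^T \in \mathcal{P} \right\}$.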
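The text does not specify here how (19) is minimized (Figure 4 refers to a subjective “trial-and-error” calibration), so the brute-force grid search below is only a stand-in, not the authors’ method. It searches γ and t0 while holding ρ, φ, P0, β and δ fixed, and it uses synthetic “observations” generated by the model itself, so the minimizer is known in advance. All names and values are hypothetical, and the D and R updates again assume a forward Euler form of (1).

```python
import math


def simulate(P0, n_days, beta, gamma, delta, rho, phi):
    """Daily iteration of the discrete model (3) from S = P0, I = 1, D = R = 0.

    Returns lists of I, R and D indexed by days since the first infection.
    The D and R updates are an assumed forward Euler form of (1).
    """
    S, I, D, R = float(P0), 1.0, 0.0, 0.0
    out = {"I": [I], "R": [R], "D": [D]}
    for _ in range(n_days):
        P = S + I + R
        S, I, D, R = ((1.0 + beta - gamma * I / P - delta) * S,
                      (1.0 + gamma * S / P - rho - phi + beta) * I,
                      D + phi * I,
                      (1.0 + beta - delta) * R + rho * I)
        for x, value in (("I", I), ("R", R), ("D", D)):
            out[x].append(value)
    return out


def misfit(models, obs, t0, n_obs, xi=1.0):
    """O = O_I + O_R + O_D, each O_X as in (18) with t1 = 0 and t2 = n_obs."""
    total = 0.0
    for x in ("I", "R", "D"):
        acc = sum(((models[x][t - t0] - obs[x][t]) / max(xi, obs[x][t])) ** 2
                  for t in range(n_obs))
        total += math.sqrt(acc / n_obs)
    return total


def grid_calibrate(obs, n_obs, fixed, gammas, t0s):
    """Brute-force search of (gamma, t0) minimizing O; other parameters held fixed.

    Only t0 <= 0 (infection before the first datum) is handled here.
    """
    best = None
    for gamma in gammas:
        for t0 in t0s:
            models = simulate(n_days=n_obs - t0, gamma=gamma, **fixed)
            value = misfit(models, obs, t0, n_obs)
            if best is None or value < best[0]:
                best = (value, gamma, t0)
    return best


# Synthetic data from the model itself (hypothetical values): gamma = 0.25, t0 = -5.
fixed = {"P0": 1e5, "beta": 0.0, "delta": 0.0, "rho": 0.09, "phi": 0.01}
n_obs, true_t0 = 60, -5
truth = simulate(n_days=n_obs - true_t0, gamma=0.25, **fixed)
obs = {x: [truth[x][t - true_t0] for t in range(n_obs)] for x in ("I", "R", "D")}

value, gamma_hat, t0_hat = grid_calibrate(
    obs, n_obs, fixed, gammas=[0.15 + 0.01 * k for k in range(21)], t0s=range(-10, 1))
print(f"O* = {value:.1e}, gamma = {gamma_hat:.2f}, t0 = {t0_hat}")
```

With real data the minimum is no longer zero, and a grid this coarse would only give a starting point for a finer search over all five entries of p(cal).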
The threshold ξ plays a double role. First, it keeps the denominator of the fraction appearing in (18) positive. Furthermore, it controls some characteristics of the objective function. For ξ = 1, O_X is simply the root-mean-squared relative difference between observed and modeled values of X (on the days with X_obs(t) ≥ 1). If a large value of ξ is used, then relative errors corresponding to large values of X_obs dominate, because the misfits on days with small X_obs are divided by ξ rather than by X_obs(t); from a practical point of view, this means that the early-time behavior has minor relevance for the model fitting. In particular, if ξ > max{X_obs(t), t1 ≤ t < t2}, then O_X reduces to the standard root-mean-squared error divided by ξ.
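The effect of ξ can be checked numerically on a toy series. In this Python sketch (hypothetical numbers and function names), the observed and modeled values are already aligned, i.e., the shift by t0 has been applied. With ξ = 1 the first days, where counts are small, carry most of the sum of squares; with ξ above every observation the last day dominates, and O_X equals the ordinary RMSE up to the constant factor 1/ξ, which does not move the minimizer of (19).

```python
import math


def objective_x(model, observed, xi):
    """O_X of (18) for aligned model/observed sequences (t0 shift already applied)."""
    n = len(observed)
    return math.sqrt(sum(((m - o) / max(xi, o)) ** 2
                         for m, o in zip(model, observed)) / n)


def rmse(model, observed):
    """Standard root-mean-squared error."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)) / len(observed))


def term_shares(model, observed, xi):
    """Fraction of the sum of squares in (18) contributed by each day."""
    terms = [((m - o) / max(xi, o)) ** 2 for m, o in zip(model, observed)]
    total = sum(terms)
    return [term / total for term in terms]


# Hypothetical observed and modeled counts, for illustration only.
observed = [2.0, 5.0, 40.0, 300.0, 2000.0]
model = [1.0, 8.0, 50.0, 280.0, 2100.0]

relative = objective_x(model, observed, xi=1.0)     # relative RMS misfit
large_xi = objective_x(model, observed, xi=5000.0)  # xi above every observation
print(relative, large_xi, rmse(model, observed) / 5000.0)
print(term_shares(model, observed, 1.0))
print(term_shares(model, observed, 5000.0))
```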
[Figure (caption not recoverable): observed data for China, US, United Kingdom, Italy, Spain, France, Germany, Ireland, Iran, Japan and Korea, South, in three panels (confirmed, recovered, deaths); number of cases on a logarithmic scale versus time, from 1/22/20 to 4/15/20.]
volume. The population, birth rate and death rate of each country for which the model has been tested are included in d. They are used to fix the values of β and δ, which are expressed on a daily basis, and to provide a first estimate of P0.
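The units of the demographic data in d are not stated here. Assuming, as is common in demographic statistics, crude rates per 1000 inhabitants per year, the conversion to daily coefficients can be sketched as follows (function name and values hypothetical). The net term (β − δ) then changes the population by only a fraction of a percent per year, which illustrates why such terms are usually negligible in epidemic modeling.

```python
def daily_rate(per_1000_per_year, days_per_year=365.0):
    """Convert a crude annual rate per 1000 inhabitants into a per-day coefficient.

    Assumption: the demographic data are crude rates per 1000 inhabitants per year;
    a simple linear conversion is used, adequate for rates of a few per thousand.
    """
    return per_1000_per_year / 1000.0 / days_per_year


# Hypothetical crude birth and death rates (per 1000 inhabitants per year).
beta = daily_rate(7.0)
delta = daily_rate(10.5)

# Relative change of the population over one year due to (beta - delta) alone.
yearly_change = (beta - delta) * 365.0
print(f"beta = {beta:.2e}/day, delta = {delta:.2e}/day, yearly change = {yearly_change:.2%}")
```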
3 Results
3.1 Model results
First of all, the behavior of the model is shown with test case 1, which includes three model runs in which all the model parameters are kept fixed except ρ: the parameter values are listed in Table 1, and the results of the model for a one-year simulation period are shown in Figure 2. The general behavior shows an exponential increase in the number of infected persons (notice that the vertical axis is in logarithmic scale), followed by an exponential decrease with a longer characteristic time. The number of deaths decreases as ρ increases; in particular, the three runs show different situations: (a) for the smallest value of ρ, the number of susceptible persons decreases dramatically from some days before the peak of infections and reaches very small values after a few weeks; (b) for the intermediate value of ρ, the chosen parameter values yield stationary conditions for the numbers of susceptible and dead people, which reach almost the same value, about 8 months after the start of the epidemic; (c) for the highest value of ρ, the number of susceptible people decreases with time but remains substantial. Notice that, for this test case, the reduction of the total population is limited (less than 10%), and after one year almost all of the living population has recovered. It is important to stress that this test case is meant to show that the model can predict different behaviors; these results should not be considered as a forecast of the actual behavior of any real pandemic.
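Table 1 is not reproduced in this section, so the values below are hypothetical; the Python sketch only mirrors the design of test case 1 (three one-year runs that differ only in ρ) and checks two qualitative statements of the text: deaths decrease as ρ increases, and the total population shrinks by less than 10%. The D and R updates assume a forward Euler form of (1).

```python
def simulate(P0, n_days, beta, gamma, delta, rho, phi):
    """Discrete model (3) with dt = 1 day; returns the final (S, I, D, R).

    The D and R updates are an assumed forward Euler discretization of (1).
    """
    S, I, D, R = float(P0), 1.0, 0.0, 0.0
    for _ in range(n_days):
        P = S + I + R
        S, I, D, R = ((1.0 + beta - gamma * I / P - delta) * S,
                      (1.0 + gamma * S / P - rho - phi + beta) * I,
                      D + phi * I,
                      (1.0 + beta - delta) * R + rho * I)
    return S, I, D, R


# Hypothetical parameter values (not those of Table 1): only rho changes.
P0 = 1e9
common = {"beta": 2e-5, "gamma": 0.3, "delta": 3e-5, "phi": 0.002}
deaths, reductions = [], []
for rho in (0.05, 0.10, 0.20):
    S, I, D, R = simulate(P0, 365, rho=rho, **common)
    deaths.append(D)
    reductions.append(1.0 - (S + I + R) / P0)  # relative loss of living population
    print(f"rho={rho:.2f}: S={S:.3g} I={I:.3g} D={D:.3g} R={R:.3g}")
```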
[Figure 2 plot: three panels, number of cases (logarithmic scale, 10⁰ to 10¹⁰) versus time (days), 0 to 350. Legend values for the second panel: Susceptibles 18400416, Infected 33395, Dead 19227934, Recovered 960371072, Population 978799487; for the third panel: Susceptibles 200472208, Infected 48206, Dead 7902054, Recovered 789595520, Population 990110463; the legend of the first panel is not recoverable.]
Figure 2: Model results for test case 1. Numbers in the legends refer to the values at the end of the simulation period.
SIR models are sometimes written in terms of fractions, i.e., the number of individuals in each category divided by the total population. Although test case 1 showed that the total population varies only slightly for three parameter sets that differ only in ρ, the term used to compute the infection rate introduces a non-linearity into the model. Therefore, test case 2 is designed to assess the effect of P0 on model results. P0 takes four values spanning three orders of magnitude, from 10⁶ to 10⁹, whereas the other parameters are fixed at the values of run (a) of test case 1. The results are shown in Figure 3 in terms of normalized quantities versus time. The values of each function at the end of the simulation period are very similar. The main differences are in the evolving phase, in which a small population responds more rapidly than a large one. Roughly speaking, the curves corresponding to larger populations show a delay, with respect to the curve for the smallest population, of about 15 days per order-of-magnitude increase in P0. This remark, if confirmed by runs with more reliable parameter sets, could have fundamental consequences for the design of early warning systems.
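A short argument (ours) supports a roughly constant delay per order of magnitude of P0. The right-hand sides of (3) are homogeneous of degree one in (S, I, R, D), so runs that differ only in P0 differ only in the initial fraction I(0)/P0 = 1/P0. While S ≈ P, the I update in (3) gives I(t′) ≈ (1 + rΔt)^{t′} with r = γ − ρ − φ + β, so each factor of 10 in P0 delays the normalized curves by ln 10 / ln(1 + rΔt) steps, i.e., about ln 10 / r days for small rΔt. A delay of about 15 days would correspond to an early growth rate of roughly 0.15 per day; this is an inference, not a value given in the text. The Python sketch below checks the argument numerically with hypothetical parameters (Δt = 1 day, so the expected shift here is about 16.5 days).

```python
import math


def peak_day(P0, n_days, beta, gamma, delta, rho, phi):
    """Day of maximum I in the discrete model (3), dt = 1, from S = P0, I = 1.

    D and R use an assumed forward Euler form of (1); D does not feed back on I.
    """
    S, I, D, R = float(P0), 1.0, 0.0, 0.0
    best_day, best_I = 0, I
    for day in range(1, n_days + 1):
        P = S + I + R
        S, I, D, R = ((1.0 + beta - gamma * I / P - delta) * S,
                      (1.0 + gamma * S / P - rho - phi + beta) * I,
                      D + phi * I,
                      (1.0 + beta - delta) * R + rho * I)
        if I > best_I:
            best_day, best_I = day, I
    return best_day


# Hypothetical parameters: early growth rate r = gamma - rho - phi + beta = 0.15 per day.
params = {"beta": 0.0, "gamma": 0.25, "delta": 0.0, "rho": 0.09, "phi": 0.01}
r = params["gamma"] - params["rho"] - params["phi"] + params["beta"]
expected_shift = math.log(10.0) / math.log(1.0 + r)  # days per factor 10 in P0

peaks = [peak_day(10 ** k, 365, **params) for k in (6, 7, 8, 9)]
shifts = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
print(peaks, shifts, round(expected_shift, 1))
```

Because the peak is located on a daily grid, the individual shifts may differ from the continuous estimate by about one day.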
[Figure 3 plot: fraction of cases over population (logarithmic scale, 10⁻⁶ to 10⁰) versus time (days), 0 to 350; panels (a) P0 = 10⁶, (b) P0 = 10⁷, (c) P0 = 10⁸, (d) P0 = 10⁹; curves: Susceptibles, Infected, Dead, Recovered.]
[Figure 4 plot: number of cases on a logarithmic scale.]
Figure 4: Comparison of observations for Italy and modelled data with the parameters obtained from subjective “trial-and-error” calibration.
infection started before the official appearance of the first confirmed case; P0
is close to the lower bound, so that the model predicts that the population
which has been involved in the infection could be relatively small.
[Figure (caption not recoverable): two panels comparing observed and modeled data, number of cases on a logarithmic scale versus time (days), from 1/22/20 to 4/15/20. Legend values, first panel: Population 244966, Recovered 39928, Dead 25580, Infected 154512, Susceptibles 27238; second panel: Population 202162, Recovered 40271, Dead 25434, Infected 143076, Susceptibles 9275; both panels: Observed recovered 38092, Observed dead 21645, Observed infected 165155.]
4 Discussion
4.1 Remarks about the model
Some basic assumptions on which the model developed in this work is founded deserve to be recalled and discussed.
The developed model basically assumes “homogeneity” of the population. In other words, no distinction is made in terms of sex, age, economic wealth, health and wellness, working conditions, lifestyle, home state or any other factor, including genetic background. Also, the model assumes that the population under study is a closed system, thus disregarding variations induced by short-term tourist or business travel, by medium-term mobility of students and workers and by long-term effects of migration flows.
The model is also independent of climatic and environmental conditions, i.e., the processes considered by the model are assumed to be independent of the variability of weather conditions and environmental quality at any temporal and spatial scale. In particular, this means that neither sharp and rapid variations nor annual or seasonal cycles affect these processes.
Epidemic models rarely consider birth and death rates, because the corresponding terms in the equations are usually negligible. In this work, these terms have been kept in order to facilitate this discussion. In particular, following the assumption of population homogeneity, it is assumed that infected pregnant women give birth to infected babies and that this occurs at the same rate as for susceptible women.
With regard to the infection rate, which is described by the term γIS/P in (1), some remarks are in order. This term is computed by assuming that each infected individual has a given, constant number of contacts with other persons per unit time. The fraction of contacted persons who cannot be infected is given by (I + R)/P, which assumes that recovered people become immune to the virus, an aspect that has not been confirmed by the scientific community (see, e.g., [15]). Moreover, recovered people are assumed not to be infectious, which is the case if their immune system responds so fast that, once they come in contact with the virus again, the virus is destroyed before it can be spread to susceptible persons. On the other hand, the fraction of contacted individuals who can be infected is given by S/P. Due to the “homogeneity” assumption, the γ coefficient is considered to be independent of the factors recalled at the beginning of this subsection; in particular, working and living conditions could control the distance and the duration of contacts of infected, and therefore infectious, individuals with other persons.
The so-called recovery and fatality coefficients ρ and φ are assumed to be constant. This does not follow from the “homogeneity” assumption alone. In fact, this implies that recovery and fatality are modeled as instantaneous
Notice that the fatality coefficient, φ, accounts for the deaths related to
the pandemic, i.e., it represents the increase in the death rate due to the
pandemic. The normal death rate is considered through δ.
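A short derivation (ours, not from the text) makes one consequence of constant ρ and φ explicit. Consider a hypothetical cohort of individuals infected at the same time, ignoring new infections and the birth term βI (which adds new members rather than changing the fate of existing ones). According to (1), members of the cohort leave I at the constant rate ρ + φ, so the time spent in I is exponentially distributed:

$$
\begin{aligned}
\frac{dI_c}{dt} &= -(\rho + \varphi)\, I_c
\quad\Longrightarrow\quad
I_c(t) = I_c(0)\, e^{-(\rho + \varphi) t},\\
\bar{\tau} &= \int_0^{\infty} e^{-(\rho + \varphi) t}\, dt = \frac{1}{\rho + \varphi},\\
\frac{D_c(\infty)}{I_c(0)} &= \int_0^{\infty} \varphi\, e^{-(\rho + \varphi) t}\, dt = \frac{\varphi}{\rho + \varphi}.
\end{aligned}
$$

Here I_c and D_c denote the cohort's infected and dead, τ̄ is the mean time spent in I, and φ/(ρ + φ) is the fraction of the cohort that eventually dies. Since the exponential density is largest at t = 0, the model allows recovery or death at any time after infection, with no minimum duration of illness.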
5 Conclusions
The modeling exercise conducted within this work supports a series of remarks, which are summarized in this concluding section, together with some future perspectives.
Starting with modeling aspects, the limitations of classical SIR models have been recalled. These should always be kept in mind and carefully considered, especially in applications and when these models are used as engines of decision support systems.
The main limitation is related to the “homogeneity” assumption, together with the constancy of the recovery and fatality coefficients. The latter aspect could be addressed as discussed in subsection 4.1 and might yield
Acknowledgments
This work presents results of a purely curiosity-driven research, which has
received support only through the standard working facilities of the authors’
institutions.
References
[1] David Baud, Xiaolong Qi, Karin Nielsen-Saines, Didier Musso, Léo Pomar, and Guillaume Favre, Real estimates of mortality following COVID-19 infection, The Lancet Infectious Diseases (2020), DOI:10.1016/S1473-3099(20)30195-X.
[3] Ensheng Dong, Hongru Du, and Lauren Gardner, An interactive web-based dashboard to track COVID-19 in real time, The Lancet Infectious Diseases (2020), DOI:10.1016/S1473-3099(20)30120-1.
[15] Yufang Shi, Ying Wang, Changshun Shao, Jianan Huang, Jianhe Gan, Xiaoping Huang, Enrico Bucci, Mauro Piacentini, Giuseppe Ippolito, and Gerry Melino, COVID-19 infection: the perspectives on immune responses, Cell Death & Differentiation (2020), DOI:10.1038/s41418-020-0530-3.
Authors’ affiliations
Mauro Giudici
Università degli Studi di Milano, Dipartimento di Scienze della
Terra “A.Desio”, Milano, Italy
Alessandro Comunian
Università degli Studi di Milano, Dipartimento di Scienze della
Terra “A.Desio”, Milano, Italy
Romina Gaburro
University of Limerick, Department of Mathematics and Statistics,
Health Research Institute (HRI), Limerick, Ireland