Optimal Inspection and Maintenance Planning For Deteriorating Structural Components Through Dynamic Bayesian Networks and Markov Decision Processes
Structural Safety
journal homepage: www.elsevier.com/locate/strusafe
A R T I C L E  I N F O

Keywords:
Infrastructure management
Inspection and maintenance
Partially Observable Markov Decision Processes
Deteriorating structures
Dynamic Bayesian networks
Decision analysis

A B S T R A C T

Civil and maritime engineering systems, from bridges to offshore platforms and wind turbines, among others, must be efficiently managed, as they are exposed to deterioration mechanisms throughout their operational life, such as fatigue and/or corrosion. Identifying optimal inspection and maintenance policies demands the solution of a complex sequential decision-making problem under uncertainty, with the main objective of efficiently controlling the risk associated with structural failures. Addressing this complexity, risk-based inspection planning methodologies, often supported by dynamic Bayesian networks, evaluate a set of pre-defined heuristic decision rules to reasonably simplify the decision problem. However, the resulting policies may be compromised by the limited space considered in the definition of the decision rules. Avoiding this limitation, Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for stochastic optimal control under uncertain action outcomes and observations, in which the optimal actions are prescribed as a function of the entire, dynamically updated, state probability distribution. In this paper, we combine dynamic Bayesian networks with POMDPs in a joint framework for optimal inspection and maintenance planning, and we provide the relevant formulation for developing both infinite and finite horizon POMDPs in a structural reliability context. The proposed methodology is implemented and tested for the case of a structural component subject to fatigue deterioration, demonstrating the capability of state-of-the-art point-based POMDP solvers to solve the underlying stochastic planning optimization problem. Within the numerical experiments, POMDP and heuristic-based policies are thoroughly compared, and results showcase that POMDPs achieve substantially lower costs as compared to their counterparts, even for traditional problem settings.
* Corresponding author.
E-mail address: [email protected] (P.G. Morato).
https://doi.org/10.1016/j.strusafe.2021.102140
Received 9 September 2020; Received in revised form 20 May 2021; Accepted 18 August 2021
Available online 30 October 2021
0167-4730/© 2021 Elsevier Ltd. All rights reserved.
P.G. Morato et al. Structural Safety 94 (2022) 102140
Fig. 1. (Top) Inspection and Maintenance (I&M) planning decision tree. Maintenance actions and observation decisions are represented by blue boxes, and chance nodes are depicted by white circles. At every time step, the cost Ct depends on the action a, observation decision e, and state s of the component. (Bottom) An I&M POMDP sequence, in which at each step t the cost Ct depends on the action a, observation decision e, and state s of the component. In both representations, an observation outcome o is collected according to the current state, taken action, and observation decision.
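The per-step costs Ct sketched in the caption are aggregated into a discounted total, as formalized in Section 2. A minimal sketch of that bookkeeping for one hypothetical episode follows; the discount factor, horizon, and cost values are invented for illustration and are not taken from the paper:

```python
gamma = 0.95                       # annual discount factor
horizon = 20                       # component lifetime t_N in years
c_inspect, c_repair = 1.0, 10.0    # hypothetical inspection and repair costs

# One episode: inspect every 5 years; assume a repair is triggered at year 10
cost_t = [0.0] * (horizon + 1)
for t in range(5, horizon + 1, 5):
    cost_t[t] += c_inspect
cost_t[10] += c_repair

# Total discounted cost C_T = sum over t of C_t * gamma^t
C_T = sum(c * gamma**t for t, c in enumerate(cost_t))
print(round(C_T, 3))  # → 8.182
```

Costs incurred later in the lifetime contribute less to C_T, which is why delaying interventions is, all else being equal, economically favorable.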
lifetime of the structure [16,17].

In the context of Inspection and Maintenance (I&M) planning, the decision maker faces a complex sequential decision-making problem under uncertainty. This sequential decision-making problem is illustrated in Fig. 1, showcasing the involved random events and decision points, and can be formulated either from the perspective of classical applied statistical decision theory [18], or through artificial intelligence [19] conceptions, or a combination thereof. In all cases, the main objective of a decision maker, or an intelligent agent, is to identify the optimal policy that minimizes the total expected costs.

With the aim of addressing this complex decision-making problem, Risk-Based Inspection (RBI) planning methodologies have been traditionally proposed [20] and have often also been applied to the I&M planning of offshore structures [21,22]. By imposing a set of heuristic decision rules, RBI methodologies are able to simplify and solve the decision-making problem within a reasonable computational time, while structural reliability methods are often employed within this framework to quantify and update the reliability and risk metrics. More recently, RBI methodologies have also been integrated with Dynamic Bayesian Networks (DBNs) [23–27]. DBNs provide an intuitive and robust inference approach to Bayesian updating; however, they do not tackle the decision optimization problem by themselves. In the proposed methodologies, heuristic decision rules, usually based on engineering principles and understanding of the problem, are still utilized to simplify the decision problem. Despite their practical advantages, the main shortcoming of heuristic-based policies is the limited policy space exploration due to the prior, ad hoc prescription of decision rules. In this paper, we thus present how DBNs describing deterioration processes can instead be combined with Markov decision processes and dynamic programming [28], and be used to define transition and emission probabilities in such settings.

Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for planning in stochastic environments under partial observability. In the past, POMDPs were only applicable to small state space problems due to the difficulty of finding appropriate solutions in a reasonable computation time. However, starting with the development of point-based solvers [29], which managed to efficiently alleviate the inherent complexities of the solution process, POMDPs have been increasingly used for planning problems, especially in the field of computer science and robot navigation [30,31]. POMDPs have also been proposed for I&M of engineering systems [32–36]. In the reported POMDP methodologies, either the condition of the structural component has been modeled with less than five discrete states or the rewards have not been defined in a structural reliability context. This different POMDP approach to the I&M problem, as compared with typical RBI applications, has raised some misconceptions in the literature about their use, which we formally rectify herein.

In this work, POMDPs are successfully combined with dynamic Bayesian networks in a joint framework for optimal inspection and maintenance planning, in order to take advantage of both the modeling flexibility of DBNs and the advanced optimization capabilities of POMDPs. In particular, this paper originally derives the POMDP dynamics from DBNs, enabling optimal control of physically-based stochastic deterioration processes, modeled either through a conditional set of time-invariant parameters or as a function of the deterioration rate. We further provide all relevant formulations for deriving both infinite and finite horizon POMDPs within a structural reliability context. The proposed framework is analyzed, implemented, and tested for the case of a structural component subject to a fatigue deterioration process, and the capability of state-of-the-art point-based POMDP value iteration methods to efficiently solve challenging I&M optimization problems is verified. POMDP and typical heuristic risk-based and/or periodic policies are thoroughly analyzed and compared in a variety of problem settings, and results demonstrate that POMDP solutions achieve substantially lower costs in all cases, as compared to their counterparts.

2. Background: Risk-based inspection planning

A typical Inspection and Maintenance (I&M) sequential decision problem under uncertainty is illustrated in Fig. 1. The optimal strategy can be theoretically identified by means of a pre-posterior decision analysis [18]. Assuming the costs at different times to be additive and independent, the pre-posterior analysis prescribes the observation decisions e ∈ E and actions a ∈ A that minimize the total expected cost CT(a, e) = Ct0(e, a, s)γ^t0 + … + CtN(e, a, s)γ^tN, i.e. the sum over the lifetime tN of the discounted costs received at each time step t, with γ being the discount factor. Note that societal and environmental consequences,
specified in monetary units, can also be included within the definition of the total expected cost.

If the probabilities associated with the random events, as well as the costs, are assigned to each branch of the decision tree, then the branch corresponding to the optimal cost C*T(a, e) can be identified. This analysis, however, quickly becomes computationally intractable, as the number of branches grows with the number of decision points.

2.1. RBI assumptions and heuristic rules

Risk-Based Inspection (RBI) planning methodologies [37] introduce simplifications to the I&M decision-making problem in order to be able to identify strategies in a reasonable computational time. To simplify the problem, the expected cost is computed only for a limited set of pre-defined decision rules ha,e. The best strategy among them is then identified as the decision rule with the minimum cost.

Within an I&M planning context, the total expected cost E[CT(h, tN)] is the combination of expected costs from inspections E[CI(h, tN)], repairs E[CR(h, tN)], and failures E[CF(h, tN)], as a function of the imposed decision rules ha,e. This expectation for a structural component designed for a lifetime of tN years is simply computed as:

E[CT(h, tN)] = E[CI(h, tN)] + E[CR(h, tN)] + E[CF(h, tN)]   (1)

The simplifications introduced to the I&M decision-making problem by pre-defining a set of decision rules are listed below:

i) Observations (inspections) are planned according to a pre-defined heuristic rule. Two heuristic rules are commonly employed in the literature [38]:
• Equidistant inspections: inspections are planned at constant intervals of time Δt.
• Failure probability threshold: inspections are planned just before a pre-defined annual failure probability threshold ΔPF is reached.
ii) If the outcome of an inspection indicates damage detection (d > ddet), a repair action is immediately performed. In that case, the repair probability is equal to the probability of detection PR = P(d > ddet). Alternatively, other heuristic rules can also be imposed (adding computational complexity), such as that a repair is performed if an inspection indicates detection (d > ddet) and a pre-defined failure probability threshold PF is simultaneously exceeded.
iii) After a component is repaired, it is assumed that it behaves like a component with no damage detection, i.e. the remaining life can be computed as if the inspection at the time of repair indicated no damage detection. With these assumptions, the decision tree represented in Fig. 1 can be simplified to a single branch. Alternatively, if a repair is performed at time t and it is assumed to be perfect, the component returns to its initial damage state at the beginning of a new decision tree with a lifetime equal to tN − t.

Summarizing, one can simplify the problem to one decision tree branch by assuming that: (i) inspections are to be planned according to a heuristic rule, (ii) a repair is to be performed if an inspection indicates detection, and (iii) after a repair is performed, the inspection at that time is considered as a no detection event. In this case, the individual contributions to the total expected cost in Eq. (1) can be computed analytically.

The expected inspection cost E[CI(h, tN)] is computed as the sum over all conducted inspections In, with individual inspection cost Ci, discounted by the factor γ ∈ [0, 1]:

E[CI(h, tN)] = Σ_{tI=tI1}^{tIn} Ci γ^tI   (2)

The expected risk of failure E[CF(h, tN)] is computed as the sum of discounted annual failure risks, in which ΔPF is the annual failure probability and Cf is the cost of failure:

E[CF(h, tN)] = Cf Σ_{t=1}^{tN} ΔPF(h, t) γ^t   (4)

2.2. Probabilistic deterioration model and reliability updating

Structural reliability methods and general sampling-based methods [39] can be used to compute the probabilities associated with the random events represented in the I&M decision tree (Fig. 1). In a simplified decision tree, the main random events are the damage detections during inspections and the structural failure.

The failure event is defined through a limit state gF(t) = dc − d(t), in which dc represents the failure criterion, such as the critical crack size, and d(t) describes the temporal deterioration evolution. Uncertainties involved in the deterioration process are incorporated by defining d(t) as a function of a group of random variables or random processes. The probability of failure PF(t) can then be computed as the probability of the limit state being negative, PF = P{gF(t) ≤ 0}, and the reliability index, usually defined in the standard normal space as β(t) = −Φ^(−1){PF(t)}, in which Φ is the standard normal cumulative distribution function, is inversely related to the failure probability. The probability of the failure event can also be defined over a reference period, e.g. the annual failure probability can be computed as ΔPF(t) = PF(t) − PF(t − 1).

The measurement uncertainty of the available observations (inspections) is often quantified by means of Probability of Detection (PoD) curves. A PoD indicates the probability of detection as a function of the damage size d and depends on the employed inspection method, i.e. the distribution of the detectable damage size can be modeled by an exponential distribution F(dd) = F0[1 − exp(−d/λ)], where F0 and λ are parameters determined by experiments. The event of no detection at time tI is then modeled by the limit state function gInd(tI) = d(tI) − dd(tI). Similarly, the event of detection at time tI is modeled by the limit state gId(tI) = dd(tI) − d(tI). Both detection and no detection events are evaluated as inequalities; for instance, the probability of no detection is assessed as the probability of the limit state being negative, PInd = P{gInd(tI) ≤ 0}. Alternatively, a discrete damage measurement dm can be collected, and the limit state is modeled in this case as gm(tI) = d(tI) − (dm − εm), where εm is a random variable that represents the measurement uncertainty, and the equality event Pm = P{gm(t) = 0} can be estimated equal to some limit, as explained in [39–41].

The additional information gained by observations can be used to update the structural reliability or failure probability PF by computing a failure event conditional on inspection events [42], as:

PF|I1,…,IN(t) = P[gF(t) ≤ 0 ∩ gI1(t) ≤ 0 ∩ … ∩ gIN(t) ≤ 0] / P[gI1(t) ≤ 0 ∩ … ∩ gIN(t) ≤ 0]   (5)

The conditional failure probability introduced in Eq. (5) can be computed by structural reliability methods (FORM, SORM) or by Monte Carlo sampling methods [39].
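As a concrete illustration of the Monte Carlo route, the conditional failure probability of Eq. (5) can be estimated by sampling the failure and no-detection limit states jointly. The exponential crack-growth model, the parameter values, and the single inspection event below are hypothetical stand-ins, not the paper's case study:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical deterioration model: crack size d(t) = d0 * exp(C * t) [mm]
d0 = rng.lognormal(mean=0.0, sigma=0.3, size=n)   # initial crack size
C = rng.normal(loc=0.08, scale=0.02, size=n)      # growth rate per year
d_c = 20.0                                        # critical crack size

def d(t):
    return d0 * np.exp(C * t)

# One inspection at t_I = 10 with exponential PoD: detectable size d_d ~ Exp(lam)
t_I, lam = 10, 8.0
d_d = rng.exponential(scale=lam, size=n)

# Limit states: failure g_F = d_c - d(t) <= 0; no detection g_Ind = d(t_I) - d_d <= 0
t = 20
fail = d(t) >= d_c
no_det = d(t_I) <= d_d

pf_uncond = fail.mean()
pf_cond = (fail & no_det).sum() / no_det.sum()    # Eq. (5) with N = 1 inspection

print(f"PF({t}) = {pf_uncond:.4f}, PF|no-detection = {pf_cond:.4f}")
```

Surviving an inspection without detection shifts probability mass toward smaller cracks, so the conditional estimate falls below the unconditional one.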
i) Discrete state space: Exact inference algorithms are limited to discrete random variables [45]. A discretization operation must thus be performed to convert the original continuous random variables to the discrete space. The unknown error introduced by the discretization operation converges to zero in the limit of an infinitesimal interval size. However, the computational complexity of the inference task grows linearly with the number of states and exponentially with the number of random variables.
ii) Markovian assumption: The state space S is the domain of all random variables describing the process, and the transition to the next state is assumed to depend only on the current state. The transition probability matrix P(st+1|st) can also be assumed to be stationary for some applications, thus facilitating the formulation of the problem. This assumption can, however, be easily relaxed without entailing additional computational effort [46].

3.1. Parametric DBN

A stochastic deterioration process can be represented by the DBN shown in Fig. 2. The deterioration is represented through the damage node dt, which is influenced by a set of time-invariant random variables θt. The model is denoted as a parametric DBN, as the damage dt is influenced by the parameters θt. Imperfect observations are added into the DBN by means of the node ot. This DBN can be extended by incorporating time-variant random variables, as proposed by [23]; yet, we consider only time-invariant random variables here, as they are widely used in the literature, and to avoid unnecessary presentation complications. Finally, the binary node Ft provides an indication of the failure and survival events.

Within the context of structural reliability and related problems, DBNs are often employed to propagate and update the uncertainty related to a deterioration process, incorporating evidence from inspections or monitoring. Filtering becomes the preferred inference task for inspection and maintenance planning problems, as a decision is taken at time t supported by evidence gathered from the initial time step t0 up to time t. The belief state, defined as the probability distribution over states, can be propagated and updated by applying the forward operation from the forward–backward algorithm [45]. The transition algorithmic step of the forward operation is assumed to be Markovian, being therefore equivalent to the underlying transition model of a POMDP. More details on the formulation of POMDP transition models are introduced in Section 4.1.

At time step t0, the initial belief corresponds to the joint probability of the initial damage and time-invariant parameters P(dt0, θt0). The forward operation is then applied for the subsequent time steps, comprised of the following steps:

1. Transition step: the belief propagates in time according to a pre-defined conditional probability distribution or transition matrix P(dt+1, θt+1|dt, θt), as:

P(dt+1, θt+1|o0, …, ot) = Σ_{dt} Σ_{θt} P(dt+1, θt+1|dt, θt) P(dt, θt|o0, …, ot)   (6)

2. Estimation step: once an observation ot+1 is collected, the belief is updated by applying Bayes' rule:

P(dt+1, θt+1|o0, …, ot+1) ∝ P(ot+1|dt+1) P(dt+1, θt+1|o0, …, ot)   (7)

The quality of the observation is quantified by the likelihood P(ot+1|dt+1). This likelihood can be directly obtained from probability of detection curves or by discretizing a direct measurement. Since the random variables are discrete, a normalization of P(dt+1, θt+1|o0, …, ot+1) can be easily implemented.

The failure probability assigned to the node Ft corresponds to the probability of being in a failure state. As the failure states are defined based on the damage condition dt, the time-invariant parameters θt can be marginalized out to compute the failure probability. Disregarding the discretization error, the resulting structural reliability is equivalent to the one computed in Eq. (5).

In terms of computational complexity, note that the belief is composed of (|θ1|·…·|θk|·|d|) states, defined by the damage d along with k time-invariant random variables. Thus, the transition matrix includes (|θ1|·…·|θk|·|d|)^2 elements. Since P(θt+1|θt) is defined by an identity matrix, the transition is prescribed by a very sparse, block-diagonal matrix with a maximum density of ρP = 1/(|θ1|·…·|θk|).

3.2. Deterioration rate DBN

We present herein an alternative DBN in which a stochastic deterioration process is represented as a function of the deterioration rate. This model is adopted from [47] and denoted here as the deterioration rate DBN. Fig. 3 graphically illustrates the model. In this case, the stochastic deterioration process is described in time t by the nodes dt, conditional on the deterioration rate τt. If the stochastic process is stationary, the deterioration evolution varies equally over time, and the deterioration rate τt is thus not utilized. The deterioration does not, however, progress equally over time in a non-stationary process, and in that case, the parameter τt needs to be incorporated to effectively model the varying deterioration effects over time.
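One cycle of the forward operation described above (a transition step followed by a Bayesian estimation step) can be sketched on a small discrete belief. The three-state damage discretization, transition matrix, and PoD values are illustrative only; for a parametric DBN the belief would additionally carry the time-invariant parameters θ:

```python
import numpy as np

# Illustrative 3-state damage discretization: no damage, minor, severe
belief = np.array([0.80, 0.15, 0.05])        # current belief P(d_t | o_0..o_t)

# Transition matrix P(d_{t+1} | d_t): deterioration can only worsen
T = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

# Likelihood of the collected observation, e.g. P(o = detection | d_{t+1}),
# read off a PoD curve for each discrete damage state
likelihood = np.array([0.05, 0.50, 0.95])

belief = belief @ T                  # transition step: propagate one step ahead
belief = likelihood * belief         # estimation step: Bayes' rule ...
belief /= belief.sum()               # ... followed by normalization

print(np.round(belief, 3))
```

A detection outcome shifts the belief mass toward the damaged states, which in turn raises the failure probability read from the node Ft.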
Fig. 3. Deterioration rate dynamic Bayesian network, derived from [47]. The evolution of a stochastic deterioration process is represented by the nodes dt, dependent on the deterioration rate τt. Imperfect observations are included through the nodes ot, and the binary node Ft indicates the probability of the failure and survival events.

After collecting experimental or physically-based simulated data (e.g. Monte Carlo simulations) from a non-stationary deterioration process, the transition probabilities can be calculated, for each deterioration rate τt, by counting the number of transitions from dt to dt+1 over the total data available in dt. Additional methods to compute the transition model are described in [47]. As illustrated in Fig. 3, imperfect observations are added through the nodes ot, and the structural reliability is indicated through the node Ft.

To comply with the time-invariance property of DBNs, the belief incorporates both the damage condition and the deterioration rate through the joint probability P(dt, τt). Yet, the node τt is a zero-one (one-hot) vector that transitions at each time step from one deterioration rate τi to the next τi+1. The deterioration evolution is computed by a forward operation in a similar manner as for the parametric DBN. Initially, the belief corresponds to the joint probability P(d0, τ0). Subsequently, the belief experiences a transition according to the transition matrix P(dt+1, τt+1|dt, τt):

P(dt+1, τt+1|o0, …, ot) = Σ_{dt} Σ_{τt} P(dt+1, τt+1|dt, τt) P(dt, τt|o0, …, ot)   (8)

Based on the gathered observations, the beliefs are then updated by applying Bayes' rule. The likelihood P(ot+1|dt+1) can be directly defined, for instance, from probability of detection curves or from discretized measures:

P(dt+1, τt+1|o0, …, ot+1) ∝ P(ot+1|dt+1) P(dt+1, τt+1|o0, …, ot)   (9)

The computational complexity is influenced by the belief size. For the case of a deterioration rate DBN, the belief P(dt, τt) is composed of |τ|·|d| states, and its sparse transition matrix P(dt+1, τt+1|dt, τt) accounts for (|τ|·|d|)^2 elements. Since the only non-zero probabilities of the transition matrix P(τt+1|τt) are the ones that define the transition from deterioration rate τt to the next deterioration rate τt+1, the maximum density of P(dt+1, τt+1|dt, τt) is ρDR = 1/|τ|.

The relative advantages of a parametric DBN and a deterioration rate one are case dependent. If the deterioration process can be modeled by just a few parameters or it evolves over a long time span, the parametric DBN is recommended. However, if the deterioration modeling involves many parameters or complex random processes spanning a short time horizon, the deterioration rate DBN should be preferred. If both DBN models are applied to the same problem, the results should be equivalent, with differences only due to the discretization error.

3.3. Risk-based inspection planning and DBNs

While DBNs can be successfully used for reliability updating, they do not possess by themselves intrinsic optimization capabilities. To this end, modern RBI methodologies include a combination of DBNs and heuristic rules to identify the optimal strategy [43,48]. The total expected cost E[CT(h)] is then computed with a Monte Carlo simulation of nep episodes (policy realizations):

E[CT(h)] = (Σ_{ep=1}^{nep} CT,ep(h)) / nep   (11)

One can compute the costs of all pre-defined heuristic rules and identify the strategy with the minimum expected cost as the optimal policy. However, the resulting optimal policies might be compromised due to the limited space covered by the imposed heuristic rules, out of all possible decision rules.

4. Optimal I&M planning through POMDPs

We propose herein a methodology for optimal I&M planning of deteriorating structures under uncertainty based on Partially Observable Markov Decision Processes (POMDPs). The methodology is adopted from similar frameworks, as studied in [49]. While the damage evolution was modeled in [49] as a function of its deterioration rate, following the formulation presented in Section 3.2, we extend here the methodology to deterioration mechanisms modeled as functions of time-invariant parameters, formulated according to Section 3.1. In addition, the user penalty is defined in this work as a consequence of the annual failure probability experienced by the component.

A Markov decision process (MDP) is a 5-tuple 〈S, A, T, R, γ〉 controlled stochastic process in which an intelligent agent acts in a stochastic environment. The agent observes the component at state s ∈ S and takes an action a ∈ A; the state then randomly transitions to a state s′ ∈ S, and the agent receives a relevant reward Rt(s, a), where t is the current decision step.

As described in Section 1, the optimal decisions result in a minimum expected cost. The expected cost, or value function, is expressed for a finite horizon MDP as the summation of the discounted rewards V(s0) = Rt0 + … + γ^(tN−1) RtN−1, from time step t0 up to the final time step tN−1. For an infinite or unbounded horizon MDP, the rewards are infinitely summed up (tN = ∞). Note that the rewards are discounted by the factor γ. From an economic perspective, the discount factor converts future rewards into their present value. Computationally, discounting is also necessary to guarantee convergence in infinite horizon problems.

An MDP policy (π: S→A) prescribes an action as a function of the current state. The main goal of an MDP is the identification of the optimal policy π*(s) which maximizes the value function V*(s). There exist efficient algorithms that compute the optimal policy using the principles of dynamic programming and invoking Bellman's equation. Both value and policy iteration algorithms can be implemented to identify the optimal policy π*(s) [50]. While the state of the component in an MDP is known at each time step, imperfect observations are usually obtained in real situations, e.g. noise in the sensor of a robot, measurement uncertainty of an inspection, etc. POMDPs are a generalization of MDPs in which the states are perceived by the agent through imperfect observations. The POMDP becomes a 7-tuple 〈S, A, O, T, Z, R, γ〉. While the dynamics of the environment are the same as for an MDP, an agent acting in a POMDP can no longer directly observe the current state.
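The dynamic programming machinery invoked above can be illustrated with a minimal value iteration on a toy maintenance MDP. The three states, transition matrices, and costs below are invented for illustration; since this sketch tracks costs rather than rewards, the Bellman backup takes a minimum over actions:

```python
import numpy as np

# Toy MDP: states {intact, damaged, failed}; actions {do-nothing, repair}
# T[a, s, s'] = P(s' | s, a); all numbers hypothetical
T = np.array([
    [[0.9, 0.1, 0.0],      # do-nothing: the component deteriorates
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0],      # repair: return to the intact state
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]],
])
# Immediate costs C[a, s]: staying failed is expensive, repairing has a fixed cost
C = np.array([[0.0,  0.0,  100.0],
              [10.0, 10.0, 110.0]])
gamma = 0.95

V = np.zeros(3)
for _ in range(2000):              # value iteration with Bellman backups
    Q = C + gamma * (T @ V)        # Q[a, s] = C[a, s] + γ Σ_s' T[a, s, s'] V[s']
    V_new = Q.min(axis=0)          # optimal cost-to-go: minimize over actions
    if np.abs(V_new - V).max() < 1e-9:
        break
    V = V_new

policy = Q.argmin(axis=0)          # 0 = do-nothing, 1 = repair
print(np.round(V, 2), policy)
```

For these numbers, the converged policy repairs whenever damage is present, while the intact state is left alone, mirroring the condition-based logic of the heuristic rules in Section 2.1.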
In a POMDP, the agent maintains a belief state b(s), a probability distribution over the states, which is updated after taking an action a ∈ A and collecting an observation o ∈ O, as:

b′(s′) ∝ P(o|s′, a) Σ_{s∈S} P(s′|s, a) b(s)   (12)

The normalizing constant P(o|b, a) is the probability of collecting an observation o ∈ O given the belief state b and action a ∈ A.

One can see in Eq. (12) that, for a specific action a ∈ A, updating a belief involves operations over the entire state space. In Section 4.2, we explain how point-based solvers are able to solve high-dimensional state space POMDPs and find the optimal strategies.

4.1. POMDP model implementation

A systematic scheme for building a POMDP model in the context of optimal inspection and maintenance planning is provided in this section. A POMDP is built by defining all the elements of the tuple 〈S, A, O, T, Z, R, γ〉. While most of the reported applications of POMDPs for infrastructure planning employed a deterioration rate model [49], a parametric model, as presented in Section 3.1, is originally implemented here.

4.1.1. States

For the typical discrete state MDP/POMDP cases, a discretization of the continuous deterioration random variables is performed, as described in Section 3.

4.1.2. Actions

Maintenance actions and observation decisions can be combined into groups [49]. For instance, one can combine the action "do-nothing" with two inspections, resulting in the two combinations "do-nothing/visual-inspection" or "do-nothing/NDE-inspection", and a relevant reward will be assigned to each combination.

4.1.3. Transition probabilities

A transition matrix T(s, a, s′) models the transition probability P(s′|s, a) associated with each action, which here corresponds to the DBN transition model P(dt+1, θt+1|dt, θt) or P(dt+1, τt+1|dt, τt), derived in Section 3.

• Perfect repair (PR) action: a maintenance action is performed and the component returns from its current damage belief bt, at time step t, to the belief b0, associated with an intact status. In a belief space environment, a perfect repair transition matrix is defined as:

P(s′|s, aPR) = [ b0(s0) b0(s1) ⋯ b0(sk) ]
               [ b0(s0) b0(s1) ⋯ b0(sk) ]
               [   ⋮      ⋮    ⋱    ⋮   ]
               [ b0(s0) b0(s1) ⋯ b0(sk) ]   (13)

Since the belief state is a probability distribution, the summation over all the states is equal to one (Σ_s bt(s) = 1). If one multiplies a belief state by the transition matrix defined in Eq. (13), the current belief returns to the belief b0, independently of its current condition,
as:

b0(s′) = Σ_{s∈S} bt(s) P(s′|s, aPR)   (14)

If the states are fully observable, the belief state becomes a zero-one vector, and a perfect repair matrix can be formulated as P(s0|st, aPR) = 1, transferring any state st to the intact state s0.

• Imperfect repair (IR) action: a maintenance action is performed and the component returns from a damage belief bt to a healthier damage state or a more benign deterioration rate. The definition of the repair transition matrix P(st+1|st, aIR) is thus case dependent. Some examples can be found in [49].

4.1.4. Observation probabilities

An observation matrix Z(o, s′, a) quantifies the probability P(o|s′, a) of perceiving an observation o ∈ O in state s′ ∈ S after taking action a ∈ A. Note that we denote the observation action as a to be coherent with the usual POMDP formulation; yet, the observation action could also be named e to be consistent with the nomenclature used in Section 2.1. The relevant observation actions considered here are:

• No observation (NO): the belief state should remain unchanged after the transition, as no additional information is gathered. The emission probability P(o|s′, aNO) can be modeled as a uniform distribution over all observations. Alternatively, it can be modeled as P(o0|s′, aNO) = 1. The former is recommended, as it will speed up the computation [49].

• Discrete indication (DI): the likelihood P(o|s′, aDI) is modeled as a discrete event, for instance, a binary indication: detection or no-detection. The likelihood is usually quantified for the binary case by a Probability of Detection (PoD) curve, a PoD(s′) being equivalent to the detection likelihood assigned to each state s′ ∈ S.

4.1.5. Rewards

A cost is assigned to each action-observation group; for the "do-nothing/no-observation" combination, the cost corresponds to the risk of failure. Alternatively, Eq. (16) defines the cost CF(s, aDN−NO) only as a function of the initial state s ∈ S, if the transition matrix P(s′|s, a) is implicitly considered:

CF(s, aDN−NO) = Σ_{s′∈SF} P(s′|s, aDN−NO) {Cf − C(s, aDN−NO)}   (16)

• Do-nothing/observation (DN/O): the cost is equal in this case to the one related to the failure risk plus one inspection cost. Both discrete and continuous indications can be included in this category. One can therefore compute the cost CO(s, aDN−O) just by further considering the inspection cost Ci:

CO(s, aDN−O) = CF(s, aDN−NO) + Ci   (17)

• Repair/no-observation (R/NO): the cost CR(s, aR−NO) is equal to the repair cost Cr:

CR(s, aR−NO) = Cr   (18)

The cost CR(s, aR−O) for a repair/inspection combination can be similarly defined by including also the inspection cost Ci along with the repair cost CR(s, aR−NO).

4.2. Point-based POMDP solvers

In principle, one could apply a value iteration algorithm [54] to solve a POMDP. While value updates are computed in a |S|-dimensional discrete space for an MDP, value updates for POMDPs should instead be computed in a (|S| − 1)-dimensional continuous space. The computation thus scales up considerably with the number of dimensions, increasing the computational complexity. This fact is denoted as the curse of dimensionality. Point-based solvers alleviate this complexity by computing value updates only at a selected set of belief points, seeking the policy that maximizes the value function V*(b).
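The perfect-repair construction of Eq. (13) can be verified numerically: a transition matrix whose rows all equal the intact-state belief b0 maps any current belief back to b0, precisely because beliefs sum to one. The belief values below are hypothetical:

```python
import numpy as np

# Hypothetical intact-state belief b0 over four damage states
b0 = np.array([0.85, 0.10, 0.04, 0.01])

# Perfect-repair transition matrix (Eq. (13)): every row is a copy of b0
P_pr = np.tile(b0, (4, 1))

# Any current belief b_t is mapped back to b0, cf. Eq. (14)
b_t = np.array([0.05, 0.20, 0.45, 0.30])
b_next = b_t @ P_pr

print(np.allclose(b_next, b0))  # → True
```

The same product with a one-hot belief reproduces the fully observable case, where any state st is transferred directly to the intact state s0.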
Fig. 5. Graphical representation of the POMDPs utilized for the numerical experiments. A parametric POMDP and a deterioration rate POMDP are created from the
DBNs displayed in Figs. 2 and 3, respectively. Note that the random variables CFM and SR are combined into the variable K.
application, the deterioration rate model with 930 states is utilized for the numerical experiments, due to its reduced state space as compared to the parametric models.

5.2. Case 1. Traditional I&M planning setting

The fatigue deterioration is modeled according to the time-invariant crack growth described at the beginning of Section 5. In this traditional setting, the decision maker is only allowed to control the deterioration by undertaking a perfect repair and is able to collect observations through one inspection technique type. The perfect repair returns the component to its initial condition d0, and the quality of the inspection technique is quantified with a PoD(d) ∼ Exp[μ = 8]. This I&M decision-making problem is solved here by both POMDPs and heuristics. For the case of POMDPs, point-based solvers provide a theoretical guarantee of optimality, whereas RBI approaches can analytically compute the E[CT] from a simplified decision tree, as explained in Section 2. Alternatively, the computation of the E[CT] can be performed in a simulation environment, in which the deterioration process is modeled by DBNs and the costs are evaluated according to the predefined heuristic policies, as shown in Eq. (11). To equally compare the policies generated by POMDPs and heuristics, the total expected costs E[CT] are computed both on an analytical basis and in a simulation environment.

Table 2
Description of the discretization schemes considered in the sensitivity analysis, for both parametric and deterioration rate POMDP models. Interval boundaries follow the range notation a : step : b.

Variable  Interval boundaries
Parametric model
  Sd    {0, exp(ln(10^−1) : [ln(dc) − ln(10^−1)]/(|Sd| − 2) : ln(dc)), ∞}
  SK    {0, exp(ln(10^−5) : [ln(1) − ln(10^−5)]/(|SK| − 2) : ln(1)), ∞}
Deterioration rate model
  Sd    {0, exp(ln(10^−4) : [ln(dc) − ln(10^−4)]/(|Sd| − 2) : ln(dc)), ∞}
  Sτ    0 : 1 : 30

5.2.1. Analytical comparison

Following the results of the discretization analysis, a finite horizon (FH) POMDP is derived from the deterioration rate model with 930 states (|Sd| = 30 and |Sτ| = 31). Since the horizon spans over 30 years, the state space is augmented from 930 to 14,880 states, as explained in Section 4.2. Actions and observations are combined into three action-observation groups: (1) do-nothing/no-inspection, (2) do-nothing/inspection, and (3) perfect-repair/no-inspection. The fourth combination (repair/inspection) is not included, as it will hardly be the optimal action at any time step. A total of three representative experiments are conducted, assigning different inspection, repair and failure costs to each of them. Each experiment is characterized by a different ratio between repair and inspection costs RR/I, as well as the ratio between failure and repair costs RF/R. Since these ratios are of relevance in this work, analyzing the problem from an optimization perspective, an explicit separation of economic, societal, and environmental consequences and their scaling to monetary units is not considered. The SARSOP point-based POMDP solver [30] is used for the computation of the optimal I&M policies. Additionally, the policies from the FRTDP [31] and Perseus [55] point-based solvers are computed specifically for experiment RR/I 50 − RF/R 20. In this theoretical comparison, the expected costs are computed based on the lower bound alpha vectors, as explained in Section 4.2.

Table 3
Accuracy of the considered discretization schemes. The normalized error ξ and state spaces corresponding to each parameter are reported.

Model                        |SK|   |Sτ|   |Sd|   |S|      ξ
Deterioration rate (DRd15)   –      31     15     465      8.6·10^−3
Deterioration rate (DRd30)   –      31     30     930      2.1·10^−4
Parametric (PARK50−d40)      50     –      40     2,000    7.1·10^−2
Parametric (PARK50−d80)      50     –      80     4,000    7.2·10^−3
Parametric (PARK50−d160)     50     –      160    8,000    3.4·10^−3
Parametric (PARK100−d80)     100    –      80     8,000    2.5·10^−3
Parametric (PARK100−d160)    100    –      160    16,000   4.3·10^−4

In contrast, the optimal RBI policies are determined based on the best identified heuristic decision rules. For this theoretical comparison, the decision tree is simplified to a single branch, with two schemes considered here: equidistant inspections (EQ-INS) and annual failure probability ΔPF threshold (THR-INS). For the maintenance actions, the component is perfectly repaired after a detection indication, behaving thereafter as if a crack was not detected at that inspection. The optimized heuristics for each experiment are listed in Table 4, e.g. an
Fig. 6. Error |PF,MCS − PF,DBN| between the continuous deterioration model and the considered discrete state-space models. The continuous model is computed by a Monte Carlo simulation of 10 million samples and is compared with the discrete state-space DBN models. The circles in the graph represent the error from deterioration rate models and the squares represent the error from parametric models.
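The Table 2 schemes place interval boundaries at log-spaced points between a lower bound and the critical crack size dc, plus the two open tails. A minimal sketch of such a scheme (the function name and the dc value are illustrative assumptions):

```python
import math

def exp_spaced_boundaries(lower, upper, n_states):
    """Boundaries {0, exp(ln(lower) : step : ln(upper)), inf}: n_states - 2
    log-spaced finite intervals between lower and upper, plus two tails."""
    step = (math.log(upper) - math.log(lower)) / (n_states - 2)
    interior = [math.exp(math.log(lower) + k * step) for k in range(n_states - 1)]
    return [0.0] + interior + [math.inf]

# Crack-size states for the deterioration rate model: |Sd| = 30, lower
# boundary 1e-4 as in Table 2, and an assumed critical size dc = 20 mm.
bounds = exp_spaced_boundaries(1e-4, 20.0, 30)
print(len(bounds))  # prints 31: 31 boundaries delimiting 30 states
```

The discretization refinements compared in Fig. 6 correspond to increasing n_states under this same construction.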
Table 4
Analytical (AN) and simulation-based (SIM) comparison between POMDPs and optimized heuristic-based policies in a traditional setting. E[CT] is the total expected cost and Δ%[POMDP FH] indicates the relative difference between each method and the SARSOP finite horizon POMDP. Confidence intervals on the expected costs, assuming Gaussian estimators, are listed for the simulation-based cases.

Traditional setting    E[CT] (95% C.I.)    Δ%[POMDP FH]

* The decision tree is simplified to one single branch, as explained in Section 2.1.
** Simulation of an infinite horizon POMDP policy over a horizon of 30 years.
*** Perfect repair actions are undertaken after two consecutive ‘detection’ observations.
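The simulation-based (SIM) entries of such a comparison are obtained by Monte Carlo policy evaluation. A minimal sketch for an equidistant-inspection heuristic with perfect repair after detection (the crack-growth increments, PoD form, and cost values below are illustrative assumptions, not the paper's calibrated DBN model):

```python
import math
import random

def evaluate_heuristic(n_episodes=10_000, horizon=30, delta_ins=4, gamma=0.95,
                       c_ins=1.0, c_rep=50.0, c_fail=1_000.0, seed=0):
    """Monte Carlo estimate of the total expected cost E[CT] (with a 95%
    Gaussian confidence interval) for a heuristic policy: inspect every
    delta_ins years and perfectly repair after a detection outcome."""
    rng = random.Random(seed)
    d0, d_crit, mu_pod = 0.1, 20.0, 8.0               # assumed sizes and PoD mean
    totals = []
    for _ in range(n_episodes):
        d, cost = d0, 0.0
        for t in range(1, horizon + 1):
            d *= math.exp(rng.gauss(0.12, 0.10))      # assumed stochastic growth
            if d >= d_crit:                           # failure ends the episode
                cost += gamma ** t * c_fail
                break
            if t % delta_ins == 0:                    # equidistant inspection
                cost += gamma ** t * c_ins
                if rng.random() < 1.0 - math.exp(-d / mu_pod):  # PoD(d) detection
                    cost += gamma ** t * c_rep
                    d = d0                            # perfect repair -> d0
        totals.append(cost)
    mean = sum(totals) / n_episodes
    var = sum((c - mean) ** 2 for c in totals) / (n_episodes - 1)
    return mean, 1.96 * math.sqrt(var / n_episodes)

mean_cost, ci95 = evaluate_heuristic()
```

Heuristic parameters (e.g. ΔIns) would then be swept over a grid, keeping the rule with the lowest estimated E[CT].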
inspection every 4 years (ΔIns = 4) is identified as the optimal equidistant inspection heuristic (EQ-INS) for Experiment RR/I 20 − RF/R 100.

The total expected costs E[CT] resulting from finite horizon POMDPs and the best identified heuristics are listed in Table 4. Along with the E[CT], the relative difference between each method and the finite horizon POMDP is also reported, and Table 4 demonstrates that finite horizon POMDP policies outperform heuristic-based policies. Even for this traditional I&M decision-making problem, POMDPs provide a significant cost reduction, ranging from 11% in Experiment RR/I 20 − RF/R 100 to 37% in Experiment RR/I 50 − RF/R 20. Experiment RR/I 10 − RF/R 10 is merely conducted to validate the comparative results, by checking that all the methods provide the same results for the case in which repairs and inspections are very expensive relative to the failure cost.

As pointed out in Section 4.2, point-based solvers are able to rapidly solve large state-space POMDPs. This is demonstrated in Fig. 7, where SARSOP outperforms heuristic-based schemes in less than one second of computational time. Note that POMDP policies are based on the lower bound, whereas the upper bound, when provided, is just an approximation used to optimally sample reachable belief points [56].

5.2.2. Comparison in a simulation environment

In this case, the total expected cost E[CT] is evaluated in a simulation environment. Since the horizon can be controlled in a policy evaluation, infinite horizon POMDPs are also included in this comparison. The infinite horizon POMDP is directly derived from the deterioration rate model, and while the action-observation combinations remain the same as for the finite horizon POMDP, the belief space is now reduced to 930 states, offering a substantial reduction in computational cost, as explained before. Note that even though policies generated by infinite horizon POMDPs can be evaluated over a finite horizon, the policies are truly optimal only in an infinite horizon setting.

In this comparison, the best heuristic-based I&M policy is also identified by analyzing two inspection planning heuristics, as previously, either based on equidistant inspections (EQ-INS) or based on an annual failure probability threshold (THR-INS). However, in this simulation setting, the component naturally returns to its initial condition after a repair, instead of modeling its evolution as a no-detection event. This operation might add a significant computational expense for analytical computations, if the decision tree is explicitly modeled; however, it can be easily modeled in a simulation-based environment. The expected utility E[CT] is estimated according to Eq. (11).

Table 4 lists the results of the comparison and, given that the expected cost E[CT] is estimated through simulations, the numerical confidence bounds are also reported, assuming a Gaussian estimator. All the methods are compared relative to the finite horizon POMDP, which again outperforms the heuristic-based policies. The reduced state-space infinite horizon POMDP policy results in only a slight increment to the total expected cost obtained by the finite horizon POMDP, in this finite horizon problem. The optimal policy for an infinite horizon in experiment RR/I 20 − RF/R 100 includes the possibility of maintenance actions, whereas the policy for a finite horizon prescribes only the action do-nothing/no-inspection. This explains the slight difference of expected costs for the infinite horizon POMDP. The infinite horizon POMDP for a parametric model of 16,000 states is also computed and listed in Table 4 for the experiment RR/I 50 − RF/R 20. As expected, the E[CT] for the parametric (PAR) model is in good agreement with the deterioration rate (DR) model, and the small difference is attributed to the discretization quality.

Finally, we showcase policy realizations to visualize the difference between POMDPs and heuristic-based policies over an episode, related to the experiment RR/I 50 − RF/R 20. Fig. 8a and b represent realizations of POMDP policies, whereas Fig. 8c and d represent realizations of heuristic-based policies. While heuristic-based policies prescribe a repair action immediately after a detection, POMDP-based policies might also consider a second inspection after a detection outcome. If the second inspection results in a no-detection outcome, a repair action may not be prescribed; however, if the second inspection also results in detection, a perfect repair is planned. POMDP-based policies therefore provide more flexibility in general and can reveal interesting patterns, such that it might be worthwhile, in certain cases, to conduct a second inspection before prescribing an expensive repair action. As such, based on analyzed POMDP policy patterns, heuristic rules can be informed and defined anew. As reported in Table 4, two additional heuristic rules are thus examined, where perfect repair actions are undertaken after two consecutive ‘detection’ observations. These modified heuristics yield results closer to those provided by POMDP policies, with POMDP policies now surpassing the two heuristic ones by 7% and 14%, respectively. While an experienced operator might have initially guessed these more sophisticated heuristic decision rules, based on the imperfect and cheap observation model specified in this setting, in more complex settings, e.g. an I&M planning scenario with inspections that provide
Fig. 8. Experiment RR/I 50 − RF/R 20 policy realizations. The failure probability is plotted in blue and the prescribed maintenance actions are represented by black
bars. A detection outcome is marked by a cross, whereas a no-detection outcome is marked by a circle.
more than two indications (as shown in Section 5.3), decision makers might guide their choices for the selection of more advanced heuristic rules through an investigation of the patterns exposed by POMDP policy realizations.

5.3. Case 2. Detailed I&M planning setting

While only a perfect repair and one inspection technique have been available for the traditional setting applications, two repair actions and two inspection techniques are now available in this more complex case. Fatigue deterioration in this setting can be controlled by either performing a perfect or a minor repair. The perfect repair returns the component to its initial condition and the minor repair transfers the component two deterioration rates back. The two inspection techniques considered are inspection 1 (I1) with only 2 indicators: detection (D) or no-detection (ND); and inspection 2 (I2) with 5 indicators: no-detection (ND), low damage (LD), minor damage (mD), major damage (MD) and extensive damage (D). The quality of each inspection technique is quantified through probability of indication (PoI) curves. Fig. 9a corresponds to the first inspection type with a PoD(d) ∼ Exp[μ = 8]. This inspection method is the same as the one used in the traditional I&M planning setting. The second inspection method includes, however, the following detection boundaries: PoI(d) ∼ Exp[μ = 4]; PoI(d) ∼ Exp[μ = 7]; PoI(d) ∼ Exp[μ = 10]; and PoI(d) ∼ Exp[μ = 13]. The probability of observing each indicator is represented in Fig. 9b as a function of the crack size.

Similar to the previous case, we solve a finite horizon POMDP with 14,880 states to identify the optimal policy. However, in this setting, actions and observations are combined into seven groups: (1) do-nothing/no-inspection (DN-NI); (2) do-nothing/inspection-1 (DN-I1); (3) do-nothing/inspection-2 (DN-I2); (4) minor-repair/no-inspection (mRP-NI); (5) minor-repair/inspection-1 (mRP-I1); (6) minor-repair/inspection-2 (mRP-I2); and (7) perfect-repair/no-inspection (pRP-NI), and analyses are conducted for a modified version of experiment RR/I 50 − RF/R 20. The individual costs for this example are listed in Table 5. Inspection type-2 costs twice the cost of inspection type-1, as it is more accurate and provides more information about the deterioration.

For this setting, heuristic inspection decision rules are prescribed considering again both equidistant inspections and annual failure probability ΔPF threshold schemes. All heuristics are evaluated in a simulation environment, computing the expected cost E[CT], as indicated in Eq. (11). Maintenance heuristic rules are accordingly defined considering the following two schemes:
Fig. 9. Quantification of the inspection uncertainty. The probability of retrieving each indicator is represented as a function of the crack size. For inspection type-1,
the observation model includes two indicators: “detection” D1 and “no-detection” ND1. For inspection type-2, the observation model is composed of five indicators:
“no-detection” ND2, “low damage” LD2, “minor damage” mD2, “major damage” MD2, and “extensive damage” D2.
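One plausible construction of the five indicator probabilities sketched in Fig. 9b is to model each detection boundary as an exponential-type curve and take differences of consecutive boundary curves; this is our illustrative assumption, since the paper specifies the curves only graphically:

```python
import math

def type2_indicator_probs(d, mus=(4.0, 7.0, 10.0, 13.0)):
    """Probabilities of the indicators (ND2, LD2, mD2, MD2, D2) at crack
    size d, assuming each boundary curve is PoI(d) = 1 - exp(-d / mu) and
    indicator probabilities are differences of consecutive boundaries."""
    poi = [1.0 - math.exp(-d / mu) for mu in mus]   # decreasing in mu
    probs = [1.0 - poi[0]]                          # "no-detection" ND2
    probs += [poi[i] - poi[i + 1] for i in range(len(mus) - 1)]
    probs.append(poi[-1])                           # "extensive damage" D2
    return probs

row = type2_indicator_probs(5.0)
print(round(sum(row), 10))  # prints 1.0: each observation-matrix row sums to 1
```

Stacking such rows over the discretized crack-size states yields the emission matrix P(o|s′, aDI) for inspection type-2.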
• Observation-based maintenance rules: a maintenance action is undertaken after an observation. For example, a minor repair is undertaken if a minor damage is observed. The number of potential observation-based maintenance rules scales to |AR|^|O| pairs, where |O| and |AR| are the number of observations and maintenance actions, respectively. If we consider inspection type-2, the heuristic rules result in 3^5 combinations. Such combinatoric heuristic rules, together with failure probability thresholds or intervals for inspections, have been evaluated against POMDPs in [60]. Due to the large computational cost of evaluating all possible decision rules, we evaluated only a subset of these combinations here. The most competitive set of heuristic rules for this case are listed in Table 5, e.g. the optimized equidistant inspection type-1 heuristic (EQ-INS1) prescribes an inspection every 11 years (ΔIns = 11) and a perfect repair after a detection observation (pRP-D1).
• Threshold-based maintenance rules: a maintenance action is undertaken when a specific threshold is reached after an observation. The threshold can be prescribed in terms of failure probability PF or expected damage size, as proposed in [46]. We consider both cases here, i.e. a failure probability threshold PFth and an expected damage size threshold E[d]. Threshold-based maintenance rules based on expected damage have also been evaluated against POMDPs in [61].

The expected costs E[CT] resulting from both POMDP and heuristic-based policies are reported in Table 5. Additionally, we list the relative difference between each policy and a finite horizon POMDP policy solved by SARSOP. In this detailed setting, POMDP-based policies again outperform heuristic-based ones. In terms of POMDP-based policies, SARSOP and FRTDP achieve similar results. Results obtained from heuristic-based policies vary depending on their prescribed set of heuristics. For equidistant inspection planning, inspection type-1 is preferred over inspection type-2, because the inspections are fixed in time, and the additional information provided by inspection type-2 becomes too expensive. In contrast, inspection type-2 is the best scheme for annual failure probability threshold inspection planning. The threshold-based maintenance heuristics proved to be better than observation-based heuristics, yet threshold-based maintenance heuristics imply additional computational costs, as generally more heuristic rules must be evaluated. Fig. 10 illustrates the expected cost E[CT] of each policy as a function of the computational time. We can see how the POMDP point-based solvers improve their lower bounds over time, along with the computational cost incurred by evaluating the various heuristic rules.

Table 5
Comparison between POMDP and optimized heuristic-based policies in a detailed setting. E[CT] is the total expected cost and Δ%[POMDP FH] indicates the relative difference between each method and the SARSOP finite horizon POMDP results. Confidence intervals on the expected costs, assuming Gaussian estimators, are also listed.

Detailed setting: Ci1 = 1, Ci2 = 2, CmRP = 10, CpRP = 50, Cf = 10^3, γ = 0.95

Policy                                                      E[CT] (95% C.I.)   Δ%[POMDP FH]
POMDP Finite Horizon (FH), SARSOP - Lower Bound             12.26              –
POMDP Finite Horizon (FH), FRTDP - Lower Bound              12.30              <1%
Heur. EQ-INS1: ΔIns = 11; pRP-D1                            16.23 (±0.19)      +32%
Heur. EQ-INS2: ΔIns = 11; pRP-D2                            18.08 (±0.31)      +47%
Heur. THR-INS1: ΔPFth = 1.5·10^−3; pRP-D1                   16.40 (±0.20)      +33%
Heur. THR-INS2: ΔPFth = 1.1·10^−3; pRP-D2                   15.55 (±0.21)      +26%
Heur. THR-INS2: ΔPFth = 5.0·10^−4; pRP-PFth = 2.2·10^−2     13.88 (±0.29)      +13%
Heur. THR-INS2: ΔPFth = 1.0·10^−3; pRP-E[d] > 4             13.66 (±0.24)      +11%

To visualize the actions prescribed by each approach, Fig. 11 displays a frequency histogram of the actions taken over 10^4 policy realizations. The action do-nothing/no-inspection (DN-NI) predominates over all other actions. While heuristic policies conduct either inspection type-1 (DN-I1) or inspection type-2 (DN-I2), the POMDP-based policy utilizes both inspection types. This is also true for the maintenance actions, for which heuristic policies prescribe only perfect repairs, whereas POMDP policies sometimes choose to undertake minor repairs (mRP) as well.

6. Discussion

The results of this investigation show that POMDPs are able to identify optimal I&M policies for deteriorating structures and offer substantially lower costs than heuristic-based policies, as is theoretically explained and justified, and as has also been demonstrated through the numerical examples in Sections 5.2 and 5.3. The policy optimization based on heuristic-based approaches may be constrained by the limited number of decision rules assessed, out of all possible decision rules. Avoiding these limitations, POMDPs prescribe actions as a function of
Fig. 10. Computational details of POMDP and simulation-based heuristic schemes in a detailed setting. The expected total costs E[CT] are represented over the computational time. Results of SARSOP and FRTDP point-based POMDP solvers are plotted, with a continuous line for the lower bound and a dashed line for the upper bound. Optimized heuristic policy results are reported by markers and are directly linked to the schemes shown in Table 5.
Fig. 11. Frequency histogram of the actions prescribed by each considered approach over 10^4 policy realizations. The policies presented here are linked to those listed in Table 5.
the belief state, which is a sufficient statistic of the whole, dynamically updated, action-observation history. This implies that the actions are taken according to the whole history of actions and observations, rather than as a result of an immediate inspection outcome or pre-defined static policies.

As demonstrated in Section 5.3, POMDPs can be applied to detailed I&M decision settings in which multiple actions and inspection methods are available. In terms of computational efficiency, state-of-the-art point-based solvers are able to solve high-dimensional state space POMDPs within a reasonable computational time. In particular, the SARSOP point-based solver very quickly improves its policy at the beginning of the solution process and employs an approximate upper bound to gradually reach a converged solution. For both traditional and detailed settings, both SARSOP and FRTDP point-based solvers outperform heuristic-based policies after only a few seconds of computational time.

For modeling the deterioration process, one can utilize either a parametric or a deterioration rate model, as explained in Section 2. A deterioration rate model generally results in a smaller state space than a parametric model, except for very long horizons. In this latter case, a parametric model might lead to a smaller state space, due to its stationary nature. In any case, a discretization analysis must be conducted to select the appropriate state model for the problem at hand. More efforts are worth being made in the future towards continuous state space POMDPs and optimal discretization schemes for discrete state spaces.
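The belief-state sufficiency invoked above rests on the standard Bayesian filter, b′(s′) ∝ P(o|s′, a) Σ_s P(s′|s, a) b(s). A minimal sketch with a toy two-state model (the matrices are illustrative, not the paper's):

```python
def belief_update(belief, trans, emis, obs):
    """Bayesian POMDP belief update: b'(s') is proportional to
    emis[s'][obs] * sum_s trans[s][s'] * b(s), normalized over s'."""
    n = len(belief)
    predicted = [sum(trans[s][sp] * belief[s] for s in range(n)) for sp in range(n)]
    unnorm = [emis[sp][obs] * predicted[sp] for sp in range(n)]
    z = sum(unnorm)                          # P(obs | belief, action)
    return [u / z for u in unnorm]

# Toy intact/damaged model under a do-nothing/inspection action.
trans = [[0.9, 0.1], [0.0, 1.0]]             # deterioration only worsens
emis = [[0.95, 0.05], [0.30, 0.70]]          # P(obs | state); obs 1 = detection
b_new = belief_update([0.8, 0.2], trans, emis, obs=1)
```

A detection outcome (obs=1) shifts probability mass toward the damaged state, and a POMDP policy maps this updated belief, rather than the raw inspection outcome, to the next action.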
In this paper, we examine the effectiveness of Partially Observable Markov Decision Processes (POMDPs) to identify optimal Inspection and Maintenance (I&M) strategies for deteriorating structures, and we clarify that Dynamic Bayesian Networks (DBNs) can be combined with POMDPs, providing a joint framework for efficient inspection and maintenance planning. The formulation for deriving POMDPs in a structural reliability context is also presented, and two alternative DBN formulations for deterioration modeling are described, together with their POMDP implementations.

Modern Risk Based Inspection (RBI) planning methodologies are often supported by DBNs, and a pre-defined set of decision rules is evaluated. These policies can on occasions diverge significantly from globally optimal solutions, because of the limited domain space of searched policies, which may not include the global optimum. In contrast, POMDP policies prescribe an action as a function of the belief state, which is a sufficient statistic of the whole action-observation history. I&M policies generated by finite horizon POMDPs are compared with heuristic-based policies, for the case of a structural component subjected to fatigue deterioration. In the first example, the stochastic deterioration is modeled as a function of time-invariant parameters, with only one inspection type and one perfect repair available. Our numerical findings verify that POMDP-based policies can approximate the global solution better than heuristic-based policies, thus being more efficient even for typical RBI applications. The 14,880-state finite horizon POMDP outperforms heuristic-based policies in less than a second of computational time. For the second numerical example, we consider an I&M decision-making problem in a more detailed setting, including two inspection methods and two repair actions. Whereas the outcome of the first inspection type is set up as a binary indicator, the second inspection technique indicates the damage level through five alarms. With this application, we demonstrate the capabilities of POMDPs in efficiently handling complex decision problems, outperforming again heuristic-based policies.

The main limitation of the presented approaches, including POMDPs, is the increase of computational complexity for very large state and action spaces, such as the ones for a system of multiple components. Dynamic Bayesian networks with large state spaces are similarly constrained by the curse of dimensionality. To overcome this limitation, we suggest further research efforts toward the development of POMDP-based Deep Reinforcement Learning (DRL) methodologies. As demonstrated in [60,61], a multi-agent actor-critic DRL approach is able to identify optimal strategies for multi-component systems with large state, action and observation spaces. In particular, POMDP-based actor-critic DRL methods approximate the policy and the value function with neural networks, thereby alleviating the curse of dimensionality through the deep network parametrizations, and the curse of history through the reliance on dynamic programming MDP principles, the full advantages of which may be compromised if heuristic rules are instead considered.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research is funded by the National Fund for Scientific Research in Belgium F.R.I.A. – F.N.R.S. This support is gratefully acknowledged. Dr. Papakonstantinou and Dr. Andriotis would further like to acknowledge that this material is also based upon work supported by the U.S. National Science Foundation under Grant No. 1751941.

References

[1] Frangopol DM. Life-cycle performance, management, and optimisation of structural systems under uncertainty: accomplishments and challenges. Struct Infrastruct Eng 2011;7(6):389–413. https://doi.org/10.1080/15732471003594427.
[2] Stewart MG, Rosowsky DV. Time-dependent reliability of deteriorating reinforced concrete bridge decks. Struct Saf 1998;20(1):91–109. https://doi.org/10.1016/S0167-4730(97)00021-0.
[3] Akgül F, Frangopol DM. Lifetime performance analysis of existing steel girder bridge superstructures. J Struct Eng 2004;130(12):1875–88. https://doi.org/10.1061/(ASCE)0733-9445(2004)130:12(1875).
[4] Val DV, Stewart MG, Melchers RE. Effect of reinforcement corrosion on reliability of highway bridges. Eng Struct 1998;20(11):1010–9. https://doi.org/10.1016/S0141-0296(97)00197-1.
[5] Val DV, Melchers RE. Reliability of deteriorating RC slab bridges. J Struct Eng 1997;123(12):1638–44. https://doi.org/10.1061/(ASCE)0733-9445(1997)123:12(1638).
[6] Moan T. Reliability-based management of inspection, maintenance and repair of offshore structures. Struct Infrastruct Eng 2005;1(1):33–62. https://doi.org/10.1080/15732470412331289314.
[7] Lotsberg I, Sigurdsson G, Fjeldstad A, Moan T. Probabilistic methods for planning of inspection for fatigue cracks in offshore structures. Marine Struct 2016;46:167–92. https://doi.org/10.1016/j.marstruc.2016.02.002.
[8] Wirsching PH. Fatigue reliability in welded joints of offshore structures. Int J Fatigue 1980;2(2):77–83. https://doi.org/10.1016/0142-1123(80)90035-3.
[9] Schaumann P, Lochte-Holtgreven S, Steppeler S. Special fatigue aspects in support structures of offshore wind turbines. Materialwissenschaft und Werkstofftechnik 2011;42(12):1075–81. https://doi.org/10.1002/mawe.201100913.
[10] Dong W, Moan T, Gao Z. Fatigue reliability analysis of the jacket support structure for offshore wind turbine considering the effect of corrosion and inspection. Reliab Eng Syst Saf 2012;106:11–27. https://doi.org/10.1016/j.ress.2012.06.011.
[11] Yeter B, Garbatov Y, Guedes Soares C. Fatigue damage assessment of fixed offshore wind turbine tripod support structures. Eng Struct 2015;101:518–28. https://doi.org/10.1016/j.engstruct.2015.07.038.
[12] Andriotis CP, Papakonstantinou KG, Chatzi EN. Value of structural health information in partially observable stochastic environments; 2020. arXiv preprint arXiv:1912.12534.
[13] Fauriat W, Zio E. Optimization of an aperiodic sequential inspection and condition-based maintenance policy driven by value of information. Reliab Eng Syst Saf 2020;204:107133.
[14] Ellingwood BR. Risk-informed condition assessment of civil infrastructure: state of practice and research issues. Struct Infrastruct Eng 2005;1(1):7–18. https://doi.org/10.1080/15732470412331289341.
[15] Sánchez-Silva M, Frangopol DM, Padgett J, Soliman M. Maintenance and operation of infrastructure systems. J Struct Eng 2016;142(9):F4016004.
[16] Faber MH, Stewart MG. Risk assessment for civil engineering facilities: critical overview and discussion. Reliab Eng Syst Saf 2003;80(2):173–84. https://doi.org/10.1016/S0951-8320(03)00027-9.
[17] Kim S, Ge B, Frangopol DM. Effective optimum maintenance planning with updating based on inspection information for fatigue-sensitive structures. Probab Eng Mech 2019;58:103003. https://doi.org/10.1016/j.probengmech.2019.103003.
[18] Raiffa H, Schlaifer R. Applied statistical decision theory. Harvard University Graduate School of Business Administration (Division of Research). Bailey & Swinfen; 1961.
[19] Russell SJ. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River, NJ: Prentice Hall; 2010.
[20] Sørensen JD, Rackwitz R, Faber MH, Thoft-Christensen P. Modelling in optimal inspection and repair. In: Proceedings of the 10th international conference on offshore mechanics and arctic engineering, vol. 2. United States: American Society of Mechanical Engineers; 1991. p. 281–8.
[21] Goyet J, Straub D, Faber MH. Risk-based inspection planning of offshore installations. Struct Eng Int 2002;12(3):200–8. https://doi.org/10.2749/101686602777965360.
[22] Rangel-Ramírez JG, Sørensen JD. Risk-based inspection planning optimisation of offshore wind turbines. Struct Infrastruct Eng 2012;8(5):473–81. https://doi.org/10.1080/15732479.2010.539064.
[23] Straub D. Stochastic modeling of deterioration processes through dynamic Bayesian networks. J Eng Mech 2009;135(10):1089–99. https://doi.org/10.1061/(ASCE)EM.1943-7889.0000024.
[24] Luque J, Straub D. Risk-based optimal inspection strategies for structural systems using dynamic Bayesian networks. Struct Saf 2019;76:68–80. https://doi.org/10.1016/j.strusafe.2018.08.002.
[25] Bismut E, Straub D. Optimal adaptive inspection and maintenance planning for deteriorating structural systems; 2021. arXiv preprint arXiv:2102.06016.
[26] Yang DY, Frangopol DM. Probabilistic optimization framework for inspection/repair planning of fatigue-critical details using dynamic Bayesian networks. Comput Struct 2018;198:40–50.
[27] Tien I, Der Kiureghian A. Reliability assessment of critical infrastructure using Bayesian networks. J Infrastruct Syst 2017;23(4):04017025.
[28] Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.
[29] Shani G, Pineau J, Kaplow R. A survey of point-based POMDP solvers. Autonom Agents Multi-Agent Syst 2013;27(1):1–51.
[30] Kurniawati H, Hsu D, Sun Lee W. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. In: Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland; 2008. https://doi.org/10.15607/RSS.2008.IV.009.
[31] Smith T, Simmons R. Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. Proc Natl Conf Artif Intell 2006;2:1227–32.
[32] Memarzadeh M, Pozzi M, Zico Kolter J. Optimal Planning and Learning in Uncertain Environments for the Management of Wind Farms. J Comput Civil Eng 2015;29(5):04014076. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000390.
[33] Memarzadeh M, Pozzi M, Kolter JZ. Hierarchical modeling of systems with similar components: A framework for adaptive monitoring and control. Reliab Eng Syst Saf 2016;153:159–69. https://doi.org/10.1016/j.ress.2016.04.016.
[34] Papakonstantinou KG, Shinozuka M. Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part I: Theory. Reliab Eng Syst Saf 2014;130:202–13. https://doi.org/10.1016/j.ress.2014.04.005.
[35] Corotis RB, Hugh Ellis J, Jiang M. Modeling of risk-based inspection, maintenance and life-cycle cost with partially observable Markov decision processes. Struct Infrastruct Eng 2005;1(1):75–84. https://doi.org/10.1080/15732470412331289305.
[36] Morato PG, Nielsen JS, Mai AQ, Rigo P. POMDP based maintenance optimization of offshore wind substructures including monitoring. In: 13th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP; 2019. p. 270–7. https://doi.org/10.22725/ICASP13.067.
[37] Faber MH. Risk-based inspection: The framework. Struct Eng Int 2002;12(3):186–94. https://doi.org/10.2749/101686602777965388.
[38] Straub D. Generic Approaches to Risk Based Inspection Planning for Steel Structures. Ph.D. thesis, Swiss Federal Institute of Technology Zürich (ETH); 2004.
[39] Ditlevsen O, Madsen HO. Structural reliability methods. Department of Mechanical Engineering, Technical University of Denmark; 2007.
[40] Madsen HO, Krenk S, Lind NC. Methods of structural safety. Dover Publications Inc.; 2006.
[41] Straub D. Reliability updating with equality information. Probab Eng Mech 2011;26(2):254–8.
[42] Lotsberg I, Sigurdsson G, Fjeldstad A, Moan T. Probabilistic methods for planning of inspection for fatigue cracks in offshore structures. Marine Struct 2016;46:167–92. https://doi.org/10.1016/j.marstruc.2016.02.002.
[43] Luque J, Straub D. Reliability analysis and updating of deteriorating systems with dynamic Bayesian networks. Struct Saf 2016;62:34–46. https://doi.org/10.1016/j.strusafe.2016.03.004.
[44] Jensen FV. Introduction to Bayesian Networks. Berlin, Heidelberg: Springer-Verlag; 1996.
[45] Murphy KP. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, University of California, Berkeley; 2002.
[46] Nielsen JS, Sørensen JD. Computational framework for risk-based planning of inspections, maintenance and condition monitoring using discrete Bayesian networks. Struct Infrastruct Eng 2018;14(8):1082–94. https://doi.org/10.1080/15732479.2017.1387155.
[47] Papakonstantinou KG, Shinozuka M. Optimum inspection and maintenance policies for corroded structures using partially observable Markov decision processes and stochastic, physically based models. Probab Eng Mech 2014;37:93–108. https://doi.org/10.1016/j.probengmech.2014.06.002.
[48] Straub D. Stochastic Modeling of Deterioration Processes through Dynamic Bayesian Networks. J Eng Mech 2009;135(10):1089–99. https://doi.org/10.1061/(ASCE)EM.1943-7889.0000024.
[49] Papakonstantinou KG, Shinozuka M. Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part II: POMDP implementation. Reliab Eng Syst Saf 2014;130:214–24. https://doi.org/10.1016/j.ress.2014.04.006.
[50] Kochenderfer MJ, Amato C, Chowdhary G, How JP, Reynolds HJD, Thornton JR, Torres-Carrasquillo PA, Ure KN, Vian J. Decision Making Under Uncertainty: Theory and Application. 1st ed. The MIT Press; 2015.
[51] Nielsen JS, Sørensen JD. Risk-based Decision Making for Deterioration Processes Using POMDP. In: Haukaas T, editor. 12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12, Civil Engineering Risk and Reliability Association, Vancouver, Canada; 2015. p. 8. https://doi.org/10.14288/1.0076132.
[52] Robelin C-A, Madanat SM. History-dependent bridge deck maintenance and replacement optimization with Markov decision processes. J Infrastruct Syst 2007;13(3):195–201.
[53] Kim JW, Choi GB, Lee JM. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection. Comput Chem Eng 2018;112:239–52.
[54] Kaelbling LP, Littman ML, Cassandra AR. Planning and acting in partially observable stochastic domains. Artif Intell 1998;101(1–2):99–134. https://doi.org/10.1016/s0004-3702(98)00023-x.
[55] Spaan MTJ, Vlassis N. Perseus: Randomized point-based value iteration for POMDPs. J Artif Intell Res 2005;24:195–220. https://doi.org/10.1613/jair.1659.
[56] Papakonstantinou KG, Andriotis CP, Shinozuka M. POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability. Struct Infrastruct Eng 2018;14(7):869–82. https://doi.org/10.1080/15732479.2018.1439973.
[57] Pineau J, Gordon G, Thrun S. Point-based value iteration: An anytime algorithm for POMDPs. In: IJCAI International Joint Conference on Artificial Intelligence; 2003. p. 1025–30.
[58] Howard RA. Information value theory. IEEE Trans Syst Sci Cybern 1966;2(1):22–6.
[59] Walraven E, Spaan MT. Point-based value iteration for continuous POMDPs. J Artif Intell Res 2019;65:307–41.
[60] Andriotis CP, Papakonstantinou KG. Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliab Eng Syst Saf 2021;212:107551. https://doi.org/10.1016/j.ress.2021.107551.
[61] Andriotis CP, Papakonstantinou KG. Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliab Eng Syst Saf 2019;191. https://doi.org/10.1016/j.ress.2019.04.036.