
Structural Safety 94 (2022) 102140


Optimal inspection and maintenance planning for deteriorating structural components through dynamic Bayesian networks and Markov decision processes

P.G. Morato a,*, K.G. Papakonstantinou b, C.P. Andriotis b, J.S. Nielsen c, P. Rigo a

a ANAST, Department of ArGEnCo, University of Liege, 4000 Liege, Belgium
b Department of Civil & Environmental Engineering, The Pennsylvania State University, University Park, PA 16802, USA
c Department of the Built Environment, Aalborg University, 9220 Aalborg, Denmark

Keywords: Infrastructure management; Inspection and maintenance; Partially Observable Markov Decision Processes; Deteriorating structures; Dynamic Bayesian networks; Decision analysis

Abstract: Civil and maritime engineering systems, among others, from bridges to offshore platforms and wind turbines, must be efficiently managed, as they are exposed to deterioration mechanisms throughout their operational life, such as fatigue and/or corrosion. Identifying optimal inspection and maintenance policies demands the solution of a complex sequential decision-making problem under uncertainty, with the main objective of efficiently controlling the risk associated with structural failures. Addressing this complexity, risk-based inspection planning methodologies, often supported by dynamic Bayesian networks, evaluate a set of pre-defined heuristic decision rules to reasonably simplify the decision problem. However, the resulting policies may be compromised by the limited space considered in the definition of the decision rules. Avoiding this limitation, Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for stochastic optimal control under uncertain action outcomes and observations, in which the optimal actions are prescribed as a function of the entire, dynamically updated, state probability distribution. In this paper, we combine dynamic Bayesian networks with POMDPs in a joint framework for optimal inspection and maintenance planning, and we provide the relevant formulation for developing both infinite and finite horizon POMDPs in a structural reliability context. The proposed methodology is implemented and tested for the case of a structural component subject to fatigue deterioration, demonstrating the capability of state-of-the-art point-based POMDP solvers to solve the underlying stochastic planning optimization problem. Within the numerical experiments, POMDP and heuristic-based policies are thoroughly compared, and results showcase that POMDPs achieve substantially lower costs than their counterparts, even for traditional problem settings.

1. Introduction

Preserving infrastructures in a good condition, despite their exposure to diverse deterioration mechanisms throughout their operational life, enables, in most countries, a stable economic growth and societal development [1]. For instance, a bridge structural component may experience a thickness reduction due to corrosion effects [2–5]; or a surface crack at an offshore platform might drastically propagate due to fatigue deterioration [6–8]; or the structural resistance of an offshore welded joint can be reduced due to the combined cyclic actions of wind and ocean waves [9–11]. The prediction of such deterioration processes involves a probabilistic analysis in which all relevant uncertainties are properly quantified.

Information about the condition of structural components can be gathered during their operational life through inspections or monitoring, allowing the decision maker to take more informed and rational actions [12,13]. However, both maintenance actions and observations are associated with certain costs which must be optimally balanced against the risk of structural failure. As suggested by [14,15] and others, inspections and/or maintenance actions should be planned with the objective of optimizing the structural life-cycle cost. Besides economic consequences associated with structural failures or maintenance interventions, societal and environmental aspects can also be included within a decision-making context in terms of utilities, defined in monetary units. A decision maker should, therefore, identify the decisions that result in the minimization of the total expected costs over the

* Corresponding author.
E-mail address: [email protected] (P.G. Morato).

https://doi.org/10.1016/j.strusafe.2021.102140
Received 9 September 2020; Received in revised form 20 May 2021; Accepted 18 August 2021
Available online 30 October 2021
0167-4730/© 2021 Elsevier Ltd. All rights reserved.
P.G. Morato et al. Structural Safety 94 (2022) 102140

Fig. 1. (Top) Inspection and Maintenance (I&M) planning decision tree. Maintenance actions and observation decisions are represented by blue boxes, and chance nodes are depicted by white circles. At every time step, the cost Ct depends on the action a, observation decision e, and state s of the component. (Bottom) An I&M POMDP sequence is represented, where at each step t the cost Ct depends on the action a, observation decision e, and state s of the component. In both representations, an observation outcome o is collected according to the current state, taken action, and observation decision.

lifetime of the structure [16,17].

In the context of Inspection and Maintenance (I&M) planning, the decision maker faces a complex sequential decision-making problem under uncertainty. This sequential decision-making problem is illustrated in Fig. 1, showcasing the involved random events and decision points, and can be formulated either from the perspective of the classical applied statistical decision theory [18], or through artificial intelligence [19] conceptions, or a combination thereof. In all cases, the main objective of a decision maker, or an intelligent agent, is to identify the optimal policy that minimizes the total expected costs.

With the aim of addressing this complex decision-making problem, Risk-Based Inspection (RBI) planning methodologies have been traditionally proposed [20] and have often also been applied to the I&M planning of offshore structures [21,22]. By imposing a set of heuristic decision rules, RBI methodologies are able to simplify and solve the decision-making problem within a reasonable computational time, while structural reliability methods are often employed within this framework to quantify and update the reliability and risk metrics. More recently, RBI methodologies have also been integrated with Dynamic Bayesian Networks (DBNs) [23–27]. DBNs provide an intuitive and robust inference approach to Bayesian updating; however, they do not tackle the decision optimization problem by themselves. In the proposed methodologies, heuristic decision rules, usually based on engineering principles and understanding of the problem, are still utilized to simplify the decision problem. Albeit their practical advantages, the main shortcoming of heuristic-based policies is the limited policy space exploration due to the prior, ad hoc prescription of decision rules. In this paper, we thus present how DBNs describing deterioration processes can instead be combined with Markov decision processes and dynamic programming [28], and be used to define transition and emission probabilities in such settings.

Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for planning in stochastic environments under partial observability. In the past, POMDPs were only applicable to small state space problems due to the difficulty of finding appropriate solutions in a reasonable computation time. However, starting with the development of point-based solvers [29], which managed to efficiently alleviate the inherent complexities of the solution process, POMDPs have been increasingly used for planning problems, especially in the field of computer science and robot navigation [30,31]. POMDPs have also been proposed for I&M of engineering systems [32–36]. In the reported POMDP methodologies, either the condition of the structural component has been modeled with less than five discrete states or the rewards have not been defined in a structural reliability context. This different POMDP approach to the I&M problem, as compared with typical RBI applications, has raised some misconceptions in the literature about their use, which we formally rectify herein.

In this work, POMDPs are successfully combined with dynamic Bayesian networks in a joint framework, for optimal inspection and maintenance planning, in order to take advantage of both the modeling flexibility of DBNs and the advanced optimization capabilities of POMDPs. In particular, this paper originally derives the POMDP dynamics from DBNs, enabling optimal control of physically-based stochastic deterioration processes, modeled either through a conditional set of time-invariant parameters or as a function of the deterioration rate. We further provide all relevant formulations for deriving both infinite and finite horizon POMDPs within a structural reliability context. The proposed framework is analyzed, implemented, and tested for the case of a structural component subject to a fatigue deterioration process, and the capability of state-of-the-art point-based POMDP value iteration methods to efficiently solve challenging I&M optimization problems is verified. POMDP and typical heuristic risk-based and/or periodic policies are thoroughly analyzed and compared, in a variety of problem settings, and results demonstrate that POMDP solutions achieve substantially lower costs in all cases, as compared to their counterparts.

2. Background: Risk-based inspection planning

A typical Inspection and Maintenance (I&M) sequential decision problem under uncertainty is illustrated in Fig. 1. The optimal strategy can be theoretically identified by means of a pre-posterior decision analysis [18]. Assuming the costs at different times to be additive independent, the pre-posterior analysis prescribes the observation decisions e ∈ E and actions a ∈ A that minimize the total expected cost C_T(a, e) = C_t0(e, a, s)γ^t0 + … + C_tN(e, a, s)γ^tN, i.e. the sum over the lifetime t_N of the discounted costs received at each time step t, with γ being the discount factor. Note that societal and environmental consequences,
specified in monetary units, can also be included within the definition of the total expected cost.

If the probabilities associated with the random events, as well as the costs, are assigned to each branch of the decision tree, then the branch corresponding to the optimal cost C*_T(a, e) can be identified. This analysis is denoted backwards induction or extensive analysis. Alternatively, a normal analysis can also be conducted by identifying the optimal decision rule, h*_{a,e}, from all possible decision rules. In any case, the exact solution of a pre-posterior analysis very quickly becomes computationally intractable for practical problems, because the number of branches increases exponentially with the number of time steps, actions, and observations.

2.1. RBI assumptions and heuristic rules

Risk-Based Inspection (RBI) planning methodologies [37] introduce simplifications to the I&M decision-making problem in order to be able to identify strategies in a reasonable computational time. To simplify the problem, the expected cost is computed only for a limited set of pre-defined decision rules h_{a,e}. The best strategy among them is then identified as the decision rule with the minimum cost.

Within an I&M planning context, the total expected cost E[C_T(h, t_N)] is the combination of expected costs from inspections E[C_I(h, t_N)], repairs E[C_R(h, t_N)], and failures E[C_F(h, t_N)], as a function of the imposed decision rules h_{a,e}. This expectation for a structural component designed for a lifetime of t_N years is simply computed as:

E[C_T(h, t_N)] = E[C_I(h, t_N)] + E[C_R(h, t_N)] + E[C_F(h, t_N)]   (1)

The simplifications introduced to the I&M decision-making problem by pre-defining a set of decision rules are listed below:

i) Observations (inspections) are planned according to a pre-defined heuristic rule. Two heuristic rules are commonly employed in the literature [38]:
   • Equidistant inspections: inspections are planned at constant intervals of time Δt.
   • Failure probability threshold: inspections are planned just before a pre-defined annual failure probability ΔP_F threshold is reached.
ii) If the outcome of an inspection indicates damage detection (d > d_det), a repair action is immediately performed. In that case, the repair probability is equal to the probability of detection, P_R = P(d > d_det). Alternatively, other heuristic rules can also be imposed (adding computational complexity), such as that a repair is performed if an inspection indicates detection (d > d_det) and a pre-defined failure probability threshold P_F is simultaneously exceeded.
iii) After a component is repaired, it is assumed that it behaves like a component with no damage detection, i.e. the remaining life can be computed as if the inspection at the time of repair indicates no damage detection. With these assumptions, the decision tree represented in Fig. 1 can be simplified to a single branch. Alternatively, if a repair is performed at time t and it is assumed to be perfect, the component returns to its initial damage state at the beginning of a new decision tree with a lifetime equal to t_N − t.

Summarizing, one can simplify the problem to one decision tree branch by assuming that: (i) inspections are to be planned according to a heuristic rule, (ii) a repair is to be performed if an inspection indicates detection, and (iii) after a repair is performed, the inspection at that time is considered as a no detection event. In this case, the individual contributions to the total expected cost in Eq. (1) can be computed analytically.

The expected inspection cost E[C_I(h, t_N)] is computed as the sum over all conducted inspections I_n, with individual inspection cost C_i, discounted by the factor γ ∈ [0, 1]:

E[C_I(h, t_N)] = Σ_{t_I = t_I1}^{t_In} C_i γ^{t_I}   (2)

The expected repair cost E[C_R(h, t_N)] corresponding to a heuristic scheme h is calculated as the repair cost C_r multiplied by the probability of repair P_R at each inspection year t_I:

E[C_R(h, t_N)] = Σ_{t_I = t_I1}^{t_In} C_r P_R(h, t_I) γ^{t_I}   (3)

The expected risk of failure E[C_F(h, t_N)] is computed as the sum of discounted annual failure risks, in which ΔP_F is the annual failure probability and C_f is the cost of failure:

E[C_F(h, t_N)] = Σ_{t=1}^{t_N} C_f ΔP_F(h, t) γ^t   (4)

2.2. Probabilistic deterioration model and reliability updating

Structural reliability methods and general sampling-based methods [39] can be used to compute the probabilities associated with the random events represented in the I&M decision tree (Fig. 1). In a simplified decision tree, the main random events are the damage detections during inspections and the structural failure.

The failure event is defined through a limit state g_F(t) = d_c − d(t), in which d_c represents the failure criterion, such as the critical crack size, and d(t) is related to the temporal deterioration evolution. Uncertainties involved in the deterioration process are incorporated by defining d(t) as a function of a group of random variables or random processes. The probability of failure P_F(t) can then be computed as the probability of the limit state being negative, P_F = P{g_F(t) ≤ 0}, and the reliability index is inversely related to the failure probability, usually defined in the standard normal space as β(t) = −Φ^{−1}{P_F(t)}, in which Φ is the standard normal cumulative distribution function. The probability of the failure event can also be defined over a reference period, e.g. the annual failure probability can be computed as ΔP_F(t) = P_F(t) − P_F(t − 1).

The measurement uncertainty of the available observations (inspections) is often quantified by means of Probability of Detection (PoD) curves. A PoD indicates the probability of detection as a function of the damage size d and depends on the employed inspection method; for instance, the distribution of the detectable damage size can be modeled by an exponential distribution F(d_d) = F_0[1 − exp(−d_d/λ)], where F_0 and λ are parameters determined by experiments. The event of no detection at time t_I is then modeled by the limit state function g_I,nd(t_I) = d(t_I) − d_d(t_I). Similarly, the event of detection at time t_I is modeled by the limit state g_I,d(t_I) = d_d(t_I) − d(t_I). Both detection and no detection events are evaluated as inequalities; for instance, the probability of no detection is assessed as the probability of the limit state being negative, P_I,nd = P{g_I,nd(t_I) ≤ 0}. Alternatively, a discrete damage measurement d_m can be collected, and the limit state is modeled in this case as g_m(t_I) = d(t_I) − (d_m − ε_m), where ε_m is a random variable that represents the measurement uncertainty, and the equality event P_m = P{g_m(t) = 0} can be estimated, as explained in [39–41].

The additional information gained by observations can be used to update the structural reliability or failure probability P_F by computing a failure event conditional on inspection events [42], as:

P_F|I1,…,IN(t) = P[g_F(t) ≤ 0 ∩ g_I1(t) ≤ 0 ∩ … ∩ g_IN(t) ≤ 0] / P[g_I1(t) ≤ 0 ∩ … ∩ g_IN(t) ≤ 0]   (5)

The conditional failure probability introduced in Eq. (5) can be computed by structural reliability methods (FORM, SORM) or by Monte Carlo sampling methods [39].
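As a concrete illustration, the analytical cost contributions in Eqs. (1)–(4) can be evaluated directly once a heuristic scheme h is fixed. The short sketch below is only a numerical illustration: the inspection years, repair probabilities P_R, annual failure probabilities ΔP_F, unit costs, and discount factor are hypothetical placeholder values, not those of the paper's case study.

```python
# Illustrative evaluation of Eqs. (1)-(4) for one heuristic scheme h
# (equidistant inspections). All numbers are hypothetical placeholders.

def expected_total_cost(insp_years, p_repair, delta_pf, c_i, c_r, c_f, gamma, t_n):
    """Return E[C_T(h, t_N)] = E[C_I] + E[C_R] + E[C_F], as in Eq. (1)."""
    # Eq. (2): discounted inspection costs over the planned inspection years.
    e_ci = sum(c_i * gamma**t for t in insp_years)
    # Eq. (3): repair cost weighted by the repair probability P_R(h, t_I).
    e_cr = sum(c_r * p * gamma**t for t, p in zip(insp_years, p_repair))
    # Eq. (4): failure risk from the annual failure probabilities dPF(h, t).
    e_cf = sum(c_f * delta_pf[t - 1] * gamma**t for t in range(1, t_n + 1))
    return e_ci + e_cr + e_cf

# Equidistant inspections every 5 years over a 20-year lifetime.
cost = expected_total_cost(
    insp_years=[5, 10, 15, 20],
    p_repair=[0.02, 0.05, 0.10, 0.15],
    delta_pf=[1e-4] * 20,
    c_i=1.0, c_r=10.0, c_f=3000.0, gamma=0.97, t_n=20,
)
```

Evaluating this expression for every candidate decision rule and keeping the minimum reproduces, in miniature, the RBI strategy-selection step described above.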

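Eq. (5) can likewise be approximated by crude Monte Carlo sampling. The sketch below conditions the failure probability on a single no-detection inspection outcome, drawing the detectable crack size d_d from an exponential PoD model with F_0 = 1; the lognormal crack-size prior and every numerical value here are hypothetical assumptions for illustration only.

```python
import random

# Monte Carlo version of Eq. (5) for one no-detection inspection event.
# Hypothetical model: lognormal crack size d(t), exponential PoD (F0 = 1).
rng = random.Random(42)
n = 200_000
d_c = 20.0          # critical crack size in the limit state g_F = d_c - d(t)
lam = 8.0           # PoD scale parameter (mean detectable crack size)

n_fail = n_nodet = n_fail_nodet = 0
for _ in range(n):
    d = rng.lognormvariate(1.5, 0.8)    # sampled crack size d(t_I)
    d_d = rng.expovariate(1.0 / lam)    # sampled detectable size d_d(t_I)
    failed = d >= d_c                   # failure event: g_F(t) <= 0
    no_detect = d <= d_d                # g_I,nd(t_I) = d(t_I) - d_d(t_I) <= 0
    n_fail += failed
    n_nodet += no_detect
    n_fail_nodet += failed and no_detect

pf_prior = n_fail / n                   # P{g_F <= 0}
pf_posterior = n_fail_nodet / n_nodet   # Eq. (5): P{g_F <= 0 | no detection}
```

As expected, a no-detection outcome lowers the estimated failure probability, since large cracks are unlikely to escape detection under this PoD model.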

3. Stochastic deterioration processes via dynamic Bayesian networks

A brief overview on the adoption of dynamic Bayesian networks (DBNs) for structural deterioration and reliability problems is presented here, with the objective of demonstrating that the main principles underlying BN inference tasks are fundamentally invariant to those employed by POMDPs. Bayesian networks (BNs) are directed acyclic graphical models particularly suited for inference tasks in probabilistic environments. A DBN is a template model of a Bayesian network evolving over time, and in the context of structural reliability and related problems, DBNs have played an important role [23,24,43]. For a detailed background on probabilistic graphical models and BNs, the reader is directed to [44].

Fig. 2. Parametric dynamic Bayesian network, adapted from [23]. The evolution of a stochastic deterioration process is represented by the nodes dt, influenced by a set of time-invariant random variables θt. Imperfect observations are added through the nodes ot, and the binary node Ft indicates the probability of the failure and survival events.

To allow DBN-based inference within a reasonable computational time for practical problems, the following assumptions are often imposed:

i) Discrete state space: Exact inference algorithms are limited to discrete random variables [45]. A discretization operation must thus be performed to convert the original continuous random variables to the discrete space. The unknown error introduced by the discretization operation converges to zero in the limit of an infinitesimal interval size. However, the computational complexity of the inference task grows linearly with the number of states and exponentially with the number of random variables.
ii) Markovian assumption: The state space S is the domain of all random variables involved in the description of the deterioration process, and the conditional probabilities P(s_t+1|s_t) associated with the random variables at time step t + 1 depend only on the random variables at the current time step t, and are independent of all past states.

The transition probability matrix P(s_t+1|s_t) can also be assumed to be stationary for some applications, thus facilitating the formulation of the problem. This can, however, be easily relaxed without entailing additional computational efforts [46].

3.1. Parametric DBN

A stochastic deterioration process can be represented by the DBN shown in Fig. 2. The deterioration is represented through the damage node dt, which is influenced by a set of time-invariant random variables θt. The model is denoted as a parametric DBN, as the damage dt is influenced by the parameters θt. Imperfect observations are added into the DBN by means of the node ot. This DBN can be extended by incorporating time-variant random variables, as proposed by [23]; yet, we consider only time-invariant random variables here, as they are widely used in the literature, and to avoid unnecessary presentation complications. Finally, the binary node Ft provides an indication of the failure and survival events.

Within the context of structural reliability and related problems, DBNs are often employed to propagate and update the uncertainty related to a deterioration process, incorporating evidence from inspections or monitoring. Filtering becomes the preferred inference task for inspection and maintenance planning problems, as a decision is taken at time t supported by evidence gathered from the initial time step t0 up to time t. The belief state, defined as the probability distribution over states, can be propagated and updated by applying the forward operation from the forward–backward algorithm [45]. The transition algorithmic step of the forward operation is assumed to be Markovian, being therefore equivalent to the underlying transition model of a POMDP. More details on the formulation of POMDP transition models are introduced in Section 4.1.

At time step t0, the initial belief corresponds to the joint probability of the initial damage and time-invariant parameters P(d_t0, θ_t0). The forward operation is then applied for the subsequent time steps, comprised of the following steps:

1. Transition step: the belief propagates in time according to a pre-defined conditional probability distribution or transition matrix P(d_t+1, θ_t+1|d_t, θ_t), as:

P(d_t+1, θ_t+1|o_0, …, o_t) = Σ_{d_t} Σ_{θ_t} P(d_t+1, θ_t+1|d_t, θ_t) P(d_t, θ_t|o_0, …, o_t)   (6)

2. Estimation step: the belief is now updated based on the obtained evidence by means of Bayes' rule, as:

P(d_t+1, θ_t+1|o_0, …, o_t+1) ∝ P(o_t+1|d_t+1) P(d_t+1, θ_t+1|o_0, …, o_t)   (7)

The quality of the observation is quantified by the likelihood P(o_t+1|d_t+1). This likelihood can be directly obtained from probability of detection curves or by discretizing a direct measurement. Since the random variables are discrete, a normalization of P(d_t+1, θ_t+1|o_0, …, o_t+1) can be easily implemented.

The failure probability assigned to the node Ft corresponds to the probability of being in a failure state. As the failure states are defined based on the damage condition dt, the time-invariant parameters θt can be marginalized out to compute the failure probability. Disregarding the discretization error, the resulting structural reliability is equivalent to the one computed in Eq. (5).

In terms of computational complexity, note that the belief is composed of (|θ_1|⋅…⋅|θ_k|⋅|d|) states, defined by the damage d along with k time-invariant random variables. Thus, the transition matrix includes (|θ_1|⋅…⋅|θ_k|⋅|d|)^2 elements. Since P(θ_t+1|θ_t) is defined by an identity matrix, the transition is prescribed by a very sparse, block-diagonal matrix with a maximum density of ρ_P = 1/(|θ_1|⋅…⋅|θ_k|).

3.2. Deterioration rate DBN

We present herein an alternative DBN in which a stochastic deterioration process is represented as a function of the deterioration rate. This model is adopted from [47] and denoted here as the deterioration rate DBN. Fig. 3 graphically illustrates the model. In this case, the stochastic deterioration process is described in time t by the nodes dt, conditional on the deterioration rate τt. If the stochastic process is stationary, the deterioration evolution will vary equally over time, and thus the deterioration rate τt is not utilized. The deterioration does not, however, progress equally over time in a non-stationary process, and in that case, the parameter τt needs to be incorporated to effectively model the varying deterioration effects over time. After collecting experimental or
physically-based simulated data (e.g. Monte Carlo simulations) from a non-stationary deterioration process, the transition probabilities can be calculated, for each deterioration rate τt, by counting the number of transitions from dt to dt+1 over the total data available in dt. Additional methods to compute the transition model are described in [47]. As illustrated in Fig. 3, imperfect observations are added through the nodes ot, and the structural reliability is indicated through the node Ft.

Fig. 3. Deterioration rate dynamic Bayesian network, derived from [47]. The evolution of a stochastic deterioration process is represented by the nodes dt, dependent on the deterioration rate τt. Imperfect observations are included through the nodes ot, and the binary node Ft indicates the probability of the failure and survival events.

To ensure compliance with the DBN time-invariance property, the belief incorporates both the damage condition and the deterioration rate through the joint probability P(dt, τt). Yet, the node τt is a zero-one (one-hot) vector that transitions at each time step from one deterioration rate τ_i to the next τ_i+1. The deterioration evolution is computed by a forward operation in a similar manner as for the parametric DBN. Initially, the belief corresponds to the joint probability P(d_0, τ_0). Subsequently, the belief experiences a transition according to the transition matrix P(d_t+1, τ_t+1|d_t, τ_t):

P(d_t+1, τ_t+1|o_0, …, o_t) = Σ_{d_t} Σ_{τ_t} P(d_t+1, τ_t+1|d_t, τ_t) P(d_t, τ_t|o_0, …, o_t)   (8)

Based on the gathered observations, the beliefs are then updated by applying Bayes' rule. The likelihood P(o_t+1|d_t+1) can be directly defined from probability of detection curves or other observation uncertainty measures:

P(d_t+1, τ_t+1|o_0, …, o_t+1) ∝ P(o_t+1|d_t+1) P(d_t+1, τ_t+1|o_0, …, o_t)   (9)

The computational complexity is influenced by the belief size. For the case of a deterioration rate DBN, the belief P(dt, τt) is composed of |τ|⋅|d| states, and its sparse transition matrix P(d_t+1, τ_t+1|d_t, τ_t) accounts for (|τ|⋅|d|)^2 elements. Since the only non-zero probabilities of the transition matrix P(τ_t+1|τ_t) are the ones that define the transition from deterioration rate τ_t to the next deterioration rate τ_t+1, the maximum density of P(d_t+1, τ_t+1|d_t, τ_t) is ρ_DR = 1/|τ|.

The advantages of a parametric DBN versus a deterioration rate one are case dependent. If the deterioration process can be modeled by just a few parameters, or it evolves over a long time span, the parametric DBN is recommended. However, if the deterioration modeling involves many parameters or complex random processes spanning over a short time horizon, the deterioration rate DBN should be preferred. If both DBN models are applied to the same problem, the results should be equivalent, and differences are only affected by the discretization error.

3.3. Risk-based inspection planning and DBNs

While DBNs can be successfully used for reliability updating, they do not possess by themselves intrinsic optimization capabilities. To this end, modern RBI methodologies include a combination of DBNs and heuristic rules to identify the optimal strategy [43,48]. The methodologies often follow a similar logic as the theoretical scheme presented in Section 2, where the decision tree is simplified.

Alternatively, the optimal I&M strategy among different alternatives can be identified with the support of DBNs in a simulation environment. Any of the proposed DBN types (Sections 3.1 and 3.2) can be generalized to an influence diagram by adding utility and decision nodes [24]. The total cost C_T for a set of pre-defined heuristic rules h_{a,e} can be computed by simulating one episode ep of length t_N as:

C_T,ep(h) = Σ_{t=t_0}^{t_N} [C_i(t)γ^t + C_r(t)γ^t + ΔP_F(t)C_f γ^t]   (10)

The total expected cost E[C_T(h)] is then computed with a Monte Carlo simulation of n_ep episodes (policy realizations):

E[C_T(h)] = (Σ_{ep=1}^{n_ep} C_T,ep(h)) / n_ep   (11)

One can compute the costs of all pre-defined heuristic rules and identify the strategy with the minimum expected cost as the optimal policy. However, the resulting optimal policies might be compromised due to the limited space covered by the imposed heuristic rules, out of all possible decision rules.

4. Optimal I&M planning through POMDPs

We propose herein a methodology for optimal I&M planning of deteriorating structures under uncertainty based on Partially Observable Markov Decision Processes (POMDPs). The methodology is adopted by similar frameworks, as studied in [49]. While the damage evolution was modeled in [49] as a function of its deterioration rate, following the formulation presented in Section 3.2, we extend here the methodology to deterioration mechanisms modeled as functions of time-invariant parameters, formulated according to Section 3.1. In addition, the user penalty is defined in this work as a consequence of the annual failure probability experienced by the component.

A Markov decision process (MDP) is a 5-tuple 〈S, A, T, R, γ〉 controlled stochastic process in which an intelligent agent acts in a stochastic environment. The agent observes the component at state s ∈ S and takes an action a ∈ A; the state then randomly transitions to state s′ ∈ S according to a transition probability model T(s, a, s′) = P(s′|s, a), and finally the agent receives a relevant reward R_t(s, a), where t is the current decision step.

As described in Section 1, the optimal decisions result in a minimum expected cost. The expected cost, or value function, is expressed for a finite horizon MDP as the summation of the decomposed rewards V(s_0) = R_t0 + … + R_tN−1 γ^{tN−1}, from time step t0 up to the final time step tN−1. For an infinite or unbounded horizon MDP, the rewards are infinitely summed up (t_N = ∞). Note that the rewards are discounted by the factor γ. From an economic perspective, the discount factor converts future rewards into their present value. Computationally, discounting is also necessary to guarantee convergence in infinite horizon problems.

An MDP policy (π : S→A) prescribes an action as a function of the current state. The main goal of an MDP is the identification of the optimal policy π*(s), which maximizes the value function V*(s). Efficient algorithms exist that compute the optimal policy using the principles of dynamic programming and invoking Bellman's equation. Both value and policy iteration algorithms can be implemented to identify the optimal policy π*(s) [50]. While the state of the component in an MDP is known at each time step, imperfect observations are usually obtained in real situations, e.g. noise in the sensor of a robot, measurement uncertainty of an inspection, etc. POMDPs are a generalization of MDPs in which the states are perceived by the agent through imperfect observations. The POMDP becomes a 7-tuple 〈S, A, O, T, Z, R, γ〉. While the dynamics of the environment are the same as for an MDP, an agent

P.G. Morato et al. Structural Safety 94 (2022) 102140

collects an observation o ∈ O in the state s′ ∈ S with emission probability Z(o, s′, a) = P(o | s′, a), after an action a ∈ A is taken. Fig. 4 shows the dynamic decision network of a POMDP, which is built based on a parametric model. A deterioration rate POMDP can be equally represented if one replaces the time-invariant parameters θ by a deterioration rate variable τ.

Fig. 4. Graphical representation of a Partially Observable Markov Decision Process (POMDP). The states St are modeled as the joint distribution of the time-invariant parameters θt and the damage size dt. The imperfect observations are modeled by the node ot. Actions at are represented by rectangular decision nodes and rewards Rt are drawn with diamond-shaped nodes. A deterioration rate POMDP can be graphically modeled by adding a deterioration rate variable τt instead of the time-invariant parameters θt.

Since an agent is uncertain about the current state, the decisions should in principle be planned based on the full history of observations o1:ot, up to the current decision step t. Instead, a belief state b(s) is tracked to plan the decisions. A belief state b(s) is a probability distribution over states, and it is updated as a function of the transition T(s, a, s′) and the collected observation Z(o, s′, a):

b′(s′) ∝ P(o | s′, a) Σ_{s∈S} P(s′ | s, a) b(s)   (12)

The normalizing constant P(o | b, a) is the probability of collecting an observation o ∈ O given the belief state b and action a ∈ A.

One can see in Eq. (12) that, for a specific action a ∈ A, updating a belief is equivalent to the forward operation described for DBNs in Eqs. (6)–(9). Yet, the main objective of a POMDP is to identify the optimal policy π*(b) as a function of the belief state b. Since the belief state is a sufficient statistic, equivalent to the history of all taken actions and gathered observations, a policy π*(b) as a function of b will always be optimal, as compared with a policy π(h) constrained by a limited set of heuristic rules ha,e. This is also demonstrated through the numerical experiments in Section 5. In Section 4.1, POMDP implementation details are provided, and in Section 4.2, we explain how point-based solvers are able to solve high-dimensional state space POMDPs and find the optimal strategies.

4.1. POMDP model implementation

A systematic scheme for building a POMDP model in the context of optimal inspection and maintenance planning is provided in this section. A POMDP is built by defining all the elements of the tuple 〈S, A, O, T, Z, R, γ〉. While most of the reported applications of POMDPs for infrastructure planning employ a deterioration rate model [49], a parametric model as presented in Section 3.1 is originally implemented here.

4.1.1. States

For the typical discrete state MDP/POMDP cases, a discretization should be first performed for the continuous random variables, transforming them to the discrete state space. As mentioned in Section 3, an efficient discretization has to balance model fidelity and computational complexity.

To construct an infinite horizon POMDP equivalent to the DBN parametric model presented in Section 3.1, the states St = Sdt × Sθ are assigned as the domain instances of the joint probability P(dt, θ). POMDPs are often represented in robotics applications by hidden Markov models containing only one hidden random variable. This has induced some confusion in the literature, where it is reported that POMDPs cannot handle deterioration mechanisms formulated as a function of time-invariant parameters [51]. However, a deterioration process represented by time-invariant parameters can be easily modeled with POMDPs by augmenting the state space to include the joint probability distribution P(dt, θ). While state-augmentation techniques have already been proposed in the literature [49,52,53], we particularly augment the state space here in order to specify the POMDP dynamics based on deterioration processes modeled as parametric DBNs that also include time-invariant parameters. This approach can also accommodate formulations with model updating. Naturally, augmenting the state space implies an increase of computational complexity, as is the case for both DBNs and POMDPs.

If the deterioration rate model (Section 3.2) is instead preferred, the states St = Sdt × Sτt are defined directly from the domain of the joint probability P(dt, τt). The implementation for this case is documented in [49]. At the initial time step, one can prescribe the initial belief b0 as the joint probability P(dt=0, θ) or P(dt=0, τ0).

4.1.2. Action-observation combinations

Actions a ∈ A correspond to maintenance actions, such as "do-nothing", "perfect-repair" or "minor-repair", and observation actions e ∈ E are defined based on the available inspection or monitoring techniques, such as "no-observation", "visual-inspection" or "Nondestructive Evaluation (NDE)-inspection". Since rewards are assigned as a result of an agent taking an action and perceiving an observation, it is recommended to combine actions and observations into groups [49]. For instance, one can combine the action "do-nothing" with two inspection types, resulting in the two combinations "do-nothing/visual-inspection" and "do-nothing/NDE-inspection", and a relevant reward will be assigned to each combination.

4.1.3. Transition probabilities

A transition matrix T(s, a, s′) models the transition probability P(s′ | s, a) of a component from state s ∈ S to state s′ ∈ S after taking an action a ∈ A. The transition matrix is therefore constructed as a function of the maintenance actions:

• Do-nothing (DN) action: there is no maintenance action planned in this case and the state evolves according to the stochastic deterioration process. For an infinite horizon POMDP, the transition probability T(s, aDN, s′) is equal to the transition matrix P(dt+1, θt+1 | dt, θt) or P(dt+1, τt+1 | dt, τt), derived in Section 3.

• Perfect repair (PR) action: a maintenance action is performed and the component returns from its current damage belief bt, at time step t, to the belief b0, associated with an intact status. In a belief space environment, a perfect repair transition matrix is defined as:

P(s′ | s, aPR) = ⎡ b0(s0)  b0(s1)  ⋯  b0(sk) ⎤
                ⎢ b0(s0)  b0(s1)  ⋯  b0(sk) ⎥
                ⎢   ⋮       ⋮     ⋱    ⋮    ⎥   (13)
                ⎣ b0(s0)  b0(s1)  ⋯  b0(sk) ⎦

Since the belief state is a probability distribution, the summation over all the states is equal to one (Σs bt(s) = 1). If one multiplies a belief state by the transition matrix defined in Eq. (13), the current belief returns to the belief b0, independently of its current condition


as:

b0(s) = bt(s) P(s′ | s, aPR)   (14)

If the states are fully observable, the belief state becomes a zero-one vector and a perfect repair matrix can be formulated as P(s0 | st, aPR) = 1, transferring any state st to the intact state s0.

• Imperfect repair (IR) action: a maintenance action is performed and the component returns from a damage belief bt to a healthier damage state or a more benign deterioration rate. The definition of the repair transition matrix P(st+1 | st, aIR) is thus case dependent. Some examples can be found in [49].

4.1.4. Observation probabilities

An observation matrix Z(o, s′, a) quantifies the probability P(o | s′, a) of perceiving an observation o ∈ O in state s′ ∈ S after taking action a ∈ A. Note that we denote the observation action as a to be coherent with the usual POMDP formulation; yet the observation action could also be named e, to be consistent with the nomenclature used in Section 2.1. The relevant observation actions considered here are:

• No observation (NO): the belief state should remain unchanged after the transition, as no additional information is gathered. The emission probability P(o | s′, aNO) can be modeled as a uniform distribution over all observations. Alternatively, it can be modeled as P(o0 | s′, aNO) = 1. The former is recommended, as it will speed up the computation [49].

• Discrete indication (DI): the likelihood P(o | s′, aDI) is modeled as a discrete event, for instance, a binary indication: detection or no-detection. The likelihood is usually quantified for the binary case by a Probability of Detection (PoD) curve. A PoD(s′) is equivalent to the probability P(oD | s′) of collecting an observation oD ∈ O as a function of the state s′ ∈ S, and the emission probability can be directly implemented as P(oD | s′, aDI) = PoD(s′). The implementation can be equally applied to a higher-dimensional discrete observation space.

• Continuous indication (CI): the likelihood P(o | s′, aCI) is modeled as a continuous distribution, for example, a direct measurement of a crack. In this case, the observation space must be discretized into a finite set of observations.

4.1.5. Rewards

An agent holding a belief b receives a reward R(b, a) after taking an action a ∈ A and collecting an observation o ∈ O. In an MDP, the reward R(s, a) is defined as a function of the state, while in a POMDP, the reward R(s, a) is weighted over the belief state b to finally obtain R(b, a):

R(b, a) = Σ_{s∈S} b(s) R(s, a)   (15)

For ease of notation, the reward model is formulated hereafter based on the same notation used for the definition of the RBI cost model in Section 2. If desired, societal, environmental, and other consequences can also be incorporated into the reward model. In the context of infrastructure planning, the state cost C(s, a, s′) is defined depending on the action-observation combination. Some recommendations are listed below:

• Do-nothing/no-observation (DN/NO): this case corresponds to computing the failure risk. Once the failure state subspace SF ⊆ S is defined, the annual failure probability is the probability P(SF | S) of reaching any state in the failure state subspace SF from the state space S. Alternatively, Eq. (16) defines the cost CF(s, aDN−NO) only as a function of the initial state s ∈ S, if the transition matrix P(s′ | s, a) is implicitly considered. This option leads to a faster computation with a point-based solver, as explained subsequently. The cost value C(s, aDN−NO) is equal to the failure cost Cf if s ∈ SF, and equal to 0 otherwise:

CF(s, aDN−NO) = Σ_{s′∈SF} P(s′ | s, aDN−NO) {Cf − C(s, aDN−NO)}   (16)

• Do-nothing/observation (DN/O): the cost is equal in this case to the failure risk plus one inspection cost. Both discrete and continuous indications can be included in this category. One can therefore compute the cost CO(s, aDN−O) just by further considering the inspection cost Ci:

CO(s, aDN−O) = CF(s, aDN−NO) + Ci   (17)

• Repair/no-observation (R/NO): the cost CR(s, aR−NO) is equal to the repair cost Cr:

CR(s, aR−NO) = Cr   (18)

The cost CR(s, aR−O) for a repair/inspection combination can be similarly defined by also including the inspection cost Ci along with the repair cost CR(s, aR−NO).

4.2. Point-based POMDP solvers

In principle, one could apply a value iteration algorithm [54] to solve a POMDP. While value updates are computed in a |S|-dimensional discrete space for an MDP, value updates for POMDPs must instead be computed in a (|S| − 1)-dimensional continuous space. The computation thus scales up considerably with the number of dimensions, increasing the computational complexity. This fact is denoted as the curse of dimensionality. Moreover, planning over a horizon tN also suffers from the curse of history, as the number of potential action-observation histories scales exponentially with the number of time steps. Hence, solving POMDPs by applying a value iteration algorithm to the whole belief state space B, or even to a discretized belief space grid, becomes computationally intractable for practical problems.

Relatively recently, however, point-based solvers have emerged that are able to solve high-dimensional state space POMDPs. Point-based solvers compute value updates only on a representative set of belief points. Several point-based solvers [30,31,55] have been presented in the literature; their main difference is the basis on which they select the set of representative belief points. The reader is directed to [56] for a detailed analysis of point-based solvers applied to infrastructure planning problems.

In an I&M planning context, the main objective is to identify the optimal policy, as explained in Section 2. Instead of constraining the policy space with pre-defined decision rules, a POMDP's objective is to find the sequence of actions a0, …, at that maximizes the expected sum of rewards for each belief b ∈ B. The value function is then formulated as a function of beliefs:

V*(b) = max_{a∈A} [ Σ_{s∈S} b(s) R(s, a) + γ Σ_{o∈O} P(o | b, a) V*(b′) ]   (19)

where b′ is the belief updated via Eq. (12) after taking action a and collecting observation o. It is demonstrated in [57] that the value function is piece-wise linear and convex when it is solved exactly. The piece-wise linearity property is related to an effective value function parametrization by a set of hyperplanes, or α-vectors ∈ Γ, each of them associated with an action a ∈ A. The optimal policy π*(b) can be selected by identifying the α-vector that maximizes the value function V*(b):

V*(b) = max_{α∈Γ} Σ_{s∈S} α(s) b(s)   (20)

The convexity property is, in turn, associated with value of information theory [58], i.e. lower-entropy states result in better decisions and as such have higher expected values than higher-entropy states. Both of


these properties of piece-wise linearity and convexity can be easily visualized in up to 4D state spaces, e.g. in [34]. Naturally, in applications where the state space is augmented, as explained in Section 4.1, the belief still remains a probability distribution over states, and the value function preserves its piece-wise linearity and convexity on this newly defined, enhanced state space.

4.2.1. Finite horizon POMDPs

Existing point-based solvers are mostly able to solve large state space problems for infinite horizon POMDPs [59]. However, an infinite horizon POMDP can be transformed into a finite horizon one by augmenting the state space, as proposed by [12,34,49]. In this case, the time must be encoded in the state space and a terminal state is required. Note that the resulting transition, observation and reward matrices will be very sparse. Yet, it remains essential to augment the space efficiently, taking into consideration the nature of the decision-making problem. Some recommendations are listed below:

• Parametric model: the transition model is stationary. Hence, the same transition matrix built for an infinite horizon POMDP can be reused for any time step of the augmented, finite horizon POMDP. To ensure a finite horizon, the last time step must include an absorbing state. An infinite horizon POMDP with |S| states and |A| actions can thus be augmented to a finite horizon one with |A|·|S|·tN + |S| + 1 states, for a horizon tN.

• Deterioration rate model: the state space can be efficiently formatted if the component experiences only one deterioration rate per time step. This way, one deterioration rate is considered at the first time step, two deterioration rates at the second time step, and so on, incorporating one additional deterioration rate per step until the last time step is reached. An absorbing state must also be included at the end. A deterioration rate model with |Sd| states, spanning a tN horizon with two actions (do-nothing and one maintenance action), becomes a finite horizon POMDP with {(tN + 1)²|Sd| + (tN + 1)|Sd|}/2 + 1 states. Additional maintenance actions can be included without an increase of the state space if they do not introduce additional/new deterioration rates.

5. Numerical experiments: Crack growth represented by time-invariant parameters

With the main objectives of providing implementation details for the two presented POMDP formulations, as well as quantifying the differences in policies and costs between POMDP and heuristic-based I&M approaches, a set of numerical experiments is performed in this section. All computations are conducted on an Intel Core i9-7900X processor with a clock speed of 3.30 GHz. The experiments consist in identifying the optimal I&M strategy for a structural component subjected to fatigue deterioration. In particular, the first presented I&M planning setting (in Section 5.2) is inspired by an earlier investigation of risk-based maintenance planning methods [51]. In that study, the fatigue deterioration model was approximated by a 2-parameter Weibull distribution, whereas a physically-based crack growth model is directly utilized here. According to this fracture mechanics model, the crack size dt+1 is computed as a function of the crack size at the previous time step dt:

dt+1 = [(1 − m/2) CFM SR^m π^(m/2) n + dt^(1−m/2)]^(2/(2−m))   (21)

This Markovian model is derived from Paris' law, as shown in [39]. The process uncertainty is incorporated through the random variables listed in Table 1, where SR stands for the stress range, CFM corresponds to a crack growth parameter, and d0 represents the initial crack size. While the crack distribution evolves over time, the parameters CFM and SR are time-invariant random variables. The remaining parameters, i.e. the crack growth parameter m and the number of cycles n, are considered deterministic. The component fails once the crack exceeds the plate thickness dc, and its considered life spans a finite horizon tN of 30 years.

Table 1
Random variables and deterministic parameters utilized to model fatigue deterioration.

Variable      Distribution     Mean     Standard deviation
ln(CFM)       Normal           −35.2    0.5
SR (N/mm²)    Normal           70       10
d0 (mm)       Exponential      1        1
m             Deterministic    3.5      –
n (cycles)    Deterministic    10⁶      –
tN (yr)       Deterministic    30       –
dc (mm)       Deterministic    20       –

5.1. Discretization analysis

A discretization analysis is performed to select an appropriate state space for this application. As explained in Section 3, either a parametric model or a deterioration rate model can be used to track the deterioration. The transition models are calculated, for both DBN models, from data collected through simulations of the fracture mechanics model in Eq. (21). The POMDPs associated with these models are graphically represented in Fig. 5. Note that the parameters CFM and SR are grouped together for the parametric model, resulting in a new parameter K. By combining two random variables into one, we alleviate the computational effort [23]. K thus corresponds to CFM SR^m π^(m/2) n.

The main purpose of a proper discretization is to allocate the relevant intervals so that a high accuracy is achieved without hindering computational tractability. Although several simulations were run, the reported results mainly relate to the case in which two inspections are planned, at years 18 and 25, each resulting in a no-detection outcome. The inspection quality is quantified with a probability of detection curve PoD(d) ∼ Exp[μ = 8]. A crude Monte Carlo simulation (MCS), containing 10⁷ samples, was run to estimate the cumulative failure probability PF_MCS (Eq. (5)). The accuracy is quantified here as the squared difference between PF_MCS and the cumulative failure probability PF retrieved by each discretized state space model. PF was obtained by unrolling a DBN over time. Note that PF can be calculated directly through a DBN, as the probability of being in the failure states of d. Both PF_MCS and PF are normalized as P̄F = (PF − μPF_MCS)/σPF_MCS, where μPF_MCS and σPF_MCS are the mean and standard deviation of the failure probabilities computed by MCS, respectively. The error ξ is computed as the squared difference of P̄F_MCS and P̄F, summed over the time steps:

ξ = Σ_{t=0}^{N} [P̄F_MCS(t) − P̄F(t)]²   (22)

Table 2 lists the discretization intervals for both the parametric and deterioration rate models. Since the discretization is arbitrary, the interval boundaries were selected by trial and error, according to the recommendations proposed in [23], i.e. a logarithmic transformation is applied to both the Sd and SK spaces. Different state spaces were also tested by varying the number of states |K| and |d|. Table 3 reports the error ξ for each considered state space. While the deterioration rate model of 930 overall states results in an error of magnitude less than 10⁻³, the state space of the parametric model must be increased up to 16,000 overall states to achieve an error of magnitude less than 10⁻³. To illustrate the differences between the analyzed models, Fig. 6 shows the unnormalized error |PF_MCS − PF_DBN| for each case. The error of the deterioration rate model is negligible before the first inspection update at 18 years, while the parametric model accumulates error throughout the whole analysis. In general, the selection of the discretized model will depend on the available computational resources and required accuracy. For this


application, the deterioration rate model with 930 states is utilized for the numerical experiments, due to its reduced state space as compared with the parametric models.

Fig. 5. Graphical representation of the POMDPs utilized for the numerical experiments. A parametric POMDP and a deterioration rate POMDP are created from the DBNs displayed in Figs. 2 and 3, respectively. Note that the random variables CFM and SR are combined into the variable K.

Table 2
Description of the discretization schemes considered in the sensitivity analysis, for both parametric and deterioration rate POMDP models.

Variable   Interval boundaries
Parametric model
Sd         {0, exp(ln(10⁻¹) : [ln(dc) − ln(10⁻¹)]/(|Sd| − 2) : ln(dc)), ∞}
SK         {0, exp(ln(10⁻⁵) : [ln(1) − ln(10⁻⁵)]/(|SK| − 2) : ln(1)), ∞}
Deterioration rate model
Sd         {0, exp(ln(10⁻⁴) : [ln(dc) − ln(10⁻⁴)]/(|Sd| − 2) : ln(dc)), ∞}
Sτ         0 : 1 : 30

Table 3
Accuracy of the considered discretization schemes. The normalized error ξ and the state spaces corresponding to each parameter are reported.

Model                         |SK|   |Sτ|   |Sd|   |S|      ξ
Deterioration rate (DRd15)    –      31     15     465      8.6·10⁻³
Deterioration rate (DRd30)    –      31     30     930      2.1·10⁻⁴
Parametric (PARK50−d40)       50     –      40     2,000    7.1·10⁻²
Parametric (PARK50−d80)       50     –      80     4,000    7.2·10⁻³
Parametric (PARK50−d160)      50     –      160    8,000    3.4·10⁻³
Parametric (PARK100−d80)      100    –      80     8,000    2.5·10⁻³
Parametric (PARK100−d160)     100    –      160    16,000   4.3·10⁻⁴

5.2. Case 1. Traditional I&M planning setting

The fatigue deterioration is modeled according to the time-invariant crack growth model described at the beginning of Section 5. In this traditional setting, the decision maker is only allowed to control the deterioration by undertaking a perfect repair, and is able to collect observations through one type of inspection technique. The perfect repair returns the component to its initial condition d0, and the quality of the inspection technique is quantified with a PoD(d) ∼ Exp[μ = 8]. This I&M decision-making problem is solved here by both POMDPs and heuristics. For the case of POMDPs, point-based solvers provide a theoretical guarantee of optimality, whereas RBI approaches can analytically compute the E[CT] from a simplified decision tree, as explained in Section 2. Alternatively, the computation of the E[CT] can be performed in a simulation environment, in which the deterioration process is modeled by DBNs and the costs are evaluated according to the predefined heuristic policies, as shown in Eq. (11). To compare the policies generated by POMDPs and heuristics on equal terms, the total expected costs E[CT] are computed both on an analytical basis and in a simulation environment.

5.2.1. Analytical comparison

Following the results of the discretization analysis, a finite horizon (FH) POMDP is derived from the deterioration rate model with 930 states (|Sd| = 30 and |Sτ| = 31). Since the horizon spans 30 years, the state space is augmented from 930 to 14,880 states, as explained in Section 4.2. Actions and observations are combined into three action-observation groups: (1) do-nothing/no-inspection, (2) do-nothing/inspection, and (3) perfect-repair/no-inspection. The fourth combination (repair/inspection) is not included, as it will hardly be the optimal action at any time step. A total of three representative experiments are conducted, assigning different inspection, repair and failure costs to each of them. Each experiment is characterized by a different ratio between repair and inspection costs, RR/I, as well as a ratio between failure and repair costs, RF/R. Since these ratios are of relevance in this work, which analyzes the problem from an optimization perspective, an explicit separation of economic, societal, and environmental consequences and their scaling to monetary units is not considered. The SARSOP point-based POMDP solver [30] is used for the computation of the optimal I&M policies. Additionally, the policies from the FRTDP [31] and Perseus [55] point-based solvers are computed specifically for experiment RR/I 50 − RF/R 20. In this theoretical comparison, the expected costs are computed based on the lower bound alpha vectors, as explained in Section 4.2.

In contrast, the optimal RBI policies are determined based on the best identified heuristic decision rules. For this theoretical comparison, the decision tree is simplified to a single branch, with two schemes considered here: equidistant inspections (EQ-INS) and an annual failure probability ΔPF threshold (THR-INS). For the maintenance actions, the component is perfectly repaired after a detection indication, behaving thereafter as if a crack had not been detected at that inspection. The optimized heuristics for each experiment are listed in Table 4, e.g. an


Fig. 6. Error |PF_MCS − PF_DBN| between the continuous deterioration model and the considered discrete state space models. The continuous model is computed by a Monte Carlo simulation of 10 million samples and is compared with the discrete state-space DBN models. The circles in the graph represent the error from deterioration rate models and the squares represent the error from parametric models.
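As a concrete illustration of the accuracy metric behind Fig. 6 and Table 3, the sketch below implements the normalization and the error ξ of Eq. (22); the failure-probability curves used here are synthetic placeholders, not the values from the study:

```python
import numpy as np

def discretization_error(pf_mcs, pf_dbn):
    """Error metric of Eq. (22): both cumulative failure-probability
    curves are normalized by the mean and standard deviation of the
    MCS estimate, and the squared differences are summed over time."""
    pf_mcs = np.asarray(pf_mcs, dtype=float)
    pf_dbn = np.asarray(pf_dbn, dtype=float)
    mu, sigma = pf_mcs.mean(), pf_mcs.std()
    return float(np.sum(((pf_mcs - mu) / sigma - (pf_dbn - mu) / sigma) ** 2))

# Synthetic curves over a 30-year horizon (illustration only).
t = np.arange(31)
pf_ref = 1.0 - np.exp(-1e-4 * t**2)       # stand-in for the MCS estimate
pf_model = 1.0 - np.exp(-1.2e-4 * t**2)   # stand-in for a discretized DBN
xi = discretization_error(pf_ref, pf_model)
```

A perfectly discretized model would yield ξ = 0; coarser state spaces produce a larger ξ, mirroring the trend reported in Table 3.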

Table 4
Analytical (AN) and simulation-based (SIM) comparison between POMDPs and optimized heuristic-based policies in a traditional setting. E[CT] is the total expected cost and Δ%[POMDP FH] indicates the relative difference between each method and the SARSOP finite horizon POMDP. Confidence intervals on the expected costs, assuming Gaussian estimators, are listed for the simulation-based cases.

Traditional setting                                          E[CT] (95% C.I.)   Δ%[POMDP FH]

Experiment RR/I 20 − RF/R 100 (Ci = 5, Cr = 10², Cf = 10⁴, γ = 0.95)
AN: POMDP Finite horizon. SARSOP – Lower bound               58.35              –
AN: Heur.* EQ-INS ΔIns = 4                                   69.17              +18%
AN: Heur.* THR-INS ΔPFth = 3·10⁻⁴                            65.62              +12%
SIM: POMDP Infinite horizon. SARSOP – 30 years**             60.23 (±0.76)      +3%
SIM: Heur. EQ-INS ΔIns = 4                                   69.02 (±0.83)      +18%
SIM: Heur. THR-INS ΔPFth = 3·10⁻⁴                            64.81 (±0.75)      +11%

Experiment RR/I 10 − RF/R 10 (Ci = 1, Cr = 10, Cf = 10², γ = 0.95)
AN: POMDP Finite horizon. SARSOP – Lower bound               2.25               –
AN: Heur.* EQ-INS no inspections                             2.25               +0%
AN: Heur.* THR-INS no inspections                            2.25               +0%
SIM: POMDP Infinite horizon. SARSOP – 30 years**             2.50 (±0.02)       +11%
SIM: Heur. EQ-INS no inspections                             2.25 (±0.00)       +0%
SIM: Heur. THR-INS no inspections                            2.25 (±0.00)       +0%

Experiment RR/I 50 − RF/R 20 (Ci = 1, Cr = 50, Cf = 10³, γ = 0.95)
AN: POMDP Finite horizon. SARSOP – Lower bound               12.45              –
AN: POMDP Finite horizon. FRTDP – Lower bound                12.45              +0%
AN: POMDP Finite horizon. PERSEUS – Lower bound              12.96              +4%
AN: Heur.* EQ-INS ΔIns = 11                                  17.06              +37%
AN: Heur.* THR-INS ΔPFth = 1·10⁻³                            16.69              +34%
SIM: POMDP Infinite horizon (DR). SARSOP – 30 years**        12.99 (±0.24)      +4%
SIM: POMDP Infinite horizon (PAR). SARSOP – 30 years**       13.08 (±0.23)      +5%
SIM: Heur. EQ-INS ΔIns = 11                                  16.28 (±0.19)      +31%
SIM: Heur. THR-INS ΔPFth = 1.5·10⁻³                          16.43 (±0.20)      +32%
SIM: Heur. EQ-INS*** ΔIns = 5                                14.17 (±0.26)      +14%
SIM: Heur. THR-INS*** ΔPFth = 8·10⁻⁴                         13.29 (±0.23)      +7%

* The decision tree is simplified to one single branch, as explained in Section 2.1.
** Simulation of an infinite horizon POMDP policy over a horizon of 30 years.
*** Perfect repair actions are undertaken after two consecutive 'detection' observations.
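The simulation-based (SIM) rows of Table 4 report a mean cost over many policy rollouts together with a Gaussian 95% confidence interval; a minimal sketch of that estimator follows (the episode costs below are synthetic, for illustration only):

```python
import numpy as np

def expected_cost_with_ci(episode_costs, z=1.96):
    """Mean total episode cost and the half-width of a 95% confidence
    interval, assuming a Gaussian estimator of the mean."""
    c = np.asarray(episode_costs, dtype=float)
    mean = c.mean()
    half_width = z * c.std(ddof=1) / np.sqrt(c.size)
    return mean, half_width

# Synthetic discounted total costs of simulated I&M episodes.
rng = np.random.default_rng(0)
costs = rng.gamma(shape=2.0, scale=7.0, size=10_000)
mean, hw = expected_cost_with_ci(costs)  # reported as E[CT] (±hw)
```

The half-width shrinks with the square root of the number of simulated episodes, which is why large rollout counts are needed to resolve the few-percent differences listed in the table.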


Fig. 7. Point-based POMDP solutions for Experiment RR/I 50 − RF/R 20. The expected total cost E[CT] is represented over the computational time. Results of the SARSOP, FRTDP and Perseus point-based POMDP solvers are plotted, with a continuous line for the lower bound and a dashed line for the upper bound. Optimized heuristic methods are represented by markers: the equidistant inspection planning scheme in red, and the annual failure probability threshold in black. The markers also indicate whether the investigated heuristic plans a repair after observing one detection outcome, pRP − D, or after the collection of two consecutive detection outcomes, pRP − 2D.
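Policy realizations of the kind discussed below can be generated by rolling out a policy while propagating the belief with the update of Eq. (12). The following sketch uses a toy three-state deterioration model under a fixed do-nothing/inspection combination; the matrices are illustrative stand-ins, not the 930-state model of the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy deterioration model (illustrative): states = intact, damaged, failed.
T = np.array([[0.90, 0.08, 0.02],   # P(s'|s) under do-nothing
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
Z = np.array([[0.90, 0.10],         # P(o|s'); columns: no-detection, detection
              [0.30, 0.70],
              [0.10, 0.90]])

belief = np.array([1.0, 0.0, 0.0])  # initial belief b0: intact component
state = 0                            # true (hidden) state of the episode

for year in range(10):
    # Environment: sample the transition and the inspection outcome.
    state = rng.choice(3, p=T[state])
    obs = rng.choice(2, p=Z[state])
    # Belief update, Eq. (12): predict with T, weight by the observation
    # likelihood Z, and renormalize.
    belief = Z[:, obs] * (T.T @ belief)
    belief /= belief.sum()
```

At every step, a POMDP policy would map the current belief to an action through the α-vectors of Eq. (20), whereas a heuristic policy only evaluates a pre-defined decision rule.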

inspection every 4 years (ΔIns = 4) is identified as the optimal equidistant inspection heuristic (EQ-INS) for Experiment RR/I 20 − RF/R 100. The total expected cost E[CT] resulting from finite horizon POMDPs and the best identified heuristics are listed in Table 4. Along with the E[CT], the relative difference between each method and the finite horizon POMDP is also reported, and Table 4 demonstrates that finite horizon POMDP policies outperform heuristic-based policies. Even for this traditional I&M decision-making problem, POMDPs provide a significant cost reduction, ranging from 11% in Experiment RR/I 20 − RF/R 100 to 37% in Experiment RR/I 50 − RF/R 20. Experiment RR/I 10 − RF/R 10 is merely conducted to validate the comparative results, by checking that all the methods provide the same results for the case in which repairs and inspections are very expensive relative to the failure cost.

As pointed out in Section 4.2, point-based solvers are able to rapidly solve large state-space POMDPs. This is demonstrated in Fig. 7, where SARSOP outperforms heuristic-based schemes in less than one second of computational time. Note that POMDP policies are based on the lower bound, whereas the upper bound, when provided, is just an approximation used to optimally sample reachable belief points [56].

5.2.2. Comparison in a simulation environment
In this case, the total expected cost E[CT] is evaluated in a simulation environment. Since the horizon can be controlled in a policy evaluation, infinite horizon POMDPs are also included in this comparison. The infinite horizon POMDP is directly derived from the deterioration rate model, and while the action-observation combinations remain the same as for the finite horizon POMDP, the belief space is now reduced to 930 states, offering a substantial reduction in computational cost, as explained before. Note that even though policies generated by infinite horizon POMDPs can be evaluated over a finite horizon, the policies are truly optimal only in an infinite horizon setting.

In this comparison, the best heuristic-based I&M policy is also identified by analyzing two inspection planning heuristics, as previously: either based on equidistant inspections (EQ-INS) or on an annual failure probability threshold (THR-INS). However, in this simulation setting, the component naturally returns to its initial condition after a repair, instead of modeling its evolution as a no-detection event. This operation might add a significant computational expense for analytical computations, if the decision tree is explicitly modeled; however, it can be easily modeled in a simulation-based environment.

The expected utility E[CT] is estimated according to Eq. (11). Table 4 lists the results of the comparison and, given that the expected cost E[CT] is estimated through simulations, the numerical confidence bounds are also reported, assuming a Gaussian estimator. All the methods are compared relative to the finite horizon POMDP, which again outperforms the heuristic-based policies. The reduced state-space infinite horizon POMDP policy results in only a slight increment to the total expected cost obtained by the finite horizon POMDP, in this finite horizon problem. The optimal policy for an infinite horizon in experiment RR/I 20 − RF/R 100 includes the possibility of maintenance actions, whereas the policy for a finite horizon prescribes only the action do-nothing/no-inspection. This explains the slight difference in expected costs for the infinite horizon POMDP. The infinite horizon POMDP for a parametric model of 16,000 states is also computed and listed in Table 4 for the experiment RR/I 50 − RF/R 20. As expected, the E[CT] for the parametric (PAR) model is in good agreement with the deterioration rate (DR) model, and the small difference is attributed to the discretization quality.

Finally, we showcase policy realizations to visualize the difference between POMDPs and heuristic-based policies over an episode, related to the experiment RR/I 50 − RF/R 20. Fig. 8a and b represent realizations of POMDP policies, whereas Fig. 8c and d represent realizations of heuristic-based policies. While heuristic-based policies prescribe a repair action immediately after a detection, POMDP-based policies might also consider a second inspection after a detection outcome. If the second inspection results in a no-detection outcome, a repair action may not be prescribed; however, if the second inspection also results in detection, a perfect repair is planned. POMDP-based policies provide, therefore, more flexibility in general and can reveal interesting patterns, e.g. that it might be worthwhile, in certain cases, to conduct a second inspection before prescribing an expensive repair action. As such, based on analyzed POMDP policy patterns, heuristic rules can be informed and defined anew. As reported in Table 4, two additional heuristic rules are thus examined, where perfect repair actions are undertaken after two consecutive 'detection' observations. These modified heuristics yield results closer to those provided by POMDP policies, with POMDP policies now surpassing the two heuristic ones by 7% and 14%, respectively. While an experienced operator might have initially guessed these more sophisticated heuristic decision rules, based on the imperfect and cheap observation model specified in this setting, in more complex settings, e.g. an I&M planning scenario with inspections that provide more than two indications (as shown in Section 5.3), decision makers might guide their choices for the selection of more advanced heuristic rules through an investigation of the patterns exposed by POMDP policy realizations.

P.G. Morato et al. Structural Safety 94 (2022) 102140

Fig. 8. Experiment RR/I 50 − RF/R 20 policy realizations. The failure probability is plotted in blue and the prescribed maintenance actions are represented by black bars. A detection outcome is marked by a cross, whereas a no-detection outcome is marked by a circle.

5.3. Case 2. Detailed I&M planning setting

While only a perfect repair and one inspection technique have been available for the traditional setting applications, two repair actions and two inspection techniques are now available in this more complex case. Fatigue deterioration in this setting can be controlled by performing either a perfect or a minor repair. The perfect repair returns the component to its initial condition, and the minor repair transfers the component two deterioration rates back. The two inspection techniques considered are inspection 1 (I1), with only 2 indicators: detection (D) or no-detection (ND); and inspection 2 (I2), with 5 indicators: no-detection (ND), low damage (LD), minor damage (mD), major damage (MD) and extensive damage (D). The quality of each inspection technique is quantified through probability of indication (PoI) curves. Fig. 9a corresponds to the first inspection type, with a PoD(d) ∼ Exp[μ = 8]. This inspection method is the same as the one used in the traditional I&M planning setting. The second inspection method includes, however, the following detection boundaries: PoI(d) ∼ Exp[μ = 4]; PoI(d) ∼ Exp[μ = 7]; PoI(d) ∼ Exp[μ = 10]; and PoI(d) ∼ Exp[μ = 13]. The probability of observing each indicator is represented in Fig. 9b as a function of the crack size.

Fig. 9. Quantification of the inspection uncertainty. The probability of retrieving each indicator is represented as a function of the crack size. For inspection type-1, the observation model includes two indicators: "detection" D1 and "no-detection" ND1. For inspection type-2, the observation model is composed of five indicators: "no-detection" ND2, "low damage" LD2, "minor damage" mD2, "major damage" MD2, and "extensive damage" D2.

Similar to the previous case, we solve a finite horizon POMDP with 14,880 states to identify the optimal policy. However, in this setting, actions and observations are combined into seven groups: (1) do-nothing/no-inspection (DN-NI); (2) do-nothing/inspection-1 (DN-I1); (3) do-nothing/inspection-2 (DN-I2); (4) minor-repair/no-inspection (mRP-NI); (5) minor-repair/inspection-1 (mRP-I1); (6) minor-repair/inspection-2 (mRP-I2); and (7) perfect-repair/no-inspection (pRP-NI). Analyses are conducted for a modified version of experiment RR/I 50 − RF/R 20. The individual costs for this example are listed in Table 5. Inspection type-2 costs twice as much as inspection type-1, as it is more accurate and provides more information about the deterioration.

For this setting, heuristic inspection decision rules are prescribed considering again both equidistant inspection and annual failure probability ΔPF threshold schemes. All heuristics are evaluated in a simulation environment, computing the expected cost E[CT], as indicated in Eq. (11). Maintenance heuristic rules are accordingly defined considering the following two schemes:

• Observation-based maintenance rules: a maintenance action is undertaken after an observation. For example, a minor repair is undertaken if a minor damage is observed. The number of potential observation-based maintenance rules scales as |AR|^|O|, where |O| and |AR| are the number of observations and maintenance actions, respectively. If we consider inspection type-2, the heuristic rules result in 3^5 combinations. Such combinatoric heuristic rules, together with failure probability thresholds or intervals for inspections, have been evaluated against POMDPs in [60]. Due to the large computational cost of evaluating all possible decision rules, we evaluated only a subset of these combinations here. The most competitive set of heuristic rules for this case are listed in Table 5; e.g. the optimized equidistant inspection type-1 heuristic (EQ-INS1) prescribes an inspection every 11 years (ΔIns = 11) and a perfect repair after a detection observation (pRP-D1).
• Threshold-based maintenance rules: a maintenance action is undertaken when a specific threshold is reached after an observation. The threshold can be prescribed in terms of failure probability PF or expected damage size, as proposed in [46]. We consider both cases here, i.e. a failure probability threshold PFth and an expected damage size threshold, E[d]. Threshold-based maintenance rules based on expected damage have also been evaluated against POMDPs in [61].

The expected costs E[CT] resulting from both POMDP and heuristic-based policies are reported in Table 5. Additionally, we list the relative difference between each policy and a finite horizon POMDP policy solved by SARSOP. In this detailed setting, POMDP-based policies again outperform heuristic-based ones. In terms of POMDP-based policies, SARSOP and FRTDP achieve similar results. Results obtained from heuristic-based policies vary depending on their prescribed set of heuristics. For equidistant inspection planning, inspection type-1 is preferred over inspection type-2, because the inspections are fixed in time and the additional information provided by inspection type-2 becomes too expensive. In contrast, inspection type-2 is the best scheme for annual failure probability threshold inspection planning. The threshold-based maintenance heuristics proved to be better than observation-based heuristics, yet threshold-based maintenance heuristics imply additional computational costs, as generally more heuristic rules must be evaluated. Fig. 10 illustrates the expected cost E[CT] of each policy as a function of the computational time. We can see how the POMDP point-based solvers improve their lower bounds in time, along with the computational cost incurred by evaluating the various heuristic rules.

Table 5
Comparison between POMDP and optimized heuristic-based policies in a detailed setting. E[CT] is the total expected cost and Δ%[POMDP FH] indicates the relative difference between each method and SARSOP finite horizon POMDP results. Confidence intervals on the expected costs, assuming Gaussian estimators, are also listed. Costs: Ci1 = 1, Ci2 = 2, CmRP = 10, CpRP = 50, Cf = 10³; γ = 0.95.

  Detailed setting                                        E[CT] (95% C.I.)   Δ%[POMDP FH]
  POMDP Finite Horizon (FH). SARSOP - Lower Bound         12.26              –
  POMDP Finite Horizon (FH). FRTDP - Lower Bound          12.30              <1%
  Heur. EQ-INS1: ΔIns = 11; pRP-D1                        16.23 (±0.19)      +32%
  Heur. EQ-INS2: ΔIns = 11; pRP-D2                        18.08 (±0.31)      +47%
  Heur. THR-INS1: ΔPFth = 1.5·10⁻³; pRP-D1                16.40 (±0.20)      +33%
  Heur. THR-INS2: ΔPFth = 1.1·10⁻³; pRP-D2                15.55 (±0.21)      +26%
  Heur. THR-INS2: ΔPFth = 5.0·10⁻⁴; pRP-PFth = 2.2·10⁻²   13.88 (±0.29)      +13%
  Heur. THR-INS2: ΔPFth = 1.0·10⁻³; pRP-E[d] > 4          13.66 (±0.24)      +11%

To visualize the actions prescribed by each approach, Fig. 11 displays a frequency histogram of the actions taken over 10⁴ policy realizations. The action do-nothing/no-inspection (DN-NI) predominates over all other actions. While heuristic policies conduct either inspection type-1 (DN-I1) or inspection type-2 (DN-I2), the POMDP-based policy utilizes both inspection types. This is also true for the maintenance actions, in which heuristic policies prescribe only perfect repairs, whereas POMDP policies sometimes choose to undertake minor repairs (mRP) as well.

Fig. 10. Computational details of POMDP and simulation-based heuristic schemes in a detailed setting. The expected total costs E[CT] are represented over the computational time. Results of SARSOP and FRTDP point-based POMDP solvers are plotted, with a continuous line for the lower bound and a dashed line for the upper bound. Optimized heuristic policy results are reported by markers and are directly linked to the schemes shown in Table 5.

Fig. 11. Frequency histogram of the actions prescribed by each considered approach over 10⁴ policy realizations. The policies presented here are linked to those listed in Table 5.

6. Discussion

The results of this investigation show that POMDPs are able to identify optimal I&M policies for deteriorating structures and offer substantially lower costs than heuristic-based policies, as is theoretically explained and justified, and as has also been demonstrated through numerical examples in Sections 5.2 and 5.3. The policy optimization based on heuristic approaches may be constrained by the limited number of decision rules assessed, out of all possible decision rules. Avoiding these limitations, POMDPs prescribe actions as a function of the belief state, which is a sufficient statistic of the whole, dynamically updated, action-observation history. This implies that the actions are taken according to the whole history of actions and observations, rather than as a result of an immediate inspection outcome or pre-defined static policies.

As demonstrated in Section 5.3, POMDPs can be applied to detailed I&M decision settings, in which multiple actions and inspection methods are available. In terms of computational efficiency, state-of-the-art point-based solvers are able to solve high-dimensional state space POMDPs within a reasonable computational time. In particular, the SARSOP point-based solver very quickly improves its policy at the beginning of the solution process and employs an approximate upper bound to gradually reach a converged solution. For both traditional and detailed settings, the SARSOP and FRTDP point-based solvers outperform heuristic-based policies after only a few seconds of computational time.

For modeling the deterioration process, one can utilize either a parametric or a deterioration rate model, as explained in Section 2. A deterioration rate model generally results in a smaller state space than a parametric model, except for very long horizons. In this latter case, a parametric model might lead to a smaller state space, due to its stationary nature. In any case, a discretization analysis must be conducted to select the appropriate state model for the problem at hand. More efforts are worth making in the future towards continuous state space POMDPs and optimal discretization schemes for discrete state spaces.
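The belief update that makes the belief state a sufficient statistic of the action-observation history is a standard discrete Bayes filter over the damage states. A minimal sketch follows; the 3-state chain and its matrices are illustrative numbers chosen for this example, not the paper's calibrated deterioration or inspection model:

```python
def belief_update(b, T, O, obs):
    """One POMDP belief update: propagate the belief b through the
    transition model T of the action taken (T[i][j] = P(s'=j | s=i)),
    weight by the likelihood of the received observation
    (O[j][o] = P(o | s'=j)), and renormalize."""
    n = len(b)
    b_pred = [sum(b[i] * T[i][j] for i in range(n)) for j in range(n)]
    b_post = [O[j][obs] * b_pred[j] for j in range(n)]
    z = sum(b_post)  # normalizing constant: probability of the observation
    return [p / z for p in b_post]

# Illustrative 3-state deterioration chain (intact, cracked, failed) and a
# binary inspection with outcomes 0 = no-detection, 1 = detection.
T = [[0.90, 0.10, 0.00],
     [0.00, 0.80, 0.20],
     [0.00, 0.00, 1.00]]
O = [[0.90, 0.10],
     [0.50, 0.50],
     [0.10, 0.90]]

b = [1.0, 0.0, 0.0]                  # component initially known to be intact
b = belief_update(b, T, O, obs=1)    # a detection shifts mass toward 'cracked'
```

A POMDP policy maps this updated belief, rather than only the latest inspection outcome, to the next action, which is what allows patterns such as the second confirmatory inspection observed in the policy realizations.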

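The heuristic rule spaces discussed above also grow quickly: assuming |AR| = 3 maintenance actions (do-nothing, minor repair, perfect repair) and the |O| = 5 indicators of inspection type-2, there are |AR|^|O| = 3^5 = 243 observation-based maintenance rules alone. A short sketch of enumerating them (labels follow the text; the enumeration itself is generic, not a method from the paper):

```python
from itertools import product

ACTIONS = ["DN", "mRP", "pRP"]                   # do-nothing, minor, perfect repair
INDICATORS = ["ND2", "LD2", "mD2", "MD2", "D2"]  # inspection type-2 outcomes

# Each observation-based maintenance rule assigns one action to every indicator.
rules = [dict(zip(INDICATORS, choice))
         for choice in product(ACTIONS, repeat=len(INDICATORS))]

assert len(rules) == len(ACTIONS) ** len(INDICATORS)  # 3**5 = 243 rule sets
```

Scoring each such rule set in simulation, on top of the inspection-timing heuristics, is what makes an exhaustive heuristic search costly and motivates evaluating only a subset, as done for Table 5.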

7. Concluding remarks

In this paper, we examine the effectiveness of Partially Observable Markov Decision Processes (POMDPs) for identifying optimal Inspection and Maintenance (I&M) strategies for deteriorating structures, and we clarify that Dynamic Bayesian Networks (DBNs) can be combined with POMDPs, providing a joint framework for efficient inspection and maintenance planning. The formulation for deriving POMDPs in a structural reliability context is also presented, and two alternative DBN formulations for deterioration modeling are described, together with their POMDP implementations.

Modern Risk Based Inspection (RBI) planning methodologies are often supported by DBNs, and a pre-defined set of decision rules is evaluated. These policies can on occasion diverge significantly from globally optimal solutions, because the limited domain space of searched policies may not include the global optimum. In contrast, POMDP policies prescribe an action as a function of the belief state, which is a sufficient statistic of the whole action-observation history. I&M policies generated by finite horizon POMDPs are compared with heuristic-based policies, for the case of a structural component subjected to fatigue deterioration. In the first example, the stochastic deterioration is modeled as a function of time-invariant parameters, with only one inspection type and one perfect repair available. Our numerical findings verify that POMDP-based policies can approximate the global solution better than heuristic-based policies, thus being more efficient even for typical RBI applications. The 14,880-state finite-horizon POMDP outperforms heuristic-based policies in less than a second of computational time. For the second numerical example, we consider an I&M decision-making problem in a more detailed setting, including two inspection methods and two repair actions. Whereas the outcome of the first inspection type is set up as a binary indicator, the second inspection technique indicates the damage level through five alarms. With this application, we demonstrate the capabilities of POMDPs in efficiently handling complex decision problems, again outperforming heuristic-based policies.

The main limitation of the presented approaches, including POMDPs, is the increase of computational complexity for very large state and action spaces, such as the ones for a system of multiple components. Dynamic Bayesian networks with large state spaces are similarly constrained by the curse of dimensionality. To overcome this limitation, we suggest further research efforts toward the development of POMDP-based Deep Reinforcement Learning (DRL) methodologies. As demonstrated in [60,61], a multi-agent actor-critic DRL approach is able to identify optimal strategies for multi-component systems with large state, action and observation spaces. In particular, POMDP-based actor-critic DRL methods approximate the policy and the value function with neural networks, alleviating the curse of dimensionality through the deep network parametrizations, and the curse of history through the reliance on dynamic programming MDP principles, the full advantages of which may be compromised if heuristic rules are instead considered.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research is funded by the National Fund for Scientific Research in Belgium F.R.I.A. – F.N.R.S. This support is gratefully acknowledged. Dr. Papakonstantinou and Dr. Andriotis would further like to acknowledge that this material is also based upon work supported by the U.S. National Science Foundation under Grant No. 1751941.

References

[1] Frangopol DM. Life-cycle performance, management, and optimisation of structural systems under uncertainty: accomplishments and challenges. Struct Infrastruct Eng 2011;7(6):389–413. https://doi.org/10.1080/15732471003594427.
[2] Stewart MG, Rosowsky DV. Time-dependent reliability of deteriorating reinforced concrete bridge decks. Struct Saf 1998;20(1):91–109. https://doi.org/10.1016/S0167-4730(97)00021-0.
[3] Akgül F, Frangopol DM. Lifetime performance analysis of existing steel girder bridge superstructures. J Struct Eng 2004;130(12):1875–88. https://doi.org/10.1061/(ASCE)0733-9445(2004)130:12(1875).
[4] Val DV, Stewart MG, Melchers RE. Effect of reinforcement corrosion on reliability of highway bridges. Eng Struct 1998;20(11):1010–9. https://doi.org/10.1016/S0141-0296(97)00197-1.
[5] Val DV, Melchers RE. Reliability of deteriorating RC slab bridges. J Struct Eng 1997;123(12):1638–44. https://doi.org/10.1061/(ASCE)0733-9445(1997)123:12(1638).
[6] Moan T. Reliability-based management of inspection, maintenance and repair of offshore structures. Struct Infrastruct Eng 2005;1(1):33–62. https://doi.org/10.1080/15732470412331289314.
[7] Lotsberg I, Sigurdsson G, Fjeldstad A, Moan T. Probabilistic methods for planning of inspection for fatigue cracks in offshore structures. Marine Struct 2016;46:167–92. https://doi.org/10.1016/j.marstruc.2016.02.002.
[8] Wirsching PH. Fatigue reliability in welded joints of offshore structures. Int J Fatigue 1980;2(2):77–83. https://doi.org/10.1016/0142-1123(80)90035-3.
[9] Schaumann P, Lochte-Holtgreven S, Steppeler S. Special fatigue aspects in support structures of offshore wind turbines. Materialwissenschaft und Werkstofftechnik 2011;42(12):1075–81. https://doi.org/10.1002/mawe.201100913.
[10] Dong W, Moan T, Gao Z. Fatigue reliability analysis of the jacket support structure for offshore wind turbine considering the effect of corrosion and inspection. Reliab Eng Syst Saf 2012;106:11–27. https://doi.org/10.1016/j.ress.2012.06.011.
[11] Yeter B, Garbatov Y, Guedes Soares C. Fatigue damage assessment of fixed offshore wind turbine tripod support structures. Eng Struct 2015;101:518–28. https://doi.org/10.1016/j.engstruct.2015.07.038.
[12] Andriotis CP, Papakonstantinou KG, Chatzi EN. Value of structural health information in partially observable stochastic environments; 2020. arXiv preprint arXiv:1912.12534.
[13] Fauriat W, Zio E. Optimization of an aperiodic sequential inspection and condition-based maintenance policy driven by value of information. Reliab Eng Syst Saf 2020;204:107133.
[14] Ellingwood BR. Risk-informed condition assessment of civil infrastructure: state of practice and research issues. Struct Infrastruct Eng 2005;1(1):7–18. https://doi.org/10.1080/15732470412331289341.
[15] Sánchez-Silva M, Frangopol DM, Padgett J, Soliman M. Maintenance and operation of infrastructure systems. J Struct Eng 2016;142(9):F4016004.
[16] Faber MH, Stewart MG. Risk assessment for civil engineering facilities: Critical overview and discussion. Reliab Eng Syst Saf 2003;80(2):173–84. https://doi.org/10.1016/S0951-8320(03)00027-9.
[17] Kim S, Ge B, Frangopol DM. Effective optimum maintenance planning with updating based on inspection information for fatigue-sensitive structures. Probab Eng Mech 2019;58:103003. https://doi.org/10.1016/j.probengmech.2019.103003.
[18] Raiffa H, Schlaifer R. Applied statistical decision theory. Harvard University Graduate School of Business Administration (Division of Research). Bailey & Swinfen; 1961.
[19] Russell SJ. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River, NJ: Prentice Hall; 2010.
[20] Sørensen JD, Rackwitz R, Faber MH, Thoft-Christensen P. Modelling in optimal inspection and repair. In: Proceedings of the 10th international conference on offshore mechanics and arctic engineering, vol. 2. United States: American Society of Mechanical Engineers; 1991. p. 281–8.
[21] Goyet J, Straub D, Faber MH. Risk-based inspection planning of offshore installations. Struct Eng Int 2002;12(3):200–8. https://doi.org/10.2749/101686602777965360.
[22] Rangel-Ramírez JG, Sørensen JD. Risk-based inspection planning optimisation of offshore wind turbines. Struct Infrastruct Eng 2012;8(5):473–81. https://doi.org/10.1080/15732479.2010.539064.
[23] Straub D. Stochastic modeling of deterioration processes through dynamic Bayesian networks. J Eng Mech 2009;135(10):1089–99. https://doi.org/10.1061/(ASCE)EM.1943-7889.0000024.
[24] Luque J, Straub D. Risk-based optimal inspection strategies for structural systems using dynamic Bayesian networks. Struct Saf 2019;76:68–80. https://doi.org/10.1016/j.strusafe.2018.08.002.
[25] Bismut E, Straub D. Optimal adaptive inspection and maintenance planning for deteriorating structural systems; 2021. arXiv preprint arXiv:2102.06016.
[26] Yang DY, Frangopol DM. Probabilistic optimization framework for inspection/repair planning of fatigue-critical details using dynamic Bayesian networks. Comput Struct 2018;198:40–50.


[27] Tien I, Der Kiureghian A. Reliability assessment of critical infrastructure using Bayesian networks. J Infrastruct Syst 2017;23(4):04017025.
[28] Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.
[29] Shani G, Pineau J, Kaplow R. A survey of point-based POMDP solvers. Autonom Agents Multi-Agent Syst 2013;27(1):1–51.
[30] Kurniawati H, Hsu D, Lee WS. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In: Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland; 2008. https://doi.org/10.15607/RSS.2008.IV.009.
[31] Smith T, Simmons R. Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. Proc Natl Conf Artif Intell 2006;2:1227–32.
[32] Memarzadeh M, Pozzi M, Zico Kolter J. Optimal planning and learning in uncertain environments for the management of wind farms. J Comput Civil Eng 2015;29(5):04014076. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000390.
[33] Memarzadeh M, Pozzi M, Kolter JZ. Hierarchical modeling of systems with similar components: A framework for adaptive monitoring and control. Reliab Eng Syst Saf 2016;153:159–69. https://doi.org/10.1016/j.ress.2016.04.016.
[34] Papakonstantinou KG, Shinozuka M. Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part I: Theory. Reliab Eng Syst Saf 2014;130:214–24. https://doi.org/10.1016/j.ress.2014.04.005.
[35] Corotis RB, Hugh Ellis J, Jiang M. Modeling of risk-based inspection, maintenance and life-cycle cost with partially observable Markov decision processes. Struct Infrastruct Eng 2005;1(1):75–84. https://doi.org/10.1080/15732470412331289305.
[36] Morato PG, Nielsen JS, Mai AQ, Rigo P. POMDP based maintenance optimization of offshore wind substructures including monitoring. In: 13th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP; 2019. p. 270–7. https://doi.org/10.22725/ICASP13.067.
[37] Faber MH. Risk-based inspection: The framework. Struct Eng Int 2002;12(3):186–94. https://doi.org/10.2749/101686602777965388.
[38] Straub D. Generic approaches to risk based inspection planning for steel structures. Ph.D. thesis, Swiss Federal Institute of Technology Zürich (ETH); 2004.
[39] Ditlevsen O, Madsen HO. Structural reliability methods. Department of Mechanical Engineering, Technical University of Denmark; 2007.
[40] Madsen HO, Krenk S, Lind NC. Methods of structural safety. Dover Publications Inc.; 2006.
[41] Straub D. Reliability updating with equality information. Probab Eng Mech 2011;26(2):254–8.
[42] Lotsberg I, Sigurdsson G, Fjeldstad A, Moan T. Probabilistic methods for planning of inspection for fatigue cracks in offshore structures. Marine Struct 2016;46:167–92. https://doi.org/10.1016/j.marstruc.2016.02.002.
[43] Luque J, Straub D. Reliability analysis and updating of deteriorating systems with dynamic Bayesian networks. Struct Saf 2016;62:34–46. https://doi.org/10.1016/j.strusafe.2016.03.004.
[44] Jensen FV. Introduction to Bayesian networks. Berlin, Heidelberg: Springer-Verlag; 1996.
[45] Murphy KP. Dynamic Bayesian networks: Representation, inference and learning. Ph.D. thesis, University of California, Berkeley; 2002.
[46] Nielsen JS, Sørensen JD. Computational framework for risk-based planning of inspections, maintenance and condition monitoring using discrete Bayesian networks. Struct Infrastruct Eng 2018;14(8):1082–94. https://doi.org/10.1080/15732479.2017.1387155.
[47] Papakonstantinou KG, Shinozuka M. Optimum inspection and maintenance policies for corroded structures using partially observable Markov decision processes and stochastic, physically based models. Probab Eng Mech 2014;37:93–108. https://doi.org/10.1016/j.probengmech.2014.06.002.
[48] Straub D. Stochastic modeling of deterioration processes through dynamic Bayesian networks. J Eng Mech 2009;135(10):1089–99. https://doi.org/10.1061/(ASCE)EM.1943-7889.0000024.
[49] Papakonstantinou KG, Shinozuka M. Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part II: POMDP implementation. Reliab Eng Syst Saf 2014;130:214–24. https://doi.org/10.1016/j.ress.2014.04.006.
[50] Kochenderfer MJ, Amato C, Chowdhary G, How JP, Reynolds HJD, Thornton JR, Torres-Carrasquillo PA, Ure KN, Vian J. Decision making under uncertainty: Theory and application. 1st ed. The MIT Press; 2015.
[51] Nielsen JS, Sørensen JD. Risk-based decision making for deterioration processes using POMDP. In: Haukaas T, editor. 12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12, Vancouver, Canada; 2015. p. 8. https://doi.org/10.14288/1.0076132.
[52] Robelin C-A, Madanat SM. History-dependent bridge deck maintenance and replacement optimization with Markov decision processes. J Infrastruct Syst 2007;13(3):195–201.
[53] Kim JW, Choi GB, Lee JM. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection. Comput Chem Eng 2018;112:239–52.
[54] Kaelbling LP, Littman ML, Cassandra AR. Planning and acting in partially observable stochastic domains. Artif Intell 1998;101(1–2):99–134. https://doi.org/10.1016/s0004-3702(98)00023-x.
[55] Spaan MTJ, Vlassis N. Perseus: Randomized point-based value iteration for POMDPs. J Artif Intell Res 2005;24:195–220. https://doi.org/10.1613/jair.1659.
[56] Papakonstantinou KG, Andriotis CP, Shinozuka M. POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability. Struct Infrastruct Eng 2018;14(7):869–82. https://doi.org/10.1080/15732479.2018.1439973.
[57] Pineau J, Gordon G, Thrun S. Point-based value iteration: An anytime algorithm for POMDPs. In: IJCAI International Joint Conference on Artificial Intelligence; 2003. p. 1025–30.
[58] Howard RA. Information value theory. IEEE Trans Syst Sci Cybern 1966;2(1):22–6.
[59] Walraven E, Spaan MTJ. Point-based value iteration for continuous POMDPs. J Artif Intell Res 2019;65:307–41.
[60] Andriotis CP, Papakonstantinou KG. Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliab Eng Syst Saf 2021;212:107551. https://doi.org/10.1016/j.ress.2021.107551.
[61] Andriotis CP, Papakonstantinou KG. Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliab Eng Syst Saf 2019;191. https://doi.org/10.1016/j.ress.2019.04.036.

