
Hierarchical Models for Loss Reserving

CAS Annual Meeting
Washington, DC
November 2010

Jim Guszcza, FCAS, MAAA
Deloitte Consulting
Agenda

Motivation: Five Essential Features of Loss Reserving
Background: Hierarchical Modeling Theory
Case Study: Nonlinear Empirical Bayes Hierarchical Model
Next Steps: A Few Words on the Hierarchical Bayesian Approach

Copyright © 2008 Deloitte Development LLC. All rights reserved.


Background

Models vs Methods
Need for Variability Estimates
Loss Reserving and its Discontents

• Much loss reserving practice is “pre-theoretical” in nature.


• Techniques like chain ladder, BF, and Cape Cod aren’t performed in a statistical
modeling framework.

• Traditional methods aren’t necessarily optimal from a statistical POV.


• Potential of over-fitting small datasets.
• Difficult to assess goodness-of-fit, compare nested models, etc.
• Often no concept of out-of-sample validation or diagnostic plots.

• Related point: traditional methods produce point estimates only.
 Reserve variability estimates in practice are often ad hoc.

• Stochastic reserving: build statistical models of loss development.


• Attempt to place loss reserving practice on a sound scientific footing.
• Field is developing rapidly.
• Today: explore non-linear hierarchical models (aka “nonlinear mixed effects
models”) as natural, parsimonious models of the loss development process.
• Initially motivated by Dave Clark’s paper [2003] as well as nonlinear mixed effects
model [NLME] theory.



What Do You See?

• Here is an actual Schedule P cumulative loss triangle.

• We would like to do stochastic reserving the “right” way.

• What considerations come to mind?

Cumulative Losses in 1000's


AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122



Five Essential Features of Loss Reserving
Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122

• Repeated measures
• The dataset is inherently longitudinal in nature.

• A “Bundle” of time series


• A loss triangle is a collection of time series that are “related” to one another…
• … but no guarantee that the same development pattern is appropriate to each one

• Unbalanced data
• We are doing forecasting  The time series are necessarily of different lengths.

• Non-linear
• Each year’s loss development pattern is inherently non-linear
• Ultimate loss (ratio) is an asymptote

• Incomplete information
• Few loss triangles contain all of the information needed to make forecasts
• Most reserving exercises must incorporate judgment and/or background information
 Loss reserving is inherently Bayesian



Towards a More Realistic Stochastic Reserving Framework

• How many stochastic loss reserving techniques reflect all of these


considerations?
1. Repeated Measures (Isn’t loss reserving a type of longitudinal data analysis?)
2. Multiple Time Series
3. Unbalanced
4. Non-linear (Are GLMs really appropriate?)
5. Incomplete information (“Bayes or Bust”!)

1-3  We need to build hierarchical models


4  Our hierarchical models should be non-linear (growth curves)
5  Our non-linear hierarchical models should be Bayesian

(This presentation will cover 1-4… Wayne will cover 5.)

Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122
Components of Our Approach

• Growth curves to model the loss development process (Clark 2003)


• Parsimony; obviates need for tail factors

• Loss reserving treated as longitudinal data analysis (Guszcza 2008)


• Parsimony; similar approach to non-linear mixed effects models used in biological
and social sciences

• Further using the hierarchical modeling framework to simultaneously


model multiple loss triangles (Zhang-Dukic-Guszcza 2010)
• “Borrow strength” from other loss reserving triangles
• Similar in spirit to credibility theory

• Building a fully Bayesian model by assigning prior probability


distributions to all hyperparameters (Zhang-Dukic-Guszcza 2010)
• Provides formal mechanism for incorporating background knowledge and expert
opinion with data-driven indications.
• Results in full predictive distribution of all quantities of interest
• Conceptual advantages: Bayesian paradigm treats data as fixed and parameters as randomly varying



Hierarchical Modeling Theory

Hierarchical Data Structures


Hierarchical Models
Motivating Samples
What is Hierarchical Modeling?

• Hierarchical modeling is used when one’s data is grouped in


some important way.
• Claim experience by state or territory
• Workers Comp claim experience by class code
• Income by profession
• Claim severity by injury type
• Churn rate by agency
• Multiple years of loss experience by policyholder.
• Multiple observations of a cohort of claims over time

• Often grouped data is modeled either by:


• Building separate models by group
• Pooling the data and introducing dummy variables to reflect the groups

• Hierarchical modeling offers a “middle way”.


• Parameters reflecting group membership enter one’s model through appropriately specified probability sub-models.



What’s in a Name?

• Hierarchical models go by many different names


• Mixed effects models
• Random effects models
• Multilevel models
• Longitudinal models
• Panel data models

• We prefer the “hierarchical model” terminology because it


evokes the way models-within-models are used to reflect levels-within-levels of one’s data.

• An important special case of hierarchical models involves


multiple observations through time of each unit.
• Here group membership is the repeated observations belonging to each individual.
• Time is the covariate.



Common Hierarchical Models

• Notation:
• Data points (Xi, Yi), i = 1…N
• j[i]: data point i belongs to group j.

• Classical Linear Model Yi = α + βXi + εi


• Equivalently: Yi ~ N(α + βXi, σ2)
• Same α and β for every data point

• Random Intercept Model Yi = αj[i] + βXi + εi


• Where αj ~ N(μα, σ²α) & εi ~ N(0, σ²)
• Same β for every data point; but α varies by group

• Random Intercept and Slope Model Yi = αj[i] + βj[i]Xi + εi


• Where (αj, βj) ~ N(Μ, ) & εi ~ N(0, σ2)
• Both α and β vary by group

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]
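To make the pooling contrast concrete, here is a minimal NumPy sketch (all parameter values are illustrative, not from the presentation): it simulates data from the random intercept and slope model and fits the two extreme options. With a balanced design, the complete-pooling fit is exactly the average of the no-pooling fits; the hierarchical estimate lands between these extremes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the random intercept and slope model:
# Yi ~ N(alpha_j[i] + beta_j[i] * Xi, sigma^2), with (alpha_j, beta_j)
# drawn from a common bivariate normal "population of groups".
J, n_per = 8, 4                        # 8 groups, 4 observations each
mu = np.array([2200.0, 50.0])          # (mu_alpha, mu_beta), illustrative
Sigma = np.array([[150.0**2, 0.0],
                  [0.0,       15.0**2]])
ab = rng.multivariate_normal(mu, Sigma, size=J)   # group-level (alpha_j, beta_j)

x = np.tile(np.arange(n_per, dtype=float), J)     # covariate (e.g. year index)
g = np.repeat(np.arange(J), n_per)                # group labels j[i]
y = ab[g, 0] + ab[g, 1] * x + rng.normal(0.0, 30.0, J * n_per)

# Complete pooling: one (alpha, beta) for every data point.
X = np.column_stack([np.ones_like(x), x])
pooled, *_ = np.linalg.lstsq(X, y, rcond=None)

# No pooling: a separate (alpha_j, beta_j) per group.
per_group = np.array([np.linalg.lstsq(X[g == j], y[g == j], rcond=None)[0]
                      for j in range(J)])

print("pooled fit:", pooled)
print("spread of group intercepts:", per_group[:, 0].std())
```

A hierarchical fit would shrink each row of `per_group` toward `pooled`, with the amount of shrinkage estimated from the data rather than chosen by judgment.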
Example: PIF Growth by Region

• Simple example: change in PIF by region from 2007-10
• 32 data points: 4 years × 8 regions
• But we could as easily have 80 or 800 regions; our model would not change
• We view the dataset as a bundle of very short time series

[Figure: PIF Growth by Region – lattice of panels for region1–region8, PIF vs year, 2007–2010]
Classical Linear Model

• Option 1: the classical linear model
• Sweep region under the carpet
• Yi = α + βXi + εi
– Or: Yi ~ N(α + βXi, σ²)
– Same α and β for every data point

• This obviously doesn’t cut it

[Figure: the single pooled regression line plotted in every region’s panel]
Randomly Varying Intercepts

• Option 2: the random intercept model
• Yi = αj[i] + βXi + εi
– Or: Yi ~ N(αj[i] + βXi, σ²) with αj ~ N(μα, σ²α)

• This model has 9 parameters: {α1, α2, …, α8, β}
• And it contains 4 hyperparameters: {μα, β, σ, σα}

• This is a major improvement

[Figure: fitted lines with a separate intercept per region panel]
Randomly Varying Intercepts and Slopes

• Option 3: the random slope and intercept model
• Yi = αj[i] + βj[i]Xi + εi

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]

• This model has 16 parameters: {α1, α2, …, α8, β1, β2, …, β8}
• And it contains 6 hyperparameters: {μα, μβ, σ, σα, σβ, σαβ}

[Figure: fitted lines with a separate intercept and slope per region panel]
Compromise Between Complete Pooling & No Pooling

Complete Pooling: PIF = α + βt + ε
• Ignore group structure altogether

No Pooling: {PIF = αk + βkt + εk}, k = 1, 2, …, 8
• Estimate one model for each group

Compromise: the Hierarchical Model
• Estimates parameters using a compromise between complete pooling and no pooling.

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]
A Credible Approach

• Recall the random intercept model:

Yi ~ N(αj[i] + βXi, σ²),   αj ~ N(μα, σ²α)

• This model can contain a large number of parameters: {α1, α2, …, αJ, β}.
• And it contains 4 hyperparameters: {μα, β, σ, σα}.

• Here is how the hyperparameters relate to the parameters:

α̂j = Zj·(ȳj − β·x̄j) + (1 − Zj)·μ̂α   where   Zj = nj / (nj + σ²/σ²α)

• Does this formula look familiar?
• Credibility theory is a special case of hierarchical models!
The Middle Way

• This makes precise the sense in which the random intercept model is a compromise between the pooled-data model (option 1) and the separate models for each region (option 2).

α̂j = Zj·(ȳj − β·t̄j) + (1 − Zj)·μ̂α   where   Zj = nj / (nj + σ²/σ²α)

• As σα → 0, the random intercept model → complete pooling
• As σα → ∞, the random intercept model → separate models by group

• In principle it’s always appropriate to use hierarchical models
• Rather than a judgment call, the data tells us the degree to which the groups should be fit using separate models or a single common model
• It is no longer an all-or-nothing decision
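The two limits can be checked numerically. A minimal sketch of the credibility weight formula (plain Python; the group sizes and variance values are made up for illustration):

```python
def credibility_weight(n_j, sigma2, sigma2_alpha):
    """Z_j = n_j / (n_j + sigma^2 / sigma_alpha^2) from the random intercept model."""
    return n_j / (n_j + sigma2 / sigma2_alpha)

def alpha_hat(ybar_j, xbar_j, beta, mu_alpha, Z_j):
    """Shrinkage estimate: credibility blend of the group's own data and the grand mean."""
    return Z_j * (ybar_j - beta * xbar_j) + (1.0 - Z_j) * mu_alpha

n_j = 4                       # observations in group j
Z_small = credibility_weight(n_j, sigma2=1.0, sigma2_alpha=1e-8)  # groups nearly identical
Z_large = credibility_weight(n_j, sigma2=1.0, sigma2_alpha=1e8)   # groups unrelated

# sigma_alpha -> 0: Z -> 0, estimate collapses to complete pooling (mu_alpha);
# sigma_alpha -> inf: Z -> 1, estimate -> separate model per group.
print(Z_small, Z_large)
print(alpha_hat(ybar_j=2400.0, xbar_j=1.5, beta=50.0, mu_alpha=2200.0, Z_j=Z_large))
```

This is exactly the Bühlmann-style credibility weight: the data decide where each group sits between the all-or-nothing extremes.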



Hierarchical Growth Curve
Loss Reserving Model
Hierarchical Modeling for Loss Reserving

• Here is our Schedule P loss triangle:


Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120 CL Ult CL LR CL res
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036 2,036 0.78 0
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987 2,017 0.75 29
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919 1,986 0.77 67
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446 1,535 0.59 89
1992 2,077 257 569 754 892 958 1,007 1,110 0.53 103
1993 1,703 193 423 589 661 713 828 0.49 115
1994 1,438 142 361 463 533 675 0.47 142
1995 1,093 160 312 408 601 0.55 193
1996 1,012 131 352 702 0.69 350
1997 976 122 576 0.59 454

chain link 2.365 1.354 1.164 1.090 1.054 1.038 1.026 1.020 1.015 1.000 (total CL Ult: 12,067; total CL res: 1,543)
chain ldf 4.720 1.996 1.473 1.266 1.162 1.102 1.062 1.035 1.015 1.000
growth curve 21.2% 50.1% 67.9% 79.0% 86.1% 90.7% 94.2% 96.6% 98.5% 100.0%

• Let’s model this as a longitudinal dataset.
• Grouping dimension: Accident Year (AY)

• We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.
Growth Curves

• We want to build a model that reflects the non-linear nature of loss development.
• GLMs show up a lot in the stochastic loss reserving literature.
• But… are GLMs natural models for loss triangles?

• Growth curves: 2-parameter curves G(x | ω, θ) = 1 − exp(−(x/θ)^ω)
• θ = scale; ω = shape
• See Clark [2003]

• Heuristic idea:
• We fit these curves to the LDFs
• Add random effects
• Allows ultimate loss (ratio) and/or θ and/or ω to vary randomly by year.

[Figure: Weibull growth curve vs chain ladder 1/LDF – cumulative percent of ultimate vs development months]
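As a quick check of this heuristic, one can evaluate the Weibull form at the fitted values reported later in the case study (ω ≈ 0.96, θ ≈ 26.55, with development age measured at period midpoints 6, 18, …, 114 months) and compare it with the chain ladder growth pattern from the triangle:

```python
import numpy as np

def weibull_growth(x, omega, theta):
    """G(x | omega, theta) = 1 - exp(-(x/theta)**omega): cumulative percent of ultimate."""
    return 1.0 - np.exp(-(x / theta) ** omega)

# Fitted values from the case study; ages are period midpoints in months.
omega, theta = 0.96, 26.55
ages = np.arange(6, 115, 12).astype(float)        # 6, 18, ..., 114

# Chain ladder growth pattern (1 / LDF) from the triangle, for comparison.
cl_growth = np.array([0.212, 0.501, 0.679, 0.790, 0.861,
                      0.907, 0.942, 0.966, 0.985, 1.000])

g = weibull_growth(ages, omega, theta)
print(np.round(g, 3))                 # smooth curve tracking the CL pattern
print(np.abs(g - cl_growth).max())    # largest gap is at the oldest age
```

The largest gap is at 114 months, where the chain ladder pattern is forced to 100% while the growth curve still implies a tail beyond the triangle.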
Baseline Model: Heuristics

• Basic intuition is familiar: (CL_{AY,t}) × (LDF) = Ult loss
 CL_{AY,t} = (Ult loss_AY) × (1 / LDF_t)
 CL_{AY,t} = (Ult loss_AY) × G_{ω,θ}(t) + error

[Figure: Weibull and loglogistic growth curves – cumulative percent of ultimate vs age in months, 12–180]

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• The “growth curve” part comes in by using G(t) instead of LDFs.
• Think of LDFs as a rough piecewise-linear approximation to a G(t)

• The “hierarchical” part comes in because we can let LR_AY, θ, and/or ω vary by AY (using sub-models).
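Under this heuristic, the growth-curve analogue of a chain ladder ultimate is simply the latest diagonal divided by G evaluated at its age. A sketch, using the ω and θ values reported later in the deck and no random effects or AR(1) errors yet:

```python
import math

def weibull_growth(x, omega=0.96, theta=26.55):
    """G(x) = 1 - exp(-(x/theta)**omega), illustrative fitted shape and scale."""
    return 1.0 - math.exp(-(x / theta) ** omega)

# Latest diagonal of the triangle: AY -> (age in months, cumulative loss).
latest = {1988: (114, 2036), 1992: (66, 1007), 1997: (6, 122)}

# CL_{AY,t} = Ult_AY * G(t)  =>  Ult_AY = CL_{AY,t} / G(t)
ult = {ay: cl / weibull_growth(age) for ay, (age, cl) in latest.items()}
print({ay: round(u) for ay, u in ult.items()})
```

These raw “LDF-style” ultimates (about 2072, 1108, and 572) differ from the fitted model’s (2045, 1103, 596), partly because the random LR effects pull each year’s estimate toward the portfolio-level loss ratio.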
Other “Random Effects”

• Our model so far:

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• What if we want to include other random effects in the model? It’s easily done:

CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ_AY)^ω)] + ε_{AY,dev}
(ULT_AY, θ_AY) ~ N((μ_ULT, μ_θ), Σ),   Σ = [σ²_ULT  σ_{ULT,θ}; σ_{ULT,θ}  σ²_θ]
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• Here we add a “random scale” effect to let θ vary by AY.
• This is analogous to letting slope vary in a linear model
• It is unlikely that a “random warp” (ω) effect will be significant.



Baseline Model Performance

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• Random LR effects allow a “custom fit” growth curve for each AY
• Yet the model is very parsimonious
• The model contains only 6 hyperparameters, but fits the loss triangle very well
• Parsimony is achieved because the model is well suited to the data (not ad hoc)

[Figure: Weibull growth curve model with AR(1) errors and randomly varying ultimate loss ratio by AY – fitted curves overlaid on cumulative losses, one panel per AY 1988–1997]
Baseline Model Performance

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• We have estimated the parameters: {μ_LR, ω, θ, σ_LR, φ, σ_a}
• Random effects are added to the ultimate LR parameter (analogous to random intercepts)
• Should we also include random scale (θ) effects? (analogous to random slopes)

[Figure: same fitted growth curves by AY as on the previous slide]
Residual Diagnostics

• Advantage of stochastic reserving: enables us to graphically analyze residuals.
• Residual diagnostics suggest a reasonable fit.

[Figure: Weibull growth curve model with AR(1) errors – residual histogram, normal QQ plot, actual vs predicted, and residuals vs predicted, development time, and accident year]
Model Results
Chain Ladder Analysis
AY premium 6 18 30 42 54 66 78 90 102 114 CL Ult CL res
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036 2,036 0
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987 2,017 29
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919 1,986 67
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446 1,535 89
1992 2,077 257 569 754 892 958 1,007 1,110 103
1993 1,703 193 423 589 661 713 828 115
1994 1,438 142 361 463 533 675 142
1995 1,093 160 312 408 601 193
1996 1,012 131 352 702 350
1997 976 122 576 454

chain link 2.365 1.354 1.164 1.090 1.054 1.038 1.026 1.020 1.015 1.000 (total CL Ult: 12,067; total CL res: 1,543)
chain ldf 4.720 1.996 1.473 1.266 1.162 1.102 1.062 1.035 1.015 1.000
growth curve 21.2% 50.1% 67.9% 79.0% 86.1% 90.7% 94.2% 96.6% 98.5% 100.0%
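The chain link and LDF rows can be reproduced directly from the triangle. A sketch using volume-weighted link ratios, with the tail factor assumed to be 1.0 as in the table (small differences from the displayed 2.365 are rounding):

```python
import numpy as np

# Schedule P cumulative triangle (rows = AY 1988..1997; NaN = future cells).
tri = np.full((10, 10), np.nan)
data = [
    [404, 986, 1342, 1582, 1736, 1833, 1907, 1967, 2006, 2036],
    [387, 964, 1336, 1580, 1726, 1823, 1903, 1949, 1987],
    [421, 1037, 1401, 1604, 1729, 1821, 1878, 1919],
    [338, 753, 1029, 1195, 1326, 1395, 1446],
    [257, 569, 754, 892, 958, 1007],
    [193, 423, 589, 661, 713],
    [142, 361, 463, 533],
    [160, 312, 408],
    [131, 352],
    [122],
]
for i, row in enumerate(data):
    tri[i, :len(row)] = row

# Volume-weighted chain links: sum(col k+1) / sum(col k) over AYs with both cells.
links = np.ones(10)                       # last entry is the assumed tail = 1.0
for k in range(9):
    both = ~np.isnan(tri[:, k]) & ~np.isnan(tri[:, k + 1])
    links[k] = tri[both, k + 1].sum() / tri[both, k].sum()

# Cumulative LDFs (development to ultimate) and implied growth pattern 1/LDF.
ldf = np.cumprod(links[::-1])[::-1]
growth = 1.0 / ldf
print(np.round(links, 3))     # ~ [2.366, 1.355, 1.164, 1.090, ...]
print(np.round(growth, 3))    # ~ [0.212, 0.501, ...]
```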

Parameters and Estimated Reserves - Weibull Model

AY prem dev LR omega theta growth reported eval120 eval240 ULT reserves
1988 2609 114 0.78 0.96 26.55 98.3% 2,036 2,017 2,045 2,045 9
1989 2694 102 0.75 0.96 26.55 97.4% 1,987 2,004 2,031 2,032 44
1990 2594 90 0.78 0.96 26.55 96.1% 1,919 2,001 2,028 2,029 110
1991 2609 78 0.58 0.96 26.55 94.1% 1,446 1,497 1,518 1,518 72
1992 2077 66 0.53 0.96 26.55 91.0% 1,007 1,088 1,103 1,103 95
1993 1703 54 0.49 0.96 26.55 86.2% 713 818 829 829 117
1994 1438 42 0.48 0.96 26.55 78.9% 533 688 697 697 164
1995 1093 30 0.53 0.96 26.55 67.5% 408 567 575 575 167
1996 1012 18 0.73 0.96 26.55 49.7% 352 725 735 735 384
1997 975.9 6 0.61 0.96 26.55 21.2% 122 587 595 596 474
total 11,991 12,159 1,636

These are the 12 hierarchical model parameters: 10 loss ratios plus ω and θ.
The overall Weibull reserve estimate is higher than that of the chain ladder because of “tail” development beyond 120 months.
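The ULT and reserve columns follow mechanically from the parameters: since G(∞) = 1, ULT_AY = LR_AY · prem_AY. A sketch reproducing a few rows (the two-decimal loss ratios shown in the table introduce small rounding differences):

```python
import math

def weibull_growth(x, omega=0.96, theta=26.55):
    """G(x) = 1 - exp(-(x/theta)**omega); fitted shape and scale from the table."""
    return 1.0 - math.exp(-(x / theta) ** omega)

# AY -> (premium, age of latest diagonal, fitted LR, reported losses), from the table.
rows = {1992: (2077.0, 66, 0.53, 1007),
        1997: (975.9, 6, 0.61, 122)}

for ay, (prem, dev, lr, reported) in rows.items():
    ult = lr * prem                  # G(inf) = 1, so ultimate = LR * premium
    reserve = ult - reported         # indicated reserve = ultimate - reported to date
    print(ay, round(ult), round(reserve), f"{weibull_growth(dev):.1%}")
```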
Making Predictions is Difficult (Especially About the Future)

• Most loss reserve variability is due to “model risk”
• In this context, the most serious model risk is the choice of growth curves
• Both the Weibull and Loglogistic models fit the available data very well
• But they extrapolate very differently

[Figure: comparison of Weibull vs Loglogistic random LR model extrapolations – fitted curves by AY 1988–1997 extended to 222 months of development]
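The extrapolation risk can be made concrete: calibrate a Weibull curve and a loglogistic curve (the other form used in Clark [2003], G(x) = x^ω / (x^ω + θ^ω)) through the same two points on the development pattern, then compare them beyond the triangle. The two-point calibration below is an illustrative device, not the presentation's actual fitting procedure:

```python
import math

def weibull(x, omega, theta):
    return 1.0 - math.exp(-(x / theta) ** omega)

def loglogistic(x, omega, theta):
    return x ** omega / (x ** omega + theta ** omega)

def calibrate(inv, x1, g1, x2, g2):
    """Solve for (omega, theta) so the curve passes through (x1, g1) and (x2, g2).
    `inv` maps a growth value g to the quantity (x/theta)**omega for that family."""
    r1, r2 = inv(g1), inv(g2)
    omega = math.log(r2 / r1) / math.log(x2 / x1)
    theta = x1 / r1 ** (1.0 / omega)
    return omega, theta

# (x/theta)**omega equals -ln(1-g) for the Weibull and g/(1-g) for the loglogistic.
pts = (6, 0.212, 114, 0.983)     # two points from the fitted development pattern
w_om, w_th = calibrate(lambda g: -math.log(1.0 - g), *pts)
l_om, l_th = calibrate(lambda g: g / (1.0 - g), *pts)

# Both curves agree inside the triangle but extrapolate very differently:
print("Weibull at 240 months:     %.4f" % weibull(240, w_om, w_th))
print("Loglogistic at 240 months: %.4f" % loglogistic(240, l_om, l_th))
```

The Weibull tail decays exponentially while the loglogistic tail decays as a power law, so the loglogistic leaves noticeably more development after 240 months despite matching the same in-sample points.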
Bayesian Motivation

“Given any value (estimate of future payments) and our current state of
knowledge, what is the probability that the final payments will be no larger than
the given value?”
-- Casualty Actuarial Society’s Working Party on Quantifying Variability in
Reserve Estimates, 2004

• This can be read as a request for a Bayesian analysis


• Bayesians (unlike frequentists) are willing to make probability statements about
unknown parameters
• Ultimate losses are “single cases” – difficult to conceive as random draws from a
“sampling distribution in the sky”.
• Frequentist probability involves repeated trials of setups involving physical randomization.
• In contrast it is meaningful to apply Bayesian probabilities to “single case events”
• The Bayesian analysis yields an entire posterior probability distribution – not
merely moment estimates

• Bayesian statistics is the ideal framework for loss reserving


• Remaining task: put prior probability distributions on model hyperparameters
• Hierarchical Bayes framework also allows us to bring in collateral information in the
form of other loss triangles
• Other companies, other regions, …
MCMC from A to B

• Before 1990, Bayesian statistics was a lot more talk than action.
• Unless you use conjugate priors, calculating posterior probability distributions is
cumbersome at best, intractable at worst.

• Markov Chain Monte Carlo [MCMC]: a simulation technique used to


solve high-dimensional integration problems.
• Developed by physicists at Los Alamos
• Introduced as a Bayesian computational technique by Gelfand and Smith in 1990
• Gelfand and Smith sparked a renaissance in the practice of Bayesian statistics
• As a result of the Gelfand and Smith MCMC approach, Bayesian statistics is now common in fields like:
– Epidemiology, disease mapping
– Marketing science (hierarchical approach: customers are exchangeable units)
– Genomics

• Rather than explicitly calculating high-dimensional, posterior densities,


we estimate them by drawing repeated samples from them.
• Construct a Markov chain whose equilibrium distribution is the posterior that we’re trying to estimate.
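A toy illustration of the idea, separate from the reserving model itself: a random-walk Metropolis sampler for the posterior of a normal mean, where the conjugate answer is known in closed form and serves as a check (the data values and prior are made up for illustration):

```python
import math
import random

random.seed(1)

# Data: n observations with known sigma = 1; prior mu ~ N(0, 10^2).
y = [0.8, 1.3, 0.2, 1.1, 0.9, 1.6, 0.4, 1.2, 0.7, 1.0]
n, sigma2, tau2 = len(y), 1.0, 100.0
ybar = sum(y) / n

def log_post(mu):
    """Log posterior up to a constant: normal likelihood times normal prior."""
    return -n * (ybar - mu) ** 2 / (2 * sigma2) - mu ** 2 / (2 * tau2)

# Random-walk Metropolis: propose mu' = mu + N(0, 0.5^2) and accept with
# probability min(1, post(mu')/post(mu)); the chain's equilibrium
# distribution is the posterior we want.
mu, draws = 0.0, []
for _ in range(20000):
    prop = mu + random.gauss(0.0, 0.5)
    if math.log(random.random()) < log_post(prop) - log_post(mu):
        mu = prop
    draws.append(mu)

post_mean = (n * ybar / sigma2) / (n / sigma2 + 1.0 / tau2)   # conjugate result
mcmc_mean = sum(draws[2000:]) / len(draws[2000:])              # drop burn-in
print(round(post_mean, 3), round(mcmc_mean, 3))
```

The sampler never evaluates the normalizing constant; it only needs the posterior up to proportionality, which is what makes the approach workable for high-dimensional hierarchical models.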


References

Clark, David R. (2003). “LDF Curve Fitting and Stochastic Loss Reserving: A Maximum Likelihood Approach,” CAS Forum.
Frees, Edward (2006). Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. New York: Cambridge University Press.
Gelman, Andrew and Hill, Jennifer (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.
Guszcza, James (2008). “Hierarchical Growth Curve Models for Loss Reserving,” CAS Forum.
Guszcza, James and Thomas Herzog (2010). “Enhanced Credibility: Actuarial Science and the Renaissance of Bayesian Thinking,” Contingencies.
Pinheiro, Jose and Douglas Bates (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.
Zhang, Yanwei, Vanja Dukic, and James Guszcza (2010). “A Bayesian Nonlinear Model for Forecasting Insurance Payments,” working paper.
