
Hierarchical Models for Loss Reserving

CAS Annual Meeting
Washington, DC
November 2010

Jim Guszcza, FCAS, MAAA
Deloitte Consulting
Agenda

Motivation: Five Essential Features of Loss Reserving
Background: Hierarchical Modeling Theory
Case Study: Nonlinear Empirical Bayes Hierarchical Model
Next Steps: A Few Words on the Hierarchical Bayesian Approach

Copyright © 2008 Deloitte Development LLC. All rights reserved.


Background

Models vs Methods
Need for Variability Estimates
Loss Reserving and its Discontents

• Much loss reserving practice is “pre-theoretical” in nature.


• Techniques like chain ladder, BF, and Cape Cod aren’t performed in a statistical
modeling framework.

• Traditional methods aren’t necessarily optimal from a statistical POV.


• Potential of over-fitting small datasets.
• Difficult to assess goodness-of-fit, compare nested models, etc.
• Often no concept of out-of-sample validation or diagnostic plots.

• Related point: traditional methods produce point estimates only.
 Reserve variability estimates in practice are often ad hoc.

• Stochastic reserving: build statistical models of loss development.


• Attempt to place loss reserving practice on a sound scientific footing.
• Field is developing rapidly.
• Today: explore non-linear hierarchical models (aka “nonlinear mixed effects
models”) as natural, parsimonious models of the loss development process.
• Initially motivated by Dave Clark’s paper [2003] as well as nonlinear mixed effects
model [NLME] theory.



What Do You See?

• Here is an actual Schedule P cumulative loss triangle.

• We would like to do stochastic reserving the “right” way.

• What considerations come to mind?

Cumulative Losses in 1000's


AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122



Five Essential Features of Loss Reserving
Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122

• Repeated measures
• The dataset is inherently longitudinal in nature.

• A “Bundle” of time series


• A loss triangle is a collection of time series that are “related” to one another…
• … but no guarantee that the same development pattern is appropriate to each one

• Unbalanced data
• We are doing forecasting  The time series are necessarily of different lengths.

• Non-linear
• Each year’s loss development pattern is inherently non-linear
• Ultimate loss (ratio) is an asymptote

• Incomplete information
• Few loss triangles contain all of the information needed to make forecasts
• Most reserving exercises must incorporate judgment and/or background information
 Loss reserving is inherently Bayesian



Towards a More Realistic Stochastic Reserving Framework

• How many stochastic loss reserving techniques reflect all of these


considerations?
1. Repeated Measures (Isn’t loss reserving a type of longitudinal data analysis?)
2. Multiple Time Series
3. Unbalanced
4. Non-linear (Are GLMs really appropriate?)
5. Incomplete information (“Bayes or Bust”!)

1-3  We need to build hierarchical models


4  Our hierarchical models should be non-linear (growth curves)
5  Our non-linear hierarchical models should be Bayesian

(This presentation will cover 1-4… Wayne will cover 5.)

Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446
1992 2,077 257 569 754 892 958 1,007
1993 1,703 193 423 589 661 713
1994 1,438 142 361 463 533
1995 1,093 160 312 408
1996 1,012 131 352
1997 976 122
Components of Our Approach

• Growth curves to model the loss development process (Clark 2003)


• Parsimony; obviates need for tail factors

• Loss reserving treated as longitudinal data analysis (Guszcza 2008)


• Parsimony; similar approach to non-linear mixed effects models used in biological
and social sciences

• Further using the hierarchical modeling framework to simultaneously


model multiple loss triangles (Zhang-Dukic-Guszcza 2010)
• “Borrow strength” from other loss reserving triangles
• Similar in spirit to credibility theory

• Building a fully Bayesian model by assigning prior probability


distributions to all hyperparameters (Zhang-Dukic-Guszcza 2010)
• Provides formal mechanism for incorporating background knowledge and expert
opinion with data-driven indications.
• Results in full predictive distribution of all quantities of interest
• Conceptual advantages: Bayesian paradigm treats data as fixed and parameters as randomly varying



Hierarchical Modeling Theory

Hierarchical Data Structures


Hierarchical Models
Motivating Samples
What is Hierarchical Modeling?

• Hierarchical modeling is used when one’s data is grouped in


some important way.
• Claim experience by state or territory
• Workers Comp claim experience by class code
• Income by profession
• Claim severity by injury type
• Churn rate by agency
• Multiple years of loss experience by policyholder.
• Multiple observations of a cohort of claims over time

• Often grouped data is modeled either by:


• Building separate models by group
• Pooling the data and introducing dummy variables to reflect the groups

• Hierarchical modeling offers a “middle way”.


• Parameters reflecting group membership enter one’s model through appropriately specified probability sub-models.



What’s in a Name?

• Hierarchical models go by many different names


• Mixed effects models
• Random effects models
• Multilevel models
• Longitudinal models
• Panel data models

• We prefer the “hierarchical model” terminology because it


evokes the way models-within-models are used to reflect levels-within-levels of one’s data.

• An important special case of hierarchical models involves


multiple observations through time of each unit.
• Here group membership is the repeated observations belonging to each individual.
• Time is the covariate.



Common Hierarchical Models

• Notation:
• Data points (Xi, Yi), i = 1…N
• j[i]: data point i belongs to group j.

• Classical Linear Model Yi = α + βXi + εi


• Equivalently: Yi ~ N(α + βXi, σ2)
• Same α and β for every data point

• Random Intercept Model Yi = αj[i] + βXi + εi


• Where αj ~ N(μα, σ²α) & εi ~ N(0, σ²)
• Same β for every data point; but α varies by group

• Random Intercept and Slope Model Yi = αj[i] + βj[i]Xi + εi


• Where (αj, βj) ~ N(Μ, ) & εi ~ N(0, σ2)
• Both α and β vary by group

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]
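To make the pooling contrast concrete, here is a minimal NumPy sketch (all parameter values are illustrative, not from the presentation): it simulates data from the random intercept and slope model and fits the two extreme options. With a balanced design, the complete-pooling fit is exactly the average of the no-pooling fits; the hierarchical estimate lands between these extremes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the random intercept and slope model:
# Yi ~ N(alpha_j[i] + beta_j[i] * Xi, sigma^2), with (alpha_j, beta_j)
# drawn from a common bivariate normal "population of groups".
J, n_per = 8, 4                        # 8 groups, 4 observations each
mu = np.array([2200.0, 50.0])          # (mu_alpha, mu_beta), illustrative
Sigma = np.array([[150.0**2, 0.0],
                  [0.0,       15.0**2]])
ab = rng.multivariate_normal(mu, Sigma, size=J)   # group-level (alpha_j, beta_j)

x = np.tile(np.arange(n_per, dtype=float), J)     # covariate (e.g. year index)
g = np.repeat(np.arange(J), n_per)                # group labels j[i]
y = ab[g, 0] + ab[g, 1] * x + rng.normal(0.0, 30.0, J * n_per)

# Complete pooling: one (alpha, beta) for every data point.
X = np.column_stack([np.ones_like(x), x])
pooled, *_ = np.linalg.lstsq(X, y, rcond=None)

# No pooling: a separate (alpha_j, beta_j) per group.
per_group = np.array([np.linalg.lstsq(X[g == j], y[g == j], rcond=None)[0]
                      for j in range(J)])

print("pooled fit:", pooled)
print("spread of group intercepts:", per_group[:, 0].std())
```

A hierarchical fit would shrink each row of `per_group` toward `pooled`, with the amount of shrinkage estimated from the data rather than chosen by judgment.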
Example: PIF Growth by Region

• Simple example: change in PIF by region from 2007-10
• 32 data points: 4 years × 8 regions
• But we could as easily have 80 or 800 regions; our model would not change
• We view the dataset as a bundle of very short time series

[Figure: PIF Growth by Region – lattice of panels for region1–region8, PIF vs year, 2007–2010]
Classical Linear Model

• Option 1: the classical linear model
• Sweep region under the carpet
• Yi = α + βXi + εi
– Or: Yi ~ N(α + βXi, σ²)
– Same α and β for every data point

• This obviously doesn’t cut it

[Figure: the single pooled regression line plotted in every region’s panel]
Randomly Varying Intercepts

• Option 2: the random intercept model
• Yi = αj[i] + βXi + εi
– Or: Yi ~ N(αj[i] + βXi, σ²) with αj ~ N(μα, σ²α)

• This model has 9 parameters: {α1, α2, …, α8, β}
• And it contains 4 hyperparameters: {μα, β, σ, σα}

• This is a major improvement

[Figure: fitted lines with a separate intercept per region panel]
Randomly Varying Intercepts and Slopes

• Option 3: the random slope and intercept model
• Yi = αj[i] + βj[i]Xi + εi

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]

• This model has 16 parameters: {α1, α2, …, α8, β1, β2, …, β8}
• And it contains 6 hyperparameters: {μα, μβ, σ, σα, σβ, σαβ}

[Figure: fitted lines with a separate intercept and slope per region panel]
Compromise Between Complete Pooling & No Pooling

Complete Pooling: PIF = α + βt + ε
• Ignore group structure altogether

No Pooling: {PIF = αk + βkt + εk}, k = 1, 2, …, 8
• Estimate one model for each group

Compromise: the Hierarchical Model
• Estimates parameters using a compromise between complete pooling and no pooling.

Yi ~ N(αj[i] + βj[i]·Xi, σ²)   where   (αj, βj) ~ N((μα, μβ), Σ),   Σ = [σ²α  σαβ; σαβ  σ²β]
A Credible Approach

• Recall the random intercept model:

Yi ~ N(αj[i] + βXi, σ²),   αj ~ N(μα, σ²α)

• This model can contain a large number of parameters: {α1, α2, …, αJ, β}.
• And it contains 4 hyperparameters: {μα, β, σ, σα}.

• Here is how the hyperparameters relate to the parameters:

α̂j = Zj·(ȳj − β·x̄j) + (1 − Zj)·μ̂α   where   Zj = nj / (nj + σ²/σ²α)

• Does this formula look familiar?
• Credibility theory is a special case of hierarchical models!
The Middle Way

• This makes precise the sense in which the random intercept model is a compromise between the pooled-data model (option 1) and the separate models for each region (option 2).

α̂j = Zj·(ȳj − β·t̄j) + (1 − Zj)·μ̂α   where   Zj = nj / (nj + σ²/σ²α)

• As σα → 0, the random intercept model → complete pooling
• As σα → ∞, the random intercept model → separate models by group

• In principle it’s always appropriate to use hierarchical models
• Rather than a judgment call, the data tells us the degree to which the groups should be fit using separate models or a single common model
• It is no longer an all-or-nothing decision
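The two limits can be checked numerically. A minimal sketch of the credibility weight formula (plain Python; the group sizes and variance values are made up for illustration):

```python
def credibility_weight(n_j, sigma2, sigma2_alpha):
    """Z_j = n_j / (n_j + sigma^2 / sigma_alpha^2) from the random intercept model."""
    return n_j / (n_j + sigma2 / sigma2_alpha)

def alpha_hat(ybar_j, xbar_j, beta, mu_alpha, Z_j):
    """Shrinkage estimate: credibility blend of the group's own data and the grand mean."""
    return Z_j * (ybar_j - beta * xbar_j) + (1.0 - Z_j) * mu_alpha

n_j = 4                       # observations in group j
Z_small = credibility_weight(n_j, sigma2=1.0, sigma2_alpha=1e-8)  # groups nearly identical
Z_large = credibility_weight(n_j, sigma2=1.0, sigma2_alpha=1e8)   # groups unrelated

# sigma_alpha -> 0: Z -> 0, estimate collapses to complete pooling (mu_alpha);
# sigma_alpha -> inf: Z -> 1, estimate -> separate model per group.
print(Z_small, Z_large)
print(alpha_hat(ybar_j=2400.0, xbar_j=1.5, beta=50.0, mu_alpha=2200.0, Z_j=Z_large))
```

This is exactly the Bühlmann-style credibility weight: the data decide where each group sits between the all-or-nothing extremes.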



Hierarchical Growth Curve
Loss Reserving Model
Hierarchical Modeling for Loss Reserving

• Here is our Schedule P loss triangle:


Cumulative Losses in 1000's
AY premium 12 24 36 48 60 72 84 96 108 120 CL Ult CL LR CL res
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036 2,036 0.78 0
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987 2,017 0.75 29
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919 1,986 0.77 67
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446 1,535 0.59 89
1992 2,077 257 569 754 892 958 1,007 1,110 0.53 103
1993 1,703 193 423 589 661 713 828 0.49 115
1994 1,438 142 361 463 533 675 0.47 142
1995 1,093 160 312 408 601 0.55 193
1996 1,012 131 352 702 0.69 350
1997 976 122 576 0.59 454

chain link 2.365 1.354 1.164 1.090 1.054 1.038 1.026 1.020 1.015 1.000 (total CL Ult: 12,067; total CL res: 1,543)
chain ldf 4.720 1.996 1.473 1.266 1.162 1.102 1.062 1.035 1.015 1.000
growth curve 21.2% 50.1% 67.9% 79.0% 86.1% 90.7% 94.2% 96.6% 98.5% 100.0%

• Let’s model this as a longitudinal dataset.
• Grouping dimension: Accident Year (AY)

• We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.
Growth Curves

• We want to build a model that reflects the non-linear nature of loss development.
• GLMs show up a lot in the stochastic loss reserving literature.
• But… are GLMs natural models for loss triangles?

• Growth curves: 2-parameter curves G(x | ω, θ) = 1 − exp(−(x/θ)^ω)
• θ = scale; ω = shape
• See Clark [2003]

• Heuristic idea:
• We fit these curves to the LDFs
• Add random effects
• Allows ultimate loss (ratio) and/or θ and/or ω to vary randomly by year.

[Figure: Weibull growth curve vs chain ladder 1/LDF – cumulative percent of ultimate vs development months]
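As a quick check of this heuristic, one can evaluate the Weibull form at the fitted values reported later in the case study (ω ≈ 0.96, θ ≈ 26.55, with development age measured at period midpoints 6, 18, …, 114 months) and compare it with the chain ladder growth pattern from the triangle:

```python
import numpy as np

def weibull_growth(x, omega, theta):
    """G(x | omega, theta) = 1 - exp(-(x/theta)**omega): cumulative percent of ultimate."""
    return 1.0 - np.exp(-(x / theta) ** omega)

# Fitted values from the case study; ages are period midpoints in months.
omega, theta = 0.96, 26.55
ages = np.arange(6, 115, 12).astype(float)        # 6, 18, ..., 114

# Chain ladder growth pattern (1 / LDF) from the triangle, for comparison.
cl_growth = np.array([0.212, 0.501, 0.679, 0.790, 0.861,
                      0.907, 0.942, 0.966, 0.985, 1.000])

g = weibull_growth(ages, omega, theta)
print(np.round(g, 3))                 # smooth curve tracking the CL pattern
print(np.abs(g - cl_growth).max())    # largest gap is at the oldest age
```

The largest gap is at 114 months, where the chain ladder pattern is forced to 100% while the growth curve still implies a tail beyond the triangle.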
Baseline Model: Heuristics

• Basic intuition is familiar: (CL_{AY,t}) × (LDF) = Ult loss
 CL_{AY,t} = (Ult loss_AY) × (1 / LDF_t)
 CL_{AY,t} = (Ult loss_AY) × G_{ω,θ}(t) + error

[Figure: Weibull and loglogistic growth curves – cumulative percent of ultimate vs age in months, 12–180]

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• The “growth curve” part comes in by using G(t) instead of LDFs.
• Think of LDFs as a rough piecewise-linear approximation to a G(t)

• The “hierarchical” part comes in because we can let LR_AY, θ, and/or ω vary by AY (using sub-models).
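Under this heuristic, the growth-curve analogue of a chain ladder ultimate is simply the latest diagonal divided by G evaluated at its age. A sketch, using the ω and θ values reported later in the deck and no random effects or AR(1) errors yet:

```python
import math

def weibull_growth(x, omega=0.96, theta=26.55):
    """G(x) = 1 - exp(-(x/theta)**omega), illustrative fitted shape and scale."""
    return 1.0 - math.exp(-(x / theta) ** omega)

# Latest diagonal of the triangle: AY -> (age in months, cumulative loss).
latest = {1988: (114, 2036), 1992: (66, 1007), 1997: (6, 122)}

# CL_{AY,t} = Ult_AY * G(t)  =>  Ult_AY = CL_{AY,t} / G(t)
ult = {ay: cl / weibull_growth(age) for ay, (age, cl) in latest.items()}
print({ay: round(u) for ay, u in ult.items()})
```

These raw “LDF-style” ultimates (about 2072, 1108, and 572) differ from the fitted model’s (2045, 1103, 596), partly because the random LR effects pull each year’s estimate toward the portfolio-level loss ratio.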
Other “Random Effects”

• Our model so far:

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• What if we want to include other random effects in the model? It’s easily done:

CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ_AY)^ω)] + ε_{AY,dev}
(ULT_AY, θ_AY) ~ N((μ_ULT, μ_θ), Σ),   Σ = [σ²_ULT  σ_{ULT,θ}; σ_{ULT,θ}  σ²_θ]
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• Here we add a “random scale” effect to let θ vary by AY.
• This is analogous to letting slope vary in a linear model
• It is unlikely that a “random warp” (ω) effect will be significant.



Baseline Model Performance

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• Random LR effects allow a “custom fit” growth curve for each AY
• Yet the model is very parsimonious
• The model contains only 6 hyperparameters, but fits the loss triangle very well
• Parsimony is achieved because the model is well suited to the data (not ad hoc)

[Figure: Weibull growth curve model with AR(1) errors and randomly varying ultimate loss ratio by AY – fitted curves overlaid on cumulative losses, one panel per AY 1988–1997]
Baseline Model Performance

CumLoss_{AY,dev} = LR_AY · prem_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
LR_AY ~ N(μ_LR, σ²_LR)
ε_{AY,dev} = φ·ε_{AY,dev−1} + a_{AY,dev}

• We have estimated the parameters: {μ_LR, ω, θ, σ_LR, φ, σ_a}
• Random effects are added to the ultimate LR parameter (analogous to random intercepts)
• Should we also include random scale (θ) effects? (analogous to random slopes)

[Figure: same fitted growth curves by AY as on the previous slide]
Residual Diagnostics

• Advantage of stochastic reserving: enables us to graphically analyze residuals.
• Residual diagnostics suggest a reasonable fit.

[Figure: Weibull growth curve model with AR(1) errors – residual histogram, normal QQ plot, actual vs predicted, and residuals vs predicted, development time, and accident year]
Model Results
Chain Ladder Analysis
AY premium 6 18 30 42 54 66 78 90 102 114 CL Ult CL res
1988 2,609 404 986 1,342 1,582 1,736 1,833 1,907 1,967 2,006 2,036 2,036 0
1989 2,694 387 964 1,336 1,580 1,726 1,823 1,903 1,949 1,987 2,017 29
1990 2,594 421 1,037 1,401 1,604 1,729 1,821 1,878 1,919 1,986 67
1991 2,609 338 753 1,029 1,195 1,326 1,395 1,446 1,535 89
1992 2,077 257 569 754 892 958 1,007 1,110 103
1993 1,703 193 423 589 661 713 828 115
1994 1,438 142 361 463 533 675 142
1995 1,093 160 312 408 601 193
1996 1,012 131 352 702 350
1997 976 122 576 454

chain link 2.365 1.354 1.164 1.090 1.054 1.038 1.026 1.020 1.015 1.000 (total CL Ult: 12,067; total CL res: 1,543)
chain ldf 4.720 1.996 1.473 1.266 1.162 1.102 1.062 1.035 1.015 1.000
growth curve 21.2% 50.1% 67.9% 79.0% 86.1% 90.7% 94.2% 96.6% 98.5% 100.0%
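The chain link and LDF rows can be reproduced directly from the triangle. A sketch using volume-weighted link ratios, with the tail factor assumed to be 1.0 as in the table (small differences from the displayed 2.365 are rounding):

```python
import numpy as np

# Schedule P cumulative triangle (rows = AY 1988..1997; NaN = future cells).
tri = np.full((10, 10), np.nan)
data = [
    [404, 986, 1342, 1582, 1736, 1833, 1907, 1967, 2006, 2036],
    [387, 964, 1336, 1580, 1726, 1823, 1903, 1949, 1987],
    [421, 1037, 1401, 1604, 1729, 1821, 1878, 1919],
    [338, 753, 1029, 1195, 1326, 1395, 1446],
    [257, 569, 754, 892, 958, 1007],
    [193, 423, 589, 661, 713],
    [142, 361, 463, 533],
    [160, 312, 408],
    [131, 352],
    [122],
]
for i, row in enumerate(data):
    tri[i, :len(row)] = row

# Volume-weighted chain links: sum(col k+1) / sum(col k) over AYs with both cells.
links = np.ones(10)                       # last entry is the assumed tail = 1.0
for k in range(9):
    both = ~np.isnan(tri[:, k]) & ~np.isnan(tri[:, k + 1])
    links[k] = tri[both, k + 1].sum() / tri[both, k].sum()

# Cumulative LDFs (development to ultimate) and implied growth pattern 1/LDF.
ldf = np.cumprod(links[::-1])[::-1]
growth = 1.0 / ldf
print(np.round(links, 3))     # ~ [2.366, 1.355, 1.164, 1.090, ...]
print(np.round(growth, 3))    # ~ [0.212, 0.501, ...]
```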

Parameters and Estimated Reserves - Weibull Model

AY prem dev LR omega theta growth reported eval120 eval240 ULT reserves
1988 2609 114 0.78 0.96 26.55 98.3% 2,036 2,017 2,045 2,045 9
1989 2694 102 0.75 0.96 26.55 97.4% 1,987 2,004 2,031 2,032 44
1990 2594 90 0.78 0.96 26.55 96.1% 1,919 2,001 2,028 2,029 110
1991 2609 78 0.58 0.96 26.55 94.1% 1,446 1,497 1,518 1,518 72
1992 2077 66 0.53 0.96 26.55 91.0% 1,007 1,088 1,103 1,103 95
1993 1703 54 0.49 0.96 26.55 86.2% 713 818 829 829 117
1994 1438 42 0.48 0.96 26.55 78.9% 533 688 697 697 164
1995 1093 30 0.53 0.96 26.55 67.5% 408 567 575 575 167
1996 1012 18 0.73 0.96 26.55 49.7% 352 725 735 735 384
1997 975.9 6 0.61 0.96 26.55 21.2% 122 587 595 596 474
total 11,991 12,159 1,636

These are the 12 hierarchical model parameters: 10 loss ratios plus ω and θ.
The overall Weibull reserve estimate is higher than that of the chain ladder because of “tail” development beyond 120 months.
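The ULT and reserve columns follow mechanically from the parameters: since G(∞) = 1, ULT_AY = LR_AY · prem_AY. A sketch reproducing a few rows (the two-decimal loss ratios shown in the table introduce small rounding differences):

```python
import math

def weibull_growth(x, omega=0.96, theta=26.55):
    """G(x) = 1 - exp(-(x/theta)**omega); fitted shape and scale from the table."""
    return 1.0 - math.exp(-(x / theta) ** omega)

# AY -> (premium, age of latest diagonal, fitted LR, reported losses), from the table.
rows = {1992: (2077.0, 66, 0.53, 1007),
        1997: (975.9, 6, 0.61, 122)}

for ay, (prem, dev, lr, reported) in rows.items():
    ult = lr * prem                  # G(inf) = 1, so ultimate = LR * premium
    reserve = ult - reported         # indicated reserve = ultimate - reported to date
    print(ay, round(ult), round(reserve), f"{weibull_growth(dev):.1%}")
```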
Making Predictions is Difficult (Especially About the Future)

• Most loss reserve variability is due to “model risk”
• In this context, the most serious model risk is the choice of growth curves
• Both the Weibull and Loglogistic models fit the available data very well
• But they extrapolate very differently

[Figure: comparison of Weibull vs Loglogistic random LR model extrapolations – fitted curves by AY 1988–1997 extended to 222 months of development]
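The extrapolation risk can be made concrete: calibrate a Weibull curve and a loglogistic curve (the other form used in Clark [2003], G(x) = x^ω / (x^ω + θ^ω)) through the same two points on the development pattern, then compare them beyond the triangle. The two-point calibration below is an illustrative device, not the presentation's actual fitting procedure:

```python
import math

def weibull(x, omega, theta):
    return 1.0 - math.exp(-(x / theta) ** omega)

def loglogistic(x, omega, theta):
    return x ** omega / (x ** omega + theta ** omega)

def calibrate(inv, x1, g1, x2, g2):
    """Solve for (omega, theta) so the curve passes through (x1, g1) and (x2, g2).
    `inv` maps a growth value g to the quantity (x/theta)**omega for that family."""
    r1, r2 = inv(g1), inv(g2)
    omega = math.log(r2 / r1) / math.log(x2 / x1)
    theta = x1 / r1 ** (1.0 / omega)
    return omega, theta

# (x/theta)**omega equals -ln(1-g) for the Weibull and g/(1-g) for the loglogistic.
pts = (6, 0.212, 114, 0.983)     # two points from the fitted development pattern
w_om, w_th = calibrate(lambda g: -math.log(1.0 - g), *pts)
l_om, l_th = calibrate(lambda g: g / (1.0 - g), *pts)

# Both curves agree inside the triangle but extrapolate very differently:
print("Weibull at 240 months:     %.4f" % weibull(240, w_om, w_th))
print("Loglogistic at 240 months: %.4f" % loglogistic(240, l_om, l_th))
```

The Weibull tail decays exponentially while the loglogistic tail decays as a power law, so the loglogistic leaves noticeably more development after 240 months despite matching the same in-sample points.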
Bayesian Motivation

“Given any value (estimate of future payments) and our current state of
knowledge, what is the probability that the final payments will be no larger than
the given value?”
-- Casualty Actuarial Society’s Working Party on Quantifying Variability in
Reserve Estimates, 2004

• This can be read as a request for a Bayesian analysis


• Bayesians (unlike frequentists) are willing to make probability statements about
unknown parameters
• Ultimate losses are “single cases” – difficult to conceive as random draws from a
“sampling distribution in the sky”.
• Frequentist probability involves repeated trials of setups involving physical randomization.
• In contrast it is meaningful to apply Bayesian probabilities to “single case events”
• The Bayesian analysis yields an entire posterior probability distribution – not
merely moment estimates

• Bayesian statistics is the ideal framework for loss reserving


• Remaining task: put prior probability distributions on model hyperparameters
• Hierarchical Bayes framework also allows us to bring in collateral information in the
form of other loss triangles
• Other companies, other regions, …
MCMC from A to B

• Before 1990, Bayesian statistics was a lot more talk than action.
• Unless you use conjugate priors, calculating posterior probability distributions is
cumbersome at best, intractable at worst.

• Markov Chain Monte Carlo [MCMC]: a simulation technique used to


solve high-dimensional integration problems.
• Developed by physicists at Los Alamos
• Introduced as a Bayesian computational technique by Gelfand and Smith in 1990
• Gelfand and Smith sparked a renaissance in the practice of Bayesian statistics
• As a result of the Gelfand and Smith MCMC approach, Bayesian statistics is now common in fields like:
– Epidemiology, disease mapping
– Marketing science (hierarchical approach: customers are exchangeable units)
– Genomics

• Rather than explicitly calculating high-dimensional, posterior densities,


we estimate them by drawing repeated samples from them.
• Construct a Markov chain whose equilibrium distribution is the posterior that we’re trying to estimate.
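A toy illustration of the idea, separate from the reserving model itself: a random-walk Metropolis sampler for the posterior of a normal mean, where the conjugate answer is known in closed form and serves as a check (the data values and prior are made up for illustration):

```python
import math
import random

random.seed(1)

# Data: n observations with known sigma = 1; prior mu ~ N(0, 10^2).
y = [0.8, 1.3, 0.2, 1.1, 0.9, 1.6, 0.4, 1.2, 0.7, 1.0]
n, sigma2, tau2 = len(y), 1.0, 100.0
ybar = sum(y) / n

def log_post(mu):
    """Log posterior up to a constant: normal likelihood times normal prior."""
    return -n * (ybar - mu) ** 2 / (2 * sigma2) - mu ** 2 / (2 * tau2)

# Random-walk Metropolis: propose mu' = mu + N(0, 0.5^2) and accept with
# probability min(1, post(mu')/post(mu)); the chain's equilibrium
# distribution is the posterior we want.
mu, draws = 0.0, []
for _ in range(20000):
    prop = mu + random.gauss(0.0, 0.5)
    if math.log(random.random()) < log_post(prop) - log_post(mu):
        mu = prop
    draws.append(mu)

post_mean = (n * ybar / sigma2) / (n / sigma2 + 1.0 / tau2)   # conjugate result
mcmc_mean = sum(draws[2000:]) / len(draws[2000:])              # drop burn-in
print(round(post_mean, 3), round(mcmc_mean, 3))
```

The sampler never evaluates the normalizing constant; it only needs the posterior up to proportionality, which is what makes the approach workable for high-dimensional hierarchical models.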


References

Clark, David R. (2003). “LDF Curve Fitting and Stochastic Loss Reserving: A Maximum Likelihood Approach,” CAS Forum.
Frees, Edward (2006). Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. New York: Cambridge University Press.
Gelman, Andrew and Hill, Jennifer (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.
Guszcza, James (2008). “Hierarchical Growth Curve Models for Loss Reserving,” CAS Forum.
Guszcza, James and Thomas Herzog (2010). “Enhanced Credibility: Actuarial Science and the Renaissance of Bayesian Thinking,” Contingencies.
Pinheiro, Jose and Douglas Bates (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.
Zhang, Yanwei, Vanja Dukic, and James Guszcza (2010). “A Bayesian Nonlinear Model for Forecasting Insurance Payments,” working paper.
