Linear mixed models for predictive modelling in actuarial science

Katrien Antonio∗, Yanwei Zhang†

November 24, 2013

Chapter preview. We give a general discussion of linear mixed models and then illustrate specific actuarial applications of this type of model. Technical details on linear mixed models follow: model assumptions, specifications, estimation techniques and methods of inference. We include three worked-out examples with the R lme4 package and use ggplot2 for the graphs. Full code is available from the book project’s web page.

1 Mixed models in actuarial science

1.1 What?
A first example of a linear mixed model. As explained in Chapter XXX [Reference to Chapter on longitudinal data.], a panel data set follows a group of subjects (e.g. policyholders in an insurance portfolio) over time. We therefore denote variables (e.g. $y_{it}$, $x_{it}$) in a panel data set with double subscripts, indicating the subject (say $i$) and the time period (say $t$). As motivated in Section 1.2 of Chapter XXX, the analysis of panel data has several advantages. Panel data make it possible to study the effect of certain covariates on the response of interest (as in usual regression models for cross–sectional data), while accounting appropriately for the dynamics in these relations. For actuarial ratemaking the availability of panel data is of particular interest in light of a posteriori rating. An a posteriori tariff predicts the current year’s loss for a particular policyholder, using (among others) the dependence between the current year’s loss and the losses reported by this policyholder in previous years. Credibility theory, a cornerstone of actuarial mathematics, is an example of such an a posteriori rating system. Section 2 in Chapter XXX presents a sequence of models suitable for the analysis of panel data in the context of linear models. Recall in particular the well–known linear regression model with common intercept (or: cross–sectional model) (see ‘Linear Model 1’ in Chapter XXX, Section 2)

$$E[y_{it}] = \alpha + x_{it}'\beta. \tag{1}$$

∗ University of Amsterdam and KU Leuven (Belgium), email: [Link]@[Link]
† University of Southern California, email: actuary zhang@[Link]

This model completely pools the data, ignores the panel structure and produces identical estimates for all subjects $i$ (for a given $x_{it}$). The linear fixed effects model (‘Linear Model 2’ in Chapter XXX, Section 2) specifies

$$E[y_{it}] = \alpha_i + x_{it}'\beta, \tag{2}$$

where each subject $i$ has its own unknown, but fixed, intercept $\alpha_i$; hence the name fixed effects model. Independence among all observations is assumed, and $\mathrm{Var}(y_{it}) = \sigma^2$. This regression model does not pool information and estimates each $\alpha_i$ separately using least squares or maximum likelihood. This approach often results in overfitting and unreasonable $\hat{\alpha}_i$'s (see Gelman (2006)). The linear random effects model (see ‘Linear Model 3’ in Chapter XXX) is an alternative approach, balancing between no pooling and complete pooling of data. It allows for random intercepts, with model equation

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \tag{3}$$

where $\varepsilon_{it} \sim (0, \sigma_\varepsilon^2)$ ¹. The subject–specific intercept $\alpha_i$ is now a random variable with zero mean and variance $\sigma_\alpha^2$; hence the name random effects model. Moreover, the model in (3) is a first example of a linear mixed model ([LMM]), with a combination (‘mix’) of fixed and random effects in the linear predictor. The errors $\varepsilon_{it}$ with variance $\sigma_\varepsilon^2$ structure variability within subject $i$, whereas the random intercepts with variance $\sigma_\alpha^2$ represent variation between subjects. Compared with the no pooling and complete pooling examples, the linear mixed model has many interesting features, as is explained below.
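
To see how the random intercept structures dependence, the second moments implied by (3) can be written out. The derivation below is a sketch under assumptions that the text does not state at this point: the $\alpha_i$ are independent across subjects and independent of all error terms, the $\varepsilon_{it}$ are mutually independent, and the covariates are treated as given.

```latex
\begin{aligned}
\mathrm{Var}(y_{it}) &= \sigma_\alpha^2 + \sigma_\varepsilon^2, \\
\mathrm{Cov}(y_{it}, y_{is}) &= \sigma_\alpha^2 \quad (t \neq s), \\
\mathrm{Corr}(y_{it}, y_{is}) &= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} \quad (t \neq s), \\
\mathrm{Cov}(y_{it}, y_{js}) &= 0 \quad (i \neq j).
\end{aligned}
```

Under these assumptions, observations on the same subject are positively correlated through the shared intercept, while observations on different subjects remain uncorrelated.
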
Mixed or multilevel models for clustered data. Panel data is a first example of so–called clustered data. As mentioned in Section 4 in Chapter XXX [Reference to Chapter on longitudinal and panel data], predictive modeling in actuarial science (and in many other statistical disciplines) will confront analysts with data structures going beyond the cross–sectional as well as panel data design. Section 3 in this chapter includes multiple motivating examples. Mixed (or: multilevel) models are statistical models suitable for the analysis of data structured in nested (i.e. hierarchical) or non–nested (i.e. cross–classified, next to each other instead of hierarchically nested) clusters or levels. In this chapter we explain the use of linear mixed models for multilevel data. A discussion of non–linear mixed models follows in Chapter XXX [Reference to Chapter on non-linear mixed models.]. Chapters XXX (on longitudinal and panel data), XXX (on credibility), and XXX (on spatial statistics) in this book include additional examples of clustered data and their analysis with mixed models [Reference to Chapters on credibility, longitudinal data and spatial stats.].

¹ The notation $\varepsilon_{it} \sim (0, \sigma_\varepsilon^2)$ implies $E[\varepsilon_{it}] = 0$ and $\mathrm{Var}[\varepsilon_{it}] = \sigma_\varepsilon^2$.

Textbook examples. A standard textbook example of multilevel data is the ‘students in schools’ data structure. Extended versions are the ‘students in classes in schools’ or ‘students followed repeatedly over time, in classes in schools’ examples, where each example adds an extra level of observations to the data hierarchy. Connecting with the actuarial audience of this book, we consider the example of a collection of vehicles $j$ (with $j = 1, \ldots, n_i$) insured under fleets $i$ (with $i = 1, \ldots, m$). Let $y_{ij}$ be the loss observed for vehicle $j$ in fleet $i$ (in a well defined period of exposure). Denote with $x_{1,ij}$ covariate information at vehicle–level (our level 1); $x_{1,ij}$ is, for example, the cubic capacity or vehicle age of car $j$ in fleet $i$. $x_{2,i}$ is a predictor at fleet–level (our level 2); it could, for example, refer to the size of the fleet, or the business in which the fleet is operating. The so–called varying intercepts model is a basic example of a multilevel model. It combines a linear model at vehicle–level (i.e. level 1)

$$y_{ij} = \beta_i + \beta_{1,0} + x_{1,ij}\beta_{1,1} + \varepsilon_{1,ij}, \quad j = 1, \ldots, n_i, \tag{4}$$

with a linear model at fleet–level (i.e. level 2)

$$\beta_i = \varepsilon_{2,i}, \quad i = 1, \ldots, m, \tag{5}$$

or, when fleet–specific information is available,

$$\beta_i = x_{2,i}\beta_2 + \varepsilon_{2,i}, \quad i = 1, \ldots, m. \tag{6}$$

Here $\varepsilon_{2,i} \sim (0, \sigma_2^2)$ and $\varepsilon_{1,ij} \sim (0, \sigma_1^2)$ are mean zero, independent error terms, representing variability (or heterogeneity) at both levels in the data. Written as a single model equation, the combination of (4) and, for example, (5), is:

$$y_{ij} = \beta_{1,0} + \varepsilon_{2,i} + x_{1,ij}\beta_{1,1} + \varepsilon_{1,ij}. \tag{7}$$

This regression model uses an overall intercept, $\beta_{1,0}$, a fleet–specific intercept, $\varepsilon_{2,i}$, a vehicle–level predictor $x_{1,ij}$ with corresponding regression parameter, $\beta_{1,1}$, and an error term $\varepsilon_{1,ij}$. We model the fleet–specific intercepts, $\varepsilon_{2,i}$, as random variables. This makes it possible to reflect heterogeneity between fleets in an efficient way, even for a large number of fleets. Indeed, by assigning a distribution to these error terms, we basically only need an estimate for the unknown parameters (i.e. the variance component $\sigma_2^2$) in their distribution. The other regression parameters, $\beta_{1,0}$ and $\beta_{1,1}$, are considered fixed (in frequentist terminology); we do not specify a distribution for them. The model in (7) is, again, an example of a linear mixed model ([LMM]). Mixed refers to the combination of fixed and random effects, combined in a model specification which is linear in the random effects ($\varepsilon_{2,i}$) as well as in the fixed effects ($\beta_{1,0}$ and $\beta_{1,1}$). Allowing for varying slopes and intercepts results in the following model equations

$$y_{ij} = \beta_{i,0} + x_{1,ij}\beta_{i,1} + \beta_{1,0} + x_{1,ij}\beta_{1,1} + \varepsilon_{1,ij}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n_i, \tag{8}$$

with

$$\begin{aligned}
\beta_{i,0} &= \varepsilon_{2,i,0}, \\
\beta_{i,1} &= \varepsilon_{2,i,1}.
\end{aligned} \tag{9}$$

Written as a single model equation, this multilevel model becomes

$$y_{ij} = \beta_{1,0} + \varepsilon_{2,i,0} + x_{1,ij}\beta_{1,1} + x_{1,ij}\varepsilon_{2,i,1} + \varepsilon_{1,ij}. \tag{10}$$

Besides having random intercepts ($\varepsilon_{2,i,0}$), the model also allows the effect of predictor $x_{1,ij}$ on the response to vary by fleet. This is modelled here by the random slopes $\varepsilon_{2,i,1}$.
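
One way to get a feel for the varying intercepts and slopes model is to simulate from its single–equation form. The base R sketch below does so under assumptions that are not part of the text: normal distributions, a random intercept and a random slope that are independent of each other, a balanced design, and made-up parameter values. All object names (`eps2_0`, `eps2_1`, `sim`, ...) are illustrative only.

```r
# Simulate from the varying intercepts and slopes model:
# y_ij = beta_10 + eps2_0[i] + x1_ij * beta_11 + x1_ij * eps2_1[i] + eps1_ij
set.seed(1)                              # arbitrary seed, for reproducibility only

m     <- 50                              # number of fleets (hypothetical)
n_i   <- rep(8L, m)                      # vehicles per fleet (hypothetical, balanced)
fleet <- rep(seq_len(m), times = n_i)    # fleet index of every vehicle
N     <- length(fleet)                   # total number of vehicles

x1      <- rnorm(N)                      # vehicle-level predictor (hypothetical)
beta_10 <- 2                             # overall intercept (fixed effect)
beta_11 <- 0.5                           # overall slope (fixed effect)

eps2_0 <- rnorm(m, mean = 0, sd = 1)     # random intercepts, one per fleet
eps2_1 <- rnorm(m, mean = 0, sd = 0.3)   # random slopes, one per fleet
eps1   <- rnorm(N, mean = 0, sd = 0.5)   # vehicle-level errors

y <- beta_10 + eps2_0[fleet] + x1 * beta_11 + x1 * eps2_1[fleet] + eps1

sim <- data.frame(fleet = factor(fleet), x1 = x1, y = y)
```

Indexing the fleet–level draws by `fleet` is what makes all vehicles in one fleet share the same intercept and slope deviation.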

Main characteristics and motivations. The varying intercepts and varying slopes examples reveal the essential characteristics of a multilevel model: (1) varying coefficients and (2) a regression model for these varying coefficients (possibly using group–level predictors). Motivations for using multilevel modeling are numerous (see Gelman and Hill (2007)); we illustrate many of them throughout this chapter. With data often being clustered (e.g. students in schools, students in classes in schools, cars in fleets, policyholder data over time, policies within counties, . . .), statistical methodology should reflect the structure in the data and use it as relevant information when building statistical models. With traditional regression techniques (say linear or generalized linear models, as in Chapters XXX and XXX), the clustering in groups is either ignored (‘complete pooling’) or groups are analyzed separately (‘no pooling’), which results in overfitting because even small clusters get their own regression model. The multilevel model encompasses both extremes as limiting cases: in the varying intercepts model from (7), complete pooling corresponds with $\sigma_2^2 \to 0$ and no pooling with $\sigma_2^2 \to \infty$. Multilevel modeling is a compromise between these two extremes, known as partial pooling. In this case, we impose a distributional assumption on $\varepsilon_{2,i}$ (with variance $\sigma_2^2$) and estimate $\sigma_2^2$ from the data. This makes it possible to take heterogeneity between clusters into account, to make appropriate cluster–specific predictions and to structure the dependence between observations belonging to the same cluster. Moreover, predictions related to new clusters become readily available. Whereas in classical regression cluster–specific indicators cannot be included along with cluster–specific predictors, multilevel models allow doing this in a convenient way (see (6)). When specifying regression models at different levels in the data, interactions between explanatory variables at different levels (so–called cross–level effects) may appear. The latter is often mentioned as another advantage of multilevel models.
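
The three pooling regimes can be made concrete in the simplest setting, a varying intercepts model without covariates, $y_{ij} = \mu + \varepsilon_{2,i} + \varepsilon_{1,ij}$. The base R sketch below rests on assumptions that are not taken from the text: the overall mean $\mu$ and both variance components are treated as known, and all numbers are made up. Under those assumptions the partially pooled prediction for cluster $i$ is a weighted average of the cluster mean and $\mu$, with weight $n_i\sigma_2^2 / (n_i\sigma_2^2 + \sigma_1^2)$ on the cluster mean. The function name `partial_pool` is hypothetical.

```r
# Partial pooling for y_ij = mu + eps2_i + eps1_ij with KNOWN mu and known
# variance components (an assumption for illustration; in practice these
# quantities are estimated from the data).
partial_pool <- function(ybar, n, mu, sigma2_between, sigma2_within) {
  # weight on the cluster's own mean: tends to 0 (complete pooling) when
  # sigma2_between -> 0 and to 1 (no pooling) when sigma2_between -> Inf
  w <- n * sigma2_between / (n * sigma2_between + sigma2_within)
  w * ybar + (1 - w) * mu
}

ybar <- c(120, 95, 150)   # hypothetical cluster means
n    <- c(2, 10, 50)      # hypothetical cluster sizes
mu   <- 100               # hypothetical overall mean

complete    <- partial_pool(ybar, n, mu, sigma2_between = 0,    sigma2_within = 400)
partial     <- partial_pool(ybar, n, mu, sigma2_between = 100,  sigma2_within = 400)
nearly_none <- partial_pool(ybar, n, mu, sigma2_between = 1e12, sigma2_within = 400)
```

With these inputs, small clusters are pulled strongly towards `mu` while large clusters stay close to their own mean, which is the behaviour that credibility weighting formalizes.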

What’s in a name?: labels and notation. Multilevel models carry many labels in statistical literature. They are sometimes called hierarchical, because data are often hierarchically structured (see the students in schools example) and because of the hierarchy in the model specifications. However, non–nested models, with levels structured next to each other instead of hierarchically nested, can also be analyzed with the multilevel methodology. Multilevel models are also known as random effects or mixed models, since they combine (a mix of) fixed and random effects. This distinction is only applicable when using frequentist methodology and terminology. A Bayesian analysis treats all regression parameters as random variables, specifying an appropriate prior distribution for each parameter. Besides terminology, mathematical notation can be very different among statistical sources. This should not be a surprise, taking into account that multilevel models can be formulated for basically any number of levels, involving nested and non–nested group (or: cluster) effects, predictor information at different levels, and so on. For instance, Gelman and Hill (2007) denote the varying intercepts and varying slopes models in (4)+(6) and (10), respectively, in a more intuitive way:

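$$\begin{aligned}
y_i &= \alpha_{j[i]} + \beta x_i + \varepsilon_i, & i &= 1, \ldots, N, \\
\alpha_j &= a + b u_j + \eta_j, & j &= 1, \ldots, m,
\end{aligned} \tag{11}$$

and

$$\begin{aligned}
y_i &= \alpha_{j[i]} + \beta_{j[i]} x_i + \varepsilon_i, & i &= 1, \ldots, N, \\
\alpha_j &= a_0 + b_0 u_j + \eta_{j1}, & j &= 1, \ldots, m, \\
\beta_j &= \eta_{j2}.
\end{aligned} \tag{12}$$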

Observations in the data set are indexed with $i$, where $N$ is the total number of observations. $j$ denotes the fleets in the data set, and $j[i]$ is the fleet to which observation $i$ belongs. $x_i$ refers to covariate information available at vehicle–level (i.e. level 1 in (4)) and $u_j$ refers to covariate information available at fleet–level (i.e. level 2 in (6)).
The notation used from Section 2 on is motivated by generality, and inspired by Frees (2004a). This notation allows writing down model equations in a structured way, with clear reference to the particular level in the data to which the parameter/predictor is attached. Moreover, this notation can be used for any number of levels in a concise way. Section 2 explains the connection between this particular notation and the matrix notation (and corresponding manipulations) often developed in statistical literature on mixed models. When discussing examples, we replace this general notation with a more intuitive one, explicitly referring to the structure of the data under consideration.

1.2 Why?: motivating examples from actuarial science
Research on mixed models originated in bio- and agricultural statistics. For example, the topic of variance components models, a particular example of models with random effects (see Searle et al. (2008)), was studied extensively in the context of animal breeding experiments. The following (non–exhaustive) list of illustrations should convince the reader of the usefulness of mixed models as a modeling tool in actuarial science, with applications ranging from ratemaking to reserving and smoothing. We will deploy some of these examples within the framework of linear mixed models, while others are more appropriate for analysis with generalized linear mixed models (see Chapter XXX) [Reference to Chapter on non–linear mixed models.].

Illustration 1 (Credibility models). Credibility theory is an a posteriori ratemaking technique. Credibility models are designed for the prediction of an insured’s risk premium, by weighting the insured’s own loss experience and the experience in the overall portfolio. An extensive discussion of credibility models is available in Chapter XXX in this book [Reference to Chapter on credibility.]. Credibility models have a natural and explicit interpretation as special examples of mixed models. Frees et al. (1999) demonstrate this connection, by reinterpreting credibility models using mixed model parlance. This mapping greatly increases the accessibility and usefulness of such models. Indeed, the complete machinery (including computational methods and software) of mixed models becomes available for the analysis of these actuarial models. The famous Hachemeister data set (see Hachemeister (1975)) has often been used in credibility literature. This data set considers 12 periods, from the third quarter of 1970 to the second quarter of 1973, of bodily injury losses covered by a private passenger auto insurance. For 5 states the total loss and corresponding number of claims are registered. Figure 1 shows a trellis plot (see Chapter XXX [Reference to Chapter on longitudinal and panel data.]) of the average loss per claim (in black), followed over time, per state. The plot also shows a linear regression line (in blue) and corresponding confidence intervals (in grey). In Section 3 we use linear mixed models to analyze this data set and predict the next year’s average claim per state. Further analysis, with focus on credibility theory, follows in Chapter XXX [Reference to Chapter on credibility.].

[Figure 1 omitted: trellis plot with one panel per state (1 to 5); x-axis: Period; y-axis: Ratio (i.e. average loss per claim).]

Figure 1: Trellis plot of average losses per period (in black) and a linear regression line (in blue) with corresponding confidence intervals (in grey); each panel represents one state: Hachemeister data.

Illustration 2 (Workers’ Compensation Insurance: losses). The data set is from the National Council on Compensation Insurance (USA) and contains losses due to permanent partial disability (see Klugman (1992)). 121 occupation or risk classes are observed over a period of 7 years. The variable of interest is the Loss paid out (on a yearly basis) per risk class. Possible explanatory variables are Year and Payroll. Frees et al. (2001) and Antonio and Beirlant (2007) present mixed models for the pure premium, PP=Loss/Payroll. For a random subsample of 10 risk classes, Figure 2 shows the time series plot of Loss (left) and corresponding Payroll (right).

[Figure 2 omitted: two time series panels; left: Loss against Year, right: Payroll against Year.]

Figure 2: Time series plot of losses (left) and payroll (right) for a random sample of 10 risk classes: workers’ compensation data (losses).

Illustration 3 (Workers’ Compensation Insurance: frequencies). The data are from Klugman (1992) (see Scollnik (1996), Makov et al. (1996) and Antonio and Beirlant (2007) for further discussion). Frequency counts in workers’ compensation insurance are observed on a yearly basis for 133 occupation classes followed during 7 years. Count is the response variable of interest. Possible explanatory variables are Year and Payroll, a measure of exposure denoting scaled payroll totals adjusted for inflation. Figure 3 shows exploratory plots for a random subsample of 10 occupation classes. Statistical modeling should take into account the dependence between observations on the same occupation class and reflect the heterogeneity between different classes. In ratemaking (or tarification) an obvious question for this example would be: ‘What is the expected number of claims for a risk class in the next observation period, given the observed claims history of this particular risk class and the whole portfolio?’. Since the response variable in this example is claim frequency, we will analyze this data set within the context of Generalized Linear Mixed Models instead of Linear Mixed Models (in Chapter XXX) [Reference to Chapter on non–linear mixed models.].

[Figure 3 omitted: two time series panels; left: Count against Year, right: Payroll against Year.]

Figure 3: Time series plot of counts (left) and payroll (right) for a random sample of 10 risk classes: workers’ compensation data (counts).

Illustration 4 (Hierarchical data structures). With panel data a group of subjects is followed over time, as in Illustrations 2 and 3. This is a basic and widely studied example of hierarchical data. Obviously, more complex structures may occur. Insurance data often come with some kind of inherent hierarchy. Motor insurance policies grouped in zip codes within counties within states are one example. Workers’ compensation or fire insurance policies operating in similar industries or branches are another. Consider e.g. the manufacturing versus education branch, with employees in manufacturing firms reporting higher claim frequencies, and restaurants versus stores, with restaurants having a higher frequency of fire incidents than stores, and so on. A policyholder holding multiple policies (e.g. for theft, motor, flooding, . . .), followed over time, within the same company, is an example of a hierarchical data structure studied in the context of multidimensional credibility (see Bühlmann and Gisler (2005)). Another detailed multilevel analysis (going beyond the panel data structure) is Antonio et al. (2010). These authors model claim count statistics for vehicles insured under a fleet policy. Fleet policies are umbrella–type policies issued to customers whose insurance covers more than a single vehicle. The hierarchical or multilevel structure of the data is as follows: vehicles ($v$) observed over time ($t$), nested within fleets ($f$), with policies issued by insurance companies ($c$). Multilevel models allow for incorporating the hierarchical structure of the data by specifying random effects at vehicle, fleet and company levels. These random effects represent unobservable characteristics at each level. At vehicle level, the missions assigned to a vehicle or unobserved driver behavior may influence the riskiness of a vehicle. At fleet level, guidelines on driving hours, mechanical check-ups, loading instructions and so on, may influence the number of accidents reported. At insurance company level, underwriting and claim settlement practices may affect claims. Moreover, random effects allow a posteriori updating of an a priori tariff, by taking into account the past performance of vehicle, fleet and company. As such, these models are relevant for a posteriori or experience rating with clustered data. See Antonio et al. (2010) and Antonio and Valdez (2012) for further discussion.

Illustration 5 (Non–nested or cross–classified data structures). Data may also be structured in levels which are not nested or hierarchically structured, but instead act next to each other. An example is the data set from Dannenburg et al. (1996) on private loans from a credit insurer. The data are payments of the credit insurer to several banks for covering losses caused by clients who were no longer able to pay off their loans. These payments are categorized by civil status of the debtors and their work experience. The civil status is single (1), divorced (2) or other (3), and the work experience is less than two years (< 2, category 1), from 2 up to 10 years (≥ 2 and < 10, category 2) and ten or more years (≥ 10, category 3). Table 1 shows the number of clients and the average loss paid per risk class.

            Number of payments            Average loss
                experience                 experience
status       1      2      3           1        2        3
1           40     43     41      180.39   246.71   261.58
2           54     53     48      172.05   232.67   253.22
3           39     39     44      212.30   269.56   366.61

Table 1: Number of payments (left) and average loss (right) per combination of status and experience risk class: credit insurance data.

Boxplots of the observed payments per risk class are in Figure 4. Using linear mixed
models we estimate the expected loss per risk category, and compare our results with the
credibility premiums derived by Dannenburg et al. (1996).

[Figure 4 omitted: boxplots of Payment (y-axis) for each Status:Experience combination (x-axis: 1:1, 1:2, 1:3, 2:1, 2:2, 2:3, 3:1, 3:2, 3:3).]

Figure 4: Boxplots of payments versus combination of status and experience: credit insurance data.

Illustration 6 (Loss reserving). Zhang et al. (2012) analyze data from the workers’ compensation line of business of 10 large insurers, as reported to the National Association of Insurance Commissioners ². Common accident years available are from 1988 to 1997. Losses are evaluated at 12–month intervals, with the highest available development age being 120 months. The data have a multilevel structure with losses measured repeatedly over time, among companies and accident years. A plot of the cumulative loss over time for each company clearly shows a nonlinear growth pattern, see Figure 5. Predicting the development of these losses beyond the range of the available data is the major challenge in loss reserving. Figure 5 reveals that the use of a nonlinear growth curve model is an interesting path to explore. Random effects will be included to structure heterogeneity among companies and between accident years.

[Figure 5 omitted: ten panels (Comp #1 to Comp #10); x-axis: Evaluation time; y-axis: Cumulative loss; one colored line per accident year.]

Figure 5: Observed growth of cumulative losses for the 10 companies in study. The colored lines represent accident years.

2 Linear mixed models

This Section is based on Verbeke and Molenberghs (2000), McCulloch and Searle (2001),
Ruppert et al. (2003), Czado (2004) and Frees (2004a).

² NAIC is a consortium of state–level insurance regulators in the United States.

2.1 Model assumptions and notation

The basic linear model specifies $E[y] = X\beta$ with $y$ the response vector, $\beta$ the vector of regression parameters and $X$ the model design matrix. In traditional statistical parlance, all parameters in $\beta$ are fixed, i.e. no distribution is assigned to them. They are unknown, but fixed constants that should be estimated. In a linear mixed model we start from $X\beta$, but add $Zu$ to it, where $Z$ is a model matrix, corresponding with a vector of random effects $u$. A distribution is specified for this random effects vector $u$ with mean zero and covariance matrix $D$. As discussed in Section 1 and illustrated below, these random effects structure between–cluster heterogeneity and within–cluster dependence. All together, textbook notation for linear mixed models is as follows ³

$$\begin{aligned}
y &= X\beta + Zu + \varepsilon, \\
u &\sim (0, D), \\
\varepsilon &\sim (0, \Sigma),
\end{aligned} \tag{13}$$

with $\varepsilon$ an $N \times 1$ vector of error terms with covariance matrix $\Sigma$ (see below for examples), which is independent of $u$. This is the hierarchical specification of a linear mixed model. For given $u$ the conditional mean and variance are

$$\begin{aligned}
E[y \mid u] &= X\beta + Zu, \\
\mathrm{Var}[y \mid u] &= \Sigma.
\end{aligned} \tag{14}$$

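The combined, unconditional or marginal model states

$$y \sim (X\beta, \; V := ZDZ' + \Sigma), \tag{15}$$

showing that fixed effects enter the (implied) mean of $y$ and random effects structure the (implied) covariance matrix of $y$.
Usually, normality is assumed for $u$ and $\varepsilon$, thus

$$\begin{pmatrix} u \\ \varepsilon \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} D & 0 \\ 0 & \Sigma \end{pmatrix} \right). \tag{16}$$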

With these distributional assumptions the hierarchical LMM becomes

$$\begin{aligned}
y \mid u &\sim N(X\beta + Zu, \Sigma), \\
u &\sim N(0, D).
\end{aligned} \tag{17}$$

This implies the marginal model $y \sim N(X\beta, V)$, but not vice versa. When interest is only in the fixed effects parameters $\beta$ the marginal model can be used. With explicit interest in $\beta$ and $u$ the specification in (13) and (17) should be used.
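
The step from the hierarchical specification (13) to the marginal moments in (15) can be spelled out. The lines below are a short sketch that uses only what (13) states: $u$ and $\varepsilon$ have mean zero, covariance matrices $D$ and $\Sigma$, and are independent, so that the cross–covariance terms vanish.

```latex
\begin{aligned}
E[y] &= X\beta + Z\,E[u] + E[\varepsilon] = X\beta, \\
\mathrm{Var}[y] &= \mathrm{Var}[Zu + \varepsilon]
  = Z\,\mathrm{Var}[u]\,Z' + \mathrm{Var}[\varepsilon]
  + Z\,\mathrm{Cov}[u, \varepsilon] + \mathrm{Cov}[\varepsilon, u]\,Z' \\
  &= ZDZ' + \Sigma = V.
\end{aligned}
```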

³ The notation $u \sim (0, D)$ implies $E[u] = 0$ and $\mathrm{Var}[u] = D$.

Illustrations 7 and 8 below focus on particular examples of 2 and 3 level data and explain
in detail the structure of vectors and matrices in (13) and (15).

Illustration 7 (A 2–level model for longitudinal data.). $y_{ij}$ represents the $j$th measurement on a subject $i$ (with $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$). $m$ is the number of subjects under consideration and $n_i$ the number of observations registered on subject $i$. $x_{ij}$ ($p \times 1$) is a column vector with fixed effects’ covariate information from observation $j$ on subject $i$. Correspondingly, $z_{ij}$ ($q \times 1$) is a column vector with covariate information corresponding with random effects. $\beta$ ($p \times 1$) is a column vector with fixed effects parameters and $u_i$ ($q \times 1$) is a column vector with random effects regression parameters. These are subject–specific and make it possible to model heterogeneity between subjects. The combined model is

$$y_{ij} = \underbrace{x_{ij}'\beta}_{\text{fixed}} + \underbrace{z_{ij}'u_i}_{\text{random}} + \underbrace{\varepsilon_{ij}}_{\text{random}}. \tag{18}$$

The distributional assumptions for the random parts in (18) are

$$\begin{aligned}
u_i &\sim (0, G), & G &\in \mathbb{R}^{q \times q}, \\
\varepsilon_i &\sim (0, \Sigma_i), & \Sigma_i &\in \mathbb{R}^{n_i \times n_i}.
\end{aligned} \tag{19}$$

The covariance matrix $G$ is left unspecified, i.e. no particular structure is implied. Various structures are available for $\Sigma_i$. Very often just a simple diagonal matrix is used: $\Sigma_i := \sigma^2 I_{n_i}$. However, when the inclusion of random effects is not enough to capture the dependence between measurements on the same subject, we can add serial correlation to the model and specify $\Sigma_i$ as non–diagonal (e.g. unstructured, Toeplitz or autoregressive structure, see Verbeke and Molenberghs (2000) for more discussion). $u_1, \ldots, u_m, \varepsilon_1, \ldots, \varepsilon_m$ are independent. Typically, normality is assumed for both vectors, as in (17). In vector notation we specify

$$\begin{aligned}
y_i &= X_i\beta + Z_i u_i + \varepsilon_i, \quad i = 1, \ldots, m, \\
u_i &\sim (0, G), \\
\varepsilon_i &\sim (0, \Sigma_i),
\end{aligned} \tag{20}$$

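where

$$X_i := \begin{pmatrix} x_{i1}' \\ \vdots \\ x_{in_i}' \end{pmatrix} \in \mathbb{R}^{n_i \times p}, \quad
Z_i = \begin{pmatrix} z_{i1}' \\ \vdots \\ z_{in_i}' \end{pmatrix} \in \mathbb{R}^{n_i \times q}, \quad
y_i = \begin{pmatrix} y_{i1} \\ \vdots \\ y_{in_i} \end{pmatrix} \in \mathbb{R}^{n_i \times 1}. \tag{21}$$

Combining all subjects or clusters $i = 1, \ldots, m$, (13) is the matrix formulation of this LMM for longitudinal data (with $N = \sum_{i=1}^{m} n_i$ the total number of observations)

$$\begin{gathered}
y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \in \mathbb{R}^{N \times 1}, \quad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \in \mathbb{R}^{N \times p}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{pmatrix} \in \mathbb{R}^{N \times 1}, \\
Z = \begin{pmatrix}
Z_1 & 0_{n_1 \times q} & \cdots & 0_{n_1 \times q} \\
0_{n_2 \times q} & Z_2 & & \vdots \\
\vdots & & \ddots & \\
0_{n_m \times q} & \cdots & & Z_m
\end{pmatrix} \in \mathbb{R}^{N \times (m \cdot q)}, \quad
u = \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} \in \mathbb{R}^{(m \cdot q) \times 1}.
\end{gathered} \tag{22}$$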

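The covariance matrices of the combined random effects vector $u$ on the one hand, and the combined residual vector $\varepsilon$ on the other hand, are specified as:

$$D = \begin{pmatrix} G & & \\ & \ddots & \\ & & G \end{pmatrix} \in \mathbb{R}^{(m \cdot q) \times (m \cdot q)}, \quad
\Sigma = \begin{pmatrix} \Sigma_1 & & \\ & \ddots & \\ & & \Sigma_m \end{pmatrix} \in \mathbb{R}^{N \times N}. \tag{23}$$

Covariance matrix $V$ in this particular example is block diagonal and given by

$$\begin{aligned}
V &= ZDZ' + \Sigma \\
&= \begin{pmatrix} Z_1 G Z_1' + \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & Z_m G Z_m' + \Sigma_m \end{pmatrix} \\
&= \begin{pmatrix} V_1 & & \\ & \ddots & \\ & & V_m \end{pmatrix},
\end{aligned} \tag{24}$$

with $V_i = Z_i G Z_i' + \Sigma_i$.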
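
The stacking in this illustration can be checked numerically on a toy case. The base R sketch below builds $Z$, $D$ and $\Sigma$ for $m = 2$ subjects with a random intercept and a random slope ($q = 2$), assuming $\Sigma_i = \sigma^2 I_{n_i}$, and compares $ZDZ' + \Sigma$ with the block diagonal matrix assembled from the $V_i$. All numbers are made up, and `block_diag` is a hypothetical helper written here to avoid depending on any package.

```r
# Toy check that V = Z D Z' + Sigma is block diagonal with blocks
# V_i = Z_i G Z_i' + Sigma_i. All values below are hypothetical.
block_diag <- function(blocks) {
  # place a list of matrices along the diagonal, zeros elsewhere
  nr  <- sum(vapply(blocks, nrow, integer(1)))
  nc  <- sum(vapply(blocks, ncol, integer(1)))
  out <- matrix(0, nr, nc)
  r <- 0
  k <- 0
  for (b in blocks) {
    out[r + seq_len(nrow(b)), k + seq_len(ncol(b))] <- b
    r <- r + nrow(b)
    k <- k + ncol(b)
  }
  out
}

n_i    <- c(2, 3)                                  # observations per subject
times  <- list(c(1, 2), c(1, 2, 3))                # covariate carrying the random slope
Z_i    <- lapply(times, function(tt) cbind(1, tt)) # n_i x q design matrix per subject
G      <- matrix(c(4, 1, 1, 2), nrow = 2)          # q x q covariance matrix of u_i
sigma2 <- 0.5                                      # Sigma_i = sigma2 * I (assumption)

Z     <- block_diag(Z_i)                           # N x (m q)
D     <- block_diag(list(G, G))                    # (m q) x (m q)
Sigma <- diag(sigma2, sum(n_i))                    # N x N
V     <- Z %*% D %*% t(Z) + Sigma

# per-subject blocks, assembled separately for comparison
V_i <- lapply(Z_i, function(Zi) Zi %*% G %*% t(Zi) + diag(sigma2, nrow(Zi)))
```

The zero off–diagonal blocks of `V` express that observations on different subjects are uncorrelated, while the blocks `V_i` carry the within–subject dependence.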

Illustration 8 (A 3–level example.). $y_{ijk}$ is the response variable of interest, as observed for, say, vehicle $k$, insured in fleet $j$ by insurance company $i$. At vehicle level (or: level 1) we model this response as:

$$y_{ijk} = z_{1,ijk}'\beta_{ij} + x_{1,ijk}'\beta_1 + \varepsilon_{1,ijk}. \tag{25}$$

Hereby, predictors $z_{1,ijk}$ and $x_{1,ijk}$ may depend on insurance company, fleet or vehicle. $\beta_1$ is a vector of regression parameters which do not vary by company or fleet; they are fixed effects regression parameters. Parameters $\beta_{ij}$ vary by company and fleet. We model them in a level 2–equation:

$$\beta_{ij} = Z_{2,ij}\gamma_i + X_{2,ij}\beta_2 + \varepsilon_{2,ij}. \tag{26}$$

$X_{2,ij}$ and $Z_{2,ij}$ may depend on company or fleet, but not on the insured vehicle. The regression parameters in $\gamma_i$ are company–specific and modeled in (27):

$$\gamma_i = X_{3,i}\beta_3 + \varepsilon_{3,i}, \tag{27}$$

where the predictors in $X_{3,i}$ may depend on company, but not on fleet or vehicle. The combined level 1, 2 and 3 models lead to the following model specification:

$$\begin{aligned}
y_{ijk} &= z_{1,ijk}'\left(Z_{2,ij}(X_{3,i}\beta_3 + \varepsilon_{3,i}) + X_{2,ij}\beta_2 + \varepsilon_{2,ij}\right) + x_{1,ijk}'\beta_1 + \varepsilon_{1,ijk} \\
&= x_{ijk}'\beta + z_{ijk}'u_{ij} + \varepsilon_{1,ijk},
\end{aligned} \tag{28}$$

where $x_{ijk}' = (x_{1,ijk}' \;\; z_{1,ijk}'X_{2,ij} \;\; z_{1,ijk}'Z_{2,ij}X_{3,i})$, $\beta = (\beta_1' \;\; \beta_2' \;\; \beta_3')'$, $z_{ijk}' = (z_{1,ijk}' \;\; z_{1,ijk}'Z_{2,ij})$ and $u_{ij} = (\varepsilon_{2,ij}' \;\; \varepsilon_{3,i}')'$. Formulating this 3–level model in matrix notation follows from stacking all observations $y_{ijk}$.

More examples of LMM specifications are in McCulloch and Searle (2001). A standard
notation for a k–level model is in Frees (2004a) (Appendix 5A).

2.2 The structure of random effects

Since the random effects $u$ often correspond to factor predictors, the design matrix $Z$ is
often highly sparse, with a high proportion of its elements being exactly zero. Moreover, the
covariance matrix $D$ is highly structured and depends on some parameter vector $\theta$ that
is to be estimated.

Single random effect per level. This is the simplest yet most common case
where the random effect corresponds to a certain level of a single grouping factor.
For example, we may have the state indicator in the model and each state has its
own intercept, i.e. y ~ (1|state) (in R parlance). We illustrate this structure in
Section 3 with the workers’ compensation losses data.

Multiple random effects per level. Another common case is that the model has
both random intercepts and random slopes that vary by some grouping factor. For
example, each state in the model has its own intercept and also its own slope with
respect to some predictor, i.e., y ~ (1 + time|state). In general, the multiple
random effects are correlated, and so the matrix D is not diagonal. We illustrate
this structure in Section 3 with the workers’ compensation losses data.

Nested random effects. In the nested classification, some levels of one factor
occur only within certain levels of another factor. For example, we may have
observations within each county, and then the counties within each state. The county
from state A never occurs for state B, so counties are nested within states, forming
a hierarchical structure, i.e., y ~ (1|state/county). Antonio et al. (2010) is an
example of this type of structuring.


Crossed random effects. This happens when each level of each factor may occur
with each level of each other factor. For example, we may have both state
and car make in the model; cars of different makes can occur within each state, i.e.,
y ~ (1|state) + (1|make). The credit insurance example in Section 3 is an
example of crossed random effects.
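
The structures above can be made concrete with a small sketch in base R. The toy data
(three states, two observations each) and the numerical values in the 2 x 2 covariance
block are hypothetical; the point is only to show how a grouping factor produces a sparse
Z and how correlated intercepts and slopes produce a block diagonal D.

```r
# Hypothetical toy data: 6 observations in 3 states (names are illustrative).
state <- factor(c("A", "A", "B", "B", "C", "C"))
time  <- c(0, 1, 0, 1, 0, 1)

# Random intercepts, y ~ (1 | state): one indicator column per state.
Z_int <- model.matrix(~ 0 + state)
sparsity <- mean(Z_int == 0)   # proportion of exact zeros in Z

# Random intercepts and slopes, y ~ (1 + time | state): two columns per state,
# reordered as (intercept A, slope A, intercept B, slope B, ...).
Z_slope <- cbind(Z_int, Z_int * time)[, c(1, 4, 2, 5, 3, 6)]

# D is block diagonal: the same 2 x 2 block (illustrative values) for each state.
D_block <- matrix(c(0.9, -0.01, -0.01, 0.003), nrow = 2)
D <- kronecker(diag(nlevels(state)), D_block)

print(Z_int)
print(sparsity)
print(D)
```

With many groups the proportion of zeros in Z approaches one, which is why software for
mixed models typically relies on sparse matrix methods.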

2.3 Parameter estimation, inference and prediction

Mixed models use a combination of fixed effects regression parameters, random effects
and covariance matrix parameters (also called: variance components). For example, in
the varying intercepts example from (4) and (5), $\beta_{1,0}$ and $\beta_{1,1}$ are regression parameters
corresponding with fixed effects, $\sigma_1^2$ and $\sigma_2^2$ are variance components and $\epsilon_{2,i}$ ($i = 1, \ldots, m$)
are the random effects. We will use standard statistical methodology, like maximum
likelihood, to estimate parameters in an LMM. For the random effects we apply statistical
knowledge concerning prediction problems, see McCulloch and Searle (2001) (Chapter 9)
for an overview. The difference in terminology stems from the non–randomness of the
parameters versus the randomness of the random effects.

We first derive an estimator for the fixed effects parameters in β and a predictor for the
random effects in u, under the assumption of known covariance parameters in V (see
(15)).

Estimating β. The Generalized Least Squares ([GLS]) estimator – which coincides
with the maximum likelihood estimator ([MLE]) under normality (as in (17)) – of β is:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}' \mathbf{V}^{-1} \mathbf{y}. \qquad (29)$$

See Frees (2004a) or Czado (2004) for a formal derivation of this result.
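
As a minimal numerical sketch of (29), the following base R code computes the GLS
estimator for a simulated random intercepts model. The data, the cluster sizes and the
variance components (treated as known here) are all hypothetical.

```r
set.seed(1)
# Simulated toy data (all values hypothetical): 4 clusters of 5 observations.
m <- 4; n <- 5
cluster <- rep(1:m, each = n)
X <- cbind(1, rnorm(m * n))                 # fixed effects design
Z <- model.matrix(~ 0 + factor(cluster))    # random intercepts design
sigma2_u <- 2; sigma2_e <- 1                # variance components, assumed known
V <- sigma2_u * Z %*% t(Z) + sigma2_e * diag(m * n)   # V = Z D Z' + Sigma
y <- X %*% c(1, 0.5) + Z %*% rnorm(m, sd = sqrt(sigma2_u)) +
  rnorm(m * n, sd = sqrt(sigma2_e))

# GLS estimator (29); solve(V, .) avoids forming the inverse of V explicitly.
Vinv_X <- solve(V, X)
Vinv_y <- solve(V, y)
beta_hat <- solve(t(X) %*% Vinv_X, t(X) %*% Vinv_y)
print(beta_hat)
```

Solving linear systems instead of inverting V is a numerical design choice, not part of
the definition in (29).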

Predicting u. In the sense of minimal Mean Squared Error of Prediction ([MSEP])
the best predictor ([BP]) of u is the conditional mean E[u|Y]. This predictor obviously
requires knowledge of the conditional distribution u|Y. The BP is often simplified by
restricting the predictor to be a linear function of Y: the Best Linear Predictor ([BLP]).
The BLP of a random vector u is

$$\text{BLP}[\mathbf{u}] = \hat{\mathbf{u}} = E[\mathbf{u}] + \mathbf{C} \mathbf{V}^{-1} (\mathbf{y} - E[\mathbf{y}]), \qquad (30)$$

where $\mathbf{V} = \text{Var}(\mathbf{y})$ and $\mathbf{C} = \text{Cov}(\mathbf{u}, \mathbf{y}')$. BP(u) and BLP(u) are unbiased, in the sense
that their expected value equals E[u]. Normality is not required in BP or BLP, but with
$(\mathbf{y}', \mathbf{u}')'$ multivariate normally distributed, the BP and BLP coincide. See McCulloch and
Searle (2001) (Chapter 9) for more details.

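In the context of the LMM sketched in (17) the predictor of u is usually called the Best
Linear Unbiased Predictor ([BLUP]). Robinson (1991) describes several ways to derive
this BLUP. For instance, under normality assumptions:

$$\begin{aligned}
\text{Cov}(\mathbf{y}, \mathbf{u}') &= \text{Cov}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\epsilon}, \mathbf{u}') \\
&= \text{Cov}(\mathbf{X}\boldsymbol{\beta}, \mathbf{u}') + \mathbf{Z}\,\text{Var}(\mathbf{u}) + \text{Cov}(\boldsymbol{\epsilon}, \mathbf{u}') \\
&= \mathbf{Z}\mathbf{D},
\end{aligned}$$

which leads to the multivariate normal distribution

$$\begin{pmatrix} \mathbf{y} \\ \mathbf{u} \end{pmatrix} \sim N\left( \begin{pmatrix} \mathbf{X}\boldsymbol{\beta} \\ \mathbf{0} \end{pmatrix}, \begin{pmatrix} \mathbf{V} & \mathbf{Z}\mathbf{D} \\ \mathbf{D}\mathbf{Z}' & \mathbf{D} \end{pmatrix} \right). \qquad (31)$$

Using either properties of this distribution (see footnote 4) or the result in (30), the BLUP of u follows:

$$\text{BLUP}(\mathbf{u}) := \hat{\mathbf{u}} = \mathbf{D}\mathbf{Z}' \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}). \qquad (32)$$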

Of course, (32) relies on the (unknown) vector of fixed effects β, as well as on unknown
covariance parameters in V . Replacing both with their estimates, we call the BLUP an
empirical or estimated BLUP. Estimated BLUPs are confronted with multiple sources
of variability: variability from the estimation of (β, u) and from the estimation of V .
Histograms and scatter plots of components of û are often used to detect outlying clusters,
or to visualize between–cluster heterogeneity.

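A unified approach: Henderson's justification. Maximizing the joint log–likelihood
of $(\mathbf{y}', \mathbf{u}')'$ (see assumptions (17)) with respect to (β, u) leads to Henderson's mixed model
equations. The joint density is

$$\begin{aligned}
f(\mathbf{y}, \mathbf{u}) &= f(\mathbf{y}|\mathbf{u}) \cdot f(\mathbf{u}) \\
&\propto \exp\left( -\tfrac{1}{2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u})' \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}) \right) \cdot \exp\left( -\tfrac{1}{2} \mathbf{u}' \mathbf{D}^{-1} \mathbf{u} \right).
\end{aligned} \qquad (33)$$

It is therefore enough to minimize

$$Q(\boldsymbol{\beta}, \mathbf{u}) := (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u})' \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}) + \mathbf{u}' \mathbf{D}^{-1} \mathbf{u}, \qquad (34)$$

which corresponds to solving the set of equations

$$\frac{\partial}{\partial \boldsymbol{\beta}} Q(\boldsymbol{\beta}, \mathbf{u}) = \mathbf{0} \;\text{ and }\; \frac{\partial}{\partial \mathbf{u}} Q(\boldsymbol{\beta}, \mathbf{u}) = \mathbf{0}
\;\Leftrightarrow\;
\begin{pmatrix} \mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{X} & \mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{Z} \\ \mathbf{Z}' \boldsymbol{\Sigma}^{-1} \mathbf{X} & \mathbf{Z}' \boldsymbol{\Sigma}^{-1} \mathbf{Z} + \mathbf{D}^{-1} \end{pmatrix}
\begin{pmatrix} \tilde{\boldsymbol{\beta}} \\ \tilde{\mathbf{u}} \end{pmatrix}
= \begin{pmatrix} \mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{y} \\ \mathbf{Z}' \boldsymbol{\Sigma}^{-1} \mathbf{y} \end{pmatrix}. \qquad (35)$$

Footnote 4: Namely, with
$$\mathbf{X} = \begin{pmatrix} \mathbf{Y} \\ \mathbf{Z} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{\mu}_Y \\ \boldsymbol{\mu}_Z \end{pmatrix}, \begin{pmatrix} \boldsymbol{\Sigma}_Y & \boldsymbol{\Sigma}_{YZ} \\ \boldsymbol{\Sigma}_{ZY} & \boldsymbol{\Sigma}_Z \end{pmatrix} \right)$$
we know $\mathbf{Z}|\mathbf{Y} \sim N(\boldsymbol{\mu}_{Z|Y}, \boldsymbol{\Sigma}_{Z|Y})$, where $\boldsymbol{\mu}_{Z|Y} = \boldsymbol{\mu}_Z + \boldsymbol{\Sigma}_{ZY} \boldsymbol{\Sigma}_Y^{-1} (\mathbf{Y} - \boldsymbol{\mu}_Y)$ and
$\boldsymbol{\Sigma}_{Z|Y} = \boldsymbol{\Sigma}_Z - \boldsymbol{\Sigma}_{ZY} \boldsymbol{\Sigma}_Y^{-1} \boldsymbol{\Sigma}_{YZ}$.

(29) and (32), with β in (32) replaced by β̂ from (29), solve this system of equations.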
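
The equivalence between Henderson's equations and the pair (29), (32) can be checked
numerically. The sketch below uses base R with simulated designs; the dimensions and the
values of D and Σ are hypothetical and treated as known.

```r
set.seed(2)
m <- 3; n <- 4; N <- m * n
cluster <- rep(1:m, each = n)
X <- cbind(1, rnorm(N))
Z <- model.matrix(~ 0 + factor(cluster))
D <- 1.5 * diag(m)            # Var(u), assumed known
Sigma <- 0.8 * diag(N)        # Var(eps), assumed known
V <- Z %*% D %*% t(Z) + Sigma
y <- rnorm(N)                 # any response vector illustrates the identity

# Route 1: GLS estimator (29) and BLUP (32) evaluated at beta_hat.
Vinv <- solve(V)
beta_hat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
u_hat <- D %*% t(Z) %*% Vinv %*% (y - X %*% beta_hat)

# Route 2: Henderson's mixed model equations (35).
Sinv <- solve(Sigma)
lhs <- rbind(cbind(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% Z),
             cbind(t(Z) %*% Sinv %*% X, t(Z) %*% Sinv %*% Z + solve(D)))
rhs <- rbind(t(X) %*% Sinv %*% y, t(Z) %*% Sinv %*% y)
sol <- solve(lhs, rhs)

# Largest absolute difference between the two routes (numerically zero).
print(max(abs(sol - rbind(beta_hat, u_hat))))
```

Route 2 only requires the inverses of Σ and D, which are typically much simpler to
compute than the inverse of V.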

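More on prediction. With β̂ from (29) and û from (32), the profile of cluster i is
predicted by

$$\begin{aligned}
\hat{\mathbf{y}}_i &:= \mathbf{X}_i \hat{\boldsymbol{\beta}} + \mathbf{Z}_i \hat{\mathbf{u}}_i \\
&= \mathbf{X}_i \hat{\boldsymbol{\beta}} + \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i' \mathbf{V}_i^{-1} (\mathbf{Y}_i - \mathbf{X}_i \hat{\boldsymbol{\beta}}) \\
&= \boldsymbol{\Sigma}_i \mathbf{V}_i^{-1} \mathbf{X}_i \hat{\boldsymbol{\beta}} + (\mathbf{I}_{n_i} - \boldsymbol{\Sigma}_i \mathbf{V}_i^{-1}) \mathbf{Y}_i,
\end{aligned} \qquad (36)$$

using $\mathbf{V}_i = \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i' + \boldsymbol{\Sigma}_i$ and $n_i$ the cluster size. $\hat{\mathbf{y}}_i$ is a weighted mean of the global
profile $\mathbf{X}_i \hat{\boldsymbol{\beta}}$ and the data observed on cluster i, $\mathbf{y}_i$. $\hat{\mathbf{y}}_i$ is a so–called shrinkage estimator.
Actuaries will recognize a credibility type formula in (36).
The prediction of a future observation is discussed in detail in Frees (2004b)
(Section 4.4). The case of non–diagonal residual covariance matrices $\boldsymbol{\Sigma}_i$ requires special
attention. For instance, with panel data the BLUP for $y_{i,T_i+1}$ is
$\mathbf{x}_{i,T_i+1}' \hat{\boldsymbol{\beta}} + \mathbf{z}_{i,T_i+1}' \hat{\mathbf{u}}_i + \text{BLUP}(\epsilon_{i,T_i+1})$.
From (30) we understand that the last term in this expression is zero
when $\text{Cov}(\epsilon_{i,T_i+1}, \boldsymbol{\epsilon}_i) = \mathbf{0}$. This is not the case when serial correlation is taken into
account. Chapter XXX of this book (on Credibility and Regression Modeling) carefully
explains this kind of prediction problem.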
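
To make the credibility interpretation of (36) concrete, the sketch below evaluates it for
an intercept–only model with a random intercept and independent errors. In that special
case the matrix formula reduces to a weighted mean with credibility weight
$z_i = n_i \sigma_u^2 / (\sigma_\epsilon^2 + n_i \sigma_u^2)$. All numerical inputs are illustrative assumptions.

```r
# Illustrative values (assumptions, not estimates from the chapter's data).
sigma2_u <- 0.9; sigma2_e <- 0.3
beta0_hat <- -4.3                       # global (portfolio) mean
y_i <- c(-3.2, -3.6, -3.4, -3.9)        # data observed on cluster i
n_i <- length(y_i)

# Matrix form of (36) for an intercept-only, random intercepts model.
X_i <- matrix(1, n_i, 1)
Z_i <- matrix(1, n_i, 1)
Sigma_i <- sigma2_e * diag(n_i)
V_i <- sigma2_u * Z_i %*% t(Z_i) + Sigma_i
W <- Sigma_i %*% solve(V_i)             # weight on the global profile
y_hat <- (W %*% X_i) * beta0_hat + (diag(n_i) - W) %*% y_i

# Scalar credibility form: weight z_i on the cluster mean.
z_i <- n_i * sigma2_u / (sigma2_e + n_i * sigma2_u)
y_cred <- (1 - z_i) * beta0_hat + z_i * mean(y_i)

print(z_i)
print(cbind(matrix_form = as.numeric(y_hat), credibility_form = y_cred))
```

The weight on the cluster's own experience grows with the cluster size and with the
between–cluster variance relative to the residual variance.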

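Estimating variance parameters. The parameters or variance components used in
V are in general unknown and should be estimated from the data. With θ the vector
of unknown parameters used in $\mathbf{V} = \mathbf{Z}\mathbf{D}(\boldsymbol{\theta})\mathbf{Z}' + \boldsymbol{\Sigma}(\boldsymbol{\theta})$, the log–likelihood for (β, θ) is
(with c a constant)

$$\begin{aligned}
\ell(\boldsymbol{\beta}, \boldsymbol{\theta}) &= \log\{L(\boldsymbol{\beta}, \boldsymbol{\theta})\} \\
&= -\tfrac{1}{2} \left\{ \ln|\mathbf{V}(\boldsymbol{\theta})| + (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})' \mathbf{V}(\boldsymbol{\theta})^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right\} + c.
\end{aligned} \qquad (37)$$

Maximizing (37) with respect to β and with θ fixed, we get

$$\hat{\boldsymbol{\beta}}(\boldsymbol{\theta}) = (\mathbf{X}' \mathbf{V}(\boldsymbol{\theta})^{-1} \mathbf{X})^{-1} \mathbf{X}' \mathbf{V}(\boldsymbol{\theta})^{-1} \mathbf{y}. \qquad (38)$$

We obtain the so–called profile log–likelihood by replacing β in (37) with β̂ from (38)

$$\begin{aligned}
\ell_p(\boldsymbol{\theta}) &:= \ell(\hat{\boldsymbol{\beta}}, \boldsymbol{\theta}) \\
&= -\tfrac{1}{2} \left\{ \ln|\mathbf{V}(\boldsymbol{\theta})| + (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}(\boldsymbol{\theta}))' \mathbf{V}(\boldsymbol{\theta})^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}(\boldsymbol{\theta})) \right\}.
\end{aligned} \qquad (39)$$

Maximizing this profile log–likelihood with respect to θ gives the maximum likelihood
estimates $\hat{\boldsymbol{\theta}}_{MLE}$ of the variance components in θ.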
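With LMMs, restricted (or residual) maximum likelihood (REML) is a popular alternative
to estimate θ. REML accounts for the degrees of freedom used for fixed effects
estimation. McCulloch and Searle (2001) (Section 6.10) give an overview of important
arguments in the discussion 'ML versus REML?'. For example, estimates with REML (for
balanced data) are minimal variance unbiased under normality (see footnote 5), and are invariant to the
value of β. The REML estimation of θ is based on the marginal log–likelihood obtained
by integrating out the fixed effects in β:

$$\ell_r(\boldsymbol{\theta}) := \ln\left( \int L(\boldsymbol{\beta}, \boldsymbol{\theta})\, d\boldsymbol{\beta} \right), \qquad (41)$$

where (see Czado (2004))

$$\int L(\boldsymbol{\beta}, \boldsymbol{\theta})\, d\boldsymbol{\beta} = \int \frac{1}{(2\pi)^{N/2}} |\mathbf{V}(\boldsymbol{\theta})|^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})' \mathbf{V}(\boldsymbol{\theta})^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right) d\boldsymbol{\beta},$$

which, after carrying out the integration and taking logarithms, gives

$$\ell_r(\boldsymbol{\theta}) = \ell_p(\boldsymbol{\theta}) - \tfrac{1}{2} \ln\left| \mathbf{X}' \mathbf{V}(\boldsymbol{\theta})^{-1} \mathbf{X} \right| + \text{constants}. \qquad (42)$$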
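
The difference between (39) and (42) is easiest to see in the simplest possible case: an
i.i.d. normal sample with unknown mean, where X is a column of ones and
$\mathbf{V}(\theta) = \sigma^2 \mathbf{I}$. The sketch below (base R, hypothetical data) maximizes both criteria numerically
and compares the results with the closed forms, which divide the sum of squares by N and by N − 1 respectively.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)     # hypothetical sample
N <- length(x)
ss <- sum((x - mean(x))^2)

# Closed-form ML and REML estimators of the variance.
sigma2_ml   <- ss / N
sigma2_reml <- ss / (N - 1)         # equals var(x)

# Profile log-likelihood (39) and restricted log-likelihood (42), up to constants,
# for X = column of ones and V = s2 * I, so that |X' V^{-1} X| = N / s2.
l_p <- function(s2) -0.5 * (N * log(s2) + ss / s2)
l_r <- function(s2) l_p(s2) - 0.5 * log(N / s2)

opt_ml   <- optimize(l_p, interval = c(0.01, 100), maximum = TRUE)$maximum
opt_reml <- optimize(l_r, interval = c(0.01, 100), maximum = TRUE)$maximum

print(c(closed_form = sigma2_ml, numerical = opt_ml))
print(c(closed_form = sigma2_reml, numerical = opt_reml))
```

The extra term in (42) is what shifts the maximizer from the sum of squares divided by N
to the sum of squares divided by N − 1.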

2.3.1 Standard errors and inference

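Estimation of standard errors. In the marginal model $\mathbf{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \mathbf{V}(\boldsymbol{\theta}))$, the
covariance of β̂ in (29) is

$$\text{Cov}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}' \mathbf{V}^{-1}(\boldsymbol{\theta}) \mathbf{X})^{-1}, \qquad (43)$$

where $\text{Cov}(\mathbf{y}) = \mathbf{V}(\boldsymbol{\theta})$ is used. Replacing the unknown θ with its ML or REML estimate
θ̂ and using $\hat{\mathbf{V}} := \mathbf{V}(\hat{\boldsymbol{\theta}})$, a natural estimate for Cov(β̂) is $(\mathbf{X}' \hat{\mathbf{V}}^{-1} \mathbf{X})^{-1}$. However, this
estimate ignores the extra variability originating from the estimation of θ. Kacker and
Harville (1984) (among others) discuss attempts to quantify this extra variability through
approximation, but only a fully Bayesian analysis accounts for all sources of
variability (see Chapter XXX where we demonstrate a Bayesian analysis of a Generalized
Linear Mixed Model).

The covariance of the empirical BLUP in (32) is equal to

$$\begin{aligned}
\text{Cov}(\hat{\mathbf{u}}) &= \text{Cov}(\mathbf{D}\mathbf{Z}' \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})) \\
&= \mathbf{D}\mathbf{Z}' \left\{ \mathbf{V}^{-1} - \mathbf{V}^{-1} \mathbf{X} (\mathbf{X}' \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}' \mathbf{V}^{-1} \right\} \mathbf{Z}\mathbf{D}.
\end{aligned} \qquad (44)$$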

Footnote 5: A well known example of 'REML versus ML' considers the case of a random sample
$X_1, \ldots, X_N \sim N(\mu, \sigma^2)$. The resulting estimators for the unknown variance $\sigma^2$ are

$$\hat{\sigma}^2_{ML} = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2, \qquad \hat{\sigma}^2_{REML} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2, \qquad (40)$$

with $\bar{X}$ the sample mean. The REML estimator is unbiased for $\sigma^2$. The $(N - 1)$ in $\hat{\sigma}^2_{REML}$ accounts
for the estimation of $\mu$ by $\bar{X}$.

However, the estimator in (44) ignores the variability in the random vector u. Therefore,
as suggested by Laird and Ware (1982), inference for u is usually based on Cov(û − u).
Estimates of the precision of other predictors involving β̂ and û are based on

$$\text{Cov}\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} - \mathbf{u} \end{pmatrix}, \qquad (45)$$

and are available in McCulloch and Searle (2001) (Section 9.4 (c)). Accounting for the
variability induced by estimating the variance components θ would require – once again –
a fully Bayesian analysis. Using Bayesian statistics, posterior credible intervals of
cluster–specific effects follow immediately. These are useful to understand the between–cluster
heterogeneity present in the data.
Inference. We consider testing a set of s (s ≤ p) hypotheses concerning the fixed effects
parameters in β

$$H_0: \mathbf{C}\boldsymbol{\beta} = \boldsymbol{\zeta} \quad \text{versus} \quad H_1: \mathbf{C}\boldsymbol{\beta} \neq \boldsymbol{\zeta}. \qquad (46)$$

The Wald test statistic

$$[\mathbf{C}\hat{\boldsymbol{\beta}} - \boldsymbol{\zeta}]' \, [\mathbf{C}\,\text{Var}(\hat{\boldsymbol{\beta}})\,\mathbf{C}']^{-1} \, [\mathbf{C}\hat{\boldsymbol{\beta}} - \boldsymbol{\zeta}] \qquad (47)$$

is approximately $\chi^2_s$ distributed. With $\ell(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{\Sigma}})$ the log–likelihood obtained with ML in
the restricted model (i.e. under $H_0$) and $\ell(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})$ the log–likelihood with ML in the
unrestricted model, the likelihood ratio test statistic ([LRT]) for nested models

$$-2[\ell(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{\Sigma}}) - \ell(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})], \qquad (48)$$

is approximately $\chi^2_s$ distributed. Estimation should be done with ML instead of REML,
since REML maximizes the likelihood of linear combinations of Y that do not depend on
β, so REML log–likelihoods of models with different fixed effects are not comparable.
Testing the necessity of random effects requires a hypothesis test involving the variance
components. For example, in the varying intercepts model from (7), we want to investigate
whether the intercepts of different subjects are significantly different. This corresponds
with

$$H_0: \sigma_2^2 = 0 \quad \text{versus} \quad H_1: \sigma_2^2 > 0. \qquad (49)$$

However, because 0 is on the boundary of the allowed parameter space for $\sigma_2^2$, the
likelihood ratio test statistic should not be compared with a $\chi^2_1$ distribution, but with a
mixture $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$. When testing a hypothesis involving s fixed effects parameters and
one variance component, the reference distribution is $\frac{1}{2}\chi^2_s + \frac{1}{2}\chi^2_{s+1}$. When more variance
components are involved, the complexity of this problem increases, see Ruppert et al.
(2003) and related work from these authors.
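
The two reference distributions discussed above are straightforward to evaluate in base
R. The helper functions `wald_test` and `p_mixture` below are hypothetical names
introduced for this sketch, and the numerical inputs are illustrative only.

```r
# Wald statistic (47) for H0: C beta = zeta, given an estimate and its covariance.
wald_test <- function(beta_hat, cov_beta, C, zeta) {
  d <- C %*% beta_hat - zeta
  stat <- as.numeric(t(d) %*% solve(C %*% cov_beta %*% t(C)) %*% d)
  s <- nrow(C)
  c(statistic = stat, df = s,
    p_value = pchisq(stat, df = s, lower.tail = FALSE))
}

# p-value under the mixture 0.5 * chi2_s + 0.5 * chi2_{s+1}.
# chi2_0 is a point mass at zero, so it contributes nothing when stat > 0.
p_mixture <- function(stat, s = 0) {
  p_low <- if (s == 0) as.numeric(stat <= 0) else
    pchisq(stat, df = s, lower.tail = FALSE)
  0.5 * p_low + 0.5 * pchisq(stat, df = s + 1, lower.tail = FALSE)
}

# Illustrative inputs (hypothetical numbers).
beta_hat <- c(-4.3, 0.04)
cov_beta <- diag(c(0.09, 0.013)^2)
C <- matrix(c(0, 1), nrow = 1)      # tests the second coefficient only
print(wald_test(beta_hat, cov_beta, C, zeta = 0))

# LRT statistic for one variance component: half the naive chi2_1 p-value.
print(p_mixture(3.2, s = 0))
```

For a single variance component the mixture simply halves the p–value from the $\chi^2_1$
distribution, so ignoring the boundary issue makes the test conservative.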

3 Examples
3.1 Workers’ compensation insurance losses
We analyze the data from Illustration 2 on losses observed for workers' compensation
insurance risk classes. Variable of interest is $\text{Loss}_{ij}$ observed per risk class i and year j. The
distribution of the losses is right skewed, which motivates the use of $\log(\text{Loss}_{ij})$ as response
variable. To enable out-of-sample predictions, we split the data set in a training (without
$\text{Loss}_{i7}$) versus test set (the $\text{Loss}_{i7}$ observations). We remove observations corresponding
with zero payroll from the data set. Models are estimated on the training set, and
centering of covariate Year is applied. Throughout our analysis we include $\log(\text{Payroll}_{ij})$
as an offset in the regression models, since losses should be interpreted relative to the size
of the risk class.
Complete pooling. We start with the 'complete pooling' model, introduced in (1).
The model ignores the clustering of data in risk classes and fits an overall intercept ($\beta_0$)
and an overall slope ($\beta_1$) for the effect of Year.

$$\log(\text{Loss}_{ij}) = \log(\text{Payroll}_{ij}) + \beta_0 + \beta_1 \text{Year}_{ij} + \epsilon_{ij} \qquad (50)$$

$$\epsilon_{ij} \sim N(0, \sigma_\epsilon^2) \;\; \text{i.i.d.} \qquad (51)$$

We fit the model with lm in R.

> # 'fit_cp' is a placeholder; the original object name was lost in extraction
> fit_cp <- lm(log(loss) ~ yearcentr, offset = log(payroll), data = wclossFit)
> summary(fit_cp)

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)
(Intercept) -4.34023    0.04105 -105.733   <2e-16 ***
yearcentr    0.03559    0.02410    1.477     0.14
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.062 on 667 degrees of freedom
Multiple R-squared: 0.7282, Adjusted R-squared: 0.7278
F-statistic: 1787 on 1 and 667 DF, p-value: < 2.2e-16

According to this R output β̂0 = −4.34 (with s.e. 0.041), β̂1 = 0.036 (with s.e. 0.024) and
σ̂ = 1.062.
No pooling. The fixed effects linear regression model in (2) estimates an intercept
for each of the 118 risk classes in the data set. According to model equation (52), the
intercepts $\beta_{0,i}$ are unknown, but fixed, whereas the error terms $\epsilon_{ij}$ are stochastic.

$$\log(\text{Loss}_{ij}) = \log(\text{Payroll}_{ij}) + \beta_{0,i} + \beta_1 \text{Year}_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2) \;\; \text{i.i.d.} \qquad (52)$$


We fit this model in R by identifying the risk class variable as a factor variable.

> # 'fit_np' is a placeholder; the original object name was lost in extraction
> fit_np <- lm(log(loss) ~ 0 + yearcentr + factor(riskclass), offset = log(payroll),
+              data = wclossFit)
> summary(fit_np)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
yearcentr            0.03843    0.01253   3.067  0.00227 **
factor(riskclass)1  -3.49671    0.22393 -15.615  < 2e-16 ***
factor(riskclass)2  -3.92231    0.22393 -17.516  < 2e-16 ***
factor(riskclass)3  -4.48135    0.22393 -20.012  < 2e-16 ***
factor(riskclass)4  -4.70981    0.22393 -21.032  < 2e-16 ***
...
Residual standard error: 0.5485 on 550 degrees of freedom
Multiple R-squared: 0.9986, Adjusted R-squared: 0.9983
F-statistic: 3297 on 119 and 550 DF, p-value: < 2.2e-16

The null hypothesis of equal intercepts, $H_0: \beta_{0,1} = \beta_{0,2} = \ldots = \beta_{0,118} = \beta_0$, is
rejected (with p–value < 0.05). Therefore, the 'no pooling' model significantly improves on the
'complete pooling' model.

> # 'fit_cp' and 'fit_np' are placeholder names for the complete and no pooling fits
> anova(fit_cp, fit_np)
Analysis of Variance Table

Model 1: log(loss) ~ yearcentr
Model 2: log(loss) ~ 0 + yearcentr + factor(riskclass)
  Res.Df    RSS  Df Sum of Sq      F    Pr(>F)
1    667 751.90
2    550 165.48 117    586.42 16.658 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 6 (left) shows the estimates $\hat{\beta}_{0,i}$, plus/minus one standard error, against the size
(on log–scale) of the risk class. The size of a risk class is here defined as $\sum_{j=1}^{6} \text{Payroll}_{ij}$.
The 'no pooling' model estimates risk class specific intercepts with reasonable precision.
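
Since the workers' compensation data are not reproduced here, the following sketch
repeats the complete pooling versus no pooling comparison on simulated data with the same
structure. The data frame `d`, its column names and all parameter values are hypothetical.

```r
set.seed(3)
m <- 10; n <- 6                         # 10 risk classes, 6 years each
d <- data.frame(riskclass = factor(rep(1:m, each = n)),
                year = rep(1:n, times = m) - mean(1:n),    # centered year
                payroll = rep(exp(rnorm(m, 15, 1)), each = n))
alpha <- rnorm(m, mean = -4, sd = 1)    # class-specific intercepts
d$loss <- d$payroll *
  exp(alpha[d$riskclass] + 0.04 * d$year + rnorm(m * n, sd = 0.5))

# Complete pooling: one intercept; no pooling: one intercept per risk class.
fit_cp <- lm(log(loss) ~ year, offset = log(payroll), data = d)
fit_np <- lm(log(loss) ~ 0 + year + riskclass, offset = log(payroll), data = d)

# F-test of equal intercepts (the nested model comparison used in the text).
tab <- anova(fit_cp, fit_np)
print(tab)
```

The F–test has m − 1 numerator degrees of freedom, one for each additional intercept in
the no pooling model.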
Linear mixed models: random intercepts. A linear mixed model with random risk
class specific intercepts is a meaningful alternative for the 'no pooling' model in (52). The
regression equation is

$$\begin{aligned}
\log(\text{Loss}_{ij}) &= \log(\text{Payroll}_{ij}) + \beta_0 + u_{0,i} + \beta_1 \text{Year}_{ij} + \epsilon_{ij} \\
u_{0,i} &\sim N(0, \sigma_u^2) \;\; \text{i.i.d.} \\
\epsilon_{ij} &\sim N(0, \sigma_\epsilon^2) \;\; \text{i.i.d.}
\end{aligned} \qquad (53)$$

Random intercepts $u_{0,i}$ are independent across risk classes, and independent of the error
terms $\epsilon_{ij}$. In R we use the lme4 package to fit this linear mixed model. The package
uses REML by default. Results with ML follow by adding REML=FALSE in the lmer(...)
statement.

> lmm1 <- lmer(log(loss) ~ (1|riskclass) + yearcentr + offset(log(payroll)),
+              data = wclossFit)
> print(lmm1)
Linear mixed model fit by REML
Formula: log(loss) ~ (1 | riskclass) + yearcentr + offset(log(payroll))
   Data: wclossFit
  AIC  BIC logLik deviance REMLdev
 1448 1466 -720.2     1431    1440
Random effects:
 Groups    Name        Variance Std.Dev.
 riskclass (Intercept) 0.88589  0.94122
 Residual              0.30145  0.54904
Number of obs: 669, groups: riskclass, 118

Fixed effects:
            Estimate Std. Error t value
(Intercept) -4.31959    0.08938  -48.33
yearcentr    0.03784    0.01253    3.02

Correlation of Fixed Effects:
          (Intr)
yearcentr 0.001

The R output shows the following parameter estimates: $\hat{\beta}_0 = -4.32$ (s.e. 0.089), $\hat{\beta}_1 = 0.038$
(s.e. 0.013), $\hat{\sigma}_u = 0.94$ and $\hat{\sigma}_\epsilon = 0.55$. In Figure 6 (right) we plot the point predictions
for the $u_{0,i}$'s, and their corresponding standard errors, against size of the risk class. To
create this plot we refit the linear mixed model and do not include an intercept.
The point estimates of the random intercepts obtained with the 'no pooling' model in
(52) and the linear mixed model in (53) are similar in this example. For the standard
errors of the random intercepts in the LMM we use the following instructions

str(rr1 <- ranef(lmm0, condVar = TRUE))
# 'se_re' is a placeholder name; the original identifier and the conversion function
# (assumed here to be as.numeric) were lost in extraction
se_re <- sqrt(as.numeric(attributes(rr1$riskclass)$postVar))

which takes the square root of the variance of u|y (see XXX and the footnote below (30)), conditional on
the maximum likelihood estimates for β and θ. Thus, these standard errors are different
from the approach outlined in (44). We are aware of the fact that they do not account
for all sources of variability involved.
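
For a random intercepts model the conditional variance of $u_{0,i}$ given the data has a simple
closed form, which the sketch below evaluates with the variance components printed in the
lmm1 output above. It treats β and θ as known, and the cluster size of six training years
is an assumption (not every risk class necessarily has six observations).

```r
sigma2_u <- 0.88589   # riskclass intercept variance, from the lmm1 output
sigma2_e <- 0.30145   # residual variance, from the lmm1 output
n_i <- 6              # assumed number of training years for a risk class

# Var(u_i | y) = D - D Z' V^{-1} Z D for one risk class, in matrix form ...
Z_i <- matrix(1, n_i, 1)
V_i <- sigma2_u * Z_i %*% t(Z_i) + sigma2_e * diag(n_i)
cond_var <- sigma2_u - sigma2_u^2 * as.numeric(t(Z_i) %*% solve(V_i) %*% Z_i)

# ... and in closed form.
cond_var_closed <- sigma2_u * sigma2_e / (sigma2_e + n_i * sigma2_u)

# Conditional standard deviation, used as a standard error for the prediction.
print(sqrt(cond_var))
```

Under these assumptions the conditional standard deviation is roughly 0.22, of the same
order as the standard errors of the intercepts in the 'no pooling' output.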

[Figure 6: two panels plotting Estimate (+/− s.e.) against log(Size).]

Figure 6: Point estimates for risk class specific intercepts, plus/minus one standard error.
Results from no pooling approach (left) and linear mixed model (right). The
dashed line is y = −4.34, i.e. the overall intercept from the complete pooling
model.

Linear mixed models: random intercepts and slopes. We now extend the LMM
in (53) and allow for random slopes as well as random intercepts. This is an example of
the 'multiple random effects per level' setting from Section 2.2. The model equation is

$$\begin{aligned}
\log(\text{Loss}_{ij}) &= \log(\text{Payroll}_{ij}) + \beta_0 + u_{0,i} + \beta_1 \text{Year}_{ij} + u_{1,i} \text{Year}_{ij} + \epsilon_{ij}, \\
\mathbf{u}_i &\sim N(\mathbf{0}, \mathbf{D}(\boldsymbol{\theta})) \;\; \text{i.i.d.} \\
\epsilon_{ij} &\sim N(0, \sigma_\epsilon^2) \;\; \text{i.i.d.}
\end{aligned} \qquad (54)$$

The random effects vector $\mathbf{u}_i = (u_{0,i}, u_{1,i})'$ is now bivariate, say with $\text{Var}(u_{0,i}) = \theta_0$, $\text{Var}(u_{1,i}) = \theta_1$ and
$\text{Cov}(u_{0,i}, u_{1,i}) = \theta_{01}$. Random effects are independent across risk classes, and independent
of the error terms $\epsilon_{ij}$. We fit this model with lmer as follows.

> lmm2 <- lmer(log(loss) ~ (1+yearcentr|riskclass) + yearcentr + offset(log(payroll)),
+              data = wclossFit)
> print(lmm2)
Linear mixed model fit by REML
Formula: log(loss) ~ (1 + yearcentr | riskclass) + yearcentr + offset(log(payroll))
   Data: wclossFit
  AIC  BIC logLik deviance REMLdev
 1451 1478 -719.4     1429    1439
Random effects:
 Groups    Name        Variance Std.Dev. Corr
 riskclass (Intercept) 0.885937 0.941242
           yearcentr   0.003171 0.056312 -0.195
 Residual              0.290719 0.539184
Number of obs: 669, groups: riskclass, 118

Fixed effects:
            Estimate Std. Error t value
(Intercept) -4.32030    0.08929  -48.38
yearcentr    0.03715    0.01340    2.77

Correlation of Fixed Effects:
          (Intr)
yearcentr -0.072

In this output θ̂0 = 0.89, θ̂1 = 0.0032 and θ̂01 = −0.010. We test whether the structure
of random effects should be reduced, i.e. H0 : θ1 = 0 (with θ1 the variance of random
slopes), using an anova test comparing models (53) and (54).

> anova(lmm1,lmm2)
Data: wclossFit
Models:
lmm1: log(loss) ~ (1 | riskclass) + yearcentr + offset(log(payroll))
lmm2: log(loss) ~ (1 + yearcentr | riskclass) + yearcentr + offset(log(payroll))
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
lmm1 4 1438.5 1456.6 -715.27
lmm2 6 1440.9 1468.0 -714.46 1.6313 2 0.4423

When performing the corresponding LRT the software automatically refits lmm1 and lmm2
with ML (instead of REML), as required (see our discussion in Section 2.3.1). This
explains why the AIC, BIC and logLik values differ from those printed above. The observed
Chisq test statistic and reported p–value indicate that $H_0: \theta_1 = 0$ cannot be rejected.
The model with only random intercepts is our preferred specification.

Out–of–sample predictions. We compare out–of–sample predictions of $\text{Loss}_{i7}$, for
given $\text{Payroll}_{i7}$, as obtained with models (50), (52) and (53). Figure 7 plots observed
versus fitted losses (on log scale) for (from left to right) the complete pooling, the random
intercepts and the no pooling linear regression model.

3.2 Hachemeister data

We present an analysis of the Hachemeister data using three simple linear mixed models.
Chapter XXX presents an in-depth discussion of credibility models for this data
set (namely the Bühlmann, Bühlmann–Straub and Hachemeister credibility models). By
combining the R scripts prepared for our illustration with the scripts from Chapter XXX,
readers obtain relevant illustrations of credibility models in R and their analogous
interpretation as LMMs.


[Figure 7: three panels (CP, lmm1, NP) plotting log(Pred) against log(Obs).]

Figure 7: Out–of–sample predictions for $\text{Loss}_{i7}$ versus observed losses, as obtained with
model (50) ('CP', complete pooling), (53) ('lmm1', random intercepts) and (52)
('NP', no pooling): workers' compensation insurance losses.

Random intercepts, no weights. Response variable is the average loss per claim (i.e.
$\text{Ratio}_{ij}$), per state i (i = 1, . . . , 5) and quarter j (j = 1, . . . , 12). A basic random state
intercept model for $\text{Ratio}_{ij}$ is

$$\begin{aligned}
\text{Ratio}_{ij} &= \beta_0 + u_{i,0} + \epsilon_{ij} \\
u_{i,0} &\sim N(0, \sigma_u^2) \;\; \text{i.i.d.} \\
\epsilon_{ij} &\sim N(0, \sigma_\epsilon^2) \;\; \text{i.i.d.}
\end{aligned} \qquad (55)$$

Apart from the normality assumption, actuaries recognize the so–called Bühlmann
credibility model, as Chapter XXX explains.

Random intercepts, including weights. Our response variable is average loss per
claim, constructed as total loss (per state and quarter) divided by the corresponding
number of claims. This average loss is more precise when more claims have been observed.
We therefore include the number of observed claims as weights ($w_{ij}$) in our LMM.

$$\begin{aligned}
\text{Ratio}_{ij} &= \beta_0 + u_{i,0} + \epsilon_{ij} \\
u_{i,0} &\sim N(0, \sigma_u^2) \;\; \text{i.i.d.} \\
\epsilon_{ij} &\sim N(0, \sigma_\epsilon^2 / w_{ij}) \;\; \text{independent} 
\end{aligned} \qquad (56)$$

The model equation and variance assumptions (apart from normality) correspond with
the Bühlmann–Straub credibility model. Including weights goes as follows in R lme4:

> lmmBS <- lmer(ratio ~ (1|state), weights = weight, data = hach)
> print(lmmBS)
Linear mixed model fit by REML
Formula: ratio ~ (1 | state)
   Data: hach
  AIC  BIC logLik deviance REMLdev
 1301 1307 -647.5     1306    1295
Random effects:
 Groups   Name        Variance  Std.Dev.
 state    (Intercept)    22.326   4.725
 Residual             47928.954 218.927
Number of obs: 60, groups: state, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept) 1688.934      2.265   745.6

The risk (or: credibility) premium for state i is β̂0 + ûi,0 , and is available in R as follows

## get fixed effects
fe <- fixef(lmmBS)
## get random intercepts
re <- ranef(lmmBS)
## calculate credibility premiums in this lmm
## ('pred_lmmBS' is a placeholder; the original object name was lost in extraction)
pred_lmmBS <- fe[1] + re$state
> t(pred_lmmBS)
                  1        2        3        4        5
(Intercept) 2053.18 1528.509 1790.053 1468.113 1604.815

Chapter XXX illustrates how traditional actuarial credibility calculations are available in
the actuar package in R. The credibility premiums obtained with Bühlmann–Straub are
close to – but not exactly the same as – the premiums obtained with (56). Note that
the actuarial credibility calculations use method of moments for parameter estimation,
whereas our LMMs use (RE)ML.

> ## BS model (Buhlmann-Straub credibility model)
> ## use actuar package, and hachemeister data as available in this package
> fitBS <- cm(~state, hachemeister, ratios = ratio.1:ratio.12,
+             weights = weight.1:weight.12)
> ## 'pred_BS' is a placeholder; the original object name was lost in extraction
> pred_BS <- predict(fitBS) # credibility premiums
> pred_BS
[1] 2055.165 1523.706 1793.444 1442.967 1603.285

Random intercepts and slopes, including weights. We extend the random intercepts
model to a random intercepts and slopes model, using the period of observation as
regressor.

$$\begin{aligned}
\text{Ratio}_{ij} &= \beta_0 + u_{i,0} + \beta_1 \text{period}_{ij} + u_{i,1} \text{period}_{ij} + \epsilon_{ij} \\
\mathbf{u}_i &\sim N(\mathbf{0}, \mathbf{D}(\boldsymbol{\theta})) \;\; \text{i.i.d.} \\
\epsilon_{ij} &\sim N(0, \sigma_\epsilon^2 / w_{ij}) \;\; \text{independent}
\end{aligned} \qquad (57)$$

Our analysis uses $\text{period}_{ij}$ as the quarter (j = 1, . . . , 12) of observation. The use of a
centered version of period is discussed in Chapter XXX. In R the (1+period|state)
instruction specifies random intercepts and slopes per state.
> lmmHach <- lmer(ratio ~ period + (1+period|state), weights = weight, data = hach)
> lmmHach
Linear mixed model fit by REML
Formula: ratio ~ period + (1 + period | state)
   Data: hach
  AIC  BIC logLik deviance REMLdev
 1242 1255 -615.1     1247    1230
Random effects:
 Groups   Name        Variance   Std.Dev.  Corr
 state    (Intercept) 4.1153e+00   2.02863
          period      1.9092e-01   0.43695 1.000
 Residual             1.6401e+04 128.06735
Number of obs: 60, groups: state, 5

Fixed effects:
             Estimate Std. Error t value
(Intercept) 1501.5452     1.1265  1333.0
period        27.7333     0.2172   127.7

Correlation of Fixed Effects:
       (Intr)
period 0.540
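
Using LMM (57), the state specific risk premium for the next time period is

$$\widehat{E}[\text{Ratio}_{i,13} \mid \mathbf{u}_i] = \hat{\beta}_0 + \hat{u}_{i,0} + \hat{\beta}_1 \cdot 13 + \hat{u}_{i,1} \cdot 13. \qquad (58)$$

> ## 'pred_lmmHach' is a placeholder; the original object name was lost in extraction
> t(pred_lmmHach)
         [,1]     [,2]     [,3]     [,4]    [,5]
[1,] 2464.032 1605.676 2067.279 1453.923 1719.48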

These premiums correspond with the results (obtained with SAS) reported in Frees et al.
(1999) (see Table 3, columns ‘Prediction and standard errors’). These authors also inves-
tigate linear mixed models as a user friendly and computationally attractive alternative
for actuarial credibility models. The traditional Hachemeister credibility premiums are
available in R as follows (see also the ‘Base’ results in Table 3 from Frees et al. (1999))

fitHach <- cm(~state, hachemeister, regformula = ~time,
              regdata = data.frame(time = 1:12), ratios = ratio.1:ratio.12,
              weights = weight.1:weight.12)
## 'pred_Hach' is a placeholder; the original object name was lost in extraction
pred_Hach <- predict(fitHach, newdata = data.frame(time = 13))
# > pred_Hach
# [1] 2436.752 1650.533 2073.296 1507.070 1759.403

Once again, with linear mixed models we obtain premiums that are close to, but do not
replicate, the traditional actuarial credibility results. Differences in parameter estimation
techniques explain why these results are not identical.
Using an LRT we verify whether model (57) should be reduced to the model with random
intercepts only. The p–value indicates that this is not the case.

lmmHach2 <- lmer(ratio ~ period+(1|state),weights=weight,data=hach)

anova(lmmHach,lmmHach2)
#Data: hach
#Models:
#lmmHach2: ratio ~ period + (1 | state)
#lmmHach: ratio ~ period + (1 + period | state)
# Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
#lmmHach2 4 1272.2 1280.5 -632.08
#lmmHach 6 1258.7 1271.2 -623.32 17.521 2 0.0001568 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 8 illustrates the fit of a complete pooling (dark grey, dashed line), a no pooling
(black, dashed line) and an LMM with random intercepts and slopes (black, solid line).
The regression equations for the complete and no pooling model are

$$\text{Ratio}_{ij} = \beta_0 + \beta_1 \text{period}_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2 / w_{ij}), \qquad (59)$$

and

$$\text{Ratio}_{ij} = \beta_{0,i} + \beta_{1,i} \text{period}_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2 / w_{ij}), \qquad (60)$$

respectively.

[Figure 8: five panels (states 1 to 5) plotting Ratio against Period.]

Figure 8: Fit of a complete pooling (dark grey, dashed line), a no pooling (black,
dashed line) and an LMM with random intercepts and slopes (black, solid line):
Hachemeister data (no centering of period).

3.3 Credit insurance data

We analyze the data from Illustration 5 and demonstrate the use of crossed random
effects (see Section 2.2) with lme4. The response variable of interest is $\text{Payment}_{ijt}$, where
i = 1, 2, 3 denotes status and j = 1, 2, 3 is for working experience of the insured, t is an
index going over all observations in cell (i, j). Dannenburg et al. (1996) use these data to
demonstrate the principles of a so–called cross classification credibility model, with model
equation (in typical actuarial credibility notation)

$$\text{Payment}_{ijt} = m + \Xi_i^{(1)} + \Xi_j^{(2)} + \Xi_{ij}^{(12)} + \Xi_{ijt}^{(123)}. \qquad (61)$$

Hereby, m is an overall intercept, $\Xi_i^{(1)}$ is a random effect for level i in factor (1) (i.e.
status), $\Xi_j^{(2)}$ a random intercept for level j in factor (2) (i.e. experience) and $\Xi_{ij}^{(12)}$ is a
random effect for the interaction of level i and j. $\Xi_{ijt}^{(123)}$ is an error term for observation t
from the combined level i and j. Dannenburg et al. (1996) obtain the following credibility
premiums

                 experience
status        1        2        3
1        181.05   238.18   277.77
2        172.11   229.16   268.8
3        225.29   282.24   323.68

Table 2: Credibility premiums obtained with crossed classification credibility model per
combination of status and experience risk class: credit insurance data.

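The analysis of these data by means of a linear mixed model with crossed random effects
for status and experience, plus a random effect for their interaction
(i.e. (1 | status:experience)), is directly available in R.

$$\begin{aligned}
\text{Payment}_{ijt} &= m + u_i^{(1)} + u_j^{(2)} + u_{ij}^{(12)} + \epsilon_{ijt} \\
u_i^{(1)} &\sim N(0, \sigma_1^2) \\
u_j^{(2)} &\sim N(0, \sigma_2^2) \\
u_{ij}^{(12)} &\sim N(0, \sigma_{12}^2) \\
\epsilon_{ijt} &\sim N(0, \sigma_\epsilon^2),
\end{aligned} \qquad (62)$$

where i and j run over all levels in factors 1 (status) and 2 (experience) and we assume all
random variables to be independent.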

> lmm2 <- lmer(payment ~ 1 + (1|experience) + (1|status) + (1|status:experience),
+              data = credit)
> print(lmm2)
Linear mixed model fit by REML
Formula: payment ~ 1 + (1 | experience) + (1 | status) + (1 | status:experience)
   Data: credit
  AIC  BIC logLik deviance REMLdev
 5241 5261  -2616     5240    5231
Random effects:
 Groups            Name        Variance  Std.Dev.
 status:experience (Intercept)    14.611   3.8224
 status            (Intercept)   992.791  31.5086
 experience        (Intercept)  2569.330  50.6886
 Residual                      26990.398 164.2875
Number of obs: 401, groups: status:experience, 9; status, 3; experience, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   244.25      35.44   6.892

The resulting risk premiums as obtained with lme4 are very close to the credibility pre-
miums in Table 2.

                 experience
status          1          2          3
1        181.0253   238.1813   277.7692
2        172.1086   229.1551   268.7954
3        225.2921   282.2424   323.6784

Our analysis directly uses Payment as response variable to facilitate the comparison
between the credibility and linear mixed model calculations. However, the positivity and
right skewness of Payment suggest the use of a lognormal or gamma distribution for this
response.

4 Further readings and illustrations

We recommend Czado (2004), Gelman and Hill (2007), Frees (2004a), McCulloch and
Searle (2001), Ruppert et al. (2003) and Verbeke and Molenberghs (2000) as further read-
ings on linear mixed models. The use of LMMs for smoothing purposes is not discussed
above, but interested readers can find below a brief introduction and useful references.

Illustration 9 (Smoothing with mixed models). A semiparametric regression model
incorporates both parametric and nonparametric functional relationships between a
response and a set of covariates. These models are particularly useful when a globally
linear pattern is inappropriate or parametric nonlinear curves are difficult to determine.
Such nonlinear effects frequently occur when time-related covariates are present, such as
driver’s age, development lag or years in business of the insured company. For example,
in an LM the effect of the age of the insured on the number of claims reported is often
expressed with a categorical Age covariate. The analyst splits Age into several categories
and estimates a regression parameter for each of them. In a nonparametric analysis we model
the effect of Age on the response with an unknown, smooth function, in contrast with
the piecewise constant assumption in linear models.
Penalized splines (also called P–splines) are popular nonparametric tools that specify
the smoothing function as a linear combination of basis functions, in which some
coefficients associated with the basis functions are constrained in order to avoid overfitting.
That is, they are penalized, or shrunk towards zero, reducing the effective number of
coefficients to be estimated. The broad popularity of P–splines is largely because they can be
written in the form of mixed models (Ruppert et al., 2003; Wood, 2006), so that we
can rely directly on software, diagnostic and inferential tools designed for mixed models
when fitting P–splines, or use a Bayesian implementation of the model to base inference on
the full posterior distribution. Of course, hierarchical components can be included in addition
to smoothing terms, thus often leading to models that are both intuitively appealing and
structurally flexible when studying practical problems in predictive modeling.
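
To make the mixed-model form concrete, consider a single covariate x and a truncated
linear basis with knots \kappa_1, ..., \kappa_K. This particular basis and the symbols
below are illustrative choices in the spirit of Ruppert et al. (2003), not notation taken
from this chapter.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k \,(x_i - \kappa_k)_+ + \varepsilon_i, \\
u_k &\sim N(0, \sigma_u^2), \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2),
\end{align*}
```

with all u_k and \varepsilon_i independent and (a)_+ = max(a, 0). Here (\beta_0, \beta_1)
play the role of fixed effects and the knot coefficients u_k that of random effects, so
the fixed-effects design matrix has rows (1, x_i) and the random-effects design matrix has
entries (x_i - \kappa_k)_+. Predicting the u_k by BLUP is then equivalent to penalized least
squares with a ridge penalty on the u_k whose weight is \sigma_\varepsilon^2 / \sigma_u^2.
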
For example, Figure 9 shows an application of P–splines to the estimation of insurance
loss reserves. In this example, the incremental paid insurance losses, represented by the
dots in the plot, exhibit a nonlinear dependence upon the report lag (the x-axis). Standard
loss reserving methods will specify a model with these lags as categorical covariates. In
contrast, P–splines allow us to estimate a smooth functional relationship between paid
losses and report lags. One advantage over the reserving model with dummy variables
is the reduced number of model parameters because generally a small number of knots
can capture the observed pattern sufficiently well. The example shown here is based on a
four-knot penalized spline, and Zhang and Dukic (2012) find that the resulting model has
significantly better predictive performance than a dummy-variable-based reserving model.
Another benefit is that estimates at any time point can be produced based on interpolation
or extrapolation of the estimated functional form. This can be very helpful when the goal
of a reserving study is to make forecasts for a short period ahead, say one month or a
quarter.
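
A minimal sketch of the idea, in base R only: a truncated linear spline with four interior
knots is fitted to simulated incremental losses by penalized least squares, and the fitted
curve is then evaluated at lags that were never observed. The data, the knot positions and
the fixed penalty weight lambda are assumptions for illustration. This is not the model of
Zhang and Dukic (2012), and in a mixed-model fit the penalty weight would follow from the
estimated variance components rather than being fixed in advance.

```r
set.seed(3)
# Hypothetical incremental paid losses (in millions) at report lags 1, ..., 10.
report_lag <- rep(1:10, each = 5)
loss <- 20 * exp(-0.35 * (report_lag - 1)) + rnorm(length(report_lag), sd = 1)

knots <- c(2.8, 4.6, 6.4, 8.2)   # four interior knots (assumed positions)

# Design matrix: intercept, linear term, one truncated line per knot.
spline_basis <- function(x, knots) {
  cbind(1, x, outer(x, knots, function(a, k) pmax(a - k, 0)))
}

# Penalized least squares with a ridge penalty on the knot coefficients only.
fit_pspline <- function(x, y, knots, lambda) {
  B <- spline_basis(x, knots)
  P <- diag(c(0, 0, rep(1, length(knots))))  # intercept and slope unpenalized
  solve(crossprod(B) + lambda * P, crossprod(B, y))
}

coef_hat <- fit_pspline(report_lag, loss, knots, lambda = 1)

# Interpolate between observed lags and extrapolate slightly beyond lag 10.
new_lag <- c(2.5, 7.25, 10.25)
pred <- drop(spline_basis(new_lag, knots) %*% coef_hat)
print(round(cbind(lag = new_lag, fitted = pred), 3))
```

Six coefficients describe the whole curve here, whereas a model with one dummy variable per
lag would need ten, and the fitted function can be evaluated at any lag, not only at the
observed ones.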
More examples of semiparametric models in insurance loss reserving can be found in
Antonio and Beirlant (2008) and Zhang and Dukic (2012). Multivariate extensions of
penalized splines are available for spatial regression (e.g. in postcode rating). See Chapter
XXX for further discussion. [Reference to Chapter on GAMS / spatial statistics.]

[Plot for Figure 9: personal auto loss (in 1,000,000s; y-axis from 0 to 20) against lag
(x-axis from 0 to 10), with the observed incremental losses shown as dots.]

Figure 9: The plot of the company-level smoother (incremental losses) along with the 50%
prediction interval for a loss triangle.

References

Antonio, K. and Beirlant, J. (2007). Actuarial statistics with generalized linear mixed
models. Insurance: Mathematics and Economics, 40(1):58–76.
Antonio, K. and Beirlant, J. (2008). Issues in claims reserving and credibility: a semipara-
metric approach with mixed models. Journal of Risk and Insurance, 75(3):643–676.
Antonio, K., Frees, E., and Valdez, E. (2010). A multilevel analysis of intercompany
claim counts. ASTIN Bulletin: The Journal of the International Actuarial Association,
40(1):151–177.
Antonio, K. and Valdez, E. (2012). Statistical aspects of a priori and a posteriori risk
classification in insurance. Advances in Statistical Analysis, 96(2):187–224.
Bühlmann, H. and Gisler, A. (2005). A course in credibility theory and its applications.
Springer Verlag, Berlin.
Czado, C. (2004). Linear Mixed Models. Lecture slides on GLM, TU München.
Dannenburg, D., Kaas, R., and Goovaerts, M. (1996). Practical actuarial credibility
models. Institute of actuarial science and econometrics, University of Amsterdam.
Frees, E. (2004a). Longitudinal and panel data. Analysis and applications in the social
sciences. Cambridge University Press.
Frees, E. (2004b). Longitudinal and Panel Data: Analysis and Applications in the Social
Sciences. Cambridge University Press, Cambridge.
Frees, E., Young, V., and Luo, Y. (1999). A longitudinal data analysis interpretation of
credibility models. Insurance: Mathematics and Economics, 24(3):229–247.
Frees, E., Young, V., and Luo, Y. (2001). Case studies using panel data models. North
American Actuarial Journal, 5(4):24–42.
Gelman, A. (2006). Multilevel (hierarchical) modeling: what it can and cannot do. Tech-
nometrics, 48(3):432–435.
Gelman, A. and Hill, J. (2007). Applied Regression and Multilevel (Hierarchical) Models.
Cambridge University Press, Cambridge.
Hachemeister, C. (1975). Credibility: Theory and Applications, chapter ‘Credibility for
regression models with application to trend’, pages 129–163. Academic Press, New
York.
Kacker, R. and Harville, D. (1984). Approximations for standard errors of estimators of
fixed and random effects in mixed linear models. Journal of the American Statistical
Association, 79:853–862.
Klugman, S. (1992). Bayesian statistics in actuarial science with emphasis on credibility.
Kluwer, Boston.
Laird, N. and Ware, J. (1982). Random-effects models for longitudinal data. Biometrics,
38(4):963–974.
Makov, U., Smith, A., and Liu, Y. (1996). Bayesian methods in actuarial science. The
Statistician, 45(4):503–515.
McCulloch, C. and Searle, S. (2001). Generalized, Linear and Mixed Models. Wiley Series
in Probability and Statistics, Wiley, New York.
Robinson, G. (1991). That BLUP is a good thing: the estimation of random effects.
Statistical Science, 6:15–51.
Ruppert, D., Wand, M., and Carroll, R. (2003). Semiparametric regression. Cambridge
University Press, Cambridge.
Scollnik, D. (1996). An introduction to Markov Chain Monte Carlo methods and
their actuarial applications. Proceedings of the Casualty Actuarial Society Forum,
LXXXIII:114–165.
Searle, S., Casella, G., and McCulloch, C. (2008). Variance components. Wiley.
Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data.
Springer Series In Statistics, New York.
Wood, S. (2006). Generalized Additive Models: An introduction with R. Chapman & Hall,
CRC Texts in Statistical Science.
Zhang, Y. and Dukic, V. (2012). Predicting multivariate insurance loss payments under
the Bayesian copula framework. The Journal of Risk and Insurance. In press, DOI:
10.1111/j.1539-6975.2012.01480.x.
Zhang, Y., Dukic, V., and Guszcza, J. (2012). A Bayesian nonlinear model for forecasting
insurance loss payments. Journal of the Royal Statistical Society, Series A, 175:637–656.

FACULTY OF ECONOMICS AND BUSINESS
Naamsestraat 69 bus 3500
3000 LEUVEN, BELGIË
tel. + 32 16 32 66 12
fax + 32 16 32 67 91
