Introduction Calibration Score Heterogeneity of Tree-Based Methods Wrap-up

Challenging Calibration of Insurance Scoring Algorithms

Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu

Tuesday, June 18th, 2024


1 Introduction

2 Calibration

3 Score Heterogeneity of Tree-Based Methods

4 Wrap-up

Setup

• Let us consider a binary event $D$ whose observations are denoted $d_i = 1$ if the event occurs and $d_i = 0$ otherwise, where $i$ indexes the $i$th observation.
• Let us further assume that the (unobserved) probability of the event $d_i = 1$ depends on individual characteristics:

$$p_i = s(x_i)$$

where, with sample size $n > 0$, $i = 1, \ldots, n$ indexes individuals and $x_i$ denotes their characteristics.
• To estimate this probability, we can use a statistical model (e.g., a GLM) or a machine learning model (e.g., a random forest).

Motivation

• In insurance, there are cases where we are more interested in the underlying risk than in being able to discriminate between the occurrence and non-occurrence of an event:
  • what is the probability that this insured has an accident within the next year?
  • what is the probability of death of this individual within the year?

“The phrase ‘probability of death’, when it refers to a single person, has no meaning for us at all.” Von Mises et al. (1939)

• In such cases, it is important that the estimated scores can be interpreted as probabilities.
• This might become a problem when using tree-based classifiers (Niculescu-Mizil and Caruana, 2005; Park and Ho, 2020; Hänsch, 2020) rather than logistic regression models (Machado et al., 2024).


Roadmap

1 Introduction

2 Calibration
    Definition
    Visualizing Calibration
    Measuring Calibration

3 Score Heterogeneity of Tree-Based Methods
    Simulated environment
    Real-world scenario in insurance

4 Wrap-up

Definition

Calibration of a Binary Classifier (Schervish (1989))


For a binary variable $D$, a model is well-calibrated when

$$\mathbb{E}[D \mid \hat{s}(X) = p] = p, \quad \forall p \in [0, 1]. \tag{1}$$

Note: conditioning on $\{\hat{s}(x) = p\}$ leads to the concept of (local) calibration; however, as discussed by Bai et al. (2021), $\{\hat{s}(x) = p\}$ is a.s. a null mass event. Thus, calibration should be understood in the sense that

$$\mathbb{E}[D \mid \hat{s}(X) = p] \xrightarrow{\text{a.s.}} p \quad \text{when } n \to \infty,$$

meaning that, asymptotically, the model is well-calibrated, or locally well-calibrated in $p$, for any $p$.

Calibration curve

• Estimation of $g(\cdot)$ (which measures miscalibration on predicted scores $\hat{s}(x)$):

$$g : \begin{cases} [0, 1] \to [0, 1] \\ p \mapsto g(p) := \mathbb{E}[D \mid \hat{s}(x) = p] \end{cases} \tag{2}$$

• Challenge: having enough observations with identical scores is difficult.
• Solutions:
  1 Reliability diagram (Wilks, 1990): grouping observations into $B$ bins, defined by the quantiles of predicted scores,
  2 Using a smoother representation with local regression techniques, which estimate a conditional expectation within a specified neighborhood of predicted scores (Denuit et al., 2021).
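
The first solution above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reliability_diagram`, the choice of 10 bins, and the toy data (scores that are calibrated by construction) are hypothetical and do not come from the slides.

```python
import random

def reliability_diagram(scores, outcomes, n_bins=10):
    """Group observations into quantile bins of the predicted scores and
    return, per bin, (mean predicted score, observed event frequency, count)."""
    pairs = sorted(zip(scores, outcomes))      # sort observations by predicted score
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        lo = b * n // n_bins                   # equal-count bins, i.e. quantile bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        event_rate = sum(d for _, d in chunk) / len(chunk)
        bins.append((mean_score, event_rate, len(chunk)))
    return bins

# Toy data (hypothetical): each outcome is drawn with probability equal to
# its score, so the scores are calibrated by construction.
rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < s else 0 for s in scores]

diagram = reliability_diagram(scores, outcomes, n_bins=10)
for mean_score, event_rate, count in diagram:
    print(f"mean score {mean_score:.3f}  event rate {event_rate:.3f}  n = {count}")
```

For a well-calibrated model, the two printed columns should be close in every bin; plotting event rate against mean score gives the reliability diagram.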

Metrics

Brier Score (Brier (1950))

The Brier Score does not depend on bins but directly on observations, and is defined as:

$$\mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} \left(d_i - \hat{s}(x_i)\right)^2$$

where $d_i$ is the observed event and $\hat{s}(x_i)$ the estimated score.

Integrated Calibration Index or ICI (Austin and Steyerberg (2019))

The ICI is based on the calibration curve $\hat{g}$ estimated with local regression techniques and is defined as

$$\mathrm{ICI} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{g}(\hat{s}(x_i)) - \hat{s}(x_i)\right|$$

where $\hat{g}(\hat{s}(x_i))$ represents the prediction obtained from the local regression fit on the estimated score $\hat{s}(x_i)$.
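
Both metrics can be illustrated with a short Python sketch. Note the assumptions: the calibration curve is estimated here with a simple Gaussian kernel smoother (a local-constant regression), which stands in for the local regression fit mentioned above and is not the authors' implementation; the bandwidth of 0.05, the function names, and the toy data are hypothetical.

```python
import math
import random

def brier_score(outcomes, scores):
    """Mean squared difference between observed events d_i and scores s(x_i)."""
    return sum((d - s) ** 2 for d, s in zip(outcomes, scores)) / len(scores)

def kernel_calibration_curve(outcomes, scores, bandwidth=0.05):
    """Return g_hat, an estimate of g(p) = E[D | s(x) = p].
    Assumption: a Gaussian kernel smoother replaces the local regression fit."""
    def g_hat(p):
        weights = [math.exp(-0.5 * ((s - p) / bandwidth) ** 2) for s in scores]
        return sum(w * d for w, d in zip(weights, outcomes)) / sum(weights)
    return g_hat

def ici(outcomes, scores, bandwidth=0.05):
    """Average absolute gap between the calibration curve and the scores."""
    g_hat = kernel_calibration_curve(outcomes, scores, bandwidth)
    return sum(abs(g_hat(s) - s) for s in scores) / len(scores)

# Toy data (hypothetical): true probabilities p, calibrated scores equal to p,
# and miscalibrated scores p**2 that systematically understate the risk.
rng = random.Random(1)
probs = [rng.random() for _ in range(800)]
outcomes = [1 if rng.random() < p else 0 for p in probs]
calibrated = probs
miscalibrated = [p ** 2 for p in probs]

cal_ici = ici(outcomes, calibrated)
mis_ici = ici(outcomes, miscalibrated)
print("Brier (calibrated):   ", round(brier_score(outcomes, calibrated), 3))
print("Brier (miscalibrated):", round(brier_score(outcomes, miscalibrated), 3))
print("ICI (calibrated):     ", round(cal_ici, 3))
print("ICI (miscalibrated):  ", round(mis_ici, 3))
```

The kernel smoother costs O(n²) evaluations, which is acceptable for a toy sample; a dedicated local regression routine would be preferable on real data.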

Illustrative example

• Consider the frenchmotor dataset from InsurFair (Charpentier, 2014), where we aim to estimate the probability of accident for insureds within a year ($n = 12{,}437$ and 17 explanatory variables), by predicting the binary response variable $D$, indicating the occurrence of an accident.
• We compare predictions from a GLM and a GAM to those from a random forest (RF) regressor, increasingly used in insurance (NAIC, 2022).

Table 1: Performance and calibration metrics on test set.

Model   AUC           Brier score   ICI
GLM     0.61 ± 0.03   0.08 ± 0.03   0.04 ± 0.03
GAM     0.61 ± 0.03   0.08 ± 0.03   0.04 ± 0.03
RF      0.88 ± 0.03   0.07 ± 0.02   0.05 ± 0.03


Calibration curves (1/2)


[Figure: three panels (GLM, GAM, RF). Top: density of the estimated scores. Bottom: calibration curve, E(D | ŝ(x)) against ŝ(x) on [0, 1].]

Figure 1: Distribution of estimated scores for the three models, along with their calibration curves generated using locfit.


Calibration curves (2/2)


The lack of score heterogeneity observed in the RF model, compared to the GLM and GAM, is not captured by calibration metrics.

[Figure: three panels (GLM, GAM, RF). Top: density of the estimated scores. Bottom: reliability diagram, E(D | ŝ(x)) against ŝ(x), zoomed on [0, 0.25].]

Figure 2: Distribution of estimated scores for the three models, along with their zoomed reliability diagrams.

Overview for decision trees


Here, we consider a simulated environment where $D_i \sim \mathcal{B}(p_i)$, with $p_i$ the true underlying probability.

[Figure: histogram of the true probabilities p, followed by histograms of the estimated scores ŝ(x) for five trees of interest, all on [0, 1]. The panel titles report the following metrics.]

Tree       No. Leaves   AUC    Brier   ICI    KL
Smallest   5            0.72   0.21    0.02   1.7
Largest    2010         0.62   0.34    0.29   4.26
AUC*       25           0.75   0.2     0.01   0.52
ICI*       8            0.74   0.21    0.01   1.06
KL*        65           0.75   0.2     0.02   0.06

Figure 3: Distribution of true probabilities and estimated scores for trees of interest. The Kullback-Leibler divergence (KL) of $\phi$ from $\psi$ is defined by

$$D_{KL}(\phi \,\|\, \psi) = \sum_{i=1}^{m} h_\phi(i) \log \frac{h_\phi(i)}{h_\psi(i)}.$$
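
The KL formula can be made concrete with a small Python sketch. Several points are assumptions rather than content of the slides: $h_\phi(i)$ and $h_\psi(i)$ are read here as the proportions of observations in bin $i$ of an $m$-bin histogram on [0, 1], the value $m = 20$ and the `eps` guard for empty bins are arbitrary, and the two-valued "coarse" scores are a hypothetical stand-in for a very small tree.

```python
import math

def histogram(values, m=20):
    """Proportion of values falling in each of m equal-width bins on [0, 1]."""
    counts = [0] * m
    for v in values:
        counts[min(int(v * m), m - 1)] += 1    # v = 1.0 goes into the last bin
    return [c / len(values) for c in counts]

def kl_divergence(h_phi, h_psi, eps=1e-10):
    """D_KL(phi || psi) = sum_i h_phi(i) * log(h_phi(i) / h_psi(i)).
    Bins with h_phi(i) = 0 contribute 0; eps guards against empty psi bins
    (an assumption of this sketch)."""
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(h_phi, h_psi) if p > 0)

# Toy data (hypothetical): evenly spread probabilities versus scores that
# take only two distinct values, i.e. scores with little heterogeneity.
spread = [i / 100 + 0.005 for i in range(100)]
coarse = [0.25 if p < 0.5 else 0.75 for p in spread]

h_spread = histogram(spread)
h_coarse = histogram(coarse)
print("KL(spread || spread):", kl_divergence(h_spread, h_spread))
print("KL(coarse || spread):", round(kl_divergence(h_coarse, h_spread), 4))
```

The divergence is zero when the two histograms coincide and grows as the score distribution concentrates on a few values, which is the behaviour the KL column of the table above is meant to capture.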

Bayesian framework: back to the frenchmotor dataset

• The true underlying data distribution of D is not observable.
• Expert opinion: Beta prior to model the underlying data distribution.

[Figure: four panels (AUC*, ICI*, KL*, Brier*), each showing the density of the scores predicted by RF, ŝ(X) on [0, 1], together with the Beta prior.]

Figure 4: Distribution of RF predicted scores when optimizing hyperparameters for AUC (AUC∗), ICI (ICI∗) and KL (KL∗).

Table 2: Difference in validation set metrics between ICI∗, KL∗ and the reference model: AUC∗.

Optim.   ∆AUC    ∆ICI    ∆KL
ICI∗     −0.23   −0.02   +0.44
KL∗      −0.05   +0.01   −0.77

Wrap-up

• Calibration matters: when training classifiers, model calibration should not be disregarded.
• Calibration may not be sufficient for tree-based methods: for RF, when score heterogeneity is lacking, metrics such as KL should complement the commonly used calibration metrics.
• Next steps: in particular, for private insurance, calibration (or sufficiency) emerges as the most suitable metric for evaluating group fairness, as highlighted by Baumann and Loi (2023).

Comments are welcome: [email protected]

Appendix References

References

Austin, P. C. and Steyerberg, E. W. (2019). The integrated calibration index (ICI) and related metrics for quantifying the calibration of logistic regression models. Statistics in Medicine 38: 4051–4065, doi:10.1002/sim.8281.
Bai, Y., Mei, S., Wang, H. and Xiong, C. (2021). Don't just blame over-parametrization for over-confidence: Theoretical analysis of calibration in binary classification. In International Conference on Machine Learning. PMLR, 566–576.
Baumann, J. and Loi, M. (2023). Fairness and Risk: An Ethical Argument for a Group Fairness Definition Insurers Can Use. Philosophy & Technology 36, doi:10.1007/s13347-023-00624-9.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78: 1–3.
Charpentier, A. (2014). Computational Actuarial Science. CRC Press.
Denuit, M., Charpentier, A. and Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics 101: 485–497, doi:10.1016/j.insmatheco.2021.09.001.
Hänsch, R. (2020). Stacked Random Forests: More Accurate and Better Calibrated. In IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 1751–1754.
Machado, A. F., Charpentier, A., Flachaire, E., Gallic, E. and Hu, F. (2024). From uncertainty to precision: Enhancing binary classifier performance through calibration.
NAIC (2022). Appendix B-Trees information elements and guidance for a regulator to meet best practices' objectives (when reviewing tree-based models).
Niculescu-Mizil, A. and Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05. New York, NY, USA: Association for Computing Machinery, 625–632, doi:10.1145/1102351.1102430.
Park, Y. and Ho, J. C. (2020). CaliForest: Calibrated random forest for health data. Proceedings of the ACM Conference on Health, Inference, and Learning 2020: 40–50.
Schervish, M. J. (1989). A General Method for Comparing Probability Assessors. The Annals of Statistics 17: 1856–1879, doi:10.1214/aos/1176347398.
Von Mises, R., Neyman, J., Sholl, D. and Rabinowitsch, E. (1939). Probability, Statistics and Truth. Macmillan.
Wilks, D. S. (1990). On the combination of forecast probabilities for consecutive precipitation periods. Weather and Forecasting 5: 640–650, doi:10.1175/1520-0434(1990)005<0640:OTCOFP>2.0.CO;2.


5 Appendix
Calculation of Performance Metrics
Simulated Environment for Score Heterogeneity
Random Forest Optimization on frenchmotor dataset


(Mis-)Calibration and standard metrics

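Table 3: Confusion Table

Actual/Predicted   Positive   Negative
Positive           TP         FN
Negative           FP         TN

where

$$\mathrm{TPR} = \frac{TP}{TP + FN}; \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$

AUC (Area Under the Curve): area under the ROC curve, which traces TPR against FPR for various probability thresholds $\tau$.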
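
As a minimal sketch of these definitions, the Python below computes TPR and FPR at a given threshold and then integrates the resulting ROC points with the trapezoidal rule. The function names, the convention "predict positive when the score is at least τ", and the four-observation toy data are assumptions made for illustration; the sketch also assumes that both classes are present, otherwise the rates are undefined.

```python
def tpr_fpr(outcomes, scores, tau):
    """TPR and FPR when an observation is predicted positive if score >= tau."""
    tp = sum(1 for d, s in zip(outcomes, scores) if d == 1 and s >= tau)
    fn = sum(1 for d, s in zip(outcomes, scores) if d == 1 and s < tau)
    fp = sum(1 for d, s in zip(outcomes, scores) if d == 0 and s >= tau)
    tn = sum(1 for d, s in zip(outcomes, scores) if d == 0 and s < tau)
    return tp / (tp + fn), fp / (fp + tn)

def auc(outcomes, scores):
    """Area under the ROC curve, using every distinct score as a threshold."""
    points = [(0.0, 0.0)]                      # (FPR, TPR) for tau above all scores
    for tau in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(outcomes, scores, tau)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2      # trapezoid between consecutive points
    return area

# Toy data (hypothetical): four observations, two events.
outcomes = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print("TPR, FPR at tau = 0.5:", tpr_fpr(outcomes, scores, 0.5))
print("AUC:", auc(outcomes, scores))
```

AUC only depends on how the scores rank the observations, which is why a model can reach a high AUC while its scores remain poorly calibrated.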


Data Generating Process for Score Heterogeneity

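$$D_i \sim \mathcal{B}(p_i),$$

where individual probabilities are obtained using a logistic sigmoid function:

$$p_i = \frac{1}{1 + \exp(-\eta_i)}, \qquad \eta_i = a x_i,$$

with $a = \begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} 0.5 & 1 \end{bmatrix}$ and $x_i = \begin{bmatrix} x_{1,i} & x_{2,i} \end{bmatrix}^\top$. The observations $x_i$ are drawn from a $\mathcal{N}(0, 1)$.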
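
The data generating process above can be simulated with the standard library alone. This sketch assumes that the two components of $x_i$ are drawn independently from $\mathcal{N}(0, 1)$, which the slide does not state explicitly; the sample size, the seed, and the function name `simulate` are also hypothetical.

```python
import math
import random

def simulate(n, a=(0.5, 1.0), seed=0):
    """Draw n observations (x_i, p_i, d_i) from the logistic DGP."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Assumption: x_{1,i} and x_{2,i} are independent N(0, 1) draws.
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        eta = a[0] * x[0] + a[1] * x[1]        # eta_i = a x_i
        p = 1.0 / (1.0 + math.exp(-eta))       # logistic sigmoid
        d = 1 if rng.random() < p else 0       # D_i ~ B(p_i)
        data.append((x, p, d))
    return data

sample = simulate(5000)
mean_p = sum(p for _, p, _ in sample) / len(sample)
mean_d = sum(d for _, _, d in sample) / len(sample)
print("mean true probability:", round(mean_p, 3))
print("observed event rate:  ", round(mean_d, 3))
```

Because the true probabilities $p_i$ are kept alongside the outcomes, the distribution of estimated scores can be compared directly with the distribution of true probabilities, which is not possible on real data.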


Parameters of RF for different optimization objectives

Table 4: RF parameters for different optimization objectives.

Optim.   mtry   num_trees   min_node_size
AUC∗     10     500         2
KL∗      10     500         18
ICI∗     4      500         512
Brier∗   2      500         2


Metrics of RF optimization on validation set

Table 5: AUC, ICI and KL calculations for different RF optimization objectives.

Optim.   AUC    ICI     KL
AUC∗     0.78   0.03    0.80
ICI∗     0.55   0.002   1.24
KL∗      0.73   0.03    0.03
