Challenging Calibration of Insurance Scoring Algorithms: Agathe Fernandes Machado
1 Introduction
2 Calibration
3 Score Heterogeneity of Tree-Based Methods
4 Wrap-up
Setup
Each individual i, with characteristics x_i, receives a score interpreted as an estimated probability of the event:
p_i = s(x_i)
Motivation
• In insurance, we find cases where we are more interested in the underlying risk than in being able to discriminate between the occurrence/non-occurrence of an event:
  • what is the probability that this insured will have an accident within the next year?
  • what is the probability of death of this individual within the year?
Roadmap
1 Introduction
2 Calibration
  Definition
  Visualizing Calibration
  Measuring Calibration
3 Score Heterogeneity of Tree-Based Methods
4 Wrap-up
Definition
A score ŝ is well calibrated when, among individuals who receive a score of p, the event occurs with frequency p: E(D | ŝ(X) = p) = p, for all p ∈ [0, 1].
Calibration curve
Metrics
Brier score (Brier, 1950):

BS = (1/n) ∑_{i=1}^{n} (d_i − ŝ(x_i))²
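The Brier score above is simply the mean squared difference between the binary outcomes d_i and the scores ŝ(x_i). A minimal sketch (function name and toy data are illustrative, not from the talk):

```python
import numpy as np

def brier_score(d, s_hat):
    """BS = (1/n) * sum_i (d_i - s_hat(x_i))^2 for binary outcomes d."""
    d = np.asarray(d, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return float(np.mean((d - s_hat) ** 2))

# Toy check: outcomes and predicted scores for 4 individuals.
d = [1, 0, 1, 0]
s = [0.9, 0.2, 0.6, 0.1]
print(round(brier_score(d, s), 6))  # 0.055
```

A lower Brier score is better; note it mixes calibration and discrimination, which is why the talk also uses dedicated calibration metrics such as the ICI.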
Illustrative example
[Figure 1: Distribution of estimated scores for the three models (glm, gam, rf), along with their calibration curves E(D | ŝ(x)) generated using locfit.]
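The figure's calibration curves are fitted with locfit (local regression in R). As a rough stand-in, a binned estimate of E(D | ŝ(x)) can be sketched as follows (the quantile-binning scheme and all names are illustrative assumptions, not the authors' method):

```python
import numpy as np

def reliability_points(d, s_hat, n_bins=10):
    """Binned estimate of the calibration curve E(D | s_hat): split the
    observations into score-quantile bins, then compare the mean score and
    the observed event rate within each bin."""
    d = np.asarray(d, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    edges = np.quantile(s_hat, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, s_hat, side="right") - 1, 0, n_bins - 1)
    bins = [b for b in range(n_bins) if np.any(idx == b)]
    mean_score = np.array([s_hat[idx == b].mean() for b in bins])
    event_rate = np.array([d[idx == b].mean() for b in bins])
    return mean_score, event_rate

# With a perfectly calibrated score (D drawn with probability s), the curve
# should hug the diagonal.
rng = np.random.default_rng(1)
s = rng.uniform(size=10_000)
d = rng.binomial(1, s)
mean_score, event_rate = reliability_points(d, s)
```

Plotting `event_rate` against `mean_score` gives the reliability diagram; deviations from the diagonal indicate miscalibration.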
[Figure 2: Distribution of estimated scores for the three models (glm, gam, rf), along with their reliability diagrams zoomed to scores in [0, 0.25].]
Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu
Score Heterogeneity of Tree-Based Methods
[Figure 3: Distribution of true probabilities p and estimated scores ŝ(x) for trees of interest.]

The Kullback–Leibler divergence (KL) of ϕ from ψ is defined by D_KL(ϕ ∥ ψ) = ∑_{i=1}^{m} h_ϕ(i) log( h_ϕ(i) / h_ψ(i) ).
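The histogram-based KL divergence from the caption can be sketched directly (names and the toy histograms are mine; the sketch assumes h_ψ puts mass on every bin where h_ϕ does):

```python
import numpy as np

def kl_divergence(h_phi, h_psi):
    """D_KL(phi || psi) = sum_i h_phi(i) * log(h_phi(i) / h_psi(i)) over m bins.
    Histograms are renormalized to sum to 1; bins with h_phi(i) = 0 contribute
    nothing, and h_psi is assumed positive wherever h_phi is."""
    h_phi = np.asarray(h_phi, dtype=float)
    h_psi = np.asarray(h_psi, dtype=float)
    h_phi = h_phi / h_phi.sum()
    h_psi = h_psi / h_psi.sum()
    mask = h_phi > 0
    return float(np.sum(h_phi[mask] * np.log(h_phi[mask] / h_psi[mask])))

# Identical histograms have zero divergence.
print(kl_divergence([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # 0.0
```

Here D_KL compares the histogram of estimated scores with that of the true probabilities, so smaller values mean the score distribution better matches the underlying risk distribution.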
Table 2: Difference in validation-set metrics between ICI∗, KL∗ and the reference model AUC∗.

Optim.   ∆AUC    ∆ICI    ∆KL
ICI∗    −0.23   −0.02   +0.44
KL∗     −0.05   +0.01   −0.77

[Figure 4: Distribution of RF predicted scores ŝ(X) when optimizing hyperparameters for AUC (AUC∗), ICI (ICI∗) and KL (KL∗); panels compare the Beta prior with the scores predicted by the RF.]
Wrap-up
5 Appendix
References I
Austin, P. C. and Steyerberg, E. W. (2019). The integrated calibration index (ICI) and related metrics for quantifying the calibration of logistic regression models. Statistics in Medicine 38: 4051–4065, doi:10.1002/sim.8281.
Bai, Y., Mei, S., Wang, H. and Xiong, C. (2021). Don't just blame over-parametrization for over-confidence: Theoretical analysis of calibration in binary classification. In International Conference on Machine Learning. PMLR, 566–576.
Baumann, J. and Loi, M. (2023). Fairness and Risk: An Ethical Argument for a Group Fairness Definition Insurers Can Use. Philosophy & Technology 36, doi:10.1007/s13347-023-00624-9.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78: 1–3.
Charpentier, A. (2014). Computational Actuarial Science. CRC Press.
Denuit, M., Charpentier, A. and Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics 101: 485–497, doi:10.1016/j.insmatheco.2021.09.001.
Hänsch, R. (2020). Stacked Random Forests: More Accurate and Better Calibrated. In IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 1751–1754.
References II
Machado, A. F., Charpentier, A., Flachaire, E., Gallic, E. and Hu, F. (2024). From uncertainty to precision: Enhancing binary classifier performance through calibration.
NAIC (2022). Appendix B - Trees: Information elements and guidance for a regulator to meet best practices' objectives (when reviewing tree-based models).
Niculescu-Mizil, A. and Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05. New York, NY, USA: Association for Computing Machinery, 625–632, doi:10.1145/1102351.1102430.
Park, Y. and Ho, J. C. (2020). CaliForest: Calibrated random forest for health data. Proceedings of the ACM Conference on Health, Inference, and Learning 2020: 40–50.
Schervish, M. J. (1989). A General Method for Comparing Probability Assessors. The Annals of Statistics 17: 1856–1879, doi:10.1214/aos/1176347398.
Von Mises, R. (1939). Probability, Statistics and Truth (trans. Neyman, J., Sholl, D. and Rabinowitsch, E.). Macmillan.
Wilks, D. S. (1990). On the combination of forecast probabilities for consecutive precipitation periods. Weather and Forecasting 5: 640–650, doi:10.1175/1520-0434(1990)005<0640:OTCOFP>2.0.CO;2.
5 Appendix
Calculation of Performance Metrics
Simulated Environment for Score Heterogeneity
Random Forest Optimization on frenchmotor dataset
where

TPR = TP / (TP + FN),   FPR = FP / (FP + TN).

AUC (Area Under the Curve): TPR and FPR computed for various probability thresholds τ.
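The threshold sweep and the resulting AUC can be sketched as follows (the threshold grid, toy data and function names are illustrative assumptions):

```python
import numpy as np

def roc_points(d, s_hat, thresholds):
    """TPR and FPR when predicting the event whenever s_hat >= tau, for each
    threshold tau; thresholds are swept from high to low so FPR increases."""
    d = np.asarray(d)
    s_hat = np.asarray(s_hat)
    tpr, fpr = [], []
    for tau in np.sort(np.asarray(thresholds))[::-1]:
        pred = s_hat >= tau
        tp = np.sum(pred & (d == 1))
        fn = np.sum(~pred & (d == 1))
        fp = np.sum(pred & (d == 0))
        tn = np.sum(~pred & (d == 0))
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr)

# AUC as the area under the (FPR, TPR) curve via the trapezoidal rule.
d = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr = roc_points(d, s, np.linspace(0.0, 1.0, 101))
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect discrimination; as the talk stresses, a high AUC says nothing about calibration.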
Simulated Environment for Score Heterogeneity
D_i ∼ B(p_i),

where individual probabilities are obtained using a logistic sigmoid function:

p_i = 1 / (1 + exp(−η_i)),    η_i = a x_i,

with a = [a_1  a_2] = [0.5  1] and x_i = [x_{1,i}  x_{2,i}]^⊤. The observations x_i are drawn from a N(0, 1).
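This data-generating process can be reproduced as a short sketch (sample size and seed are arbitrary choices of mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily
n = 10_000                       # sample size chosen arbitrarily

# Covariates x_i = (x_{1,i}, x_{2,i}), each component drawn from N(0, 1).
x = rng.standard_normal((n, 2))

# Linear predictor eta_i = a x_i with a = [0.5, 1].
a = np.array([0.5, 1.0])
eta = x @ a

# Logistic sigmoid gives the individual probabilities p_i, then D_i ~ B(p_i).
p = 1.0 / (1.0 + np.exp(-eta))
d = rng.binomial(1, p)
```

Because the true p_i are known by construction, simulated data like this lets one compare the distribution of a model's scores ŝ(x_i) directly against the distribution of the underlying probabilities.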
Random Forest Optimization on frenchmotor dataset