Abstract:
depth, width, weight decay, and Batch Normalization are important factors influencing calibration.
Introduction:
The paper motivates the problem with two examples: a neural network classifier should not only be accurate, but should also indicate when it is likely to be wrong.
One example is autonomous driving, where a neural network detects pedestrians and obstacles. If the detection network cannot confidently predict whether there is an immediate obstruction, the car should rely more heavily on the outputs of other sensors when braking. Another example is automated medical diagnosis: when a disease-diagnosis network has low confidence on a case, it should be handed off to a human doctor.
This shows that, beyond its prediction, a neural network should also provide a calibrated confidence measure. In other words, the probability associated with the predicted class label should reflect its ground-truth correctness likelihood.
Calibrated confidence estimates are also important for model interpretability. Furthermore, good probability estimates can be used to incorporate neural networks into other probabilistic models.
Figure 1 shows that modern neural networks are no longer well-calibrated:
Top row: the distribution of prediction confidence (i.e., the probability associated with the predicted label), displayed as histograms.
LeNet's average confidence closely matches its accuracy, while ResNet's average confidence is substantially higher than its accuracy.
Bottom row: we see that LeNet is well-calibrated, as confidence closely approximates the expected accuracy (i.e., the bars align roughly along the diagonal). The ResNet, on the other hand, is more accurate, but its accuracy does not match its confidence.
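The reliability diagrams described above can be computed directly from a model's predictions. A minimal sketch (the function name `reliability_bins` and the 15-bin default are my own choices, not from the paper): group predictions into equal-width confidence bins and compare each bin's accuracy to its mean confidence.

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=15):
    """Bin predictions by confidence and report per-bin accuracy,
    mean confidence, and sample count (the data behind a
    reliability diagram)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    accs, confs, counts = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            accs.append(correct[mask].mean())    # empirical accuracy in bin
            confs.append(confidences[mask].mean())  # average confidence in bin
            counts.append(int(mask.sum()))
        else:
            accs.append(0.0)
            confs.append(0.0)
            counts.append(0)
    return np.array(accs), np.array(confs), np.array(counts)
```

For a well-calibrated model (like the LeNet in Figure 1), per-bin accuracy tracks per-bin confidence, so plotting `accs` against `confs` lies near the diagonal; an overconfident model's accuracy bars fall below it.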
Our goal is not only to understand why neural networks have become miscalibrated, but also to identify what methods can alleviate this problem.
Definitions:
The input X ∈ 𝒳 and label Y ∈ 𝒴 = {1, . . . , K} are random variables that follow a ground-truth joint distribution π(X, Y) = π(Y|X)π(X).
Let h be a neural network with h(X) = (Ŷ, P̂), where Ŷ is a class prediction and P̂ is its associated confidence, i.e., the probability of correctness.
We would like the confidence estimate P̂ to be calibrated. Perfect calibration is defined as:
P(Ŷ = Y | P̂ = p) = p, for all p ∈ [0, 1].
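Perfect calibration cannot be verified exactly from finite data, since P̂ is continuous. A common empirical proxy is to bin predictions and measure the count-weighted average gap between accuracy and confidence. A hypothetical sketch (function name and binning choices are mine; the paper's formal metrics come later in the text):

```python
import numpy as np

def empirical_calibration_gap(confidences, correct, n_bins=15):
    """Estimate miscalibration as the average |accuracy - confidence|
    over equal-width confidence bins, weighted by the fraction of
    samples falling in each bin. Returns 0 for perfectly
    calibrated predictions."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            gap += (mask.sum() / n) * abs(acc - conf)
    return gap
```

For example, ten predictions all made at confidence 0.8, of which exactly eight are correct, yield a gap of 0; ten predictions at confidence 0.99 of which only half are correct yield a gap of 0.49, reflecting overconfidence like the ResNet's in Figure 1.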
To be continued.