Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed for incorporation directly into the training process. However, these methods all have internal hyperparameters, and the performance of these calibration objectives relies on tuning them, incurring greater computational cost as the sizes of neural networks and datasets grow. We therefore present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective loss that views the calibration error as the squared difference between two expectations. Through extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into training improves model calibration across various batch-size settings without the need for internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational cost of calibration during training owing to the absence of internal hyperparameters. The code is publicly accessible at https://github.com/hee-suk-yoon/ESD.
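To make the "squared difference between two expectations" concrete, below is a minimal PyTorch sketch of one plausible reading of such an objective: the squared gap between the batch-mean correctness indicator and the batch-mean predicted confidence, added to the usual cross-entropy loss. The function names (`esd_style_loss`, `training_loss`) and the fixed combination weight are illustrative assumptions, not the paper's implementation; the exact estimator is available in the repository linked above.

```python
# Illustrative sketch only: assumes the calibration term is the squared
# difference between E[1{y_hat = y}] (accuracy) and E[max softmax prob]
# (confidence), estimated with batch means. See the linked repository
# for the authors' actual estimator.
import torch
import torch.nn.functional as F


def esd_style_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Squared difference between mean correctness and mean confidence."""
    probs = F.softmax(logits, dim=-1)
    confidence, predictions = probs.max(dim=-1)
    # Correctness indicator 1{y_hat = y}; it is non-differentiable
    # (comes from argmax), so gradients flow only through confidence.
    correct = (predictions == targets).float().detach()
    return (correct.mean() - confidence.mean()) ** 2


def training_loss(logits: torch.Tensor, targets: torch.Tensor,
                  weight: float = 1.0) -> torch.Tensor:
    # Standard auxiliary-loss combination: cross-entropy plus the
    # calibration term. The "tuning-free" claim in the abstract concerns
    # internal hyperparameters of the calibration measure itself (e.g.,
    # bin counts or kernel bandwidths in prior methods), not this
    # generic weighting.
    return F.cross_entropy(logits, targets) + weight * esd_style_loss(logits, targets)
```

In this reading, a model that is accurate but systematically over-confident (mean confidence exceeding mean accuracy) incurs a positive penalty, which pushes the confidence term down toward the observed accuracy during training.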