Greater trust in the predictions of neural networks is required before these models can be widely adopted in real-world decision-making systems where the cost of misclassification is high. Accurate confidence estimates, which represent the expected sample accuracy, can greatly help build this trust; however, modern neural networks are largely miscalibrated. Confidence estimates can be used either as interpretable probabilities that are fed into the next stage of the decision-making system, or as values that, when compared to a threshold, determine when not to act on the neural network's predictions. We explore the effects of various regularization techniques and calibration methods on the expected calibration error. We find that widely used regularization techniques do decrease the calibration error, but that the hyperparameter values that minimize calibration error may differ from those that maximize generalization accuracy. For the application of using confidence estimates to decide which inputs not to predict on, we develop a framework for visualizing the tradeoff between the proportion of inputs not predicted on and the resulting accuracy. This method offers greater flexibility in the types of confidence scores that serve as good indicators of when to trust the model, and it also supports confidence scores that are not probabilities or direct estimates of the expected sample accuracy. We demonstrate this using entropy and distance to the decision boundary, two measures that can separate out points likely to be misclassified without returning probabilities that would be interpreted as traditional confidence estimates.
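As an illustration only, the two measurements described above can be sketched in NumPy under a few assumptions that are not taken from the work itself: expected calibration error (ECE) is computed with equal-width bins over the maximum softmax probability, and the rejection tradeoff is computed by discarding the lowest-scoring fraction of inputs and reporting accuracy on the rest. Predictive entropy is used as the example score; a distance-to-decision-boundary score could be passed in the same way, since the curve only needs a ranking. The function names `expected_calibration_error`, `predictive_entropy`, and `accuracy_vs_rejection` are hypothetical.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width confidence bins (binning scheme is an assumption)."""
    confidences = probs.max(axis=1)           # top-class probability per input
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(labels)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in the bin, weighted by bin size
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return ece

def predictive_entropy(probs, eps=1e-12):
    """Entropy of each predictive distribution; higher means less certain."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

def accuracy_vs_rejection(scores, correct, fractions):
    """Reject the lowest-scoring fraction of inputs; return accuracy on the rest.

    scores: higher = more trusted (hypothetical convention for this sketch).
    """
    order = np.argsort(-scores, kind="stable")  # most trusted first
    correct_sorted = correct[order]
    n = len(correct)
    accuracies = []
    for f in fractions:
        n_keep = int(round((1.0 - f) * n))
        accuracies.append(correct_sorted[:n_keep].mean() if n_keep > 0 else np.nan)
    return np.array(accuracies)

# Toy example with made-up predictions for a 2-class problem.
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.35, 0.65], [0.15, 0.85]])
labels = np.array([0, 1, 1, 1])

ece = expected_calibration_error(probs, labels, n_bins=10)  # approximately 0.275
correct = (probs.argmax(axis=1) == labels).astype(float)
scores = -predictive_entropy(probs)  # low entropy = more trusted
curve = accuracy_vs_rejection(scores, correct, fractions=[0.0, 0.25, 0.5])
print(ece)
print(curve)  # accuracies 0.75, 1.0, 1.0
```

Plotting `curve` against `fractions` gives the kind of tradeoff view described above; because only the ordering of `scores` matters, the score does not need to be a calibrated probability.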
