Just as a machine learning model converges by following the gradient produced by its loss function, a scholarly field converges toward adopting particular model modifications by following a kind of gradient produced by the error metrics used to report results in its papers. In this way, a field and its practitioners become part of a larger human-centric process of design. In this paper we argue for the importance of choosing the right error metric for a popular cognitive model called Bayesian Knowledge Tracing (BKT), used in the context of intelligent tutoring systems. Our analyses with synthetic data (correlation analysis, gradient visualization, and parameter estimation) show that Root Mean Squared Error (RMSE) and log-likelihood provide the best correspondence to the true generating model. Area Under the Curve (AUC) and accuracy lag significantly behind, while precision and recall perform extremely poorly. Our result validates the standard practices of using RMSE to evaluate BKT models and using RMSE or log-likelihood for BKT parameter estimation. It also adds to the mounting evidence against using AUC and accuracy, the other metrics frequently used to evaluate BKT models, as shown in our seven-year literature review of the field. Additionally, we investigate the validity of parameters estimated using the different error metrics on real data from ASSISTments, Cognitive Tutor, and Khan Academy. The real data analysis reinforces our finding that log-likelihood and RMSE appear to be superior to the other metrics and should be the metrics of choice when applying this model.
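
To make the two recommended metrics concrete, the following is a minimal sketch, assuming the standard four-parameter BKT formulation (prior, learn, guess, slip) with no forgetting. The function names and the parameter values in the demo are hypothetical illustrations, not taken from the paper's analyses.

```python
import math
import random


def bkt_predict(responses, prior, learn, guess, slip):
    """Return P(correct) predicted before each observed response.

    Assumes standard BKT without forgetting; guess and slip must lie
    strictly between 0 and 1 so the posterior update is well defined.
    """
    p_known = prior
    preds = []
    for r in responses:
        # Predicted probability of a correct answer at this opportunity.
        p_correct = p_known * (1 - slip) + (1 - p_known) * guess
        preds.append(p_correct)
        # Bayesian update of the knowledge estimate given the response.
        if r == 1:
            posterior = p_known * (1 - slip) / p_correct
        else:
            posterior = p_known * slip / (1 - p_correct)
        # Learning transition applied after each opportunity.
        p_known = posterior + (1 - posterior) * learn
    return preds


def simulate(n, prior, learn, guess, slip, rng):
    """Generate one synthetic response sequence from a BKT model."""
    known = rng.random() < prior
    out = []
    for _ in range(n):
        p = (1 - slip) if known else guess
        out.append(1 if rng.random() < p else 0)
        if not known and rng.random() < learn:
            known = True
    return out


def rmse(preds, responses):
    """Root mean squared error between predictions and 0/1 responses."""
    n = len(responses)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, responses)) / n)


def log_likelihood(preds, responses):
    """Sum of log-probabilities assigned to the observed responses."""
    return sum(math.log(p if r == 1 else 1 - p) for p, r in zip(preds, responses))


if __name__ == "__main__":
    rng = random.Random(0)
    true_params = dict(prior=0.3, learn=0.15, guess=0.2, slip=0.1)  # hypothetical values
    data = simulate(50, rng=rng, **true_params)
    preds = bkt_predict(data, **true_params)
    print("RMSE:", rmse(preds, data))
    print("log-likelihood:", log_likelihood(preds, data))
```

Scoring candidate parameter sets against data simulated from a known generating model, as in the demo, is one simple way to check whether a metric ranks the true parameters above perturbed ones.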




