Description
Forecasting future world events is challenging but valuable, particularly because good forecasts support better decision-making in times of uncertainty. We introduce a dataset of forecasting questions spanning a variety of categories and topics, along with a large corpus of news articles curated from Common Crawl. We show the effectiveness of larger models, better retrieval sources and techniques, and temporal architectures for long-range modeling. To better measure models’ performance and calibration on questions with numerical answers, we also introduce a second dataset of numerical questions, for which we design a baseline algorithm that trains models to output confidence intervals at specified confidence levels. With this dataset, we introduce a novel measure of calibration for numerical outputs based on adaptive binning RMS.
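As a rough illustration of the calibration idea, the sketch below shows one way an adaptive binning RMS score for confidence intervals might be computed. It is not the dataset's reference implementation: the function names `interval_hits` and `adaptive_binning_rms`, the `num_bins` parameter, and the choice of equal-count bins sorted by confidence level are all assumptions made for this example. Each prediction contributes a stated confidence level and a flag for whether the true value fell inside the predicted interval; predictions are sorted by confidence, split into bins of roughly equal size, and the score is the size-weighted root mean square of the gap between mean stated confidence and observed coverage in each bin.

```python
import math


def interval_hits(lowers, uppers, targets):
    """Return True for each target that falls inside its predicted interval."""
    return [lo <= t <= hi for lo, hi, t in zip(lowers, uppers, targets)]


def adaptive_binning_rms(confidences, hits, num_bins=10):
    """Hypothetical adaptive binning RMS calibration error.

    confidences: stated confidence level of each interval, in [0, 1].
    hits: whether the true value fell inside each interval.
    num_bins: assumed number of equal-count bins (not from the source).
    """
    if len(confidences) != len(hits):
        raise ValueError("confidences and hits must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Sort by stated confidence so each bin holds similar confidence levels.
    pairs = sorted(zip(confidences, hits), key=lambda p: p[0])
    num_bins = min(num_bins, n)  # guarantees every bin is non-empty

    total = 0.0
    for b in range(num_bins):
        # Equal-count ("adaptive") bin boundaries.
        start = b * n // num_bins
        end = (b + 1) * n // num_bins
        chunk = pairs[start:end]
        mean_conf = sum(c for c, _ in chunk) / len(chunk)
        coverage = sum(1.0 for _, h in chunk if h) / len(chunk)
        # Weight each bin's squared gap by its share of the predictions.
        total += (len(chunk) / n) * (mean_conf - coverage) ** 2
    return math.sqrt(total)
```

Under these assumptions, a score of 0 means that, within every bin, the intervals contain the true value as often as their stated confidence claims; larger values indicate over- or under-confidence.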