Description
The first part focuses on performative prediction. Performative prediction formalizes the phenomenon that predictive models—by means of being used to make consequential downstream decisions—often influence the outcomes they aim to predict in the first place. For example, travel time estimates on navigation apps influence traffic patterns and thus realized travel times; stock price predictions influence trading activity and hence prices. We examine common heuristics such as retraining, as well as more refined optimization strategies for dealing with performative feedback. At the end of the first part, we identify important scenarios where the act of prediction triggers feedback loops that are not explained by the framework of performativity, and we develop theory to describe and study such feedback.
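The retraining heuristic mentioned above can be illustrated with a minimal simulation. The setup, numbers, and distributional form below are all hypothetical, chosen only to make the feedback loop concrete: deploying a prediction `theta` shifts the outcome distribution by `eps * theta`, and each round of retraining simply refits to the data the current model induced.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, eps = 1.0, 0.5   # base outcome mean and feedback strength (illustrative values)
theta = 0.0           # initially deployed prediction

for t in range(30):
    # performative feedback: the deployed prediction shifts the outcomes,
    # y ~ N(mu0 + eps * theta, 1)
    y = rng.normal(mu0 + eps * theta, 1.0, size=100_000)
    # "retraining": refit the model to the data its own deployment induced
    theta = y.mean()
```

In this toy model, repeated retraining converges to the fixed point `mu0 / (1 - eps)`, a prediction that remains optimal for the distribution it itself induces; whether and how fast such heuristics converge in general is exactly the kind of question the first part studies.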
The second part discusses principles for valid statistical inference, i.e., p-values and confidence intervals with their advertised error guarantees, in the presence of feedback. We consider two types of feedback: the first is due to data snooping, i.e., the practice of choosing which results to report only after seeing the data; the second arises when machine-learning systems supply cheap predictions that augment or supplant high-quality data in future scientific analyses. In both cases, ignoring the feedback and naively applying classical statistical methods leads to inflated error rates and false discoveries; we provide alternative approaches that guarantee valid inference despite the feedback.
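The second type of feedback can be sketched with a small synthetic example. Everything here is an assumption for illustration: a machine-learning predictor with a systematic bias of +0.3, a small gold-labeled sample, and a large unlabeled sample. Treating the predictions as if they were data biases the estimate, while a debiased estimator corrects the prediction-based mean by the average prediction error measured on the labeled data.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 2.0
n, N = 500, 50_000                        # small labeled set, large unlabeled set (illustrative sizes)

def predict(y):
    # hypothetical ML predictor with a systematic +0.3 bias
    return y + 0.3

y_lab = rng.normal(true_mean, 1.0, n)     # gold-standard labels
y_unlab = rng.normal(true_mean, 1.0, N)   # outcomes we only see through predictions

# naive: use predictions as if they were real data -> inherits the bias
naive = predict(y_unlab).mean()

# debiased: subtract the average prediction error estimated on the labeled set
debiased = predict(y_unlab).mean() - (predict(y_lab) - y_lab).mean()
```

Here `naive` is off by roughly the predictor's bias, while `debiased` recovers the true mean; confidence intervals built around the naive estimate would be invalid, which motivates the corrected approaches developed in this part.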