Description
In this thesis, we apply modern advances in statistical and online learning theory to enrich our understanding of the statistical complexity of data-driven control. In the first half of the thesis, we study estimation of system parameters, known as system identification. We show that the stability of the system - long thought to characterize the sample complexity of estimation - is often inversely proportional to the number of trajectories required to recover the system parameters to a specified tolerance.
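To make the system identification setting concrete, here is a minimal sketch (not the thesis's exact estimator) of ordinary least-squares recovery of a linear system x_{t+1} = A x_t + w_t from multiple independent trajectories; the dimensions, trajectory counts, and variable names are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Illustrative sketch: least-squares identification of a linear system
# x_{t+1} = A_star @ x_t + w_t from n_traj independent trajectories.
# A_star, n_traj, and T are hypothetical choices for this example.
rng = np.random.default_rng(0)
d = 3
A_star = 0.9 * np.eye(d)      # a stable ground-truth system (spectral radius 0.9)
n_traj, T = 200, 10           # number of trajectories and their length

X, Y = [], []                 # (regressor, response) pairs pooled over trajectories
for _ in range(n_traj):
    x = np.zeros(d)           # each trajectory starts at the origin
    for _ in range(T):
        x_next = A_star @ x + rng.standard_normal(d)   # Gaussian process noise
        X.append(x)
        Y.append(x_next)
        x = x_next

X, Y = np.array(X), np.array(Y)
# OLS: find A_hat minimizing sum_t ||x_{t+1} - A x_t||^2.
# lstsq solves X @ B ≈ Y for B = A^T, so transpose the solution.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T
err = np.linalg.norm(A_hat - A_star, ord=2)   # operator-norm estimation error
```

With more trajectories the pooled regression has more excitation, so `err` shrinks - a toy illustration of the trajectory-count/tolerance trade-off discussed above.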
We then turn to adaptive control: the problem in which a learning agent interacts with an unknown dynamical system in hopes of attaining control performance comparable to what she could achieve with foreknowledge of the system dynamics. Adopting the conventions of modern online learning theory, we study the problem from the perspective of regret, or cumulative performance relative to an idealized benchmark. Building on the system identification results, we present upper and lower bounds demonstrating that a remarkably simple strategy enjoys optimal regret for the adaptive control of the linear quadratic regulator. We then propose a unified framework - based on online convex optimization - which enjoys similar regret guarantees in the considerably more general setting of nonstochastic control, accommodating partial observation, time-varying costs, and adversarially chosen disturbances. Surprisingly, we find that the best-attainable regret (with respect to a suitable benchmark of stabilizing LTI control policies) exhibits the same scaling in the problem horizon as the optimal regret for the online LQR problem.
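In standard notation (a common formulation of the quantity described above, not quoted from the thesis), the regret of a learner incurring costs c_t over a horizon T, measured against a benchmark class \Pi of stabilizing LTI control policies, is

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\!\left(x_t^{\pi}, u_t^{\pi}\right),
```

where $(x_t, u_t)$ is the learner's state-input trajectory and $(x_t^{\pi}, u_t^{\pi})$ the counterfactual trajectory had policy $\pi$ been run from the start.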