Description
In this thesis we provide a comprehensive set of algorithmic approaches to the problem of model selection in stochastic contextual bandits and reinforcement learning. We propose and analyze two distinct approaches. First, we introduce Stochastic CORRAL, an algorithm that successfully combines an adversarial EXP3 or CORRAL master with multiple stochastic bandit base algorithms. Second, we introduce three distinct stochastic master algorithms: Explore-Commit-Eliminate (ECE), Regret Balancing, and Regret Bound Balancing and Elimination (RBBE). These recover the rates of Stochastic CORRAL under an EXP3 and a CORRAL master, with the added advantage that the model selection guarantees of RBBE extend to the setting of contextual linear bandits with adversarial contexts. We complement our algorithmic results with a variety of lower bounds designed to explore the theoretical limits of model selection in contextual bandits and reinforcement learning.
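To illustrate the regret balancing idea mentioned above, the following is a minimal sketch, not the thesis's actual algorithm: a master that plays the active base learner with the smallest putative regret bound and eliminates learners whose empirical reward contradicts their claimed bound. The learner class, the `c * sqrt(n)` bound form, the confidence radius, and the elimination rule here are all simplified assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

class BaseLearner:
    """Hypothetical base learner carrying a candidate regret bound c * sqrt(n)."""
    def __init__(self, mean, c):
        self.mean = mean    # true mean reward (hidden from the master)
        self.c = c          # coefficient of the putative regret bound
        self.n = 0          # number of times played
        self.total = 0.0    # cumulative observed reward

    def play(self):
        r = random.gauss(self.mean, 0.1)
        self.n += 1
        self.total += r
        return r

    def avg(self):
        return self.total / self.n if self.n else 0.0

    def bound(self):
        # putative cumulative regret bound after n plays
        return self.c * math.sqrt(self.n) if self.n else 0.0

def regret_bound_balancing(learners, horizon):
    """Simplified regret-bound balancing with elimination."""
    active = list(learners)
    for t in range(horizon):
        # Balancing step: play the active learner with the smallest
        # accumulated putative regret bound.
        chosen = min(active, key=lambda l: l.bound())
        chosen.play()
        # Elimination step: a learner whose empirical reward, even after
        # crediting its claimed per-round regret, falls statistically below
        # another learner's empirical reward has a misspecified bound.
        conf = lambda l: (math.sqrt(2 * math.log(horizon) / l.n)
                          if l.n else float("inf"))
        best_lower = max(l.avg() - conf(l) for l in active if l.n)
        active = [l for l in active
                  if l.n == 0
                  or l.avg() + l.bound() / l.n + conf(l) >= best_lower]
    return active
```

Running this with a well-specified learner (mean 0.9) against a misspecified one (mean 0.3), both claiming the same bound coefficient, the master alternates plays while both bounds look plausible and eventually eliminates the misspecified learner once the data contradicts its claim.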