Description
To realize this potential of RL, in this dissertation, we develop an alternate paradigm that utilizes static datasets of experience for learning policies. Such a "dataset-driven" paradigm broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. It also brings to RL the scalability and reliability benefits that modern supervised and unsupervised ML methods enjoy. That said, instantiating this paradigm is challenging: it requires reconciling the static nature of learning from a dataset with the traditionally active nature of RL, which gives rise to challenges of distributional shift, generalization, and optimization. After developing a theoretical and empirical understanding of these challenges, we devise algorithmic ideas for addressing them and discuss several extensions that convert these ideas into practical methods capable of training modern high-capacity neural network function approximators on large and diverse datasets. Finally, we show how these techniques enable us to pre-train generalist policies for real robots and video games, and to perform fast and efficient hardware accelerator design.