Climate change requires a radical and complex transition in the way our energy sector generates and uses energy. Solar and wind energy are leading carbon-free power sources, but they are non-dispatchable, meaning that an operator cannot control when they generate. As a result, their growth may be stymied unless supporting technologies can change how the rest of our energy system responds to their generation. Demand response is an important tool in a suite of supporting technologies that help smooth the integration of non-dispatchable energy into the grid. The more effectively demand response signals (prices) elicit responses (the deferral of energy use by appliances and building systems), the more quickly solar and wind may decarbonize our energy system.

Demand response signals may be amplified by advanced controls, of which reinforcement learning (RL) is a prime example. We present two environments for testing demand response price-setting at different levels of the grid. The first, MicrogridLearn, is an environment in which an agent transmits prices to a collection of buildings and learns to set those prices so as to better shape energy demand and lower energy costs for all buildings in its purview. The second, OfficeLearn, is an environment that simulates behavioral responses to prices in an office. Using these two environments, we identify six key challenges in moving RL from simulation to reality and present RL strategies, some novel, to overcome them. First, RL controllers implemented in the real world need to be data efficient: we present Offline-Online RL, Surprise-Minimizing RL, and extrinsic and intrinsic planning as solutions. Second, RL in the real world should be robust and guarantee safe actions: we present a novel method, the guardrails planning model, and demonstrate that a conservative decision process using a distributional prediction can help learning. Third, RL may get stuck in local optima: we present meta-learning over domain randomization as a technique to ensure agent robustness. Fourth, agents may be attacked: we demonstrate a novel adversarial attack on RL and present a defense against it. Fifth, energy applications may require real-world RL to protect privacy and generalize easily to new subdomains: we present the first application of Personal Federated Hypernetworks (PFH) to RL to accomplish both tasks. Sixth, hyperparameter sweeps may entail large data consumption: we present a regression analysis of hyperparameter sweep values to indicate the relative strength of each hyperparameter.
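The specific regression used for the hyperparameter analysis is not stated here. As an illustration only, one common form of such an analysis is an ordinary least-squares fit of a performance metric (for example, final episode reward) on standardized hyperparameter values, with the absolute size of each coefficient read as that hyperparameter's relative strength. The function `hyperparameter_strength` and its arguments are hypothetical and are not taken from this work.

```python
import numpy as np


def hyperparameter_strength(configs, scores, names):
    """Hypothetical sketch: rank hyperparameters by standardized OLS coefficients.

    configs: (n_runs, n_hparams) hyperparameter values, one row per sweep run
    scores:  (n_runs,) performance metric for each run (e.g. final reward)
    names:   list of n_hparams hyperparameter names
    """
    X = np.asarray(configs, dtype=float)
    y = np.asarray(scores, dtype=float)

    # Standardize each column so coefficients are comparable across scales
    # (e.g. a learning rate near 1e-4 versus a batch size near 256).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant column becomes all zeros; avoid divide-by-zero
    Z = (X - X.mean(axis=0)) / std

    # Add an intercept column and solve the least-squares problem.
    A = np.column_stack([Z, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Drop the intercept and sort by absolute effect size, largest first.
    strengths = dict(zip(names, coef[:-1]))
    return sorted(strengths.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Standardizing before fitting is the key design choice: without it, a coefficient's magnitude would reflect the units of the hyperparameter rather than its influence on performance. Values that span orders of magnitude, such as learning rates, are often log-transformed before being passed in.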

Finally, we discuss how RL may be implemented in a real-world experiment. We give a prospective experiment plan and present an API to connect the RL controller to the real world. We then discuss two prior experiments run in the same office setting: an A/B test of two different energy visualizations, and a measurement of the persistence of energy reductions after the experimental period ended.

Our work contributes to societal knowledge in the following ways. We are the first to propose the use of RL for price-setting in energy systems, and the first to propose a Social Game as a mechanism to incentivize price sensitivity in an office setting. Among our RL methods, we are the first to propose adversarial poisoning of RL algorithms at training time. We are also the first to propose the use of personal federated hypernetworks for training new RL agents. Our other methods were inspired by similar implementations in other fields but are novel to the communities and application spaces in which we operate.

We hope that the community will build on our work, continuing to iterate on RL architectures for price-setting and implementing these techniques in real-world experiments.



