Autonomous drone flight has emerged as a revolutionary technology with diverse applications across industries such as search and rescue, infrastructure inspection, deliveries, defense, and precision agriculture. Drones carry diverse sensor suites and are tasked with perceiving their surroundings, navigating to goal locations, and detecting points of interest, all while operating in complex, unknown environments. Classical approaches separate the perception, planning, and control steps, while other works output trajectories for a Model Predictive Controller to follow.

In this work, we present a deep Reinforcement Learning (RL) approach that enables an off-the-shelf drone to fly through goal positions and avoid obstacles in unknown outdoor environments. The agent observes position, orientation, yaw rate, and depth image information, and outputs linear velocities and a yaw rate. Privileged learning is used during training: full environment information is used beforehand to pre-compute optimal trajectories, which provide a supervisory signal to the RL agent, penalizing it for deviating from the given trajectory. Our work combines the benefits of pre-computed optimal trajectories with the advantages of exploration by an RL agent, allowing for flight in previously unseen situations. The policy transfers well to new hardware platforms with different dynamics, as many off-the-shelf platforms come with lower-level velocity controllers. Real-world experiments show positive results on a DJI Matrice 300 -- a different hardware platform from the one used in simulation -- in a simple outdoor environment, where the policy is applied zero-shot and is able to avoid an obstacle and navigate to desired goals at 1 m/s.
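The supervisory signal described above can be illustrated with a minimal reward sketch: the agent is penalized for deviating from the pre-computed optimal trajectory while being encouraged to make progress toward the goal. The function name, weights, and term structure here are hypothetical illustrations, not the exact reward used in this work.

```python
import numpy as np

def tracking_reward(agent_pos, ref_traj, t, goal, w_dev=1.0, w_goal=0.5):
    """Illustrative privileged-learning reward (hypothetical form):
    penalize distance from the pre-computed reference trajectory at
    step t, plus a term rewarding proximity to the goal position."""
    ref_pos = ref_traj[t]                           # reference waypoint at step t
    deviation = np.linalg.norm(agent_pos - ref_pos)  # deviation from optimal path
    goal_dist = np.linalg.norm(agent_pos - goal)     # remaining distance to goal
    return -w_dev * deviation - w_goal * goal_dist
```

A position lying on the reference trajectory thus receives a strictly higher reward than one that has drifted away from it, which is the gradient signal that pulls the exploring RL agent back toward the privileged expert's path.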



