Description
The ability of robots to grasp novel objects has industry applications in e-commerce order fulfillment and home service. Despite the recent success of data-driven grasping policies, they can fail on objects with complex geometry or objects that lie significantly outside the training distribution. For such objects, we present a Thompson sampling algorithm that leverages learned priors from the Dexterity Network (Dex-Net) robot grasp planner to guide grasp exploration and provide probabilistic estimates of grasp success for each stable pose of a novel object. In addition, we introduce a new formulation of the mismatch between a prior distribution on grasp qualities and the ground-truth grasp quality distribution, and empirically analyze the effect of this mismatch on policy performance.
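The sketch below illustrates the bandit view of this approach: Beta-Bernoulli Thompson sampling over a fixed set of candidate grasps, with each grasp's Beta posterior initialized from a learned prior on grasp quality. It is a minimal illustration under assumed interfaces, not the thesis implementation; `prior_means`, `prior_strength`, and `execute_grasp` are hypothetical names, and the prior predictions stand in for outputs of a learned planner such as Dex-Net.

```python
import numpy as np

def thompson_sample_grasps(prior_means, prior_strength, execute_grasp, n_trials):
    """Beta-Bernoulli Thompson sampling over a fixed set of candidate grasps.

    prior_means   : predicted grasp-success probabilities, one per candidate
                    grasp (assumed to come from a learned grasp-quality model).
    prior_strength: pseudo-count weight placed on the learned prior.
    execute_grasp : callable mapping a grasp index to a binary success outcome.
    n_trials      : number of grasp attempts to allocate.
    """
    prior_means = np.asarray(prior_means, dtype=float)
    # Initialize Beta(alpha, beta) posteriors from the prior predictions.
    alpha = 1.0 + prior_strength * prior_means
    beta = 1.0 + prior_strength * (1.0 - prior_means)

    total_reward = 0
    for _ in range(n_trials):
        # Sample one plausible success probability per grasp; act greedily on it.
        theta = np.random.beta(alpha, beta)
        g = int(np.argmax(theta))
        reward = execute_grasp(g)  # 1 if the grasp succeeds, 0 otherwise
        total_reward += reward
        # Conjugate Bayesian update of the chosen grasp's posterior.
        alpha[g] += reward
        beta[g] += 1 - reward
    # Posterior means give probabilistic estimates of each grasp's success rate.
    return total_reward, alpha / (alpha + beta)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_probs = np.array([0.2, 0.5, 0.8])   # hidden ground-truth qualities
    noisy_prior = np.array([0.3, 0.7, 0.6])  # deliberately mismatched prior
    reward, estimates = thompson_sample_grasps(
        noisy_prior, prior_strength=10.0,
        execute_grasp=lambda g: int(rng.random() < true_probs[g]),
        n_trials=500)
    print(reward, estimates.round(2))
```

Sampling from the posterior rather than acting on its mean is what balances exploration of uncertain grasps against exploitation of grasps the prior already rates highly; the prior's pseudo-count strength controls how quickly observed outcomes can override a mismatched prior, which is the trade-off the mismatch analysis studies.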
In the first part of the thesis, simulation experiments suggest that the best learned policy attains an average total reward 64.5% higher than a greedy baseline and comes within 5.7% of an oracle baseline. Total reward is defined as the sum of rewards within a single run, averaged over 300,000 training runs across a set of 3000 object poses (approximately 1600 objects). In addition, we find that Thompson sampling without a neural network prior attains an average total reward 43.4% higher than the greedy baseline and comes within 4.6% of the best learned policy when evaluated over 20,000 training runs across a set of 200 object poses.
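Concretely, writing $r_t^{(i)} \in \{0, 1\}$ for the binary success reward of the $t$-th grasp attempt in run $i$ (notation introduced here for illustration, with the per-run horizon $T$ assumed fixed), the reported average total reward is

\[
\bar{R} = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} r_t^{(i)},
\]

with $N = 300{,}000$ runs in the first experiment.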
In the second part of the thesis, we change the object's stable pose during learning. Simulation experiments suggest that the best learned policy attains an average total reward at least 150.1% higher than a greedy baseline and comes within at most 12.15% of an oracle baseline when evaluated over 5000 training runs per object, spanning a total of 25 stable poses across 4 objects.