Self-driving vehicle vision systems must handle an extremely broad and challenging range of scenes. We propose a distributed training regimen for a CNN vision system in which vehicles in the field continually collect images of objects that are incorrectly or weakly classified. These images are then used to retrain the vehicle’s object detection system offline, so that accuracy on difficult images continues to improve over time. In this report we show the feasibility of this approach in several steps. First, we note that an optimal subset of images (relative to all the objects encountered) can be obtained by importance sampling using gradients of the recognition network. Next, we show that these gradients can be approximated with very low error using just the last-layer gradient, which is already available when the CNN is running inference. Then, we generalize these results to objects in a larger scene using an object detection system. We then describe a self-labelling scheme using object tracking: objects are tracked back in time (near-to-far), and the labels of near objects are used to check the classification accuracy of those same objects in the far field. Finally, we present some experiments and show the data reductions that are possible.
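
As a rough illustration of the first two steps, the sketch below assumes a softmax classifier trained with cross-entropy, in which case the gradient of the loss with respect to the last-layer logits is the predicted probability vector minus the one-hot label. The function names, the availability of labels (for example from the tracking-based self-labelling), and the choice of sampling with replacement are assumptions made for illustration, not the report's implementation.

```python
import numpy as np


def last_layer_grad_norms(probs, labels):
    """Per-example L2 norm of d(cross-entropy)/d(logits) = probs - onehot(label).

    probs  : (n, c) array of softmax outputs (assumed already computed at inference)
    labels : (n,) integer class labels (assumed available, e.g. via self-labelling)
    """
    grads = np.array(probs, dtype=float, copy=True)
    grads[np.arange(len(labels)), labels] -= 1.0
    return np.linalg.norm(grads, axis=1)


def importance_sample(scores, k, rng):
    """Draw k indices with probability proportional to scores.

    Returns the indices and the weights 1 / (n * p_i) that keep a
    weighted average over the sample unbiased for the full-set mean.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    total = scores.sum()
    # Fall back to uniform sampling if every score is zero.
    p = scores / total if total > 0 else np.full(n, 1.0 / n)
    idx = rng.choice(n, size=k, replace=True, p=p)
    weights = 1.0 / (n * p[idx])
    return idx, weights


# Three images of the same class: confidently right, uncertain, confidently wrong.
probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
labels = np.array([0, 0, 0])
scores = last_layer_grad_norms(probs, labels)
idx, weights = importance_sample(scores, k=2, rng=np.random.default_rng(0))
```

Under these assumptions, weakly or incorrectly classified images receive larger scores and are therefore more likely to be kept, while confidently correct images are mostly discarded.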



