This thesis investigates using end-to-end learning for the autonomous driving system, a concrete robotic application. End to end learning can produce reasonable driving behaviors, even in the complex urban driving scenarios. Representation learning in end-to-end driving models is crucial, and auxiliary vision tasks such as semantic segmentation can help to form a more informative driving representation especially when training data is limited. Naive convolutional neural networks are usually only capable of doing reactive control and can not involve complex reasoning in a particular scenario. This thesis also studies how to handle scene conditioned driving behavior, which goes beyond the capability of reactive control. Alongside the end-to-end structure, learning methods also play a critical role. Imitation learning methods will acquire meaningful behaviors but usually, the robot can not master the skill. Reinforcement learning, on the contrary, either barely learns anything if the environment is too complex, or it can master the skill otherwise. To get the best of both worlds, this thesis proposes an algorithmically unified method to learn from both demonstration data and the environment.