Ever since the dawn of computer vision, 3D reconstruction has been a core problem, inspiring early seminal works and leading to numerous real-world applications. Much recent progress in the field, however, has been driven by visual recognition systems powered by statistical learning techniques, more recently deep convolutional neural networks (CNNs). In this thesis, we attempt to bridge the worlds of geometric 3D reconstruction and learning-based recognition by learning to leverage various 3D perception cues from image collections for the task of reconstructing 3D objects.

In Chapter 2, we present a system that learns intra-category regularities in object shapes by building category-specific deformable 3D models from 2D recognition datasets, enabling fully automatic single-view 3D reconstruction for novel instances. In Chapter 3, we demonstrate how predicting the amodal extent of objects in images and reasoning about their co-occurrences can help us infer their real-world heights. Finally, in Chapter 4, we present Learnt Stereo Machines (LSM), an end-to-end learnt framework using convolutional neural networks, which unifies a number of paradigms in 3D object reconstruction: single- and multi-view reconstruction, coarse and dense outputs, and geometric and semantic reasoning. We conclude with several promising future directions for learning-based 3D reconstruction.
