Deep learning has achieved great results and made rapid progress over the past few years, particularly in the field of computer vision. Deep learning models are composed of artificial neural networks and a supervised, semi-supervised, or unsupervised learning scheme. Larger models have neural network architectures with more parameters, often resulting from more/wider layers. In this paper, we perform a case study in the domain of monocular depth estimation and contribute both a new model as well as a new dataset. We propose PixelBins, a simplification to AdaBins, the existing state-of-the-art model, and obtain comparable performance to state-of-the-art methods. Our method achieves a ~20x reduction in model size as well as an absolute relative error of 0.057 on the popular KITTI benchmark. Furthermore, we conceptualize and justify the need for truly open datasets. Consequently, we introduce a modern, extensible dataset consisting of high quality, cross-calibrated image+point cloud pairs across a diverse set of locations. The dataset is uniquely suited for the designation of truly open for a variety of reasons, such as a ~100x reduction in cost to contribute a new image+pointcloud pair. We make our code and dataset publicly available and provide instructions for contributing to and replicating our experiments.




Download Full History