Description
The advent of inexpensive 3D sensors has resulted in an abundance of 3D point clouds and datasets. For instance, RGB-D sensors such as the Kinect can produce 3D point clouds by projecting 2D pixels into 3D world coordinates using depth and pose information. Recent advances in deep learning have yielded promising solutions to 2D and 3D recognition problems, including 3D object detection. Compared with 3D classification, however, 3D object detection has received less attention from the research community. In this thesis, we propose a novel approach to 3D object detection, the Sparse Sampling Neural Network (SSNN), which takes large, unordered point clouds as input. We overcome the challenges of processing three-dimensional data by convolving a collection of “probes” across a point cloud input, the responses of which feed into a 3D convolutional neural network. This approach allows us to infer bounding boxes and their associated classes efficiently and accurately without discretizing the volumetric space into voxels. We demonstrate that our network performs well on indoor scenes, achieving a mean Average Precision (mAP) of 54.48% on the Matterport3D dataset, 62.93% on the Stanford Large-Scale 3D Indoor Spaces Dataset, and 48.4% on the SUN RGB-D dataset.
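The back-projection step mentioned above — turning RGB-D pixels into world-frame 3D points — can be sketched as follows. This is a minimal illustration using standard pinhole-camera geometry, not code from the thesis; the function name and the intrinsics (fx, fy, cx, cy) are assumptions for the example.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, pose=np.eye(4)):
    """Back-project a depth map (H x W, in metres) into a 3D point cloud.

    fx, fy, cx, cy are pinhole-camera intrinsics; `pose` is an optional
    4x4 camera-to-world transform. Returns an (N, 3) array of points.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.ravel()
    valid = z > 0                       # drop pixels with no depth reading
    z = z[valid]
    x = (u.ravel()[valid] - cx) * z / fx
    y = (v.ravel()[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z], axis=1)                # camera frame
    pts_h = np.hstack([pts_cam, np.ones((len(z), 1))])   # homogeneous coords
    return (pose @ pts_h.T).T[:, :3]                     # world frame
```

Applying this per frame and accumulating the transformed points (using each frame's pose) is one common way such sensors yield the large, unordered point clouds the network consumes.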