In recent years, the ubiquity of drones equipped with RGB cameras has made aerial 3D point cloud and model generation significantly more cost-effective than traditional aerial LiDAR-based methods. Most existing approaches for building reconstruction and segmentation in aerial 3D point clouds use geometric methods tailored to LiDAR data. However, point clouds generated from drone imagery generally have a very different structure from those produced by LiDAR imaging systems, which makes these methods less suitable. In this thesis, we present two methods: (a) an approach for segmenting building and vegetation points in a 3D point cloud, and (b) a pipeline for extracting a building footprint with height information from aerial images. For both approaches, we use the commercial software Pix4D to construct a 3D point cloud from RGB drone imagery.

To segment the point cloud, our basic approach is to apply deep learning segmentation methods directly to the same RGB images used to create the point cloud, and then back-project the per-pixel class labels of the segmented images onto the 3D points. This is a particularly attractive solution, since deep learning methods for image segmentation are more mature than those for 3D point cloud segmentation. Furthermore, GPU engines for 2D image convolutions are likely to achieve higher processing speeds than is possible with 3D point cloud data. We compute F1 and Jaccard similarity coefficient scores for the building and vegetation point classifications to show that our methodology outperforms existing methods such as PointNet++ and commercially available packages such as Pix4D.
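As a minimal sketch of the back-projection step, assuming a pinhole camera model with known intrinsics and world-to-camera extrinsics (which photogrammetry software such as Pix4D estimates per image), each 3D point can be projected into a segmented image and assigned the class label of the pixel it lands on. The function name and signature below are illustrative, not from the thesis:

```python
import numpy as np

def backproject_labels(points, labels_img, K, R, t):
    """Assign 2D segmentation labels to 3D points by projection.

    points:     (N, 3) array of 3D points in world coordinates
    labels_img: (H, W) array of integer class labels from 2D segmentation
    K:          (3, 3) camera intrinsic matrix
    R, t:       world-to-camera rotation (3, 3) and translation (3,)
    Returns an (N,) array of class labels; -1 for points that do not
    project into the image.
    """
    cam = points @ R.T + t                 # world -> camera coordinates
    labels = np.full(len(points), -1, dtype=int)
    in_front = cam[:, 2] > 0               # keep only points in front of the camera
    pix = cam[in_front] @ K.T              # perspective projection
    pix = pix[:, :2] / pix[:, 2:3]
    u = np.round(pix[:, 0]).astype(int)    # column index
    v = np.round(pix[:, 1]).astype(int)    # row index
    h, w = labels_img.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.where(in_front)[0][ok]
    labels[idx] = labels_img[v[ok], u[ok]]
    return labels
```

Since each 3D point is typically visible in several overlapping images, the per-image labels would then be aggregated, e.g. by majority vote across views.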

For building footprint extraction, the 3D point cloud is used in conjunction with image processing and geometric methods to extract a building footprint. The footprint is then extruded vertically based on the heights of the extracted rooftops. The footprint extraction involves two main steps: line-segment detection and polygonization of the detected lines. To detect line segments, we project the point cloud onto a regular grid, detect preliminary lines using the Hough transform, refine them via RANSAC, and convert them into line segments by checking the density of the points surrounding each line. In the polygonization step, we convert detected line segments into polygons by constructing and completing partial polygons, and then filter them by checking for support in the point cloud. The polygons are then merged based on their respective height profiles to create a full building footprint, which can be used to construct a 3D model of the building. One notable application of the extracted 3D building model is computing the window-to-wall ratio of the building: given a set of windows detected in a 2D image, we can project them onto the extracted 3D model and compute their area to obtain the ratio. We have tested our system on two buildings of several thousand square feet in Alameda, CA, and obtained F1 scores of 0.93 and 0.95, respectively, against the ground truth.
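The first step of the footprint pipeline, rasterizing the point cloud onto a regular grid and finding dominant lines with a Hough transform, can be sketched as follows. This is a simplified illustration with our own function name, parameter names, and default values; the thesis's actual RANSAC refinement and density-based segment cutting are only indicated in comments:

```python
import numpy as np

def detect_roof_lines(points_xy, cell=0.1, n_theta=180, top_k=5):
    """Rasterize 2D point projections and find dominant lines via Hough.

    points_xy: (N, 2) array of point-cloud x/y coordinates (z dropped)
    cell:      grid cell size in the same units as the points
    Returns up to top_k (theta, rho) pairs describing candidate lines
    in grid coordinates.
    """
    # Project the points onto a regular occupancy grid.
    mins = points_xy.min(axis=0)
    ij = np.floor((points_xy - mins) / cell).astype(int)
    h, w = ij[:, 1].max() + 1, ij[:, 0].max() + 1
    grid = np.zeros((h, w), dtype=bool)
    grid[ij[:, 1], ij[:, 0]] = True

    # Hough transform: each occupied cell votes in (theta, rho) space,
    # where a line is x*cos(theta) + y*sin(theta) = rho.
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    ys, xs = np.nonzero(grid)
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((n_theta, 2 * diag), dtype=int)
    for ti, th in enumerate(thetas):
        rhos = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int) + diag
        np.add.at(acc[ti], rhos, 1)

    # Take the top_k accumulator peaks as preliminary lines. In the full
    # pipeline these would be refined with RANSAC against the nearby
    # points, then cut into finite segments wherever the point density
    # along the line drops.
    flat = np.argsort(acc, axis=None)[::-1][:top_k]
    ti, ri = np.unravel_index(flat, acc.shape)
    return [(thetas[t], r - diag) for t, r in zip(ti, ri)]
```

On a roughly rectangular rooftop, the strongest peaks correspond to the roof edges, which the later polygonization step would assemble into a closed footprint polygon.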
