This thesis investigates two fundamental problems in computer vision: contour detection and image segmentation. We present new state-of-the-art algorithms for both of these tasks. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms.

Our approach to contour detection couples multiscale local brightness, color, and texture cues to a powerful globalization framework using spectral clustering. The local cues, computed by applying oriented gradient operators at every location in the image, define an affinity matrix representing the similarity between pixels. From this matrix, we derive a generalized eigenproblem and solve for a fixed number of eigenvectors which encode contour information. Using a classifier to recombine this signal with the local cues, we obtain a large improvement over alternative globalization schemes built on top of similar cues.
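As an illustration of the globalization step, the following is a minimal sketch, not the implementation used in this work. It assumes the generalized eigenproblem takes the normalized-cuts form (D - W) v = λ D v, where W is a symmetric affinity matrix with positive row sums and D is its diagonal degree matrix; the abstract does not state this form. The function name `spectral_eigenvectors` and the dense NumPy solver are hypothetical choices for a toy-sized problem.

```python
import numpy as np


def spectral_eigenvectors(W, k):
    """Return the k smallest generalized eigenpairs of (D - W) v = lambda * D v.

    Assumes W is a symmetric, non-negative affinity matrix whose rows all
    have positive sums. This is an illustrative dense solver only.
    """
    d = W.sum(axis=1)                      # degree of each node (pixel)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, z = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    vecs = d_inv_sqrt[:, None] * z         # map back: v = D^{-1/2} z
    return vals[:k], vecs[:, :k]


# Toy affinity: nodes {0, 1} and {2, 3} are tightly linked, with weak cross links.
W = np.array([
    [0.00, 1.00, 0.01, 0.00],
    [1.00, 0.00, 0.00, 0.01],
    [0.01, 0.00, 0.00, 1.00],
    [0.00, 0.01, 1.00, 0.00],
])
vals, vecs = spectral_eigenvectors(W, 2)
```

In this toy case the second eigenvector takes one sign on nodes 0 and 1 and the opposite sign on nodes 2 and 3, which is the sense in which eigenvectors can carry grouping, and hence contour, information.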

To produce high-quality image segmentations, we link this contour detector with a generic grouping algorithm consisting of two steps. First, we introduce a new image transformation called the Oriented Watershed Transform for constructing a set of initial regions from an oriented contour signal. Second, using an agglomerative clustering procedure, we form these regions into a hierarchy which can be represented by an Ultrametric Contour Map, the real-valued image obtained by weighting each boundary by its scale of disappearance. This approach outperforms existing image segmentation algorithms on measures of both boundary and segment quality. These hierarchical segmentations can optionally be further refined by user-specified annotations.
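The idea of weighting each boundary by its scale of disappearance can be sketched on a small region adjacency graph. The example below is a simplified illustration under stated assumptions, not the procedure developed in this work: it merges regions in order of increasing boundary strength (a single-linkage rule chosen here for brevity), and the function name `ultrametric_weights` and the edge-list input format are hypothetical.

```python
def ultrametric_weights(num_regions, edges):
    """Assign each boundary the merge level at which it disappears.

    edges: list of (region_i, region_j, strength) tuples.
    Returns one scale per edge, or None if the two regions never merge.
    Assumption: single-linkage merging, used purely for illustration.
    """
    parent = list(range(num_regions))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    scale = [None] * len(edges)
    # Visit boundaries from weakest to strongest.
    for k in sorted(range(len(edges)), key=lambda k: edges[k][2]):
        i, j, w = edges[k]
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # these regions were already merged at a lower level
        parent[ri] = rj
        # Every boundary now interior to a merged region disappears at level w.
        for m, (a, b, _) in enumerate(edges):
            if scale[m] is None and find(a) == find(b):
                scale[m] = w
    return scale


# Three mutually adjacent regions with boundary strengths 0.2, 0.5, and 0.9.
edges = [(0, 1, 0.2), (1, 2, 0.5), (0, 2, 0.9)]
scales = ultrametric_weights(3, edges)
```

Thresholding the resulting weights at any level leaves only closed boundaries, because a boundary survives exactly when its two regions are still separate at that level of the hierarchy.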

While the majority of this work focuses on processing static images, we also develop extensions for video. In particular, we augment the set of static cues used for contour detection with a low-level motion cue to create an enhanced boundary detector. Using optical flow in conjunction with this detector enables the determination of occlusion boundaries and assignment of figure/ground labels in video.



