We develop a battery of segmentation comparison measures that we use both to validate the consistency of the human data and to evaluate grouping algorithms. In conjunction with the segmentation dataset, these measures provide "micro-benchmarks" for boundary detection algorithms and pixel affinity functions, as well as a benchmark for complete segmentation algorithms. Using these performance measures, we can systematically improve grouping algorithms with the human ground truth as our goal.
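The core of such a boundary micro-benchmark is comparing a machine-generated boundary map against human-labeled ground truth while tolerating small localization errors. The following is a minimal sketch of that idea using dilation-based matching; the function name, the tolerance parameter, and the wraparound dilation are illustrative simplifications, not the dataset's actual matching procedure.

```python
import numpy as np

def boundary_pr(pred, gt, tol=1):
    """Precision/recall of predicted boundary pixels against human-labeled
    ground truth, counting a prediction as correct if it lies within `tol`
    pixels of a true boundary (a simplified, illustrative matching)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)

    def dilate(mask, r):
        # Grow the boundary map by r pixels so near-misses count as hits.
        # np.roll wraps around image edges; acceptable for this sketch.
        out = mask.copy()
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
        return out

    precision = (pred & dilate(gt, tol)).sum() / max(pred.sum(), 1)
    recall = (gt & dilate(pred, tol)).sum() / max(gt.sum(), 1)
    return precision, recall
```

Sweeping a detector's threshold and recomputing these two numbers traces out a precision-recall curve, which is the natural summary for comparing boundary detectors against one another.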
Starting at the lowest level, we present local boundary models based on brightness, color, and texture cues, where the cues are individually optimized with respect to the dataset and then combined in a statistically optimal manner with classifiers. The resulting detector is shown to significantly outperform prior state-of-the-art algorithms. Next, we learn from data how to combine the boundary model with patch-based features in a pixel affinity model, settling long-standing debates in computer vision with empirical results: (1) brightness boundaries are more informative than brightness patches, and vice versa for color; (2) texture boundaries and texture patches are the two most powerful cues; (3) proximity is not a useful cue for grouping; it is simply a by-product of the grouping process; and (4) boundary-based and region-based approaches each provide significant independent information for grouping.
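The statistically optimal cue combination described above can be sketched with logistic regression, which maps several per-pixel cue responses to a single boundary probability. The feature layout, hyperparameters, and synthetic data below are assumptions for illustration; the actual models and cues are those defined in the text.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=500):
    """Fit logistic regression by gradient descent: combine cue responses
    (columns of X, e.g. brightness/color/texture gradients) into weights
    that yield P(boundary) for each pixel. Illustrative sketch only."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(boundary)
        grad = p - y                             # logistic-loss gradient
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Hypothetical example: cue 0 is informative, cue 1 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200).astype(float)
X = np.column_stack([y + 0.3 * rng.standard_normal(200),
                     rng.standard_normal(200)])
w, b = train_logistic(X, y)
```

Because the learned weights are per-cue, their relative magnitudes give a direct, data-driven reading of which cues carry the most information, which is how empirical comparisons like those listed above can be made.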