This thesis builds on previous work on shape matching techniques in computer vision and classification techniques in machine learning. It models the visual recognition task as one based on comparison to prototypes. Careful attention is paid to insights from perception and neuroscience on the nature of human visual similarity.

Classifying handwritten digits has been heavily researched and serves as a clearly defined shape matching problem. One contribution of this thesis is a prototype-based classifier that drastically reduces the number of prototypes needed by the nearest-neighbor technique.

A second contribution of this thesis is a technique for determining which parts of a shape are most informative for classification. This notion can be formalized in the framework of "feature selection" from machine learning. Studying discriminative power through feature selection yields insight into (1) the usefulness of individual parts of the shape and (2) the classification process of a general linear model classifier on shape data.

The most significant contribution of this thesis is a new learning technique for visual categorization, "SVM-KNN", which draws on aspects of two well-known techniques: support vector machines (SVM) and K-nearest neighbor (KNN). The basic idea is to find the close neighbors of a query sample and train a local support vector machine that preserves the distance function on the collection of neighbors. This technique is a good match for the unique challenges of general visual recognition: a very large number of classes, few training examples per class, and high variation within each class. The approach is quite flexible and permits recognition based on color, texture, and particularly shape in a homogeneous framework.
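
The basic idea of SVM-KNN can be illustrated with a minimal sketch. It is not the implementation from the thesis, and it rests on several assumptions: the distance is plain Euclidean (for which a linear SVM is one simple choice consistent with the distance function), the local SVM is fit by a basic subgradient method, multi-class decisions use one-vs-rest, and all names (`svm_knn_predict`, `train_linear_svm`, `k`, `lam`, `lr`) are hypothetical.

```python
import math

def euclidean(a, b):
    """Assumed distance function; another distance could be passed in instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels +1/-1) fit by subgradient descent on the hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage of the weights
            if margin < 1.0:  # sample violates the margin: move toward its side
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def svm_knn_predict(query, train_X, train_y, k=5, dist=euclidean):
    """Classify `query` with an SVM trained only on its k nearest neighbors."""
    # Step 1: find the k training samples closest to the query.
    order = sorted(range(len(train_X)), key=lambda i: dist(query, train_X[i]))
    idx = order[:k]
    labels = [train_y[i] for i in idx]
    classes = sorted(set(labels))
    # Step 2: if the neighbors agree, no local SVM is needed.
    if len(classes) == 1:
        return classes[0]
    # Step 3: otherwise train one-vs-rest local SVMs on the neighbors only
    # and return the class whose SVM scores the query highest.
    local_X = [train_X[i] for i in idx]
    best_label, best_score = None, -math.inf
    for c in classes:
        y = [1 if lab == c else -1 for lab in labels]
        w, b = train_linear_svm(local_X, y)
        score = sum(wj * xj for wj, xj in zip(w, query)) + b
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Toy data: two well-separated clusters in the plane.
train_X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(svm_knn_predict((0.2, 0.2), train_X, train_y, k=3))
print(svm_knn_predict((3.5, 3.5), train_X, train_y, k=6))
```

Because the SVM is trained only on the neighborhood of each query, its cost depends on `k` rather than on the size of the full training set, and the shortcut in step 2 skips training entirely when the neighbors already agree.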

Our hybrid method has reasonable computational complexity both in training and at run time, and yields excellent results in practice. A wide variety of distance functions can be used, and experiments show state-of-the-art performance on a number of benchmark data sets for shape and texture classification (MNIST, USPS, CUReT) and object recognition (Caltech-101). On Caltech-101, the technique achieved a correct classification rate of 62.42% ± 0.41% using only fifteen training examples, outperforming other published approaches at the time.



