Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease generalization error. Procedures of this type were called arcing by Breiman [1996]. The first successful arcing procedure was introduced by Freund and Schapire [1995,1996] and called Adaboost. In an effort to explain why Adaboost works, Schapire, et al [1997] derived a bound on the generalization error of a convex combination of classifiers in terms of the margin. We introduce a function called the edge, which differs from the margin only if there are more than two classes. A framework for understanding arcing algorithms is defined. In this framework, we see that the arcing algorithms currently in the literature are optimization algorithms which minimize some function of the edge. A relation is derived between the optimal reduction in the maximum value of the and the PAC concept of weak learner. Two algorithms are described which achieve the optimal reduction. Tests on both synthetic and real data case doubt on the Schapire, et al explanation.




Download Full History