
Description

A prediction algorithm is consistent if, given a large enough sample of instances from the underlying distribution, it can achieve nearly optimal generalization accuracy. In practice, the training set is finite and does not give an adequate representation of the underlying distribution. Our work is based on a simple method for generating additional data from the existing data. Using this new data (convex pseudo-data), it is shown empirically that, on a variety of data sets, the prediction accuracy of an algorithm can be significantly improved. This is shown first in classification using the CART algorithm. Similar results are shown in regression. Pseudo-data is then applied to bagged CART. Although CART is used as a test bed, the idea of generating convex pseudo-data can be applied to any prediction method.
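The description does not spell out how the pseudo-data is generated. As a rough illustration only, the sketch below assumes that each pseudo-instance is a convex combination of two randomly chosen training instances and that it keeps the label of the first one. The function name `convex_pseudo_data`, the mixing bound `d`, and the uniform draw of the mixing weight are all assumptions made for this example, not details taken from the description.

```python
import random


def convex_pseudo_data(X, y, n_new, d=0.5, seed=0):
    """Generate n_new pseudo-instances from (X, y).

    Assumed scheme (hypothetical, for illustration): pick two training
    instances at random, draw a weight v uniformly from [0, d], and return
    the convex combination (1 - v) * x_i + v * x_j with the label of x_i.
    """
    rng = random.Random(seed)
    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))  # instance that supplies the label
        j = rng.randrange(len(X))  # instance the point is moved toward
        v = rng.uniform(0.0, d)    # mixing weight, 0 <= v <= d <= 1
        point = [(1.0 - v) * a + v * b for a, b in zip(X[i], X[j])]
        new_X.append(point)
        new_y.append(y[i])
    return new_X, new_y


# Toy training set: three instances with two numeric features each.
X_train = [[0.0, 0.0], [1.0, 2.0], [4.0, 1.0]]
y_train = ["a", "b", "a"]
X_pseudo, y_pseudo = convex_pseudo_data(X_train, y_train, n_new=5)
```

Under these assumptions, every generated point lies on the line segment between two training instances, so the pseudo-data stays inside the convex hull of the training inputs. The augmented set (original plus pseudo-instances) could then be passed to any learner, CART included.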
