Description
I start with the random subspace method of ensemble feature selection, in which each ensemble member's feature set is simply drawn at random from the feature pool. Using ISOLET, I obtain improvements over the baseline in almost every case where the performance difference is statistically significant, although in many cases there is no significant difference.
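A minimal sketch of this selection step may help; the function name, subset size `k`, and seed below are illustrative assumptions, not details from the experiments:

```python
import random

def random_subspaces(feature_pool, n_members, k, seed=0):
    """Draw one feature subset per ensemble member, uniformly at random
    (without replacement) from the shared feature pool."""
    rng = random.Random(seed)
    return [rng.sample(feature_pool, k) for _ in range(n_members)]

# e.g., five classifiers, each trained on 10 of 40 candidate features
subsets = random_subspaces(list(range(40)), n_members=5, k=10)
```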
I then try hill-climbing, a wrapper approach that toggles a single feature at a time and keeps the change only when it improves a performance score (a sketch follows below). With ISOLET, hill-climbing improves performance in most cases on noisy data, but not on clean data.

I then move to Numbers, for which much more data is available to guide hill-climbing. On either clean or noisy Numbers data, hill-climbing improves over the multi-stream baselines in almost all cases, although it does not improve on the best single-stream baseline. On noisy data, these improvements hold even for noise types that were not seen during the hill-climbing process. In mismatched-condition tests, in which training and test data differ between clean and noisy, hill-climbing outperforms all baselines when Opitz's scoring formula is used. I find that this formula, which blends single-classifier accuracy and ensemble diversity, guides hill-climbing better than ensemble accuracy does as the performance score.
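The following sketch shows the hill-climbing loop together with an Opitz-style score. The evaluators `accuracy_fn` and `diversity_fn`, the weighting `lam`, and the stopping rule are hypothetical stand-ins; the description above specifies only that one feature is toggled at a time when the change improves the score, and that Opitz's formula blends single-classifier accuracy with ensemble diversity.

```python
def opitz_style_score(subset, accuracy_fn, diversity_fn, lam=1.0):
    """Blend of single-classifier accuracy and ensemble diversity, in the
    spirit of Opitz's formula. `accuracy_fn`, `diversity_fn`, and the
    weight `lam` are assumptions for illustration."""
    return accuracy_fn(subset) + lam * diversity_fn(subset)

def hill_climb(initial, pool, score_fn, max_passes=20):
    """Greedy wrapper search: toggle one feature at a time, keeping a
    change only when it raises the score; stop after a full pass with
    no improvement."""
    best = frozenset(initial)
    best_score = score_fn(best)
    for _ in range(max_passes):
        improved = False
        for f in pool:
            trial = best ^ {f}       # add f if absent, drop it if present
            if not trial:            # never score an empty subset
                continue
            s = score_fn(trial)
            if s > best_score:
                best, best_score = trial, s
                improved = True
        if not improved:             # local optimum reached
            break
    return best, best_score
```

Here `score_fn` stands in for a full train-and-evaluate cycle on held-out data: each candidate toggle requires a fresh performance estimate, which is why the amount of data available to guide the search matters.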