Boosting

Boosting (Schapire, 1990) is a meta-algorithm that can be viewed as a model averaging method. It is one of the most widely used ensemble methods and one of the most powerful learning ideas introduced in recent decades. Originally designed for classification, it can also be profitably extended to regression. We first create a “weak” classifier: it suffices that its accuracy on the training set is only slightly better than random guessing. A succession of models is then built iteratively, each one trained on a data set in which the points misclassified (or, for regression, poorly predicted) by the previous model are given more weight. Finally, the successive models are weighted according to their success and their outputs are combined by voting (for classification) or averaging (for regression) to produce the final model. The original boosting algorithm combined three weak learners to generate a strong learner.
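
To make the final combination step concrete, here is a minimal sketch in Python. It assumes the successive models and their success-based weights (the hypothetical names models and alphas below) have already been produced by a boosting run, and that class labels are coded as -1/+1; it is illustrative only, not any particular published algorithm.

    import numpy as np

    # Illustrative combination step only: `models` and `alphas` are assumed to
    # come from an earlier boosting run, with class labels coded as -1/+1.

    def combine_by_voting(models, alphas, X):
        # Classification: weighted vote of the successive models.
        score = sum(alpha * model.predict(X) for alpha, model in zip(alphas, models))
        return np.sign(score)

    def combine_by_averaging(models, alphas, X):
        # Regression: weighted average of the successive models' predictions.
        score = sum(alpha * model.predict(X) for alpha, model in zip(alphas, models))
        return score / sum(alphas)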

AdaBoost

AdaBoost (Freund and Schapire, 1996), short for “adaptive boosting”, is the most popular boosting algorithm. It uses the same training set over and over again (thus it need not be large) and can also combine an arbitrary number of base-learners.
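
The reweighting loop itself is short enough to write out. Below is a minimal sketch of a standard binary formulation, assuming labels coded as -1/+1 and scikit-learn depth-1 decision trees ("stumps") as the weak base-learner; the number of rounds and the early-stopping check are illustrative choices rather than a transcription of the paper's pseudocode.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        # y is assumed to be coded as -1/+1; n_rounds is an illustrative choice.
        y = np.asarray(y)
        n = len(y)
        w = np.full(n, 1.0 / n)                    # start from uniform example weights
        models, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)       # same training set, reweighted
            pred = stump.predict(X)
            err = np.sum(w[pred != y])             # weighted training error
            if err >= 0.5:                         # no better than random guessing: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
            w *= np.exp(-alpha * y * pred)         # up-weight misclassified points
            w /= w.sum()                           # renormalise to a distribution
            models.append(stump)
            alphas.append(alpha)
        return models, alphas

    def adaboost_predict(models, alphas, X):
        # Final model: weighted vote of the successive weak classifiers.
        score = sum(alpha * model.predict(X) for alpha, model in zip(alphas, models))
        return np.sign(score)

In practice one would usually reach for a library implementation such as sklearn.ensemble.AdaBoostClassifier rather than hand-rolling this loop.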

Original Paper

SCHAPIRE, Robert E., 1990. The strength of weak learnability, Machine Learning, Volume 5, Number 2, June, pages 197-227. [Cited by 674] (41.46/year)
Abstract: "This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε."

AdaBoost Paper

FREUND, Yoav and Robert E. SCHAPIRE, 1996. Experiments with a new boosting algorithm, Machine Learning: Proceedings of the Thirteenth International Conference (ICML '96), edited by Lorenza Saitta, pages 148-156, Morgan Kaufmann. [Cited by 1404] (136.91/year)
Abstract: "In an earlier paper, we introduced a new “boosting” algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a “pseudo-loss” which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost with and without pseudo-loss, performs on real learning problems.
We performed two sets of experiments. The first set compared boosting to Breiman's “bagging” method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem."
