Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets

Zaman, M. Faisal; Hirose, Hideo
July 2011
New Generation Computing;Jul2011, Vol. 29 Issue 3, p277
Academic Journal
Classification performance of an ensemble method can be understood by studying the bias and variance contributions to its classification error. Statistically, the bias and variance of a single classifier are controlled by the size of the training set and the complexity of the classifier. It has been established, both theoretically and empirically, that the classification performance (and hence the bias and variance) of a single classifier can be partially improved by using a suitable ensemble method and resampling the original training set. In this paper, we empirically examine the bias-variance decomposition of three different types of ensemble methods with training sample sizes ranging from 10% to at most 63% of the observations in the original training sample. The first ensemble is bagging, the second is a boosting-type ensemble, AdaBoost, and the third is a bagging-type hybrid ensemble method called bundling. All the ensembles are trained on training samples constructed with small subsampling ratios (SSR) of 0.10, 0.20, 0.30, 0.40, 0.50, and with bootstrapping. The experiments are conducted on 20 datasets from the UCI Machine Learning Repository and are designed to find the optimal training sample size (smaller than the original training sample) for each ensemble, and then to identify the optimal ensemble with smaller training sets with respect to bias-variance performance. The bias-variance decomposition of bundling shows that this ensemble method with small subsamples has significantly lower bias and variance than the subsampled and bootstrapped versions of bagging and AdaBoost.
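The small-SSR setting described in the abstract can be illustrated in a few lines. The sketch below is an assumption-laden approximation, not the authors' experimental code: it uses scikit-learn's `BaggingClassifier` with `max_samples` set to each subsampling ratio and `bootstrap=False` (subsampling without replacement), on synthetic data rather than the UCI datasets used in the paper.

```python
# Hedged sketch of subsampled bagging at the paper's SSR values.
# Dataset, base learner, and ensemble size are illustrative choices,
# not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for ssr in [0.10, 0.20, 0.30, 0.40, 0.50]:
    # max_samples is the fraction of the training set drawn for each
    # base tree; bootstrap=False draws without replacement, matching
    # the subsampled (rather than bootstrapped) variant.
    clf = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=50,
        max_samples=ssr,
        bootstrap=False,
        random_state=0,
    )
    clf.fit(X_train, y_train)
    results[ssr] = clf.score(X_test, y_test)

for ssr, acc in results.items():
    print(f"SSR {ssr:.2f}: test accuracy {acc:.3f}")
```

Setting `bootstrap=True` with `max_samples=1.0` instead recovers standard bootstrapped bagging, the baseline the paper compares against; estimating the bias and variance components themselves additionally requires repeated resampling of the training set.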


Related Articles

  • Cross-situational and supervised learning in the emergence of communication. Fontanari, Jose Fernando; Cangelosi, Angelo // Interaction Studies;2011, Vol. 12 Issue 1, p1 

    Scenarios for the emergence or bootstrap of a lexicon involve the repeated interaction between at least two agents who must reach a consensus on how to name N objects using H words. Here we consider minimal models of two types of learning algorithms: cross-situational learning, in which the...

  • Improving Web Learning through model Optimization using Bootstrap for a Tour-Guide Robot. León, Rafael; Rainer, J. Javier; Rojo, José Manuel; Galán, Ramón // International Journal of Interactive Multimedia & Artificial Int;Sep2012, Vol. 1 Issue 6, p13 

    We perform a review of Web Mining techniques and we describe a Bootstrap Statistics methodology applied to pattern model classifier optimization and verification for Supervised Learning for Tour-Guide Robot knowledge repository management. It is virtually impossible to test thoroughly Web Page...

  • Using Support Vector Machine Ensembles for Target Audience Classification on Twitter. Lo, Siaw Ling; Chiong, Raymond; Cornforth, David // PLoS ONE;Apr2015, Vol. 10 Issue 4, p1 

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on...

  • Opinion Bias Detection Based on Social Opinions for Twitter. A-Rong Kwon; Kyung-Soon Lee // Journal of Information Processing Systems;Dec2013, Vol. 9 Issue 4, p538 

    In this paper, we propose a bias detection method that is based on personal and social opinions that express contrasting views on competing topics on Twitter. Unsupervised polarity classification is conducted for learning social opinions on targets. The tf-idf algorithm is applied to...

  • Contents. Criminisi, Antonio; Shotton, Jamie; Konukoglu, Ender // Foundations & Trends in Computer Graphics & Vision;2011, Vol. 7 Issue 2/3, preceding p83 

    The table of contents for the publication "Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning" is presented.

  • Introduction to the special issue on image and video retrieval: theory and applications. Kompatsiaris, Ioannis; Marchand-Maillet, Stephane; Zwol, Roelof; Marcel, Sébastien // Multimedia Tools & Applications;Oct2011, Vol. 55 Issue 1, p1 

    An introduction is presented in which the editor discusses various reports within the issue on topics including the use of supervised learning, retagging videos and face recognition.

  • Editorial. Steinley, Douglas // Journal of Classification;Oct2015, Vol. 32 Issue 3, p357 

    An introduction is presented in which the editor discusses various reports within the issue on topics including the fractionally-supervised classification method for unsupervised and supervised learning, the semi-definite programming (SDP) method for analyzing data, and an exact algorithm for clustering.

  • Change detection in SAR images using deep belief network: a new training approach based on morphological images. Samadi, Farnaam; Akbarizadeh, Gholamreza; Kaabi, Hooman // IET Image Processing;2019, Vol. 13 Issue 12, p2255 

    In solving change detection problem, unsupervised methods are usually preferred to their supervised counterparts due to the difficulty of producing labelled data. Nevertheless, in this paper, a supervised deep learning-based method is presented for change detection in synthetic aperture radar...

  • Confidence intervals illuminate absence of evidence. Altman, Doug; Bland, J. Martin // BMJ: British Medical Journal (International Edition);4/24/2004, Vol. 328 Issue 7446, p1016 

    Presents a letter to the editor discussing confidence intervals and conclusions drawn in three studies and offering the opinion that confidence intervals reflect specific uncertainty in research.

