Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule

Erenel, Zafer; Altınçay, Hakan
May 2013
Neural Computing & Applications; May 2013 Supplement, Vol. 22, p83
Academic Journal
In binary text categorization problems, documents are generally distributed unevenly over the two classes, and resampling approaches have been shown to improve F scores in this setting. The improvement is mainly due to a gain in recall, whereas precision may deteriorate. Since precision is the primary concern in some applications, it is important to achieve higher F scores at a desired level of trade-off between precision and recall. In this study, we present an analytical comparison between the unanimity and majority voting rules. It is shown that the unanimity rule can provide better F scores than majority voting when an ensemble of high-recall but low-precision classifiers is considered. Category-based undersampling is then proposed to generate high-recall members. Experiments conducted on three datasets show that superior F scores can be achieved compared to the support vector machine (SVM)-based baseline system and to voting over a random undersampling-based ensemble.
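
The contrast between the two voting rules described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the three hand-made ensemble members, and the eight toy labels are hypothetical and are not taken from the paper, its datasets, or its category-based undersampling procedure. The members are constructed to have perfect recall but low precision, with false positives that do not fully overlap, which is the situation in which requiring unanimous agreement is expected to help.

```python
from typing import List, Tuple


def majority_vote(votes: List[List[int]]) -> List[int]:
    """Label a document positive when more than half of the members say so.

    votes[m][i] is member m's 0/1 prediction for document i.
    """
    n_members = len(votes)
    return [1 if 2 * sum(col) > n_members else 0 for col in zip(*votes)]


def unanimity_vote(votes: List[List[int]]) -> List[int]:
    """Label a document positive only when every member says so."""
    return [1 if all(col) else 0 for col in zip(*votes)]


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy imbalanced problem (hypothetical): 2 positive and 6 negative documents.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]

# Three hypothetical high-recall, low-precision members. Each finds both
# positives but also raises two false alarms, on different documents.
members = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

maj = majority_vote(members)
una = unanimity_vote(members)

print("majority :", maj, precision_recall_f1(y_true, maj))
print("unanimity:", una, precision_recall_f1(y_true, una))
```

In this toy case every false alarm is shared by two of the three members, so majority voting keeps all of them, while unanimity discards them without losing either true positive. With real classifiers the unanimity rule also risks lowering recall, which is why the abstract ties its advantage to ensembles whose members have high recall.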


Related Articles

  • Research on SVM Classification Algorithm Based on RS Attribute Reduction. Du Juan; Liu Yang; Yi Zhi-an; Tracy E. P. // Journal of Networks;Nov2014, Vol. 9 Issue 11, p3061 

    Support vector machine (SVM) can transform the classification problem into a quadratic programming problem, optimizing the classification hyper-plane. But when it deals with a large amount of data, there are too many characteristics, which will lead to sample conflict and increase the complexity of...

  • Based on SVM classification of connection pools algorithm. Zhao Juan; Xiong Hui Yun // Advanced Materials Research;2014, Vol. 945-949, p2435 

    The connection pool technology has become a deal with large amount of data requested a solution that is widely used now. This paper used the SVM classification algorithm for classified all database requests quickly, so the corresponding database request could be assigned to different connection...

  • A Novel Background Subtraction Method Using Multiclass Support Vector Machine. Shaona Zhou; Shaorui Xu; Hua Xiao // Applied Mechanics & Materials;2014, Vol. 701/702, p265 

    Background subtraction, where the foreground is segmented from the background, is the first step of data analysis and processing in automated visual surveillance. Aiming to solve the problems associated with dynamic, multi-modal background, we explore a new approach which can handle the...

  • Application of FA and Multi-Layer RVM Classifier in Features Analysis of X-Ray Images. Liguo Zhang; Lei Wang // Advances in Information Sciences & Service Sciences;Jan2013, Vol. 5 Issue 2, p793 

    In this paper, the FA and Multi-Layer RVM has been used to investigate a real data set of grains, the wheat varieties, Kama, Rosa and Canadian, characterized by measurements of main grain geometric features obtained by X-Ray technique, have been analyzed. As a statistical analysis techniques,...

  • Improving activated sludge classification based on imbalanced data. Qian, Y.; Liang, Y. C.; Guan, R. C. // Journal of Hydroinformatics;2014, Vol. 16 Issue 6, p1331 

    A fast and accurate classification method for sewage sludge biological activity classification is of great significance for wastewater treatment. However, the data are often imbalanced and the accuracy of traditional classification algorithms applied to imbalanced small classes of data is very...

  • FactorsR: An RWizard Application for Identifying the Most Likely Causal Factors in Controlling Species Richness. Guisande, Cástor; Heine, Juergen; García-Roselló, Emilio; González-Dacosta, Jacinto; García Perez-Schofield, Baltasar J.; González-Vilas, Luis; Vaamonde, Antonio; Lobo, Jorge M. // Diversity (14242818);2015, Vol. 7 Issue 4, p385 

    We herein present FactorsR, an RWizard application which provides tools for the identification of the most likely causal factors significantly correlated with species richness, and for depicting on a map the species richness predicted by a Support Vector Machine (SVM) model. As a demonstration...

  • Application and Research of Data Mining Technology in Communication Network Environment. Zhang Zhi // Advanced Materials Research;7/24/2014, Vol. 989-994, p3814 

    The abnormal data of communication networks are complex and diverse, it is difficult to recognize and mine the abnormal data accurately and effectively with traditional methods. In order to improve the recognition accuracy of communication network, a data mining algorithms based on the method of...

  • A WSAN Precision Detection Method based on Optimal State Estimation by WSVM. WANG Ting; WANG Heng; LI Yong; WANG Ping // Journal of Convergence Information Technology;Nov2012, Vol. 7 Issue 21, p296 

    The state of WSAN precision detection of networked synchronization control system is very important for the health condition analysis of networked synchronization control system. In this study, WSAN precision detection method based on optimal state estimation by wavelet-support vector machine is...

  • Entity Relation Extraction Based on DAGSVM. Shengwei Tian; Huiyun Wang; Long Yu; Hanjun Guo // International Journal of Advancements in Computing Technology;Mar2013, Vol. 5 Issue 5, p949 

    Entity relation extraction is an important research field in Information Extraction. In order to solve multi-classification problem, we referred to SVMs (Support Vector Machine System) trained in this way as 1-versus-1, its cascade error is serious; or in the other way as 1-versus-rest, whose...

