An Initialization Method for the K-means Algorithm using RNN and Coupling Degree

Ahmed, Alaa H.; Ashour, Wesam
July 2011
International Journal of Computer Applications;Jul2011, Vol. 25, p1
Academic Journal
Since K-means is widely used for general clustering, its performance is a critical point. This performance depends highly on initial cluster centers since it may converge to numerous local minima. In this paper a proposed initialization method to select initial cluster centers for K-means clustering is proposed. This algorithm is based on reverse nearest neighbor (RNN) search and coupling degree. Reverse nearest neighbor search retrieves all points in a given data set whose nearest neighbor is a given query point, where coupling degree between neighborhoods of nodes is defined based on the neighborhood-based rough set model as the amount of similarity between objects. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers for iterative clustering algorithms. The application of the proposed algorithm to K-means clustering algorithm is demonstrated. An experiment is carried out on several popular datasets and the results show the advantages of the proposed method.


Related Articles

  • Greedy Feature Selection for Subspace Clustering. Dyer, Eva L.; Sankaranarayanan, Aswin C.; Baraniuk, Richard G. // Journal of Machine Learning Research;Sep2013, Vol. 14, p2487 

    Unions of subspaces provide a powerful generalization of single subspace models for collections of high-dimensional data; however, learning multiple subspaces from data is challenging due to the fact that segmentation--the identification of points that live in the same subspace--and subspace...

  • A Two-Stage Tree based Meta-Classifier using Stack-Generalization. Kalpana, B.; Saravanan, V.; Vivekanandan, K. // International Journal of Computer Applications;Dec2011, Vol. 36, p25 

    Given a choice of classifiers each performing differently on different datasets the best option to assume is an ensemble of classifiers. An ensemble uses a single learning algorithm, whereas in this paper we propose a two stage stacking method with decision tree c4.5 as meta classifier. The base...

  • A New R-tri-training Algorithm Based on Relational Nearest Neighbor Classifier with Prototype-dependent Weights. Yanjuan Li; Hong'e Ren // Advances in Information Sciences & Service Sciences;Jan2013, Vol. 5 Issue 2, p752 

    In this paper, a new relational semi-supervised learning algorithm named PWRNN-R-tri-training (R-tri-training based on relational nearest neighbor classifier with prototype-dependent weights) is proposed. The co-training process of PWRNN-R-tri-training consists of co-labeling, re-classifying and...

  • Simple algorithm portfolio for SAT. Nikolić, Mladen; Marić, Filip; Janičić, Predrag // Artificial Intelligence Review;Dec2013, Vol. 40 Issue 4, p457 

    The importance of algorithm portfolio techniques for SAT has long been noted, and a number of very successful systems have been devised, including the most successful one—SATzilla. However, all these systems are quite complex (to understand, reimplement, or modify). In this paper we...

  • Integrating the Supervised Information into Unsupervised Learning. Ping Ling; Nan Jiang; Xiangsheng Rong // Mathematical Problems in Engineering;2013, p1 

    This paper presents an assembling unsupervised learning framework that adopts the information coming from the supervised learning process and gives the corresponding implementation algorithm. The algorithm consists of two phases: extracting and clustering data representatives (DRs) firstly to...

  • Proceedings of Reisensburg 2011. Binder, Harald; Kestler, Hans; Schmid, Matthias // Computational Statistics;Feb2014, Vol. 29 Issue 1/2, p1 

    An introduction is presented in which the editor discusses various reports within the issue on topics including ways for making graphs accessible for machine learning tasks, the use of nearest neighbor classifiers as a machine learning method and inference of Boolean networks.

  • Cross-scale analysis of cluster correspondence using different operational neighborhoods. Lu, Yongmei; Thill, Jean-Claude // Journal of Geographical Systems;Sep2008, Vol. 10 Issue 3, p241 

    Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of...

  • CID: an efficient complexity-invariant distance for time series. Batista, Gustavo; Keogh, Eamonn; Tataw, Oben; Souza, Vinícius // Data Mining & Knowledge Discovery;May2014, Vol. 28 Issue 3, p634 

    The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor...

  • Identification and estimation of superposed Neyman-Scott spatial cluster processes. Tanaka, Ushio; Ogata, Yosihiko // Annals of the Institute of Statistical Mathematics;Aug2014, Vol. 66 Issue 4, p687 

    This paper proposes an estimation method for superposed spatial point patterns of Neyman-Scott cluster processes of different distance scales and cluster sizes. Unlike the ordinary single Neyman-Scott model, the superposed process of Neyman-Scott models is not identified solely by the...


Read the Article


Sorry, but this item is not currently available from your library.

Try another library?
Sign out of this library

Other Topics