Avoiding Objects with few Neighbors in the K-Means Process and Adding ROCK Links to Its Distance

Alnabriss, Hadi A.; Ashour, Wesam
August 2011
International Journal of Computer Applications;Aug2011, Vol. 28, p12
Academic Journal
K-means is one of the most common and powerful algorithms in data clustering. In this paper we present new techniques that address two problems in the traditional K-means algorithm. The first problem is its sensitivity to outliers: we rely on a function that decides whether an object is an outlier, and any object judged to be an outlier is excluded from the calculations, which helps K-means produce good results even when more outlier points are added. In the second part we make K-means depend on ROCK links in addition to its traditional distance. ROCK links take into account the number of common neighbors between two objects, which enables K-means to detect shapes that the traditional K-means cannot detect.
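The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the outlier function or say how links are combined with distance, so the radius `eps`, the threshold `min_neighbors`, and all function names below are assumptions made for illustration. The sketch treats a point with too few neighbors within `eps` as an outlier and leaves it out of the centroid update, and it counts ROCK links as the number of neighbors two points share. How the link count enters the K-means distance is left out because the abstract does not specify it.

```python
from math import dist


def neighbor_sets(points, eps):
    """For each point, the set of indices of the other points within radius eps."""
    n = len(points)
    return [
        {j for j in range(n) if j != i and dist(points[i], points[j]) <= eps}
        for i in range(n)
    ]


def rock_links(neighbors, i, j):
    """ROCK link count: the number of neighbors shared by points i and j."""
    return len(neighbors[i] & neighbors[j])


def kmeans_skip_sparse(points, centroids, eps, min_neighbors, iters=10):
    """Lloyd-style K-means in which sparse points never influence a centroid."""
    neighbors = neighbor_sets(points, eps)
    # Assumed outlier rule: fewer than min_neighbors neighbors within eps.
    dense = [i for i in range(len(points)) if len(neighbors[i]) >= min_neighbors]
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for i in dense:
            # Assign each dense point to its nearest centroid (Euclidean).
            k = min(range(len(centroids)), key=lambda c: dist(points[i], centroids[c]))
            clusters[k].append(points[i])
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, dense
```

With two tight groups and one distant point, the distant point has no neighbors within `eps`, so it is excluded and the centroids stay at the group means rather than being pulled toward it.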


Related Articles

  • An Effective Clustering-Based Approach for Outlier Detection. Al-Zoubi, Moh'd Belal // European Journal of Scientific Research;Mar2009, Vol. 28 Issue 2, p310 

    Outlier detection is an extremely important task in a wide variety of application domains. In this paper, a proposed method based on clustering approaches for outlier detection is presented. We first perform the PAM clustering algorithm. Small clusters are then determined and considered as...

  • HYBRID ANT-BASED CLUSTERING ALGORITHM WITH CLUSTER ANALYSIS TECHNIQUES. Omar, Wafa'a; Badr, Amr; El-Fattah Hegazy, Abd // Journal of Computer Science;Jun2013, Vol. 9 Issue 6, p780 

    Cluster analysis is a data mining technology designed to derive a good understanding of data to solve clustering problems by extracting useful information from a large volume of mixed data elements. Recently, researchers have aimed to derive clustering algorithms from nature's swarm behaviors....

  • MULTI-DENSITY DBSCAN USING REPRESENTATIVES: MDBSCAN-UR. Ahmed, Rwand; El-Zaza, Eman; Ashour, Wesam // Computing & Information Systems;Oct2011, Vol. 15 Issue 2, p1 

    DBSCAN is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. But this algorithm cannot choose its parameter according to the distribution of the dataset. It simply uses the global minimum number of points (MinPts) parameter,...

  • AVOIDING NOISE AND OUTLIERS IN K-MEANS. Jnena, Rami; Timraz, Mohammed; Ashour, Wesam // Computing & Information Systems;Oct2011, Vol. 15 Issue 2, p1 

    Applying the k-means algorithm to datasets that include a large number of noise and outlier objects gives unclear clustering results. In this paper we propose a new technique for avoiding this noise and these outliers by applying some preprocessing and postprocessing steps for the dataset that have to...

  • K-Means for Spherical Clusters with Large Variance in Sizes. Fahim, A. M.; Saake, G.; Salem, A. M.; Torkey, F. A.; Ramadan, M. A. // International Journal of Computer Science;2009, Vol. 4 Issue 3, p145 

    Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of...

  • DERIVING CLUSTER KNOWLEDGE USING ROUGH SET THEORY. Upadhyaya, Shuchita; Arora, Alka; Jain, Rajni // Journal of Theoretical & Applied Information Technology;2008, Vol. 4 Issue 8, p688 

    Clustering algorithms give a general description of the clusters, listing the number of clusters and the member entities in those clusters. They do not generate a cluster description in the form of a pattern. Deriving patterns from clusters along with grouping of data into clusters is important from data...

  • Detecting Outliers in Interval-Valued Data Using Heuristic Possibilistic Clustering. VIATTCHENIN, Dmitri // Journal of Computer Science & Control Systems;2012, Vol. 5 Issue 2, p39 

    The paper deals with the problem of outlier detection in interval-valued data. The corresponding technique is based on a heuristic method of possibilistic clustering. The description of basic concepts of the heuristic method of possibilistic clustering based on the allotment concept is...

  • Outlier Detection using Improved Genetic K-means. Marghny, M. H.; Taloba, Ahmed I. // International Journal of Computer Applications;Aug2011, Vol. 28, p33 

    The outlier detection problem in some cases is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that should be removed in order to make more reliable...

  • An effective web document clustering algorithm based on bisection and merge. Ingyu Lee; Byung-Won On // Artificial Intelligence Review;Jun2011, Vol. 36 Issue 1, p69 

    To cluster web documents, all of which have the same name entities, we attempted to use existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, it turned out that these algorithms are not effective to cluster web documents. According to our intensive investigation,...

