TITLE

A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics

AUTHOR(S)
Oyana, Tonny J.
PUB. DATE
January 2010
SOURCE
EURASIP Journal on Bioinformatics & Systems Biology;2010, p1
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
The central purpose of this study is to further evaluate the performance of a new algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that combines the k-d tree data structure, which speeds up nearest neighbor queries, with the original k-means algorithm and an adaptation rate proposed by Mashor. The algorithm was tested using two real datasets and one synthetic dataset. To evaluate its competence, it was applied twice to all three datasets: once to data trained by the innovative MIL-SOM method and once to the actual untrained data. This two-step approach of training the data prior to clustering provides a solid foundation for knowledge discovery and data mining that clustering methods alone do not provide. The benefits of this method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data, with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
ACCESSION #
65032840
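
The hybrid approach described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes a plain batch (Lloyd-style) centroid update, assumes the k-d tree is built over the current centroids to answer nearest-centroid queries, and leaves out the Mashor adaptation rate, whose form the abstract does not give. The function names (build_kdtree, nearest, kmeans_kdtree) and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(items, depth=0):
    """Build a k-d tree from (index, point) pairs; returns nested tuples or None."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared distance, index) of the stored point closest to target."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    d = sq_dist(point, target)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def kmeans_kdtree(data, k, iters=20, seed=0):
    """Batch k-means whose assignment step queries a k-d tree of centroids.

    Assumption: plain mean update; the Mashor adaptation rate is NOT modeled.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    dim = len(data[0])
    for _ in range(iters):
        tree = build_kdtree(list(enumerate(centroids)))
        labels = [nearest(tree, p)[1] for p in data]
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(data, labels):
            counts[lab] += 1
            for j in range(dim):
                sums[lab][j] += p[j]
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
            for i in range(k)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans_kdtree(points, k=2))
```

With only k centroids in the tree, the speedup over a linear scan is modest for small k; the sketch is meant to show where the nearest neighbor query fits into the k-means loop, not to reproduce the runtime results reported in the article.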

 

Related Articles

  • Multicore Processing for Clustering Algorithms. Rao, Rekhansh; Nagwanshi, Kapil Kumar; Dubey, Sipi // International Journal of Computer Technology & Applications;2012, Vol. 3 Issue 2, p555 

    Data mining algorithms such as classification and clustering are the future of computation, though multidimensional data processing is required. People are using multicore processors with GPUs. Most programming languages don't provide multiprocessing facilities and hence wastage of...

  • Evaluation of the Clustering Characteristics of DBSCAN SOM and k-Means Algorithms. Mumtaz, K.; Duraiswamy, K. // International Journal of Computational Intelligence Research;2010, Vol. 6 Issue 4, p505 

    Mining knowledge from large amounts of spatial data is known as spatial data mining. This becomes a highly demanding field, because huge amounts of spatial data have been collected in various applications ranging from geospatial data to bio-medical knowledge. Recently, clustering has been...

  • A Novel Density based improved k-means Clustering Algorithm -- Dbkmeans. Mumtaz, K.; Duraiswamy, K. // International Journal on Computer Science & Engineering;2010, Vol. 2 Issue 2, p213 

    Mining knowledge from large amounts of spatial data is known as spatial data mining. It becomes a highly demanding field because huge amounts of spatial data have been collected in various applications ranging from geo-spatial data to bio-medical knowledge. The amount of spatial data being...

  • Multi-objective genetic algorithms based automated clustering for fuzzy association rules mining. Alhajj, Reda; Kaya, Mehmet // Journal of Intelligent Information Systems;Dec2008, Vol. 31 Issue 3, p243 

    Researchers realized the importance of integrating fuzziness into association rules mining in databases with binary and quantitative attributes. However, most of the earlier algorithms proposed for fuzzy association rules mining either assume that fuzzy sets are given or employ a clustering...

  • Novel Hybrid Clustering Optimization Algorithms Based on Plant Growth Simulation Algorithm. Tavakolian, Rozita; Charkari, Nasroollah Moghaddam // Journal of Advanced Computer Science & Technology Research;2011, Vol. 1 Issue 2, p84 

    Data clustering as one of the important data mining techniques is a fundamental and widely used method to achieve useful information about data. In face of the clustering problem, clustering methods still suffer from trapping in a local optimum and cannot often find global clusters. In general,...

  • A New Text Clustering Method Based on KGA. ZhanGang Hao // Journal of Software (1796217X);May2012, Vol. 7 Issue 5, p1094 

    Text clustering is one of the key research areas in data mining. K-medoids is a classical partitioning algorithm, which can better solve the isolated point problem, but it often converges to local optimization. In this paper, we put forward a new genetic algorithm called KGA algorithm by putting...

  • Study of Clustering Algorithm based on Fuzzy C-Means and Immunological Partheno Genetic. Hongfen Jiang; Yijun Liu; Feiyue Ye; Haixu Xi; Mingfang Zhu; Junfeng Gu // Journal of Software (1796217X);Jan2013, Vol. 8 Issue 1, p134 

    Clustering algorithm is very important for data mining. Fuzzy c-means clustering algorithm is one of the earliest goal-function clustering algorithms, which has achieved much attention. This paper analyzes the lack of fuzzy C-means (FCM) algorithm and genetic clustering algorithm. Propose a...

  • A new hybrid imperialist competitive algorithm on data clustering. NIKNAM, TAHER; FARD, ELAHE; EHRAMPOOSH, SHERVIN; ROUSTA, ALIREZA // Sadhana;Jun2011, Vol. 36 Issue 3, p293 

    Clustering is a process for partitioning datasets. This technique is very useful for optimum solution. k-means is one of the simplest and the most famous methods that is based on square error criterion. This algorithm depends on initial states and converges to local optima. Some recent...

  • Role of Rough Sets in Data Analysis. Anitha, K. // Annual International Conference on Optoelectronics, Photonics & ;2016, p155 

    Data Clustering is the process of dividing a data set into groups of similar items. This paper describes the data analysis technique based on Rough sets. The main objective of data analysis using Rough Set theory is it discovers hidden patterns of data from high dimensional data base. Moreover...
