Semantic based Document Clustering: A Detailed Review

Shah, Neepa; Mahajan, Sunita
August 2012
International Journal of Computer Applications;8/15/2012, Vol. 52, p42
Academic Journal
Document clustering, one of the traditional data mining techniques, is an unsupervised learning paradigm in which clustering methods try to identify inherent groupings of text documents, producing a set of clusters that exhibit high intra-cluster similarity and low inter-cluster similarity. The importance of document clustering stems from the massive volumes of textual documents being created. Although numerous document clustering methods have been studied extensively in recent years, several challenges remain in improving clustering quality. In particular, most current document clustering algorithms do not consider semantic relationships, which leads to unsatisfactory clustering results. Over the last three to four years, efforts have been made to apply semantics to document clustering. Here, an exhaustive and detailed review of more than thirty semantics-driven document clustering methods is presented. After an introduction to document clustering and its basic requirements for improvement, traditional algorithms are overviewed and semantic similarity measures are explained. The article then discusses algorithms that interpret documents semantically for clustering. The semantic approach applied, datasets used, evaluation parameters applied, limitations, and future work of all these approaches are presented in tabular format for easy and quick interpretation.
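The clustering objective described in the abstract can be illustrated with a minimal sketch. It is not taken from the reviewed article: the helper names (`tf_vector`, `cosine`, `mean_intra_similarity`, `mean_inter_similarity`) and the toy documents are assumptions made for illustration, and plain term-frequency vectors with cosine similarity stand in for whatever representation a given method uses.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Hypothetical helper: bag-of-words term-frequency vector for a document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_intra_similarity(clusters):
    """Average similarity over all document pairs that share a cluster."""
    sims = [cosine(a, b) for docs in clusters for a, b in combinations(docs, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def mean_inter_similarity(clusters):
    """Average similarity over all document pairs drawn from different clusters."""
    sims = [
        cosine(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Toy clustering (assumed data): two finance documents, two sports documents.
clusters = [
    [tf_vector("stock market prices fall"),
     tf_vector("market prices rise on stock rally")],
    [tf_vector("team wins football match"),
     tf_vector("football team loses match")],
]

intra = mean_intra_similarity(clusters)
inter = mean_inter_similarity(clusters)
print(f"intra-cluster: {intra:.3f}, inter-cluster: {inter:.3f}")

# Pure term matching ignores meaning: these two sentences share no words,
# so their similarity is zero even though they say nearly the same thing.
synonym_sim = cosine(tf_vector("buy a car"), tf_vector("purchase an automobile"))
print(f"synonym pair: {synonym_sim:.3f}")
```

A good clustering in this sense has a mean intra-cluster similarity well above its mean inter-cluster similarity. The last comparison hints at the limitation the abstract raises: without some semantic resource or similarity measure, documents that use different words for the same concept look unrelated.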


Related Articles

  • Chameleon based on clustering feature tree and its application in customer segmentation. Jinfeng Li; Kanliang Wang; Lida Xu // Annals of Operations Research;Apr2009, Vol. 168 Issue 1, p225 

    Clustering analysis plays an important role in the field of data mining. Nowadays, hierarchical clustering is becoming one of the most widely used clustering techniques. However, for most hierarchical clustering algorithms, the requirements of high execution efficiency and...

  • An Empirical Study on Similarity between Documents in Nptel Application Using Clustering Techniques. S.Appavu Alias Balamurugan; Kalpana, N. // International Journal of Computer Science & Network Security;Dec2013, Vol. 13 Issue 12, p58 

    This chapter presents a tutorial overview of the main clustering methods used in Data Mining. The goal is to provide a self-contained review of the concepts and the similarity underlying clustering techniques. The chapter begins by providing measures and criteria that are used for determining...

  • Platform Development of Wushu Data Mining Based on B/S Architecture. Lihai Qi // Applied Mechanics & Materials;2014, Issue 608-609, p165 

    At present, Taekwondo is widely loved by college students and has spread throughout universities, some of which have made it a compulsory course in physical education. There is currently little research on the spread of Taekwondo abroad, and the study of Taekwondo is also confined...

  • Extracting Summary from Documents using K-mean Clustering Algorithm. K. S., Manjula; Swetha Ramana, D. Venkata // International Journal of Computer Science & Network Security;Aug2014, Vol. 14 Issue 8, p98 

    Extracting a summary from documents is a difficult task for human beings. Therefore, generating a summary automatically has to address several challenges; because the system is automated, it can only extract the required information from the original document. This reduces the work to compress the...

  • Association Rule Pruning based on Interestingness Measures with Clustering. Kannan, S.; Bhaskaran, R. // International Journal of Computer Science Issues (IJCSI);Nov2009, Vol. 6 Issue 1, p35 

    Association rule mining plays a vital part in knowledge mining. The difficult task is discovering knowledge or useful rules from the large number of rules generated for reduced support. For pruning or grouping rules, several techniques are used, such as rule structure cover methods, informative...

  • REES MATRIX CONSTRUCTIONS FOR CLUSTERING OF DATA. Kelarev, A. V.; Watters, P.; Yearwood, J. L. // Journal of the Australian Mathematical Society;Dec2009, Vol. 87 Issue 3, p377 

    This paper continues the investigation of semigroup constructions motivated by applications in data mining. We give a complete description of the error-correcting capabilities of a large family of clusterers based on Rees matrix semigroups well known in semigroup theory. This result strengthens...

  • Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points. Velmurugan, T.; Santhanam, T. // Journal of Computer Science;2010, Vol. 6 Issue 3, p363 

    Problem statement: Clustering is one of the most important research areas in the field of data mining. Clustering means creating groups of objects based on their features in such a way that the objects belonging to the same groups are similar and those belonging to different groups are...

  • An Analytical Assessment on Document Clustering. Pushplata, Ram Chatterjee // International Journal of Computer Network & Information Security;Jun2012, Vol. 4 Issue 5, p63 

    Clustering is related to data mining for information retrieval. Relevant information is retrieved quickly while doing the clustering of documents. It organizes the documents into groups; each group contains the documents of similar type content. Document clustering is an unsupervised approach of...

  • K-Means for Spherical Clusters with Large Variance in Sizes. Fahim, A. M.; Saake, G.; Salem, A. M.; Torkey, F. A.; Ramadan, M. A. // Proceedings of World Academy of Science: Engineering & Technolog;Nov2008, Vol. 47, p177 

    Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of...

