An improved combinatorial biclustering algorithm

Nosova, Ekaterina; Napolitano, Francesco; Amato, Roberto; Cocozza, Sergio; Miele, Gennaro; Raiconi, Giancarlo; Tagliaferri, Roberto
May 2013
Neural Computing & Applications;May2013 Supplement, Vol. 22, p293
Academic Journal
DNA microarray analysis is a relevant technology in genetic research for exploring and recognizing possible genomic features of many diseases. Since it is a high-throughput technology, it requires advanced tools for dimensionality reduction of massive data sets. Clustering is among the most appropriate tools for mining these data, although it suffers from the following problems: instability of the results, a large number of genes compared with the number of samples, a high noise level, complexity of initialization, and the need to group genes and samples simultaneously. Almost all of these problems can be addressed by novel techniques such as biclustering. In this paper, a new biclustering algorithm, hereafter denoted the combinatorial biclustering algorithm (CBA), is proposed to address the problems listed above. The algorithm analyzes the data to find biclusters of the desired size within an allowable error. The performance of CBA is compared with that of other biclustering algorithms by discussing the outputs of the different methods when run on a synthetic data set. CBA appears to perform better and, for this reason, it has also been applied to a real data set. In particular, CBA was used to analyze the transcriptional profiles of 38 gastric cancer tissues with (MSI) and without (MSS) microsatellite instability. The results clearly show much more coherent gene expression behavior in normal tissues than in tumoral ones. The high level of gene misregulation in tumoral tissues affects any further bicluster analysis, and it is only partially smoothed in the MSI/MSS study, even when a much higher initial admissible error is allowed.
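The abstract only says that CBA searches for biclusters "of the desired size" within an "allowable error"; it does not describe how the search works. The sketch below is therefore NOT the authors' CBA. It is a hypothetical, brute-force illustration of those two parameters, assuming an additive-coherence criterion: for every subset of `n_cols` samples, each gene's profile is centred on its own mean, and genes whose centred profile stays within `max_err` of a reference gene are grouped. The function name `find_biclusters` and the parameters `n_rows`, `n_cols` and `max_err` are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def find_biclusters(data, n_rows, n_cols, max_err):
    """Hypothetical exhaustive search for additive-coherent biclusters.

    data: 2-D array, genes x samples.
    n_rows: minimum number of genes a bicluster must contain.
    n_cols: exact number of samples per bicluster.
    max_err: allowable error, i.e. the maximum absolute deviation of each
        centred entry from the reference gene's centred profile.
    """
    data = np.asarray(data, dtype=float)
    found = []
    # Combinatorial step: enumerate every subset of n_cols samples.
    for cols in combinations(range(data.shape[1]), n_cols):
        sub = data[:, cols]
        # Subtract each gene's mean so that only the profile shape matters
        # (an additive offset between genes is not penalized).
        centred = sub - sub.mean(axis=1, keepdims=True)
        seen = set()
        for ref in range(data.shape[0]):
            # Worst-case deviation of every gene from the reference profile.
            dev = np.abs(centred - centred[ref]).max(axis=1)
            rows = tuple(int(i) for i in np.flatnonzero(dev <= max_err))
            if len(rows) >= n_rows and rows not in seen:
                seen.add(rows)
                found.append((rows, cols))
    return found


# Toy data: genes 0-2 follow the same shape over samples 0-2, up to an offset.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [5.0, 6.0, 7.0, 0.0],
    [2.0, 3.1, 4.0, 4.0],
    [8.0, 1.0, 5.0, 2.0],
]
print(find_biclusters(data, n_rows=3, n_cols=3, max_err=0.1))
# → [((0, 1, 2), (0, 1, 2))]
```

Enumerating all C(n_samples, n_cols) column subsets is what makes this kind of search combinatorial. The cost grows quickly with the number of samples, so a practical algorithm would need pruning or a smarter enumeration. Because each gene is compared with a reference gene, any two grouped genes may differ by up to 2 × `max_err`.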


Related Articles

  • A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets. Ashok Sharma; Robert Podolsky; Jieping Zhao; Richard A. McIndoe // Bioinformatics;May2009, Vol. 25 Issue 9, p1152 

    Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a requirement to develop algorithms which are fast and can cluster extremely large datasets without affecting...

  • Computing the maximum similarity bi-clusters of gene expression data. Lusheng Wang // Bioinformatics;Jan2007, Vol. 23 Issue 1, p50 

    Motivations: Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, (2) genes may...

  • Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms. Burton Kuan Hui Chia; Karuturi, R. Krishna Murthy // Algorithms for Molecular Biology;2010, Vol. 5, p23 

    Background: Biclustering is an important analysis procedure to understand the biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort was made to compare the performance of different algorithms on real...

  • IN-SILICO ANALYSIS OF MICRO ARRAY DATA FOR PROSTATE CANCER. Satendra, Singh; Rohit, Lall; Prashant, Jain A. // International Journal of Pharmaceutical Sciences Review & Resear;Jul-Aug2010, Vol. 3 Issue 1, p46 

    The micro array data analysis for prostate cancer was carried out by the clustering algorithms SOM and K-means. The genes were clustered into nine different clusters in both techniques based on the expression profile of those genes in prostate cancer. The expression of genes in some of clusters was...

  • Bayesian variable selection for disease classification using gene expression data. Yang Ai-Jun; Song Xin-Yuan // Bioinformatics;Jan2010, Vol. 26 Issue 2, p215 

    Motivation: An important application of gene expression microarray data is the classification of samples into categories. Accurate classification depends upon the method used to identify the most relevant genes. Owing to the large number of genes and relatively small sample size, the selection...

  • Biclustering of time-lagged gene expression data using real number. Liu, F.; Wang, L. B. // Journal of Biomedical Science & Engineering;Feb2010, Vol. 3 Issue 2, p217 

    Analysis of gene expression data can help to find the time-lagged co-regulation of gene clusters. However, existing methods only solve the problem when the data are discrete. In this paper, we propose an efficient algorithm to identify time-lagged co-regulated gene cluster...

  • SC2ATmd: a tool for integration of the figure of merit with cluster analysis for gene expression data. Olex, Amy L.; Fetrow, Jacquelyn S. // Bioinformatics;May2011, Vol. 27 Issue 9, p1330 

    Summary: Standard and Consensus Clustering Analysis Tool for Microarray Data (SC2ATmd) is a MATLAB-implemented application specifically designed for the exploration of microarray gene expression data via clustering. Implementation of two versions of the clustering validation method figure of...

  • Gene expression profiling of human ovarian tumours. Biade, S.; Marinucci, M.; Schick, J.; Roberts, D.; Workman, G.; Sage, E. H.; O'Dwyer, P. J.; LiVolsi, V. A.; Johnson, S. W. // British Journal of Cancer;10/23/2006, Vol. 95 Issue 8, p1092 

    There is currently a lack of reliable diagnostic and prognostic markers for ovarian cancer. We established gene expression profiles for 120 human ovarian tumours to identify determinants of histologic subtype, grade and degree of malignancy. Unsupervised cluster analysis of the most variable set...

  • BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis. Mahfouz, M. A.; Ismail, M. A. // International Journal of Intelligent Technology;2009, Vol. 4 Issue 2, p117 

    Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to...

