A stable iterative method for refining discriminative gene clusters

Min Xu; Mengxia Zhu; Louxin Zhang
January 2008
BMC Genomics;2008 Supplement 2, Vol. 9, Special section p1
Academic Journal
Background: Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. On the other hand, since microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Unfortunately, classifiers identified in this way tend to generalize poorly to test samples because of overfitting. Results: We propose a novel approach that combines supervised and unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method can produce a series of robust gene clusters with good classification performance compared with existing approaches. Conclusion: This backward approach for refining a series of highly discriminative gene clusters for classification purposes proves to be very consistent and stable when applied to various types of training samples.
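
The abstract does not give the algorithm itself, so the following is only a rough sketch of what an iterative, backward refinement loop combining an unsupervised and a supervised step could look like. It is not the authors' method. Everything in it is an assumption made for illustration: the use of plain k-means for the unsupervised step, the Welch t-statistic of a cluster's mean profile as the supervised score, and the names `kmeans`, `t_score`, `refine_clusters`, `keep_fraction` and `min_genes`.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means on the rows of X; returns one integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def t_score(profile, y):
    """Absolute Welch t-statistic of a 1-D profile between classes 0 and 1."""
    a, b = profile[y == 0], profile[y == 1]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / (se + 1e-12)

def refine_clusters(X, y, k=4, keep_fraction=0.5, min_genes=10, seed=0):
    """X is a genes x samples matrix, y holds a 0/1 class label per sample.

    Returns a list of (gene_indices, best_cluster_score), one entry per round.
    """
    rng = np.random.default_rng(seed)
    genes = np.arange(X.shape[0])
    history = []
    while len(genes) >= max(min_genes, k):
        # Unsupervised step (assumed): group the surviving genes.
        labels = kmeans(X[genes], k, rng)
        # Supervised step (assumed): score each cluster by how well its
        # mean expression profile separates the two sample classes.
        scores = np.full(k, -np.inf)
        for j in range(k):
            if np.any(labels == j):
                scores[j] = t_score(X[genes][labels == j].mean(axis=0), y)
        history.append((genes.copy(), scores.max()))
        # Backward step (assumed): keep only genes in the top-scoring clusters.
        n_keep = max(1, int(np.ceil(k * keep_fraction)))
        keep = np.argsort(scores)[::-1][:n_keep]
        new_genes = genes[np.isin(labels, keep)]
        if len(new_genes) == len(genes):
            break  # nothing was discarded, so further rounds would not change anything
        genes = new_genes
    return history

# Usage on synthetic data: 60 genes, 20 samples, first 10 genes shifted in class 1.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(60, 20))
X[:10, y == 1] += 3.0
history = refine_clusters(X, y)
for gene_idx, score in history:
    print(len(gene_idx), "genes, best cluster score", round(float(score), 2))
```

Each round discards the clusters that separate the classes least well and re-clusters what remains, so the candidate gene set shrinks from one round to the next. Whether this resembles the published procedure in its choice of clustering method, scoring function, or stopping rule cannot be determined from the abstract alone.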


Related Articles

  • biDCG: A New Method for Discovering Global Features of DNA Microarray Data via an Iterative Re-Clustering Procedure. Chen, Chia-Pei; Fushing, Hsieh; Atwill, Rob; Koehl, Patrice // PLoS ONE;Jul2014, Vol. 9 Issue 7, p1 

    Biclustering techniques have become very popular in cancer genetics studies, as they are tools that are expected to connect phenotypes to genotypes, i.e. to identify subgroups of cancer patients based on the fact that they share similar gene expression patterns as well as to identify subgroups...

  • Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures. Tan, Meng P.; Smith, Erin N.; Broach, James R.; Floudas, Christodoulos A. // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data...

  • Automated group assignment in large phylogenetic trees using GRUNT: GRouping, Ungrouping, Naming Tool. Dalevi, Daniel; DeSantis, Todd Z.; Fredslund, Jakob; Andersen, Gary L.; Markowitz, Victor M.; Hugenholtz, Philip // BMC Bioinformatics;2007 Supplement 2, Vol. 8, p402 

    Background: Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger as a consequence of a rapidly expanding sequence database. Hierarchical group names are typically manually assigned in trees,...

  • Iterative Group Analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments. Breitling, Rainer; Amtmann, Anna; Herzyk, Pawel // BMC Bioinformatics;2004, Vol. 5, p34 

    Background: The biological interpretation of even a simple microarray experiment can be a challenging and highly complex task. Here we present a new method (Iterative Group Analysis) to facilitate, improve, and accelerate this process. Results: Our Iterative Group Analysis approach (iGA) uses...

  • A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets. Ashok Sharma; Robert Podolsky; Jieping Zhao; Richard A. McIndoe // Bioinformatics;May2009, Vol. 25 Issue 9, p1152 

    Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a requirement to develop algorithms which are fast and can cluster extremely large datasets without affecting...

  • Computing the maximum similarity bi-clusters of gene expression data. Lusheng Wang // Bioinformatics;Jan2007, Vol. 23 Issue 1, p50 

    Motivations: Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, (2) genes may...

  • Reuse of imputed data in microarray analysis increases imputation efficiency. Ki-Yeol Kim; Byoung-Jin Kim; Gwan-Su Yi // BMC Bioinformatics;2004, Vol. 5, p160 

    Background: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analysis require a complete data set. A few imputation methods for DNA microarray data have been introduced, but the efficiency of the...

  • Extending bicluster analysis to annotate unclassified ORFs and predict novel functional modules using expression data. Bryan, Kenneth; Cunningham, Pádraig // BMC Genomics;2008 Supplement 2, Vol. 9, Special section p1 

    Background: Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with...

  • gcExplorer: interactive exploration of gene clusters. Theresa Scharl; Friedrich Leisch // Bioinformatics;Apr2009, Vol. 25 Issue 8, p1089 

    Summary: Cluster analysis has played an important role in the analysis of gene expression data since the early days of microarray studies and is routinely used to find groups of genes with common expression patterns. In order to make cluster analysis helpful for users, visualization of cluster...

