TITLE

Extending bicluster analysis to annotate unclassified ORFs and predict novel functional modules using expression data

AUTHOR(S)
Bryan, Kenneth; Cunningham, Pádraig
PUB. DATE
January 2008
SOURCE
BMC Genomics;2008 Supplement 2, Vol. 9, Special section p1
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
Background: Microarrays can measure the expression levels of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples, with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs), but until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results: The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion: In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation.
ACCESSION #
35701855
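The abstract compares BALBOA against a multi-class k-Nearest Neighbour (kNN) benchmark evaluated by cross validation, but does not state the distance metric, the value of k, or how genes with several annotations are handled. The following is a minimal sketch under stated assumptions, not the authors' benchmark and not BALBOA itself: it assumes Pearson correlation distance (1 - r) between expression profiles, a single functional label per gene, a simple majority vote, and leave-one-out cross validation. The function names and the toy profiles with labels "A" and "B" are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def knn_predict(query, profiles, labels, k=3):
    """Assign the majority class among the k annotated genes nearest to query.

    Assumption: distance is 1 - Pearson r, so co-expressed genes are close.
    """
    ranked = sorted((1 - pearson(query, p), lab) for p, lab in zip(profiles, labels))
    votes = Counter(lab for _, lab in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(profiles, labels, k=3):
    """Leave-one-out cross validation: hide each gene's label and predict it."""
    correct = 0
    for i in range(len(profiles)):
        train_p = profiles[:i] + profiles[i + 1:]
        train_l = labels[:i] + labels[i + 1:]
        if knn_predict(profiles[i], train_p, train_l, k) == labels[i]:
            correct += 1
    return correct / len(profiles)


# Hypothetical toy data: class "A" genes rise across four samples, class "B" genes fall.
profiles = [
    [1, 2, 3, 4], [2, 3, 4, 5.5], [1, 2.5, 3, 4.2],
    [4, 3, 2, 1], [5, 3.5, 2, 1], [4.1, 3, 2.2, 0.9],
]
labels = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    print(loo_accuracy(profiles, labels, k=3))
    # Annotating an "unclassified" profile by its nearest annotated neighbours.
    print(knn_predict([1.5, 2.5, 3.5, 4.5], profiles, labels, k=3))
```

Correlation distance is used here because it groups genes by the shape of their expression profile rather than its absolute level, which is a common choice for co-expression analysis; a real benchmark would also need to handle genes that belong to more than one functional class.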

 

Related Articles

  • A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets. Ashok Sharma; Robert Podolsky; Jieping Zhao; Richard A. McIndoe // Bioinformatics;May2009, Vol. 25 Issue 9, p1152 

    Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a requirement to develop algorithms which are fast and can cluster extremely large datasets without affecting...

  • Under the MIAME sun.  // Nature Methods;Jun2006, Vol. 3 Issue 6, p415 

    The article presents an overview of the topics covered in the June 2006 issue. It informs that organizations like Microarray Gene Expression Data society and Human Proteome Organization aimed to develop data formats compatible with the majority of instrument outputs and analysis, to establish...

  • SC2ATmd: a tool for integration of the figure of merit with cluster analysis for gene expression data. Olex, Amy L.; Fetrow, Jacquelyn S. // Bioinformatics;May2011, Vol. 27 Issue 9, p1330 

    Summary: Standard and Consensus Clustering Analysis Tool for Microarray Data (SC2ATmd) is a MATLAB-implemented application specifically designed for the exploration of microarray gene expression data via clustering. Implementation of two versions of the clustering validation method figure of...

  • Extended sequence of the turkey MHC B-locus and sequence variation in the highly polymorphic B-G loci. Bauer, Miranda; Reed, Kent // Immunogenetics;Apr2011, Vol. 63 Issue 4, p209 

    Genetic variation in the major histocompatibility complex (MHC) is directly correlated to differences in disease resistance. Immunity is greatly dependent on highly polymorphic genes in the MHC, such as class I, class II, and class III complement genes. Preliminary studies of wild turkey...

  • Computing the maximum similarity bi-clusters of gene expression data. Lusheng Wang // Bioinformatics;Jan2007, Vol. 23 Issue 1, p50 

    Motivations: Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, (2) genes may...

  • A global map of human gene expression. Lukk, Margus; Kapushesky, Misha; Nikkilä, Janne; Parkinson, Helen; Goncalves, Angela; Huber, Wolfgang; Ukkonen, Esko; Brazma, Alvis // Nature Biotechnology;Apr2010, Vol. 28 Issue 4, p322 

    A letter is presented which describes the construction of a global map of human gene expression from a large microarray data set in 2010. The collection and extraction of 9,000 raw data files from the public databases Gene Expression Omnibus and ArrayExpress is described. It details the...

  • The yeast lifecycle and DNA array technology. Williams, R.M. // Journal of Industrial Microbiology & Biotechnology;Mar2002, Vol. 28 Issue 3, p186 

    The genome variability and meiotic gene expression patterns in two unrelated laboratory yeast strains, SK1 and W303, have been characterized using high-density oligonucleotide arrays. The statistical analysis and comparison of the data has allowed identification of: (1) genes with functional...

  • Aligning Sequences by Minimum Description Length. Conery, John S. // EURASIP Journal on Bioinformatics & Systems Biology;2007, p1 

    This paper presents a new information theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver...

  • Molecular characterization and expression of type-I interferon gene in Labeo rohita. Parhi, Janmejay; Mukherjee, S.; Saxena, Gopalkrishna; Sahoo, Lopamudra; Makesh, M. // Molecular Biology Reports;May2014, Vol. 41 Issue 5, p2979 

    Genes coding for type-I interferon (I-IFN) have been cloned from Labeo rohita, a commercially important and widely cultured fish in India and South East Asia. In the present study, the full-length gene of I-IFN was amplified and sequenced. The sequence analysis revealed that I-IFN consists of 1,786...
