Detection of Genomic Idiosyncrasies Using Fuzzy Phylogenetic Profiles

Psomopoulos, Fotis E.; Mitkas, Pericles A.; Ouzounis, Christos A.
January 2013
PLoS ONE;Jan2013, Vol. 8 Issue 1, Special section p1
Academic Journal
Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e., sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process: genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, then de-noised, and finally compared to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration in the majority of cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
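
The four steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fuzzy membership function, the discretization threshold, the de-noising procedure, or the similarity metric. The sketch assumes, purely for illustration, that a genome's fuzzy profile is the fraction of its genes with a homolog in each reference genome, that discretization uses a 0.5 cut-off, that de-noising drops reference-genome positions on which all genome profiles agree, and that distances are Hamming distances. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = List[int]  # one 0/1 entry per reference genome


def fuzzy_profile(gene_profiles: List[Profile]) -> List[float]:
    """Step 1 (assumed form): fraction of genes present per reference genome."""
    n = len(gene_profiles)
    return [sum(col) / n for col in zip(*gene_profiles)]


def discretize(fuzzy: List[float], threshold: float = 0.5) -> Profile:
    """Step 2 (assumed cut-off): turn a fuzzy vector into a binary vector."""
    return [1 if v >= threshold else 0 for v in fuzzy]


def informative_columns(binary: Dict[str, Profile]) -> List[int]:
    """Step 3 (assumed filter): keep positions where genome profiles differ."""
    profiles = list(binary.values())
    return [i for i, col in enumerate(zip(*profiles)) if len(set(col)) > 1]


def project(profile: Profile, columns: List[int]) -> Profile:
    """Restrict a profile to the retained reference-genome positions."""
    return [profile[i] for i in columns]


def hamming(a: Profile, b: Profile) -> int:
    """Number of positions at which two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def stratify(
    genomes: Dict[str, List[Profile]], threshold: float = 0.5
) -> List[Tuple[str, int, int, Dict[str, int]]]:
    """Step 4: intra- and inter-genome distances for every gene profile."""
    binary = {
        name: discretize(fuzzy_profile(genes), threshold)
        for name, genes in genomes.items()
    }
    cols = informative_columns(binary)
    reduced = {name: project(p, cols) for name, p in binary.items()}
    results = []
    for name, genes in genomes.items():
        for idx, gene in enumerate(genes):
            g = project(gene, cols)
            intra = hamming(g, reduced[name])
            inter = {
                other: hamming(g, p)
                for other, p in reduced.items()
                if other != name
            }
            results.append((name, idx, intra, inter))
    return results


# Toy data (hypothetical): gene profiles over four reference genomes.
genomes = {
    "A": [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]],
    "B": [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]],
}
results = stratify(genomes)
```

In this toy data, the fourth gene of genome A lies far from its own genome profile and coincides with the profile of genome B, which is the kind of gene the abstract describes as having an atypical evolutionary history. A gene whose intra-genome distance is small and whose inter-genome distances are large would instead be classed as typical of its genome.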


Related Articles

  • Genome Sequencing: Into the woods. Goymer, Patrick // Nature Reviews Genetics;Nov2006, Vol. 7 Issue 11, p826 

    The article discusses research being done on the genome of black cottonwood. The examination of the newly published draft genome reveals 45,555 putative protein-coding genes, nearly twice as many as A. thaliana. Almost 12 percent of the P. trichocarpa genes do not have an identifiable A. thaliana homologue.

  • TobEA: an atlas of tobacco gene expression from seed to senescence. Edwards, Kieron D.; Bombarely, Aureliano; Story, Geraint W.; Allen, Fraser; Mueller, Lukas A.; Coates, Steve A.; Jones, Louise // BMC Genomics;2010, Vol. 11, p142 

    Background: Transcriptomics has resulted in the development of large data sets and tools for the progression of functional genomics and systems biology in many model organisms. Currently there is no commercially available microarray to allow such expression studies in Nicotiana tabacum...

  • Rapid Speciation with Gene Flow Following the Formation of Mt. Etna. Osborne, Owen G.; Batstone, Thomas E.; Hiscock, Simon J.; Filatov, Dmitry A. // Genome Biology & Evolution;Sep2013, Vol. 5 Issue 9, p1704 

    Environmental or geological changes can create new niches that drive ecological species divergence without the immediate cessation of gene flow. However, few such cases have been characterized. On a recently formed volcano, Mt. Etna, Senecio aethnensis and S. chrysanthemifolius inhabit...

  • A Screen for Recessive Speciation Genes Expressed in the Gametes of F1 Hybrid Yeast. Greig, Duncan // PLoS Genetics;Feb2007, Vol. 3 Issue 2, p281 

    Diploid hybrids of Saccharomyces cerevisiae and its closest relative, Saccharomyces paradoxus, are viable, but the sexual gametes they produce are not. One of several possible causes of this gamete inviability is incompatibility between genes from different species—such incompatible genes...

  • A Functional Phylogenomic View of the Seed Plants. Lee, Ernest K.; Cibrian-Jaramillo, Angelica; Kolokotronis, Sergios-Orestis; Katari, Manpreet S.; Stamatakis, Alexandros; Ott, Michael; Chiu, Joanna C.; Little, Damon P.; Stevenson, Dennis Wm.; McCombie, W. Richard; Martienssen, Robert A.; Coruzzi, Gloria; DeSalle, Rob // PLoS Genetics;Dec2011, Vol. 7 Issue 12, Special section p1 

    A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to...

  • Allelic association studies of genome wide association data can reveal errors in marker position assignments. Curtis, David // BMC Genetics;2007, Vol. 8, p30 

    Background: Genome wide association (GWA) studies provide the opportunity to develop new kinds of analysis. Analysing pairs of markers from separate regions might lead to the detection of allelic association which might indicate an interaction between nearby genes. Methods: 396,591 markers typed...

  • Prioritizing disease candidate genes by a gene interconnectedness-based approach. Chia-Lang Hsu; Yen-Hua Huang; Chien-Ting Hsu; Ueng-Cheng Yang // BMC Genomics;2011 Supplement 3, Vol. 12 Issue Suppl 3, p1 

    Background: Genome-wide disease-gene finding approaches may sometimes provide us with a long list of candidate genes. Since using pure experimental approaches to verify all candidates could be expensive, a number of network-based methods have been developed to prioritize candidates. Such tools...

  • Correcting for relatedness in Bayesian models for genomic data association analysis. Pikkuhookana, P.; Sillanpää, M. J. // Heredity;Sep2009, Vol. 103 Issue 3, p223 

    For small pedigrees, the issue of correcting for known or estimated relatedness structure in population-based Bayesian multilocus association analysis is considered. Two such relatedness corrections: [1] a random term arising from the infinite polygenic model and [2] a fixed covariate following...

  • Genome-wide analysis of the GRAS gene family in Prunus mume. Wang, Tao; Sun, Lidan; Lu, Jiuxing; Xu, Zongda; Zhang, Qixiang // Molecular Genetics & Genomics;Feb2015, Vol. 290 Issue 1, p303 

    Prunus mume is an ornamental flower and fruit tree in Rosaceae. We investigated the GRAS gene family to improve the breeding and cultivation of P. mume and other Rosaceae fruit trees. The GRAS gene family encodes transcriptional regulators that have diverse functions in plant growth and...

