Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data

Pignatelli, Miguel; Moya, Andrés
May 2011
PLoS ONE;2011, Vol. 6 Issue 5, p1
Academic Journal
A frequent step in metagenomic data analysis is the assembly of the sequenced reads. Many assembly tools targeting data from next-generation sequencing (NGS) technologies have been published in recent years, but these assemblers have not been designed for or tested in the multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies, demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios, demonstrating that these difficulties, which often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort.


Related Articles

  • ParPEST: a pipeline for EST data analysis based on parallel computing. D'Agostino, Nunzio; Aversano, Mario; Luisa Chiusano, Maria // BMC Bioinformatics;2005 Supplement 4, Vol. 6, pS9 

    Background: Expressed Sequence Tags (ESTs) are short and error-prone DNA sequences generated from the 5' and 3' ends of randomly selected cDNA clones. They provide an important resource for comparative and functional genomic studies and, moreover, represent reliable information for the...

  • Fast and accurate read alignment for resequencing. Mu, John C.; Jiang, Hui; Kiani, Amirhossein; Mohiyuddin, Marghoob; Bani Asadi, Narges; Wong, Wing H. // Bioinformatics;Sep2012, Vol. 28 Issue 18, p2366 

    Motivation: Next-generation sequence analysis has become an important task both in laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large...

  • Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. Caboche, Ségolène; Audebert, Christophe; Lemoine, Yves; Hot, David // BMC Genomics;2014, Vol. 15 Issue 1, p1 

    Background: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences....

  • SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations. Hart, Steven N.; Sarangi, Vivekananda; Moore, Raymond; Baheti, Saurabh; Bhavsar, Jaysheel D.; Couch, Fergus J.; Kocher, Jean-Pierre A. // PLoS ONE;Dec2013, Vol. 8 Issue 12, p1 

    Background: Structural variation (SV) represents a significant, yet poorly understood contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a...

  • A SNP profiling panel for sample tracking in whole-exome sequencing studies. Pengelly, Reuben J.; Gibson, Jane; Andreoletti, Gaia; Collins, Andrew; Mattocks, Christopher J.; Ennis, Sarah // Genome Medicine;2013, Vol. 5 Issue 9, p89 

    Whole-exome sequencing provides a cost-effective means to sequence protein coding regions within the genome, which are significantly enriched for etiological variants. We describe a panel of single nucleotide polymorphisms (SNPs) to facilitate the validation of data provenance in whole-exome...

  • Comparative genomics: Comparative genomics coming of age. Furlong, Rebecca F.; Ziheng Yang // Heredity;Dec2003, Vol. 91 Issue 6, p533 

    Discusses the use of comparative analysis of genome sequences from multiple species at different evolutionary distances in identifying functional sequences. Providing insights into the forces and mechanisms of the evolutionary process of genes and genomes; Study of Thomas and colleagues which...

  • Recombination and phylogeographical analysis of Lily symptomless virus. Amit Singh; Birender Mahinghara; Vipin Hallan; Raja Ram; Aijaz Zaidi // Virus Genes;Apr2008, Vol. 36 Issue 2, p421 

    Abstract: The complete genomic nucleotide sequence of an Indian isolate of Lily symptomless virus (LSV) was determined by sequencing 11 overlapping cDNA fragments of different sizes. The genome consisted of 8,394 nucleotides, excluding the poly (A) tail and contained six open...

  • Common sense for our genomes. Brenner, Steven E. // Nature;10/18/2007, Vol. 449 Issue 7164, p783 

    The author focuses on issues related to genomes in the U.S. He points out that a personal DNA sequence is not yet practically useful, but that it could be if the right resources for interpreting genomes were available. According to him, effects of gene variations are scattered in hundreds of...

  • MEGANTE: A Web-Based System for Integrated Plant Genome Annotation. Numa, Hisataka; Itoh, Takeshi // Plant & Cell Physiology;Jan2014, Vol. 55 Issue 1, pe2 

    The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and...

