TITLE

Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods

AUTHOR(S)
Yonezawa, Kouki; Igarashi, Manabu; Ueno, Keisuke; Takada, Ayato; Ito, Kimihito
PUB. DATE
February 2013
SOURCE
PLoS ONE;Feb2013, Vol. 8 Issue 2, p1
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
A large number of nucleotide sequences of various pathogens are available in public databases. The growth of these datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods that reduce sampling bias. We propose a novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials in the life sciences, we anticipate that applying our algorithm to various datasets will help reduce sampling bias.
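The abstract does not spell out the trimming rule, so the following Python sketch is only one plausible reading of "thinning out densely sampled sequences". It assumes the sequences are already aligned to equal length, measures distance as the Hamming distance, and repeatedly removes one member of the closest remaining pair until n sequences are left. The names `hamming` and `closest_neighbor_trim`, and the rule for choosing which member of the pair to drop (the one whose next-nearest neighbor is closer), are hypothetical choices for illustration, not the authors' published procedure.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_neighbor_trim(seqs: dict[str, str], n: int) -> list[str]:
    """Hypothetical sketch: drop one member of the closest remaining pair
    until only n sequence IDs are left; returns the kept IDs, sorted."""
    if not 0 < n <= len(seqs):
        raise ValueError("n must be between 1 and the number of sequences")
    ids = sorted(seqs)
    # Precompute all pairwise distances once, stored under both key orders.
    dist = {}
    for a, b in combinations(ids, 2):
        d = hamming(seqs[a], seqs[b])
        dist[(a, b)] = dist[(b, a)] = d
    remaining = set(ids)

    def nearest(x: str, exclude: str) -> float:
        # Distance from x to its closest remaining neighbor other than `exclude`.
        others = [dist[(x, y)] for y in remaining if y not in (x, exclude)]
        return min(others) if others else float("inf")

    while len(remaining) > n:
        # Closest pair among the kept sequences; ties go to the first pair in sorted-ID order.
        a, b = min(combinations(sorted(remaining), 2), key=lambda p: dist[p])
        # Assumed rule: drop the member sitting in the denser neighborhood,
        # i.e. the one with the smaller distance to its next-nearest neighbor.
        drop = a if nearest(a, b) <= nearest(b, a) else b
        remaining.remove(drop)
    return sorted(remaining)


if __name__ == "__main__":
    # Toy data: s1, s2, s3 form a densely sampled cluster; s4 and s5 are isolated.
    seqs = {
        "s1": "AAAAAAAA",
        "s2": "AAAAAAAT",
        "s3": "AAAAAATT",
        "s4": "CCCCCCCC",
        "s5": "GGGGGGGG",
    }
    print(closest_neighbor_trim(seqs, 3))  # ['s3', 's4', 's5']
```

In the toy example the dense cluster is reduced to a single representative while the isolated sequences survive, which is the kind of thinning the abstract describes. The naive all-pairs search costs O(m²) per removal; it keeps the sketch short but would be too slow for database-scale datasets without cached nearest-neighbor lists or a priority queue.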
ACCESSION #
87625601

 

Related Articles

  • A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking. Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng // PLoS ONE;Mar2014, Vol. 9 Issue 3, p1 

    The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we...

  • MFCompress: a compression tool for FASTA and multi-FASTA data. Pinho, Armando J.; Pratas, Diogo // Bioinformatics;Jan2014, Vol. 30 Issue 1, p117 

    Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as...

  • IQSeq: Integrated Isoform Quantification Analysis Based on Next-Generation Sequencing. Jiang Du; Jing Leng; Habegger, Lukas; Sboner, Andrea; McDermott, Drew; Gerstein, Mark // PLoS ONE;Jan2012, Vol. 7 Issue 1, p1 

    With the recent advances in high-throughput RNA sequencing (RNA-Seq), biologists are able to measure transcription with unprecedented precision. One problem that can now be tackled is that of isoform quantification: here one tries to reconstruct the abundances of isoforms of a gene. We have...

  • De Novo Assembly of the Complete Genome of an Enhanced Electricity-Producing Variant of Geobacter sulfurreducens Using Only Short Reads. Nagarajan, Harish; Butler, Jessica E.; Klimes, Anna; Yu Qiu; Zengler, Karsten; Ward, Joy; Young, Nelson D.; Methé, Barbara A.; Palsson, Bernhard Ø.; Lovley, Derek R.; Barrett, Christian L. // PLoS ONE;2010, Vol. 5 Issue 6, p1 

    State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology,...

  • A Modified Mountain Clustering Algorithm based on Hill Valley Function. Junnian Wang; Dunshun Liu; Chao Liu // Journal of Networks;Jun2011, Vol. 6 Issue 6, p916 

    A modified mountain clustering algorithm based on the hill valley function is proposed. Firstly, the mountain function is constructed on the data space, with estimating the parameter by a correlation self-comparison method, and database's mountain function values are computed. Secondly, the hill...

  • Using Affinity Propagation Combined Post-Processing to Cluster Protein Sequences. Fan Yang; QingXin Zhu; DongMing Tang; MingYuan Zhao // Protein & Peptide Letters;Jun2010, Vol. 17 Issue 6, p681 

    The sizes of the protein databases are growing rapidly nowadays thus clustering protein sequences based only on sequence information becomes increasingly important. In this paper, we analyze the limitation of Affinity propagation (AP) algorithm when clustering a dataset generated randomly. Then...

  • Identification of 24 Species of Calyptratae Entering Ningbo Port Using DNA Barcoding Technique. Wei WU; Defeng XIA; Wei ZHENG // Agricultural Science & Technology;Feb2015, Vol. 16 Issue 2, p235 

    [Objective] A study on the classification of 24 species of Calyptratae entering Ningbo port using DNA barcoding technique was carried out. [Method] The CO I genes of the 24 species of Calyptratae were first sequenced. Based on the comparison and analysis of the obtained sequences, the...

  • 2-Jump DNA Search Multiple Pattern Matching Algorithm. Bhukya, Raju; Somayajulu, D. V. L. N. // International Journal of Computer Science Issues (IJCSI);May2011, Vol. 8 Issue 3, p320 

    Pattern matching in a DNA sequence or searching a pattern from a large data base is a major research area in computational biology. To extract pattern match from a large sequence it takes more time, in order to reduce searching time we have proposed an approach that reduces the search time with...
