Image Correlation Method for DNA Sequence Alignment

Saldías, Millaray Curilem; Sassarini, Felipe Villarroel; Poblete, Carlos Muñoz; Vásquez, Asticio Vargas; Butler, Iván Maureira
June 2012
PLoS ONE;Jun2012, Vol. 7 Issue 6, p1
Academic Journal
The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is handled as an object-recognition-in-a-scene problem: the query and the database become the object and the scene, respectively. An image correlation process is carried out to search for the best match between them. Because this procedure can be implemented in an optical correlator, the correlation could eventually be performed at the speed of light. This paper presents an initial research stage in which results were obtained "digitally", by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (lengths varying from 50 to 4500 base pairs) and 100 scenes, each represented by a 100 × 100 image (a database of one million base pairs in total), were considered for the image correlation analysis. The correlations reached very high sensitivity (99.01%) and specificity (98.99%), and outperformed BLAST as the number of mutations increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the speed of the correlation process on a real experimental optical correlator. By doing so, we expect to fully exploit the light-based properties of optical correlation. As the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
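The abstract describes the method only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. It codes each base as a fixed gray level, wraps a sequence row by row into an image, and locates an object (query) image inside a scene (database) image. The specific gray levels, the row-wise wrapping, the rectangular block shape of the query, and the scoring are all assumptions. Scoring uses a sum of squared differences expanded into correlation terms rather than a plain correlation peak. The names `GRAY`, `encode` and `best_match` are hypothetical. Every placement is evaluated directly with NumPy; an FFT-based or optical correlator would compute the same correlation term much faster.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical gray levels: the abstract only says each base maps to a
# fixed gray intensity; these particular values are an assumption.
GRAY = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}


def encode(seq: str, width: int) -> np.ndarray:
    """Map bases to gray levels and wrap them row by row into a 2D image."""
    if len(seq) % width != 0:
        raise ValueError("sequence length must be a multiple of the image width")
    pixels = np.array([GRAY[b] for b in seq.upper()], dtype=float)
    return pixels.reshape(-1, width)


def best_match(scene: np.ndarray, obj: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) of the placement where obj best matches scene.

    The sum of squared differences at each placement is expanded as
    SSD = sum(window**2) - 2 * corr(scene, obj) + sum(obj**2),
    so the only term that depends on both images is a cross-correlation.
    """
    windows = sliding_window_view(scene, obj.shape)          # (R, C, h, w) view
    corr = np.einsum("ijab,ab->ij", windows, obj)            # cross-correlation
    energy = np.einsum("ijab,ijab->ij", windows, windows)    # local scene energy
    ssd = energy - 2.0 * corr + np.sum(obj ** 2)
    row, col = np.unravel_index(np.argmin(ssd), ssd.shape)
    return int(row), int(col)


# Usage: one 100 x 100 scene holds 10,000 bases, as in the abstract.
rng = np.random.default_rng(0)
scene_seq = "".join(rng.choice(list("ACGT"), size=100 * 100))
scene = encode(scene_seq, width=100)

# Cut a 5 x 20 block from the scene as the "object" and apply two mutations.
obj = scene[40:45, 30:50].copy()
obj[1, 3] = GRAY["A"] if obj[1, 3] != GRAY["A"] else GRAY["C"]
obj[3, 7] = GRAY["G"] if obj[3, 7] != GRAY["G"] else GRAY["T"]
print(best_match(scene, obj))  # the mutated block is still located at its origin
```

The squared-difference form was chosen because a raw intensity correlation tends to peak wherever the scene is brightest (here, T-rich regions), not necessarily where the sequences agree. Subtracting the local scene energy removes that bias, so a few mismatches only raise the score slightly at the true position. A contiguous query that wraps across scene rows would not form a rectangular block; the abstract does not say how the paper handles this, and the sketch does not address it.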


Related Articles

  • annot8r: GO, EC and KEGG annotation of EST datasets. Schmid, Ralf; Blaxter, Mark L. // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that...

  • Differential direct coding: a compression algorithm for nucleotide sequence data. Vey, Gregory // Database: The Journal of Biological Databases & Curation;Jan2009, Vol. 2009, p1 

    While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. This issue becomes even...

  • FAAST: Flow-space Assisted Alignment Search Tool.  // BMC Bioinformatics;2011 Supplement 6, Vol. 12 Issue Suppl 6, p293 

    The article offers information on the study conducted by the authors related to flow-space assisted alignment search tool (FAAST). It states that high throughput pyrosequencing is the major sequencing platform for producing long read high throughput data. It mentions that a novel algorithm for...

  • Indexing Degenerate Strings. Voráček, Michal; Vagner, Ladislav; Flouri, Tomáš // AIP Conference Proceedings;12/26/2007, Vol. 963 Issue 2, p1400 

    In this paper, we give the first, to our knowledge, structure and corresponding algorithm for indexing of factors of DNA and RNA sequences, where the text is degenerate, i.e., contains sets of characters. The presented structure indexes so called k-factors, the factors of the degenerate text whose...

  • A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays. Khan, Zia; Bloom, Joshua S.; Kruglyak, Leonid; Singh, Mona // Bioinformatics;Jul2009, Vol. 25 Issue 13, p1609 

    Motivation: High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis. Algorithms for computing maximal exact matches (MEMs) between sequences appear in two contexts where high-throughput sequencing will vastly increase the volume of...

  • DNA Sequencing--Tabu and Scatter Search Combined. Błażewicz, Jacek; Glover, Fred; Kasprzak, Marta // INFORMS Journal on Computing;Summer2004, Vol. 16 Issue 3, p232 

    In this paper, a tabu-search algorithm enhanced by scatter search is presented. The algorithm solves the DNA sequencing problem with negative and positive errors, yielding outcomes of high quality. We compare the new method with two other metaheuristic approaches: a previous tabu-search method...

  • MLTreeMap - accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. Stark, Manuel; Berger, Simon A.; Stamatakis, Alexandros; von Mering, Christian // BMC Genomics;2010, Vol. 11, p461 

    Background: Shotgun sequencing of environmental DNA is an essential technique for characterizing uncultivated microbes in situ. However, the taxonomic and functional assignment of the obtained sequence fragments remains a pressing problem. Results: Existing algorithms are largely optimized for...

  • Parallelization of the MAFFT multiple sequence alignment program. Katoh, Kazutaka; Toh, Hiroyuki // Bioinformatics;Aug2010, Vol. 26 Issue 15, p1899 

    Summary: Multiple sequence alignment (MSA) is an important step in comparative sequence analyses. Parallelization is a key technique for reducing the time required for large-scale sequence analyses. The three calculation stages, all-to-all comparison, progressive alignment and iterative...

  • Improved Alignment of Homologous DNA Sequences. Stojanov, Done; Martinovska, Cveta // Annals of West University of Timisoara: Series of Biology;2013, Vol. 16 Issue 2, p97 

    A new aligning approach for homologous DNA sequences is presented, being faster than the standard dynamic programming based implementations. Searching for exact and non-crossing hits as fast as possible, tracking hits' positions in a data vector, being dynamically sorted in ascending order,...

