Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights

Kumar, Amitha Sampath; Sowpati, Divya Tej; Mishra, Rakesh K.
November 2016
PLoS ONE;11/28/2016, Vol. 11 Issue 11, p1
Academic Journal
Microsatellites, or simple sequence repeats (SSRs), are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise from a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions, indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes, including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility and stability and by acting as linker elements between domains. Comparison of SAAR-containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats (>15 amino acids) are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Repeats of other amino acids, such as glutamic acid, proline, serine and alanine, are generally conserved among orthologs, with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in the effective design and engineering of proteins with novel features.
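As an illustration of the distinction drawn above between SAARs and the codons that encode them, the following is a minimal Python sketch, not the authors' pipeline. The function names `find_saars` and `codon_usage_for_run` are hypothetical, and the minimum run length of 5 is an assumed default; the abstract does not state the threshold used in the study.

```python
import re
from collections import Counter
from typing import List, Tuple


def find_saars(protein: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, 0-based start, run length) for each run of a
    single amino acid that is at least min_len residues long.

    min_len=5 is an assumed threshold, not one taken from the paper.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One letter followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]


def codon_usage_for_run(cds: str, start: int, length: int) -> Counter:
    """Count the codons that encode a run in the protein.

    cds is the in-frame coding DNA sequence; start and length refer to
    protein coordinates as returned by find_saars.
    """
    codons = [cds[3 * i: 3 * i + 3].upper() for i in range(start, start + length)]
    return Counter(codons)


if __name__ == "__main__":
    print(find_saars("MKKKKKKAQQQQQLPPPS"))
    # A four-alanine run encoded by four different alanine codons:
    print(codon_usage_for_run("ATGGCAGCCGCGGCT", start=1, length=4))
```

A run whose codon count contains several different synonymous codons, as in the alanine example, is the kind of repeat that the abstract describes as arising from codon combinations. A run built from one codon, or from a short repeating codon pattern, is consistent with an underlying nucleotide repeat, although this sketch does not test for that.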


Related Articles

  • Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution. Persi, Erez; Horn, David // PLoS Computational Biology;Nov2013, Vol. 9 Issue 11, p1 

    We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear much more than random in protein sequences. The method captures all types of proteomic compositional order including single amino-acid runs, tandem repeats, periodic...

  • Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools. Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L. // Comparative & Functional Genomics;2007, Special section p1 

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A "repeat" corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and...

  • PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition. Lancaster, Alex K.; Nutter-Upham, Andrew; Lindquist, Susan; King, Oliver D. // Bioinformatics;Sep2014, Vol. 30 Issue 17, p2501 

    Summary: Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best understood prions are from yeast and have a prion-forming domain with strongly biased amino acid...

  • An algorithm to find all palindromic sequences in proteins. Prasanth, N; Vaishnavi, M; Sekar, K // Journal of Biosciences;Mar2013, Vol. 38 Issue 1, p173 

    A palindrome is a set of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand its structural, functional and evolutionary significance. Therefore, in view of this, an algorithm has...

  • Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment. Yu, Z. G.; Zhou, L. Q.; Anh, V. V.; Chu, K. H.; Long, S. C.; Deng, J. Q. // Journal of Molecular Evolution;Apr2005, Vol. 60 Issue 4, p538 

    The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of...

  • STaRRRT: a table of short tandem repeats in regulatory regions of the human genome. Bolton, Katherine A.; Ross, Jason P.; Grice, Desma M.; Bowden, Nikola A.; Holliday, Elizabeth G.; Avery-Kiejda, Kelly A.; Scott, Rodney J. // BMC Genomics;2013, Vol. 14 Issue 1, p1 

    Background Tandem repeats (TRs) are unstable regions commonly found within genomes that have consequences for evolution and disease. In humans, polymorphic TRs are known to cause neurodegenerative and neuromuscular disorders as well as being associated with complex diseases such as diabetes and...

  • In search of the boundary between repetitive and non-repetitive protein sequences. Richard, François D.; Kajava, Andrey V. // Biochemical Society Transactions;Oct2015, Vol. 43 Issue 5, p807 

    Tandem repeats (TRs) are frequently not perfect, containing a number of mutations accumulated during evolution. One of the main problems is to distinguish between the sequences that contain highly imperfect TRs and the aperiodic sequences. The majority of proteins with TRs in sequences have...

  • A prevalent POLG CAG microsatellite length allele in humans and African great apes. Rovio, Anja T.; Abel, Josef; Ahola, Arja L.; Andres, Aida M.; Bertranpetit, Jaume; Blancher, Antoine; Bontrop, Ronald E.; Chemnick, Leona G.; Cooke, Howard J.; Cummins, James M.; Davis, Heidi A.; Elliott, David J.; Fritsche, Ellen; Hargreave, Timothy B.; Hoffman, Susan M. G.; Jequier, Anne M.; Kao, Shu-Huei; Kim, Heui-Soo; Marchington, David R.; Mehmet, Denise // Mammalian Genome;Jun2004, Vol. 15 Issue 6, p492 

    The human nuclear gene for the catalytic subunit of mitochondrial DNA polymerase γ (POLG) contains within its coding region a CAG microsatellite encoding a polyglutamine repeat. Previous studies demonstrated an association between length variation at this repeat and male infertility,...

  • Structure and evolution of the plant cation diffusion facilitator family of ion transporters. Gustin, Jeffery L.; Zanis, Michael J.; Salt, David E. // BMC Evolutionary Biology;2011, Vol. 11 Issue 1, p76 

    Background: Members of the cation diffusion facilitator (CDF) family are integral membrane divalent cation transporters that transport metal ions out of the cytoplasm either into the extracellular space or into internal compartments such as the vacuole. The spectrum of cations known to be...

  • Whole genome phylogenies for multiple Drosophila species. Seetharam, Arun; Stuart, Gary W. // BMC Research Notes;2012, Vol. 5 Issue 1, p670 

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don't use explicit sequence...

