Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini

Wang, Yu; Guo, Yanzhi; Pu, Xuemei; Li, Menglong
November 2017
Journal of Computer-Aided Molecular Design;Nov2017, Vol. 31 Issue 11, p1029
Academic Journal
Various bacterial pathogens deliver secreted substrates, called effectors, into host cells through type IV secretion systems (T4SSs) and thereby cause disease. Because T4SS secreted effectors (T4SEs) play important roles in pathogen-host interactions, identifying them is crucial to understanding the pathogenic mechanisms of T4SSs. A few computational methods that use machine learning algorithms for T4SE prediction have been developed using features of C-terminal residues. However, recent studies have shown that targeting information can also be encoded in the N-terminal region of at least some T4SEs. In this study, we present an effective method for T4SE prediction that integrates both N-terminal and C-terminal sequence information. First, we collected from the literature a comprehensive dataset of known T4SEs and non-T4SEs across multiple bacterial species. Then, three types of distinctive features, namely amino acid composition; composition, transition and distribution; and position-specific scoring matrices, were calculated for the 50 N-terminal and 100 C-terminal residues. After that, we used information gain to rank the importance of the 150 residue positions for T4SE secretion signaling. Finally, 125 distinctive residue positions were selected for the prediction model that classifies T4SEs and non-T4SEs. The support vector machine model yields a high area under the receiver operating characteristic curve of 0.916 in fivefold cross-validation and an accuracy of 85.29% on the independent test set.
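
The terminal-window idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the simplest of the three feature types (amino acid composition), computes it once per window rather than per residue position, and the function names `terminal_windows`, `amino_acid_composition` and `terminal_aac_features` are hypothetical. The window lengths of 50 and 100 come from the abstract; everything else is an assumption.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def terminal_windows(sequence, n_len=50, c_len=100):
    """Return the first n_len and the last c_len residues of a protein sequence.

    For sequences shorter than a window, the whole sequence is returned for it.
    """
    seq = sequence.upper()
    return seq[:n_len], seq[-c_len:]


def amino_acid_composition(segment):
    """Return the fraction of each standard amino acid in the segment (20 values).

    Non-standard letters (for example X) are ignored in both counts and total.
    """
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def terminal_aac_features(sequence, n_len=50, c_len=100):
    """Concatenate N-terminal and C-terminal compositions into one 40-value vector."""
    n_term, c_term = terminal_windows(sequence, n_len, c_len)
    return amino_acid_composition(n_term) + amino_acid_composition(c_term)
```

A vector built this way could be passed, together with other features, to any standard classifier; the feature ranking by information gain and the support vector machine described in the abstract are not shown here.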


Related Articles

  • Assessing the relationship between conservation of function and conservation of sequence using photosynthetic proteins. Ashkenazi, Shaul; Snir, Rotem; Ofran, Yanay // Bioinformatics;Dec2012, Vol. 28 Issue 24, p3203 

    Motivation: Assessing the false positive rate of function prediction methods is difficult, as it is hard to establish that a protein does not have a certain function. To determine to what extent proteins with similar sequences have a common function, we focused on photosynthesis-related...

  • Cloning and sequence analysis of the heat-stable acrylamidase from a newly isolated thermophilic bacterium, Geobacillus thermoglucosidasius AUT-01. Cha, Minseok; Chambliss, Glenn // Biodegradation;Feb2013, Vol. 24 Issue 1, p57 

    A thermophilic bacterium capable of degrading acrylamide, AUT-01, was isolated from soil collected from a hot spring area in Montana, USA. The thermophilic strain grew with 0.2 % glucose as the sole carbon source and 1.4 mM acrylamide as the sole nitrogen source. The isolate AUT-01 was...

  • Cloning and Sequence Analysis of Three Variants of the Gene Encoding Alkaline Xylanase C from the Alkaliphilic Bacillus sp. (NCL 87-6-10). Sharma, Poonam; Rele, Meenakshi; Kumar, Lalitha // Biochemical Genetics;Oct2013, Vol. 51 Issue 9/10, p737 

    Alkaline xylanase C from the alkaliphilic Bacillus sp. (NCL 87-6-10) has a low molecular weight and alkaline pI and is cellulase-free, properties compatible with its use in the prebleaching of pulp. We report here the cloning and sequence analysis of three variants of the gene encoding xylanase...

  • Discovering rules for protein—ligand specificity using support vector inductive logic programming. Kelley, Lawrence A.; Shrimpton, Paul J.; Muggleton, Stephen H.; Sternberg, Michael J. E. // PEDS: Protein Engineering, Design & Selection;Sep2009, Vol. 22 Issue 9, p561 

    Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling...

  • Prediction of catalytic residues based on an overlapping amino acid classification. Yongchao Dou; Xiaoqi Zheng; Jialiang Yang; Jun Wang // Amino Acids;Nov2010, Vol. 39 Issue 5, p1353 

    Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. In order to incorporate amino acid similarity into conservation measures, one attempt is to group amino acids into disjoint sets. In this paper, based on the overlapping...

  • Identification of Novel Type III Effectors Using Latent Dirichlet Allocation. Yang, Yang // Computational & Mathematical Methods in Medicine;Jan2012, p1 

    Among the six secretion systems identified in Gram-negative bacteria, the type III secretion system (T3SS) plays important roles in the disease development of pathogens. T3SS has attracted a great deal of research interests. However, the secretion mechanism has not been fully understood yet....

  • Sequence-only evolutionary and predicted structural features for the prediction of stability changes in protein mutants. Folkman, Lukas; Stantic, Bela; Sattar, Abdul // BMC Bioinformatics;2013, Vol. 14 Issue Suppl 2, p1 

    Background: Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a...

  • Support Vector Machines for predicting protein structural class. Yu-Dong Cai; Xiao-Jun Liu; Xue-biao Xu; Guo-Ping Zhou // BMC Bioinformatics;2001, Vol. 2, p3 

    Background: We apply a new machine learning method, the so-called Support Vector Machine method, to predict the protein structural class. Support Vector Machine method is performed based on the database derived from SCOP, in which protein domains are classified based on known structures and the...

  • Mining Dense Patterns from Off Diagonal Protein Contact Maps. Swaroopa, M. Om; Vani, K. Suvarna // International Journal of Computer Applications;7/1/2012, Vol. 50, p36 

    The three dimensional structure of proteins is useful to carry out the biophysical and biochemical functions in a cell. Protein contact maps are 2D representations of contacts among the amino acid residues in the folded protein structure. Proteins are biochemical compounds consisting of one or...

  • ProFET: Feature engineering captures high-level protein functions. Ofer, Dan; Linial, Michal // Bioinformatics;11/1/2015, Vol. 31 Issue 21, p3429 

    Motivation: The amount of sequenced genomes and proteins is growing at an unprecedented pace. Unfortunately, manual curation and functional knowledge lag behind. Homologous inference often fails at labeling proteins with diverse functions and broad classes. Thus, identifying high-level protein...

