Automatic selection of reference taxa for protein–protein interaction prediction with phylogenetic profiling

Simonsen, Martin; Maetschke, Stefan R.; Ragan, Mark A.
March 2012
Bioinformatics;Mar2012, Vol. 28 Issue 6, p851
Academic Journal
Motivation: Phylogenetic profiling methods can achieve good accuracy in predicting protein–protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly available, identifying the most-informative RT is becoming increasingly difficult. Previous studies on the selection of RT have provided guidelines for manual taxon selection, and for eliminating closely related taxa. However, no general strategy for automatic selection of RT is currently available.

Results: We present three novel methods for automating the selection of RT, using machine learning based on known protein–protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting phylogenetic profiles often require very different RT sets to support high prediction accuracy.

Availability: The datasets and software used in the experiments can be found at http://users-birc.au.dk/zxr/phyloprof/

Contact: zxr@birc.au.dk; somme89@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.
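To make the idea of reference-taxon selection concrete, the following is a minimal Python sketch and not the authors' software. It assumes binary presence/absence profiles, a Jaccard similarity with a fixed threshold, and a plain greedy forward selection in place of the paper's Tree-Based Search. The proteins, profiles, known interactions and all function names are hypothetical toy values chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy data: presence (1) / absence (0) of each protein's
# homolog in five reference taxa. None of these values come from the paper.
PROFILES = {
    "A": [1, 0, 1, 1, 0],
    "B": [1, 0, 1, 1, 1],
    "C": [0, 1, 0, 0, 1],
    "D": [0, 1, 0, 0, 0],
}
# Hypothetical "known" interaction network used to score a taxon subset.
KNOWN_INTERACTIONS = {frozenset(("A", "B")), frozenset(("C", "D"))}


def jaccard(p, q, taxa):
    """Jaccard similarity of two profiles restricted to the chosen taxa."""
    both = sum(1 for t in taxa if p[t] and q[t])
    either = sum(1 for t in taxa if p[t] or q[t])
    return both / either if either else 0.0


def score(taxa, threshold=0.5):
    """Fraction of protein pairs classified correctly using only `taxa`."""
    pairs = list(combinations(sorted(PROFILES), 2))
    correct = 0
    for a, b in pairs:
        predicted = jaccard(PROFILES[a], PROFILES[b], taxa) >= threshold
        actual = frozenset((a, b)) in KNOWN_INTERACTIONS
        correct += predicted == actual
    return correct / len(pairs)


def greedy_select(n_taxa):
    """Greedy forward selection (an assumed stand-in, not Tree-Based Search):
    repeatedly add the taxon that most improves the score, stop when none does."""
    chosen, best = [], 0.0
    remaining = set(range(n_taxa))
    while remaining:
        gains = {t: score(chosen + [t]) for t in remaining}
        # Ties are broken in favour of the lowest taxon index.
        t_best = max(sorted(gains), key=gains.get)
        if gains[t_best] <= best:
            break
        chosen.append(t_best)
        best = gains[t_best]
        remaining.remove(t_best)
    return chosen, best
```

On this toy data, `greedy_select(5)` returns `([0, 1], 1.0)`: two taxa suffice to separate the interacting pairs, whereas taxon 4 alone scores only 0.5. This illustrates, under the stated assumptions, why the choice of RT can matter for prediction accuracy.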


Related Articles

  • Non-adaptive origins of interactome complexity. Fernández, Ariel; Lynch, Michael // Nature;6/23/2011, Vol. 474 Issue 7352, p502 

    The boundaries between prokaryotes, unicellular eukaryotes and multicellular eukaryotes are accompanied by orders-of-magnitude reductions in effective population size, with concurrent amplifications of the effects of random genetic drift and mutation. The resultant decline in the efficiency of...

  • Literature mining of host–pathogen interactions: comparing feature-based supervised learning and language-based approaches. Thieu, Thanh; Joshi, Sneha; Warren, Samantha; Korkin, Dmitry // Bioinformatics;Mar2012, Vol. 28 Issue 6, p867 

    Motivation: In an infectious disease, the pathogen's strategy to enter the host organism and breach its immune defenses often involves interactions between the host and pathogen proteins. Currently, the experimental data on host–pathogen interactions (HPIs) are scattered across multiple...

  • Drug repositioning for non-small cell lung cancer by using machine learning algorithms and topological graph theory. Chien-Hung Huang; Mu-Hsin Chang, Peter; Chia-Wei Hsu; Huang, Chi-Ying F.; Ka-Lok Ng // BMC Bioinformatics;1/11/2016, Vol. 17, p13 

    Background: Non-small cell lung cancer (NSCLC) is one of the leading causes of death globally, and research into NSCLC has been accumulating steadily over several years. Drug repositioning is the current trend in the pharmaceutical industry for identifying potential new uses for existing drugs...

  • Machine Learning for Automatic Prediction of the Quality of Electrophysiological Recordings. Nowotny, Thomas; Rospars, Jean-Pierre; Martinez, Dominique; Elbanna, Shereen; Anton, Sylvia // PLoS ONE;Dec2013, Vol. 8 Issue 12, p1 

    The quality of electrophysiological recordings varies a lot due to technical and biological variability and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing...

  • CINPER: An Interactive Web System for Pathway Prediction for Prokaryotes. Xizeng Mao; Xin Chen; Yu Zhang; Pangle, Spencer; Ying Xu // PLoS ONE;Dec2012, Vol. 7 Issue 12, p1 

    We present a web-based network-construction system, CINPER (CSBL INteractive Pathway BuildER), to assist a user to build a user-specified gene network for a prokaryotic organism in an intuitive manner. CINPER builds a network model based on different types of information provided by the user and...

  • On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction. Becker, Julien; Maes, Francis; Wehenkel, Louis // PLoS ONE;Feb2013, Vol. 8 Issue 2, p1 

    Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the...

  • Phylogenetic Profiling: How Much Input Data Is Enough? Škunca, Nives; Dessimoz, Christophe // PLoS ONE;Feb2015, Vol. 10 Issue 2, p1 

    Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Much of the recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the...

  • Prediction of lateral spread displacement: data-driven approaches. Liu, Zheng; Tesfamariam, Solomon // Bulletin of Earthquake Engineering;Oct2012, Vol. 10 Issue 5, p1431 

    Site seismic hazard (SSH) is an integral component of seismic risk assessment of engineered structures. The SSH encompasses the effect of ground shaking, landslide, and liquefaction. Discernment of liquefaction and lateral spreading vulnerability is a complex and nonlinear procedure that is...

  • Comparative Study on Kinds of Feature Subset Selection for Inconsistent Large-scale Data. Dongsong Zheng; Changsheng Zhang // International Journal of Advancements in Computing Technology;May2013, Vol. 5 Issue 9, p482 

    Feature subset selection is one of the most important topics in rough sets. Many heuristic feature subset selection approaches for large-scale data have been proposed, which is required to provide their consistent classification. Due to kinds of factors such as: noise in data, compact...

