AllerTOP v.2 - a server for in silico prediction of allergens

Dimitrov, Ivan; Bangov, Ivan; Flower, Darren; Doytchinova, Irini
June 2014
Journal of Molecular Modeling;Jun2014, Vol. 20 Issue 6, p1
Academic Journal
Allergy is an overreaction by the immune system to a previously encountered, ordinarily harmless substance (typically proteins), resulting in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: their presence in food, in commercial products such as washing powder, and in medical therapeutics and diagnostics makes predicting and identifying potential allergens a crucial societal issue. The prediction of allergens has been explored widely using bioinformatics, with many tools developed in the last decade, many of them freely available online. Here, we report a set of novel models for allergen prediction utilizing amino acid E-descriptors, auto- and cross-covariance transformation, and several machine learning methods for classification, including logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k nearest neighbours (kNN). The best performing method was kNN, with 85.3% accuracy in 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server. [Figure not available: see fulltext.]
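
The abstract names the pipeline only in outline, so the following is a minimal sketch of how an auto- and cross-covariance (ACC) transformation followed by kNN voting might look, not the authors' implementation. Several things here are assumptions: the descriptor table `PLACEHOLDER_E` holds random stand-in numbers rather than the published E-descriptor values, the five-component descriptor size and the `max_lag=5` default are assumed, and the function names `acc_transform` and `knn_predict` are hypothetical.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5  # assumed: five descriptor components per residue

# PLACEHOLDER values: seeded random numbers standing in for the published
# E-descriptor table, which is not reproduced here.
_rng = random.Random(0)
PLACEHOLDER_E = {
    aa: [_rng.uniform(-1, 1) for _ in range(N_DESCRIPTORS)] for aa in AMINO_ACIDS
}


def acc_transform(sequence, descriptors=PLACEHOLDER_E, max_lag=5):
    """Map a protein of any length to a fixed-length ACC vector.

    For each lag and each descriptor pair (j, k):
        ACC_jk(lag) = sum_i E_j(i) * E_k(i + lag) / (n - lag)
    j == k gives the auto-covariance terms, j != k the cross-covariance terms.
    """
    rows = [descriptors[aa] for aa in sequence]
    n = len(rows)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    vector = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(N_DESCRIPTORS), repeat=2):
            total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
            vector.append(total / (n - lag))
    return vector


def knn_predict(query, training, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance).

    `training` is a list of (vector, label) pairs.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```

The point of the transformation is that proteins of different lengths all map to vectors of the same size (here 5 x 5 x 5 = 125 values), which is what lets a distance-based classifier such as kNN compare them. With an odd `k` and two classes, the vote cannot tie.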


Related Articles

  • Sensitisation to common allergens and respiratory symptoms in endotoxin exposed workers: a pooled analysis. Basinas, Ioannis; Schlünssen, Vivi; Heederik, Dick; Sigsgaard, Torben; Smit, Lidwien A.M.; Samadi, Sadegh; Omland, Øyvind; Hjort, Charlotte; Madsen, Anne Mette; Skov, Simon; Wouters, Inge M. // Occupational & Environmental Medicine;Feb2012, Vol. 69 Issue 2, p99 

    Objective To test the hypotheses that current endotoxin exposure is inversely associated with allergic sensitisation and positively associated with non-allergic respiratory diseases in four occupationally exposed populations using a standardised analytical approach. Methods Data were pooled from...

  • Asymptomatic sensitisation to grapes in a sample of workers in wine industry. Kalogemmilros, D.; Rigopoulos, D.; Gregoriou, S.; Mousatou, V.; Lyris, N.; Papaioannou, D.; Katsarou-Katsari, A. // Occupational & Environmental Medicine;Aug2004, Vol. 61 Issue 8, p709 

    Aims: To assess the prevalence of sensitisation to grapes (Vitis vinifera var. agiorghitiko) in a population with repeated exposure to grape allergens through direct cutaneous contact as well as through the gastrointestinal tract. Methods: One hundred and twenty subjects were enrolled in each of...

  • Predicting citation count of Bioinformatics papers within four years of publication. Ibáñez, Alfonso; Larrañaga, Pedro; Bielza, Concha // Bioinformatics;Dec2009, Vol. 25 Issue 24, p3303 

    Motivation: Nowadays, publishers of scientific journals face the tough task of selecting high-quality articles that will attract as many readers as possible from a pool of articles. This is due to the growth of scientific output and literature. The possibility of a journal having a tool capable...

  • An integrative variant analysis suite for whole exome next-generation sequencing data.  // BMC Bioinformatics;2012, Vol. 13 Issue 1, p8 

    The article focuses on exome capture sequencing, which allows researchers to cost-effectively sequence the coding regions of the genome. It provides information on the use of statistical models trained on validated whole-exome capture sequencing data. Implementation of Atlas2 Suite is discussed and it is...

  • GWAS with longitudinal phenotypes: performance of approximate procedures. Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel // European Journal of Human Genetics;Oct2015, Vol. 23 Issue 10, p1384 

    Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear...

  • IgE antibodies and urinary trimethylarsine oxide accounted for 1-7 % population attributable risks for eczema in adults: USA NHANES 2005-2006. Shiue, Ivy // Environmental Science & Pollution Research;Dec2015, Vol. 22 Issue 23, p18404 

    Population attributable risks from serum IgE and dust mite allergen concentrations and environmental chemicals for eczema are unclear. Therefore, it was aimed to examine serum IgE and allergen concentrations and environmental chemicals for eczema in adults and to calculate population attributable...

  • CoDP: predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein. Hiroko Terui; Kiwamu Akagi; Hiroshi Kawame; Kei Yura // Journal of Biomedical Science;2013, Vol. 20 Issue 1, p1 

    Background: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions and the frequency of missense variants in MSH6 is the highest amongst...

  • Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications. Kusonmano, Kanthida; Netzer, Michael; Pfeifer, Bernhard; Baumgartner, Christian; Liedl, Klaus R.; Graber, Armin // World Academy of Science, Engineering & Technology;Oct2009, Issue 34, p966 

    Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between...

  • Learning by aggregating experts and filtering novices: a solution to crowdsourcing problems in bioinformatics. Zhang, Ping; Cao, Weidan; Obradovic, Zoran // BMC Bioinformatics;2013, Vol. 14 Issue Suppl 12, p1 

    Background: In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relying on unreliable annotations obtained from multiple sources. Results: We proposed a probabilistic...

