Gene Tree Labeling Using Nonnegative Matrix Factorization on Biomedical Literature

Heinrich, Kevin E.; Berry, Michael W.; Homayouni, Ramin
January 2008
Computational Intelligence & Neuroscience;2008 Supplement, Special section p1
Academic Journal
Identifying functional groups of genes is a challenging problem for biological applications. Text mining approaches can be used to build hierarchical clusters or trees from the information in the biological literature. In particular, the nonnegative matrix factorization (NMF) is examined as one approach to label hierarchical trees. A generic labeling algorithm as well as an evaluation technique is proposed, and the effects of different NMF parameters with regard to convergence and labeling accuracy are discussed. The primary goals of this study are to provide a qualitative assessment of the NMF and its various parameters and initialization, to provide an automated way to classify biomedical data, and to provide a method for evaluating labeled data assuming a static input tree. As a byproduct, a method for generating gold standard trees is proposed.
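As a rough illustration of the kind of factorization the abstract refers to, the sketch below factors a small term-by-document matrix with the standard Lee-Seung multiplicative updates and reads candidate labels off the highest-weighted terms of each factor. It is not the authors' labeling algorithm: the toy matrix, the term list, and the helper names (`nmf`, `top_terms`, `frobenius_error`) are all hypothetical, and pure Python lists are used only to keep the example dependency-free.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative v (m x n) into w (m x rank) and h (rank x n)
    with multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Random positive initialization; the seed is an illustrative choice.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def top_terms(w, terms, k=2):
    """For each factor (column of w), return the k highest-weighted terms."""
    labels = []
    for f in range(len(w[0])):
        ranked = sorted(range(len(terms)), key=lambda i: w[i][f], reverse=True)
        labels.append([terms[i] for i in ranked[:k]])
    return labels

# Hypothetical toy data: rows are terms, columns are gene documents.
terms = ["kinase", "phosphorylation", "ribosome", "translation"]
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
w, h = nmf(v, rank=2)
labels = top_terms(w, terms)
error = frobenius_error(v, w, h)
```

The rank, iteration count, and initialization are the kinds of NMF parameters whose effect on convergence and labeling accuracy the abstract says the paper examines. In practice a numerical library would normally replace the list-based helpers.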


Related Articles

  • An alternating direction algorithm for matrix completion with nonnegative factors. Xu, Yangyang; Yin, Wotao; Wen, Zaiwen; Zhang, Yin // Frontiers of Mathematics in China;Apr2012, Vol. 7 Issue 2, p365 

    This paper introduces an algorithm for the nonnegative matrix factorization-and-completion problem, which aims to find nonnegative low-rank matrices X and Y so that the product XY approximates a nonnegative data matrix M whose elements are partially known (to a certain accuracy). This problem...

  • Integrating Various Resources for Gene Name Normalization. Yuncui Hu; Yanpeng Li; Hongfei Lin; Zhihao Yang; Liangxi Cheng; Flower, Darren R. // PLoS ONE;Sep2012, Vol. 7 Issue 9, Special section p1 

    The recognition and normalization of gene mentions in biomedical literature are crucial steps in biomedical text mining. We present a system for extracting gene names from biomedical literature and normalizing them to gene identifiers in databases. The system consists of four major components:...

  • Pattern Expression Nonnegative Matrix Factorization: Algorithm and Applications to Blind Source Separation. Junying Zhang; Le Wei; Xuerong Feng; Zhen Ma; Yue Wang // Computational Intelligence & Neuroscience;2008 Supplement, Special section p1 

    Independent component analysis (ICA) is a widely applicable and effective approach in blind source separation (BSS), with the limitation that sources must be statistically independent. However, a more common situation is blind source separation for the nonnegative linear model (NNLM), where the observations...

  • Locating the Wood Defects with Damped Newton Approaches based Non-negative Matrix Factorization. Yafeng Zheng; Fang He; Zhao Zhang // Proceedings of the International Symposium on Information System;2009, p123 

    Non-negative matrix factorization (NMF) is an unsupervised method whose aim is to find an approximate factorization V (n×m) ≈ W (n×r) H (r×m) into non-negative matrices W (n×r) and H (r×m). This paper presents an extension to NMF and discusses the development and the use of damped Newton based the non-negative...

  • Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization. Guan, Naiyang; Wei, Lei; Luo, Zhigang; Tao, Dacheng // PLoS ONE;Oct2013, Vol. 8 Issue 10, p1 

    Graph regularized nonnegative matrix factorization (GNMF) decomposes a nonnegative data matrix to the product of two lower-rank nonnegative factor matrices and aims to preserve the local geometric structure of the dataset by minimizing squared Euclidean distance or Kullback-Leibler...

  • Neighborhood Preserving Convex Nonnegative Matrix Factorization. Jiang Wei; Li Min; Zhang Yongqing // Mathematical Problems in Engineering;2014, p1 

    The convex nonnegative matrix factorization (CNMF) is a variation of nonnegative matrix factorization (NMF) in which each cluster is expressed by a linear combination of the data points and each data point is represented by a linear combination of the cluster centers. When there exists...

  • Interactive Text Mining with Pipeline Pilot: A Bibliographic Web-Based Tool for PubMed. Vellay, S. G. P.; Latimer, N. E. Miller; Paillard, G. // Infectious Disorders - Drug Targets;Jun2009, Vol. 9 Issue 3, p366 

    Text mining has become an integral part of all research in the medical field. Many text analysis software platforms support particular use cases and only those. We show an example of a bibliographic tool that can be used to support virtually any use case in an agile manner. Here we focus on a...

  • Functional profiling of microarray experiments using text-mining derived bioentities. Pablo Minguez; Fátima Al-Shahrour; David Montaner; Joaquín Dopazo // Bioinformatics;Nov2007, Vol. 23 Issue 22, p3098 

    Motivation: The increasing use of microarray technologies brought about a parallel demand in methods for the functional interpretation of the results. Beyond the conventional functional annotations for genes, such as gene ontology, pathways, etc. other sources of information are still to be...

  • GeneE: Gene and protein query expansion with disambiguation. Schuemie, Martijn J.; Ning Kang; Hekkelman, Maarten L.; Kors, Jan A. // Bioinformatics;Jan2010, Vol. 26 Issue 1, p147 

    Summary: When referring to genes, authors often use synonyms instead of the official gene symbols. In order to accurately retrieve as many relevant documents as possible, we have developed GeneE, a web application that expands a gene query to include all known synonyms, and adds disambiguation...

