A Methodology for Noun Phrase-Based Automatic Indexing

Souza, Renato Rocha; Raghavan, K. S.
January 2006
Knowledge Organization;2006, Vol. 33 Issue 1, p45
Academic Journal
The scholarly community increasingly uses the Web both to publish scholarly output and to locate and access relevant scholarly literature. Organizing this vast body of digital information therefore becomes significant. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical, so new approaches are worth exploring. One approach under discussion considers the intrinsic semantics of document texts. Based on the hypothesis that noun phrases in a text are semantically rich in their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases, instead of single keywords, and use them as descriptors. This paper presents a methodology developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.
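
The idea of using noun phrases rather than single keywords as descriptors can be illustrated with a minimal sketch. This is not the authors' methodology, which this record does not describe in detail: the hand-tagged English sample, the tag names, and the functions `extract_noun_phrases` and `rank_descriptors` are all assumptions made for illustration. The sketch treats every maximal run of adjectives and nouns that contains at least one noun as a candidate descriptor.

```python
from collections import Counter

# Hypothetical hand-tagged input: (token, coarse part-of-speech tag).
# A real system would obtain these tags from a part-of-speech tagger.
TAGGED = [
    ("the", "DET"), ("automatic", "ADJ"), ("indexing", "NOUN"),
    ("of", "ADP"), ("digital", "ADJ"), ("documents", "NOUN"),
    ("requires", "VERB"), ("automatic", "ADJ"), ("indexing", "NOUN"),
    ("tools", "NOUN"), (".", "PUNCT"),
]


def extract_noun_phrases(tagged):
    """Return maximal runs of ADJ/NOUN tokens that contain at least one NOUN."""
    phrases, run = [], []
    # The sentinel entry forces the final run to be flushed.
    for token, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((token, tag))
            continue
        if any(t == "NOUN" for _, t in run):
            phrases.append(" ".join(tok.lower() for tok, _ in run))
        run = []
    return phrases


def rank_descriptors(tagged):
    """Count candidate phrases so that frequent ones can serve as descriptors."""
    return Counter(extract_noun_phrases(tagged)).most_common()


if __name__ == "__main__":
    for phrase, count in rank_descriptors(TAGGED):
        print(count, phrase)
```

A run-based pattern like this is deliberately crude. Portuguese noun phrases commonly place adjectives after the noun and link nouns with prepositions, so a realistic extractor would need a richer grammar than the one assumed here.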


Related Articles

  • information management.  // Bloomsbury Business Library - Business & Management Dictionary;2007, p3889 

    A definition of the term information management is presented. It refers to the acquisition, recording, storage, dissemination, and retrieval of information.

  • The Use and Construction of Thesauri for Legal Documentation. Broughton, Vanda // Legal Information Management;Spring2010, Vol. 10 Issue 1, p35 

    Vanda Broughton describes the methodology of constructing a thesaurus from a faceted classification for law (Bliss Bibliographic Classification 2nd ed. Class S). The structure of the classification is described, and the way in which the thesaural relationships are derived from this is...

  • Automatic Cloud Testing System for Computing Services on OpenStack. Songwen Pei; Xiaodong Wu // International Journal of Advancements in Computing Technology;Mar2013, Vol. 5 Issue 5, p1163 

    Software and services built on cloud computing infrastructure, which are in substance distributed computing, continue to grow in number. Therefore, an automatic cloud testing system with distributed executions called AutoCloudTesting is proposed, which is mainly deployed on OpenStack cloud...

  • Automatic Categorization: How It Works, Related Issues, and Impacts on Records Management. Lubbes, R. Kirk // Information Management Journal;Oct2001, Vol. 35 Issue 4, p38 

    Focuses on automatic categorization, the automatic assigning of an object to a pre-existing subject heading in a file plan or assigning it to a given class within the taxonomy. Definition of terms related to automatic categorization; Strengths and potential limitations of automatic...

  • Relações conceituais como ponto de inflexão entre linguagens documentais, terminologia e ontologias. MOREIRA, Walter // Scire;jul-dic2012, Vol. 18 Issue 2, p123 

    A reflection is offered on the complementary relations among documentary languages, terminology, and ontologies, in regard to the study of conceptual relations used to organize knowledge. Building on selected texts from the international literature, the relations between documentary languages...

  • Terminology and the construction of ontology. Gillam, Lee; Tariq, Mariam; Ahmad, Khurshid // Terminology;2005, Vol. 11 Issue 1, p55 

    This paper discusses a method for corpus-driven ontology design: extracting conceptual hierarchies from arbitrary domain-specific collections of texts. These hierarchies can form the basis for a concept-oriented (onomasiological) terminology collection, and hence may be used as the basis for...

  • Noun phrases in interactive query expansion and document ranking. Vechtomova, Olga // Information Retrieval Journal;Sep2006, Vol. 9 Issue 4, p399 

    The paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback and a new phrase-based document ranking method. A combined syntactico-statistical method was used for the selection of phrases for query expansion. Several...

  • SIGNIFICANCE OF HTML TAGS FOR DOCUMENT INDEXING AND RETRIEVAL. Hyusein, Byurhan; Patel, Ahmed // Proceedings of the IADIS International Conference on WWW/Interne;Jan2003, p817 

    Indexing quality has an overwhelming effect on retrieval effectiveness of search engines. In the past few years it has become one of the major challenges in the search engines area, particularly the task of automatically assigning high-quality terms to Web documents, which remains elusive. High...

  • Example-based text categorization (EBTC): the key to automatic indexing and classification? Xue Chunxiang; Hou Hanqing // Indexer;Sep2009, Vol. 27 Issue 3, p117 

    The goal of text categorization is the automatic classification of documents into predefined categories. In this article Xue and Hou discuss the traditional, probability-theory-based method, using algorithms such as K-nearest neighbor (KNN), naïve Bayes, and support vector machine (SVM) and...

