TITLE

Querying and Ranking XML Documents Based on Data Synopses

AUTHOR(S)
Weimin He; Teng Lv
PUB. DATE
October 2011
SOURCE
Journal of Digital Information Management;Oct2011, Vol. 9 Issue 5, p199
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
Interest in querying and ranking XML documents has grown in recent years. In this paper, we present a new framework for querying and ranking schema-less XML documents based on concise summaries of their structural and textual content. We introduce a novel data synopsis structure to summarize the textual content of an XML document for efficient indexing. More importantly, we extend the traditional vector space model to effectively rank XML documents over the proposed data synopses. We conduct extensive experiments over XML benchmark data to demonstrate the advantages of the indexing scheme and the effectiveness of our ranking scheme. We also compare our framework with Lucene to demonstrate that our extended TF*IDF scoring function is effective.
ACCESSION #
69916344

 

Related Articles

  • A data model for algorithmic multiple criteria decision analysis. Cailloux, Olivier; Tervonen, Tommi; Verhaegen, Boris; Picalausa, François // Annals of Operations Research;Jun2014, Vol. 217 Issue 1, p77 

    Various software tools implementing multiple criteria decision analysis (MCDA) methods have appeared over the last decades. Although MCDA methods share common features, most of the implementing software has been developed independently from scratch. The majority of the tools have a proprietary...

  • An Enhanced Way of Labelling Nodes in Dynamic XML. Paramasivam, Jayanthi; Angamuthu, Tamilarasi // European Journal of Scientific Research;7/1/2011, Vol. 55 Issue 3, p348 

    In this era, XML is used as a standard in various businesses, research, etc. It is necessary to manipulate data and evaluate queries over the data in the XML document. A number of schemes are used for this purpose. Labelling is one such process in which the nodes of the XML documents...

  • Path Query Processing in Large-Scale XML Databases. Su-Cheng Haw; Rao, G. S. V. Radha Krishna // Journal of Applied Sciences;2007, Vol. 7 Issue 19, p2736 

    With the ever-increasing popularity of XML (Extensible Markup Language) for data representation and exchange on the Internet, querying XML data has become an important issue to be addressed. In a Native XML Database (NXD), XML documents are usually modeled as trees and XML queries are typically...

  • INDEXING AND QUERYING CONTENT AND STRUCTURE OF XML DOCUMENTS ACCORDING TO THE VECTOR SPACE MODEL. Le Maitre, Jacques // Proceedings of the IADIS International Conference on WWW/Interne;Nov2005, p353 

    This paper presents a method to index and query content and structure of XML documents according to the vector space model. Indexing is performed in three steps: (i) choosing content elements i.e. those which refer to the semantic content of the documents, (ii) associating a vector to each...

  • INDEXING AND QUERYING CONTENT AND STRUCTURE OF XML DOCUMENTS ACCORDING TO THE VECTOR SPACE MODEL. Le Maitre, Jacques // Proceedings of the IADIS International Conference on WWW/Interne;Jan2005, p353 

    This paper presents a method to index and query content and structure of XML documents according to the vector space model. Indexing is performed in three steps: (i) choosing content elements i.e. those which refer to the semantic content of the documents, (ii) associating a vector to each...

  • Temporal XML: modeling, indexing, and query processing. Flavio Rizzolo; Alejandro Vaisman // VLDB Journal International Journal on Very Large Data Bases;Aug2008, Vol. 17 Issue 5, p1179 

    In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by...

  • A Data Model and an XQuery Extension for Concurrent XML Structures. Bruno, Emmanuel; Murisasco, Elisabeth // Informatica (03505596);Jun2011, Vol. 35 Issue 2, p141 

    An XML document is mainly hierarchical, but some applications need to simultaneously associate more than one hierarchy with the same data. In general, concurrent hierarchies cannot be merged in order to get a well-formed XML document. This work stands in this context: it aims at describing and...

  • ON THE EFFICIENT IMPLEMENTATION OF CONTEXT KEYWORD-BASED QUERYING FOR XML DATA. Krátký, Michal; Bača, Radim // Proceedings of the IADIS International Conference on WWW/Interne;Jan2007, p187 

    The mark-up language XML (eXtensible Markup Language) has recently been embraced as a new approach to data modeling. Nowadays, more and more information is formatted as semi-structured data, e.g. articles in a digital library, documents on the web and so on. Since there are data storages...

  • TwigX-Guide: An Efficient Twig Pattern Matching System Extending DataGuide Indexing and Region Encoding Labeling. SU-CHENG HAW; CHIEN-SING LEE // Journal of Information Science & Engineering;Mar2009, Vol. 25 Issue 2, p603 

    With the rapid emergence of XML as an enabler for data exchange and data transfer over the Web, querying XML data has become a major concern. In this paper, we present a hybrid system, TwigX-Guide, an extension of the well-known DataGuide index and region encoding labeling to support twig query...

  • MEXIR: An Implementation of High Performance and High Precision on XML Retrieval. Wichaiwong, Tanakorn; Jaruskulchai, Chuleerat // Computer Technology & Application;2011, Vol. 2 Issue 3, p301 

    Traditional information retrieval systems respond to user queries with ranked lists of relevant documents. Since XML (Extensible Markup Language) documents separate content and structure, XML-IR (information retrieval) systems are able to retrieve only the relevant portions of documents....
