The importance of having data-sets

Dekker, Ronald
January 2006
IATUL Annual Conference Proceedings;2006, Vol. 16, p89
Conference Proceeding
Much scientific research is based on the gathering and analysis of measurement data. Scientific data-sets are, at the least, intermediate results in many research projects. For some time data-sets were not published at all, and when they were, it was mostly as a by-product of the publication that could not be re-used. An interesting phenomenon can now be observed, however: data-sets (often in combination with models and parameters) are becoming more important in themselves and can sometimes be seen as the primary intellectual output of the research. Publishing and preserving data-sets should therefore be seriously considered. This is especially the case if the data cannot be reproduced (because they result from unique events) and will be needed in the future for longitudinal research or to test or check future insights.

The rationale behind the importance of data-sets can be summarized as follows:

  • Verification of publications (results = analysis + data)
  • Longitudinal research (long periods, meta-research)
  • Interdisciplinary use of data (reuse/innovation)
  • Valorisation (obtaining new projects based on data-set ownership)

The increasing importance of data-sets may lead to their emancipation into an essential component of an institution's scientific infrastructure. Networks of institutional repositories might form a basic building block for such an infrastructure.

In this paper three essential processes with respect to data-sets are explained:

  - capturing data: collaborating with researchers during the data gathering and analysis phase of their research,
  - publishing data: facilitating the publication of research data on its own or related to a publication,
  - preserving data: taking care of the long-term preservation of digital data-sets.

In this paper the case of the DARELUX (Data Archiving River Environment LUXembourg) project is used to relate theory to practice.
In the DARELUX project the preservation of hydrology data-sets using XML containers is being investigated and implemented. The OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is being used to provide resource discovery. An archive like the DARELUX repository can only exist if it keeps its users top of mind. We adopted the CCSDS term ‘designated community’ to explore how we can meet our users' needs. From this notion stems our conviction that an archive like the DARELUX repository should be an integral part of the research community.
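
As an illustration of how OAI-PMH supports resource discovery, the following is a minimal Python sketch of a harvester building a ListRecords request and reading Dublin Core titles from the response. It is not taken from the DARELUX project: the base URL, the record identifier, the title, and the function names are hypothetical, and the sample response is hand-written rather than harvested from a real repository. Only the verb, the `metadataPrefix` and `set` arguments, and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract the identifier and Dublin Core titles of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", OAI_NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        titles = [t.text for t in rec.findall(".//dc:title", OAI_NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Hand-written sample response; identifier and title are invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical river discharge series</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", set_spec="hydrology"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["titles"])
```

A real harvester would also need to fetch the URL over HTTP and follow the `resumptionToken` that OAI-PMH uses to page through long result lists; both are left out here to keep the sketch short.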


Related Articles

  • Evaluating a Format for Viable, Long-Term Dynamic Data Archiving. Phillips, Allyn W.; Allemang, Randall J. // Sound & Vibration;Jun2012, Vol. 46 Issue 6, p14 

    This article reviews the various (and sometimes conflicting) issues involved in the development of a new flexible, open-definition file format. To help facilitate early evaluation of format function and performance, an intermediate pseudo-prototypical hybrid UFF-DSA (dynamic signal archive)...

  • Data integration with the Climate Science Modelling Language. Woolf, A.; Lawrence, B.; Lowry, R.; Van Dam, K. Kleese; Cramer, R.; Gutierrez, M.; Kondapalli, S.; Latham, S.; Lowe, D.; O'Neill, K.; Stephens, A. // Advances in Geosciences;2006, Vol. 8, p83 

    The Climate Science Modelling Language (CSML) has been developed by the NERC DataGrid (NDG) project as a standards-based data model and XML markup for describing and constructing climate science datasets. It uses conceptual models from emerging standards in GIS to define a number of feature...

  • Process Models and the Development of TRUSTWORTHY DIGITAL REPOSITORIES. Dale, Robin L.; Gore, Emily B. // Information Standards Quarterly;Spring2010, Vol. 22 Issue 2, p14 

    The article discusses the process models and the development of trustworthy digital repositories. It gives an overview of process models for preservation including Open Archival Information System (OAIS), InterPARES, and the DCC Curation Lifecycle Model, and the relationship of the process...

  • Selection Process for a Digital Theatre Archive: OnStage at IPFW. Buhr, Denise // Indiana Libraries;2008, Vol. 27 Issue 3, p67 

    The article discusses the selection process for a digital theatre archive. It notes from the Indiana University-Purdue University Fort Wayne (IPFW) digital project that began working on using mDON, mastodon DIGITAL OBJECT NETWORK, a digital archive that provides worldwide access and...

  • UK Government commits to finding a solution to preserving its digital information.  // IM@T.Online;Jun2007, p1 

    The article reports on the plan of the British government to address issues concerning the need for digital preservation in Great Britain. The government's plan of digital preservation of all government documents was in response to the rapid pace of technological change. It was reported that the...

  • A Materialized Approach to the Integration of XML Documents: the OSIX System. Ahmad, H.; Kermanshahani, S.; Simonet, A.; Simonet, M. // Proceedings of World Academy of Science: Engineering & Technolog;Apr2009, Vol. 52, p210 

    The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may...

  • Relevance feedback revisited: dealing with content and structure in XML documents. Hlaoua, Lobna; Pinel-Sauvagnat, Karen; Boughanem, Mohand // International Journal on Digital Libraries;Apr2010, Vol. 11 Issue 1, p1 

    Relevance feedback (RF) is a technique that enriches an initial query according to user feedback. The goal is to express the user's needs more precisely. Some open issues arise when considering semi-structured documents like XML documents. They are mainly related to the form of XML...

  • Towards a Graphical Querying Language for XML. Amous, I.; Jedidi, A.; Gargouri, F.; Sèdes, F. // International Review on Computers & Software;Mar2007, Vol. 2 Issue 2, p139 

    In this paper, we propose our approach to graphical querying based on spatio-temporal metadata. This proposition improves information retrieval and presents documents suited to the user's needs. We propose the extension of the XQuery language by new operators integrated in its initial...


    The paper presents a cross-technology approach to network management that combines databases, XML, web services, networking and data mining in a snowball-like manner that once launched increases at every moment the value it provides to the decision makers.

