Where do the authors come from?

Biryukov, Maria
August 2009
Journal of Digital Information Management;Aug2009, Vol. 7 Issue 4, p211
Academic Journal
The continuing growth of scientific publications has made bibliographic databases and digital libraries widespread. At the same time, they are an object of research in their own right. In this paper we address the question "where do the authors come from?" via language identification of the author names. This is a two-step process: a primary classification based on statistical models of languages, followed by a refinement of that classification through analysis of the co-author network built from the bibliographic records. The system for automatic language identification presented here handles 14 different languages and requires no dictionary of names for training. The statistical models are built from general-purpose corpora for all Western European languages as well as Chinese, Japanese and Turkish. The system is fine-tuned to achieve precision and recall above 90% for many languages, and performs better than some other systems aimed at language identification of personal names. Tests on the DBLP data set have shown that extending the language model with the co-author network helps to improve classification results, especially in cases of closely related languages and mixed names. They have also demonstrated the usability of the system in applications such as data cleaning and trend detection.
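The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the character trigram model with add-one smoothing, the `margin` parameter, the co-author voting rule, and the tiny word lists are all assumptions chosen only to show the shape of the idea (a statistical model trained on general-purpose text, followed by refinement from co-author labels).

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, boundary-padded string."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramModel:
    """Character n-gram model with add-one smoothing (illustrative assumption)."""

    def __init__(self, corpus_words, n=3):
        self.n = n
        self.counts = Counter()
        for word in corpus_words:
            self.counts.update(char_ngrams(word, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, name):
        """Sum of smoothed log-probabilities of the name's n-grams."""
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom)
                   for g in char_ngrams(name, self.n))


def classify(name, models):
    """Step 1: score a name under every language model, best candidate first."""
    scores = {lang: m.log_prob(name) for lang, m in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def refine(ranked, coauthor_langs, margin=1.0):
    """Step 2: when several candidates score within `margin` of the best,
    let the co-authors' languages decide among those close candidates."""
    best_lang, best_score = ranked[0]
    close = [lang for lang, score in ranked if best_score - score <= margin]
    votes = Counter(lang for lang in coauthor_langs if lang in close)
    return votes.most_common(1)[0][0] if votes else best_lang


# Toy general-purpose word lists (hypothetical stand-ins for real corpora).
MODELS = {
    "de": NgramModel(["schule", "schreiben", "zwischen", "deutsch"]),
    "it": NgramModel(["ragazzo", "giorno", "grazie", "questo"]),
}

if __name__ == "__main__":
    ranked = classify("Schulz", MODELS)
    print(ranked[0][0], refine(ranked, coauthor_langs=["de", "it"]))
```

Restricting the co-author vote to candidates that the statistical model already considers plausible keeps the network from overriding a confident first-step decision, which matches the abstract's remark that the refinement helps mostly for closely related languages and mixed names.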


Related Articles

  • New MicroLIF Protocol Approved for General Use.  // Information Today;Sep1992, Vol. 9 Issue 8, p38 

    The article states that the MicroLIF (Microcomputer Library Interchange Format) community has approved a new MicroLIF Protocol for general use by participating members. The new protocol resolves all differences between the USMARC format for bibliographic records and the original MicroLIF format. By...

  • Advent of Digital Libraries and Measuring Their Performance: A Review. Mathur, Mira // DESIDOC Bulletin of Information Technology;Mar2005, Vol. 25 Issue 2, p19 

    Developments in the field of information technology, like increased capacities of digital storage media, growth of the world wide web and access to internet, sophisticated search engines, fast processing power and reduced computer costs have clinched the case for digital libraries. On the basis...

  • Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function. Fitzgibbons, Megan; Meert, Deborah // Journal of Academic Librarianship;Mar2010, Vol. 36 Issue 2, p144

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability,...

  • Standard Common Command Language Revised.  // Information Today;May1989, Vol. 6 Issue 5, p18 

    This article reports on a revised draft of the standard for a common command language for bibliographic services being reviewed and balloted by the Voting Members of the National Information Standards Organization as of May 1989. The current revision accommodates previous negative votes and...

  • Help Mechanism Systems Offered in IR Interface of E-journal Database Systems: A Users' Perception Related to Help Mechanism in their Information Seeking Tasks. Rai, Namrata; Kumar, Shailendra // Library Philosophy & Practice;Jan2014, p1 

    The aim of the study is to know about the various help features offered in IR Interface of E-journal database systems and also taking feedback from the users regarding use and non-use of those help features in their search related activities. Qualitative feedback also taken from the respondents...

  • Wilsonline Databases Join EasyNet Gateway.  // Information Today;Nov1986, Vol. 3 Issue 10, p2 

    This article reports on the availability of Wilsonline, an online retrieval system from H. W. Wilson Co., through Telebase's EasyNet gateway system in November 1986. EasyNet subscribers are now able to use their existing terminals and software to search any of the 23 Wilsonline databases. These...

  • Expanded 3rd Edition of Polar Pac from WLN.  // Information Today;Jan1994, Vol. 11 Issue 1, p17 

    The article states that Polar Pac, Western Library Network (WLN) CD-ROM database of international polar regions bibliographic information, is now available in a greatly expanded 3rd edition. Polar Pac3 contains 194,325 full bibliographic records and 333,698 call numbers representing the...

  • "The Web of Our Life is of a Mingled Yarn": The Canadian Adaptations of Shakespeare Project, Humanities Scholarship, and ColdFusion. Fischlin, Daniel; Hadfield, Dorothy; Lester, Gordon; McCutcheon, Mark A. // College Literature;Winter2009, Vol. 36 Issue 1, p77 

    This essay presents an overview of some of the challenges related to publishing the findings of a large-scale research project on Shakespeare—specifically the Canadian Adaptations of Shakespeare Project or CASP; www.canadianshakespeares.ca—on the World Wide Web. Our purpose in...

  • Identifying Useful Terms to Retrieve Survival Data Meta-Analyses Publications for Bibliographic Databases Search Strategies. Leucuţa, Daniel Corneliu; Achimas Cadariu, Andrei // Applied Medical Informatics;Nov2009, Vol. 25 Issue 3/4, p21 

    Introduction: Quality research and quality evidence based medicine practice has an important pillar in a solid bibliographic documentation. Quality bibliographic documentation makes use of search strategies to retrieve articles from search engines of bibliographic databases. The AIM of this...

