Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Ekbal, Asif; Bandyopadhyay, Sivaji
June 2008
International Journal of Computer Systems Science & Engineering;2008, Vol. 4 Issue 2, p155
Academic Journal
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now considered fundamental to many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, and question answering. This paper reports the development of a NER system for Bengali and Hindi using a Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system uses the contextual information of the words along with a variety of features that help predict the four named entity (NE) classes: Person name, Location name, Organization name, and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi, tagged with the twelve NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of a Bengali news corpus developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. These lexical patterns have been used as SVM features to improve system performance. The NER system has been tested on gold standard test sets of 35K tokens for Bengali and 60K tokens for Hindi. Evaluation results show recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. The use of context patterns improves the f-score by 5.13%. A statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
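
The reported f-scores can be checked against the reported precision and recall. The sketch below assumes the f-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is hypothetical and chosen only for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall percentages as reported in the abstract.
reported = {
    "Bengali": {"recall": 88.61, "precision": 80.12},
    "Hindi": {"recall": 80.23, "precision": 74.34},
}

for language, scores in reported.items():
    value = f_score(scores["precision"], scores["recall"])
    print(f"{language}: f-score = {value:.2f}%")
# prints:
# Bengali: f-score = 84.15%
# Hindi: f-score = 77.17%
```

Under this assumption, both computed values match the f-scores given in the abstract (84.15% and 77.17%).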


Related Articles

  • PowerShell 101 LESSON 6. Sheldon, Robert // Windows IT Pro;Jul2008, Vol. 14 Issue 7, p43 

    The article discusses the type of drives that Microsoft Windows PowerShell command line shell and scripting language supports. Information is presented on how to implement the available drives through PowerShell providers that facilitate access to data stores. It describes the way of working...

  • Application of advanced programming concepts in metamodelling. Berg, Henning; Møller-Pedersen, Birger; Krogdahl, Stein // Norsk Informatikkonferanse;2011, Issue 21-23, p207

    Programming languages provide users with a rich palette of advanced mechanisms. Languages for metamodelling, on the other hand, usually provide simple mechanisms for making class models, i.e. classes with relations like specialisation, composition and associations. A metamodel defines the syntax...

  • Enhancing Functional and Irregular Parallelism: Stateful Functions and their Semantics. Attali, Isabelle; Caromel, Denis; Chen, Yung-Syau; Gaudiot, Jean-Luc; Wendelborn, Andrew L. // International Journal of Parallel Programming;Aug2001, Vol. 29 Issue 4, p433 

    We describe an approach in which stateful computations can be expressed within the framework of a functional language. We consider algorithms with nondeterministic intermediate results and a deterministic final result which is obtained for any series of intermediate values of some variable...

  • Derived types in semantic association discovery. Jämsen, Janne; Niemi, Timo; Järvelin, Kalervo // Journal of Intelligent Information Systems;Oct2010, Vol. 35 Issue 2, p213

    Semantic associations are direct or indirect linkages between two entities that are construed from existing associations among entities. In this paper we extend our previous query language approach for discovering semantic associations with an ability to retrieve semantic associations that,...

  • CACTUS. Hyde, Hartley // Australian Mathematics Teacher;Aug2006, Vol. 62 Issue 3, p6 

    The article illustrates the use of APL, or A Programming Language, which was developed by K. E. Iverson in 1962 and was one of the first interactive computing languages. It is often remembered for its use of an extended character set that was provided by an APL golf ball. APLX version 3.1 has...

  • Regular autodense languages. Chen-Ming Fan; Huang, C. C.; Shyr, H. J. // Acta Informatica;Dec2008, Vol. 45 Issue 7/8, p467 

    A regular component is either autodense or anti-autodense. Characterizations of a regular component being a pure autodense language and being a pure autodense code are obtained. A relationship between intercodes and anti-autodense languages is that for an intercode L of index m, L n is an...

  • Data-abstraction refinement: a game semantic approach. Bakewell, Adam; Dimovski, Aleksandar; Ghica, Dan; Lazic, Ranko // International Journal on Software Tools for Technology Transfer;Sep2010, Vol. 12 Issue 5, p373 

    This paper presents a semantic framework for data abstraction and refinement for verifying safety properties of open programs with integer types. The presentation is focused on an Algol-like programming language that incorporates data abstraction in its type system. We use a fully abstract game...

  • Aspects of CXXR internals. Runnalls, Andrew // Computational Statistics;Sep2011, Vol. 26 Issue 3, p427 

    The CXXR project aims gradually to refactor the fundamental parts of the R interpreter from C into C++ whilst retaining the full functionality of the standard distribution of R. It is hoped that this will enable researchers more easily to enhance the functionality of R by allowing them to extend...

  • The Seven Sins of Perl OO Programming. Chromatic // Perl Review;Winter2005, Vol. 2 Issue 1, p10 

    The article presents information on several sins Perl OO coders occasionally commit. It states that some coders use the methods isa() and can() as if they were functions. Meanwhile, others copy and paste a test code in their test suite and then change it. The author says that there are good...

