TITLE

Word-Sequence Kernels

AUTHOR(S)
Cancedda, Nicola; Gaussier, Eric; Goutte, Cyril; Renders, Jean-Michel
PUB. DATE
August 2003
SOURCE
Journal of Machine Learning Research;8/15/2003, Vol. 3 Issue 6, p1059
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVMs using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has several advantages: in particular, it is more efficient computationally and it ties in closely with standard linguistic pre-processing techniques. We present some extensions to sequence kernels dealing with symbol-dependent and match-dependent decay factors, and present empirical evaluations of these extensions on the Reuters-21578 datasets.
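
The idea in the abstract, matching non-consecutive subsequences of words with a penalty for gaps, can be illustrated with a minimal sketch. It assumes the standard gap-weighted subsequence kernel recursion in the style of Lodhi et al. (2002), applied to word tokens, with a single decay factor. The function names `word_sequence_kernel` and `normalized_kernel`, the parameter names `n` and `lam`, and whitespace tokenisation are illustrative assumptions, not taken from the article. The symbol-dependent and match-dependent decay extensions are not covered.

```python
import math


def word_sequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n between token lists s and t.

    Each common subsequence of n words contributes lam ** (span in s + span in t),
    so matches stretched over gaps count less than contiguous ones.
    Assumption: one global decay factor lam with 0 < lam <= 1.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds the auxiliary value K'_i for prefixes s[:a] and t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(1, ls + 1):
            kpp = 0.0  # running K''_i value for s[:a] and t[:b]
            for b in range(1, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Close each subsequence on its last matching word pair.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n, lam):
    """Length-normalised kernel: K(s, t) / sqrt(K(s, s) * K(t, t))."""
    denom = math.sqrt(word_sequence_kernel(s, s, n, lam) * word_sequence_kernel(t, t, n, lam))
    return word_sequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


# Hypothetical toy documents, tokenised on whitespace.
doc_a = "kernel very methods".split()
doc_b = "kernel methods".split()
similarity = word_sequence_kernel(doc_a, doc_b, n=2, lam=0.5)
```

In this toy case the shared pair "kernel methods" spans three words in the first document and two in the second, so its contribution is the decay factor raised to the fifth power. Working on words rather than characters shortens the sequences, which is one reason the word-level variant is cheaper to compute.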
ACCESSION #
11468754

 

Related Articles

  • A formal study of feature selection in text categorization. Xu Yan // Journal of Communication & Computer;Apr2009, Vol. 6 Issue 4, p32 

    One of the most important issues in Text Categorization (TC) is Feature Selection (FS). Many FS methods have been put forward and widely used in TC field, such as Information Gain (IG), Document Frequency thresholding (DF) and Mutual Information. Empirical studies show that some of these (e.g....

  • The Influence of preprocessing parameters on text categorization. Pomikálek, Jan; Řehůřek, Radim // Enformatika;2007, Vol. 19, p430 

    Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in...

  • Automatic Detection of Antisocial Behaviour in Texts. Munezero, Myriam; Suero Montero, Calkin; Kakkonen, Tuomo; Sutinen, Erkki; Mozgovoy, Maxim; Klyuev, Vitaly // Informatica (03505596);Dec2014, Vol. 38 Issue 4, p3 

    A considerable amount of effort has been made to reduce the physical manifestation of antisocial behaviour (ASB) in communities. However, the key to the early detection of ASB is, in many cases, in observing its manifestations in written language, which has not been studied in detail. In this...

  • Automatic Detection of Antisocial Behaviour in Texts. Munezero, Myriam; Montero, Calkin Suero; Kakkonen, Tuomo; Sutinen, Erkki; Mozgovoy, Maxim; Klyuev, Vitaly // Informatica (03505596);2014, Vol. 38 Issue 1, p3 

    A considerable amount of effort has been made to reduce the physical manifestation of antisocial behaviour (ASB) in communities. However, the key to the early detection of ASB is, in many cases, in observing its manifestations in written language, which has not been studied in detail. In this...

  • An Improving Deception Detection Method in Computer-Mediated Communication. Hu Zhang; Zhuohua Fan; Jiaheng Zheng; Quanming Liu // Journal of Networks;Nov2012, Vol. 7 Issue 11, p1811 

    Online deception is disrupting our daily life, organizational process, and even national security. Existing deception detection approaches followed a traditional paradigm by using a set of cues as antecedents, and used a variety of data sets and common classification models to detect deception,...

  • Machine Learning based approach for Human Trait Identification from Blog Data. Saxena, Saurabh; Sharma, Chandra Mani // International Journal of Computer Applications;6/15/2012, Vol. 48, p17 

    Emotions form a major part of a person's personality. Emotional intelligence (EI) is the ability to identify, assess, and control the emotions of oneself, of others, and of groups. The written expressions reflect author's personality. Various personality traits can be determined by the analysis...

  • A Heuristic Feature Selection Approach for Text Categorization by Using Chaos Optimization and Genetic Algorithm. Hao Chen; Wen Jiang; Canbing Li; Rui Li // Mathematical Problems in Engineering;2013, p1 

    Due to the era of Big Data and the rapid growth in textual data, text classification becomes one of the key techniques for handling and organizing the text data. Feature selection is the most important step in automatic text categorization. In order to choose a subset of available features by...

  • An Approach for Sentiment Tendency Analysis on Comment Text. Min LI; Mengdong CHEN; Xiangbin LI // Advanced Materials Research;7/24/2014, Vol. 989-994, p1913 

    With the rapid development of network, texts which contain position, views and opinions of events are exploding. Texts of review contain author's feelings, views and tendencies the author wants to express. People need to analyze these texts automatically to acquire sentiment tendency of the...

  • Using machine learning for concept extraction on clinical documents from multiple data sources. Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang // Journal of the American Medical Informatics Association;Sep2011, Vol. 18 Issue 5, p580 

    Objective Concept extraction is a process to identify phrases referring to concepts of interests in unstructured text. It is a critical component in automated text processing. We investigate the performance of machine learning taggers for clinical concept extraction, particularly the portability...
