txttool: Utilities for text analysis in Stata

Williams, Unislawa; Williams, Sean P.
October 2014
Stata Journal;2014, Vol. 14 Issue 4, p817
Academic Journal
This article describes txttool, a command that provides a set of tools for managing free-form text. The command integrates several built-in Stata functions with new text capabilities. These new capabilities include a utility to create a bag-of-words representation of text and an implementation of Porter's (1980, Program: Electronic Library and Information Systems 14: 130-137) word-stemming algorithm. Collectively, these utilities provide a text-processing suite for text mining and other text-based applications in Stata.
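To make "bag-of-words representation" concrete, here is a minimal Python sketch of the general idea. It is not txttool's Stata syntax or its implementation. It assumes a simple tokenizer (lowercase, then keep runs of letters and digits) and no stemming or stop-word removal. Each document becomes one row of word counts, with one column per distinct word in the corpus.

```python
import re
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, matrix): one row per document, one column per word."""
    # Assumed tokenization: lowercase the text, then keep runs of letters/digits.
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in docs]
    # Vocabulary is every distinct token across all documents, sorted for stable columns.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for words absent from this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["Text mining in Stata", "Stata text, more text"]
vocab, counts = bag_of_words(docs)
print(vocab)   # ['in', 'mining', 'more', 'stata', 'text']
print(counts)  # [[1, 1, 0, 1, 1], [0, 0, 1, 1, 2]]
```

Word order is discarded and only frequencies remain. A stemmer such as Porter's would run before counting, so that inflected variants of a word (for example, "mining" and "mined") map to the same column.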


Related Articles

  • Knowledge Discovery in the Blogosphere: Approaches and Challenges. Lakshmanan, Geetika T.; Oberhofer, Martin A. // IEEE Internet Computing;Mar/Apr2010, Vol. 14 Issue 2, p24 

    Knowledge discovery in blogs is different from knowledge discovery in areas such as databases or Web documents due to blogs' unique characteristics, which introduce additional mining challenges. Although researchers have investigated several techniques to address different aspects of blog...

  • A vector-space dynamic feature for phrase-based statistical machine translation. Costa-jussà, Marta; Banchs, Rafael // Journal of Intelligent Information Systems;Oct2011, Vol. 37 Issue 2, p139 

    In this paper, we propose and evaluate a novel dynamic feature function for log-linear model combinations in phrase-based statistical machine translation. The feature function is inspired by the well-known vector-space model, which is typically used in information retrieval and text mining...

  • Scientific Documents Clustering Based on Text Summarization. Amoli, Pedram Vahdani; Sojoodi Sh., Omid // International Journal of Electrical & Computer Engineering (2088;Aug2015, Vol. 5 Issue 4, p782 

    In this paper a novel method is proposed for scientific document clustering. The proposed method is a summarization-based hybrid algorithm which comprises a preprocessing phase. In the summarization phase unimportant words which are not frequently used in the document are removed. This process...

  • A Web Text Mining Flexible Architecture. Castellano, M.; Mastronardi, G.; Aprile, A.; Tarricone, G. // Proceedings of World Academy of Science: Engineering & Technolog;Dec2007, Vol. 36, p78 

    Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web...

  • Classification of Documents using Effective Pattern Taxonomy. Uday Kiran, Mallareddy; Ravikanth, R. // International Journal of Computer Applications;Jan2014, Vol. 86, p19 

    Text mining is a technique that helps users extract useful information from the large amount of data available digitally on the web or as text. A Pattern Taxonomy-based model containing sequential patterns is used to perform the task. The EPT (Effective Pattern Taxonomy) method helps in extracting useful...

  • Classifying lifestyle content.  // KM World;Jan2009, Vol. 18 Issue 1, p20 

    The article reports on the selection of a text-mining solution from Nstein Technologies Inc., by Scripps Networks, a company that owns and creates content for five home, food and lifestyle cable networks and Internet sites. The solution is aimed to semantically analyze its immense library of...

  • Text Mining for Meeting Transcript Analysis to Extract Key Decision Elements. Chibelushi, Caroline; Thelwall, Mike // International MultiConference of Engineers & Computer Scientists;2009, p710 

    The frequent but unfortunate need to rework software development projects may often be caused by inappropriate decision making. The first step in addressing this issue is to explore decision making processes and to extract the tangible elements of decision making within meetings. This paper...

  • Webcrawling clustering en espacio multidimensional basado en distancia y su aplicación a Opinion Mining [Web-crawling clustering in multidimensional space based on distance and its application to opinion mining]. Gorbatik, Ezequiel; Barrera, Hugo O.; Schneider Loaiza, E.; Riaño Santiesteban, Fabián; Gindre, Francisco; López De Luise, M. Daniela // Revista de Ciencia y Tecnología;2012, Vol. 12, p7 

    Multimedia consumption and the revolution caused by the Web 2.0 phenomenon, where information consumers are also producers, are clearly reflected in a paradigm shift in modern communication. Due to this change, traditional survey tools such as focus groups, phone surveys and quizzes are now...

  • Text Mining and Processing for Corpora Creation in Slovak Language. Hládek, Daniel; Staš, Ján // Journal of Computer Science & Control Systems;2010, Vol. 3 Issue 1, p65 

    Language modeling for creation of the system for spontaneous speech recognition requires a large amount of text. This paper proposes a method to provide a training corpus for creating a language model. Text is gathered, segmented into sentences, numerals, special symbols are rewritten and...

