Automatic symptom name normalization in clinical records of traditional Chinese medicine

Yaqiang Wang; Zhonghua Yu; Yongguang Jiang; Kaikuo Xu; Xia Chen
January 2010
BMC Bioinformatics;2010, Vol. 11, p40
Academic Journal
Background: In recent years, data mining technology has been applied more than ever before in the field of traditional Chinese medicine (TCM) to discover regularities in the experience accumulated over thousands of years in China. Electronic medical records (or clinical records) of TCM contain more information than the well-structured prescription data extracted manually from TCM literature, such as information about the medical treatment process, and could therefore be an important source for discovering valuable regularities of TCM. However, they are collected by TCM doctors on a day-to-day basis without the support of an authoritative editorial board, and because TCM doctors differ in experience and background, the same concept may be described by several different terms. Clinical records of TCM therefore cannot be used directly for data mining and knowledge discovery. This paper focuses on the phenomenon of "one symptom with different names" and investigates a series of metrics for automatically normalizing symptom names in clinical records of TCM.

Results: A series of extensive experiments was performed to validate the proposed metrics. They show that the hybrid similarity metrics, which integrate literal similarity and remedy-based similarity, are more accurate than metrics based on literal similarity or remedy-based similarity alone. The highest F-Measure of all the metrics (65.62%) is achieved by the hybrid similarity metric VSM+TFIDF+SWD.

Conclusions: Automatic symptom name normalization is an essential task for discovering knowledge from clinical data of TCM. This paper introduces the problem for the first time. The results verify that the investigated metrics are reasonable and accurate, and that the hybrid similarity metrics are much better than the metrics based on literal similarity or remedy-based similarity alone.
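
The abstract does not give the formulas behind the hybrid metrics, so the following Python sketch only illustrates the general idea of combining a literal (string-based) similarity with a remedy-based similarity. Everything specific in it is an assumption made for illustration, not the authors' method: character-bigram Jaccard overlap as the literal metric, cosine similarity over smoothed TF-IDF vectors of co-prescribed herbs as the remedy-based metric, the linear weight `alpha`, and the toy symptom and herb names.

```python
import math
from collections import Counter


def char_bigrams(name):
    """Set of character bigrams of a name (a one-character name is kept whole)."""
    return {name[i:i + 2] for i in range(len(name) - 1)} or {name}


def literal_similarity(a, b):
    """Jaccard overlap of character bigrams (assumed stand-in for a literal metric)."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def tfidf_vectors(remedies):
    """Map each symptom name to a sparse TF-IDF vector over its co-prescribed herbs.

    `remedies` maps a symptom name to the list of herbs that appear with it.
    The IDF is smoothed so that no weight is ever zero.
    """
    n = len(remedies)
    df = Counter(h for herbs in remedies.values() for h in set(herbs))
    vectors = {}
    for name, herbs in remedies.items():
        tf = Counter(herbs)
        vectors[name] = {
            h: count * (math.log((1 + n) / (1 + df[h])) + 1.0)
            for h, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(h, 0.0) for h, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def hybrid_similarity(a, b, vectors, alpha=0.5):
    """Linear mix of literal and remedy-based similarity (alpha is hypothetical)."""
    return alpha * literal_similarity(a, b) + (1 - alpha) * cosine(vectors[a], vectors[b])


# Toy data: two names for the same symptom share herbs, a third does not.
remedies = {
    "headache": ["herb_a", "herb_b", "herb_a"],
    "head pain": ["herb_a", "herb_b"],
    "cough": ["herb_c"],
}
vectors = tfidf_vectors(remedies)
print(hybrid_similarity("headache", "head pain", vectors))
print(hybrid_similarity("headache", "cough", vectors))
```

In this toy setup the pair "headache" / "head pain" scores higher than "headache" / "cough" on both components, so the hybrid score ranks it first. In practice, name pairs whose score exceeds a tuned threshold would be treated as candidates for the same normalized symptom.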


Related Articles

  • Two more states try to block data mining.  // Medical Economics;8/17/2007, Vol. 84 Issue 16, p22 

    The article reports that two states in New England are intensifying the battle to protect residents' privacy. The states of Maine and Vermont have implemented new laws which set limits on the sale of prescription drug information for marketing purposes. The laws ban the sale of data that...

  • Extracting Medical Records with Hierarchical Information Extraction Method. Wenhao Zhu; Chaoyou Ju; Wei Xu; Jiaoxiong Xia; Li Fu // Information Technology Journal;2013, Vol. 12 Issue 18, p4441 

    Traditional Chinese Medicine (TCM) has a very long history in China. As a part of Chinese culture heritage, clinical TCM records were preserved in TCM books. With the rapid development of digitization movement, a lot of these books are being digitized and it will be very useful if the medical...

  • Report Calls for Standardized HIT Language.  // For the Record (Great Valley Publishing Company, Inc.);1/31/2011, Vol. 23 Issue 2, p6 

    The article reports on the need to adopt a robust information-sharing infrastructure to facilitate the exchange of data among institutions to achieve the full potential of healthcare information technology (HIT) in the U.S. It mentions the proposal to adopt a universal exchange language that...

  • Integration: Linking Infection Control and EMRs. Page, Douglas // H&HN: Hospitals & Health Networks;May2010, Vol. 84 Issue 5, p47 

    The author discusses control of hospital-acquired infections at health facilities through information technology such as electronic surveillance or data mining methods. The use of microbial genotyping by Texas Children's Hospital in Houston, Texas is discussed. The author notes that the ultimate...

  • Mining strong relevance between heterogeneous entities from unstructured biomedical data. Ji, Ming; He, Qi; Han, Jiawei; Spangler, Scott // Data Mining & Knowledge Discovery;Jul2015, Vol. 29 Issue 4, p976 

    Huge volumes of biomedical text data discussing different biomedical entities are being generated every day. Hidden in those unstructured data are the strong relevance relationships between those entities, which are critical for many interesting applications including building knowledge...

  • Integration Electronic Patients' Records with Open Life Sciences Datasets Using Semantic Web Tools. Najeeb, Bassam; Al Khatib, Bassel // International Review on Computers & Software;May2016, Vol. 11 Issue 5, p403 

    Semantic Web and its related technologies such as linked data, provide a powerful infrastructure for integrating and publishing heterogeneous data as a Resource Description Framework (RDF). In particular, Linked Open Data (LOD) community project from the World Wide Web Consortium (W3C) has published...

  • A Data Mining Approach to Characterizing Medical Code Usage Patterns. Spangler, William E.; May, Jerrold H.; Strum, David P.; Vargas, Luis G. // Journal of Medical Systems;Jun2002, Vol. 26 Issue 3, p255 

    This research describes a synthetic data mining approach to identifying diagnostic (ICD-9) and procedure (CPT) code usage patterns in two U.S. hospitals, with the goal of determining the adequacy and effectiveness of the current coding classification systems. We combine relative frequency...

  • Taming the tumult. Redling, Bob // MGMA Connexion;May/Jun2006, Vol. 6 Issue 5, p25 

    The article presents a case study on the improvements brought about by electronic medical record (EMR) to the operations of the Virginia Adult and Pediatric Allergy and Asthma PC. The EMR improved records storage security, promoted faster and more accurate patient records, reduced the time to...

  • Medical Knowledge Discovery and Management. Prior, Fred // Military Medicine;May2009 Supplement, p21 

    Although the volume of medical information is growing rapidly, the ability to rapidly convert this data into "actionable insights" and new medical knowledge is lagging far behind. The first step in the knowledge discovery process is data management and integration, which logically can be...

