Automatic Feature Template Generation for Prosodic Phrasing

Fangzhou Liu; You Zhou
April 2012
Journal of Software (1796217X);Apr2012, Vol. 7 Issue 4, p779
Academic Journal
Prosodic phrase prediction is important for both the naturalness and the intelligibility of Text-to-Speech (TTS) systems. To generate feature templates for prosodic phrasing models automatically, this paper proposes a hybrid approach that converts the rules generated by a classification and regression tree (CART) into templates for transformation-based learning (TBL), and designs a hierarchical-clustering-based feature combination algorithm for the maximum entropy (ME) model. While human supervision is kept to a minimum, the TBL templates generated automatically by CART can serve as good alternatives or useful supplements to manually summarized templates. The ME templates generated automatically by the proposed feature combination algorithm improve F-measure by 3.1% over manual templates and also reduce the size of the ME model by up to 79.0%.
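
The abstract does not spell out how CART rules become TBL templates. As a rough illustration only, one plausible reading is that each root-to-leaf path in the tree is a conjunction of feature tests, and the features tested along that path form a candidate template. The sketch below assumes this reading; the toy tree, the feature names (`pos_next`, `word_len_prev`), and the helper `path_templates` are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy decision tree (not from the paper): internal nodes test one
# feature and branch on its value; leaves are plain strings holding a label.
toy_tree = {
    "feature": "pos_next",
    "branches": {
        "noun": {
            "feature": "word_len_prev",
            "branches": {"short": "no_break", "long": "break"},
        },
        "verb": "break",
    },
}


def path_templates(tree, prefix=()):
    """Return the set of feature-name tuples tested on each root-to-leaf path.

    Assumption: a 'template' is simply the ordered list of features that a
    rule (path) conditions on, with the concrete feature values dropped.
    """
    if isinstance(tree, str):
        # Leaf reached: the accumulated features form one candidate template.
        return {prefix} if prefix else set()
    features = prefix + (tree["feature"],)
    found = set()
    for subtree in tree["branches"].values():
        found |= path_templates(subtree, features)
    return found


# Duplicate paths that test the same features collapse into one template.
templates = sorted(path_templates(toy_tree))
for template in templates:
    print(" & ".join(template))
```

Dropping the feature values and keeping only the feature names is what turns a specific rule into a reusable template here; whether the paper filters or ranks the resulting templates is not stated in the abstract.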


Related Articles

  • Active Learning for Prediction of Prosodic Word Boundaries in Chinese TTS Using Maximum Entropy Markov Model. Ziping Zhao; Xirong Ma // Journal of Software (1796217X);Dec2013, Vol. 8 Issue 12, p3222 

    For a Chinese speech synthesis system, hierarchical prosody structure generation is a key component. The prosodic word, which is the basic prosodic unit, plays an important role in the naturalness and intelligibility of Chinese Text-To-Speech system. However, obtaining human annotations of...

  • The principles of designing of algorithm for speech synthesis from texts written in Albanian language. Dika, Agni; Maxhuni, Adnan; Rexhepi, Avni // International Journal of Computer Science Issues (IJCSI);May2012, Vol. 9 Issue 3, p175 

    The speech synthesis is artificial generation of human speech from written texts. For this purpose, adequate algorithms are designed, which then through relevant programs make it possible to synthesize texts to speech. The process of converting text into speech is also known as Text-To-Speech...

  • Evaluation of Hidden Semi-Markov Models Training Methods for Greek Emotional Text-to-Speech Synthesis. Lazaridis, Alexandros; Mporas, Iosif // International Journal of Information Technology & Computer Scien;Mar2013, Vol. 5 Issue 4, p23 

    This paper describes and evaluates four different HSMM (hidden semi-Markov model) training methods for HMM-based synthesis of emotional speech. The first method, called emotion-dependent modelling, uses individual models trained for each emotion separately. In the second method, emotion...

  • An introduction to statistical parametric speech synthesis. King, Simon // Sadhana;Oct2011, Vol. 36 Issue 5, p837 

    Statistical parametric speech synthesis, based on hidden Markov model-like models, has become competitive with established concatenative techniques over the last few years. This paper offers a non-mathematical introduction to this method of speech synthesis. It is intended to be complementary to...

  • A Novel Approach to Develop Speech Database for Kannada Text-to-Speech System. Ravi, D. J.; Patilkulkarni, Sudarshan // International Journal on Recent Trends in Engineering & Technolo;3/10/2011, Vol. 5 Issue 2, p119 

    Electronic Speech synthesis is a process of generating human-like speech from any text input to emulate human speaker. The objective of a text to speech system is to convert an arbitrary text into its corresponding spoken waveform. Text processing and speech generation are two main components of...

  • Training MEMM with PSO: A Tool for Part-of-Speech Tagging. Lei La; Qiao Guo; Qimin Cao // Journal of Software (1796217X);Nov2012, Vol. 7 Issue 11, p2511 

    Maximum Entropy Markov Models (MEMM) can avoid the assumption of independence in traditional Hidden Markov Models (HMM), and thus take advantage of context information in most text mining tasks. Because the convergence rate of the classic generalized iterative scaling (GIS) algorithm is too low...

  • Expressive speech synthesis: a review. Govind, D.; Prasanna, S. // International Journal of Speech Technology;Jun2013, Vol. 16 Issue 2, p237 

    The objective of the present work is to provide a detailed review of expressive speech synthesis (ESS). Among various approaches for ESS, the present paper focuses the development of ESS systems by explicit control. In this approach, the ESS is achieved by modifying the parameters of the neutral...

  • Real Time Prosody Modification. Rao, Krothapalli Sreenivasa // Journal of Signal & Information Processing;Nov2010, Vol. 1 Issue 1, p50 

    Real time prosody modification involves changing the prosody parameters such as pitch, duration and intensity of speech in real time without affecting the intelligibility and naturalness. In this paper prosody modification is performed using instants of significant excitation (ISE) of the vocal...

  • HMM-Based Distributed Text-to-Speech Synthesis Incorporating Speaker-Adaptive Training. Kwang Myung Jeon; Seung Ho Choi // International Journal of Multimedia & Ubiquitous Engineering;2014, Vol. 9 Issue 5, p107 

    In this paper, a hidden Markov model (HMM) based distributed text-to-speech (TTS) system is proposed to synthesize the voices of various speakers in a client-server framework. The proposed system is based on speaker-adaptive training for constructing HMMs corresponding to a target speaker, and...

