An Efficient Unit-selection Method for Concatenative Text-to-speech Synthesis Systems

Gros, Jerneja Zganec; Zganec, Mario
March 2008
Journal of Computing & Information Technology;Mar2008, Vol. 16 Issue 1, p69
Academic Journal
This paper presents a method for selecting speech units for polyphone concatenative speech synthesis in which a simplified procedure for searching paths in a graph speeds up unit selection with minimal effect on speech quality. The selected speech units are still optimal; only the costs of merging the units, on which the selection is based, are determined less accurately. Because of its low processing-power and memory-footprint requirements, the method is suitable for use in embedded speech synthesizers.
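
As a rough illustration of the kind of graph search the abstract refers to, the sketch below shows a generic Viterbi-style unit selection: each position in the utterance has several candidate units, and the cheapest path through them is found from per-unit target costs and per-pair join (merging) costs. This is not the authors' algorithm. The function name `select_units`, the unit labels, and the coarse join cost in the usage example are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one unit per position so that the summed target and join costs are minimal."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best[j]: cheapest cost of any path ending in candidate j of the current position
    best = [target_cost(0, u) for u in candidates[0]]
    # back[i - 1][j]: index of the best predecessor of candidate j at position i
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for unit in candidates[i]:
            costs = [
                best[k] + join_cost(prev, unit)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, unit))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position to the first.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total


# Hypothetical usage: labels have the form "phone#recording". The join cost is
# deliberately coarse (0 if both units come from the same recording, else 1),
# standing in for a cheaper, less precise merging cost.
def coarse_join(a: str, b: str) -> float:
    return 0.0 if a.split("#")[1] == b.split("#")[1] else 1.0


example_candidates = [["h#1", "h#2"], ["e#2", "e#3"], ["l#2", "l#1"]]
units, cost = select_units(example_candidates, lambda i, u: 0.0, coarse_join)
print(units, cost)
```

A coarse join cost like the one in the usage example needs no spectral comparison at run time, which is one plausible way to trade cost accuracy for speed and memory. Whether it resembles the simplification used in the paper is not stated in the abstract.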


Related Articles

  • The Main Principles of Text-to-Speech Synthesis System. Aida Zade, K. R.; Ardil, C.; Sharifova, A. M. // International Journal of Signal Processing;2010, Vol. 6 Issue 1, p13 

    In this paper, the main principles of a text-to-speech synthesis system are presented. Associated problems that arise when developing a speech synthesis system are described. The approaches used and their application in speech synthesis systems for the Azerbaijani language are shown.

  • High Quality Arabic Concatenative Speech Synthesis. Chabchoub, Abdelkader; Cherif, Adnan // Signal & Image Processing: An International Journal;Dec2011, Vol. 2 Issue 4, p27 

    This paper describes the implementation of TD-PSOLA tools to improve the quality of the Arabic Text-to-speech (TTS) system. This system is based on diphone concatenation with a TD-PSOLA modifier synthesizer. This paper describes techniques to improve the precision of prosodic modifications in the...

  • On a Speech Multiple System Implementation for Speech Synthesis. Jong Kuk Kim; Hern Soo Hahn; Myung Jin Bae // Wireless Personal Communications;Jun2009, Vol. 49 Issue 4, p533 

    This paper proposes a voice synthesizer to convert a single speech to multiple speeches. Pitch is an important characteristic of speech, produced by the periodic vibration of the vocal cords; it is the parameter to which human hearing is most sensitive. So if you...

  • MODELLING SPEECH TEMPORAL STRUCTURE FOR ESTONIAN TEXT-TO-SPEECH SYNTHESIS: FEATURE SELECTION. Mihkla, Meelis // TRAMES: A Journal of the Humanities & Social Sciences;2007, Vol. 11 Issue 3, p284 

    The article discusses the principles of selecting features for modelling the temporal structure of Estonian speech, using different types of read-out texts, with a view to text-to-speech synthesis (TTS). Feature selection is known to depend on certain general issues regulating speech temporal...

  • Talking appliances. Hawkins, William J. // Popular Science;Sep82, Vol. 221 Issue 3, p74 

    The article offers information on the applications of the voice synthesis systems. It states that only a factor of analog voice data can be stored to reproduce speech. The Linear Predictive Coding assumes a voice pattern, but uses the stored digital data to alter it to create the exact sound...

  • Emotion Extractor: AI based methodology to implement prosody features in Speech Synthesis. Chandak, M. B.; Dharaskar, R. V. // International Journal of Computer Science Issues (IJCSI);Jul2011, Vol. 8 Issue 4, p371 

    This paper presents a methodology to extract emotion from text in real time and add the expression to the document's contents during speech synthesis. To understand the existence of emotions, a self-assessment test was carried out on a set of documents and preliminary rules were formulated for...

  • Research and realization the method of pronunciation conversion for speech synthesis of the Lhasa dialect of Tibetan. Zhiqiang Wu; Hongzhi Yu; Shuhui Wan // Applied Mechanics & Materials;2014, Issue 571-572, p858 

    Pronunciation conversion is a prerequisite for realizing a speech synthesis system; moreover, the conversion accuracy is directly related to the quality of the synthetic speech. By studying the characteristics of Tibetan words and Lhasa pronunciation, currently method of the pronunciation conversion for...

  • Synthesizer Turns X-Y Inputs To Vocal Outputs. Bak, David J. // Design News;10/23/89, Vol. 45 Issue 20, p108 

    This article presents information on Speechpad, a communications device based on a low-cost graphics tablet and battery-powered speech board. The tablet, a programmable grid of membrane switches, functions as an expanded keypad. Originally designed to communicate with a PC, it connects via the...

  • The Role of Duration Models and Symbolic Representation for Timing in Synthetic Speech. Brinckmann, Caren; Trouvain, Jürgen // International Journal of Speech Technology;Jan2003, Vol. 6 Issue 1, p21 

    In order to determine priorities for the improvement of timing in synthetic speech this study looks at the role of segmental duration prediction and the role of phonological symbolic representation in the perceptual quality of a text-to-speech system. In perception experiments using German...

