Acoustic quality normalization for robust automatic speech recognition

Muhammad, Ghulam
December 2007
International Journal of Speech Technology;Dec2007, Vol. 10 Issue 4, p175
Academic Journal
Automatic speech recognition (ASR) systems suffer from variation in the acoustic quality of input speech. Speech may be produced in noisy environments, and each speaker has a distinct speaking style. Variations can be observed even within the same utterance and for the same speaker in different moods. All these uncertainties and variations should be normalized to obtain a robust ASR system. In this paper, we apply and evaluate different approaches to acoustic quality normalization within an utterance for robust ASR. Several HMM (hidden Markov model)-based systems using utterance-level, word-level, and monophone-level normalization are compared with an HMM-SM (subspace method)-based system using monophone-level normalization for normalizing variations and uncertainties in an utterance. SM can represent variations of fine structures in sub-words as a set of eigenvectors, and so performs better than HMM at the monophone level. Experimental results show that the HMM-SM-based system with monophone-level normalization significantly improves word accuracy over the typical HMM-based system with utterance-level normalization in both clean and noisy conditions. The results also suggest that monophone-level normalization using SM outperforms that using HMM.
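
The abstract does not state which normalization formula the paper uses, so the following is only an illustrative sketch of the difference between utterance-level and segment-level normalization. It assumes plain mean-variance normalization over a one-dimensional feature track; the function names, the toy values, and the segment boundaries are all hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def normalize(frames):
    """Mean-variance normalize a list of scalar feature values.

    Assumption: mean-variance normalization stands in for whatever
    normalization the paper actually applies.
    """
    mu = fmean(frames)
    sigma = pstdev(frames)
    if sigma == 0.0:
        # A constant span carries no variation to scale.
        return [0.0 for _ in frames]
    return [(x - mu) / sigma for x in frames]


def normalize_by_segment(frames, boundaries):
    """Normalize each segment [start, end) independently.

    `boundaries` is a hypothetical list of (start, end) frame indices,
    e.g. word or monophone spans from a forced alignment.
    """
    out = []
    for start, end in boundaries:
        out.extend(normalize(frames[start:end]))
    return out


# Toy feature track: two segments with the same shape but a level offset,
# standing in for a change of acoustic quality inside one utterance.
frames = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]

utterance_level = normalize(frames)
segment_level = normalize_by_segment(frames, [(0, 3), (3, 6)])
```

With utterance-level statistics the offset between the two halves survives normalization, whereas with per-segment statistics both halves map to the same normalized values. This is the intuition behind normalizing at a finer granularity, under the assumptions stated above.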


Related Articles

  • A Supervised Text-Independent Speaker Recognition Approach. Barbu, Tudor // International Journal of Electronics, Circuits & Systems;2007, Vol. 1 Issue 3, p157 

    We provide a supervised speech-independent voice recognition technique in this paper. In the feature extraction stage we propose a mel-cepstral based approach. Our feature vector classification method uses a special nonlinear metric, derived from the Hausdorff distance for sets, and a minimum...

  • What Ever Happened To Voice? Lowenstein, Mark // Wireless Week;6/15/2003, Vol. 9 Issue 13, p45 

    Focuses on voice portals and applications using voice navigation. Technology improvements in voice recognition and TTS; Promising concept for voice application; Feature deployed in standard application platforms; Example of how session management could help improve the user experience.

  • DEVELOPMENT OF ISOLATED WORDS SPEECH DATABASE OF MARATHI WORDS FOR AGRICULTURE PURPOSE. Shrishrimal, P. P.; Deshmukh, R. R.; Waghmare, Vishal B. // Asian Journal of Computer Science & Information Technology;Jul2012, Vol. 2 Issue 7, p217 

    Development of Speech Database is the very first step for developing an Automatic Speech Recognition system. The Accuracy of speech recognition depends on the quality of the speech data collected and the training set data quality. This paper describes the proposed procedure to be followed for...

  • You Talkin' To Me? Owsen, Dwight M.; Schneider, Kent N. // California CPA;May2005, Vol. 73 Issue 9, Special Section p8 

    Presents information on voice recognition software. Programs offered by Dragon NaturallySpeaking; Affordability of the technology; Keys to using voice recognition software.

  • Hierarchical K-Means Algorithm Applied On Isolated Malay Digit Speech Recognition. Majeed, S. A.; Husain, H.; Samad, S. A.; Hussain, A. // International Proceedings of Computer Science & Information Tech;2012, Vol. 34, p33 

    In recent years, there has been an increasing interest in speech recognition in terms of accuracy. In this paper, the implementation of a speech recognition system in a speaker-independent isolated Malay digit was discussed. The system is developed applying Hierarchical K-means clustering...

  • Hybrid neuromorphic system for automatic speech recognition. Rafiue, M. A.; Lee, B. G.; Jeon, M. // Electronics Letters;8/18/2016, Vol. 52 Issue 17, p1428 

    A multilayer neural network, equipped with a two-memristors synapse, for speech recognition is proposed. The discussed neuromorphic neural network is a hybrid system which uses a Gaussian-Bernoulli restricted Boltzmann Machine (RBM) to transform the speech data into sparse encoded binary data....

  • Preprocessing and Segmentation of the Speech Signal in the Frequency Domain for Speech Recognition. Kolokolov, A. S. // Automation & Remote Control;Jun2003, Vol. 64 Issue 6, p985 

    Preprocessing of the speech signal before recognition of phonemes was considered. Methods of processing the spectrum and segmenting the speech signal for stable speech recognition in the presence of frequency distortions were proposed. They are based on a procedure of linear filtering of the...

  • The use of speed-up techniques for a speech recognizer system. Kocsor, András; Gosztolya, Gábor // International Journal of Speech Technology;Sep2007, Vol. 9 Issue 3/4, p95 

    In speech recognition, not just the accuracy of an automatic speech recognition application is important, but also its speed. However, if we want to create a real-time speech recognizer, this requirement limits the time that is spent on searching for the best hypothesis, which can even affect...

  • VUI REVIEW TESTING. Kaiser, Lizanne // Speech Technology Magazine;May/Jun2006, Vol. 11 Issue 3, p39 

    The article provides information on Voice User Interface (VUI) Review Testing (VRT). VRT provides a holistic, experiential review of a speech application. The tester, typically a VUI designer, plays the role of a caller and tests the fully developed application by acting out pre-determined...

