UBC Theses and Dissertations

Speech recognition in a harsh environment

Pullman, Susan

Abstract

Speech recognition is a rapidly expanding field with many useful applications in man-machine interfacing. One of the main benefits of speech control is the flexibility and ease of use it gives an operator across any number of specific applications. Speech recognition units (SRUs) currently achieve a high level of accuracy for user-dependent, pretrained, isolated-word recognition. However, if uncontrollable noise is added to the speech input, recognition degrades rapidly. If the application requires a vast set of control words to be used by many operators, there can also be inconsistencies in recognition. The specific application of this study is the secondary control (i.e., non-critical control) of heavy machinery (in particular a caterpillar tractor) using an operator-speech interface. The inherent problem of this application is the environmental background noise due to the tractor. It is also important that a robust vocabulary is selected so that no misrecognition occurs between critical control words. In order to add speech input for control of machines in a harsh environment, there are two considerations: 1. The reduction of noise in the input speech signal. 2. The selection of a robust vocabulary dependent upon the specific operator and the specific SRU. This study investigates many different types of noise reduction filters, including traditional Wiener, Power Spectral Subtraction and Gaussian filters. The results show that the best types of noise reduction filters are adaptive optimization filters which use two input signals, or the Power Spectral Subtraction (PSS) filter. It is possible to reduce the noise to a level within the range of the SRU's capacity for noise. An algorithm for selecting an accurate vocabulary is proposed. This algorithm determines weaknesses for the specific SRU, vocabulary and speaker, and selects the control words around those weaknesses. Testing of this algorithm showed that it was possible to achieve close to 98% recognition and 0% misrecognition.
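
As an illustration of the Power Spectral Subtraction (PSS) filter named in the abstract, the following is a minimal sketch of the textbook frame-based form of the technique, not the implementation used in the thesis. It assumes a separate noise-only recording is available (for example, tractor noise captured while the operator is silent). The function names, the frame length, the hop size and the spectral floor are all illustrative assumptions.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def power_spectral_subtraction(noisy, noise_only, frame_len=256, hop=128, floor=0.01):
    """Textbook PSS sketch; all parameter defaults are assumptions.

    noisy      : speech plus background noise
    noise_only : a recording of the background noise alone
    floor      : fraction of the noisy power kept as a lower bound, so the
                 subtracted power never goes negative
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_only = np.asarray(noise_only, dtype=float)
    window = np.hanning(frame_len)

    # Estimate the average noise power spectrum from the noise-only recording.
    noise_frames = frame_signal(noise_only, frame_len, hop) * window
    noise_power = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)

    # Power spectrum of each windowed frame of the noisy input.
    frames = frame_signal(noisy, frame_len, hop) * window
    spectra = np.fft.rfft(frames, axis=1)
    power = np.abs(spectra) ** 2

    # Subtract the noise power, clamping to a small fraction of the input power.
    clean_power = np.maximum(power - noise_power, floor * power)

    # Apply the resulting gain to the noisy spectrum, which keeps the noisy phase.
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    clean_frames = np.fft.irfft(gain * spectra, n=frame_len, axis=1)

    # Overlap-add the frames, normalising by the summed analysis window.
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i, frame in enumerate(clean_frames):
        start = i * hop
        out[start:start + frame_len] += frame
        norm[start:start + frame_len] += window
    return out / np.maximum(norm, 1e-8)
```

Calling `power_spectral_subtraction(noisy, noise_only)` returns a signal of the same length as `noisy`. The spectral floor is a common practical choice to limit the "musical noise" artifacts that plain subtraction tends to produce; whether the thesis uses such a floor is not stated in the abstract.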

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.