Feature Design for Robust Speech Recognition: Nurture and Nature

Chang, Shuo-Yiin

Description

As has been extensively shown, acoustic features for speech recognition can be nurtured from training data using deep neural networks (DNNs) with multiple hidden layers. Although a large body of research has shown that these learned features are superior to standard front-ends, this superiority is usually demonstrated when the data used to learn the features are very similar to the data used to test recognition performance. However, realistic environments cover many unanticipated types of novel inputs, including noise, channel distortion, reverberation, accented speech, speaking rate variation, and overlapped speech. A quantitative analysis using bootstrap sampling shows that these trained features easily become specialized to the training data and are corrupted in mismatched scenarios. Gabor-filtered spectrograms, on the other hand, are generated from spectro-temporal filters that model natural human auditory processing, which can be instrumental in improving generalization to unanticipated deviations from what was seen in training. In this thesis, I use Gabor filtering either as feature processing or as a convolutional kernel in neural networks: the former uses filter outputs as DNN inputs, while the latter uses filter coefficients and structures to initialize a convolutional neural network (CNN). Experiments show that the proposed features perform better than the other noise-robust features that I tried on several noisy corpora. In addition, I demonstrate that including Gabor filters with lower or higher temporal modulations can be used to correlate better with human perception of slow or rapid speech. Finally, I report on an analysis of human cortical signals that demonstrates the relative robustness of these signals to the mixed-signal phenomenon, in contrast to a DNN-based ASR system. With a number of example tasks in the thesis, I conclude that designed features are useful for achieving greater robustness than relying on a DNN or CNN alone.
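To make the idea of spectro-temporal Gabor filtering concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a Gaussian envelope, a real-valued cosine carrier, and a spectrogram stored as a list of frequency rows; the function names `gabor_kernel` and `filter_spectrogram` and all parameter choices are hypothetical. Under these assumptions, the filter outputs could serve as DNN inputs, or the same kernel coefficients could be used as initial values for a convolutional layer.

```python
import math

def gabor_kernel(n_freq, n_time, omega_k, omega_n):
    """Real part of a 2D spectro-temporal Gabor filter (illustrative only).

    omega_k: spectral modulation in radians per frequency bin (assumed unit).
    omega_n: temporal modulation in radians per frame (assumed unit).
    The Gaussian envelope widths are an arbitrary choice for this sketch.
    """
    k0, n0 = (n_freq - 1) / 2, (n_time - 1) / 2   # kernel centre
    sigma_k, sigma_n = n_freq / 4, n_time / 4     # assumed envelope widths
    kernel = []
    for k in range(n_freq):
        row = []
        for n in range(n_time):
            envelope = math.exp(-0.5 * ((k - k0) / sigma_k) ** 2
                                - 0.5 * ((n - n0) / sigma_n) ** 2)
            carrier = math.cos(omega_k * (k - k0) + omega_n * (n - n0))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_spectrogram(spec, kernel):
    """'Valid' 2D cross-correlation of a spectrogram with a kernel."""
    kf, kt = len(kernel), len(kernel[0])
    out_f = len(spec) - kf + 1
    out_t = len(spec[0]) - kt + 1
    out = []
    for f in range(out_f):
        row = []
        for t in range(out_t):
            acc = 0.0
            for i in range(kf):
                for j in range(kt):
                    acc += spec[f + i][t + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Toy usage: a purely temporal filter applied to a synthetic spectrogram
# whose energy oscillates over time at the same modulation rate.
kernel = gabor_kernel(5, 7, 0.0, math.pi / 4)
spec = [[1.0 + math.cos(math.pi / 4 * t) for t in range(20)] for _ in range(8)]
features = filter_spectrogram(spec, kernel)
```

Varying `omega_n` in this sketch selects slower or faster temporal modulations, which is the kind of filter-bank choice the abstract relates to slow or rapid speech.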

Details

Title

Feature Design for Robust Speech Recognition: Nurture and Nature

Creator

Chang, Shuo-Yiin, Author

Published

EECS Department, University of California at Berkeley, Berkeley, California, May 12, 2016

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2016-62

Type

Text

Format

technical reports

Extent

87 p.

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
