Automatic speech recognition (ASR) and speaker recognition (SRE) are two important fields of research in speech technology. Over the years, many efforts have been made to improve recognition accuracy on both tasks, and many different technologies have been developed. Given the close relationship between the two tasks, researchers have proposed various ways of transferring techniques developed for one task to the other. In the first half of this thesis, I explore ways to improve speaker recognition performance using state-of-the-art speech recognition acoustic models, and then investigate alternative ways to perform speaker adaptation of deep learning models for ASR using speaker identity vectors (i-vectors). Experiments from this work show that ASR and SRE are beneficial to each other: techniques developed for one can be used to improve the performance of the other.

In the second part of the thesis, I aim to build a joint model for speech and speaker recognition. To implement this idea, I first build an open-source experimental framework, TIK, that connects the well-known deep learning toolkit TensorFlow with the speech recognition toolkit Kaldi. After reproducing state-of-the-art speech and speaker recognition performance using TIK, I then develop a unified model, JointDNN, that is trained jointly for speech and speaker recognition. Experimental results show that the joint model can effectively perform both ASR and SRE tasks. In particular, experiments show that the JointDNN model is more effective for speaker recognition than the x-vector system when given a limited amount of training data.