Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training

Gonina, Ekaterina; EECS Department, University of California

PDF

Description

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine "who spoke when" in an audio recording. While state-of-the-art in accuracy, this method is computa-tionally costly, mostly due to the GMM training, and thus limits the performance of current approaches to be roughly real-time. Increased sizes of current datasets require processing of hundreds of hours of data and thus make more efﬁcient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster than real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining the complex low-level GPU code is difﬁcult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and not portable to other platforms, limiting programmer productivity. In this thesis we present a Python-based GMM training specialization framework that abstracts low-level GPU code and instead automatically selects the best parallel implementation of the training algorithm based on the diarization problem size and the processor features and maps the computation onto the parallel platform. Our specialization framework can automatically map the GMM training algorithm onto any CUDA-programmable NVIDIA GPU as well as multi-core CPUs. We then present a full speaker diarization system captured in about 50 lines of Python that uses our specialization framework and achieves 37-166x faster than real-time performance without signiﬁcant loss in accuracy. We also investigate a trade-off between diarization accuracy and performance and show that for a 3% gain in Diarization Error Rate (DER), the application performance decreases by 5x. Using our framework allows the scientist to focus on developing the application algorithm while achieving signiﬁcant additional performance improvements by automatically utilizing parallel hardware.

Details

Title

Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training

Creator

Gonina, Ekaterina, Author
EECS Department, University of California, Publisher

Published

2011-12-12

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2011-128

Type

Text

Format

technical reports

Extent

31 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports

Files

Statistics

Download Full History

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Add to Basket