Description

Models can currently classify images accurately when given a large labeled dataset. However, labeling is a tedious process that does not scale well. Contrastive learning aims to learn an embedding space in which similar pairs are mapped close to each other and dissimilar pairs are mapped far apart. This self-supervised learning method can leverage unlabeled datasets to generate feature representations applicable to various tasks such as classification and segmentation. Most of the current contrastive learning literature deals with images; significantly fewer works address audio inputs. Deep learning has recently shown promising results on tasks such as event classification on audio data, so I believe some contrastive learning techniques can be applied to this domain as well. In this thesis, I investigate the effectiveness of two instance discrimination frameworks, non-parametric instance discrimination (NPID) and Momentum Contrast (MoCo), trained on audio event data. I demonstrate that a network can, without large amounts of data, learn an audio representation that improves performance on various classification tasks, including music and environmental sounds. Finally, my experiments show that pretrained weights from these frameworks lead to faster convergence than other standard weight initialization methods.
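Both frameworks named above optimize a contrastive objective of the InfoNCE family: an augmented view of each instance is scored against one positive and many negatives, which realizes the "similar pairs close, dissimilar pairs far" idea. The sketch below is a generic illustration of that objective, plus MoCo's characteristic momentum update of the key encoder; it is not code from the thesis, and all function and variable names are my own.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE-style contrastive loss (illustrative sketch).

    query, positive: (d,) L2-normalized embeddings of two views of
                     the same audio clip (the "similar pair").
    negatives:       (k, d) L2-normalized embeddings of other clips.
    Returns the cross-entropy of picking the positive among all k+1
    candidates; the loss shrinks as query and positive align.
    """
    l_pos = query @ positive / temperature     # similarity to the positive
    l_neg = negatives @ query / temperature    # similarities to negatives, (k,)
    logits = np.concatenate(([l_pos], l_neg))
    m = logits.max()                           # stabilize the log-sum-exp
    log_sum = m + np.log(np.exp(logits - m).sum())
    return log_sum - l_pos                     # -log softmax(positive slot)

def momentum_update(key_params, query_params, momentum=0.999):
    """MoCo's key-encoder update: a slow exponential moving average of
    the query encoder, keeping the queue of negatives consistent."""
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]
```

An aligned pair (positive nearly identical to the query) yields a near-zero loss, while a misaligned pair yields a larger one; minimizing this loss is what pulls similar clips together in the learned embedding space.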
