We rely heavily on search engines like Google to navigate millions of webpages, but much of the content of interest is multimedia, not text. One important class of multimedia data is audio. How can we search a database of audio recordings? One of the main challenges in audio search and retrieval is to determine a mapping from a continuous time-series signal to a sequence of discrete symbols suitable for reverse indexing and efficient pairwise comparison. This talk introduces a method for learning this mapping in an unsupervised, highly adaptive way, resulting in a representation we call audio hashprints. We will discuss the theoretical underpinnings that determine how useful a particular representation is in a retrieval context, and show how hashprints are well suited to tasks requiring high adaptivity. We investigate the performance of the proposed hashprints on two different audio search tasks: synchronizing consumer recordings of the same live event using audio correspondences, and identifying a song being played at a live concert. Using audio hashprints, we demonstrate state-of-the-art performance on both tasks.
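The abstract does not spell out how the continuous-to-discrete mapping is computed, so the following is only a rough sketch of the general idea: a hand-designed spectrogram-difference fingerprint in the spirit of classical audio fingerprinting, not the learned hashprint method from the talk. All names and parameter choices here (fingerprint, n_fft, hop, n_bands) are illustrative assumptions.

    # Sketch only: a fixed spectrogram-difference fingerprint that maps a
    # continuous signal to discrete 32-bit symbols. The hashprints described
    # in the talk learn their bit functions adaptively; this hand-designed
    # scheme just illustrates the structural role such symbols play.
    import numpy as np

    def fingerprint(audio, n_fft=1024, hop=512, n_bands=33):
        """Map a 1-D audio signal to a sequence of 32-bit integer symbols."""
        # Short-time magnitude spectrogram with a sliding Hann window.
        window = np.hanning(n_fft)
        n_frames = 1 + (len(audio) - n_fft) // hop
        frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                           for i in range(n_frames)])
        mag = np.abs(np.fft.rfft(frames, axis=1))

        # Pool FFT bins into coarse frequency bands (log spacing would be
        # more typical; linear spacing keeps the sketch short).
        edges = np.linspace(0, mag.shape[1], n_bands + 1, dtype=int)
        bands = np.stack([mag[:, edges[b]:edges[b + 1]].sum(axis=1)
                          for b in range(n_bands)], axis=1)

        # Binarize the sign of the band-energy difference across frequency
        # and time: each frame yields n_bands - 1 = 32 bits.
        d = np.diff(bands, axis=1)
        bits = (d[1:] - d[:-1]) > 0

        # Pack each frame's bits into one integer, giving a discrete symbol
        # sequence that can key a reverse index (e.g., a hash table).
        weights = (1 << np.arange(bits.shape[1])).astype(np.uint64)
        return (bits.astype(np.uint64) * weights).sum(axis=1)

    # Example: fingerprint one second of synthetic audio at 22.05 kHz.
    rng = np.random.default_rng(0)
    symbols = fingerprint(rng.standard_normal(22050))

Pairwise comparison between two such symbol sequences then reduces to Hamming distance (a popcount of the XOR of aligned symbols), which is what makes discrete representations of this kind cheap to compare and easy to index.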