Speech is possibly the most efficient way for humans to communicate with a machine. If acoustic waves could be mapped perfectly to words, a variety of exciting applications could be realized. In practice, however, speech recognition is difficult. One reason is the large variability in people's voices. In addition, many applications of speech recognition require voice authentication or speaker recognition to be practically useful. Speaker identification systems can account for voice variability and can also provide voice authentication, which makes them an interesting area of research. This report describes a real-time speaker identification system developed for a particular application: transcribing group conversations. Our capstone team worked with a startup, Transcense, whose app transcribes group conversations in real time for the benefit of deaf and hard-of-hearing people. In this application, our speaker identification system lets the deaf or hard-of-hearing user know who is currently speaking and hence follow the conversation better. The system uses a dual approach to identify speakers: (1) voice features combined with machine learning, and (2) sound-source localization. It achieves ~80% accuracy on a rigorous custom test protocol with 4 speakers. This makes it a solid proof of concept that Transcense could use as a platform for developing its own speaker identification system.



