Description
The knowledge gained from the speaker discriminability measures was used to implement an effective data selection procedure that makes it possible to predict how well a speaker recognition system will behave without actually implementing the system. The use of speaker discriminability measures also reduces the amount of data needed for speaker recognition training and testing, allowing for faster modeling and easier data storage, given that the latest speaker recognition corpora occupy hundreds of gigabytes.
In particular, I focused primarily on speaker recognition systems based on Gaussian Mixture Models (GMMs), which comprise the majority of current state-of-the-art speaker recognition systems. I investigated methods for making the speaker discriminability measures easy to obtain, so that extracting them from the data requires significantly fewer computational resources than running an entire speaker recognition system to determine which regions of speech are speaker discriminative.
After selecting the speech data using these measures, I created new speech units based on the selected data. The speaker recognition performance of the new speech units was compared with that of the existing units (mainly mono-phones and words), both standalone and in combination. I found that, in general, the new speech units are more speaker discriminative than the existing ones, and that speaker recognition systems using the new speech units as data generally outperformed systems using the existing speech units. This work therefore outlines an effective, easy-to-implement approach for selecting speaker discriminative regions of data for speaker recognition.