This thesis sought to address these issues by appropriately employing audio segmentation as a first step to both automatic speech recognition and speaker diarization in meetings. For ASR, the objective was the segmentation of nonspeech and local speech, while for speaker diarization, the audio classes to be segmented were nonspeech, single-speaker speech, and overlapped speech. A major focus was the identification of features suited to segmenting these audio classes: for crosstalk, cross-channel features were explored, while for monaural overlapped speech, energy, harmonic, and spectral features were examined. Using feature subset selection, the best combination of auxiliary features to baseline MFCCs in the former scenario consisted of normalized maximum cross-channel correlation and log-energy difference; for the latter scenario, RMS energy, harmonic energy ratio, and modulation spectrogram features were determined to be the most useful in the realistic multi-site farfield audio condition. For ASR, word error rate improved over the baseline by 13.4% relative on development data and by 9.2% relative on validation data. For speaker diarization, results proved less consistent, with a relative DER improvement of 23.25% on development data but no significant change on a randomly selected validation set. Closer inspection revealed performance variability at the meeting level, with some meetings improving substantially and others degrading. Further analysis over a large set of meetings confirmed this variability, but also showed many meetings benefiting significantly from the proposed technique.
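As an illustration of the two cross-channel features named above, the following is a minimal sketch of how they might be computed for one analysis frame from a pair of channels. The function names, the `EPS` constant, and the particular normalization (the geometric mean of the two frame energies) are assumptions made for this example; the exact definitions used in the experiments may differ.

```python
import numpy as np

EPS = 1e-10  # assumed guard against log(0) and division by zero on silent frames


def frame_energy(frame):
    """Sum of squared samples in one frame."""
    frame = np.asarray(frame, dtype=np.float64)
    return float(np.sum(frame ** 2))


def normalized_max_xcorr(target, other):
    """Maximum cross-correlation over all lags between two frames,
    normalized by the geometric mean of the frame energies (an assumed
    normalization; it bounds the value above by 1)."""
    target = np.asarray(target, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    xcorr = np.correlate(target, other, mode="full")
    norm = np.sqrt(frame_energy(target) * frame_energy(other))
    return float(np.max(xcorr) / (norm + EPS))


def log_energy_difference(target, other):
    """Log-energy of the target channel minus log-energy of the other channel."""
    return float(np.log(frame_energy(target) + EPS)
                 - np.log(frame_energy(other) + EPS))
```

Under these assumptions, a second channel that carries only an attenuated copy of the target speaker yields a correlation near 1 together with a positive log-energy difference, which is the pattern that would mark the target channel as carrying local speech rather than crosstalk.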