Week 40 (Extension)

Voice activity detection

There a lot of ways to do voice activity detection (VAD). In the time domain, VAD can be accomplished with short-time average energy, zero-crossings ratio, short-time average magnitude. In the frequency domain, one can achieve with spectral energy, long-term spectral envelope, Mel-frequency cepstral coefficients, and linear predictive coding. Because our audio files have noise, portions of child speech, and other confounding information, we’ll try the frequency domain method, where we’ll extract the spectral energy to figure out the presence of speech. Then, we’ll compare that to a method that uses the long-term speech information. So, for the first method I watched this video the latter method I needed to read this paper.

Best,
EO