In a research project I’m currently working on, we tackle the problem of discriminating between speech and singing. One indicator of singing, as opposed to speech, is the vibrato applied by the singer, or more precisely a pitch vibrato. Pitch vibrato is an oscillation of the base pitch at a rate between 4 and 8 Hz. To identify the vibrato effect, we therefore need to detect this oscillation.
The first step is detecting the pitch. We split the audio into frames of 256 samples each and perform pitch detection on every frame using the autocorrelation method. This gives us a vector of values indicating the pitch of each frame. We compute the DFT of this pitch vector and examine the 4-8 Hz range. A peak (local maximum) in this range indicates an oscillation of the base pitch. For more robustness, rather than searching for a peak, we simply calculate the energy in this range, that is, the sum of the squared DFT magnitudes over the bins between 4 and 8 Hz.
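To make these steps concrete, here is a minimal sketch in plain Python. It is not the MIR Toolbox code used in the project. The function names, the 80-1000 Hz pitch search range, and the use of non-overlapping frames are my assumptions for illustration. The pitch estimate is limited to integer lags, so it is coarse.

```python
import cmath
import math


def frame_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the pitch (Hz) of one frame from its strongest autocorrelation lag.

    fmin/fmax bound the lag search and are assumed values. There is no
    voicing check, so silent or noisy frames still return some value.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag


def band_energy(pitch_track, frame_rate, f_lo=4.0, f_hi=8.0):
    """Sum of squared DFT magnitudes of the pitch track between f_lo and f_hi (Hz)."""
    n = len(pitch_track)
    energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate / n  # frequency of DFT bin k
        if f_lo <= freq <= f_hi:
            coeff = sum(p * cmath.exp(-2j * math.pi * k * i / n)
                        for i, p in enumerate(pitch_track))
            energy += abs(coeff) ** 2
    return energy


def vibrato_energy(samples, sample_rate, frame_len=256):
    """Frame the signal, track the pitch per frame, return the 4-8 Hz energy."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    track = [frame_pitch(samples[s:s + frame_len], sample_rate) for s in starts]
    # With non-overlapping frames the pitch track has one value every
    # frame_len / sample_rate seconds.
    return band_energy(track, sample_rate / frame_len)
```

Note that the pitch track is sampled at the frame rate, not the audio rate. With 256-sample frames at an assumed 8000 Hz sample rate, that is 31.25 pitch values per second, enough to observe oscillations up to about 15.6 Hz.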
We can now either use the calculated energy directly as a measure of vibrato, or compare it to a threshold and decide whether the pitch vibrates.
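One practical point about the threshold: the raw band energy grows with the length of the pitch vector and with the depth of the pitch deviation, so an absolute threshold has to be tuned to the data. A possible workaround, which is my assumption and not something this project necessarily does, is to threshold the share of the pitch track's non-DC energy that falls inside 4-8 Hz. The function name and the 0.5 default below are hypothetical.

```python
def is_vibrato(band_energy, total_energy, ratio_threshold=0.5):
    """Decide vibrato from the fraction of pitch-modulation energy inside 4-8 Hz.

    band_energy:  energy of the pitch track's DFT in the 4-8 Hz range
    total_energy: energy of all non-DC DFT bins of the same pitch track
    ratio_threshold: assumed default, should be tuned on labelled data
    """
    if total_energy <= 0.0:
        return False  # perfectly flat pitch track: nothing oscillates
    return band_energy / total_energy >= ratio_threshold
```

Because the ratio always lies between 0 and 1, the same threshold can be applied to recordings of different lengths.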
The MIR Toolbox made this task very easy to perform by offering useful functions for audio analysis, segmentation into frames, and much more.