FanavaranPars/ERI-VAD · Hugging Face

Design and development of audio activity detection services

The Voice Activity Detection (VAD) module screens the audio files fed to an Automatic Speech Recognition (ASR) system. By detecting the presence of speech at the frame level, it keeps the ASR from spending processing power on parts of a file that contain no speech. This chapter first explains the voice activity detection problem and then introduces recent, efficient VAD models. Finally, the most suitable models are trained on the VADS-V01 database and evaluated in detail.
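To make the idea of frame-level detection concrete, the following is a minimal sketch and not the ERI-VAD model itself. It assumes a simple energy threshold as a stand-in for a trained classifier, and all names (`frame_energies`, `detect_speech_frames`, `speech_segments`), the 10 ms frame length, and the threshold value are hypothetical choices made for illustration only.

```python
import math


def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return the mean energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech_frames(samples, frame_len=160, threshold=0.01):
    """Return one boolean per frame: True if the frame's energy exceeds the threshold.

    The energy threshold is an illustrative assumption; a trained VAD model
    would replace this decision with a learned per-frame prediction.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_len)]


def speech_segments(flags, frame_len):
    """Merge consecutive speech frames into (start_sample, end_sample) spans."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i * frame_len
        elif not is_speech and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, len(flags) * frame_len))
    return segments


# Synthetic demo: 0.1 s of silence, 0.1 s of a 440 Hz tone, 0.1 s of silence.
sample_rate = 16000
frame_len = 160  # 10 ms at 16 kHz
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
signal = silence + tone + silence

flags = detect_speech_frames(signal, frame_len)
segments = speech_segments(flags, frame_len)
print(segments)  # [(1600, 3200)]
```

Only the sample ranges returned by `speech_segments` would be passed on to the ASR system, so in this toy signal two thirds of the audio is skipped.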