How can I detect the language of the audio when the model is loaded with WhisperForConditionalGeneration?
#40 by lnpwcd68730 - opened
This is how I intend to load the model:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
This is the language detection method shown in the README of the openai/whisper package:
import whisper
model = whisper.load_model("base")
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
See the last code cell of this Colab: https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor?usp=sharing#scrollTo=Mh_e6rV62QUM
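For readers who cannot open the Colab, here is a minimal sketch of one way to do this with transformers. It is not confirmed to match the Colab cell. It assumes, as the openai/whisper detect_language method does, that the decoder's first prediction after the start-of-transcript token is the language token, so a softmax over only the language-token logits gives per-language probabilities. The helper names language_probs and detect_language, the default arguments, and the 16 kHz mono input are assumptions made for illustration.

```python
import math


def language_probs(first_step_logits, lang_token_ids):
    """Softmax restricted to the language tokens.

    first_step_logits: sequence of floats, one per vocabulary entry.
    lang_token_ids: dict mapping a language code (e.g. "en") to its token id.
    """
    scores = {lang: float(first_step_logits[tok]) for lang, tok in lang_token_ids.items()}
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(s - peak) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


def detect_language(audio, sampling_rate=16000, model_name="openai/whisper-large-v2"):
    """Hypothetical helper: audio is a 1-D float array of mono samples (assumed 16 kHz)."""
    # Heavy dependencies are imported here so language_probs can be used on its own.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    from transformers.models.whisper.tokenization_whisper import LANGUAGES

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model.eval()

    # Log-mel features for the first 30 seconds (the processor pads or trims).
    features = processor(audio, sampling_rate=sampling_rate, return_tensors="pt").input_features

    # Feed only the start-of-transcript token; the logits at that position
    # score the candidates for the next token, which is the language token.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(input_features=features, decoder_input_ids=start).logits[0, -1]

    # Keep only the language tokens that exist in this checkpoint's vocabulary.
    vocab = processor.tokenizer.get_vocab()
    lang_token_ids = {
        code: vocab[f"<|{code}|>"] for code in LANGUAGES if f"<|{code}|>" in vocab
    }
    return language_probs(logits.tolist(), lang_token_ids)
```

Usage would look like probs = detect_language(audio) followed by max(probs, key=probs.get), the same pattern as the README snippet above. The audio array can come from any loader that returns 16 kHz mono float samples.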
sanchit-gandhi changed discussion status to closed
Thank you!