How can I detect the language of an audio file with a model loaded via WhisperForConditionalGeneration?

#40

by lnpwcd68730 - opened Apr 26, 2023

Discussion

lnpwcd68730

Apr 26, 2023 • edited Apr 26, 2023

This is how I expect to load the model:

from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

This is the language detection method shown in the README of whisper:

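import whisper
model = whisper.load_model("base")
# load the audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# compute the log-Mel spectrogram on the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")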

sanchit-gandhi

Apr 27, 2023

See the last code cell of this Colab: https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor?usp=sharing#scrollTo=Mh_e6rV62QUM
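
For readers who cannot open the Colab, here is a minimal sketch of one way to do this with transformers. It is not necessarily what the Colab does. The idea is to run a single decoder step from the start-of-transcript token and compare the logits of the language tokens. The helper names `language_probs` and `detect_language` are hypothetical, the regex assumes language tokens look like `<|en|>` in the tokenizer vocabulary, and loading the file with librosa is just one option for getting a 16 kHz waveform.

```python
import math
import re


def language_probs(logits, lang_token_ids):
    """Softmax restricted to the language-token logits; returns {code: prob}."""
    scores = {code: float(logits[idx]) for code, idx in lang_token_ids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {code: math.exp(s - top) for code, s in scores.items()}
    total = sum(exps.values())
    return {code: e / total for code, e in exps.items()}


def detect_language(model, processor, audio, sampling_rate=16000):
    """Hypothetical helper: one decoder step, then compare language tokens."""
    import torch  # imported lazily so language_probs has no dependencies

    # Assumption: language tokens are stored as "<|xx|>" or "<|xxx|>".
    lang_token_ids = {
        tok[2:-2]: idx
        for tok, idx in processor.tokenizer.get_vocab().items()
        if re.fullmatch(r"<\|[a-z]{2,3}\|>", tok)
    }
    # The feature extractor pads or truncates the audio to 30 seconds.
    features = processor(
        audio, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features.to(model.device)
    # Feed only the start-of-transcript token to the decoder.
    start = torch.tensor(
        [[model.config.decoder_start_token_id]], device=model.device
    )
    with torch.no_grad():
        logits = model(input_features=features, decoder_input_ids=start).logits
    # logits has shape (batch, 1, vocab); take the single predicted position.
    return language_probs(logits[0, 0].tolist(), lang_token_ids)


if __name__ == "__main__":
    import librosa  # assumption: any loader that yields 16 kHz mono works
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    name = "openai/whisper-large-v2"
    processor = WhisperProcessor.from_pretrained(name)
    model = WhisperForConditionalGeneration.from_pretrained(name)
    audio, _ = librosa.load("audio.mp3", sr=16000)
    probs = detect_language(model, processor, audio)
    print(f"Detected language: {max(probs, key=probs.get)}")
```

As with the `detect_language` call in the whisper README, the result is a dictionary of probabilities keyed by language code, so `max(probs, key=probs.get)` gives the most likely language.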

sanchit-gandhi changed discussion status to closed Apr 27, 2023

lnpwcd68730

May 11, 2023

Thank you!
