Different results from Whisper on HF and Whisper from GitHub using Python
Using the same audio files, I do not get the same transcription from Whisper large v2 on HF and from the Whisper large v2 that I pulled from GitHub and run in my Python script.
I think I need to adjust some settings (temperature?), but I can't see which settings are used on Hugging Face.
Any idea how to make my local Whisper large v2 instance match the one hosted here on Hugging Face, or why I get different results? (I'd say the transcriptions differ by 5 to 10%.)
Thanks.
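To put a number on a "5 to 10%" difference, one option is to compute a word error rate between the two transcriptions. The sketch below is a minimal, dependency-free version based on word-level edit distance; the function name `word_error_rate` is illustrative, and the normalization (lowercasing and whitespace splitting) is an assumption that may be too crude for punctuation-heavy French text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # word present in ref, missing in hyp
            insertion = curr[j - 1] + 1  # extra word in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Calling it with the local transcription as `reference` and the hosted one as `hypothesis` (or the other way round) gives a fraction that can be compared before and after changing settings.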
Hey @sboudouk! We use the default generation kwargs in HF: https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.temperature
You can override these by passing them to the .generate method if you're using model/processor, or by forwarding generate_kwargs={"temperature": 1} to the pipeline if you're using the pipeline.
Do you have a code snippet for your comparison? If you could share it I'd be happy to provide some pointers as to where we can make changes!
Sure, thanks for the link. I struggled to find the default generation kwargs, as I'm kind of new to HF.
Here is the snippet where I build my whisper model instance in my python code:
transcribe = model.transcribe(audio, language='fr', temperature=0.0)
So, from my understanding, I need to add every corresponding kwarg as a parameter to transcribe, just as I added the language and the temperature?
Hey @sboudouk! Yep, you can just pass generate_kwargs as required:
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

generate_kwargs = {"language": "<|fr|>", "temperature": 0.0}  # add any other generate kwargs you require
prediction = pipe(sample, generate_kwargs=generate_kwargs)["text"]
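When comparing the two implementations, it can help to derive both sets of arguments from a single place so they cannot drift apart. The helper below is a hypothetical sketch (the name `whisper_settings` is not part of either library): it returns the keyword arguments for the GitHub package's `model.transcribe` and the `generate_kwargs` dict for the pipeline, using the `<|fr|>` language token format from the snippet above. It only aligns the two settings discussed in this thread; other defaults may still differ between the two implementations.

```python
def whisper_settings(language: str = "fr", temperature: float = 0.0):
    """Build matching arguments for both Whisper implementations (illustrative helper).

    Returns a tuple (openai_kwargs, hf_generate_kwargs).
    """
    # Arguments in the style of the GitHub package: model.transcribe(audio, **openai_kwargs)
    openai_kwargs = {"language": language, "temperature": temperature}
    # Arguments for the HF pipeline: pipe(sample, generate_kwargs=hf_generate_kwargs).
    # The "<|xx|>" token format is copied from the pipeline example in this thread.
    hf_generate_kwargs = {"language": f"<|{language}|>", "temperature": temperature}
    return openai_kwargs, hf_generate_kwargs


openai_kwargs, hf_generate_kwargs = whisper_settings(language="fr", temperature=0.0)
```

The two dicts can then be passed as `model.transcribe(audio, **openai_kwargs)` and `pipe(sample, generate_kwargs=hf_generate_kwargs)` respectively.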
Can I use generate_kwargs in the HF API?
Which API, @Superchik? For the pipeline, you can use the structure defined above. For feature_extractor + model, you can simply set language="fr" when you call model.generate:
model.generate(input_features, language="fr", task="transcribe")
See the following doc for more details: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperForConditionalGeneration.forward.example
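For context, the feature_extractor + model route might look roughly like the sketch below. The function names `transcribe_with_generate` and `demo` are illustrative; `demo` assumes the same checkpoint and dummy dataset used earlier in this thread, downloads the model when called, and is deliberately not invoked here. Any extra keyword arguments are forwarded unchanged to `model.generate`.

```python
def transcribe_with_generate(model, processor, audio_array, sampling_rate, **generate_kwargs):
    """Extract features, call model.generate with the given kwargs, and decode the first result."""
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features, **generate_kwargs)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


def demo():
    """Example usage; requires transformers, datasets and torch, and downloads the checkpoint."""
    from datasets import load_dataset
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

    ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
    audio = ds[0]["audio"]

    return transcribe_with_generate(
        model,
        processor,
        audio["array"],
        audio["sampling_rate"],
        language="fr",
        task="transcribe",
    )
```

Keeping the generate call behind a small function makes it easy to rerun the same audio with different kwargs and compare the outputs.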
import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"
headers = {"Authorization": "Bearer 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"}

def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("sample1.flac")