--- license: mit datasets: - mozilla-foundation/common_voice_17_0 language: - ru base_model: - openai/whisper-large-v3-turbo pipeline_tag: automatic-speech-recognition metrics: - accuracy library_name: transformers tags: - call --- ### This model whas trained with two A100 40 GB, 128 GB RAM and 2 x Xeon 48 Core 2.4 GHz - Time spent ~ 7 hours - Count of train dataset - 118k of audio samples from Mozilla Common Voice 17 --- Example of usage ```python from transformers import pipeline import gradio as gr import time pipe = pipeline( model="dvislobokov/whisper-large-v3-turbo-russian", tokenizer="dvislobokov/whisper-large-v3-turbo-russian", task='automatic-speech-recognition', device='cpu' ) def transcribe(audio): start = time.time() text = pipe(audio, return_timestamps=True)['text'] print(time.time() - start) return text iface = gr.Interface( fn=transcribe, inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'), outputs='text' ) iface.launch(share=True) ```