whisper tiny fine-tuned on a very large collection of Vietnamese speech datasets

TODO:

21k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2)
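
To make the numbers above concrete, here is a small arithmetic sketch. It assumes that "21k" means exactly 21,000 optimizer steps, that "16×2" means a per-device batch of 16 on each of the 2 GPUs, and that no gradient accumulation is used. None of this is stated explicitly above, so treat the results as approximate.

```python
# Assumed values read off the training summary (see caveats in the text above).
total_steps = 21_000        # "21k steps", taken as exactly 21,000
warmup_percent = 5          # "warm-up 5%"
per_device_batch = 16       # assumed: 16 samples per GPU
num_devices = 2             # Kaggle free T4×2

# Integer arithmetic avoids floating-point rounding surprises.
warmup_steps = total_steps * warmup_percent // 100
effective_batch = per_device_batch * num_devices
samples_seen = total_steps * effective_batch  # assumes no gradient accumulation

print(f"warm-up steps:   {warmup_steps}")
print(f"effective batch: {effective_batch}")
print(f"samples seen:    {samples_seen}")
```

Under these assumptions the warm-up covers the first 1,050 steps and each step processes 32 samples.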

WER manually evaluated on the Vietnamese part of each test set:

| model @ float16 | CommonVoice v16.1 | FLEURS | VIVOS |
|---|---|---|---|
| original whisper-tiny | >100% | 88.6% | 62.5% |
| this model | 26.6% | 37.1% | 18.7% |
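
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a minimal illustration, not the evaluation script from the repo, which may use a dedicated library and its own text normalization. The function name `word_error_rate` and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, for illustration only.
exact = word_error_rate("xin chào việt nam", "xin chào việt nam")
partial = word_error_rate("a b c d", "a x c")
overlong = word_error_rate("xin chào", "xin chao a b c")
print(exact, partial, overlong)
```

Because insertions count as errors while the denominator is the reference length, a model that outputs many extra words can score above 100%, which is how the ">100%" entry in the table is possible.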

all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi

usage example:

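import torch
from transformers import pipeline

# load the fine-tuned checkpoint in half precision on the first GPU
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
# force Vietnamese transcription (no language auto-detection, no translation)
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}

# the pipeline returns a dict; the transcript is under "text"
PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"]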
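
The usage example above assumes a CUDA GPU. The sketch below is one possible way to fall back to CPU with float32 when no GPU is available, since half precision is often slow or unsupported on CPU. The helper names `choose_device_and_dtype` and `load_pipe` are hypothetical and not part of this model or of `transformers`; the WER figures in the table were measured at float16 and may differ slightly at float32.

```python
def choose_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Pick a device string and a torch dtype name (hypothetical helper)."""
    if cuda_available:
        return "cuda:0", "float16"  # same settings as the usage example
    return "cpu", "float32"         # assumption: float32 is the safer CPU default


def load_pipe(model_id: str = "doof-ferb/whisper-tiny-vi"):
    """Build the ASR pipeline on whatever hardware is present."""
    # Imported lazily so the helper above can be used without torch installed.
    import torch
    from transformers import pipeline

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
    return pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        device=device,
        torch_dtype=getattr(torch, dtype_name),
    )


print(choose_device_and_dtype(True))
print(choose_device_and_dtype(False))
```

With this in place, `load_pipe()("audio.mp3", generate_kwargs={"language": "vi", "task": "transcribe"})["text"]` would behave like the example above on a GPU machine and still run on a CPU-only one.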

model size: 37.8M params (safetensors, tensor type F32)