Introduction
- We release a new model for the Vietnamese speech recognition task.
- We fine-tuned openai/whisper-medium on our new dataset VSV-1100.
Training data
VSV-1100 | T2S* | CMV14-vi | VIVOS | VLSP2021 | Total |
---|---|---|---|---|---|
1100 hours | 11 hours | 3.04 hours | 13.94 hours | 180 hours | 1308 hours |
* We use a text-to-speech model to generate sentences containing words that do not appear in our dataset.
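The Total column is the per-source hours summed and rounded to the nearest hour. As a quick sanity check, the arithmetic can be reproduced with the sketch below; the dictionary keys simply mirror the table headers, and the per-source share is an illustrative extra rather than a figure reported by the authors.

```python
# Hours per source, copied from the training data table above.
hours = {
    "VSV-1100": 1100.0,
    "T2S": 11.0,
    "CMV14-vi": 3.04,
    "VIVOS": 13.94,
    "VLSP2021": 180.0,
}

total = sum(hours.values())
print(f"{total:.2f} hours")  # 1307.98 hours, reported as 1308 in the table

# Illustrative only: share of each source in the training mix, in percent.
shares = {name: 100 * h / total for name, h in hours.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```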
WER results (%)
CMV14-vi | VIVOS | VLSP2020-T1 | VLSP2020-T2 | VLSP2021-T1 | VLSP2021-T2 | Bud500 |
---|---|---|---|---|---|---|
8.1 | 4.69 | 13.22 | 28.76 | 11.78 | 8.28 | 5.38 |
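For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions and insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal pure-Python version; the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization, which may differ from the evaluation setup used to produce the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
wer = word_error_rate("xin chào các bạn", "xin chào bạn")
print(f"{100 * wer:.2f}")  # 25.00
```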
Usage
Inference
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
# load model and processor
processor = WhisperProcessor.from_pretrained("NhutP/ViWhisper-medium")
model = WhisperForConditionalGeneration.from_pretrained("NhutP/ViWhisper-medium")
model.config.forced_decoder_ids = None
# load a sample
array, sampling_rate = librosa.load("path_to_audio", sr=16000)  # resample to 16 kHz, the rate Whisper expects
input_features = processor(array, sampling_rate=sampling_rate, return_tensors="pt").input_features
# generate token ids
predicted_ids = model.generate(input_features)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
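Whisper works on 30-second windows, so the snippet above is best suited to short clips. For longer recordings, one simple option is to split the waveform into 30-second pieces and transcribe each piece in turn. The helper below is a minimal sketch of that idea; `split_into_chunks` is a hypothetical name, and a naive split like this can cut a word at a chunk boundary, which the `chunk_length_s` option of the pipeline handles more gracefully.

```python
def split_into_chunks(samples, sampling_rate=16000, chunk_seconds=30):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks.

    Works with Python lists and NumPy arrays; the last chunk may be shorter.
    """
    if sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sampling_rate and chunk_seconds must be positive")
    chunk_size = int(sampling_rate * chunk_seconds)
    return [
        samples[start:start + chunk_size]
        for start in range(0, len(samples), chunk_size)
    ]


# Stand-in for a 65-second recording at 16 kHz (silence, for illustration).
fake_audio = [0.0] * (16000 * 65)
chunks = split_into_chunks(fake_audio)
print(len(chunks))  # 3

# With the objects from the snippet above, each chunk could then be
# transcribed like this (not executed here):
# for chunk in split_into_chunks(array, sampling_rate):
#     features = processor(chunk, sampling_rate=sampling_rate, return_tensors="pt").input_features
#     ids = model.generate(features)
#     print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```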
Use with pipeline
from transformers import pipeline
pipe = pipeline(
"automatic-speech-recognition",
model="NhutP/ViWhisper-medium",
max_new_tokens=128,
chunk_length_s=30,
return_timestamps=False,
device="cpu",  # or "cuda" to run on a GPU
)
output = pipe(path_to_audio_samplingrate_16000)['text']
Citation
@misc{VSV-1100,
author = {Pham Quang Nhut and Duong Pham Hoang Anh and Nguyen Vinh Tiep},
title = {VSV-1100: Vietnamese social voice dataset},
url = {https://github.com/NhutP/VSV-1100},
year = {2024}
}
If you find our project useful, please also give us a star on GitHub: https://github.com/NhutP/ViWhisper
Contact me at: [email protected] (Pham Quang Nhut)