Haitian Speech-to-Text Model
Overview
This repository contains a Whisper automatic speech recognition (ASR) model fine-tuned for Haitian Creole. The model is hosted on the Hugging Face Model Hub and can be loaded directly with the transformers library.
Performance
The model achieved a Word Error Rate (WER) of 0.19126 (about 19.1%), which corresponds to roughly 19 word-level errors (substitutions, deletions, or insertions) per 100 words of reference transcript.
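To make the metric concrete, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions. This is not necessarily the evaluation script used for this model, and it applies no text normalization (casing, punctuation) beyond whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only (not taken from the evaluation data):
# one substituted word out of six reference words.
reference = "bonjou tout moun kijan nou ye"
hypothesis = "bonjou tout moun kijan ou ye"
print(word_error_rate(reference, hypothesis))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A corpus-level score is usually obtained by summing errors and reference words over all utterances rather than averaging per-utterance rates.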
Training
The model was trained with a learning rate of 1e-5.
Usage
You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# load model and processor
processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")

# path to the audio file to transcribe
sample_path = "path/to/audio.wav"

# load audio file using torchaudio; waveform has shape (channels, samples)
waveform, sample_rate = torchaudio.load(sample_path)

# resample if needed (Whisper model requires 16kHz)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000

# ensure mono channel by averaging across channels
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# process audio using Whisper processor
input_features = processor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt").input_features

# generate token ids
predicted_ids = model.generate(input_features)

# decode token ids to text (returns a list with one string per input)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
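Whisper operates on 30-second windows, and by default the Whisper feature extractor is understood to pad or truncate its input to that length, so the example above would transcribe only the first 30 seconds of a longer recording. The following is a minimal sketch of one way to split a longer waveform into windows by sample index. The helper name `chunk_bounds` and its parameters are hypothetical and are not part of transformers or torchaudio.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 16000,
                 chunk_seconds: float = 30.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices covering the audio in fixed windows."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be at least 1 sample")
    # The last window may be shorter than chunk_len.
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 70-second recording at 16 kHz yields two full windows and a 10-second tail.
for start, end in chunk_bounds(70 * 16000):
    print(start, end)
```

Each slice `waveform[:, start:end]` could then be passed through the processor, `generate`, and `batch_decode` steps shown above, with the resulting strings joined in order. Cutting at fixed boundaries can split a word across two windows, so overlapping windows or silence-based splitting may give cleaner transcripts.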