Haitian Speech-to-Text Model

Overview

This repository contains a Whisper ASR (Automatic Speech Recognition) model fine-tuned for the Haitian (Haitian Creole) language. The model is hosted on the Hugging Face Model Hub and can be loaded directly with the transformers library.

Performance

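The model achieved a Word Error Rate (WER) of 0.19126, which corresponds to roughly 19 word-level errors (substitutions, deletions, and insertions) per 100 reference words when transcribing spoken Haitian to written text. Lower WER is better.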
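To make the metric concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name word_error_rate and the example sentences are illustrative assumptions, not the evaluation code or data used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Made-up example: one substituted word out of four reference words.
print(word_error_rate("mwen renmen manje diri", "mwen renmen manje pen"))  # 0.25
```

Under this definition, a WER of 0.19126 means the total number of word edits is about 19% of the number of reference words. Reported scores also depend on text normalization (casing, punctuation), which this sketch does not apply.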

Training

The model was trained with a learning rate of 1e-5.

Usage

You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# load model and processor
processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")

# path to the audio file to transcribe
sample_path = "path/to/audio.wav"
# load the audio with torchaudio; waveform has shape (channels, samples)
waveform, sample_rate = torchaudio.load(sample_path)

# resample if needed (Whisper model requires 16kHz)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000

# ensure mono channel
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# extract input features; pass a 1-D array so the processor treats it as a single utterance
input_features = processor(waveform.squeeze(0).numpy(), sampling_rate=sample_rate, return_tensors="pt").input_features

# generate token ids
predicted_ids = model.generate(input_features)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
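
The example above resamples to 16 kHz and downmixes to mono when needed. As a small sketch of checking a file up front, the helper below reads only the WAV header with Python's standard wave module and reports whether those two steps would apply. The name inspect_wav and the returned keys are hypothetical, and the wave module handles only uncompressed PCM WAV files.

```python
import wave

TARGET_SAMPLE_RATE = 16000  # rate the usage example resamples to


def inspect_wav(path: str) -> dict:
    """Read the WAV header and report which preprocessing steps apply."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_seconds": frames / sample_rate,
        "needs_resampling": sample_rate != TARGET_SAMPLE_RATE,
        "needs_downmix": channels > 1,
    }


# Usage (hypothetical path):
# info = inspect_wav("path/to/audio.wav")
```

For a 44.1 kHz stereo recording, both needs_resampling and needs_downmix would be True, matching the two conditional branches in the example above.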