metadata
license: cc-by-nc-4.0
base_model: facebook/mms-1b-all
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: mms-lug
results: []
datasets:
- Sunbird/salt
language:
- lg
- en
- ach
- teo
- lgg
- nyn
MMS speech recognition for Ugandan languages
This is a fine-tuned version of facebook/mms-1b-all for Ugandan languages, trained with the SALT dataset. The languages supported are:
| Code | Language |
|---|---|
| lug | Luganda |
| ach | Acholi |
| lgg | Lugbara |
| teo | Ateso |
| nyn | Runyankole |
| eng | English (Ugandan) |
Each non-English language has two adapters: one optimised for speech that is only in that language (for example, lug), and another for speech in which code-switching with English is expected (for example, lug+eng).
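The adapter naming scheme can be sketched as a small helper. This is only an illustration: the function `adapter_name` and the constant `AVAILABLE_ADAPTERS` are hypothetical names, not part of the model or of `transformers`, and the list of adapters is copied from the usage example in this card.

```python
# Adapter names as listed in the usage example of this model card.
AVAILABLE_ADAPTERS = [
    'lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
    'nyn', 'nyn+eng', 'teo', 'teo+eng',
]


def adapter_name(language: str, code_switching: bool = False) -> str:
    """Return the adapter name for a language code (hypothetical helper).

    With code_switching=True the '+eng' variant is selected, which is the
    adapter intended for speech that mixes the language with English.
    """
    name = f'{language}+eng' if code_switching else language
    if name not in AVAILABLE_ADAPTERS:
        raise ValueError(
            f'No adapter named {name!r}; choose from {AVAILABLE_ADAPTERS}')
    return name


print(adapter_name('lug'))                       # monolingual Luganda
print(adapter_name('ach', code_switching=True))  # Acholi mixed with English
```

The returned string is what would be passed to both `model.load_adapter(...)` and `processor.tokenizer.set_target_lang(...)`, so that the model weights and the tokenizer vocabulary stay in sync.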
Usage
Usage is the same as for the base model, but with a different set of adapters available.
import torch
import transformers
import datasets
# Available adapters:
# ['lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
# 'nyn', 'nyn+eng', 'teo', 'teo+eng']
language = 'lug'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = transformers.Wav2Vec2ForCTC.from_pretrained(
    'Sunbird/asr-mms-salt').to(device)
model.load_adapter(language)
processor = transformers.Wav2Vec2Processor.from_pretrained(
    'Sunbird/asr-mms-salt')
processor.tokenizer.set_target_lang(language)
# Get some test audio
ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
audio = ds[0]['audio']
sample_rate = ds[0]['sample_rate']
# Apply the model
inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs.to(device)).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
print(transcription)
# ekikola ky'akasooli kyakyenvu wabula langi yakyo etera okuba eyaakitaka wansi
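The argmax-then-decode step in the example above is greedy CTC decoding: the most likely token is taken at each frame, repeated tokens are collapsed, and blank tokens are dropped. The following is a minimal pure-Python sketch of that idea, not the tokenizer's actual implementation. The toy vocabulary, the blank id of 0 and the `|` word delimiter are assumptions chosen for illustration.

```python
def ctc_greedy_decode(ids, vocab, blank_id=0, word_delimiter='|'):
    """Collapse repeated ids, drop blanks, and join tokens into text.

    ids:   per-frame token ids (e.g. the argmax over the logits)
    vocab: mapping from token id to token string (toy example here)
    """
    tokens = []
    prev = None
    for i in ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank.
        if i != prev and i != blank_id:
            tokens.append(vocab[i])
        prev = i
    return ''.join(tokens).replace(word_delimiter, ' ').strip()


# Toy vocabulary (hypothetical, not the model's real vocabulary).
toy_vocab = {0: '<pad>', 1: '|', 2: 'a', 3: 'b'}
frame_ids = [2, 2, 0, 2, 3, 3, 1, 1, 3, 0]
print(ctc_greedy_decode(frame_ids, toy_vocab))  # aab b
```

Note that the blank between the two `a` frames is what allows a doubled letter to survive the collapsing step.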
The output of this model is unpunctuated and lower case. For applications requiring formatted text, an alternative model is Sunbird/asr-whisper-large-v2-salt.
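Because the output is unpunctuated and lowercase, reference transcripts usually need the same normalisation before word error rate (WER) is computed, otherwise casing and punctuation count as errors. Below is a minimal, dependency-free sketch; the helper names `normalize` and `wer` are hypothetical, and the choice to keep apostrophes (which appear in the model's output, as in "ky'akasooli") is an assumption about what a fair comparison looks like.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip punctuation, keeping apostrophes inside words."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", ' ', text)
    return ' '.join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError('reference must contain at least one word')
    # Dynamic-programming edit distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Ekikola ky'akasooli, kyakyenvu."
hypothesis = "ekikola ky'akasooli kyakyenvu"
print(wer(reference, hypothesis))  # 0.0
```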