---
license: cc-by-nc-4.0
base_model: facebook/mms-1b-all
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: mms-lug
    results: []
datasets:
  - Sunbird/salt
language:
  - lg
  - en
  - ach
  - teo
  - lgg
  - nyn
---

# MMS speech recognition for Ugandan languages

This is a fine-tuned version of [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all) for Ugandan languages, trained on the [SALT](https://huggingface.co/datasets/Sunbird/salt) dataset. The supported languages are:

| Code | Language          |
|------|-------------------|
| lug  | Luganda           |
| ach  | Acholi            |
| lgg  | Lugbara           |
| teo  | Ateso             |
| nyn  | Runyankole        |
| eng  | English (Ugandan) |

For each language there are two adapters: one optimised for speech entirely in that language, and another for speech where code-switching with English is expected (see the switching example after the usage code below).

## Usage

Usage is the same as for the base model, though with different adapters available.

```python
import torch
import transformers
import datasets

# Available adapters:
# ['lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
#  'nyn', 'nyn+eng', 'teo', 'teo+eng']
language = 'lug'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = transformers.Wav2Vec2ForCTC.from_pretrained(
    'Sunbird/asr-mms-salt').to(device)
model.load_adapter(language)

processor = transformers.Wav2Vec2Processor.from_pretrained(
    'Sunbird/asr-mms-salt')
processor.tokenizer.set_target_lang(language)

# Get some test audio
ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
audio = ds[0]['audio']
sample_rate = ds[0]['sample_rate']

# Apply the model
inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs.to(device)).logits

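# Greedy CTC decoding: take the most likely token at each frame;
# processor.decode() collapses repeats and removes blank tokens.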
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)

print(transcription)
# ekikola ky'akasooli kyakyenvu wabula langi yakyo etera okuba eyaakitaka wansi
```
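
To transcribe speech where English code-switching is expected, load the corresponding `+eng` adapter instead. A minimal sketch, reusing the `model` and `processor` objects created above:

```python
# Switch to the adapter trained for Luganda with English code-switching.
language = 'lug+eng'
model.load_adapter(language)
processor.tokenizer.set_target_lang(language)
# The processing and decoding steps are then the same as above.
```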

The output of this model is unpunctuated and lower case. For applications that require formatted text, an alternative is [Sunbird/asr-whisper-large-v2-salt](https://huggingface.co/Sunbird/asr-whisper-large-v2-salt).
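
Like the base model, this checkpoint expects 16 kHz mono audio. If your own recordings use a different sample rate, resample them before calling the processor. A minimal sketch using torchaudio, where `my_audio.wav` is a placeholder for a local file:

```python
import torchaudio

# Load a local recording (placeholder path) and mix down to mono.
waveform, orig_rate = torchaudio.load('my_audio.wav')
waveform = waveform.mean(dim=0)

# Resample to the 16 kHz rate that MMS models expect.
audio_16k = torchaudio.transforms.Resample(orig_rate, 16_000)(waveform)

inputs = processor(audio_16k.numpy(), sampling_rate=16_000, return_tensors='pt')
```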