---
language:
- ur
library_name: nemo
datasets:
- mozilla-foundation/common_voice_12_0
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- FastConformer
- Conformer
- pytorch
- NeMo
license: cc-by-4.0
widget:
- example_title: Common Voice Urdu Sample
  src: https://cdn-media.huggingface.co/speech_samples/sample_urdu.flac
model-index:
- name: parakeet-rnnt-0.6b-urdu
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 12.0 (Urdu)
      type: mozilla-foundation/common_voice_12_0
      split: test
      args:
        language: ur
    metrics:
    - name: Test WER
      type: wer
      value: 25.513
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Fine-Tuned Parakeet RNNT 0.6B (Urdu)

This repository contains a fine-tuned version of the **Parakeet RNNT 0.6B** model for **Urdu** Automatic Speech Recognition (ASR). The base model, developed jointly by **NVIDIA NeMo** and **Suno.ai**, transcribes English speech; it was fine-tuned on the Urdu subset of Mozilla's Common Voice 12.0 so that it can perform speech-to-text in Urdu.

---

## Model Overview

The **Parakeet RNNT** is an XL version of the FastConformer Transducer with **600 million parameters**, optimized for ASR tasks. The fine-tuned model supports Urdu transcription, enabling applications such as subtitling, speech analytics, and voice-assisted interfaces.

Base model details can be found on 🤗 [Hugging Face](https://huggingface.co/nvidia/parakeet-rnnt-0.6b).

---

## Training Details

### Dataset
The fine-tuning was performed using the **Urdu subset** of Mozilla's [Common Voice 12.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0). This crowd-sourced dataset provides diverse Urdu speech samples recorded by many different speakers.

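The card does not describe how the Common Voice clips were prepared for NeMo. NeMo's ASR training scripts read data from JSON-lines manifests with one `audio_filepath`, `duration` (in seconds), and `text` entry per utterance, so a minimal sketch of that step might look like the following. It assumes the Common Voice MP3 clips have already been converted to WAV and that their durations are known; `manifest_lines` and `write_manifest` are hypothetical helper names, not part of NeMo.

```python
import json
from typing import Iterable, List, Tuple


def manifest_lines(records: Iterable[Tuple[str, float, str]]) -> List[str]:
    """Convert (audio_filepath, duration_seconds, transcript) records into manifest lines."""
    lines = []
    for audio_filepath, duration, text in records:
        entry = {"audio_filepath": audio_filepath, "duration": float(duration), "text": text}
        # ensure_ascii=False keeps the Urdu transcript human-readable in the file
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


def write_manifest(records: Iterable[Tuple[str, float, str]], manifest_path: str) -> None:
    """Write one JSON object per line (JSON-lines), the layout NeMo ASR manifests use."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for line in manifest_lines(records):
            f.write(line + "\n")
```

A file written this way could then be passed as `model.train_ds.manifest_filepath` (or the matching validation/test key) in a NeMo fine-tuning config.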
### Hardware
- **Google Colab Pro**
- **NVIDIA A100 GPU**
- Fine-tuning duration: **5 hours**
- GPU utilization: ~25%

---

## Results

The model achieved a **Word Error Rate (WER)** of **25.513%** on the test split of the Common Voice Urdu dataset. While this may seem high, many individual transcriptions are exact or near-exact matches:

- **Reference**: کچھ بھی ہو سکتا ہے۔
- **Predicted**: کچھ بھی ہو سکتا ہے۔
- **English**: Anything can happen.

---

- **Reference**: اورکوئی جمہوریت کو کوس رہا ہے۔
- **Predicted**: اور کوئ جمہوریت کو کو س رہا ہے۔
- **English**: And someone is cursing democracy.

In the second example, the differences are in spacing and spelling rather than content: the reference joins اور and کوئی without a space, the prediction drops the final ی from کوئی, and it splits کوس into کو س. Word-level WER still counts each of these as an error.

This WER is slightly higher than the **23%** that OpenAI's **Whisper** model achieved without fine-tuning ([reference](https://arxiv.org/html/2409.11252v1)), but it shows the potential of Parakeet RNNT with further fine-tuning.

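WER is the word-level edit distance (substitutions + deletions + insertions) between the prediction and the reference, divided by the number of reference words; a corpus-level figure such as the 25.513% above sums the edits over all utterances and divides by the total number of reference words. The minimal sketch below computes per-utterance WER in plain Python for the two examples above. It splits on whitespace and applies no text normalization, so it is an illustration of the metric, not necessarily the exact scoring used for the reported result.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, deletions, insertions)."""
    # prev[j] = distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


examples = [
    ("کچھ بھی ہو سکتا ہے۔", "کچھ بھی ہو سکتا ہے۔"),
    ("اورکوئی جمہوریت کو کوس رہا ہے۔", "اور کوئ جمہوریت کو کو س رہا ہے۔"),
]
# Expected: 0.000 for the first pair, 0.667 (4 edits / 6 reference words) for the second
for reference, hypothesis in examples:
    print(f"WER = {wer(reference, hypothesis):.3f}")
```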
---

## How to Use this Model

### Loading the Model

You can load the fine-tuned model using NVIDIA NeMo:

```python
import nemo.collections.asr as nemo_asr

# Downloads the checkpoint from the Hugging Face Hub and restores the model
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="hash2004/parakeet-fine-tuned-urdu")
```
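After loading, audio can be transcribed with the model's `transcribe()` method, which takes a list of audio file paths. Its return type has changed between NeMo releases (plain strings in some, `Hypothesis` objects with a `.text` field in newer ones, and a `(best_hypotheses, all_hypotheses)` tuple for RNNT models in some older ones), so the hypothetical helper below normalizes these shapes to a list of strings. It is a sketch, not part of NeMo's API; `sample_urdu.wav` is a placeholder for a local 16 kHz mono WAV recording, which is the input format the base model card specifies.

```python
from typing import Any, List, Sequence


def transcribe_files(model: Any, audio_paths: Sequence[str]) -> List[str]:
    """Transcribe audio files with a loaded NeMo ASR model and return plain strings."""
    output = model.transcribe(list(audio_paths))
    # Some older NeMo releases return (best_hypotheses, all_hypotheses) for RNNT models
    if isinstance(output, tuple):
        output = output[0]
    # Newer releases return Hypothesis objects with a .text attribute; others return str
    return [h.text if hasattr(h, "text") else str(h) for h in output]


# Example (requires NeMo, the asr_model loaded above, and a local audio file):
# texts = transcribe_files(asr_model, ["sample_urdu.wav"])
# print(texts[0])
```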