hash2004 committed on
Commit
220e6bc
·
verified ·
1 Parent(s): 52bb174

Update README.md

Files changed (1)
  1. README.md +92 -3
README.md CHANGED
@@ -1,3 +1,92 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - ur
4
+ library_name: nemo
5
+ datasets:
6
+ - mozilla-foundation/common_voice_12_0
7
+ thumbnail: null
8
+ tags:
9
+ - automatic-speech-recognition
10
+ - speech
11
+ - audio
12
+ - Transducer
13
+ - FastConformer
14
+ - Conformer
15
+ - pytorch
16
+ - NeMo
17
+ license: cc-by-4.0
18
+ widget:
19
+ - Title: Common Voice Urdu Sample
20
+ src: https://cdn-media.huggingface.co/speech_samples/sample_urdu.flac
21
+ model-index:
22
+ - name: parakeet-rnnt-0.6b-urdu
23
+ results:
24
+ - task:
25
+ name: Automatic Speech Recognition
26
+ type: automatic-speech-recognition
27
+ dataset:
28
+ name: Mozilla Common Voice 12.0 (Urdu)
29
+ type: mozilla-foundation/common_voice_12_0
30
+ split: test
31
+ args:
32
+ language: ur
33
+ metrics:
34
+ - name: Test WER
35
+ type: wer
36
+ value: 25.513
37
+ metrics:
38
+ - wer
39
+ pipeline_tag: automatic-speech-recognition
40
+ ---
41
+ # Fine-Tuned Parakeet RNNT 0.6B (Urdu)
42
+
43
+ This repository contains a fine-tuned version of the **Parakeet RNNT 0.6B** model for **Urdu** Automatic Speech Recognition (ASR). The base model, developed by **NVIDIA NeMo** and **Suno.ai**, was fine-tuned on the Urdu subset of Mozilla's Common Voice 12.0, adapting it to Urdu speech-to-text transcription.
44
+
45
+ ---
46
+
47
+ ## Model Overview
48
+
49
+ The **Parakeet RNNT** is an XL version of the FastConformer Transducer with **600 million parameters**, optimized for ASR tasks. The fine-tuned model supports Urdu transcription, enabling applications such as subtitling, speech analytics, and voice-assisted interfaces.
50
+
51
+ Base model details can be found on 🤗 [Hugging Face](https://huggingface.co/nvidia/parakeet-rnnt-0.6b).
52
+
53
+ ---
54
+
55
+ ## Training Details
56
+
57
+ ### Dataset
58
+ Fine-tuning used the **Urdu subset** of Mozilla's [Common Voice 12.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0), which provides Urdu speech samples recorded by a range of volunteer speakers.
59
+
60
+ ### Hardware
61
+ - **Google Colab Pro**
62
+ - **NVIDIA A100 GPU**
63
+ - Fine-tuning duration: **5 hours**
64
+ - GPU utilization: ~25%
65
+
66
+ ---
67
+
68
+ ## Results
69
+
70
+ The model achieved a **Word Error Rate (WER)** of **25.513%** on the test split of the Common Voice Urdu dataset. Although this may seem high, many transcriptions are exact or differ from the reference only in spacing or spelling, as in the following examples:
71
+
72
+ - **Reference**: کچھ بھی ہو سکتا ہے۔
73
+ **Predicted**: کچھ بھی ہو سکتا ہے۔
74
+
75
+ ---
76
+
77
+ - **Reference**: اورکوئی جمہوریت کو کوس رہا ہے۔
78
+ **Predicted**: اور کوئ جمہوریت کو کو س رہا ہے۔
79
+
80
+ This WER is slightly higher than that of OpenAI's **Whisper** model, which reportedly achieved **23%** without fine-tuning ([reference](https://arxiv.org/html/2409.11252v1)), but it suggests that Parakeet RNNT could improve with further fine-tuning.
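
To make the WER figures above more concrete, here is a minimal sketch of how a word-level error rate can be computed for the two example pairs. It assumes whitespace tokenization and a plain Levenshtein alignment over words. The helper names `word_edit_distance` and `corpus_wer` are hypothetical, and this is not necessarily the scoring code used to obtain the reported 25.513%.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] is the distance between the reference words consumed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words over all pairs."""
    edits = 0
    ref_words = 0
    for reference, predicted in pairs:
        ref_tokens = reference.split()  # assumption: whitespace tokenization
        edits += word_edit_distance(ref_tokens, predicted.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("at least one reference word is required")
    return edits / ref_words


# The two (reference, predicted) pairs quoted in the Results section.
EXAMPLES = [
    ("کچھ بھی ہو سکتا ہے۔", "کچھ بھی ہو سکتا ہے۔"),
    ("اورکوئی جمہوریت کو کوس رہا ہے۔", "اور کوئ جمہوریت کو کو س رہا ہے۔"),
]

if __name__ == "__main__":
    for reference, predicted in EXAMPLES:
        print(corpus_wer([(reference, predicted)]))
    print(corpus_wer(EXAMPLES))
```

Under these assumptions the first pair scores a WER of 0, while the second pair counts four word edits against six reference words even though the differences are mostly word spacing. This illustrates how a word-level metric can penalize segmentation differences in Urdu heavily.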
81
+
82
+ ---
83
+
84
+ ## How to Use this Model
85
+
86
+ ### Loading the Model
87
+
88
+ You can load the fine-tuned model using NVIDIA NeMo:
89
+
90
+ ```python
91
+ import nemo.collections.asr as nemo_asr
92
+ asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="hash2004/parakeet-fine-tuned-urdu")