huseinzol05 committed
Commit 0631e21 · 1 parent: 9a975b9
Create README.md
README.md
ADDED
@@ -0,0 +1,18 @@
+---
+language:
+- ms
+- en
+---
+
+# Malaysian Finetune Whisper Small
+
+Whisper Small fine-tuned on the following Malaysian datasets:
+1. IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA-STT
+2. Pseudolabel Malaysian YouTube videos, https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3
+3. Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus
+4. Haqkiem TTS Dataset, which is private, but you can request access from https://www.linkedin.com/in/haqkiem-daim/
+5. Pseudolabel Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara-audiobook
+
+Fine-tuning script: https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper
+
+Wandb project: https://wandb.ai/huseinzol05/malaysian-whisper-small?workspace=user-huseinzol05
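For readers who want to fetch the public datasets listed in the README programmatically, the dataset URLs can be reduced to Hub repository IDs. The following is a minimal sketch using only the standard library. The helper name `dataset_repo_id` is hypothetical and not part of the original README, and the sketch assumes the `https://huggingface.co/datasets/<owner>/<name>` URL layout seen in the list.

```python
from urllib.parse import urlparse

# Public dataset URLs copied from the README's list. The private Haqkiem TTS
# dataset is left out because the README gives no Hub URL for it.
DATASET_URLS = [
    "https://huggingface.co/datasets/mesolitica/IMDA-STT",
    "https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3",
    "https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus",
    "https://huggingface.co/datasets/mesolitica/nusantara-audiobook",
]


def dataset_repo_id(url: str) -> str:
    """Return '<owner>/<name>' from a Hub dataset URL (hypothetical helper)."""
    # Assumed layout: /datasets/<owner>/<name>
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hub dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


# Repo IDs in the same order as the README list.
REPO_IDS = [dataset_repo_id(u) for u in DATASET_URLS]
```

Each ID could then be passed to a downloader such as `huggingface_hub.snapshot_download(repo_id=..., repo_type="dataset")`. Whether a given dataset downloads without extra configuration or access approval is not stated in the README.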