suzii committed on
Commit 1e9075e · verified · 1 Parent(s): 5128dd8

Update README.md

  1. README.md +2 -41
README.md CHANGED
@@ -7,56 +7,17 @@ This project involves fine-tuning the Whisper-V3-Turbo model to improve its perf
  The training data comes from various Vietnamese speech corpora. Below is a list of datasets used for training:
 
  1. **capleaf/viVoice**
- - Path: `capleaf/viVoice`
- - Mode: `3`
- - Split: `train`
-
  2. **NhutP/VSV-1100**
- - Path: `NhutP/VSV-1100`
- - Mode: `1`
- - Split: `train`
-
  3. **doof-ferb/fpt_fosd**
- - Path: `doof-ferb/fpt_fosd`
- - Mode: `0`
- - Split: `train`
-
  4. **doof-ferb/infore1_25hours**
- - Path: `doof-ferb/infore1_25hours`
- - Mode: `0`
- - Split: `train`
-
  5. **google/fleurs (vi_vn)**
- - Path: `google/fleurs`
- - Name: `vi_vn`
- - Mode: `1`
- - Split: `train`
-
  6. **doof-ferb/LSVSC**
- - Path: `doof-ferb/LSVSC`
- - Mode: `1`
- - Split: `train`
-
  7. **quocanh34/viet_vlsp**
- - Path: `quocanh34/viet_vlsp`
- - Mode: `0`
- - Split: `train`
-
  8. **linhtran92/viet_youtube_asr_corpus_v2**
- - Path: `linhtran92/viet_youtube_asr_corpus_v2`
- - Mode: `1`
- - Split: `train`
-
  9. **doof-ferb/infore2_audiobooks**
- - Path: `doof-ferb/infore2_audiobooks`
- - Mode: `0`
- - Split: `train`
-
- 10. **linhtran92/viet_bud500**
- - Path: `linhtran92/viet_bud500`
- - Mode: `0`
- - Split: `train`
+ 10. **linhtran92/viet_bud500**
 
+ 11.
  ## Model
 
  The model used in this project is the **Whisper-V3-Turbo**. Whisper is a multilingual ASR model trained on a large and diverse dataset. The version used here has been fine-tuned specifically for the Vietnamese language.
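
The `Path`, `Name`, `Mode`, and `Split` entries removed by this commit read like per-dataset loader settings. As a rough sketch only, they could be captured as data and mapped onto keyword arguments for `datasets.load_dataset` from the Hugging Face `datasets` library. The README does not say how the corpora are actually loaded, so `DatasetSpec`, `load_kwargs`, and `load_all` are hypothetical names introduced here. The meaning of `Mode` is not documented in the diff, so it is carried along as an opaque value and never passed to the loader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetSpec:
    """One entry from the pre-commit README list (hypothetical container)."""
    path: str                   # Hugging Face Hub repository id
    split: str = "train"        # every removed entry used the train split
    name: Optional[str] = None  # configuration name, e.g. "vi_vn" for google/fleurs
    mode: Optional[int] = None  # copied from the old README; its meaning is not documented


# Values transcribed from the lines this commit deleted.
SPECS = [
    DatasetSpec("capleaf/viVoice", mode=3),
    DatasetSpec("NhutP/VSV-1100", mode=1),
    DatasetSpec("doof-ferb/fpt_fosd", mode=0),
    DatasetSpec("doof-ferb/infore1_25hours", mode=0),
    DatasetSpec("google/fleurs", name="vi_vn", mode=1),
    DatasetSpec("doof-ferb/LSVSC", mode=1),
    DatasetSpec("quocanh34/viet_vlsp", mode=0),
    DatasetSpec("linhtran92/viet_youtube_asr_corpus_v2", mode=1),
    DatasetSpec("doof-ferb/infore2_audiobooks", mode=0),
    DatasetSpec("linhtran92/viet_bud500", mode=0),
]


def load_kwargs(spec):
    """Build keyword arguments for datasets.load_dataset; `mode` is left out."""
    kwargs = {"path": spec.path, "split": spec.split}
    if spec.name is not None:
        kwargs["name"] = spec.name
    return kwargs


def load_all(specs, streaming=True):
    """Open each corpus lazily; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported here so the rest runs without it
    return {s.path: load_dataset(streaming=streaming, **load_kwargs(s)) for s in specs}
```

Calling `load_all(SPECS)` would attempt to open all ten corpora in streaming mode. Some repositories may be gated or need extra arguments, which this sketch does not handle.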