Commit a50f503 by grider-withourai (verified) · 1 parent: 153c193

Update README.md

Files changed (1)
  1. README.md +74 -3
README.md CHANGED
@@ -1,3 +1,74 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - ja
+ library_name: nemo
+ pipeline_tag: automatic-speech-recognition
+ ---
+ # ASR Model Card: parakeet-ctc-1.1b-ja
+
+ ## Model Details
+
+ - **Model Name**: parakeet-ctc-1.1b-ja
+ - **Type**: Automatic Speech Recognition (ASR)
+ - **Language**: Japanese
+ - **Framework**: NVIDIA NeMo
+
+ ## Installation
+
+ To use this model, install the NeMo toolkit with its ASR extras (the `[asr]` extra already pulls in the base package, and the quotes stop shells such as zsh from expanding the brackets):
+
+ ```bash
+ pip install "nemo-toolkit[asr]==2.0.0rc0"
+ ```
+
+ ## Usage
+
+ Here's a basic example of how to use the model:
+
+ ```python
+ import nemo.collections.asr as nemo_asr
+
+ # Load the model from a local .nemo checkpoint
+ nemo_model = nemo_asr.models.ASRModel.restore_from("/path/to/parakeet-ja.nemo")
+
+ # Transcribe audio files
+ audio_files = ["path/to/audio1.wav", "path/to/audio2.wav"]
+ transcriptions = nemo_model.transcribe(audio_files)
+
+ # Print transcriptions
+ for audio_file, transcription in zip(audio_files, transcriptions):
+     # Depending on the NeMo version, each result is either a plain string
+     # or a Hypothesis object that stores the transcript in `.text`
+     text = getattr(transcription, "text", transcription)
+     print(f"Transcription for {audio_file}: {text}")
+ ```
+
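Other NVIDIA Parakeet model cards list 16 kHz mono-channel audio as the expected input. This card does not state the input format for parakeet-ctc-1.1b-ja, so the following is only a sketch under that assumption: it screens WAV files with Python's standard `wave` module before they are passed to `transcribe`. `check_wav`, `filter_transcribable`, and the `EXPECTED_*` constants are hypothetical helpers for illustration, not part of NeMo.

```python
import wave

# Assumption: 16 kHz mono input, as listed for other Parakeet checkpoints.
# This card does not state the expected format for parakeet-ctc-1.1b-ja.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of format problems in a WAV file's header (empty if none)."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    return problems


def filter_transcribable(paths):
    """Split paths into (accepted, rejected); rejected maps each path to its problems."""
    accepted, rejected = [], {}
    for path in paths:
        problems = check_wav(path)
        if problems:
            rejected[str(path)] = problems
        else:
            accepted.append(str(path))
    return accepted, rejected
```

With the usage example above, `accepted, rejected = filter_transcribable(audio_files)` followed by `nemo_model.transcribe(accepted)` transcribes only the files that pass the check, while `rejected` records why each skipped file was left out so it can be converted and retried.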
+ ## Limitations
+
+ - This model is trained specifically on Japanese and may not perform well on other languages.
+ - Transcription accuracy may vary with audio quality, background noise, and speaker accent.
+ - The model may struggle with specialized vocabulary or technical terms not encountered during training.
+
+ ## Performance
+
+ The following table compares the NeMo model (parakeet-ctc-1.1b-ja) with Whisper large-v2 and Whisper large-v3 on three Japanese ASR datasets. Word error rate (WER) and character error rate (CER) are given as fractions (for example, 0.1521 corresponds to 15.21%); lower is better.
+
+ | Model | Dataset | WER | CER |
+ |----------------|-----------------------------------|--------|--------|
+ | Whisper large-v2 | japanese-asr/ja_asr.reazonspeech_test | 1.1378 | 0.3472 |
+ | | japanese-asr/ja_asr.jsut_basic5000 | 0.8988 | 0.1063 |
+ | | japanese-asr/ja_asr.common_voice_8_0 | 1.0314 | 0.1594 |
+ | Whisper large-v3 | japanese-asr/ja_asr.reazonspeech_test | 0.9685 | 0.2107 |
+ | | japanese-asr/ja_asr.jsut_basic5000 | 0.9936 | 0.1360 |
+ | | japanese-asr/ja_asr.common_voice_8_0 | 1.0178 | 0.1548 |
+ | NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.reazonspeech_test | 0.7785 | 0.1521 |
+ | | japanese-asr/ja_asr.jsut_basic5000 | 0.9462 | 0.1291 |
+ | | japanese-asr/ja_asr.common_voice_8_0 | 1.0002 | 0.1290 |
+
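Both metric columns are edit-distance ratios, (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to turn a transcript into its reference, and N is the number of reference words (WER) or characters (CER). Because S + D can never exceed N, a value above 1.0, as in some WER cells of the table, means the transcripts contained insertions. Japanese is written without spaces between words, so WER depends heavily on how text is segmented. The card does not say how the evaluation segmented or normalized text, so this minimal sketch only illustrates the metric definitions and is not meant to reproduce the table; the function names and the whitespace handling are illustrative choices.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """(S + D + I) / N, where N is the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    # Character level: each kana, kanji, or other character is one token.
    # All whitespace (including the full-width space) is dropped first,
    # a simple normalization assumed here for illustration.
    return error_rate(list("".join(ref.split())), list("".join(hyp.split())))


def wer(ref, hyp):
    # Word level: tokens are split on whitespace, so an unsegmented
    # Japanese sentence counts as a single "word".
    return error_rate(ref.split(), hyp.split())
```

For example, `cer("今日は晴れです", "今日は雨です")` is 2/7 ≈ 0.29 (one substitution plus one deletion over seven reference characters), while `wer` on the same pair is 1.0 because each unsegmented sentence is a single whitespace-delimited token. This is why CER is usually the more informative of the two columns for Japanese.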
+ ## Ethical Considerations
+
+ - Ensure that you have the necessary permissions and comply with local laws when recording and transcribing audio.
+ - Be aware of potential biases in the model, especially regarding different Japanese dialects or accents.
+ - Consider the privacy implications of transcribing personal or sensitive conversations.
+
+ ## Additional Information
+
+ For more detailed information on using ASR models with the NeMo toolkit, please refer to the [NeMo ASR documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html).