grider-withourai/parakeet-ctc-1.1b-ja

Automatic Speech Recognition

grider-withourai committed on Aug 8, 2024

Commit a50f503 (verified) · 1 Parent(s): 153c193

Update README.md

Files changed (1)

README.md +74 -3

@@ -1,3 +1,74 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- ja
+library_name: nemo
+pipeline_tag: automatic-speech-recognition
+---
+# ASR Model Card: parakeet-ctc-1.1b-ja
+## Model Details
+- **Model Name**: parakeet-ctc-1.1b-ja
+- **Type**: Automatic Speech Recognition (ASR)
+- **Language**: Japanese
+- **Framework**: NVIDIA NeMo
+## Installation
+To use this model, you need to install the NeMo toolkit:
+```bash
+pip install "nemo-toolkit[asr]==2.0.0rc0"
+```
+## Usage
+Here's a basic example of how to use the model:
+```python
+import nemo.collections.asr as nemo_asr
+# Load the model
+nemo_model = nemo_asr.models.ASRModel.restore_from("/path/to/parakeet-ja.nemo")
+# Transcribe audio files
+audio_files = ["path/to/audio1.wav", "path/to/audio2.wav"]
+transcriptions = nemo_model.transcribe(audio_files)
+# Print transcriptions
+for audio_file, transcription in zip(audio_files, transcriptions):
+    print(f"Transcription for {audio_file}: {transcription}")
+```
+## Limitations
+- This model is trained specifically for Japanese and may not perform well on other languages.
+- The accuracy of transcription may vary depending on the audio quality, background noise, and speaker accent.
+- The model may struggle with specialized vocabulary or technical terms not encountered during training.
+## Performance
+The following table compares the NeMo model (parakeet-ctc-1.1b-ja) with Whisper v2 large and Whisper v3 large on three Japanese ASR datasets. WER is word error rate and CER is character error rate; lower is better for both:
+| Model                       | Dataset                               | WER    | CER    |
+|-----------------------------|---------------------------------------|--------|--------|
+| Whisper v2 large            | japanese-asr/ja_asr.reazonspeech_test | 1.1378 | 0.3472 |
+| Whisper v2 large            | japanese-asr/ja_asr.jsut_basic5000    | 0.8988 | 0.1063 |
+| Whisper v2 large            | japanese-asr/ja_asr.common_voice_8_0  | 1.0314 | 0.1594 |
+| Whisper v3 large            | japanese-asr/ja_asr.reazonspeech_test | 0.9685 | 0.2107 |
+| Whisper v3 large            | japanese-asr/ja_asr.jsut_basic5000    | 0.9936 | 0.1360 |
+| Whisper v3 large            | japanese-asr/ja_asr.common_voice_8_0  | 1.0178 | 0.1548 |
+| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.reazonspeech_test | 0.7785 | 0.1521 |
+| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.jsut_basic5000    | 0.9462 | 0.1291 |
+| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.common_voice_8_0  | 1.0002 | 0.1290 |
+## Ethical Considerations
+- Ensure that you have the necessary permissions and comply with local laws when recording and transcribing audio.
+- Be aware of potential biases in the model, especially regarding different Japanese dialects or accents.
+- Consider the privacy implications of transcribing personal or sensitive conversations.
+## Additional Information
+For more detailed information on using ASR models with the NeMo toolkit, please refer to the [NeMo ASR documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html).
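
Note on the Performance table in the diff above: the README does not say how WER and CER were computed, so the following is only a minimal sketch of the standard definitions (edit distance divided by reference length), not the author's evaluation script. The function names `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical, and the whitespace tokenization used for WER is an assumption. Under that assumption, an unsegmented Japanese sentence counts as a single "word", which is one possible reason the WER values in the table sit near 1.0 while CER stays much lower.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference unit
                curr[j - 1] + 1,     # insert a hypothesis unit
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, hypothesis):
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word error rate, ASSUMING whitespace tokenization."""
    return error_rate(reference.split(), hypothesis.split())


# Illustrative strings only, not taken from any of the datasets above.
reference = "今日は晴れです"
hypothesis = "今日は雨です"
print(f"CER: {cer(reference, hypothesis):.4f}")
print(f"WER: {wer(reference, hypothesis):.4f}")
```

Because the error count is divided by the reference length, either rate can exceed 1.0 when the hypothesis contains more units than the reference. For a word-level score that is meaningful for Japanese, the text would need to be segmented with a morphological analyzer before calling `wer`; which segmenter to use, if any, is not specified in the README.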