ASR Model Card: parakeet-ctc-1.1b-ja

Model Details

  • Model Name: parakeet-ctc-1.1b-ja
  • Type: Automatic Speech Recognition (ASR)
  • Language: Japanese
  • Framework: NVIDIA NeMo

Installation

To use this model, you need to install the NeMo toolkit:

pip install "nemo-toolkit[asr]==2.0.0rc0"

Usage

Here's a basic example of how to use the model:

import nemo.collections.asr as nemo_asr

# Load the model
nemo_model = nemo_asr.models.ASRModel.restore_from("/path/to/parakeet-ja.nemo")

# Transcribe audio files
audio_files = ["path/to/audio1.wav", "path/to/audio2.wav"]
transcriptions = nemo_model.transcribe(audio_files)

# Print transcriptions
for audio_file, transcription in zip(audio_files, transcriptions):
    print(f"Transcription for {audio_file}: {transcription}")
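
The example above passes WAV paths straight to transcribe(). This card does not state the expected audio format, so the following is only a minimal sketch of a pre-flight check, assuming 16 kHz mono input (a common setup for NeMo ASR checkpoints, but not confirmed here). The helper name check_wav and its default values are hypothetical and use only the Python standard library.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in a WAV header (empty list if none).

    The defaults (16 kHz, mono) are assumptions; adjust them to whatever
    format the checkpoint was actually trained on.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

One way to use it is to call check_wav(path) for each entry in audio_files before transcribing, then resample or downmix any file that reports a problem.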

Limitations

  • This model is trained specifically on Japanese and may not perform well on other languages.
  • The accuracy of transcription may vary depending on the audio quality, background noise, and speaker accent.
  • The model may struggle with specialized vocabulary or technical terms not encountered during training.

Performance

The following table compares the word error rate (WER) and character error rate (CER) of the NeMo model (parakeet-ctc-1.1b-ja) with Whisper v2 large and Whisper v3 large across different Japanese ASR datasets. Lower is better for both metrics:

Model                        Dataset                                 WER     CER
Whisper v2 large             japanese-asr/ja_asr.reazonspeech_test   1.1378  0.3472
Whisper v2 large             japanese-asr/ja_asr.jsut_basic5000      0.8988  0.1063
Whisper v2 large             japanese-asr/ja_asr.common_voice_8_0    1.0314  0.1594
Whisper v3 large             japanese-asr/ja_asr.reazonspeech_test   0.9685  0.2107
Whisper v3 large             japanese-asr/ja_asr.jsut_basic5000      0.9936  0.1360
Whisper v3 large             japanese-asr/ja_asr.common_voice_8_0    1.0178  0.1548
NeMo (parakeet-ctc-1.1b-ja)  japanese-asr/ja_asr.reazonspeech_test   0.7785  0.1521
NeMo (parakeet-ctc-1.1b-ja)  japanese-asr/ja_asr.jsut_basic5000      0.9462  0.1291
NeMo (parakeet-ctc-1.1b-ja)  japanese-asr/ja_asr.common_voice_8_0    1.0002  0.1290
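
This card does not say how the scores were computed, whether the text was normalized first, or how Japanese text was split into words for WER. As a rough illustration only, the sketch below uses the standard definition of an error rate: edit distance divided by reference length, over characters for CER and over tokens for WER. The function names are hypothetical. Because insertions count as errors, the rate can exceed 1.0, which is consistent with some of the WER values in the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate: error_rate over individual characters."""
    return error_rate(list(reference), list(hypothesis))
```

For WER, the same error_rate function can be applied to lists of word tokens. Japanese is written without spaces, so the result depends on whichever tokenizer is chosen, which makes CER the easier metric to compare across systems.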

Ethical Considerations

  • Ensure that you have the necessary permissions and comply with local laws when recording and transcribing audio.
  • Be aware of potential biases in the model, especially regarding different Japanese dialects or accents.
  • Consider the privacy implications of transcribing personal or sensitive conversations.

Additional Information

For more detailed information on using ASR models with the NeMo toolkit, please refer to the NeMo ASR documentation.
