grider-withourai/parakeet-ctc-1.1b-ja

	---
	license: apache-2.0
	language:
	- ja
	library_name: nemo
	pipeline_tag: automatic-speech-recognition
	---
	# ASR Model Card: parakeet-ctc-1.1b-ja

	## Model Details

	- Model Name: parakeet-ctc-1.1b-ja
	- Type: Automatic Speech Recognition (ASR)
	- Language: Japanese
	- Framework: NVIDIA NeMo

	## Installation

	To use this model, you need to install the NeMo toolkit:

	```bash
	pip install "nemo-toolkit[asr]==2.0.0rc0"
	```

	## Usage

	Here's a basic example of how to use the model:

	```python
	import nemo.collections.asr as nemo_asr

	# Load the model
	nemo_model = nemo_asr.models.ASRModel.restore_from("/path/to/parakeet-ja.nemo")

	# Transcribe audio files
	audio_files = ["path/to/audio1.wav", "path/to/audio2.wav"]
	transcriptions = nemo_model.transcribe(audio_files)

	# Print transcriptions
	for audio_file, transcription in zip(audio_files, transcriptions):
	    print(f"Transcription for {audio_file}: {transcription}")
	```
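
Before transcribing, it can help to check the format of the input files. The sketch below uses only the Python standard library to read a WAV header. The 16 kHz mono target is an assumption based on what NeMo ASR models commonly expect; this card does not state the expected input format, and `describe_wav` and `needs_conversion` are hypothetical helper names, not part of NeMo.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def needs_conversion(path, expected_rate=16000):
    """Return True if the file is not mono at the expected sample rate.

    expected_rate=16000 is an assumption, not a value stated in this model card.
    """
    rate, channels, _ = describe_wav(path)
    return rate != expected_rate or channels != 1
```

Calling `needs_conversion("path/to/audio1.wav")` on each entry of `audio_files` flags recordings that may be worth resampling or downmixing before they are passed to `transcribe`.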

	## Limitations

	- This model is trained specifically for Japanese and may not perform well on other languages.
	- The accuracy of transcription may vary depending on the audio quality, background noise, and speaker accent.
	- The model may struggle with specialized vocabulary or technical terms not encountered during training.

	## Performance

	The following table compares the NeMo model (parakeet-ctc-1.1b-ja) with Whisper v2 large and Whisper v3 large on three Japanese ASR datasets, reporting word error rate (WER) and character error rate (CER). Lower is better for both metrics.

	| Model | Dataset | WER | CER |
	|-------|---------|-----|-----|
	| Whisper v2 large | japanese-asr/ja_asr.reazonspeech_test | 1.1378 | 0.3472 |
	| Whisper v2 large | japanese-asr/ja_asr.jsut_basic5000 | 0.8988 | 0.1063 |
	| Whisper v2 large | japanese-asr/ja_asr.common_voice_8_0 | 1.0314 | 0.1594 |
	| Whisper v3 large | japanese-asr/ja_asr.reazonspeech_test | 0.9685 | 0.2107 |
	| Whisper v3 large | japanese-asr/ja_asr.jsut_basic5000 | 0.9936 | 0.1360 |
	| Whisper v3 large | japanese-asr/ja_asr.common_voice_8_0 | 1.0178 | 0.1548 |
	| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.reazonspeech_test | 0.7785 | 0.1521 |
	| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.jsut_basic5000 | 0.9462 | 0.1291 |
	| NeMo (parakeet-ctc-1.1b-ja) | japanese-asr/ja_asr.common_voice_8_0 | 1.0002 | 0.1290 |
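
The card does not say how these scores were computed, so the following is only a minimal sketch of the standard definitions: both metrics are an edit distance divided by the reference length, counted over characters for CER and over words for WER. The function names are hypothetical, and the whitespace split in `wer` is an assumption. Japanese text is normally written without spaces, so a whitespace-based WER treats a whole sentence as one word, and any error then gives a WER near 1. Because insertions also count as errors, either metric can exceed 1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate using a whitespace split (an assumption for this sketch)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "今日は晴れです"
hypothesis = "今日は雨です"
print(cer(reference, hypothesis))  # 2 edits / 7 characters
print(wer(reference, hypothesis))  # 1 edit / 1 "word", since there are no spaces
```

For a WER that is meaningful in Japanese, the text would first have to be segmented into words with a tokenizer, and the result depends on which tokenizer is used. CER avoids that dependency, which is why it is often the easier column to compare across models.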

	## Ethical Considerations

	- Ensure that you have the necessary permissions and comply with local laws when recording and transcribing audio.
	- Be aware of potential biases in the model, especially regarding different Japanese dialects or accents.
	- Consider the privacy implications of transcribing personal or sensitive conversations.

	## Additional Information

	For more detailed information on using ASR models with the NeMo toolkit, please refer to the [NeMo ASR documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html).