metadata
language: ja
datasets:
- common_voice
metrics:
- cer
model-index:
- name: wav2vec2-xls-r-300m finetuned on Japanese Hiragana with no word boundaries
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Japanese
      type: common_voice
      args: ja
    metrics:
    - name: Test CER
      type: cer
      value: 9.34
Wav2Vec2-XLS-R-300M-Japanese-Hiragana
Fine-tuned facebook/wav2vec2-xls-r-300m on Japanese Hiragana characters using JSUT, JVS, Common Voice, and an in-house dataset. The model outputs each sentence as continuous Hiragana with no word boundaries. Audio input should be sampled at 16 kHz.
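Since the model expects 16 kHz audio, it can help to check a file's sampling rate before running inference. The following is a minimal sketch using only the Python standard library. It assumes uncompressed PCM WAV input, and the helper names `wav_sample_rate` and `needs_resampling` are hypothetical, not part of this model or of any library.

```python
import wave

# Sampling rate stated in this model card.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_rate_hz: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate_hz
```

If `needs_resampling` returns True, resample the audio to 16 kHz with an audio library of your choice before passing it to the model.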
Test Results
CER: 9.34%
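For reference, character error rate is commonly defined as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below implements that common definition in plain Python. This card does not state which evaluation script was used, so this is not necessarily how the figure above was computed. Stripping whitespace before scoring is an assumption, made here because the model emits no word boundaries.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    # Assumption: whitespace is removed from both sides, since the
    # model output contains no word boundaries.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

Multiply the result by 100 to express it as a percentage, matching the format used in this card.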
Training
Trained on JSUT, a subset of JVS, the train and validation sets of Common Voice Japanese, and an in-house Japanese dataset. Evaluated on the test set of Common Voice Japanese.