vumichien/wav2vec2-large-xlsr-japanese


patrickvonplaten committed on Mar 30, 2021

Commit 521ad3f · 1 Parent(s): e249a0c

Update README.md

Files changed (1)

README.md +6 -2

README.md CHANGED

@@ -40,7 +40,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 # config
 wakati = MeCab.Tagger("-Owakati")
-chars_to_ignore_regex = '[\,\、\。\．\「\」\…\？\・]'
+chars_to_ignore_regex = '[\\,\\、\\。\\．\\「\\」\\…\\？\\・]'
 # load data, processor and model
 test_dataset = load_dataset("common_voice", "ja", split="test[:2%]")
@@ -66,6 +66,10 @@ print("Reference:", test_dataset["sentence"][:2])
 ## Evaluation
 The model can be evaluated as follows on the Japanese test data of Common Voice.
 ```python
+!pip install mecab-python3
+!pip install unidic-lite
+!python -m unidic download
 import torch
 import librosa
 import torchaudio
@@ -75,7 +79,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 #config
 wakati = MeCab.Tagger("-Owakati")
-chars_to_ignore_regex = '[\,\、\。\．\「\」\…\？\・]'
+chars_to_ignore_regex = '[\\,\\、\\。\\．\\「\\」\\…\\？\\・]'
 # load data, processor and model
 test_dataset = load_dataset("common_voice", "ja", split="test")
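
The diff changes only how `chars_to_ignore_regex` is spelled, and the commit does not state its motivation. As a rough illustration, the sketch below shows that in Python source the doubled-backslash form and a raw string yield the same pattern, and how such a pattern might be applied to a transcript with `re.sub`. The helper name `strip_ignored_chars` and the sample sentence are hypothetical and do not come from the README. The old single-backslash form relies on escape sequences such as `\,` that Python does not recognize, which newer interpreter versions warn about.

```python
import re

# Same character class as in the diff, written as a raw string so every
# backslash reaches the regex engine unchanged.
chars_to_ignore_regex = r'[\,\、\。\．\「\」\…\？\・]'

# The doubled-backslash spelling introduced by the commit is the same string.
assert chars_to_ignore_regex == '[\\,\\、\\。\\．\\「\\」\\…\\？\\・]'


def strip_ignored_chars(sentence: str) -> str:
    """Hypothetical helper: drop the punctuation matched by the pattern."""
    return re.sub(chars_to_ignore_regex, "", sentence).strip()


# Sample sentence chosen for illustration only.
cleaned = strip_ignored_chars("「こんにちは、世界。」")
print(cleaned)
```

A raw string is used here only because it keeps the pattern readable; the README itself uses the doubled-backslash spelling.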