Commit 4f8b703 by jonatasgrosman · 1 parent: 0534a4d

Update README.md

Files changed (1)
  1. README.md +47 -2
README.md CHANGED
@@ -59,10 +59,55 @@ model-index:
 Fine-tuned [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on Italian using the [Common Voice 8](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0).
 When using this model, make sure that your speech input is sampled at 16kHz.
 
-This model has been fine-tuned thanks to the GPU credits generously given by the [OVHcloud](https://www.ovhcloud.com/en/public-cloud/ai-training/) :)
+This model has been fine-tuned by the [HuggingSound](https://github.com/jonatasgrosman/huggingsound) tool, and thanks to the GPU credits generously given by the [OVHcloud](https://www.ovhcloud.com/en/public-cloud/ai-training/) :)
 
-The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
+## Usage
 
+Using the [HuggingSound](https://github.com/jonatasgrosman/huggingsound) library:
+
+```python
+from huggingsound import SpeechRecognitionModel
+
+model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-italian")
+audio_paths = ["/path/to/file.mp3", "/path/to/another_file.wav"]
+
+transcriptions = model.transcribe(audio_paths)
+```
+
+Writing your own inference script:
+
+```python
+import torch
+import librosa
+from datasets import load_dataset
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+LANG_ID = "it"  # Common Voice language code for Italian (this is an Italian model)
+MODEL_ID = "jonatasgrosman/wav2vec2-xls-r-1b-italian"
+SAMPLES = 10
+
+test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
+
+processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
+model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
+
+# Preprocessing the datasets.
+# We need to read the audio files as arrays, resampled to the 16 kHz the model expects
+def speech_file_to_array_fn(batch):
+    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
+    batch["speech"] = speech_array
+    batch["sentence"] = batch["sentence"].upper()
+    return batch
+
+test_dataset = test_dataset.map(speech_file_to_array_fn)
+inputs = processor(test_dataset["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+with torch.no_grad():
+    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+predicted_ids = torch.argmax(logits, dim=-1)
+predicted_sentences = processor.batch_decode(predicted_ids)
+```
 
 ## Evaluation Commands
 
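The last two lines of the inference script in this diff pick the most likely token for every audio frame (`torch.argmax`) and then let `processor.batch_decode` turn those frame-level ids into text. As a rough illustration of what that decoding step does for a CTC model such as wav2vec2, here is a minimal pure-Python sketch of greedy CTC collapsing. The vocabulary is a hypothetical toy one; the sketch only assumes, as wav2vec2 CTC tokenizers typically do, that the pad token doubles as the CTC blank and that `|` marks word boundaries. It is not the `transformers` implementation.

```python
from __future__ import annotations

from itertools import groupby

# Hypothetical toy vocabulary (id -> token). "<pad>" plays the role of the
# CTC blank and "|" is the word delimiter.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "C", 3: "I", 4: "A", 5: "O"}
BLANK_ID = 0
WORD_DELIMITER = "|"


def greedy_ctc_decode(frame_ids: list[int],
                      vocab: dict[int, str] = TOY_VOCAB,
                      blank_id: int = BLANK_ID) -> str:
    """Collapse one sequence of per-frame argmax ids into a string."""
    # 1. Merge consecutive repeats: one token usually spans several frames.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks. A blank between two identical ids keeps them as two
    #    separate tokens, which is how CTC spells double letters.
    tokens = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # 3. Map word delimiters to spaces and normalize surrounding whitespace.
    text = "".join(" " if tok == WORD_DELIMITER else tok for tok in tokens)
    return " ".join(text.split())


def batch_greedy_ctc_decode(batch_ids: list[list[int]]) -> list[str]:
    """Decode a batch shaped like predicted_ids: (batch, frames)."""
    return [greedy_ctc_decode(ids) for ids in batch_ids]


if __name__ == "__main__":
    frames = [2, 2, 0, 3, 3, 4, 0, 0, 5, 1, 1, 2, 3, 0, 4, 5]
    print(greedy_ctc_decode(frames))  # CIAO CIAO
```

With the real model, one could roughly reproduce `batch_decode` by passing `predicted_ids.tolist()`, an id-to-token mapping built by inverting `processor.tokenizer.get_vocab()`, and `processor.tokenizer.pad_token_id` as the blank id. Special tokens and other details may be handled differently, so `batch_decode` remains the supported path.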