Aku Rouhe committed on
Commit fec6d06 · 1 Parent(s): 934b4ef

Change instructions

Files changed (1)
  1. README.md +6 -92
README.md CHANGED
@@ -58,105 +58,19 @@ Please notice that we encourage you to read our tutorials and learn more about
  ### Transcribing your own audio files
 
  ```python
- import torch
- import torchaudio
- import speechbrain
- from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.acoustic import ASR
+ from speechbrain.pretrained import EncoderDecoderASR
 
- asr_model = ASR()
-
- # Make sure your output is sampled at 16 kHz.
- audio_file='path_to_your_audio_file'
- wav, fs = torchaudio.load(audio_file)
- wav_lens = torch.tensor([1]).float()
-
- # Transcribe!
- words, tokens = asr_model.transcribe(wav, wav_lens)
- print(words)
+ asr_model = EncoderDecoderASR.from_hparams(source="Gastron/asr-crdnn-librispeech")
+ asr_model.transcribe_file("path_to_your_file.wav")
 
  ```
 
  ### Obtaining encoded features
 
- The SpeechBrain ASR() Class provides an easy way to encode the speech signal
- without running the decoding phase. Hence, one can obtain the output of the
- CRDNN model.
-
- ```python
- import torch
- import torchaudio
- import speechbrain
- from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.acoustic import ASR
-
- asr_model = ASR()
-
- # Make sure your output is sampled at 16 kHz.
- audio_file='path_to_your_audio_file'
- wav, fs = torchaudio.load(audio_file)
- wav_lens = torch.tensor([1]).float()
-
- # Transcribe!
- words, tokens = asr_model.encode(wav, wav_lens)
- print(words)
-
- ```
-
- ### Playing with the language model only
-
- Thanks to SpeechBrain lobes, it is feasible to simply instantiate the language
- model to further processing on your custom pipeline:
-
- ```python
- import torch
- import speechbrain
- from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.lm import LM
-
- lm = LM()
-
- text = "THE CAT IS ON"
-
- # Next word prediction
- encoded_text = lm.tokenizer.encode_as_ids(text)
- encoded_text = torch.Tensor(encoded_text).unsqueeze(0)
- prob_out, _ = lm(encoded_text.to(lm.device))
- index = int(torch.argmax(prob_out[0,-1,:]))
- print(lm.tokenizer.decode(index))
-
- # Text generation
- encoded_text = torch.tensor([0, 2]) # bos token + the
- encoded_text = encoded_text.unsqueeze(0).to(lm.device)
- for i in range(19):
-     prob_out, _ = lm(encoded_text)
-     index = torch.argmax(prob_out[0,-1,:]).unsqueeze(0)
-     encoded_text = torch.cat([encoded_text, index.unsqueeze(0)], dim=1)
- encoded_text = encoded_text[0,1:].tolist()
- print(lm.tokenizer.decode(encoded_text))
-
- ```
-
- ### Playing with the tokenizer only
-
- In the same manner as for the language model, one can isntantiate the tokenizer
- only with the corresponding lobes in SpeechBrain.
-
- ```python
- import speechbrain
- from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.tokenizer import tokenizer
-
- # HuggingFace paths to download the pretrained models
- token_file = 'tokenizer/1000_unigram.model'
- model_name = 'sb/asr-crdnn-librispeech'
- save_dir = 'model_checkpoints'
-
- text = "THE CAT IS ON THE TABLE"
+ The SpeechBrain EncoderDecoderASR() class also provides an easy way to encode
+ the speech signal without running the decoding phase by calling
+ ``EncoderDecoderASR.encode_batch()``
 
- tokenizer = tokenizer(token_file, model_name, save_dir)
-
- # Tokenize!
- print(tokenizer.spm.encode(text))
- print(tokenizer.spm.encode(text, out_type='str'))
-
- ```
 
  #### Referencing SpeechBrain
 
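
Note (not part of the commit): the new README text names `EncoderDecoderASR.encode_batch()` without showing a call, so a rough sketch of how it might be used follows. The helper names `relative_lengths` and `encode_file` are hypothetical. The sketch assumes that `encode_batch(wavs, wav_lens)` takes a padded batch plus lengths given as fractions of the longest item (the removed snippet passed `torch.tensor([1])` for a single file), and that `load_audio()` is available on the pretrained interface in the installed SpeechBrain version.

```python
def relative_lengths(num_samples):
    """Convert absolute lengths (in samples) to fractions of the longest item."""
    longest = max(num_samples)
    return [n / longest for n in num_samples]


def encode_file(path, source="Gastron/asr-crdnn-librispeech"):
    """Return encoder outputs for one audio file, skipping the decoding phase.

    Hypothetical helper; assumes SpeechBrain and PyTorch are installed and
    the model can be downloaded from `source`.
    """
    # Imported lazily so relative_lengths() is usable without these packages.
    import torch
    from speechbrain.pretrained import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(source=source)
    wav = asr_model.load_audio(path)   # assumed to return a 1-D waveform tensor
    wavs = wav.unsqueeze(0)            # add a batch dimension: [1, time]
    wav_lens = torch.tensor(relative_lengths([wav.shape[0]]))
    return asr_model.encode_batch(wavs, wav_lens)
```

Calling `encode_file("path_to_your_file.wav")` would then return the encoder states for that file. With several files of different lengths, the same `relative_lengths` idea gives the `wav_lens` vector for the padded batch.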