TencentGameMate committed on
Commit
9caa7f5
·
1 Parent(s): 1496734

Update README.md

Files changed (1)
  1. README.md +57 -0
README.md CHANGED
@@ -1,3 +1,60 @@
  ---
  license: mit
  ---
+
+ This model does not have a tokenizer, as it was pretrained on audio alone.
+ To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
+
+ Python package:
+ transformers==4.16.2
+
+ ```python
+ import torch
+ import soundfile as sf
+
+ from transformers import (
+     Wav2Vec2FeatureExtractor,
+     Wav2Vec2ForPreTraining,
+     Wav2Vec2Model,
+ )
+ from transformers.models.wav2vec2.modeling_wav2vec2 import _compute_mask_indices
+
+ model_path = ""  # path to the model directory
+ wav_path = ""  # path to the input audio file
+ mask_prob = 0.0
+ mask_length = 10
+
+ # use the GPU when available; half precision is only enabled on CUDA
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_path)
+ model = Wav2Vec2Model.from_pretrained(model_path)
+
+ # for pretrain: Wav2Vec2ForPreTraining
+ # model = Wav2Vec2ForPreTraining.from_pretrained(model_path)
+
+ model = model.to(device)
+ if device.type == "cuda":
+     model = model.half()
+ model.eval()
+
+ wav, sr = sf.read(wav_path)
+ input_values = feature_extractor(wav, return_tensors="pt").input_values
+ input_values = input_values.to(device)
+ if device.type == "cuda":
+     input_values = input_values.half()
+
+ # for Wav2Vec2ForPreTraining
+ # batch_size, raw_sequence_length = input_values.shape
+ # sequence_length = model._get_feat_extract_output_lengths(raw_sequence_length)
+ # mask_time_indices = _compute_mask_indices((batch_size, sequence_length), mask_prob=mask_prob, mask_length=mask_length)
+ # mask_time_indices = torch.tensor(mask_time_indices, device=input_values.device, dtype=torch.long)
+
+ with torch.no_grad():
+     outputs = model(input_values)
+     # Wav2Vec2Model exposes the final layer as `last_hidden_state`
+     last_hidden_states = outputs.last_hidden_state
+
+     # for Wav2Vec2ForPreTraining
+     # outputs = model(input_values, mask_time_indices=mask_time_indices, output_hidden_states=True)
+     # last_hidden_states = outputs.hidden_states[-1]
+
+ ```
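
The commented-out `Wav2Vec2ForPreTraining` path calls `model._get_feat_extract_output_lengths` to turn a raw sample count into a frame count before the mask is built. As a rough illustration of that arithmetic, here is a minimal sketch in plain Python. It assumes the `Wav2Vec2Config` default feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2); this checkpoint's `config.json` is not shown in the commit, so `conv_kernel` and `conv_stride` should be checked there. The helper name `feat_extract_output_length` is hypothetical and is not part of `transformers`.

```python
# Assumed Wav2Vec2Config defaults (conv_kernel / conv_stride). The actual
# checkpoint may differ, so read them from its config.json when in doubt.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def feat_extract_output_length(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Hypothetical helper: encoder frames produced for `num_samples` raw samples."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # unpadded 1D convolution: floor((L - kernel) / stride) + 1
        length = (length - kernel) // stride + 1
    return length


# 16000 raw samples, i.e. one second of audio if the sampling rate is 16 kHz
print(feat_extract_output_length(16000))  # 49
```

Under these assumed defaults the strides multiply to 320, so each frame advances by 320 samples (about 20 ms at 16 kHz). This frame count is the `sequence_length` that `_compute_mask_indices` expects in its shape argument.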