Update README.md
README.md CHANGED
@@ -103,15 +103,19 @@ inference: false
 
 We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder) as described in Section 3.2.1 of the [paper](https://arxiv.org/pdf/2312.05187.pdf), which is at the core of our Seamless models.
 
+This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires fine-tuning to be used for downstream tasks such as Automatic Speech Recognition (ASR) or Audio Classification.
+
 | Model Name   | #params | checkpoint |
 | ------------ | ------- | ---------- |
 | W2v-BERT 2.0 | 600M    | [checkpoint](https://huggingface.co/reach-vb/conformer-shaw/resolve/main/conformer_shaw.pt) |
 
-
-Besides leveraging more pre-training data, we removed the random-projection quantizer (RPQ) (Chiu et al., 2022) and its associated loss previously incorporated in SeamlessM4T v1 (Seamless Communication et al., 2023). Akin to v1, the v2 w2v-BERT 2.0 comprises 24 Conformer layers (Gulati et al., 2020) with approximately 600M parameters and the same pre-training hyperparameters.
+**This model and its training are supported by 🤗 Transformers, more on it in the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert).**
 
+# Seamless Communication usage
 
-
+This model can be used in [Seamless Communication](https://github.com/facebookresearch/seamless_communication), where it was released.
+
+Here's how to make a forward pass through the voice encoder, after having completed the [installation steps](https://github.com/facebookresearch/seamless_communication?tab=readme-ov-file#installation):
 
 ```python
 import torch
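# A minimal sketch of the 🤗 Transformers route mentioned in the updated README:
# loading the encoder and extracting frame-level representations. The repo id
# "facebook/w2v-bert-2.0" and the feature-extractor pairing are assumptions,
# not taken from this diff; Wav2Vec2BertModel requires transformers >= 4.37.
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

hf_model_id = "facebook/w2v-bert-2.0"  # assumed Transformers-format checkpoint
feature_extractor = AutoFeatureExtractor.from_pretrained(hf_model_id)
encoder = Wav2Vec2BertModel.from_pretrained(hf_model_id).eval()

# One second of random 16 kHz mono audio stands in for a real waveform here.
waveform = torch.randn(16000).numpy()
features = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.inference_mode():
    hidden_states = encoder(**features).last_hidden_state  # (batch, frames, hidden_size)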