---
license: mit
datasets:
- the_pile_openwebtext2
language:
- en
pipeline_tag: token-classification
---

### Model Sources

- **Repository:** TBD
- **Paper:** https://arxiv.org/abs/2309.08351

### Model Architecture and Objective

This model uses the BERT-base architecture and was trained on OpenWebText2 with the Contrastive Weight Tying (CWT) objective: rather than predicting probability distributions over the vocabulary through a language-modeling head, the model is trained to reconstruct the input embeddings of masked tokens contrastively.
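
As an informal illustration of the objective, here is a minimal PyTorch sketch of a CWT-style loss. It follows the paper's high-level description rather than the authors' released code; the function name, the cosine-similarity scoring, and the temperature value are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def cwt_style_loss(hidden_states, input_embeddings, temperature=0.1):
    """Illustrative CWT-style objective (not the authors' exact implementation).

    Each output representation is trained to match the input embedding of
    its own (masked) token, with the other tokens in the batch serving as
    in-batch negatives, in the style of InfoNCE.

    hidden_states:    (n_tokens, hidden_dim) final-layer outputs at masked positions
    input_embeddings: (n_tokens, hidden_dim) input embeddings of the true tokens
    """
    # Cosine-similarity logits between every output and every candidate target.
    h = F.normalize(hidden_states, dim=-1)
    e = F.normalize(input_embeddings, dim=-1)
    logits = h @ e.T / temperature  # (n_tokens, n_tokens)

    # The positive for row i is column i; all other columns are negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

Because the targets are the model's own input embeddings, no vocabulary-sized output projection is needed, which is what makes the model "headless".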

#### Software

[More Information Needed]

## Citation

**BibTeX:**

```bibtex
@misc{godey2023headless,
      title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying},
      author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
      year={2023},
      eprint={2309.08351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Model Card Authors

Nathan Godey, Éric de la Clergerie, Benoît Sagot

## Model Card Contact

[email protected]