---
library_name: transformers
license: apache-2.0
datasets:
- yairschiff/qm9
---
## Quick Start Guide
To use this pre-trained model with the HuggingFace APIs, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
# See the `UDLM` collection page on the hub for list of available models.
tokenizer = AutoTokenizer.from_pretrained('yairschiff/qm9-tokenizer')
model_name = 'kuleshov-group/udlm-qm9'
model = AutoModelForMaskedLM.from_pretrained(model_name)
```
## Model Details
UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined uniform noise discrete diffusion continuous-time ELBO introduced [here](https://arxiv.org/abs/2412.10193).
### Architecture
The model has a context size of 32 tokens and 92M parameters.
The architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads),
- hidden dimension of 768,
- `adaLN` for conditioning on time-step (i.e., during diffusion training / generation).
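As a rough sanity check on the parameter count, the weight matrices in the attention and MLP sub-layers can be tallied from the numbers above. This is only a back-of-the-envelope sketch: the 4x MLP expansion ratio is an assumption (it is the common Transformer default and is not stated in this card), and biases, embeddings, layer norms, and the `adaLN` conditioning layers are ignored.

```python
# Back-of-the-envelope weight count for the transformer blocks.
# ASSUMPTION: MLP hidden size is 4 * HIDDEN_DIM (not stated in the model card).
# Biases, embeddings, layer norms, and adaLN modulation layers are ignored.
NUM_BLOCKS = 12
HIDDEN_DIM = 768
MLP_RATIO = 4  # assumed

# Attention: query, key, value, and output projections, each d x d.
attention_weights = 4 * HIDDEN_DIM * HIDDEN_DIM
# MLP: one d x (ratio * d) projection up and one (ratio * d) x d projection down.
mlp_weights = 2 * MLP_RATIO * HIDDEN_DIM * HIDDEN_DIM

weights_per_block = attention_weights + mlp_weights
total_block_weights = NUM_BLOCKS * weights_per_block

print(f"per block: {weights_per_block:,}")      # per block: 7,077,888
print(f"all blocks: {total_block_weights:,}")   # all blocks: 84,934,656
```

Under these assumptions the blocks alone hold roughly 85M weights, which is in line with the reported 92M once the omitted components are added.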
### Training Details
The model was trained using the `yairschiff/qm9-tokenizer` tokenizer, a custom tokenizer for parsing SMILES strings.
We trained for 25k gradient update steps using a batch size of 2,048.
We used linear warm-up over 1,000 steps to reach a learning rate of 3e-4 and then applied cosine decay down to a minimum learning rate of 3e-6.
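The schedule described above can be sketched as a plain function of the step index. This is an illustrative sketch, not the training code: it assumes the warm-up starts from a learning rate of 0 and that the cosine decay spans the remaining 24k steps, neither of which is stated here. The function name `learning_rate` is hypothetical.

```python
import math

WARMUP_STEPS = 1_000
TOTAL_STEPS = 25_000
PEAK_LR = 3e-4
MIN_LR = 3e-6


def learning_rate(step: int) -> float:
    """Linear warm-up followed by cosine decay (illustrative sketch)."""
    if step < WARMUP_STEPS:
        # ASSUMPTION: warm-up ramps linearly from 0 to PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    # ASSUMPTION: decay runs from the end of warm-up to the final step.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold at MIN_LR past the final step
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for step in (0, 500, 1_000, 13_000, 25_000):
    print(step, learning_rate(step))
```

At step 1,000 the function returns the peak rate of 3e-4, and at step 25,000 it returns the minimum of 3e-6.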
For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).
## Citation
Please cite our work using the BibTeX entry below:
### BibTeX:
```
@article{schiff2024discreteguidance,
title={Simple Guidance Mechanisms for Discrete Diffusion Models},
author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
journal={arXiv preprint arXiv:2412.10193},
year={2024}
}
```