---
library_name: transformers
license: apache-2.0
datasets:
- yairschiff/qm9
---
## Quick Start Guide

To use this pre-trained model with the Hugging Face APIs, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the hub for a list of available models.
tokenizer = AutoTokenizer.from_pretrained('yairschiff/qm9-tokenizer')

model_name = 'kuleshov-group/udlm-qm9'
# `trust_remote_code=True` may be required to load the custom UDLM model class from the hub.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
```
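As a quick sanity check, you can tokenize a SMILES string and run a forward pass. The sketch below is illustrative and assumes the standard masked-LM interface (`input_ids` in, `logits` out); the custom UDLM wrapper may expose additional diffusion-specific arguments, and the example SMILES string is arbitrary:

```python
import torch

# An arbitrary small molecule (N-methylacetamide) written as a SMILES string.
inputs = tokenizer('CC(=O)NC', return_tensors='pt')

with torch.no_grad():
    # Assumes the standard `AutoModelForMaskedLM` forward signature.
    outputs = model(input_ids=inputs['input_ids'])

print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```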
## Model Details

UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined continuous-time ELBO for uniform-noise discrete diffusion introduced [here](https://arxiv.org/abs/2412.10193).
### Architecture

The model has a context size of 32 tokens and 92M parameters.

The model architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads each),
- a hidden dimension of 768,
- `adaLN` for conditioning on the diffusion time-step (i.e., during diffusion training / generation).
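You can verify these hyperparameters on a downloaded checkpoint, reusing `model` from the Quick Start snippet. This is a generic `transformers` idiom rather than anything UDLM-specific; the exact config field names are defined by the custom UDLM config class:

```python
# Total parameter count (should be roughly 92M for this checkpoint).
num_params = sum(p.numel() for p in model.parameters())
print(f'{num_params / 1e6:.0f}M parameters')

# The config records the architectural hyperparameters (hidden size, depth, heads);
# print it rather than hard-coding attribute names, which may differ per model class.
print(model.config)
```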
### Training Details

The model was trained using the `yairschiff/qm9-tokenizer` tokenizer, a custom tokenizer for parsing SMILES strings.
We trained for 25k gradient update steps using a batch size of 2,048.
We used a linear warm-up over the first 1,000 steps to a peak learning rate of 3e-4, then applied cosine decay until reaching a minimum learning rate of 3e-6.
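This schedule is straightforward to reproduce. The snippet below is a minimal sketch of the warm-up-plus-cosine-decay curve using a vanilla PyTorch `LambdaLR` scheduler; the optimizer choice is an assumption, not the exact training configuration:

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 25_000
WARMUP_STEPS = 1_000
PEAK_LR = 3e-4
MIN_LR = 3e-6


def lr_lambda(step: int) -> float:
    """Multiplier applied to the peak learning rate at each step."""
    if step < WARMUP_STEPS:
        # Linear warm-up from 0 to the peak learning rate over the first 1,000 steps.
        return step / WARMUP_STEPS
    # Cosine decay from the peak learning rate down to the minimum learning rate.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (MIN_LR + (PEAK_LR - MIN_LR) * cosine) / PEAK_LR


# Illustrative optimizer (`model` from the Quick Start snippet); the actual setup may differ.
optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR)
scheduler = LambdaLR(optimizer, lr_lambda)
```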
For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).
## Citation

Please cite our work using the BibTeX below:

### BibTeX:
```
@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}
```
|