---
library_name: transformers
license: apache-2.0
datasets:
- yairschiff/qm9
---


## Quick Start Guide

To load this pre-trained model with the HuggingFace APIs, use the following snippet:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the Hub for the list of available models.
tokenizer = AutoTokenizer.from_pretrained('yairschiff/qm9-tokenizer')
model_name = 'kuleshov-group/udlm-qm9'
# `trust_remote_code=True` is needed because the model ships custom code.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
```
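Continuing from the snippet above, the tokenizer can be exercised on a SMILES string. This is a minimal sketch that only assumes the standard HuggingFace tokenizer interface; the example molecule (acetic acid) is an arbitrary choice, not taken from this repository.

```python
# Illustrative only: tokenize a small SMILES string with the QM9 tokenizer.
# Assumes the standard `PreTrainedTokenizer` interface; the molecule below
# (acetic acid) is an arbitrary example.
example = "CC(=O)O"
encoded = tokenizer(example, return_tensors='pt')
print(encoded['input_ids'])                                     # token ids
print(tokenizer.convert_ids_to_tokens(encoded['input_ids'][0])) # readable tokens
```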


## Model Details

UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined uniform noise discrete diffusion continuous-time ELBO introduced [here](https://arxiv.org/abs/2412.10193).

### Architecture

The model has a context size of 32 tokens and 92M parameters.

The model architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads),
- a hidden dimension of 768,
- `adaLN` for conditioning on the time-step (i.e., during diffusion training / generation); see the sketch below.
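As a rough illustration of how `adaLN` time-step conditioning works in a DiT-style block, here is a minimal PyTorch sketch. The dimensions (768 hidden units, 12 heads, 32-token context) come from this card, but the class `AdaLNBlock` and its internals are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Sketch of DiT-style adaLN: a conditioning vector (here, a time-step
    embedding) predicts per-layer shift/scale/gate parameters that modulate
    a LayerNorm'd hidden state before the attention sub-block. Illustrative
    only; not the code used to train this checkpoint."""

    def __init__(self, dim: int = 768, n_heads: int = 12):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Conditioning MLP producing (shift, scale, gate) from the time-step embedding.
        self.ada_mlp = nn.Sequential(nn.SiLU(), nn.Linear(dim, 3 * dim))

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); t_emb: (batch, dim) time-step embedding.
        shift, scale, gate = self.ada_mlp(t_emb).chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        attn_out, _ = self.attn(h, h, h)
        return x + gate.unsqueeze(1) * attn_out

# Shape check only, matching the card's 32-token context and 768 hidden dim:
block = AdaLNBlock()
x = torch.randn(2, 32, 768)
t_emb = torch.randn(2, 768)
print(block(x, t_emb).shape)  # torch.Size([2, 32, 768])
```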


### Training Details

The model was trained using the `yairschiff/qm9-tokenizer` tokenizer, a custom tokenizer for parsing SMILES strings.
We trained for 25k gradient update steps using a batch size of 2,048.
We used a linear warm-up of 1,000 steps to reach a learning rate of 3e-4, then applied cosine decay down to a minimum learning rate of 3e-6.
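For reference, the schedule described above can be written as a small function. This sketch uses only the numbers stated in this card (1,000 warm-up steps, peak 3e-4, decay to 3e-6 over the 25k training steps); the actual training code may set the decay horizon differently.

```python
import math

def lr_at_step(step: int,
               peak_lr: float = 3e-4,
               min_lr: float = 3e-6,
               warmup_steps: int = 1_000,
               total_steps: int = 25_000) -> float:
    """Linear warm-up to peak_lr, then cosine decay to min_lr (illustrative)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at_step(500), lr_at_step(1_000), lr_at_step(25_000))
# -> 1.5e-4 (mid warm-up), 3e-4 (peak), 3e-6 (end of decay)
```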

For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).

## Citation
Please cite our work using the BibTeX below:

### BibTeX:
```
@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}
```