---
library_name: transformers
license: apache-2.0
datasets:
- yairschiff/qm9
---
## Quick Start Guide
To use this pre-trained model with the Hugging Face APIs, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the Hub for a list of available models.
tokenizer = AutoTokenizer.from_pretrained('yairschiff/qm9-tokenizer')
model_name = 'kuleshov-group/udlm-qm9'
# The checkpoint ships custom modeling code, so `trust_remote_code=True` is required.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
```
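As a quick sanity check, the custom tokenizer can be applied directly to SMILES strings. This is a minimal sketch assuming a standard tokenizer interface; the example molecule is illustrative and not from the original card:
```python
# Hypothetical example: tokenize a small molecule's SMILES string.
smiles = 'CC(=O)O'  # acetic acid (illustrative input)
inputs = tokenizer(smiles, return_tensors='pt')
print(inputs['input_ids'])
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0]))
```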
## Model Details
UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined continuous-time ELBO for uniform-noise discrete diffusion introduced [here](https://arxiv.org/abs/2412.10193).
### Architecture
The model has a context size of 32 tokens and 92M parameters (a quick check of the parameter count is sketched after the list below).
The model architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads),
- hidden dimension of 768,
- `adaLN` for conditioning on the diffusion time-step during training and generation.
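To verify the parameter count after loading the model with the quick-start snippet above, a generic check (not part of the original card) is:
```python
# Count the model's parameters; this should come out to roughly 92M.
num_params = sum(p.numel() for p in model.parameters())
print(f'{num_params / 1e6:.0f}M parameters')
```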
### Training Details
The model was trained using the `yairschiff/qm9-tokenizer` tokenizer, a custom tokenizer for parsing SMILES strings.
We trained for 25k gradient update steps with a batch size of 2,048.
We used a linear warm-up over 1,000 steps to a peak learning rate of 3e-4, followed by cosine decay to a minimum learning rate of 3e-6.
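For illustration, that schedule can be written as a small function. This is a minimal sketch: the step counts and learning rates come from the card, while the exact interpolation details (e.g., decaying over the steps remaining after warm-up) are assumptions:
```python
import math

def learning_rate(step: int, warmup: int = 1_000, total: int = 25_000,
                  peak: float = 3e-4, floor: float = 3e-6) -> float:
    """Linear warm-up to `peak`, then cosine decay to `floor`."""
    if step < warmup:
        return peak * step / warmup  # linear ramp from 0 to peak
    progress = (step - warmup) / (total - warmup)  # in [0, 1] after warm-up
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))
```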
For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).
## Citation
Please cite our work using the BibTeX below:
### BibTeX:
```
@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}
```