license: mit
A semi-custom network based on Simpler Diffusion (SiD2), trained from scratch for 799 epochs.
This network uses the optimal transport flow matching objective outlined in Flow Matching for Generative Modeling.
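As a rough illustration of that objective, the sketch below builds one training pair in NumPy, following the optimal transport path from the Flow Matching paper (noise at t = 0, data at t = 1). The function names, the `sigma_min` default, and the use of NumPy are assumptions made for this example; the actual loss code in train.py may differ.

```python
import numpy as np

def ot_flow_matching_pair(x1, rng, sigma_min=1e-4):
    """Build (x_t, t, target) for a batch of data samples x1.

    Hypothetical helper: x0 is Gaussian noise, t is uniform in [0, 1),
    and x_t lies on the straight OT path between x0 and x1.
    """
    x0 = rng.standard_normal(x1.shape)
    # One t per sample, broadcast over the remaining dimensions.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Velocity the network is asked to predict at (x_t, t).
    target = x1 - (1.0 - sigma_min) * x0
    return x_t, t, target

def flow_matching_loss(predicted_velocity, target):
    """Mean squared error between predicted and target velocity."""
    return np.mean((predicted_velocity - target) ** 2)
```

In training, `predicted_velocity` would be the network output for `(x_t, t)`; here it is left as a plain array so the sketch stays independent of any model code.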
A modified tensor product attention with RoPE is used instead of regular multi-head attention (MHA); see Tensor Product Attention Is All You Need.
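For readers unfamiliar with RoPE, here is a minimal NumPy sketch of the standard rotary embedding using the "rotate half" pairing. It does not reproduce the modified tensor product attention used in this network; the function name, the `base` value, and the pairing layout are assumptions for illustration only.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, dim).

    Feature i is paired with feature i + dim // 2, and each pair is
    rotated by an angle proportional to the token position.
    """
    seq, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)                      # (half,)
    angles = np.asarray(positions, dtype=float)[:, None] * inv_freq   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each pair is only rotated, vector norms are unchanged, and the dot product between a rotated query and a rotated key depends only on the difference of their positions.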
xATGLU layers are used in some places; see Expanded Gating Ranges Improve Activation Functions.
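The sketch below shows one plausible reading of an xATGLU layer: a gated linear unit whose gate is an arctan squashed to (0, 1) and then stretched to (-alpha, 1 + alpha), as described in the Expanded Gating Ranges paper. The two separate weight matrices, the function names, and the fixed (non-trainable) `alpha` are assumptions for this example and may not match the layer used in this network.

```python
import numpy as np

def expanded_arctan_gate(z, alpha=0.0):
    """Arctan gate mapped from (0, 1) to the expanded range (-alpha, 1 + alpha)."""
    gate = (np.arctan(z) + np.pi / 2) / np.pi
    return gate * (1.0 + 2.0 * alpha) - alpha

def xatglu(x, w_gate, w_value, alpha=0.0):
    """Gated linear unit: value projection scaled by the expanded arctan gate.

    x: (batch, d_in); w_gate, w_value: (d_in, d_out). Hypothetical layout.
    """
    return (x @ w_value) * expanded_arctan_gate(x @ w_gate, alpha)
```

With `alpha = 0` the gate stays in (0, 1) like a conventional GLU gate; a positive `alpha` lets the gate go slightly negative or above one.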
This network was optimized with Distributed Shampoo (GitHub || paper).
python train.py
This trains a new image network on the provided dataset. Currently the entire dataset is loaded into GPU memory; it is defined in the preload_dataset function.
python test_sample.py step_799.safetensors
Here step_799.safetensors is the checkpoint to run inference with. The script always generates a sample grid of 16x16 images.
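To give a feel for what sampling from a flow matching model involves, the sketch below integrates a velocity field from noise (t = 0) to data (t = 1) with fixed-step Euler and tiles the results into one grid image. It is not the code in test_sample.py: `velocity_fn` stands in for the trained network, and `euler_sample`, `make_grid`, `num_steps`, `rows`, and `cols` are hypothetical names chosen for this example.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

def make_grid(images, rows, cols):
    """Tile images of shape (rows * cols, H, W, C) into (rows * H, cols * W, C)."""
    n, h, w, c = images.shape
    assert n == rows * cols, "number of images must fill the grid"
    grid = images.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)
```

A typical call would draw `x0` from a standard normal with the image shape, pass the model as `velocity_fn`, and hand the resulting batch to `make_grid` before saving it.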