License: MIT


A semi-custom network based on Simpler Diffusion (SiD2), trained from scratch for 799 epochs.

Modeling || Training

This network uses the optimal transport flow matching objective outlined in Flow Matching for Generative Modeling.
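As a rough illustration of the objective, here is a minimal NumPy sketch of the optimal transport conditional path from the Flow Matching paper: noise `x0` at t=0 is interpolated linearly toward a data sample `x1` at t=1, and the network regresses the constant velocity along that path. The function names, the `velocity_fn(x, t)` signature and the `sigma_min` default are assumptions for illustration, not the code in `train.py`.

```python
import numpy as np

def ot_flow_matching_pair(x0, x1, t, sigma_min=0.0):
    """Return the point x_t on the OT conditional path and its target velocity u_t.

    x0: noise batch, x1: data batch, t: per-sample times in [0, 1], shape (batch,).
    """
    # Broadcast t over all non-batch dimensions.
    t = t.reshape(-1, *([1] * (x1.ndim - 1)))
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # constant along the straight path
    return x_t, u_t

def flow_matching_loss(velocity_fn, x1, rng, sigma_min=0.0):
    """Mean squared error between predicted and target velocities (hypothetical helper)."""
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(0.0, 1.0, size=x1.shape[0])
    x_t, u_t = ot_flow_matching_pair(x0, x1, t, sigma_min)
    v = velocity_fn(x_t, t)
    return np.mean((v - u_t) ** 2)
```

With `sigma_min=0` the path is a straight line from noise to data, which is what makes few-step Euler sampling workable.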

A modified tensor product attention (TPA) with RoPE is used instead of regular multi-head attention (MHA); see Tensor Product Attention Is All You Need.
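The author's modifications are not described here, so the following is only a minimal NumPy sketch of plain TPA as I understand the paper: each token's per-head queries, keys and values are built as a scaled sum of `rank` outer products between a head factor (size `n_heads`) and a feature factor (size `head_dim`), with RoPE applied to the query/key feature factors. Attention is left bidirectional on the assumption that an image diffusion model does not use a causal mask. All names and the weight layout in `params` are hypothetical.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotary position embedding over the last (even-sized) dim; x: (seq, ..., d)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)           # (d/2,)
    angles = positions[:, None] * inv_freq[None, :]         # (seq, d/2)
    shape = (x.shape[0],) + (1,) * (x.ndim - 2) + (d // 2,)
    cos, sin = np.cos(angles).reshape(shape), np.sin(angles).reshape(shape)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                    # rotate each (even, odd) pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def tpa_attention(x, params, n_heads, head_dim, ranks):
    """Bidirectional tensor product attention for one sequence x: (seq, d_model)."""
    seq = x.shape[0]
    pos = np.arange(seq, dtype=np.float64)
    proj = {}
    for name in ("q", "k", "v"):
        r = ranks[name]
        a = (x @ params[name + "_a"]).reshape(seq, r, n_heads)   # head factors
        b = (x @ params[name + "_b"]).reshape(seq, r, head_dim)  # feature factors
        if name in ("q", "k"):
            b = rope(b, pos)  # RoPE is linear in head_dim, so rotating b rotates q/k
        # Sum of rank-1 outer products scaled by 1/rank: (seq, n_heads, head_dim).
        proj[name] = np.einsum("srh,srd->shd", a, b) / r
    q, k, v = proj["q"], proj["k"], proj["v"]
    scores = np.einsum("shd,thd->hst", q, k) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)            # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = np.einsum("hst,thd->shd", probs, v)
    return out.reshape(seq, n_heads * head_dim)
```

The appeal of the factorization is that only the small `a`/`b` factors need to be cached or projected, rather than full `n_heads * head_dim` key/value vectors.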

xATGLU layers are used in some places; see Expanded Gating Ranges Improve Activation Functions.
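For reference, a minimal NumPy sketch of an xATGLU-style layer, assuming the formulation from the Expanded Gating Ranges paper: an arctan gate squashed to (0, 1) is stretched to (-alpha, 1 + alpha) by a trainable scalar `alpha` (initialized at 0), then multiplies a linear "value" projection as in a GLU. The weight names and the output projection are assumptions, not this repository's layer.

```python
import numpy as np

def expanded_arctan_gate(z, alpha):
    """arctan gate in (0, 1), expanded to the range (-alpha, 1 + alpha)."""
    gate = np.arctan(z) / np.pi + 0.5
    return gate * (1.0 + 2.0 * alpha) - alpha

def xatglu(x, w_gate, w_value, w_out, alpha=0.0):
    """GLU-style block: expanded arctan gate times a linear value path, then projected."""
    gate = expanded_arctan_gate(x @ w_gate, alpha)
    return (gate * (x @ w_value)) @ w_out
```

With `alpha = 0` this reduces to a plain arctan-gated GLU; letting `alpha` grow allows the gate to go slightly negative or above one.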

This network was optimized with Distributed Shampoo (Github || Distributed Shampoo Paper).
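To show the core idea, here is a minimal NumPy sketch of the original (non-distributed) Shampoo update for a single 2-D weight: accumulate left and right gradient statistics and precondition the gradient with their inverse fourth roots. This deliberately omits what Distributed Shampoo adds (grafting, blocking, momentum, infrequent root computation, sharding); the class and parameter names are hypothetical.

```python
import numpy as np

def matrix_power_psd(mat, power, eps=1e-6):
    """Raise a symmetric PSD matrix to `power` via eigendecomposition (eps for stability)."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, 0.0) + eps
    return (vecs * vals ** power) @ vecs.T  # V diag(vals**power) V^T

class ShampooMatrix:
    """Basic Shampoo preconditioner for one (m, n) weight matrix."""

    def __init__(self, shape, lr=1e-2, eps=1e-6):
        m, n = shape
        self.left = np.zeros((m, m))   # running sum of G G^T
        self.right = np.zeros((n, n))  # running sum of G^T G
        self.lr, self.eps = lr, eps

    def step(self, weight, grad):
        self.left += grad @ grad.T
        self.right += grad.T @ grad
        precond = (matrix_power_psd(self.left, -0.25, self.eps)
                   @ grad
                   @ matrix_power_psd(self.right, -0.25, self.eps))
        return weight - self.lr * precond
```

Usage on a toy quadratic: call `W = opt.step(W, grad)` each iteration, where `grad` is the gradient of the loss with respect to `W`.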

python train.py trains a new image network on the provided dataset. Currently the whole dataset is preloaded into GPU memory; it is defined in the preload_dataset function.
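The preloading approach boils down to keeping every training image in one contiguous array and drawing minibatches by random indexing, so no per-step disk or host-to-device transfer is needed. The sketch below uses NumPy as a stand-in; in PyTorch the same array would be moved to the GPU once (for example with `torch.from_numpy(data).to("cuda")`). The function names and the [-1, 1] scaling are assumptions, not the actual `preload_dataset`.

```python
import numpy as np

def preload_images(images_uint8):
    """Stack same-shaped HxWxC uint8 images into one float32 array scaled to [-1, 1]."""
    data = np.stack(list(images_uint8)).astype(np.float32)
    return data / 127.5 - 1.0

def sample_batch(data, batch_size, rng):
    """Draw a random minibatch by indexing the preloaded array (sampling with replacement)."""
    idx = rng.integers(0, data.shape[0], size=batch_size)
    return data[idx]
```

The trade-off is memory: this only works while the full dataset fits in device memory.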

python test_sample.py step_799.safetensors, where step_799.safetensors is the checkpoint to run inference with. This always generates a sample grid of 16x16 images.
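For intuition about what sampling a flow matching model involves, here is a minimal NumPy sketch: integrate the learned velocity field from Gaussian noise at t=0 to t=1 with fixed Euler steps, then tile the results into one image. The Euler sampler, the step count, the `velocity_fn(x, t)` signature and the reading of "16x16" as 16 rows by 16 columns (256 images) are all assumptions; `test_sample.py` may do this differently.

```python
import numpy as np

def euler_sample(velocity_fn, shape, steps, rng):
    """Integrate dx/dt = v(x, t) from noise at t=0 to t=1 with fixed Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(shape[0], i * dt)  # same time for every sample in the batch
        x = x + dt * velocity_fn(x, t)
    return x

def make_grid(images, rows, cols):
    """Tile (rows*cols, H, W, C) images into a single (rows*H, cols*W, C) array."""
    n, h, w, c = images.shape
    assert n == rows * cols
    grid = images.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)
```

For a 16 by 16 grid, one would sample a batch of shape `(256, H, W, C)` and call `make_grid(samples, 16, 16)`.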

[Sample grids]

[Training stats]