FLUX.1-schnell-fp8-flumina / float8_quantize.py

Commit History

remove torchao dependency, quantize entirely via linear (aredden, d45a331)
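This commit drops torchao in favor of doing the fp8 quantization inside a Linear subclass directly. Below is a minimal sketch of that approach, assuming a per-tensor scale and on-the-fly dequantization in forward; the class name F8Linear comes from the later commit message, but the scale logic here is an illustrative assumption, not the repo's exact code.

```python
# Sketch: quantize nn.Linear weights to float8 without torchao.
# Real fp8 kernels would use a scaled matmul instead of dequantizing.
import torch
import torch.nn as nn
import torch.nn.functional as F

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

class F8Linear(nn.Linear):
    @torch.no_grad()
    def quantize(self):
        # Scale so the largest |weight| maps onto the fp8 representable range.
        scale = self.weight.abs().max().clamp(min=1e-12) / FP8_MAX
        self.register_buffer("scale", scale)
        self.register_buffer(
            "float8_data", (self.weight / scale).to(torch.float8_e4m3fn)
        )

    def forward(self, x):
        # Dequantize on the fly and run an ordinary matmul.
        w = self.float8_data.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)
```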

Fix issues with loading F8Linear from state dict when init_scale not initialized & loaded from meta device (aredden, 3ddaa67)
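The failure mode here is that a module built on the meta device has no real storage, so a scale buffer that was never initialized has nothing for load_state_dict to copy into. A hedged sketch of the pattern, assuming the buffer is simply named scale (the repo's actual buffer names and logic may differ):

```python
# Sketch: materialize an uninitialized buffer before load_state_dict
# copies the checkpoint value into it.
import torch
import torch.nn as nn

class F8Linear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, **kwargs):
        super().__init__(in_features, out_features, bias=bias, **kwargs)
        # Unset until quantization runs or a checkpoint provides it.
        self.register_buffer("scale", None)

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        key = prefix + "scale"
        if key in state_dict and self.scale is None:
            # Allocate real (non-meta) storage so the checkpoint tensor can
            # be loaded instead of being reported as an unexpected key.
            self.scale = torch.zeros_like(state_dict[key], device="cpu")
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```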

Small fix for an issue where f16 CublasLinear layers weren't being used even when available (aredden, 6d82dcc)

Ensure repo only accesses CublasLinear lazily (aredden, 00f5d2c)
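Lazy access here means the optional package providing CublasLinear is only imported the first time it is actually needed, so importing the repo never fails when the package is absent. A sketch of that pattern, assuming the dependency is the cublas_ops module (the helper name is illustrative):

```python
# Sketch: lazy, cached access to an optional dependency.
from functools import lru_cache

@lru_cache(maxsize=1)
def try_get_cublas_linear():
    """Return the CublasLinear class if cublas_ops is installed, else None."""
    try:
        from cublas_ops import CublasLinear
        return CublasLinear
    except ImportError:
        return None
```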

Remove f8 flux, instead configure at load, improved quality & corrected configs (aredden, 1f9e684)

Dynamic swap with cublas linear / optional improved precision with VRAM drawback (aredden, 37bd8c1)
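The dynamic swap trades VRAM for precision and speed: eligible nn.Linear modules are replaced by fp16 CublasLinear equivalents, which keeps full fp16 weights resident on the GPU (hence the VRAM drawback the message notes). A hedged sketch of such a swap, not the repo's exact routine:

```python
# Sketch: recursively replace nn.Linear children with another Linear class,
# copying weights across. Device/dtype handling is simplified.
import torch.nn as nn

def swap_linears(model: nn.Module, linear_cls) -> None:
    for name, child in model.named_children():
        if type(child) is nn.Linear:
            new = linear_cls(
                child.in_features, child.out_features,
                bias=child.bias is not None,
            ).to(child.weight.device, child.weight.dtype)
            new.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new.bias.data.copy_(child.bias.data)
            setattr(model, name, new)
        else:
            swap_linears(child, linear_cls)
```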

Remove more unnecessary code, fix small typing hiccup (aredden, 6d0762c)

Remove unnecessary code, hide prints behind debug flag, hide warnings (aredden, 0f3134f)
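Gating prints behind a debug flag is simple to sketch; the environment-variable name below is a hypothetical stand-in, not the repo's actual flag:

```python
# Sketch: route diagnostic output through one gated helper.
import os
import warnings

DEBUG = os.environ.get("FLUX_DEBUG", "0") == "1"  # hypothetical flag name

def dprint(*args, **kwargs):
    if DEBUG:
        print(*args, **kwargs)

if not DEBUG:
    # Also matches "hide warnings" from the commit message.
    warnings.filterwarnings("ignore")
```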

Add fields to configs, fix issue with offload from bnb, remove extra random text code (aredden, 340f0a0)

Fix for nightly (aredden, 0aa9861)

CUDA version checks (aredden, b6617b1)
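Native fp8 matmul needs both a new enough CUDA toolkit and an Ada/Hopper-class GPU (fp8 tensor cores arrived with compute capability 8.9), so a load-time gate makes sense. A sketch with assumed thresholds; the repo's actual checks may differ:

```python
# Sketch of a CUDA capability gate for fp8.
import torch

def fp8_ok() -> bool:
    if not torch.cuda.is_available() or torch.version.cuda is None:
        return False
    cuda = tuple(int(v) for v in torch.version.cuda.split(".")[:2])
    sm = torch.cuda.get_device_capability()
    # Version thresholds below are assumptions for illustration.
    return cuda >= (12, 0) and sm >= (8, 9)
```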

Fix non-offload inference & add option to load from prequantized flux (aredden, 2f2c44c)

Add offloading & improved fp8 inference (aredden, 28dec30)
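Offloading keeps weights in system RAM and moves them onto the GPU only around the forward pass, cutting idle VRAM at the cost of transfer time. A minimal sketch of the idea, not the repo's actual scheduler:

```python
# Sketch: context manager that holds a module on CPU and moves it to the
# GPU only while it is in use.
from contextlib import contextmanager
import torch

@contextmanager
def on_gpu(module: torch.nn.Module, device: str = "cuda"):
    module.to(device)
    try:
        yield module
    finally:
        module.to("cpu")
        torch.cuda.empty_cache()

# Usage (names illustrative): run one step with the model resident on GPU.
# with on_gpu(flux_transformer) as m:
#     out = m(latents, timestep)
```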