FLUX.1-schnell-fp8-flumina / float8_quantize.py

Commit History

remove torchao dependency, quantize entirely via linear (aredden, d45a331)
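This commit drops torchao in favor of doing the fp8 quantization inside a Linear subclass directly. Below is a minimal sketch of that approach, assuming a per-tensor scale and on-the-fly dequantization in forward; the class name F8Linear comes from the later commit message, but the scale logic here is an illustrative assumption, not the repo's exact code.

```python
# Sketch: quantize nn.Linear weights to float8 without torchao.
# Real fp8 kernels would use a scaled matmul instead of dequantizing.
import torch
import torch.nn as nn
import torch.nn.functional as F

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

class F8Linear(nn.Linear):
    @torch.no_grad()
    def quantize(self):
        # Scale so the largest |weight| maps onto the fp8 representable range.
        scale = self.weight.abs().max().clamp(min=1e-12) / FP8_MAX
        self.register_buffer("scale", scale)
        self.register_buffer(
            "float8_data", (self.weight / scale).to(torch.float8_e4m3fn)
        )

    def forward(self, x):
        # Dequantize on the fly and run an ordinary matmul.
        w = self.float8_data.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)
```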

Fix issues with loading F8Linear from state dict when init_scale not initialized & loaded from meta device (aredden, 3ddaa67)
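The failure mode here is that a module built on the meta device has no real storage, so a scale buffer that was never initialized has nothing for load_state_dict to copy into. A hedged sketch of the pattern, assuming the buffer is simply named scale (the repo's actual buffer names and logic may differ):

```python
# Sketch: materialize an uninitialized buffer before load_state_dict
# copies the checkpoint value into it.
import torch
import torch.nn as nn

class F8Linear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, **kwargs):
        super().__init__(in_features, out_features, bias=bias, **kwargs)
        # Unset until quantization runs or a checkpoint provides it.
        self.register_buffer("scale", None)

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        key = prefix + "scale"
        if key in state_dict and self.scale is None:
            # Allocate real (non-meta) storage so the checkpoint tensor can
            # be loaded instead of being reported as an unexpected key.
            self.scale = torch.zeros_like(state_dict[key], device="cpu")
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```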

Small fix for an issue where f16 CublasLinear layers weren't being used even when available (aredden, 6d82dcc)

Ensure repo only accesses CublasLinear lazily (aredden, 00f5d2c)
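Lazy access here means the optional package providing CublasLinear is only imported the first time it is actually needed, so importing the repo never fails when the package is absent. A sketch of that pattern, assuming the dependency is the cublas_ops module (the helper name is illustrative):

```python
# Sketch: lazy, cached access to an optional dependency.
from functools import lru_cache

@lru_cache(maxsize=1)
def try_get_cublas_linear():
    """Return the CublasLinear class if cublas_ops is installed, else None."""
    try:
        from cublas_ops import CublasLinear
        return CublasLinear
    except ImportError:
        return None
```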

Remove f8 flux, instead configure at load, improved quality & corrected configs (aredden, 1f9e684)

Dynamic swap with cublas linear / optional improved precision with VRAM drawback (aredden, 37bd8c1)
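The dynamic swap trades VRAM for precision and speed: eligible nn.Linear modules are replaced by fp16 CublasLinear equivalents, which keeps full fp16 weights resident on the GPU (hence the VRAM drawback the message notes). A hedged sketch of such a swap, not the repo's exact routine:

```python
# Sketch: recursively replace nn.Linear children with another Linear class,
# copying weights across. Device/dtype handling is simplified.
import torch.nn as nn

def swap_linears(model: nn.Module, linear_cls) -> None:
    for name, child in model.named_children():
        if type(child) is nn.Linear:
            new = linear_cls(
                child.in_features, child.out_features,
                bias=child.bias is not None,
            ).to(child.weight.device, child.weight.dtype)
            new.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new.bias.data.copy_(child.bias.data)
            setattr(model, name, new)
        else:
            swap_linears(child, linear_cls)
```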

Remove more unnecessary code, fix small typing hiccup (aredden, 6d0762c)

Remove unnecessary code, hide prints behind debug flag, hide warnings (aredden, 0f3134f)
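Gating prints behind a debug flag is simple to sketch; the environment-variable name below is a hypothetical stand-in, not the repo's actual flag:

```python
# Sketch: route diagnostic output through one gated helper.
import os
import warnings

DEBUG = os.environ.get("FLUX_DEBUG", "0") == "1"  # hypothetical flag name

def dprint(*args, **kwargs):
    if DEBUG:
        print(*args, **kwargs)

if not DEBUG:
    # Also matches "hide warnings" from the commit message.
    warnings.filterwarnings("ignore")
```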

Add fields to configs, fix issue with offload from bnb, remove extra random text code (aredden, 340f0a0)

Fix for nightly (aredden, 0aa9861)

CUDA version checks (aredden, b6617b1)
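Native fp8 matmul needs both a new enough CUDA toolkit and an Ada/Hopper-class GPU (fp8 tensor cores arrived with compute capability 8.9), so a load-time gate makes sense. A sketch with assumed thresholds; the repo's actual checks may differ:

```python
# Sketch of a CUDA capability gate for fp8.
import torch

def fp8_ok() -> bool:
    if not torch.cuda.is_available() or torch.version.cuda is None:
        return False
    cuda = tuple(int(v) for v in torch.version.cuda.split(".")[:2])
    sm = torch.cuda.get_device_capability()
    # Version thresholds below are assumptions for illustration.
    return cuda >= (12, 0) and sm >= (8, 9)
```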

Fix non-offload inference & add option to load from prequantized flux (aredden, 2f2c44c)

Add offloading & improved fp8 inference (aredden, 28dec30)
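Offloading keeps weights in system RAM and moves them onto the GPU only around the forward pass, cutting idle VRAM at the cost of transfer time. A minimal sketch of the idea, not the repo's actual scheduler:

```python
# Sketch: context manager that holds a module on CPU and moves it to the
# GPU only while it is in use.
from contextlib import contextmanager
import torch

@contextmanager
def on_gpu(module: torch.nn.Module, device: str = "cuda"):
    module.to(device)
    try:
        yield module
    finally:
        module.to("cpu")
        torch.cuda.empty_cache()

# Usage (names illustrative): run one step with the model resident on GPU.
# with on_gpu(flux_transformer) as m:
#     out = m(latents, timestep)
```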