ExLlamaV2 (exl2) quant, 4.0 bpw with an 8-bit head, of the fixed version of mattshumer/Reflection-Llama-3.1-70B.
Runs smoothly on 2x RTX 3090 (48 GB VRAM total).
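
As a rough sanity check on the 48 GB figure, here is a back-of-the-envelope sketch of how much VRAM the quantized weights alone should occupy. The parameter count is a nominal assumption (the "70B" in the model name), the 4.0 bits per weight comes from the `bpw4.0` suffix of the repo name, and the helper `quant_weight_gib` is a hypothetical name used only for illustration. The estimate ignores the 8-bit head layer, the KV cache, and activation buffers, so actual usage will be higher.

```python
def quant_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 70e9         # assumption: nominal "70B" parameter count
BPW = 4.0               # from the "bpw4.0" suffix in the repo name
TOTAL_VRAM_GIB = 48.0   # 2 x 24 GB cards, treated here as roughly 48 GiB

weights = quant_weight_gib(N_PARAMS, BPW)
headroom = TOTAL_VRAM_GIB - weights  # what is left for KV cache and buffers

print(f"weights ~ {weights:.1f} GiB, headroom ~ {headroom:.1f} GiB")
```

Under these assumptions the weights take roughly two thirds of the available memory, and the remainder is what the context cache and runtime overhead have to fit into.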

All comments are greatly appreciated. Download it, test it, and if you appreciate my work, consider buying me my fuel.

