Q4_K_M GGUF quant of Reflection-Llama-3.1-70B (the fixed version of the weights).
Tested: runs well within 48 GB of VRAM.
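For reference, a minimal way to run the quant directly with llama.cpp; the `.gguf` filename here is an assumption, so substitute the file you actually downloaded:

```sh
# Sketch: run the Q4_K_M quant with llama.cpp.
# Filename is hypothetical; -ngl 99 offloads all layers to the GPU, -c sets the context size.
./llama-cli -m Reflection-Llama-3.1-70B-Q4_K_M.gguf -ngl 99 -c 4096 \
  -p "How many r's are in the word strawberry?"
```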
An Ollama Modelfile is included. It keeps the original system prompt, so the model's output is split into `<thinking>` and `<output>` tags.
If you want the 'vanilla' Llama 3.1 experience, simply remove the SYSTEM directive from the Modelfile before creating the Ollama model (see the sketch below).
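A minimal sketch of creating the Ollama model, assuming the widely published Reflection system prompt and a hypothetical `.gguf` filename; the Modelfile shipped in this repo may differ in detail:

```sh
# Write a Modelfile pointing at the local quant (filename is an assumption).
cat > Modelfile <<'EOF'
FROM ./Reflection-Llama-3.1-70B-Q4_K_M.gguf
# Delete this SYSTEM directive for the 'vanilla' Llama 3.1 experience.
SYSTEM """You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags."""
EOF

# Build and run the model with Ollama.
ollama create reflection-llama-3.1-70b -f Modelfile
ollama run reflection-llama-3.1-70b "How many r's are in the word strawberry?"
```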

All comments are greatly appreciated. Download it, test it, and if you appreciate my work, consider buying me my fuel: Buy Me A Coffee

Format: GGUF, Q4_K_M (4-bit)
Model size: 70.6B params
Architecture: llama