AmpereComputing/llama-3.2-1b-instruct-gguf

jangrzybek committed on Oct 8, 2024

Commit 5dd91ce · verified · 1 Parent(s): cef80cd

Update README.md

Files changed (1)

README.md +2 -0

@@ -7,6 +7,8 @@ license: llama3.2

 Ampere® optimized build of [llama.cpp](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#llamacpp) with full support for rich collection of GGUF models available at HuggingFace: [GGUF models](https://huggingface.co/models?search=gguf)
+**For best results we recommend using models in our custom quantization formats available here: [AmpereComputing HF](https://huggingface.co/AmpereComputing)**
 This Docker image can be run on bare metal Ampere® CPUs and Ampere® based VMs available in the cloud.
 Release notes and binary executables are available on our [GitHub](https://github.com/AmpereComputingAI/llama.cpp/releases)