---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---
# SwahiliInstruct-v0.1-GGUF
This repo contains the LLM `SwahiliInstruct-v0.1` in GGUF format, in the following quantizations:
- Q3_K_M
- Q4_K_M
- Q5_K_M
## Provided files
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [swahiliinstruct-v0.1.Q3_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, high quality loss |
| [swahiliinstruct-v0.1.Q4_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality - recommended |
| [swahiliinstruct-v0.1.Q5_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q5_K_M.gguf) | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss - recommended |
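To fetch one of these files programmatically, here is a minimal sketch using the `huggingface_hub` package (an assumption; it is not part of this repo, and any download method works):
```python
# Assumes: pip install huggingface_hub
# Repo ID and filename are taken from the table above.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="wambugu1738/SwahiliInstruct-v0.1-GGUF",
    filename="swahiliinstruct-v0.1.Q4_K_M.gguf",
)
print(model_path)  # local path to the cached GGUF file
```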
## Loading the model on CPU
- Install the library:
```bash
pip install llama-cpp-python
```
- Python code:
```python
import llama_cpp

# Load the GGUF model; n_gpu_layers=0 keeps every layer on the CPU.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,           # context window size
    n_threads=None,       # None lets llama.cpp pick a thread count
    n_gpu_layers=0,       # CPU-only inference
    verbose=True,
    chat_format="chatml-function-calling",
)

def model_out(prompt):
    # Stream the chat completion so tokens print as they are generated.
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human-like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )

while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        if "role" in delta:
            print(delta["role"])
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```
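The sketch above keeps all layers on the CPU (`n_gpu_layers=0`). If `llama-cpp-python` was built with GPU support, setting `n_gpu_layers=-1` offloads every layer to the GPU instead.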