---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---
# SwahiliInstruct-v0.1-GGUF
This repo contains the `SwahiliInstruct-v0.1` LLM converted to GGUF format, in the following quantizations:
- Q3_K_M
- Q4_K_M
- Q5_K_M
## Provided files
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [swahiliinstruct-v0.1.Q3_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, large quality loss |
| [swahiliinstruct-v0.1.Q4_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality (recommended) |
| [swahiliinstruct-v0.1.Q5_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q5_K_M.gguf) | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss (recommended) |
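The "Max RAM required" column can be used to pick a file programmatically. The sketch below is a hedged example, not part of the original instructions: it hard-codes the RAM figures from the table, takes the repo id from the download links above, and assumes the `huggingface_hub` package (installed separately with `pip install huggingface_hub`) for `hf_hub_download`. The helper names `pick_quant`, `gguf_filename` and `download_model` are hypothetical.

```python
# RAM figures (GB) copied from the "Max RAM required" column above.
MAX_RAM_GB = {
    "Q3_K_M": 6.02,
    "Q4_K_M": 6.87,
    "Q5_K_M": 7.63,
}

# Repo id taken from the download links in the table.
REPO_ID = "wambugu1738/SwahiliInstruct-v0.1-GGUF"


def pick_quant(available_ram_gb):
    """Return the highest-bit quant whose RAM requirement fits, or None."""
    fitting = [q for q, ram in MAX_RAM_GB.items() if ram <= available_ram_gb]
    # The digit after "Q" is the bit width; prefer the largest that fits.
    return max(fitting, key=lambda q: int(q[1])) if fitting else None


def gguf_filename(quant):
    """Build the file name used in this repo, e.g. swahiliinstruct-v0.1.Q4_K_M.gguf."""
    return f"swahiliinstruct-v0.1.{quant}.gguf"


def download_model(quant):
    """Download one GGUF file and return its local path (assumes huggingface_hub is installed)."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

For example, on a machine with about 8 GB of free RAM, `download_model(pick_quant(8))` would fetch the Q5_K_M file, and the returned path can be passed as `model_path` when loading the model.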
## Loading the models on CPU
- Install the library
```bash
pip install llama-cpp-python
```
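- Run a simple chat loop in Python
```python
import llama_cpp

# Load the quantized model from a local file (download it from the table above first).
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,        # context window, in tokens
    n_gpu_layers=0,    # run on CPU; use -1 to offload all layers on a GPU-enabled build
    verbose=True,
    chat_format="chatml-function-calling",
)

def model_out(prompt):
    """Return a streaming chat completion for a single user prompt."""
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human-like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )

while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        # The first chunk carries the role, later chunks carry text fragments,
        # and the final chunk has an empty delta.
        if "role" in delta:
            print(delta["role"])
        if delta.get("content"):
            print(delta["content"], end="", flush=True)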
```
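The loop above sends each prompt on its own, so the model never sees earlier turns. A minimal sketch of keeping conversation history is shown below, assuming the same `model` object and the OpenAI-style chunks (`chunk["choices"][0]["delta"]`) that `create_chat_completion(stream=True)` yields. The helper names `stream_reply` and `chat` are hypothetical, and `max_tokens=1024` is an assumed value chosen to leave room in the 4096-token context for earlier turns.

```python
def stream_reply(model, messages, **kwargs):
    """Stream one assistant reply to stdout and return the full text."""
    parts = []
    for chunk in model.create_chat_completion(messages=messages, stream=True, **kwargs):
        # A delta may hold a role, a text fragment, or nothing (final chunk).
        text = chunk["choices"][0]["delta"].get("content")
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def chat(model):
    """Interactive loop that keeps every turn in `messages`."""
    messages = [{"role": "system", "content": "You are a human-like assistant."}]
    while True:
        messages.append({"role": "user", "content": input("\nUser:\n")})
        reply = stream_reply(model, messages, temperature=0.4, max_tokens=1024)
        # Store the answer so the next turn includes it as context.
        messages.append({"role": "assistant", "content": reply})
```

Calling `chat(model)` replaces the `while True` loop above. The history grows with every turn and will eventually exceed `n_ctx=4096`; trimming old messages is not handled in this sketch.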