---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---

# SwahiliInstruct-v0.1-GGUF

This repo contains the LLM `SwahiliInstruct-v0.1` converted to GGUF format, in three quantizations:

- Q3_K_M
- Q4_K_M
- Q5_K_M

## Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [swahiliinstruct-v0.1.Q3_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | smallest, highest quality loss |
| [swahiliinstruct-v0.1.Q4_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality (recommended) |
| [swahiliinstruct-v0.1.Q5_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q5_K_M.gguf) | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss (recommended) |
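
The "Max RAM required" column can be read as a lookup: take the largest file whose listed RAM figure fits the memory you have free. Below is a minimal sketch with the numbers copied from the table; `QUANTS` and `pick_quant` are illustrative names, not part of any library, and the RAM figures are this card's estimates rather than measurements of your machine.

```python
from typing import Optional

# Figures copied from the "Provided files" table (values in GB).
QUANTS = {
    "Q3_K_M": {"file_gb": 3.52, "max_ram_gb": 6.02},
    "Q4_K_M": {"file_gb": 4.37, "max_ram_gb": 6.87},
    "Q5_K_M": {"file_gb": 5.13, "max_ram_gb": 7.63},
}


def pick_quant(available_ram_gb: float) -> Optional[str]:
    """Return the largest quant whose listed max RAM fits, or None if none fit."""
    fitting = [
        name
        for name, info in QUANTS.items()
        if info["max_ram_gb"] <= available_ram_gb
    ]
    if not fitting:
        return None
    # Larger files lose less quality, so prefer the biggest one that fits.
    return max(fitting, key=lambda name: QUANTS[name]["file_gb"])


if __name__ == "__main__":
    for ram in (4, 6.5, 7.0, 8):
        print(f"{ram} GB -> {pick_quant(ram)}")
```

For example, `pick_quant(8)` returns `"Q5_K_M"`, while `pick_quant(7.0)` falls back to `"Q4_K_M"`.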

## Loading the models on CPU

- Installing the library

```bash
pip install llama-cpp-python
```
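
The `model_path` in the Python code expects the `.gguf` file to be in the current working directory, so it has to be downloaded first. One way to fetch it is the `huggingface_hub` library, an extra dependency (`pip install huggingface_hub`) that this card does not otherwise mention. This is a minimal sketch: `gguf_filename` and `download_quant` are hypothetical helper names, and the repo id and file-name pattern are taken from the links in the table.

```python
# Hypothetical helpers for fetching one GGUF file from this repo.
REPO_ID = "wambugu1738/SwahiliInstruct-v0.1-GGUF"
AVAILABLE_QUANTS = ("Q3_K_M", "Q4_K_M", "Q5_K_M")


def gguf_filename(quant: str) -> str:
    """File name used in this repo, e.g. swahiliinstruct-v0.1.Q4_K_M.gguf."""
    if quant not in AVAILABLE_QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {AVAILABLE_QUANTS}")
    return f"swahiliinstruct-v0.1.{quant}.gguf"


def download_quant(quant: str = "Q4_K_M", local_dir: str = ".") -> str:
    """Download the chosen file into local_dir and return its local path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download_quant("Q4_K_M"))
```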

- Python code

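```python
import llama_cpp

# Load the quantized model from the current directory.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,        # context window in tokens
    n_threads=None,    # None (like 0) lets llama-cpp-python pick a default from the CPU core count
    n_gpu_layers=0,    # keep all layers on the CPU; -1 offloads every layer on a GPU-enabled build
    verbose=True,
    chat_format="chatml-function-calling",
)


def model_out(prompt):
    """Stream a chat completion for a single user prompt."""
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )


# Interactive loop; press Ctrl+C to stop.
while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        if "role" in delta:
            # The first chunk announces the speaker ("assistant").
            print(delta["role"])
        elif delta.get("content"):
            # Later chunks carry the reply text piece by piece.
            print(delta["content"], end="", flush=True)
```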
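
The loop above picks the text out of each streamed chunk. With `stream=True`, llama-cpp-python yields OpenAI-style chat-completion chunks: typically a first chunk whose `delta` holds only the role, then chunks with `content`, then a final empty `delta`. The same extraction can be written as a reusable generator, which also makes it easy to keep the full reply (for example, to append it to a conversation history). This is a sketch under that assumption: `iter_stream_text` and `collect_reply` are hypothetical names, and the exact chunk fields may vary between library versions.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_stream_text(chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield only the text pieces from streamed chat-completion chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        # Role-only and empty deltas carry no text, so they are skipped.
        if content:
            yield content


def collect_reply(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the streamed pieces into the full assistant reply."""
    return "".join(iter_stream_text(chunks))


if __name__ == "__main__":
    # Hand-written chunks shaped like llama-cpp-python's streamed output.
    fake_chunks = [
        {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
        {"choices": [{"index": 0, "delta": {"content": "Habari"}, "finish_reason": None}]},
        {"choices": [{"index": 0, "delta": {"content": " yako?"}, "finish_reason": None}]},
        {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    ]
    print(collect_reply(fake_chunks))  # Habari yako?
```

Inside the chat loop, `for piece in iter_stream_text(model_out(prompt)): print(piece, end="", flush=True)` prints the same reply text without the role line.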