---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---
# SwahiliInstruct-v0.1-GGUF
This repo contains the LLM `SwahiliInstruct-v0.1` in GGUF format, in the following quantizations:
- Q3_K_M
- Q4_K_M
- Q5_K_M
## Provided files
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [swahiliinstruct-v0.1.Q3_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, high quality loss |
| [swahiliinstruct-v0.1.Q4_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality - recommended |
| [swahiliinstruct-v0.1.Q5_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q5_K_M.gguf) | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss - recommended |
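To fetch one of these files programmatically, here is a minimal sketch using the `huggingface_hub` package (an assumption; it is not part of this repo, and any download method works):
```python
# Assumes: pip install huggingface_hub
# Repo ID and filename are taken from the table above.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="wambugu1738/SwahiliInstruct-v0.1-GGUF",
    filename="swahiliinstruct-v0.1.Q4_K_M.gguf",
)
print(model_path)  # local path to the cached GGUF file
```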
## Loading the model on CPU
- Install the library:
```bash
pip install llama-cpp-python
```
- Python code:
```python
import llama_cpp

# Load the GGUF model; n_gpu_layers=0 keeps every layer on the CPU.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,           # context window size
    n_threads=None,       # None lets llama.cpp pick a thread count
    n_gpu_layers=0,       # CPU-only inference
    verbose=True,
    chat_format="chatml-function-calling",
)

def model_out(prompt):
    # Stream the chat completion so tokens print as they are generated.
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human-like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )

while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        if "role" in delta:
            print(delta["role"])
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```
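The sketch above keeps all layers on the CPU (`n_gpu_layers=0`). If `llama-cpp-python` was built with GPU support, setting `n_gpu_layers=-1` offloads every layer to the GPU instead.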