---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---
# SwahiliInstruct-v0.1-GGUF
This repo contains the SwahiliInstruct-v0.1 LLM in GGUF format, in the following quantizations:
- Q3_K_M
- Q4_K_M
- Q5_K_M
## Provided files
Name | Quant method | Bits | Size | Max RAM required | Use case |
---|---|---|---|---|---|
swahiliinstruct-v0.1.Q3_K_M.gguf | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, high quality loss |
swahiliinstruct-v0.1.Q4_K_M.gguf | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality - recommended |
swahiliinstruct-v0.1.Q5_K_M.gguf | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss - recommended |
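
In the table, each "Max RAM required" figure is the file size plus 2.50 GB. The sketch below uses those figures to pick the largest (lowest quality loss) file that fits a given amount of free RAM. `pick_quant` and `QUANTS` are hypothetical helpers, not part of this repo or of llama-cpp-python, and they do not account for memory used by the OS or other programs.

```python
# Figures copied from the "Provided files" table above.
# Each entry: (file name, file size in GB, max RAM required in GB).
QUANTS = [
    ("swahiliinstruct-v0.1.Q3_K_M.gguf", 3.52, 6.02),
    ("swahiliinstruct-v0.1.Q4_K_M.gguf", 4.37, 6.87),
    ("swahiliinstruct-v0.1.Q5_K_M.gguf", 5.13, 7.63),
]


def pick_quant(available_ram_gb):
    """Return the largest file whose max RAM fits, or None if none fit."""
    fitting = [q for q in QUANTS if q[2] <= available_ram_gb]
    if not fitting:
        return None
    # Larger files use more bits per weight and lose less quality.
    return max(fitting, key=lambda q: q[1])[0]
```

For example, with 7 GB free this selects the Q4_K_M file, and with 8 GB free it selects Q5_K_M.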
## Loading the models on CPU
- Installing the library

```shell
pip install llama-cpp-python
```
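- Python code

```python
import llama_cpp

# Load the recommended Q4_K_M file. n_gpu_layers=0 keeps every layer on the CPU
# (use -1 to offload all layers if llama-cpp-python was built with GPU support).
# n_threads=None lets llama-cpp-python choose a default thread count.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=None,
    n_gpu_layers=0,
    verbose=True,
    chat_format="chatml-function-calling",
)

def model_out(prompt):
    """Return a streaming chat completion for a single user prompt."""
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )

while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        # The first chunk carries only the role; later chunks carry text.
        if "role" in delta:
            print(delta["role"])
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
```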
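
The loop above sends each prompt on its own, so the model never sees earlier turns. A minimal sketch of a multi-turn variant keeps a running message history and appends the streamed reply to it. `new_history` and `chat_turn` are hypothetical helpers; `model` is assumed to be a `llama_cpp.Llama` instance (or any object with a compatible `create_chat_completion` method), and extra keyword arguments are passed straight through.

```python
SYSTEM_PROMPT = "You are a human like assistant."


def new_history(system_prompt=SYSTEM_PROMPT):
    """Start a conversation that contains only the system message."""
    return [{"role": "system", "content": system_prompt}]


def chat_turn(model, history, prompt, on_text=None, **kwargs):
    """Send `prompt` with the full history, stream the reply, and store both turns.

    `on_text`, if given, is called with each text fragment as it arrives
    (e.g. to print it). kwargs such as temperature or max_tokens are
    forwarded to create_chat_completion.
    """
    history.append({"role": "user", "content": prompt})
    parts = []
    stream = model.create_chat_completion(messages=history, stream=True, **kwargs)
    for chunk in stream:
        # Role-only and final chunks have no "content" key; skip them.
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            parts.append(content)
            if on_text is not None:
                on_text(content)
    reply = "".join(parts)
    history.append({"role": "assistant", "content": reply})
    return reply
```

With `model` loaded as in the code above, the loop body becomes `chat_turn(model, history, input("\nUser:\n"), on_text=lambda t: print(t, end="", flush=True), temperature=0.4)` after creating `history = new_history()` once. The history grows with every turn, so long chats will eventually exceed the 4096-token window set by `n_ctx` and need the oldest turns trimmed.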