Kennedy wambugu
metadata
license: mit
language:
  - en
  - sw
tags:
  - text-generation-inference

SwahiliInstruct-v0.1-GGUF

This repo contains the SwahiliInstruct-v0.1 LLM in GGUF format, at the following quantization levels:

  • Q3_K_M
  • Q4_K_M
  • Q5_K_M

Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| --- | --- | --- | --- | --- | --- |
| swahiliinstruct-v0.1.Q3_K_M.gguf | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, high quality loss |
| swahiliinstruct-v0.1.Q4_K_M.gguf | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality - recommended |
| swahiliinstruct-v0.1.Q5_K_M.gguf | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss - recommended |
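
In every row of the table the "Max RAM required" figure is the file size plus 2.50 GB, so choosing a file mostly comes down to how much memory is free. The following is a minimal sketch of that choice, not part of the model's tooling: `QUANT_FILES` and `pick_quant` are hypothetical names, and the numbers are copied from the table.

```python
# File sizes and RAM figures copied from the "Provided files" table.
QUANT_FILES = [
    # (file name, file size in GB, max RAM required in GB)
    ("swahiliinstruct-v0.1.Q3_K_M.gguf", 3.52, 6.02),
    ("swahiliinstruct-v0.1.Q4_K_M.gguf", 4.37, 6.87),
    ("swahiliinstruct-v0.1.Q5_K_M.gguf", 5.13, 7.63),
]


def pick_quant(available_ram_gb):
    """Return the largest listed file whose RAM requirement fits, or None."""
    fitting = [f for f in QUANT_FILES if f[2] <= available_ram_gb]
    if not fitting:
        return None
    # Larger quantizations lose less quality, so prefer the biggest one that fits.
    return max(fitting, key=lambda f: f[2])[0]


print(pick_quant(7.0))  # swahiliinstruct-v0.1.Q4_K_M.gguf
```

With 7 GB free, both the Q3_K_M and Q4_K_M files fit, and the sketch returns the larger of the two.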

Loading the models on CPU

  • Installing the library
pip install llama-cpp-python
  • Python code
import llama_cpp

# Load the quantized model for CPU inference.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=None,  # let llama-cpp-python choose the thread count
    n_gpu_layers=0,  # CPU only; -1 offloads all layers on a GPU-enabled build
    verbose=True,
    chat_format="chatml-function-calling",
)


def model_out(prompt):
    """Return a stream of chat-completion chunks for one user prompt."""
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )


while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        delta = chunk["choices"][0]["delta"]
        if "role" in delta:
            print(delta["role"])  # first chunk announces the speaker
        elif delta.get("content"):
            print(delta["content"], end="", flush=True)
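
The loop above prints each streamed piece as it arrives. When the full reply is needed as a single string (for logging or post-processing, say), the same chunks can be joined instead. This is a minimal sketch: `collect_stream` is a hypothetical helper, and the sample chunks are illustrative stand-ins that only mimic the `choices[0]['delta']` shape the loop reads.

```python
def collect_stream(chunks):
    """Join the 'content' pieces of streamed chat-completion chunks into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # Some chunks carry only a role or an empty delta; skip those.
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)


# Stand-in chunks for illustration only; real ones come from the model.
sample = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Habari"}}]},
    {"choices": [{"delta": {"content": " yako"}}]},
    {"choices": [{"delta": {}}]},
]
print(collect_stream(sample))  # Habari yako
```

With the model loaded, the helper could be called as `collect_stream(model_out(prompt))`.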