---
license: mit
language:
- en
- sw
tags:
- text-generation-inference
---

# SwahiliInstruct-v0.1-GGUF

This repo contains the `SwahiliInstruct-v0.1` LLM in GGUF format, in the following quantizations:

- Q3_K_M
- Q4_K_M
- Q5_K_M

## Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [swahiliinstruct-v0.1.Q3_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | very small, high quality loss |
| [swahiliinstruct-v0.1.Q4_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.37 GB | 6.87 GB | medium, balanced quality - recommended |
| [swahiliinstruct-v0.1.Q5_K_M.gguf](https://huggingface.co/wambugu1738/SwahiliInstruct-v0.1-GGUF/blob/main/swahiliinstruct-v0.1.Q5_K_M.gguf) | Q5_K_M | 5 | 5.13 GB | 7.63 GB | large, very low quality loss - recommended |

## Loading the model on CPU

Install the Python bindings for llama.cpp:

```bash
pip install llama-cpp-python
```

Then load the model and run an interactive chat loop:

```python
import llama_cpp

# Load the quantized model. n_gpu_layers=0 keeps every layer on the CPU
# (matching this section's title); n_threads=None lets llama.cpp pick
# the thread count automatically.
model = llama_cpp.Llama(
    model_path="swahiliinstruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=None,
    n_gpu_layers=0,
    verbose=True,
    chat_format="chatml-function-calling",
)

def model_out(prompt):
    # Stream the assistant's reply chunk by chunk.
    return model.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a human-like assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.4,
        max_tokens=4096,
    )

while True:
    prompt = input("\nUser:\n")
    for chunk in model_out(prompt):
        # Each streamed chunk carries a delta holding either a role
        # marker or the next piece of generated text; print the text.
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```
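For non-interactive use, the same `create_chat_completion` call can return the whole reply at once by leaving out `stream=True`. A minimal sketch, assuming the `model` object created above (the Swahili prompt is just an example):

```python
# One-shot completion: without stream=True the full reply is returned
# as a single OpenAI-style response dict.
response = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a human-like assistant."},
        {"role": "user", "content": "Habari yako?"},  # example prompt
    ],
    temperature=0.4,
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```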
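## Downloading the GGUF files

Instead of cloning the whole repo, a single quantized file can be fetched with the `huggingface_hub` client (`pip install huggingface_hub`). A minimal sketch; the target directory is just an example:

```python
from huggingface_hub import hf_hub_download

# Fetch one quantized file from the repo; returns the local file path,
# which can be passed as model_path to llama_cpp.Llama above.
model_path = hf_hub_download(
    repo_id="wambugu1738/SwahiliInstruct-v0.1-GGUF",
    filename="swahiliinstruct-v0.1.Q4_K_M.gguf",
    local_dir=".",  # example: save into the current directory
)
print(model_path)
```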