# UlizaLlama_Q4_K_M-gguf: 4-bit Quantized Bilingual Language Model

## Overview

UlizaLlama_Q4_K_M-gguf is a 4-bit quantized version of UlizaLlama, a 7B-parameter language model fine-tuned for Swahili and English. The quantized model offers the same bilingual capabilities as the original UlizaLlama but with a significantly smaller model size and faster inference, making it well suited to deployment in resource-constrained environments.

### Key Features

- **Bilingual Proficiency**: Excels in both Swahili and English, with a focus on instructional tasks.
- **4-bit Quantization**: Packaged in the GGUF format using the Q4_K_M quantization scheme, cutting model size by roughly 75% relative to 16-bit weights.
- **Efficient Inference**: Faster processing and a lower memory footprint than the full-precision model.
- **Versatile Applications**: Suitable for question answering, chat assistants, and a range of domain-specific tasks.
## Model Details

- **Original Model**: UlizaLlama (7B parameters)
- **Base Model**: Jacaranda/kiswallama-pretrained (derived from Meta/Llama2)
- **Quantization Method**: Q4_K_M (4-bit, GGUF format)
- **Languages**: Swahili and English
- **License**: CC BY-NC-SA 4.0 DEED
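To put the size reduction in perspective, a rough back-of-the-envelope sketch; treat the numbers as estimates, since the actual file size depends on the GGUF tensor layout:

```python
params = 7e9  # 7B parameters

fp16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes per parameter
q4_gb = params * 4 / 8 / 1e9  # nominal 4 bits per weight

print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit: ~{q4_gb:.1f} GB "
      f"({(1 - q4_gb / fp16_gb) * 100:.0f}% smaller)")
# -> fp16: ~14 GB, 4-bit: ~3.5 GB (75% smaller)
# Real Q4_K_M files run somewhat larger (~4 GB) because some
# tensors are kept at higher precision.
```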
23
+ ## Installation
24
+
25
+ To use UlizaLlama-QQUF, you'll need a library that supports 4-bit quantized models. We recommend using the `bitsandbytes` library:
26
+
27
+ ```bash
28
+ pip install bitsandbytes
29
+ pip install transformers
30
+ ```
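If you prefer to fetch the weights programmatically, `huggingface_hub` can download the GGUF file directly. A minimal sketch; the filename below is an assumption, so check the repository's file listing for the actual name:

```python
from huggingface_hub import hf_hub_download

# Download the quantized weights from the Hugging Face Hub.
model_path = hf_hub_download(
    repo_id="de-coder/UlizaLlama_Q4_K_M-gguf",
    filename="ulizallama_q4_k_m.gguf",  # hypothetical filename; verify on the repo
)
print(model_path)
```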
## Usage

Here's a simple example of loading and querying de-coder/UlizaLlama_Q4_K_M-gguf with `llama-cpp-python` (point `model_path` at the downloaded GGUF file):

```python
from llama_cpp import Llama

# Load the 4-bit quantized model from a local GGUF file,
# e.g. the path returned by hf_hub_download above.
llm = Llama(model_path="ulizallama_q4_k_m.gguf", n_ctx=2048)

# Example usage ("Tell me about the history of Kilimanjaro.")
prompt = "Niambie kuhusu historia ya Kilimanjaro."
output = llm(prompt, max_tokens=100)
print(output["choices"][0]["text"])
```
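Because UlizaLlama is instruction-tuned, chat-style prompting may work better than raw completion. A sketch using llama-cpp-python's OpenAI-style chat API, assuming a chat template is available in the GGUF metadata:

```python
# Reuses the `llm` object from the example above.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Niambie kuhusu historia ya Kilimanjaro."},
    ],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])
```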
## Performance and Trade-offs

UlizaLlama_Q4_K_M-gguf offers substantial savings in model size and inference cost. However, quantization can cause a slight degradation in output quality compared to the full-precision model. We encourage users to benchmark the model on their specific tasks to understand these trade-offs.
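As a starting point, a minimal throughput benchmark, reusing the `llm` object from the usage example and assuming the OpenAI-style `usage` counters that llama-cpp-python returns with each completion:

```python
import time

# "Explain the importance of agriculture in Tanzania."
prompt = "Eleza umuhimu wa kilimo nchini Tanzania."

start = time.perf_counter()
output = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

generated = output["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```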
## Use Cases

1. Chatbots for healthcare, agriculture, education, and more.
2. Language-learning applications.
3. Information services in Swahili-speaking regions.
4. Edge devices and mobile applications.
64
+ ## Citation and Acknowledgments
65
+
66
+ If you use UlizaLlama_Q4_K_M-gguf in your work, please cite:
67
+
68
+ ```bibtex
69
+ @misc{mwongela2023ulizallama,
70
+ title={UlizaLlama: A Bilingual Language Model for Swahili and English},
71
+ author={Kelvin Githu(de-coder)},
72
+ year={2024},
73
+ publisher={Kelvin Githu},
74
+ howpublished={\url{https://huggingface.co/de-coder/UlizaLlama_Q4_K_M-gguf}},
75
+ }
76
+ ```