
	---
	license: apache-2.0
	datasets:
	- Alfaxad/Inkuba-Mono-Swahili
	language:
	- sw
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- gemma2
	- text-2-text
	- text-generation
	- llms
	base_model:
	- google/gemma-2-2b
	---






	# Gemma2-2B-Swahili-Preview
	Gemma2-2B-Swahili-Preview is a Swahili adaptation of the Gemma2 2B base language model. It was fine-tuned on the Inkuba-Mono Swahili dataset to improve Swahili language understanding through monolingual training.

	## Model Details
	- Developer: Alfaxad Eyembe
	- Base Model: google/gemma-2-2b
	- Model Type: Decoder-only transformer
	- Language: Swahili
	- License: Apache 2.0
	- Fine-tuning Approach: Low-Rank Adaptation (LoRA)
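To illustrate why LoRA is a parameter-efficient approach, the sketch below compares the number of trainable parameters for a full update of one weight matrix with a low-rank update of the form `B @ A`. The layer dimensions and the rank are hypothetical values chosen for illustration; this card does not state the rank or the target modules used for this model.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, not taken from this model's configuration.
d_out, d_in, rank = 2048, 2048, 8

full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
print(f"full update: {full:,} parameters")
print(f"LoRA update (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```

The ratio depends only on the rank and the layer shape, so a small rank keeps the trainable share low regardless of the base model size.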

	## Training Data
	The model was fine-tuned on a focused subset of the Inkuba-Mono dataset:
	- 1,000,000 randomly selected examples
	- Total tokens: 60,831,073
	- Average text length: 101.33 characters
	- Diverse Swahili text sources including news, social media, and various domains
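As a quick sanity check on the figures above, the average number of tokens per example can be derived from the reported totals. This assumes the total token count covers exactly the 1,000,000 selected examples, which the card implies but does not state explicitly.

```python
# Figures reported in the Training Data section.
num_examples = 1_000_000
total_tokens = 60_831_073

# Assumption: total_tokens was counted over exactly these examples.
avg_tokens_per_example = total_tokens / num_examples
print(f"average tokens per example: {avg_tokens_per_example:.2f}")
```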

	## Training Details
	- Fine-tuning Method: LoRA
	- Training Steps: 2,500
	- Batch Size: 2
	- Gradient Accumulation Steps: 32
	- Learning Rate: 2e-4
	- Training Time: ~7.5 hours
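The batch size and gradient accumulation settings above combine into an effective batch size per optimizer step. The sketch below works through that arithmetic under three assumptions that the card does not confirm: training ran on a single device, "Training Steps" counts optimizer steps, and each sequence in a batch is one dataset example (no packing).

```python
# Values reported in the Training Details and Training Data sections.
per_device_batch_size = 2
gradient_accumulation_steps = 32
training_steps = 2_500
subset_size = 1_000_000

# Assumption: one device, so no extra multiplier for data parallelism.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Assumption: one optimizer step per training step, one example per sequence.
examples_seen = effective_batch_size * training_steps
fraction_of_subset = examples_seen / subset_size

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen:,}")
print(f"share of the 1,000,000-example subset: {fraction_of_subset:.0%}")
```

If any of these assumptions does not hold, for example if sequences were packed or several devices were used, the number of examples seen would be higher.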


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6375af60e3413701a9f01c0f/8fVULkKb92JTk8-65KE5R.png)



	## Model Capabilities
	This model is designed for:
	- Swahili text continuation
	- Natural language understanding
	- Contextual text generation
	- Base language modeling for Swahili

	## Usage
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Repository id as shown on the model page
	model_id = "Alfaxad/gemma2-2b-swahili-preview"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    torch_dtype=torch.bfloat16,
	)

	# Set to evaluation mode
	model.eval()

	# Example usage: the model continues the prompt rather than answering it
	# (prompt in English: "In the Kariakoo market, new technology has enabled")
	prompt = "Katika soko la Kariakoo, teknolojia mpya imewezesha"
	# Move the input tensors to the device the model was placed on
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=500,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.95,
	)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
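For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded `response` above starts with the prompt. A minimal sketch of keeping only the continuation is shown below; `new_token_ids` is a hypothetical helper written for this example, and the ids are stand-in values so the snippet runs without downloading the model.

```python
from typing import Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> list[int]:
    """Return only the ids that follow the first prompt_length ids."""
    return list(output_ids[prompt_length:])


# With the objects from the usage snippet, this would be applied roughly as:
#   prompt_length = inputs["input_ids"].shape[1]
#   ids = new_token_ids(outputs[0].tolist(), prompt_length)
#   continuation = tokenizer.decode(ids, skip_special_tokens=True)

# Stand-in ids: the first four play the role of the prompt.
example_output = [2, 101, 102, 103, 7, 8, 9]
print(new_token_ids(example_output, prompt_length=4))
```

Slicing by token count is more reliable than slicing the decoded string by `len(prompt)`, since decoding does not always reproduce the prompt text character for character.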

	## Key Features
	- Natural Swahili text continuation
	- Strong cultural context understanding
	- Efficient parameter updates through LoRA
	- Diverse domain knowledge integration

	## Limitations
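	- Not instruction-tuned: the model continues text rather than following instructions or chat-style prompts
	- Provides base language modeling only; downstream tasks may need further fine-tuning
	- Performance varies across text domains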

	## Citation
	```bibtex
	@misc{gemma2-2b-swahili-preview,
	  author = {Alfaxad Eyembe},
	  title = {Gemma2-2B-Swahili-Preview: Swahili Variation of Gemma2 2B},
	  year = {2025},
	  publisher = {Hugging Face},
	  journal = {Hugging Face Model Hub}
	}
	```

	## Contact
	For questions or feedback, please reach out through:
	- Hugging Face: [@alfaxad](https://huggingface.co/alfaxad)
	- X: [@alfxad](https://twitter.com/alfxad)