metadata
language:
- dv
base_model:
- openai-community/gpt2
datasets:
- wikimedia/wikipedia
GPT 2 DV base
This is a GPT-2 model fine-tuned on Dhivehi text, using a curated dataset of Dhivehi Wikipedia articles. It can be used to generate text in Dhivehi.
Model Description
- Model Type: GPT-2
- Language: Dhivehi (ދިވެހި)
- Training Data: Dhivehi Wikipedia articles
- Last Updated: 2024-11-25
Performance Metrics
Evaluation metrics on the test set:
- Average perplexity: 3.80
- Perplexity standard deviation: 2.23
- Best (lowest) perplexity: 2.72
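Perplexity is the exponential of the mean per-token negative log-likelihood (in nats), so lower is better. With transformers, calling `GPT2LMHeadModel` with `labels=inputs["input_ids"]` returns that mean cross-entropy as `.loss`, and `torch.exp(loss)` gives the perplexity of one sequence. The card does not say how the three numbers above were aggregated. The pure-Python sketch below *assumes* one perplexity per test sequence, summarized by the mean, the sample standard deviation, and the minimum. The helper names and toy numbers are hypothetical.

```python
import math
import statistics


def sequence_perplexity(token_nlls: list[float]) -> float:
    """Perplexity of one sequence: exp of the mean per-token negative log-likelihood (nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def summarize(per_sequence_nlls: list[list[float]]) -> dict[str, float]:
    """Aggregate per-sequence perplexities into average / std / best.

    ASSUMPTION: the model card does not state the aggregation; here each test
    sequence gets its own perplexity, "average" is their mean, "std" is the
    sample standard deviation, and "best" is the lowest value.
    """
    ppls = [sequence_perplexity(nlls) for nlls in per_sequence_nlls]
    return {
        "average": statistics.mean(ppls),
        "std": statistics.stdev(ppls) if len(ppls) > 1 else 0.0,
        "best": min(ppls),
    }


# Toy per-token NLLs (made-up numbers, not from the real test set).
example = [[1.0, 1.0, 1.0], [1.5, 0.5], [2.0, 1.0, 1.5, 1.5]]
print(summarize(example))
```

A sequence whose mean NLL is exactly 1.0 nat has perplexity e ≈ 2.72, so a "best" value of 2.72 corresponds to an average loss of about 1 nat per token.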
Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

# Prepare your prompt (Dhivehi, roughly "The Maldives is ...")
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,  # total length in tokens, including the prompt
    temperature=0.7,
    top_p=0.9,
    do_sample=True,  # sample instead of greedy decoding; temperature and top_p apply only when sampling
    num_return_sequences=1,
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
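To illustrate what `temperature=0.7` and `top_p=0.9` do at each decoding step, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling over a toy vocabulary. It is not the transformers implementation. For example, transformers' `GenerationConfig` also defaults `top_k` to 50, which this sketch omits. The function names and logits are hypothetical.

```python
import math
import random


def top_p_filter(logits: list[float], temperature: float = 0.7, top_p: float = 0.9) -> list[float]:
    """Renormalized probabilities after temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: dividing logits by a value < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of highest-probability tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Zero out every other token and renormalize over the kept set.
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_next_token(logits: list[float], temperature: float = 0.7, top_p: float = 0.9,
                      rng: random.Random = random.Random()) -> int:
    """Draw one token id from the filtered distribution (one step of sampled decoding)."""
    probs = top_p_filter(logits, temperature, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy 4-token vocabulary; real logits come from the model's output layer.
toy_logits = [2.0, 1.0, 0.1, -1.0]
print(top_p_filter(toy_logits))
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

With these toy logits, the two most likely tokens already hold over 90% of the probability mass after temperature scaling, so the other two can never be sampled.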
Training Details
The model was trained using the following configuration:
- Base model: GPT-2 (openai-community/gpt2)
- Training type: Full fine-tuning
- Mixed precision: FP16
- Gradient checkpointing: Enabled
Hyperparameters:
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
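The sketch below shows two quantities implied by these hyperparameters: the effective batch size per optimizer update and the learning rate over training. It *assumes* that "Batch size: 32" is per device and that warmup is followed by linear decay to zero, which is the shape of transformers' default `"linear"` schedule (`get_linear_schedule_with_warmup`). The card names neither the scheduler nor the device count, and the total step count depends on the dataset size, so `total_steps=9000` below is hypothetical. If the Hugging Face `Trainer` was used, these settings would correspond to `TrainingArguments` fields such as `per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `num_train_epochs`, `weight_decay`, `warmup_steps`, `fp16`, and `gradient_checkpointing`. The card does not confirm this.

```python
def effective_batch_size(per_device_batch: int = 32, grad_accum_steps: int = 2, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update.

    ASSUMPTION: the reported batch size of 32 is per device.
    """
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-5, warmup_steps: int = 1000) -> float:
    """Learning rate at an optimizer step: linear warmup to peak_lr, then linear decay to 0.

    ASSUMPTION: the card lists warmup steps but not the scheduler; this mirrors the
    shape of transformers' default linear schedule with warmup.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical total of 9000 optimizer steps.
print(effective_batch_size())
print([linear_warmup_lr(s, total_steps=9000) for s in (0, 500, 1000, 5000, 9000)])
```

On a single device, gradient accumulation over 2 steps of 32 examples gives 64 examples per update, and the learning rate peaks at 5e-5 exactly when warmup ends at step 1000.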
Limitations
- Training data comes primarily from Wikipedia, which may not cover every context in which Dhivehi is used (e.g. conversational or informal text)
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation
Intended Uses
This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications
Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy