---
language:
- dv
base_model:
- openai-community/gpt2
datasets:
- wikimedia/wikipedia
---

# GPT-2 DV Base

This is a GPT-2 model fine-tuned on Dhivehi text, specifically a curated dataset of Dhivehi Wikipedia articles. It can be used for text generation in the Dhivehi language.

## Model Description

- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles (from the `wikimedia/wikipedia` dataset)
- **Last Updated:** 2024-11-25

## Performance Metrics

Evaluation metrics on the test set (lower perplexity is better):
- Average Perplexity: 3.80
- Perplexity Std: 2.23
- Best Perplexity: 2.72
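
The card does not say how these figures were computed. As a rough sketch, assume each test text gets its own perplexity, the exponential of its mean per-token negative log-likelihood, and that the three numbers are the mean, standard deviation, and minimum of those per-text values. The aggregation might then look like this. The helper names `perplexity` and `summarize` and the sample losses are hypothetical, and whether the original evaluation used a population or sample standard deviation is unknown.

```python
import math
import statistics


def perplexity(token_nlls):
    """Perplexity of one text: exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def summarize(perplexities):
    """Aggregate per-text perplexities into the three figures listed above."""
    return {
        "average": statistics.mean(perplexities),
        "std": statistics.pstdev(perplexities),  # assumption: population std
        "best": min(perplexities),  # lower perplexity is better
    }


# Made-up per-token losses for two texts, for illustration only
sample_nlls = [[1.0, 1.2, 0.8], [1.5, 1.5]]
scores = [perplexity(nlls) for nlls in sample_nlls]
print(summarize(scores))
```

With a Hugging Face causal language model, the mean per-token loss for a text is the `loss` returned when the model is called with `labels` set to the input IDs, so `math.exp(loss.item())` gives the same per-text value.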

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

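# Generate text with nucleus (top-p) sampling
outputs = model.generate(
    **inputs,
    max_length=200,  # total length in tokens, prompt included
    temperature=0.7,  # values below 1.0 sharpen the token distribution
    top_p=0.9,  # sample from the smallest token set covering 90% probability
    do_sample=True,  # sample instead of greedy decoding
    num_return_sequences=1
)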

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Training Details

The model was trained using the following configuration:
- Base model: GPT-2 (`openai-community/gpt2`)
- Training type: Full fine-tuning
- Mixed precision: FP16
- Gradient checkpointing: Enabled

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
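
With gradient accumulation, the optimizer steps once every two batches, so the effective batch size is larger than the listed one. The sketch below works that out and collects the settings in a dictionary whose keys follow the parameter names of `transformers.TrainingArguments`. It assumes the listed batch size is per device and that training ran on a single device. The card states neither, and it does not confirm that `Trainer` was used. `output_dir` and `num_devices` are hypothetical.

```python
# Settings from this card, keyed by transformers.TrainingArguments parameter names.
training_kwargs = {
    "output_dir": "./dhivehi-gpt2-base",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: 32 is the per-device size
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": True,  # mixed precision
    "gradient_checkpointing": True,
}

num_devices = 1  # assumption: single-device training

# Number of sequences contributing to each optimizer step
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)  # 64
```

Under these assumptions each optimizer step sees 64 sequences. If the listed batch size was already a global figure, or several devices were used, the result changes accordingly.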

## Limitations

- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation

## Intended Uses

This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications

Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy