---
language:
- dv
base_model:
- openai-community/gpt2
datasets:
- wikimedia/wikipedia
---

# GPT 2 DV base

This is a GPT-2 model fine-tuned on Dhivehi text. It was trained on a curated dataset of Dhivehi Wikipedia articles and can be used for Dhivehi text generation.

## Model Description

- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles
- **Last Updated:** 2024-11-25

## Performance Metrics

Evaluation metrics on the test set (lower perplexity is better):
- Average perplexity: 3.80
- Perplexity standard deviation: 2.23
- Best perplexity: 2.72
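
How these figures were computed is not documented here. As a rough sketch, assuming perplexity is taken per test text as the exponential of the mean per-token negative log-likelihood and then aggregated across texts, the three statistics could be derived as follows. The function names and the sample loss values are hypothetical and purely illustrative; they are not the model's actual evaluation data.

```python
import math
import statistics


def perplexity(token_nlls):
    """Perplexity of one text from its per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def summarize(per_text_ppl):
    """Aggregate per-text perplexities into average, std, and best (lowest) values."""
    return {
        "average": statistics.mean(per_text_ppl),
        # Assumption: population standard deviation; the card does not say which was used.
        "std": statistics.pstdev(per_text_ppl),
        "best": min(per_text_ppl),
    }


# Hypothetical per-token losses for three texts (illustrative numbers only).
example_nlls = [[1.0, 1.2, 0.8], [1.5, 1.5], [0.9, 1.1, 1.0, 1.0]]
ppls = [perplexity(nlls) for nlls in example_nlls]
summary = summarize(ppls)
print(summary)
```

In practice the per-token losses would come from running the model over held-out text; the aggregation step stays the same.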

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Training Details

The model was trained using the following configuration:
- Base model: GPT-2
- Training type: Full fine-tuning
- Mixed precision: FP16
- Gradient checkpointing: Enabled

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
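
To make the interaction between these settings concrete, here is a minimal sketch that collects them under the corresponding `transformers.TrainingArguments` argument names and derives two quantities from them. It assumes the listed batch size is per device and that training ran on a single device; neither is stated above. The helper functions are hypothetical, and the schedule after warmup is omitted because it is not documented.

```python
# Hyperparameters from the lists above, keyed by the matching
# transformers.TrainingArguments argument names.
training_config = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: listed batch size is per device
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": True,
    "gradient_checkpointing": True,
}


def effective_batch_size(config, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_devices  # assumption: single device unless stated otherwise
    )


def warmup_lr(step, config):
    """Learning rate during linear warmup; held at the peak afterwards (decay not modeled)."""
    return config["learning_rate"] * min(1.0, step / config["warmup_steps"])


print(effective_batch_size(training_config))  # 64
print(warmup_lr(500, training_config))        # 2.5e-05
```

Under these assumptions, each optimizer step sees 64 examples, and the learning rate reaches its peak of 5e-5 at step 1000.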

## Limitations

- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation

## Intended Uses

This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications

Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy