metadata
language:
  - dv
base_model:
  - openai-community/gpt2
datasets:
  - wikimedia/wikipedia

GPT 2 DV base

This is a GPT-2 model fine-tuned on Dhivehi text. It was trained on a curated set of Dhivehi Wikipedia articles and can be used to generate text in Dhivehi.

Model Description

  • Model Type: GPT-2
  • Language: Dhivehi (ދިވެހި)
  • Training Data: Dhivehi Wikipedia articles
  • Last Updated: 2024-11-25

Performance Metrics

Evaluation metrics on the test set:

  • Average perplexity: 3.80
  • Perplexity standard deviation: 2.23
  • Best (lowest) perplexity: 2.72
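
For readers unfamiliar with the metric, the sketch below shows one common way to arrive at numbers of this kind: perplexity is the exponential of the mean negative log-likelihood per token, computed per sample and then summarized. The evaluation procedure is not described here, so the per-sample aggregation, the use of the population standard deviation, and the names `perplexity` and `summarize` are all assumptions made for illustration. The log-probabilities are made-up values, not model outputs.

```python
import math
from statistics import mean, pstdev


def perplexity(token_log_probs):
    """Perplexity of one sequence: exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def summarize(per_sample_log_probs):
    """Aggregate per-sample perplexities (assumed scheme: mean, population std, minimum)."""
    ppls = [perplexity(lp) for lp in per_sample_log_probs]
    return {"mean": mean(ppls), "std": pstdev(ppls), "best": min(ppls)}


# Hypothetical natural-log token probabilities for two short samples.
samples = [
    [math.log(0.5), math.log(0.5)],    # every token has probability 1/2
    [math.log(0.25), math.log(0.25)],  # every token has probability 1/4
]
stats = summarize(samples)
print(stats)
```

Lower perplexity means the model assigns higher probability to the reference text, which is why the best value is taken as the minimum.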

Usage Example

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Training Details

The model was trained using the following configuration:

  • Base model: GPT-2 (openai-community/gpt2)
  • Training type: Full fine-tuning
  • Mixed precision: FP16
  • Gradient checkpointing: Enabled

Hyperparameters:

  • Learning rate: 5e-5
  • Batch size: 32
  • Gradient accumulation steps: 2
  • Epochs: 3
  • Weight decay: 0.01
  • Warmup steps: 1000
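
The batch size and gradient accumulation settings combine into an effective batch size per optimizer step, and the warmup setting controls how the learning rate ramps up early in training. The sketch below works through that arithmetic. It assumes the listed batch size is per device, that training ran on a single device, and that warmup is linear; none of these is stated above, and the function names are hypothetical. What happens to the learning rate after warmup is not modeled, so the function simply holds it at the peak.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-5
BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1000


def effective_batch_size(num_devices=1):
    """Sequences contributing to each optimizer step (num_devices=1 is an assumption)."""
    return BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def warmup_lr(step):
    """Learning rate under an assumed linear warmup; held at the peak afterwards."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return LEARNING_RATE * min(1.0, step / WARMUP_STEPS)


print(effective_batch_size())  # 64
print(warmup_lr(500))          # halfway through warmup
print(warmup_lr(1000))         # peak learning rate reached
```

Under these assumptions each weight update sees 64 sequences, and the learning rate reaches its peak of 5e-5 after the first 1000 optimizer steps.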

Limitations

  • Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
  • May not perform well on specialized or technical content
  • Could reflect biases present in the training data
  • Not recommended for production use without thorough evaluation

Intended Uses

This model is suitable for:

  • Dhivehi text generation
  • Research on Dhivehi NLP
  • Educational purposes
  • Experimental applications

Not intended for:

  • Critical or production systems
  • Decision-making applications
  • Tasks requiring factual accuracy