# NepaliGPT: Nepali Language Generative Pretrained Transformer Model
This is an experiment in developing a language generation model for Nepali: a causal language model that predicts the next tokens given a context in the Nepali language.
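Generating text with the checkpoint follows the standard `transformers` causal-LM workflow. A minimal sketch; the repository ID below is a placeholder, and the prompt and sampling parameters are illustrative:

```python
# Minimal generation sketch using Hugging Face transformers.
# "username/NepaliGPT" is a placeholder repo ID, not the actual checkpoint path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "username/NepaliGPT"  # placeholder; replace with the real Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a Nepali prompt and sample a continuation.
prompt = "नेपाल एक सुन्दर देश हो"  # "Nepal is a beautiful country"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```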
## Dataset Used
A 9.3 GB corpus was collected from different sources on the internet. The sources include:
- Nepali books found online.
- Nepali news articles from Nepali news portals.
- Nepali text collected from different open-source Nepali NLP datasets.
## Hyperparameters Used
- Learning rate: 2e-5
- Weight decay: 0.01
- Number of training epochs: 5
- bf16: True
- Base model architecture: GPT-2
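These settings correspond directly to fields on `transformers.TrainingArguments`. The sketch below shows that mapping; the batch size, output directory, and dataset wiring are assumptions, not taken from this card:

```python
# Sketch of a Trainer setup reflecting the hyperparameters above.
# Batch size, output_dir, and the dataset objects are illustrative assumptions.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 base architecture
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

args = TrainingArguments(
    output_dir="nepaligpt",          # assumption
    learning_rate=2e-5,              # from the card
    weight_decay=0.01,               # from the card
    num_train_epochs=5,              # from the card
    bf16=True,                       # from the card; needs bf16-capable hardware
    per_device_train_batch_size=8,   # assumption
)

# mlm=False yields causal-LM batches: labels are the input IDs,
# shifted internally by the model when computing the loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# trainer = Trainer(model=model, args=args, data_collator=collator,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```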
## Training Results
The model achieves the following results on the evaluation set:
| Training Loss | Validation Loss | Perplexity |
|---|---|---|
| 3.3968 | 3.2705 | 26.3245 |
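Perplexity here is the exponential of the per-token cross-entropy validation loss, so the last two columns are consistent with each other:

```python
import math

validation_loss = 3.2705
perplexity = math.exp(validation_loss)
print(round(perplexity, 2))  # ≈ 26.32, matching the reported 26.3245 up to rounding
```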