---
license: unknown
language:
- si
metrics:
- perplexity
library_name: transformers
tags:
- AshenBerto
- Sinhala
- Roberta
---

### 🌟 Overview

This is a slightly smaller RoBERTa model trained on half of the Sinhala [FastText](https://fasttext.cc/docs/en/crawl-vectors.html) dataset. Since Sinhala is a low-resource language, there’s a noticeable lack of pre-trained models available for it. 😕 This gap makes it harder to represent the language properly in the world of NLP.

But hey, that’s where this model comes in! 🚀 It opens up exciting opportunities to improve tasks like sentiment analysis, machine translation, named entity recognition, and even question answering, all tailored for Sinhala. 🇱🇰✨

---

### 🛠 Model Specs

Here’s what powers this model (we went with [RoBERTa](https://arxiv.org/abs/1907.11692)):

1️⃣ **vocab_size** = 25,000
2️⃣ **max_position_embeddings** = 514
3️⃣ **num_attention_heads** = 12
4️⃣ **num_hidden_layers** = 6
5️⃣ **type_vocab_size** = 1

🎯 **Perplexity Value**: 3.5

(How these specs map onto a `RobertaConfig`, and one common way to measure perplexity for a masked LM, are both sketched after the usage example below.)

---

### 🚀 How to Use

You can jump right in and use this model for masked language modeling! 🧩

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the model and tokenizer
model = AutoModelForMaskedLM.from_pretrained("ashenR/AshenBERTo")
tokenizer = AutoTokenizer.from_pretrained("ashenR/AshenBERTo")

# Create a fill-mask pipeline
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# Try it out with a Sinhala sentence! 🇱🇰
fill_mask("මම ගෙදර <mask>.")
```
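
The pipeline returns a ranked list of candidate fillings, each a dictionary with the completed sentence, a confidence score, and the predicted token. A quick way to inspect the top suggestions:

```python
# Each prediction is a dict with keys like 'sequence', 'score', and 'token_str'.
for prediction in fill_mask("මම ගෙදර <mask>."):
    print(f"{prediction['token_str']}  (score: {prediction['score']:.3f})")
```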
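
The card reports a perplexity of 3.5 but doesn’t say how it was measured. One common way to score a masked LM is pseudo-perplexity: mask each token in turn, accumulate the negative log-probability of the true token, and exponentiate the average. Here’s a minimal sketch using the `model` and `tokenizer` loaded above — the function is ours and may not match how the 3.5 figure was actually obtained:

```python
import torch

def pseudo_perplexity(sentence: str) -> float:
    # Tokenize once; ids is a 1-D tensor of token ids for the sentence.
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nll = 0.0
    # Positions 0 and -1 are the <s> and </s> special tokens; skip them.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nll -= log_probs[ids[i]].item()
    # Exponentiate the average negative log-likelihood.
    return float(torch.exp(torch.tensor(nll / (len(ids) - 2))))

print(pseudo_perplexity("මම ගෙදර ගියා."))  # lower is better
```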
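
And, as promised in the Model Specs: if you want to rebuild the architecture from scratch (say, to pre-train your own variant), the listed specs translate into a `RobertaConfig` like this. This is a sketch assuming standard RoBERTa defaults (`hidden_size=768`, `intermediate_size=3072`, etc.) for everything the card doesn’t list:

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Values taken from the Model Specs above; unspecified fields
# fall back to the RoBERTa defaults.
config = RobertaConfig(
    vocab_size=25_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)

# A freshly initialized (untrained) model with this architecture.
untrained = RobertaForMaskedLM(config)
print(f"{untrained.num_parameters():,} parameters")
```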