Shushant committed on
Commit
b406e6f
·
1 Parent(s): 7756f55

description added

  1. README.md +22 -1
README.md CHANGED
@@ -1,5 +1,26 @@
1
- # Masked language model for the Nepali language, trained on Nepali news scraped from different Nepali news websites. The dataset contains about 10 million Nepali sentences, mainly related to Nepali news.
 
2
3
 
4
  Usage
5
  ```
 
1
+ # NEPALI BERT
2
+ ## Masked language model for the Nepali language, trained on Nepali news scraped from different Nepali news websites. The dataset contains about 10 million Nepali sentences, mainly related to Nepali news.
3
 
4
+ This model is a fine-tuned version of [BERT Base Uncased](https://huggingface.co/bert-base-uncased) on a dataset of news articles scraped from Nepali news portals, comprising 4.6 GB of textual data.
5
+ It achieves the following results on the evaluation set:
6
+ - Loss: 1.0495
7
+
8
+ ## Model description
9
+
10
+ Pretraining was done on the BERT base architecture.
11
+
12
+ ## Intended uses & limitations
13
+ This transformer model can be used for any NLP task on Devanagari-script text. At the time of training, this was the state-of-the-art model developed
14
+ for Devanagari datasets. Intrinsic evaluation gives a perplexity of 8.56, and in extrinsic evaluation on sentiment analysis of Nepali tweets the model outperformed other existing
15
+ masked language models on Nepali datasets.
16
+ ## Training and evaluation data
17
+ The training corpus was built from 85467 news articles scraped from different job portals. This is a preliminary dataset
18
+ for experimentation. The corpus size is about 4.3 GB of textual data. Similarly, the evaluation data contains a few news articles, about 12 MB of textual data.
19
+
20
+ ## Training procedure
21
+ For pretraining the masked language model, the Trainer API from Hugging Face was used. Pretraining took about 3 days, 8 hours, and 57 minutes. Training was done on a Tesla V100 GPU.
22
+ With 640 Tensor Cores, the Tesla V100 was the first GPU to break the 100 teraFLOPS (TFLOPS) barrier of deep learning performance. The GPU was made available through the Kathmandu University (KU) supercomputer.
23
+ Thanks to the KU administration.
24
 
25
  Usage
26
  ```
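
The README in this commit describes pretraining a masked language model but does not spell out how tokens are masked. As a rough illustration only, the sketch below shows the standard BERT-style masking recipe in plain Python: about 15% of tokens are selected for prediction, and of those, 80% are replaced by `[MASK]`, 10% by a random vocabulary token, and 10% are left unchanged. The 15% rate and the 80/10/10 split are assumptions taken from the usual BERT setup (they are also the defaults of Hugging Face's `DataCollatorForLanguageModeling`), not values confirmed by this README. The names `mask_tokens` and `MASK_TOKEN` and the toy romanized sentence are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical constant; real tokenizers define their own mask token


def mask_tokens(tokens, vocab, mlm_probability=0.15, seed=None):
    """Return (masked_tokens, labels) using the BERT-style 80/10/10 rule.

    labels[i] holds the original token at positions selected for prediction
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:
            # This position is selected: the model must predict the original token.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK_TOKEN)          # 80%: replace with the mask token
            elif roll < 0.9:
                masked.append(rng.choice(vocab))   # 10%: replace with a random token
            else:
                masked.append(tok)                 # 10%: keep the token unchanged
        else:
            # Not selected: token is passed through and ignored by the loss.
            labels.append(None)
            masked.append(tok)
    return masked, labels


# Toy example with romanized Nepali tokens (purely illustrative, assumed data).
sentence = ["nepal", "ek", "sundar", "desh", "ho"]
masked, labels = mask_tokens(sentence, vocab=sentence, seed=0)
print(masked)
print(labels)
```

In an actual Trainer-based run, this step would normally be handled by a data collator operating on token IDs rather than by hand-written code like this.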