This model is a continued pretraining version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the [CorpusNós](https://zenodo.org/records/11655219) dataset.

## Model Description

- **Developed by:** [UDC Information Retrieval Lab (IRLab)](https://huggingface.co/irlab-udc)
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** Multilingual, adapted to Galician
- **License:** llama3.1
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Repository:** [Adapting Large Language Models for Underrepresented Languages](https://gitlab.irlab.org/eliseo.bao/xovetic-llms-underrepresented-languages)
- **Paper:** _Coming soon_

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
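
Until this section is filled in, the snippet below is a minimal sketch of the standard `transformers` loading path. The repository id is a placeholder (substitute the actual Hub id of this model), and the prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "irlab-udc/<model-id>"  # placeholder: replace with the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Galician prompt: "What is the capital of Galicia?"
prompt = "Cal é a capital de Galicia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
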
#### Training Hyperparameters

| Parameter                   | Value                                       |
|-----------------------------|---------------------------------------------|
| learning_rate               | 0.0001                                      |
| train_batch_size            | 32                                          |
| eval_batch_size             | 1                                           |
| seed                        | 42                                          |
| distributed_type            | multi-GPU                                   |
| num_devices                 | 4                                           |
| gradient_accumulation_steps | 2                                           |
| total_train_batch_size      | 256                                         |
| total_eval_batch_size       | 4                                           |
| optimizer                   | Adam with betas=(0.9, 0.999), epsilon=1e-08 |
| lr_scheduler_type           | cosine                                      |
| lr_scheduler_warmup_ratio   | 0.1                                         |
| num_epochs                  | 1.0                                         |

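
These values map directly onto `transformers.TrainingArguments`. The sketch below shows that mapping, assuming the run used the standard `Trainer` stack (the output directory is a placeholder); with 4 GPUs, a per-device batch of 32 and 2 accumulation steps yield the effective batch of 32 × 4 × 2 = 256 reported above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=32,  # train_batch_size (per device)
    per_device_eval_batch_size=1,    # eval_batch_size (per device)
    seed=42,
    gradient_accumulation_steps=2,   # 32 per device x 4 GPUs x 2 steps = 256 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio
    num_train_epochs=1.0,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's default optimizer setting.
)
```
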
#### Training Results

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Cloud Provider:** Private infrastructure
- **Carbon Emitted:** 10.37 kg CO₂ eq
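
The figure above is the card's reported estimate. For new runs, emissions can also be measured directly during training with the `codecarbon` package; this is a minimal sketch of that approach, not necessarily how the number above was produced:

```python
from codecarbon import EmissionsTracker

def train():
    """Placeholder for the actual training loop."""

tracker = EmissionsTracker()       # samples power draw while tracking is active
tracker.start()
try:
    train()
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg CO2-eq

print(f"Estimated emissions: {emissions_kg:.2f} kg CO2 eq")
```
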
## Citation

_Coming soon_