eliseobao committed · verified
Commit f8fd39b · 1 Parent(s): d29859c

Update README.md

Files changed (1):
  1. README.md +17 -36
README.md CHANGED

@@ -34,22 +34,13 @@ widget:
 
  This model is a continued pretraining version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the [CorpusNós](https://zenodo.org/records/11655219) dataset.
 
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
+ ## Model Description
 
  - **Developed by:** [UDC Information Retrieval Lab (IRLab)](https://huggingface.co/irlab-udc)
  - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** Multilingual, adapted to Galician
  - **License:** llama3.1
  - **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
-
- ### Model Sources
-
  - **Repository:** [Adapting Large Language Models for Underrepresented Languages](https://gitlab.irlab.org/eliseo.bao/xovetic-llms-underrepresented-languages)
  - **Paper:** _Coming soon_
 
@@ -67,26 +58,24 @@ Use the code below to get started with the model.
 
  [More Information Needed]
 
- ### Training Procedure
-
- [More Information Needed]
-
  #### Training Hyperparameters
 
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 32
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 256
- - total_eval_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1.0
+ | Parameter                    | Value                                        |
+ |------------------------------|----------------------------------------------|
+ | learning_rate                | 0.0001                                       |
+ | train_batch_size             | 32                                           |
+ | eval_batch_size              | 1                                            |
+ | seed                         | 42                                           |
+ | distributed_type             | multi-GPU                                    |
+ | num_devices                  | 4                                            |
+ | gradient_accumulation_steps  | 2                                            |
+ | total_train_batch_size       | 256                                          |
+ | total_eval_batch_size        | 4                                            |
+ | optimizer                    | Adam with betas=(0.9, 0.999), epsilon=1e-08  |
+ | lr_scheduler_type            | cosine                                       |
+ | lr_scheduler_warmup_ratio    | 0.1                                          |
+ | num_epochs                   | 1.0                                          |
+
 
  #### Training results
 
@@ -107,14 +96,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Cloud Provider:** Private infrastructure
  - **Carbon Emitted:** 10.37 kgCO$_2$eq
 
- #### Software
-
- - PEFT 0.12.0
- - Transformers 4.44.2
- - Pytorch 2.4.0+cu121
- - Datasets 2.21.0
- - Tokenizers 0.19.1
-
  ## Citation
 
  _Coming soon_
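For orientation, the hyperparameter table added in this commit maps onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. This is not the authors' training script: `output_dir` and the `bf16` flag are assumptions, and while the card lists "Adam", the Transformers trainer defaults to AdamW.

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameter table expressed as TrainingArguments.
# output_dir and bf16 are assumptions; the card lists "Adam", whereas the
# Transformers default optimizer is AdamW.
args = TrainingArguments(
    output_dir="./llama3.1-8b-galician-cpt",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=32,   # 32 x 4 GPUs x 2 accumulation steps = 256 total
    per_device_eval_batch_size=1,     # 1 x 4 GPUs = 4 total
    gradient_accumulation_steps=2,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                        # assumption: bfloat16 mixed precision
)
```

The effective batch sizes in the table follow from the per-device values: 32 × 4 devices × 2 accumulation steps = 256 for training, and 1 × 4 devices = 4 for evaluation.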
 
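The card's getting-started section is still marked [More Information Needed]; a minimal inference sketch with Transformers would look roughly like the following. The repository id below is a placeholder, not necessarily this model's actual id, and the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; substitute the id shown on the model page.
model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example prompt in Galician: "What is the capital of Galicia?"
messages = [{"role": "user", "content": "Cal é a capital de Galicia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```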