---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
license: llama3.1
language:
- gl
metrics:
- bleu
- rouge
model-index:
- name: Llama-3.1-8B-Instruct-Galician
  results:
  - task:
      type: text-generation
    dataset:
      name: alpaca_data_galician
      type: alpaca_data_galician
    metrics:
    - name: bleu
      type: bleu-4
      value: 23.13
    - name: rouge
      type: rouge-l
      value: 21.84
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct-Galician

This model is the result of continued pretraining of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the [CorpusNós](https://zenodo.org/records/11655219) dataset.

## Model Details

### Model Description

- **Developed by:** [UDC Information Retrieval Lab (IRLab)](https://huggingface.co/irlab-udc)
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** Multilingual, adapted to Galician
- **License:** llama3.1
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

### Model Sources

- **Repository:** [Adapting Large Language Models for Underrepresented Languages](https://gitlab.irlab.org/eliseo.bao/xovetic-llms-underrepresented-languages)
- **Paper:** _Coming soon_

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

Use in any manner that violates applicable laws or regulations (including trade compliance laws), or in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.1 Community License.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
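The snippet below is a minimal sketch rather than an official quickstart: it assumes the model is published on the Hugging Face Hub under the ID `irlab-udc/Llama-3.1-8B-Instruct-Galician` (an assumption based on the model name and organization) and uses the standard `transformers` chat-template API. Adjust the repository ID, dtype, and generation settings to your environment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository ID; replace with the actual ID of this model if it differs.
model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # GPU-friendly dtype; drop on CPU-only machines
    device_map="auto",
)

# Build a prompt with the Llama 3.1 chat template and generate a Galician reply.
messages = [
    {"role": "system", "content": "Es un asistente que responde en galego."},
    {"role": "user", "content": "Cal é a capital de Galicia?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package; on a CPU-only machine, remove it along with the `torch_dtype` argument.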
## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0

#### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.0606        | 0.1682 | 900  | 2.0613          |
| 1.9898        | 0.3363 | 1800 | 1.9929          |
| 1.9847        | 0.5045 | 2700 | 1.9613          |
| 1.9577        | 0.6726 | 3600 | 1.9445          |
| 1.9287        | 0.8408 | 4500 | 1.9368          |

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 4x NVIDIA A100 SXM4 80 GB (TDP of 400 W)
- **Hours used:** 60
- **Cloud Provider:** Private infrastructure
- **Carbon Emitted:** 10.37 kg CO₂eq

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

- PEFT 0.12.0
- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1

## Citation

**BibTeX:**

[More Information Needed]