---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
license: llama3.1
language:
- gl
metrics:
- bleu
- rouge
model-index:
- name: Llama-3.1-8B-Instruct-Galician
  results:
  - task:
      type: text-generation
    dataset:
      name: alpaca_data_galician
      type: alpaca_data_galician
    metrics:
    - name: bleu
      type: bleu-4
      value: 23.13
    - name: rouge
      type: rouge-l
      value: 21.84
pipeline_tag: text-generation
library_name: transformers
widget:
- text: "Onde está o concello de Frades?"
  output:
    text: Frades é un concello da provincia da Coruña, pertencente á comarca de Ordes. Está situado a 15 quilómetros de Santiago de Compostela.
---

# Llama-3.1-8B-Instruct-Galician

This model is the result of continued pretraining of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the [CorpusNós](https://zenodo.org/records/11655219) dataset.

## Model Description

- **Developed by:** [UDC Information Retrieval Lab (IRLab)](https://huggingface.co/irlab-udc)
- **Model type:** Causal decoder-only transformer language model
- **Language(s) (NLP):** Multilingual, adapted to Galician
- **License:** llama3.1
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Repository:** [Adapting Large Language Models for Underrepresented Languages](https://gitlab.irlab.org/eliseo.bao/xovetic-llms-underrepresented-languages)
- **Paper:** _Coming soon_

## How to Get Started with the Model

```python
import transformers
import torch

model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a conversational AI that always responds in Galician."},
    {"role": "user", "content": "Cal é a principal vantaxe de usar Scrum?"},
]

outputs = pipeline(messages, max_new_tokens=512)
print(outputs[0]["generated_text"][-1]["content"])
```

A variant that loads the model without the `pipeline` helper is sketched in the appendix.

## Training Details

### Training Data

The model was continually pretrained on the [CorpusNós](https://zenodo.org/records/11655219) dataset.

#### Training Hyperparameters

| Parameter                   | Value                                       |
|-----------------------------|---------------------------------------------|
| learning_rate               | 0.0001                                      |
| train_batch_size            | 32                                          |
| eval_batch_size             | 1                                           |
| seed                        | 42                                          |
| distributed_type            | multi-GPU                                   |
| num_devices                 | 4                                           |
| gradient_accumulation_steps | 2                                           |
| total_train_batch_size      | 256                                         |
| total_eval_batch_size       | 4                                           |
| optimizer                   | Adam with betas=(0.9, 0.999), epsilon=1e-08 |
| lr_scheduler_type           | cosine                                      |
| lr_scheduler_warmup_ratio   | 0.1                                         |
| num_epochs                  | 1.0                                         |

A hypothetical reconstruction of this configuration as Hugging Face `TrainingArguments` is sketched in the appendix.

#### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.0606        | 0.1682 | 900  | 2.0613          |
| 1.9898        | 0.3363 | 1800 | 1.9929          |
| 1.9847        | 0.5045 | 2700 | 1.9613          |
| 1.9577        | 0.6726 | 3600 | 1.9445          |
| 1.9287        | 0.8408 | 4500 | 1.9368          |

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). A back-of-the-envelope check of the figure below is sketched in the appendix.

- **Hardware Type:** 4x NVIDIA A100 SXM4 80 GB (TDP of 400 W each)
- **Hours used:** 60
- **Cloud Provider:** Private infrastructure
- **Carbon Emitted:** 10.37 kg CO₂ eq.

## Citation

_Coming soon_
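## Appendix: Illustrative Sketches

### Loading without the `pipeline` helper

This is a minimal sketch, not part of the original card, showing how the same chat interaction from "How to Get Started with the Model" can be run through `AutoModelForCausalLM` and the tokenizer's built-in chat template; decoding settings are left at their defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a conversational AI that always responds in Galician."},
    {"role": "user", "content": "Cal é a principal vantaxe de usar Scrum?"},
]

# Llama 3.1 Instruct ships a chat template, so the message list can be
# rendered straight to input IDs.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```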
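### Hyperparameters as a training configuration

The table in "Training Hyperparameters" maps naturally onto Hugging Face `TrainingArguments`. The sketch below is an assumption about how the run could be expressed, not the authors' published training script: `output_dir` is a placeholder, `bf16=True` is inferred only from the bfloat16 usage example, and the optimizer is spelled `adamw_torch` although the table says "Adam".

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-galician",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=32,  # x 4 GPUs x 2 accumulation = 256 total
    per_device_eval_batch_size=1,    # x 4 GPUs = 4 total
    gradient_accumulation_steps=2,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # table says Adam; the HF Trainer provides AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, consistent with the usage example
)
```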
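### Checking the carbon estimate

The figures reported under "Environmental Impact" are mutually consistent under simple assumptions (all four GPUs at full TDP for the whole run; CPU, memory, and cooling overhead ignored), as this short calculation shows:

```python
# Energy drawn by the GPUs, using only figures from the card.
n_gpus = 4
tdp_kw = 0.400        # 400 W per A100 SXM4 80 GB
hours = 60
energy_kwh = n_gpus * tdp_kw * hours  # = 96 kWh

# Carbon intensity implied by the reported 10.37 kg CO2 eq.
reported_kg_co2 = 10.37
implied_intensity = reported_kg_co2 / energy_kwh  # ~0.108 kg CO2 eq / kWh

print(f"{energy_kwh:.0f} kWh -> {implied_intensity:.3f} kg CO2 eq per kWh")
```

Any divergence from the implied intensity would reflect datacenter overhead (PUE) or a different grid-intensity assumption in the calculator.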