irlab-udc
/

Llama-3.1-8B-Instruct-Galician

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Llama-3.1-8B-Instruct-Galician / README.md

eliseobao's picture

Update README.md

f743fbb verified 4 months ago

|

3 kB

	---
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	license: llama3.1
	language:
	- gl
	metrics:
	- bleu
	- rouge
	model-index:
	- name: Llama-3.1-8B-Instruct-Galician
	results:
	- task:
	type: text-generation
	dataset:
	name: alpaca_data_galician
	type: alpaca_data_galician
	metrics:
	- name: bleu
	type: bleu-4
	value: 23.13
	- name: rouge
	type: rouge-l
	value: 21.84
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Llama-3.1-8B-Instruct-Galician

	This model is a continued pretraining version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the [CorpusNós](https://zenodo.org/records/11655219) dataset.

	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->



	- Developed by: [UDC Information Retrieval Lab (IRLab)](https://huggingface.co/irlab-udc)
	- Model type: [More Information Needed]
	- Language(s) (NLP): Multilingual, adapted to Galician
	- License: llama3.1
	- Finetuned from model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

	### Model Sources

	- Repository: [Adapting Large Language Models for Underrepresented Languages](https://gitlab.irlab.org/eliseo.bao/xovetic-llms-underrepresented-languages)
	- Paper: _Coming soon_

	## How to Get Started with the Model

	Use the code below to get started with the model.

	[More Information Needed]

	## Training Details

	[More Information Needed]

	### Training Data

	[More Information Needed]

	### Training Procedure

	[More Information Needed]

	#### Training Hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 32
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 256
	- total_eval_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1.0

	#### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:------:\|:----:\|:---------------:\|
	\| 2.0606 \| 0.1682 \| 900 \| 2.0613 \|
	\| 1.9898 \| 0.3363 \| 1800 \| 1.9929 \|
	\| 1.9847 \| 0.5045 \| 2700 \| 1.9613 \|
	\| 1.9577 \| 0.6726 \| 3600 \| 1.9445 \|
	\| 1.9287 \| 0.8408 \| 4500 \| 1.9368 \|

	## Environmental Impact

	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: 4x NVIDIA A100 SXM4 80 GB (TDP of 400W)
	- Hours used: 60
	- Cloud Provider: Private infrastructure
	- Carbon Emitted: 10.37 kgCO$_2$eq

	#### Software

	- PEFT 0.12.0
	- Transformers 4.44.2
	- Pytorch 2.4.0+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1

	## Citation

	BibTeX:

	_Coming soon_