metadata
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- bjoernp/tagesschau-2018-2023
language:
- de
- en
metrics:
- accuracy
This model was trained to summarise short texts and to generate headlines for newspaper articles.
Model Details
This is the model card of a 🤗 transformers model that has been pushed to the Hub.
- Developed by: Kamila Trinkenschuh
- Shared by: Kamila Trinkenschuh
- Model type: causal language model, fine-tuned for text generation and text summarisation (headline generation)
- Finetuned from model: LeoLM/leo-hessianai-7b
Use
You can use this model to see examples of how it generates headlines for articles. I encourage you to fine-tune it further for your own purposes and tasks.
Out-of-Scope Use
This model was fine-tuned on an A100 GPU in Google Colab.
Bias, Risks, and Limitations
The LLM was trained on a subset of 5,000 samples from the bjoernp/tagesschau-2018-2023 dataset.
Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```
Use a pipeline as a high-level helper
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```
Training Procedure
The LeoLM model was fine-tuned with LoRA.
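The core idea of LoRA can be sketched in a few lines: the pretrained weight matrix W0 stays frozen, and only a low-rank update (alpha / r) · B · A is trained. This is a minimal NumPy illustration of the technique, not the training code used for this model; the rank `r=8`, `alpha=16`, the initialisation scale and the class name `LoRALinear` are hypothetical choices for the example.

```python
import numpy as np


class LoRALinear:
    """Hypothetical illustration: frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                     # pretrained weight, kept frozen
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))   # down-projection A, random Gaussian init
        self.b = np.zeros((d_out, r))                    # up-projection B, zero init so the update starts at 0
        self.scale = alpha / r                           # scaling factor applied to the low-rank update

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); base output plus the scaled low-rank correction
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self) -> int:
        # only A and B are updated during fine-tuning
        return self.a.size + self.b.size

    def merged_weight(self) -> np.ndarray:
        # after training, the update can be folded into W0 so inference costs nothing extra
        return self.w0 + self.scale * self.b @ self.a
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one. For a 4096 × 4096 projection with r=8, only 2 · 8 · 4096 = 65,536 parameters are trained instead of about 16.8 million, which is why LoRA fine-tuning of a 7B model fits on a single GPU.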
Speeds, Sizes, Times
```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    optim="paged_adamw_8bit",  # paged 8-bit AdamW, as used with QLoRA (requires bitsandbytes)
    per_device_train_batch_size=4,  # training batch size per GPU
    per_device_eval_batch_size=4,  # same, but for evaluation
    # Number of batches over which gradients are accumulated. Careful: this changes the size
    # of a "step", so logging, evaluation and saving happen every
    # gradient_accumulation_steps * xxx_steps batches.
    gradient_accumulation_steps=1,
    log_level="debug",  # can also be set to "info", "warning", "error" or "critical"
    save_steps=500,  # number of steps between checkpoints
    logging_steps=20,  # steps between loss logs for monitoring; adapt it to your dataset size
    learning_rate=4e-5,  # worth trying different values for this hyperparameter
    num_train_epochs=1,
    warmup_steps=100,  # note: the "constant" scheduler below ignores warmup_steps
    lr_scheduler_type="constant",  # "constant_with_warmup" would apply the 100 warmup steps
)
```
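To see what these settings mean in practice, the step arithmetic can be worked out by hand. The sketch below assumes a single GPU and 3,500 training examples, and approximates how the Hugging Face Trainer counts optimizer steps (details may differ slightly between versions). It also shows the difference between the `"constant"` and `"constant_with_warmup"` learning-rate schedules; the helper names `schedule_summary` and `lr_at_step` are hypothetical.

```python
import math


def schedule_summary(num_examples, per_device_batch_size, grad_accum_steps,
                     num_devices, num_epochs, logging_steps, save_steps):
    """Approximate step counts for one training run (hypothetical helper)."""
    # batches per epoch across all devices (last batch may be partial)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # one optimizer step is taken every grad_accum_steps batches
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = steps_per_epoch * num_epochs
    return {
        "effective_batch_size": per_device_batch_size * grad_accum_steps * num_devices,
        "optimizer_steps": total_steps,
        "log_events": total_steps // logging_steps,  # losses logged at steps 20, 40, ...
        "checkpoints": total_steps // save_steps,    # intermediate checkpoints at steps 500, 1000, ...
    }


def lr_at_step(step, base_lr, warmup_steps, scheduler):
    """Learning rate at a given step for the two constant schedules."""
    if scheduler == "constant":
        return base_lr  # warmup_steps has no effect here
    if scheduler == "constant_with_warmup":
        # linear ramp from 0 to base_lr over warmup_steps, then constant
        return base_lr * min(1.0, step / max(1, warmup_steps))
    raise ValueError(f"unsupported scheduler: {scheduler}")


summary = schedule_summary(num_examples=3500, per_device_batch_size=4, grad_accum_steps=1,
                           num_devices=1, num_epochs=1, logging_steps=20, save_steps=500)
print(summary)
print(lr_at_step(50, 4e-5, 100, "constant"), lr_at_step(50, 4e-5, 100, "constant_with_warmup"))
```

Under these assumptions the run takes 875 optimizer steps, logs the loss about 43 times and writes one intermediate checkpoint at step 500.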
Evaluation and Testing
From the 5,000-sample subset, 1,500 randomly selected samples were used for evaluation and the remaining 3,500 for training. The whole fine-tuning process took less than 30 minutes on Colab's A100 GPU (accessible only with Colab Pro+).
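The sampling and split can be sketched in plain Python. This assumes the dataset is available as a list of records, that the 5,000 samples were drawn at random without replacement, and that the 1,500 evaluation samples were held out from that subset; the function name `sample_and_split` and the seed are hypothetical, and the author's actual code is in the linked repository.

```python
import random


def sample_and_split(records, n_total=5000, n_eval=1500, seed=42):
    """Draw a random subset and split it into training and evaluation parts (hypothetical helper)."""
    rng = random.Random(seed)              # fixed seed so the split is reproducible
    subset = rng.sample(records, n_total)  # random subset without replacement
    eval_set = subset[:n_eval]             # first n_eval shuffled samples for evaluation
    train_set = subset[n_eval:]            # remaining samples for training
    return train_set, eval_set


records = [{"id": i} for i in range(20000)]  # stand-in for the real dataset rows
train_set, eval_set = sample_and_split(records)
print(len(train_set), len(eval_set))
```

With the 🤗 `datasets` library, the same idea is usually expressed with `Dataset.shuffle(seed=...)`, `Dataset.select(...)` and `Dataset.train_test_split(test_size=..., seed=...)`.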
Results
- Epoch: 1
- Training Loss: 1.866900
- Validation Loss: 1.801998
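Assuming the reported losses are the mean per-token cross-entropy in nats (the default for causal language models in 🤗 transformers), they can be converted into perplexity, which is often easier to interpret. This is a derived figure, not one reported by the training run: roughly 6.47 for training and 6.06 for validation.

```python
import math


def perplexity(mean_token_loss: float) -> float:
    # perplexity = exp(mean cross-entropy per token, measured in nats)
    return math.exp(mean_token_loss)


train_ppl = perplexity(1.866900)
val_ppl = perplexity(1.801998)
print(f"train perplexity ~ {train_ppl:.2f}, validation perplexity ~ {val_ppl:.2f}")
```

A validation loss slightly below the training loss after one epoch suggests the model was not overfitting at that point.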
Summary
The full code is available in my GitHub repository: https://github.com/KamilaTrinkenschuh/Ueberschriftengenerator_LEOLM