metadata
library_name: transformers
tags:
  - trl
  - sft
license: apache-2.0
datasets:
  - bjoernp/tagesschau-2018-2023
language:
  - de
  - en
metrics:
  - accuracy

This model was fine-tuned to summarize short texts and to generate headlines for newspaper articles.

Model Details

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

  • Developed by: Kamila Trinkenschuh
  • Shared by: Kamila Trinkenschuh
  • Model type: causal language model, fine-tuned for text generation and text summarization tasks
  • Finetuned from model: LeoLM/leo-hessianai-7b

Use

You can use this model to see examples of how it generates headlines for articles. I encourage you to fine-tune it further for your own purposes and tasks.

Out-of-Scope Use

This model was fine-tuned on an A100 GPU in Google Colab.

Bias, Risks, and Limitations

The LLM was trained on a subset of 5,000 samples from the bjoernp/tagesschau-2018-2023 dataset.

Load model directly

from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the fine-tuned model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)

Use a pipeline as a high-level helper

from transformers import pipeline

pipe = pipeline("text-generation", model="Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
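Here is a minimal sketch of requesting a headline through such a pipeline. The card does not document the prompt format used during fine-tuning, so `PROMPT_TEMPLATE` is a hypothetical placeholder. Replace it with the format used in the linked GitHub repository. The helper `generate_headline` is also illustrative. It takes the pipeline as an argument, so you can try it with a stand-in callable before downloading the 7B model.

```python
from typing import Any, Callable

# Hypothetical prompt format: the card does not state the template used in training.
PROMPT_TEMPLATE = "Article:\n{article}\n\nHeadline:\n"


def generate_headline(pipe: Callable[..., Any], article: str, max_new_tokens: int = 32) -> str:
    """Ask a text-generation pipeline for a headline and return its first non-empty line."""
    prompt = PROMPT_TEMPLATE.format(article=article.strip())
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # headlines are short
        do_sample=False,                # greedy decoding for reproducible output
        return_full_text=False,         # return only the newly generated text, not the prompt
    )
    generated = outputs[0]["generated_text"]
    # The model may keep writing past the headline, so keep only the first non-empty line.
    for line in generated.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

With the pipeline created above, call it as `generate_headline(pipe, article_text)`, where `article_text` is the article you want a headline for.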

Training Procedure

The LeoLM model was fine-tuned with LoRA.
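LoRA keeps each pretrained weight matrix W0 frozen and learns a low-rank update. A targeted layer computes h = W0·x + (alpha/r)·B·A·x, where A has shape r × d_in and B has shape d_out × r. Only A and B are trained. The card does not report the rank or the target modules, so the sketch below assumes r = 8 on the attention `q_proj` and `v_proj` matrices. It also assumes the Llama-2-7B dimensions that LeoLM 7B is built on (hidden size 4096, 32 decoder layers) and uses them to count the trainable parameters.

```python
from typing import Iterable, Tuple

# Assumed values: the card does not report the LoRA rank or target modules.
HIDDEN_SIZE = 4096  # Llama-2-7B hidden size (LeoLM 7B uses the same architecture)
NUM_LAYERS = 32     # number of decoder layers in Llama-2-7B
RANK = 8            # hypothetical LoRA rank r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def total_lora_params(target_shapes: Iterable[Tuple[int, int]], num_layers: int, r: int) -> int:
    """Sum adapter parameters over every targeted matrix in every layer."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in target_shapes)
    return per_layer * num_layers


# q_proj and v_proj both map 4096 -> 4096 in Llama-2-7B.
targets = [(HIDDEN_SIZE, HIDDEN_SIZE), (HIDDEN_SIZE, HIDDEN_SIZE)]
trainable = total_lora_params(targets, NUM_LAYERS, RANK)
print(f"trainable LoRA parameters: {trainable:,}")  # 4,194,304
```

Under these assumptions, the adapters train roughly 0.06% of the base model's approximately 6.7 billion parameters. With the peft library, a comparable setup would be `LoraConfig(r=8, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")`, where the rank and modules are again assumptions.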

Speeds, Sizes, Times

from transformers import TrainingArguments

training_arguments = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch", #evaluate at the end of each epoch; renamed eval_strategy in recent transformers releases
        optim="paged_adamw_8bit", #paged 8-bit AdamW (needs bitsandbytes), as used with QLoRA
        per_device_train_batch_size=4, #training batch size per GPU
        per_device_eval_batch_size=4, #same, but for evaluation
        gradient_accumulation_steps=1, #number of batches whose gradients are accumulated before each optimizer step. Careful: this changes the size of a "step", so logging, evaluation and saving happen every gradient_accumulation_steps * per_device_train_batch_size * xxx_steps training examples (per device)
        log_level="debug", #other options: 'info', 'warning', 'error', 'critical' and 'passive'
        save_steps=500, #number of steps between checkpoints
        logging_steps=20, #number of steps between loss logs for monitoring; adapt it to your dataset size
        learning_rate=4e-5, #you can try different values for this hyperparameter
        num_train_epochs=1,
        warmup_steps=100, #note: ignored by the "constant" scheduler below; "constant_with_warmup" would apply it
        lr_scheduler_type="constant",
)
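Two pieces of arithmetic behind these arguments can be sketched in a few lines. First, with gradient accumulation, one optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps × number_of_devices examples, and logging_steps and save_steps count those steps. Second, the "constant" scheduler keeps the learning rate fixed from the first step and does not use warmup_steps. Only "constant_with_warmup" ramps the rate linearly from 0 over the first warmup_steps. The functions below are small reimplementations for illustration, not the transformers source. The single-GPU setting is taken from the Colab A100 mentioned above.

```python
import math


def examples_per_step(batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Training examples consumed by one optimizer step."""
    return batch_size * grad_accum * num_devices


def steps_per_epoch(num_examples: int, batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Approximate optimizer steps per epoch (ignores exact handling of a final partial batch)."""
    return math.ceil(num_examples / examples_per_step(batch_size, grad_accum, num_devices))


def lr_multiplier(step: int, scheduler: str, warmup_steps: int) -> float:
    """Factor applied to learning_rate at a given optimizer step."""
    if scheduler == "constant":
        return 1.0  # warmup_steps has no effect with this scheduler
    if scheduler == "constant_with_warmup":
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # linear ramp from 0 to 1
        return 1.0
    raise ValueError(f"unsupported scheduler in this sketch: {scheduler}")


# Settings from the card: batch size 4, no accumulation, one GPU.
per_step = examples_per_step(4, 1)
print(per_step * 20)   # examples between loss logs (logging_steps=20): 80
print(per_step * 500)  # examples between checkpoints (save_steps=500): 2000
print(4e-5 * lr_multiplier(50, "constant", 100))              # 4e-05
print(4e-5 * lr_multiplier(50, "constant_with_warmup", 100))  # 2e-05
```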

Evaluation and Testing

From the 5,000-sample subset, 1,500 randomly assigned samples were used for evaluation and 3,500 for testing. The whole fine-tuning process took less than 30 minutes on Colab's A100 GPU (available only with Colab Pro+).
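A random 1,500/3,500 assignment like this can be sketched with the standard library. The seed is an assumption, since the card does not give one. The dataset size `num_rows=20000` in the example call is a placeholder, not the real size of bjoernp/tagesschau-2018-2023. The function works on row indices, so it does not depend on how the dataset is loaded.

```python
import random


def sample_and_split(num_rows: int, sample_size: int = 5000, eval_size: int = 1500, seed: int = 42):
    """Draw sample_size distinct row indices and split them into (eval, remaining)."""
    if sample_size > num_rows or eval_size > sample_size:
        raise ValueError("sample or split size exceeds available rows")
    rng = random.Random(seed)  # hypothetical seed, for reproducibility
    sampled = rng.sample(range(num_rows), sample_size)  # random order, no duplicates
    return sampled[:eval_size], sampled[eval_size:]


eval_idx, rest_idx = sample_and_split(num_rows=20000)  # placeholder dataset size
print(len(eval_idx), len(rest_idx))  # 1500 3500
```

With the Hugging Face datasets library, you can apply the indices with `dataset.select(eval_idx)`. `Dataset.train_test_split(test_size=1500, seed=...)` is a built-in alternative that returns both parts at once.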

Results

  • Epoch: 1
  • Training Loss: 1.866900
  • Validation Loss: 1.801998
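These values can be read as perplexities if they are mean per-token cross-entropy losses in nats, which is what the Trainer reports for causal language models. Perplexity is then exp(loss). This is a derived view of the same two numbers, not an additional evaluation.

```python
import math

# Losses reported above (epoch 1).
train_loss = 1.866900
val_loss = 1.801998

# Perplexity = exp(mean cross-entropy), assuming natural-log losses.
print(f"train perplexity: {math.exp(train_loss):.2f}")     # 6.47
print(f"validation perplexity: {math.exp(val_loss):.2f}")  # 6.06
```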

Summary

The full code is available in my GitHub repository: https://github.com/KamilaTrinkenschuh/Ueberschriftengenerator_LEOLM