---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- bjoernp/tagesschau-2018-2023
language:
- de
- en
metrics:
- accuracy
---

# This model was trained to summarise short texts and find headlines for newspaper articles

## Model Details

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** Kamila Trinkenschuh
- **Shared by:** Kamila Trinkenschuh
- **Model type:** Fine-tuned for text generation and text summarization tasks
- **Finetuned from model:** LeoLM/leo-hessianai-7b

## Use

You can use this model to see examples of how it finds headlines for articles. I encourage you to fine-tune it for your own purposes and tasks.

### Out-of-Scope Use

This model was fine-tuned on an A100 GPU in Google Colab.

## Bias, Risks, and Limitations

The LLM was trained on a subset of 5,000 samples from the bjoernp/tagesschau-2018-2023 dataset.

# Load model directly

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```

# Use a pipeline as a high-level helper

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```

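As a rough sketch, the pipeline above could be wrapped in a small helper that asks for a headline. The prompt template in `build_prompt` and both function names are assumptions made for illustration; this card does not document the prompt format used during fine-tuning, so adjust it to match your own training data.

```python
def build_prompt(article: str) -> str:
    """Wrap an article in a hypothetical prompt template (not taken from the model card)."""
    return f"Artikel:\n{article.strip()}\n\nÜberschrift:"


def generate_headline(article: str, max_new_tokens: int = 32) -> str:
    """Generate a headline for one article. Downloads the model on first use."""
    # Imported lazily because loading a 7B model is slow and memory-hungry.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="Kamilatr/Ueberschriftengenerator_LEOLM",
        trust_remote_code=True,
    )
    # return_full_text=False returns only the newly generated text, without the prompt.
    result = pipe(
        build_prompt(article),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )
    return result[0]["generated_text"].strip()
```

Calling `generate_headline("...")` with a German news text should return a short continuation. Expect to need a GPU with enough memory for a 7B-parameter model.
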
### Training Procedure

The LeoLM model was fine-tuned with LoRA.

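To illustrate why LoRA keeps fine-tuning cheap, the sketch below compares the number of trainable parameters in a full weight matrix with the two low-rank factors LoRA trains instead. The layer size (4096 x 4096) and the rank (8) are illustrative assumptions; the card does not state the LoRA configuration that was used.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Number of weights in a dense d_out x d_in projection."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical values, not taken from the model card.
d, r = 4096, 8
lora = lora_trainable_params(d, d, r)
full = full_params(d, d)
print(f"LoRA: {lora:,} trainable weights vs. {full:,} in the full matrix ({lora / full:.2%})")
```
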
#### Speeds, Sizes, Times

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # evaluate once per epoch (newer transformers releases call this eval_strategy)
    optim="paged_adamw_8bit",  # used with QLoRA
    per_device_train_batch_size=4,  # training batch size per device
    per_device_eval_batch_size=4,  # same, but for evaluation
    # Number of batches over which gradients are accumulated. Careful: this changes the
    # size of a "step", so logging, evaluation, and saving happen every
    # gradient_accumulation_steps * xxx_steps batches.
    gradient_accumulation_steps=1,
    log_level="debug",  # can also be "info", "warning", "error", or "critical"
    save_steps=500,  # number of steps between checkpoints
    logging_steps=20,  # number of steps between loss logs; adapt it to your dataset size
    learning_rate=4e-5,  # you can try different values for this hyperparameter
    num_train_epochs=1,
    warmup_steps=100,
    lr_scheduler_type="constant",
)
```

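The settings above determine how many optimizer steps one epoch contains, which in turn decides how often `logging_steps` and `save_steps` trigger. The sketch below works through that arithmetic. The training-set size of 3,500 is an assumption made for illustration (the card does not state explicitly how many samples were trained on), and a single GPU is assumed.

```python
import math


def optimizer_steps_per_epoch(
    num_samples: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_devices: int = 1,
) -> int:
    """Approximate number of optimizer updates in one epoch."""
    batches = math.ceil(num_samples / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)


# Assumed training-set size; batch size and accumulation come from the arguments above.
steps = optimizer_steps_per_epoch(num_samples=3500, per_device_batch_size=4, grad_accum_steps=1)
print(f"steps per epoch: {steps}")
print(f"loss is logged {steps // 20} times (logging_steps=20)")
print(f"step-based checkpoints: {steps // 500} (save_steps=500)")
```

Under these assumptions, raising `gradient_accumulation_steps` to 4 would cut the number of optimizer steps to roughly a quarter, so step-based intervals would fire correspondingly less often per sample seen.
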
## Evaluation and Testing

From the dataset sample, 1,500 randomly assigned samples were used for evaluation and 3,500 for testing. The whole fine-tuning process took less than 30 minutes on Colab's A100 GPU, which is accessible only with Colab Pro+.

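A random split like the one described can be sketched with the standard library alone. The function name, the seed, and the total dataset size used below are hypothetical; only the 5,000-sample subset and the 1,500 / 3,500 partition come from this card.

```python
import random


def split_indices(total: int, sample_size: int, eval_size: int, seed: int = 42):
    """Draw `sample_size` distinct indices at random and split off `eval_size` of them."""
    rng = random.Random(seed)
    sampled = rng.sample(range(total), sample_size)  # already in random order
    return sampled[:eval_size], sampled[eval_size:]


# total=20_000 is a placeholder; use the real length of the dataset instead.
eval_idx, rest_idx = split_indices(total=20_000, sample_size=5000, eval_size=1500)
print(len(eval_idx), len(rest_idx))
```

With the 🤗 `datasets` library, the resulting index lists could then be passed to `Dataset.select(...)` to materialise the two subsets.
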
### Results

- Epoch: 1
- Training Loss: 1.866900
- Validation Loss: 1.801998

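For readers who prefer perplexity, the losses above can be converted with a one-line formula. This assumes the reported values are mean per-token cross-entropy losses in nats, which is what the `Trainer` normally reports for causal language models; the card does not say so explicitly.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(cross_entropy_loss)


print(f"training perplexity:   {perplexity(1.866900):.2f}")
print(f"validation perplexity: {perplexity(1.801998):.2f}")
```

Under that assumption, the validation loss corresponds to a perplexity of about 6.06.
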
#### Summary

You can see the code in my GitHub repo: https://github.com/KamilaTrinkenschuh/Ueberschriftengenerator_LEOLM