---
inference: false
license: apache-2.0
language:
- de
datasets:
- DEplain/DEplain-APA-doc
metrics:
- sari
- bleu
- bertscore
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text simplification
- plain language
- easy-to-read language
- document simplification
---
|
|
|
# DEplain German Text Simplification
|
|
|
This model is part of the experiments described in Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
|
Detailed documentation can be found in the GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain)
|
|
|
We reused the code from [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) for our experiments.
|
|
|
### Model Description
|
|
|
The model is a fine-tuned checkpoint of the pre-trained LongmBART model, which is based on `mbart-large-cc25` with the vocabulary trimmed to the 30k most frequent words in the German language.
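The vocabulary-trimming idea can be sketched as follows. This is a minimal, self-contained illustration of keeping only the most frequent tokens, with hypothetical names (`trim_vocab`, `corpus_tokens`, `full_vocab`); it is not the actual trimming script used in `ats-models`.

```python
from collections import Counter

def trim_vocab(corpus_tokens, full_vocab, keep=30000,
               specials=("<s>", "</s>", "<pad>", "<unk>")):
    """Keep the `keep` most frequent corpus tokens (plus special tokens)
    and assign them new contiguous ids. Illustrative sketch only."""
    # Count only tokens that exist in the original model vocabulary
    counts = Counter(t for t in corpus_tokens if t in full_vocab)
    # Special tokens always survive trimming; the rest are ranked by frequency
    kept = list(specials) + [t for t, _ in counts.most_common(keep)
                             if t not in specials]
    return {tok: i for i, tok in enumerate(kept)}
```

In the actual model, the counts would come from a large German corpus and `full_vocab` from the `mbart-large-cc25` tokenizer; trimming shrinks the embedding and output layers accordingly.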
|
|
|
The model was fine-tuned for the task of German document-level text simplification.
|
|
|
The fine-tuning data consisted of manually aligned data from the `DEplain-APA-doc` dataset only.
|
|
|
### Model Usage
|
|
|
This model currently cannot be used through the Hugging Face hosted inference interface or via the `.from_pretrained` method, because it is a fine-tuning of a custom model (LongmBART) that has not been registered with Hugging Face yet.

The code for this custom model is available at: [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models)
|
|
|
To test this model checkpoint, clone the checkpoint repository as follows:
|
|
|
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa

# If you want to clone without large files (just their pointers),
# prepend your git clone with the following environment variable:
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa
```
|
|
|
Then set up the conda environment via:

```
conda env create -f environment.yaml
```
|
|
|
Then follow the procedure in the notebook `generation.ipynb`.