---
language:
- de
- en
pipeline_tag: sentence-similarity
tags:
- semantic textual similarity
- sentence-transformer
- feature-extraction
- transformers
---

# Model card for PM-AI/sts_paraphrase_xlm-roberta-base_de-en

In terms of content, the samples are based on rather simple sentences.

When the TSystems model was published, only the STSb dataset was used for STS training.
Therefore it is included in our model as well, but expanded to include SICK and the Priya22 semantic textual relatedness dataset:

- SICK was partly used in STSb, but our custom translation using [DeepL](https://www.deepl.com/) leads to slightly different phrases. This approach allows more examples to be included in the training.
- The Priya22 semantic textual relatedness dataset, published in 2022, was also translated into German via DeepL and added to the training data. Since it does not come with a train-test split, one was created independently at a ratio of 80:20.

The rating scale of all datasets has been adjusted to match STSb, with a value range from 0 to 5.
All training and test data (STSb, SICK, Priya22) were checked for duplicates within and across the datasets, and any duplicates were removed.
Because the test data is prioritized, duplicated entries between test and train are removed exclusively from the train split.
The final datasets can be viewed here: [datasets_sts_paraphrase_xlm-roberta-base_de-en](https://gitlab.com/sense.ai.tion-public/datasets_sts_paraphrase_xlm-roberta-base_de-en)
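As a rough illustration, the rescaling and test-prioritized deduplication described above can be sketched as follows (function and field names are illustrative assumptions, not the project's actual preprocessing code):

```python
# Illustrative sketch (assumed names) of the preparation steps described above.

def rescale_to_stsb(score: float, old_min: float, old_max: float) -> float:
    """Map a rating from its original range onto the STSb range [0, 5]."""
    return (score - old_min) / (old_max - old_min) * 5.0

def dedupe_train(train_pairs, test_pairs):
    """Drop train pairs that also occur in test: the test data is prioritized,
    so duplicates are removed exclusively from the train split."""
    seen = {(s1, s2) for s1, s2, _ in test_pairs}
    return [p for p in train_pairs if (p[0], p[1]) not in seen]

train = [("Ein Hund rennt.", "A dog is running.", 4.5),
         ("Es regnet.", "The sun is shining.", 0.5)]
test = [("Ein Hund rennt.", "A dog is running.", 4.5)]
print(dedupe_train(train, test))  # only the second pair survives
```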

### Training

Before fine-tuning for STS, we made the English paraphrasing model [paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1) usable for German by applying **[Knowledge Distillation](https://arxiv.org/abs/2004.09813)** (_Teacher-Student_ approach).
The TSystems model used version 1 of this model, which is based on 7 different datasets and contains around 24.6 million samples.
We are using version 2, which is based on 12 datasets and contains about 83.3 million samples.
Details on this process can be found here: [PM-AI/paraphrase-distilroberta-base-v2_de-en](https://huggingface.co/PM-AI/paraphrase-distilroberta-base-v2_de-en)
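The distillation objective can be sketched in a few lines. This is a simplified, library-free rendering of the idea in the linked paper, not the actual training code: the multilingual student is trained so that its embeddings of both the English sentence and its German translation approximate the English teacher's embedding.

```python
# Simplified sketch of the Teacher-Student (Knowledge Distillation) objective:
# minimize the MSE between the teacher's English embedding and the student's
# embeddings of both the English sentence and its German translation.
def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def distillation_loss(teacher_en, student_en, student_de):
    # pull the student's EN and DE embeddings toward the teacher's EN embedding
    return mse(student_en, teacher_en) + mse(student_de, teacher_en)

# A student that reproduces the teacher's embedding for both languages has zero loss.
print(distillation_loss([0.5, -0.5], [0.5, -0.5], [0.5, -0.5]))  # -> 0.0
```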

For fine-tuning we are using SBERT's [training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/b86eec31cf0a102ad786ba1ff31bfeb4998d3ca5/examples/training/sts/training_stsbenchmark_continue_training.py) training script.
One thing has been changed in this script: when a sentence pair consists of two identical utterances, its score is set to 5.0 (the maximum).
It makes no sense to say identical sentences have a score of 4.8 or 4.9.
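The score override can be sketched like this (names are illustrative, not the script's actual variables; the SBERT STSb scripts normalize gold scores to [0, 1] by dividing by 5, which is kept here):

```python
# Sketch of the modification described above (illustrative names): identical
# utterances are forced to the maximum STS score of 5.0 before the usual
# normalization of the gold score to [0, 1].
def adjusted_label(sent1: str, sent2: str, score: float) -> float:
    if sent1.strip() == sent2.strip():
        score = 5.0  # identical sentences are perfectly similar by definition
    return score / 5.0

print(adjusted_label("Ein Hund rennt.", "Ein Hund rennt.", 4.8))      # -> 1.0
print(adjusted_label("Ein Hund rennt.", "Eine Katze schläft.", 1.0))  # -> 0.2
```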

#### Parameterization of training

- **Script:** [training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/b86eec31cf0a102ad786ba1ff31bfeb4998d3ca5/examples/training/sts/training_stsbenchmark_continue_training.py)
- **Datasets:** [datasets_sts_paraphrase_xlm-roberta-base_de-en](https://gitlab.com/sense.ai.tion-public/datasets_sts_paraphrase_xlm-roberta-base_de-en)
- **GPU:** NVIDIA A40 (Driver Version: 515.48.07; CUDA Version: 11.7)
- **Batch Size:** 32
- **Base Model:** [PM-AI/paraphrase-distilroberta-base-v2_de-en](https://huggingface.co/PM-AI/paraphrase-distilroberta-base-v2_de-en)
- **Loss Function:** Cosine Similarity
- **Learning Rate:** 2e-5
- **Epochs:** 3
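For intuition, the cosine-similarity loss listed above compares the cosine similarity of the two sentence embeddings against the normalized gold score via a squared error. A minimal, library-free sketch (illustrative, not the sentence-transformers implementation):

```python
import math

# Minimal sketch of a cosine-similarity loss: squared error between the cosine
# similarity of two embeddings and the gold STS score normalized to [0, 1].
def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cosine_similarity_loss(emb1, emb2, gold_score):
    return (cos_sim(emb1, emb2) - gold_score / 5.0) ** 2

# Orthogonal embeddings labeled as dissimilar (score 0) give zero loss.
print(cosine_similarity_loss([1.0, 0.0], [0.0, 1.0], 0.0))  # -> 0.0
```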

The first table shows the evaluation results for **cross-lingual (German-English)** sentence pairs:

:-----:|:-----:|:-----:|:-----:|:-----:
[PM-AI/sts_paraphrase_xlm-roberta-base_de-en (ours)](https://huggingface.co/PM-AI/sts_paraphrase_xlm-roberta-base_de-en) | 0.8672 <br /> 🏆 | 0.8639 <br /> 🏆 | 0.8354 <br /> 🏆 | 0.8711 <br /> 🏆
[T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) | 0.8525 | 0.7642 | 0.7998 | 0.8216
[PM-AI/paraphrase-distilroberta-base-v2_de-en (ours, no fine-tuning)](https://huggingface.co/PM-AI/paraphrase-distilroberta-base-v2_de-en) | 0.8225 | 0.7579 | 0.8255 | 0.8109
[sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) | 0.8310 | 0.7529 | 0.8184 | 0.8102
[sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual) | 0.8194 | 0.7703 | 0.7566 | 0.7998
[sentence-transformers/paraphrase-xlm-r-multilingual-v1](https://huggingface.co/sentence-transformers/paraphrase-xlm-r-multilingual-v1) | 0.7985 | 0.7217 | 0.7975 | 0.7838

The second table shows the evaluation results for **German only**, based on _Spearman_ correlation:

[T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) | 0.8547 | 0.8047 | 0.8068 | 0.8327
[Sahajtomar/German-semantic](https://huggingface.co/Sahajtomar/German-semantic) | 0.8485 | 0.7915 | 0.8139 | 0.8280
[sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) | 0.8360 | 0.7941 | 0.8237 | 0.8178
[PM-AI/paraphrase-distilroberta-base-v2_de-en (ours, no fine-tuning)](https://huggingface.co/PM-AI/paraphrase-distilroberta-base-v2_de-en) | 0.8297 | 0.7930 | 0.8341 | 0.8170
[sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual) | 0.8190 | 0.8027 | 0.7674 | 0.8072
[sentence-transformers/paraphrase-xlm-r-multilingual-v1](https://huggingface.co/sentence-transformers/paraphrase-xlm-r-multilingual-v1) | 0.8079 | 0.7844 | 0.8126 | 0.8034
[sentence-transformers/xlm-r-distilroberta-base-paraphrase-v1](https://huggingface.co/sentence-transformers/xlm-r-distilroberta-base-paraphrase-v1) | 0.8079 | 0.7844 | 0.8126 | 0.8034

And last but not least, the third table shows the evaluation results for **English only**:

:-----:|:-----:|:-----:|:-----:|:-----:
[PM-AI/sts_paraphrase_xlm-roberta-base_de-en (ours)](https://huggingface.co/PM-AI/sts_paraphrase_xlm-roberta-base_de-en) | 0.8768 <br /> 🏆 | 0.8705 <br /> 🏆 | 0.8402 | 0.8748 <br /> 🏆
[sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) | 0.8682 | 0.8065 | 0.8430 | 0.8378
[PM-AI/paraphrase-distilroberta-base-v2_de-en (ours, no fine-tuning)](https://huggingface.co/PM-AI/paraphrase-distilroberta-base-v2_de-en) | 0.8597 | 0.8105 | 0.8399 | 0.8363
[T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) | 0.8660 | 0.7897 | 0.8097 | 0.8308
[sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) | 0.8441 | 0.8059 | 0.8175 | 0.8300
[sentence-transformers/sentence-t5-base](https://huggingface.co/sentence-transformers/sentence-t5-base) | 0.8551 | 0.8063 | 0.8434 | 0.8235
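The scores in these tables are _Spearman_ rank correlations between the models' similarity scores and the gold STS labels. A minimal sketch of the statistic (simplified: no tie handling, which a real evaluator does account for):

```python
# Minimal Spearman rank correlation (assumes no tied values), as used to
# compare model similarity scores against gold STS labels.
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order):
        out[i] = r + 1
    return out

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# A model whose similarities are perfectly monotone in the gold scores gets 1.0.
print(spearman([0.12, 0.55, 0.93], [1.0, 2.5, 4.8]))  # -> 1.0
```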