Update README.md

This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
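
As a minimal sketch of how the embeddings can be used (the checkpoint id `pkshatech/GLuCoSE-base-ja-v2` and the example sentences are assumptions for illustration, and any task-specific prefixes are omitted):

```python
from sentence_transformers import SentenceTransformer, util

# Assumed repository id for this model; replace with the actual checkpoint name.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")

sentences = [
    "明日は雨が降るらしい。",
    "天気予報によると明日は雨だ。",
    "今日は映画を観に行った。",
]

# Each sentence is mapped to a 768-dimensional dense vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Cosine similarities between the vectors can drive semantic search, STS, clustering, etc.
print(util.cos_sim(embeddings, embeddings))
```
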
## Model Details

The model is based on [GLuCoSE](https://huggingface.co/pkshatech/GLuCoSE-base-ja) and additionally fine-tuned.
Fine-tuning consists of the following steps.

**Step 1: Ensemble distillation**

- The embedding representation was distilled using E5-mistral, gte-Qwen2 and mE5-large as teacher models.
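
A rough sketch of what such ensemble distillation can look like is given below; this is not the authors' exact recipe, and the loss, the batch construction and the use of a similarity-matrix target (which sidesteps the teachers' differing embedding dimensions) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(
    student_emb: torch.Tensor,          # (batch, 768) embeddings from the trainable student
    teacher_embs: list[torch.Tensor],   # frozen teacher embeddings, possibly of different dimensions
) -> torch.Tensor:
    """Match the student's pairwise cosine-similarity matrix to the teachers' average."""
    student_sim = F.cosine_similarity(
        student_emb.unsqueeze(1), student_emb.unsqueeze(0), dim=-1
    )
    teacher_sims = [
        F.cosine_similarity(t.unsqueeze(1), t.unsqueeze(0), dim=-1) for t in teacher_embs
    ]
    target = torch.stack(teacher_sims).mean(dim=0)
    return F.mse_loss(student_sim, target)
```
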
**Step 2: Contrastive learning**

- Triplets were created from JSNLI, MNLI, PAWS-X, JSeM and Mr.TyDi and used for training.
- This training aimed to improve the overall performance as a sentence embedding model.
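
A loose sketch of this kind of triplet-based contrastive fine-tuning with the `sentence-transformers` training API is shown below; the starting checkpoint, the example triplet and the choice of `MultipleNegativesRankingLoss` are assumptions for illustration, and Step 3 below applies the same kind of objective to question-answer data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Placeholder starting point; the actual recipe continues from the distilled model of Step 1.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# Hypothetical (anchor, positive, negative) triplet, e.g. built from NLI entailment/contradiction pairs.
train_examples = [
    InputExample(texts=["彼は犬を飼っている。", "彼はペットを飼っている。", "彼は動物を飼っていない。"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Uses in-batch negatives plus the explicit hard negative from each triplet.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```
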
**Step 3: Search-specific contrastive learning**

- To make the model more robust to retrieval tasks, additional two-stage training with question-answer data was conducted.
- In the first stage, the synthetic dataset auto-wiki was used for training, while in the second stage, Japanese Wikipedia Human Retrieval, Mr.TyDi, MIRACL, JQaRA, MQA, Quiz Works and Quiz No Mori were used.

## Benchmarks

### Retrieval

Evaluated with [MIRACL-ja](https://huggingface.co/datasets/miracl/miracl), [JQaRA](https://huggingface.co/datasets/hotchpotch/JQaRA) and [MLDR-ja](https://huggingface.co/datasets/Shitao/MLDR).

| model | size | MIRACL<br>Recall@5 | JQaRA<br>nDCG@10 | MLDR<br>nDCG@10 |
|--------|--------|---------------------|-------------------|-------------------|
| mE5-base | 0.3B | 84.2 | 47.2 | 25.4 |
| GLuCoSE | 0.1B | 53.3 | 30.8 | 25.2 |
| GLuCoSE v2 | 0.1B | *85.5* | *60.6* | *33.8* |
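
For reference, a minimal sketch of how the two metrics in this table are commonly computed is shown below; the document ids and relevance labels are placeholders, and this is not the evaluation code behind these numbers.

```python
import math

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(retrieved_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """Normalized discounted cumulative gain over the top-k retrieved results."""
    dcg = sum(
        relevance.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0
```
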
### JMTEB

Evaluated with [JMTEB](https://github.com/sbintuitions/JMTEB).

* The time-consuming datasets [‘amazon_review_classification’, ‘mrtydi’, ‘jaqket’, ‘esci’] were excluded, and the evaluation was run on the remaining 12 datasets.
* The average is a macro-average per task.

| model | size | Class. | Ret. | STS. | Clus. | Pair. | Avg. |
|--------|--------|--------|------|------|-------|-------|------|
| mE5-base | 0.3B | 75.1 | 80.6 | 80.5 | *52.6* | 62.4 | 70.2 |
| GLuCoSE | 0.1B | *82.6* | 69.8 | 78.2 | 51.5 | *66.2* | 69.7 |
| GLuCoSE v2 | 0.1B | 80.5 | *82.8* | *83.0* | 49.8 | 62.4 | *71.7* |

## Authors

Chihiro Yano, Mocho Go, Hideyuki Tachibana, Hiroto Takegawa, Yotaro Watanabe

## License

This model is published under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).