yano0 committed · Commit ab2bd8c · verified · Parent: fbeabe8

Update README.md

Files changed (1): README.md (+11, -11)

README.md CHANGED
 
This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

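As a quick orientation, the snippet below is a minimal usage sketch with the sentence-transformers API. The checkpoint id `pkshatech/GLuCoSE-base-ja-v2` is an assumption (the repository id is not stated in this excerpt); substitute the real id, and check the full card for any required query/passage prefixes.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# NOTE: the checkpoint id is an assumption -- replace it with the actual repository id.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")

sentences = [
    "今日は天気が良いので散歩に出かけた。",
    "晴れていたので外を歩いた。",
    "明日の会議は午前10時に始まります。",
]

# Each sentence is mapped to a 768-dimensional dense vector, as described above.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Cosine similarities can then drive STS, semantic search, clustering, etc.
print(cos_sim(embeddings, embeddings))
```
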
## Model Details
The model is based on [GLuCoSE](https://huggingface.co/pkshatech/GLuCoSE-base-ja) and additionally fine-tuned.
Fine-tuning consists of the following steps.

**Step 1: Ensemble distillation**

- The embedded representation was distilled using E5-mistral, gte-Qwen2 and mE5-large as teacher models.

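The card does not spell out the distillation objective, so the following is only a sketch under common assumptions: the teachers' embeddings are pre-computed and brought to the student's 768 dimensions, averaged into an ensemble target, and the student is trained to match that target with a cosine loss.

```python
import torch
import torch.nn.functional as F

batch_size, dim = 8, 768

# Stand-ins for pre-computed teacher embeddings (E5-mistral, gte-Qwen2, mE5-large),
# assumed to be already projected/truncated to the student's 768-dimensional space.
teacher_embs = [torch.randn(batch_size, dim) for _ in range(3)]

# Stand-in for the student (GLuCoSE) encoder output for the same batch of sentences.
student_emb = torch.randn(batch_size, dim, requires_grad=True)

# Ensemble target: normalized mean of the teacher embeddings.
target = F.normalize(torch.stack(teacher_embs).mean(dim=0), dim=-1)

# Cosine-distance distillation loss between the student and the ensemble target.
loss = 1.0 - F.cosine_similarity(F.normalize(student_emb, dim=-1), target, dim=-1).mean()
loss.backward()  # gradients would flow into the student encoder
print(float(loss))
```
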
**Step 2: Contrastive learning**

- Triples were created from JSNLI, MNLI, PAWS-X, JSeM and Mr.TyDi and used for training.
- This training aimed to improve the overall performance as a sentence embedding model.

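The loss for this step is not named here; a common recipe for NLI-style (anchor, entailment, contradiction) triples is `MultipleNegativesRankingLoss` with the contradiction as an explicit hard negative, which is what this sketch assumes. The toy texts only stand in for JSNLI/MNLI/PAWS-X/JSeM/Mr.TyDi triples.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# In practice the starting point would be the Step 1 (distilled) checkpoint.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# (anchor, positive, hard negative) triples -- toy stand-ins for the NLI-derived data.
train_examples = [
    InputExample(texts=["猫がソファで寝ている。", "猫が眠っている。", "犬が庭を走っている。"]),
    InputExample(texts=["男性がギターを弾いている。", "人が楽器を演奏している。", "誰も音楽を演奏していない。"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# In-batch negatives plus the explicit hard negative from each triple.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```
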
**Step 3: Search-specific contrastive learning**

- In order to make the model more robust to the retrieval task, additional two-stage training with question-answer (QA) data was conducted.
- In the first stage, the synthetic dataset auto-wiki was used for training, while in the second stage, Japanese Wikipedia Human Retrieval, Mr.TyDi, MIRACL, JQaRA, MQA, Quiz Works and Quiz No Mori were used.

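Only the two-stage structure is taken from the description above; the (query, passage) pairs, batch size and loss below are illustrative assumptions. Each stage continues training the previous checkpoint on retrieval-style pairs with in-batch negatives.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# In practice the starting point would be the Step 2 checkpoint.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# Stage 1: synthetic QA pairs (e.g. auto-wiki); Stage 2: the human-made QA/retrieval sets
# listed above (Japanese Wikipedia Human Retrieval, Mr.TyDi, MIRACL, JQaRA, MQA, ...).
stage1_pairs = [
    InputExample(texts=["日本の首都はどこですか。", "東京は日本の首都である。"]),
    InputExample(texts=["富士山の高さは？", "富士山の標高は3776メートルである。"]),
]
stage2_pairs = [
    InputExample(texts=["『源氏物語』の作者は誰ですか。", "『源氏物語』は紫式部によって書かれた物語である。"]),
    InputExample(texts=["水の沸点は何度ですか。", "水は1気圧のもとで摂氏100度で沸騰する。"]),
]

loss = losses.MultipleNegativesRankingLoss(model)  # other passages in the batch act as negatives
for stage_pairs in (stage1_pairs, stage2_pairs):
    loader = DataLoader(stage_pairs, shuffle=True, batch_size=2)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```
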
## Benchmarks

### Retrieval
Evaluated with [MIRACL-ja](https://huggingface.co/datasets/miracl/miracl), [JQaRA](https://huggingface.co/datasets/hotchpotch/JQaRA) and [MLDR-ja](https://huggingface.co/datasets/Shitao/MLDR).

| model | size | MIRACL<br>Recall@5 | JQaRA<br>nDCG@10 | MLDR<br>nDCG@10 |
|--------|--------|---------------------|-------------------|-------------------|
| mE5-base | 0.3B | 84.2 | 47.2 | 25.4 |
| GLuCoSE | 0.1B | 53.3 | 30.8 | 25.2 |
| GLuCoSE v2 | 0.1B | *85.5* | *60.6* | *33.8* |

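For readers unfamiliar with the column headers, the snippet below shows what Recall@5 and nDCG@10 measure on a toy ranking. It is illustrative only; the figures in the table come from the benchmarks' own evaluation tooling.

```python
import math

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def ndcg_at_k(retrieved, relevant, k=10):
    """Binary-relevance nDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

retrieved = ["d3", "d7", "d1", "d9", "d2"]    # toy ranking returned by an embedding model
relevant = {"d1", "d2"}                       # toy gold labels
print(recall_at_k(retrieved, relevant, k=5))  # 1.0
print(ndcg_at_k(retrieved, relevant, k=10))   # ~0.54
```
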
### JMTEB
Evaluated with [JMTEB](https://github.com/sbintuitions/JMTEB).
* The time-consuming datasets [‘amazon_review_classification’, ‘mrtydi’, ‘jaqket’, ‘esci’] were excluded, and evaluation was run on the other 12 datasets.
* The average is a macro-average per task (a worked check follows the table).

| model | size | Class. | Ret. | STS. | Clus. | Pair. | Avg. |
|--------|--------|--------|------|------|-------|-------|------|
| mE5-base | 0.3B | 75.1 | 80.6 | 80.5 | *52.6* | 62.4 | 70.2 |
| GLuCoSE | 0.1B | *82.6* | 69.8 | 78.2 | 51.5 | *66.2* | 69.7 |
| GLuCoSE v2 | 0.1B | 80.5 | *82.8* | *83.0* | 49.8 | 62.4 | *71.7* |

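As a concrete check of the macro-average, the Avg. column is the unweighted mean of the five task scores in each row; reproducing the GLuCoSE v2 row:

```python
# GLuCoSE v2 row from the table above: Class., Ret., STS., Clus., Pair.
scores = [80.5, 82.8, 83.0, 49.8, 62.4]
print(round(sum(scores) / len(scores), 1))  # 71.7 -- matches the Avg. column
```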
 
## Authors
Chihiro Yano, Mocho Go, Hideyuki Tachibana, Hiroto Takegawa, Yotaro Watanabe

## License
This model is published under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).