Update README.md

This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
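
As a minimal sketch of how the embeddings can be used (the checkpoint id `pkshatech/GLuCoSE-base-ja-v2` and the example sentences are assumptions for illustration, and any task-specific prefixes are omitted):

```python
from sentence_transformers import SentenceTransformer, util

# Assumed repository id for this model; replace with the actual checkpoint name.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")

sentences = [
    "明日は雨が降るらしい。",
    "天気予報によると明日は雨だ。",
    "今日は映画を観に行った。",
]

# Each sentence is mapped to a 768-dimensional dense vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Cosine similarities between the vectors can drive semantic search, STS, clustering, etc.
print(util.cos_sim(embeddings, embeddings))
```
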
## Model Details

The model is based on [GLuCoSE](https://huggingface.co/pkshatech/GLuCoSE-base-ja) and additionally fine-tuned.
Fine-tuning consists of the following steps.

**Step 1: Ensemble distillation**

- The embedding representation was distilled using E5-mistral, gte-Qwen2 and mE5-large as teacher models.
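
A rough sketch of what such ensemble distillation can look like is given below; this is not the authors' exact recipe, and the loss, the batch construction and the use of a similarity-matrix target (which sidesteps the teachers' differing embedding dimensions) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(
    student_emb: torch.Tensor,          # (batch, 768) embeddings from the trainable student
    teacher_embs: list[torch.Tensor],   # frozen teacher embeddings, possibly of different dimensions
) -> torch.Tensor:
    """Match the student's pairwise cosine-similarity matrix to the teachers' average."""
    student_sim = F.cosine_similarity(
        student_emb.unsqueeze(1), student_emb.unsqueeze(0), dim=-1
    )
    teacher_sims = [
        F.cosine_similarity(t.unsqueeze(1), t.unsqueeze(0), dim=-1) for t in teacher_embs
    ]
    target = torch.stack(teacher_sims).mean(dim=0)
    return F.mse_loss(student_sim, target)
```
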
**Step 2: Contrastive learning**

- Triplets were created from JSNLI, MNLI, PAWS-X, JSeM and Mr.TyDi and used for training.
- This training aimed to improve the overall performance as a sentence embedding model.
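
A loose sketch of this kind of triplet-based contrastive fine-tuning with the `sentence-transformers` training API is shown below; the starting checkpoint, the example triplet and the choice of `MultipleNegativesRankingLoss` are assumptions for illustration, and Step 3 below applies the same kind of objective to question-answer data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Placeholder starting point; the actual recipe continues from the distilled model of Step 1.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# Hypothetical (anchor, positive, negative) triplet, e.g. built from NLI entailment/contradiction pairs.
train_examples = [
    InputExample(texts=["彼は犬を飼っている。", "彼はペットを飼っている。", "彼は動物を飼っていない。"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Uses in-batch negatives plus the explicit hard negative from each triplet.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```
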
**Step 3: Search-specific contrastive learning**

- To make the model more robust to retrieval tasks, additional two-stage training with question-answer data was conducted.
- In the first stage, the synthetic dataset auto-wiki was used for training, while in the second stage, Japanese Wikipedia Human Retrieval, Mr.TyDi, MIRACL, JQaRA, MQA, Quiz Works and Quiz No Mori were used.

## Benchmarks

### Retrieval

Evaluated with [MIRACL-ja](https://huggingface.co/datasets/miracl/miracl), [JQaRA](https://huggingface.co/datasets/hotchpotch/JQaRA) and [MLDR-ja](https://huggingface.co/datasets/Shitao/MLDR).

| model | size | MIRACL<br>Recall@5 | JQaRA<br>nDCG@10 | MLDR<br>nDCG@10 |
|--------|--------|---------------------|-------------------|-------------------|
| mE5-base | 0.3B | 84.2 | 47.2 | 25.4 |
| GLuCoSE | 0.1B | 53.3 | 30.8 | 25.2 |
| GLuCoSE v2 | 0.1B | *85.5* | *60.6* | *33.8* |
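
For reference, a minimal sketch of how the two metrics in this table are commonly computed is shown below; the document ids and relevance labels are placeholders, and this is not the evaluation code behind these numbers.

```python
import math

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(retrieved_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """Normalized discounted cumulative gain over the top-k retrieved results."""
    dcg = sum(
        relevance.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0
```
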
### JMTEB

Evaluated with [JMTEB](https://github.com/sbintuitions/JMTEB).

* The time-consuming datasets [‘amazon_review_classification’, ‘mrtydi’, ‘jaqket’, ‘esci’] were excluded, and the evaluation was run on the remaining 12 datasets.
* The average is a macro-average per task.

| model | size | Class. | Ret. | STS. | Clus. | Pair. | Avg. |
|--------|--------|--------|------|------|-------|-------|------|
| mE5-base | 0.3B | 75.1 | 80.6 | 80.5 | *52.6* | 62.4 | 70.2 |
| GLuCoSE | 0.1B | *82.6* | 69.8 | 78.2 | 51.5 | *66.2* | 69.7 |
| GLuCoSE v2 | 0.1B | 80.5 | *82.8* | *83.0* | 49.8 | 62.4 | *71.7* |

## Authors

Chihiro Yano, Mocho Go, Hideyuki Tachibana, Hiroto Takegawa, Yotaro Watanabe

## License

This model is published under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).