Update README.md
README.md
CHANGED
@@ -9,10 +9,12 @@ tags:
 
 ---
 
-#
+# bge-m3 model for English and Russian
 
-This is a [
+This is a version of [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) with a shrunk tokenizer.
 
+The current model keeps only English and Russian tokens in the vocabulary.
+Thus, the vocabulary is 21% of the original size, and the whole model has 63.3% of the original number of parameters, without any loss in the quality of English and Russian embeddings.
 <!--- Describe your model here -->
 
 ## Usage (Sentence-Transformers)
@@ -74,14 +76,6 @@ print(sentence_embeddings)
 
 
 
-## Evaluation Results
-
-<!--- Describe how your model was evaluated -->
-
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=TatonkaHF/bge-m3_en_ru)
-
-
-
 ## Full Model Architecture
 ```
 SentenceTransformer(
@@ -90,6 +84,12 @@ SentenceTransformer(
 )
 ```
 
-##
+## Reference:
+
+Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu. [BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation](https://arxiv.org/abs/2402.03216).
+
+Inspired by [LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru) and [https://discuss.huggingface.co/t/tokenizer-shrinking-recipes/8564/1](https://discuss.huggingface.co/t/tokenizer-shrinking-recipes/8564/1).
+
+License: [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md)
 
 <!--- Describe where people can find more information -->
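
The README text added in this commit says the vocabulary was cut down to English and Russian tokens, which shrinks the embedding matrix and therefore the parameter count. The commit does not show how this was done, so the following is only a minimal sketch of the general idea on toy data, not the author's procedure. The function names `shrink_vocabulary` and `parameter_fraction`, the toy vocabulary, and all numbers are hypothetical. The sketch also assumes that only the input embedding matrix depends on the vocabulary size; a real SentencePiece tokenizer would additionally need its own model file pruned, which is not covered here.

```python
from typing import Dict, Iterable, List, Tuple


def shrink_vocabulary(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    keep_tokens: Iterable[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Keep only the selected tokens and re-index their embedding rows."""
    keep = set(keep_tokens)
    # Sort by the old index so kept tokens preserve their relative order
    # (special tokens that sat at the front stay at the front).
    kept = sorted((idx, tok) for tok, idx in vocab.items() if tok in keep)
    new_vocab = {tok: new_idx for new_idx, (_, tok) in enumerate(kept)}
    # Copy the rows so the original matrix is left untouched.
    new_embeddings = [list(embeddings[old_idx]) for old_idx, _ in kept]
    return new_vocab, new_embeddings


def parameter_fraction(
    total_params: int, old_vocab: int, new_vocab: int, hidden: int
) -> float:
    """Fraction of parameters left when only the embedding matrix shrinks."""
    removed = (old_vocab - new_vocab) * hidden
    return (total_params - removed) / total_params


# Toy data (hypothetical): six tokens, hidden size 2.
vocab = {"<s>": 0, "</s>": 1, "bonjour": 2, "hello": 3, "hola": 4, "привет": 5}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

new_vocab, new_embeddings = shrink_vocabulary(
    vocab, embeddings, keep_tokens=["<s>", "</s>", "hello", "привет"]
)
print(new_vocab)
print(new_embeddings)
```

Because each kept token keeps its original embedding row, the model sees the same input vectors for any text made up only of kept tokens. That is the intuition behind the README's claim about English and Russian quality, though the commit itself includes no evaluation to confirm it.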