chore-add-mmteb

#62
by bwang0911
Files changed (1)
  1. README.md +3 -23
README.md CHANGED
@@ -25015,7 +25015,7 @@ model-index:
 <br><br>
 
 <p align="center">
- <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
+ <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
 </p>
 
 
@@ -25029,7 +25029,7 @@ model-index:
 
 ## Quick Start
 
- [Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3-vm) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
+ [Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
 
 
 ## Intended Usage & Model Info
@@ -25056,13 +25056,6 @@ While the foundation model supports 100 languages, we've focused our tuning efforts
 Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
 Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
 
-
- > **⚠️ Important Notice:**
- > We fixed a bug in the `encode` function [#60](https://huggingface.co/jinaai/jina-embeddings-v3/discussions/60) where **Matryoshka embedding truncation** occurred *after normalization*, leading to non-normalized truncated embeddings. This issue has been resolved in the latest code revision.
- >
- > If you have encoded data using the previous version and wish to maintain consistency, please use the specific code revision when loading the model: `AutoModel.from_pretrained('jinaai/jina-embeddings-v3', code_revision='da863dd04a4e5dce6814c6625adfba87b83838aa', ...)`
-
-
 ## Usage
 
 **<details><summary>Apply mean pooling when integrating the model.</summary>**
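The removed notice documents a real ordering pitfall: truncating a Matryoshka embedding *after* L2 normalization leaves the shorter vector non-normalized. A minimal NumPy sketch (illustration only, not part of this PR or the README) of the two orderings:

```python
# Illustration of the bug described in #60: normalize-then-truncate breaks
# the unit norm; truncate-then-normalize is the fixed behavior.
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(size=1024)        # stand-in for a full-size embedding

# Buggy order: normalize first, then cut down to a Matryoshka dimension.
truncated = (full / np.linalg.norm(full))[:256]
print(np.linalg.norm(truncated))    # < 1.0 -- no longer unit-norm

# Fixed order: truncate first, then normalize.
fixed = full[:256] / np.linalg.norm(full[:256])
print(np.linalg.norm(fixed))        # 1.0
```

Embeddings produced by the two orderings differ by a per-vector scale factor, which is presumably why the notice recommends pinning `code_revision` when consistency with previously encoded data matters.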
@@ -25213,15 +25206,6 @@ import onnxruntime
 import numpy as np
 from transformers import AutoTokenizer, PretrainedConfig
 
-# Mean pool function
-def mean_pooling(model_output: np.ndarray, attention_mask: np.ndarray):
-    token_embeddings = model_output
-    input_mask_expanded = np.expand_dims(attention_mask, axis=-1)
-    input_mask_expanded = np.broadcast_to(input_mask_expanded, token_embeddings.shape)
-    sum_embeddings = np.sum(token_embeddings * input_mask_expanded, axis=1)
-    sum_mask = np.clip(np.sum(input_mask_expanded, axis=1), a_min=1e-9, a_max=None)
-    return sum_embeddings / sum_mask
-
 # Load tokenizer and model config
 tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
 config = PretrainedConfig.from_pretrained('jinaai/jina-embeddings-v3')
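The helper removed above masks padding out of the average. A toy check (mine, not from the README) of the same masking logic on a hand-built batch:

```python
# Toy re-run of the removed mean_pooling logic: the attention mask zeroes the
# padded position, so only real tokens contribute to the mean.
import numpy as np

token_embeddings = np.array([[[1.0, 2.0],
                              [3.0, 4.0],
                              [9.0, 9.0]]])   # (batch=1, seq=3, dim=2); last row is padding
attention_mask = np.array([[1, 1, 0]])        # 0 marks the padded token

mask = np.broadcast_to(np.expand_dims(attention_mask, -1), token_embeddings.shape)
summed = np.sum(token_embeddings * mask, axis=1)
count = np.clip(np.sum(mask, axis=1), a_min=1e-9, a_max=None)
print(summed / count)                         # [[2. 3.]] -- mean of the two real rows
```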
@@ -25243,11 +25227,7 @@ inputs = {
 }
 
 # Run model
- outputs = session.run(None, inputs)[0]
-
- # Apply mean pooling and normalization to the model outputs
- embeddings = mean_pooling(outputs, input_text["attention_mask"])
- embeddings = embeddings / np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
+ outputs = session.run(None, inputs)
 ```
 
 </p>
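Net effect of this hunk together with the previous one: the README's ONNX example keeps `session.run`'s raw return value and no longer pools or normalizes in NumPy. Worth noting (my note, not the PR's): `InferenceSession.run(output_names, input_feed)` returns a Python list with one NumPy array per model output, so the old `[0]` was selecting the token-embedding array for pooling. A stand-in sketch of those semantics that runs without a real `.onnx` file:

```python
# Stand-in for onnxruntime.InferenceSession.run, which returns a list of
# NumPy arrays (one per model output). Shapes here are placeholders.
import numpy as np

def run(output_names, input_feed):
    """Mimics session.run for a model with a single output."""
    return [np.zeros((1, 4, 1024))]          # (batch, seq_len, hidden_dim)

outputs = run(None, {})                      # post-PR: the list is kept as-is
print(type(outputs).__name__, outputs[0].shape)   # list (1, 4, 1024)
```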