chore-add-mmteb

#62
by bwang0911
Files changed (1)
  1. README.md +3 -23
README.md CHANGED
@@ -25015,7 +25015,7 @@ model-index:
 <br><br>
 
 <p align="center">
- <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
+ <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
 </p>
 
 
@@ -25029,7 +25029,7 @@ model-index:
 
 ## Quick Start
 
- [Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3-vm) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
+ [Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
 
 
 ## Intended Usage & Model Info
@@ -25056,13 +25056,6 @@ While the foundation model supports 100 languages, we've focused our tuning efforts
 Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
 Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
 
-
- > **⚠️ Important Notice:**
- > We fixed a bug in the `encode` function [#60](https://huggingface.co/jinaai/jina-embeddings-v3/discussions/60) where **Matryoshka embedding truncation** occurred *after normalization*, leading to non-normalized truncated embeddings. This issue has been resolved in the latest code revision.
- >
- > If you have encoded data using the previous version and wish to maintain consistency, please use the specific code revision when loading the model: `AutoModel.from_pretrained('jinaai/jina-embeddings-v3', code_revision='da863dd04a4e5dce6814c6625adfba87b83838aa', ...)`
-
-
 ## Usage
 
 **<details><summary>Apply mean pooling when integrating the model.</summary>**
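The removed notice documents a real ordering pitfall: truncating a Matryoshka embedding *after* L2 normalization leaves the shorter vector non-normalized. A minimal NumPy sketch (illustration only, not part of this PR or the README) of the two orderings:

```python
# Illustration of the bug described in #60: normalize-then-truncate breaks
# the unit norm; truncate-then-normalize is the fixed behavior.
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(size=1024)        # stand-in for a full-size embedding

# Buggy order: normalize first, then cut down to a Matryoshka dimension.
truncated = (full / np.linalg.norm(full))[:256]
print(np.linalg.norm(truncated))    # < 1.0 -- no longer unit-norm

# Fixed order: truncate first, then normalize.
fixed = full[:256] / np.linalg.norm(full[:256])
print(np.linalg.norm(fixed))        # 1.0
```

Embeddings produced by the two orderings differ by a per-vector scale factor, which is presumably why the notice recommends pinning `code_revision` when consistency with previously encoded data matters.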
@@ -25213,15 +25206,6 @@ import onnxruntime
 import numpy as np
 from transformers import AutoTokenizer, PretrainedConfig
 
-# Mean pool function
-def mean_pooling(model_output: np.ndarray, attention_mask: np.ndarray):
-    token_embeddings = model_output
-    input_mask_expanded = np.expand_dims(attention_mask, axis=-1)
-    input_mask_expanded = np.broadcast_to(input_mask_expanded, token_embeddings.shape)
-    sum_embeddings = np.sum(token_embeddings * input_mask_expanded, axis=1)
-    sum_mask = np.clip(np.sum(input_mask_expanded, axis=1), a_min=1e-9, a_max=None)
-    return sum_embeddings / sum_mask
-
 # Load tokenizer and model config
 tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
 config = PretrainedConfig.from_pretrained('jinaai/jina-embeddings-v3')
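The helper removed above masks padding out of the average. A toy check (mine, not from the README) of the same masking logic on a hand-built batch:

```python
# Toy re-run of the removed mean_pooling logic: the attention mask zeroes the
# padded position, so only real tokens contribute to the mean.
import numpy as np

token_embeddings = np.array([[[1.0, 2.0],
                              [3.0, 4.0],
                              [9.0, 9.0]]])   # (batch=1, seq=3, dim=2); last row is padding
attention_mask = np.array([[1, 1, 0]])        # 0 marks the padded token

mask = np.broadcast_to(np.expand_dims(attention_mask, -1), token_embeddings.shape)
summed = np.sum(token_embeddings * mask, axis=1)
count = np.clip(np.sum(mask, axis=1), a_min=1e-9, a_max=None)
print(summed / count)                         # [[2. 3.]] -- mean of the two real rows
```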
@@ -25243,11 +25227,7 @@ inputs = {
 }
 
 # Run model
- outputs = session.run(None, inputs)[0]
-
- # Apply mean pooling and normalization to the model outputs
- embeddings = mean_pooling(outputs, input_text["attention_mask"])
- embeddings = embeddings / np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
+ outputs = session.run(None, inputs)
 ```
 
 </p>
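Net effect of this hunk together with the previous one: the README's ONNX example keeps `session.run`'s raw return value and no longer pools or normalizes in NumPy. Worth noting (my note, not the PR's): `InferenceSession.run(output_names, input_feed)` returns a Python list with one NumPy array per model output, so the old `[0]` was selecting the token-embedding array for pooling. A stand-in sketch of those semantics that runs without a real `.onnx` file:

```python
# Stand-in for onnxruntime.InferenceSession.run, which returns a list of
# NumPy arrays (one per model output). Shapes here are placeholders.
import numpy as np

def run(output_names, input_feed):
    """Mimics session.run for a model with a single output."""
    return [np.zeros((1, 4, 1024))]          # (batch, seq_len, hidden_dim)

outputs = run(None, {})                      # post-PR: the list is kept as-is
print(type(outputs).__name__, outputs[0].shape)   # list (1, 4, 1024)
```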