juliuslipp committed · Commit d41dac6
Parent(s): 456b7cf

Update README.md
README.md CHANGED
@@ -2740,8 +2740,31 @@ console.log(similarities); // [0.7919578577247139, 0.6369278664248345, 0.1651201
### Using API

-You
+You can use the Model via our API as follows.
+
+```python
+from mixedbread_ai.client import MixedbreadAI
+from sklearn.metrics.pairwise import cosine_similarity
+import os
+
+mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")
+
+english_sentences = [
+    'What is the capital of Australia?',
+    'Canberra is the capital of Australia.'
+]
+
+res = mxbai.embeddings(
+    input=english_sentences,
+    model="mixedbread-ai/mxbai-embed-large-v1"
+)
+embeddings = [entry.embedding for entry in res.data]
+
+similarities = cosine_similarity([embeddings[0]], [embeddings[1]])
+print(similarities)
+```
+
+The API comes with native INT8 and binary quantization support!

## Evaluation
As of March 2024, our model achieves SOTA performance for BERT-large sized models on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard). It outperforms commercial models like OpenAI's text-embedding-3-large and matches the performance of models 20x its size, such as [echo-mistral-7b](https://huggingface.co/jspringer/echo-mistral-7b-instruct-lasttoken). Our model was trained with no overlap with the MTEB data, which indicates that it generalizes well across several domains, tasks, and text lengths. We are aware of some limitations with this model, which will be fixed in v2.
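
The added section notes that the API supports native INT8 and binary quantization. As a rough client-side illustration of the same idea (not the API's native quantization path, and assuming the `embeddings` list from the snippet above), the `quantize_embeddings` helper from sentence-transformers can quantize the returned float vectors:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

# `embeddings` is the list of float vectors returned by the API call above
# (assumption: mxbai-embed-large-v1 returns 1024-dimensional float embeddings).
float_embeddings = np.array(embeddings)

# Scalar (int8) quantization: each float dimension is mapped to one signed byte.
# With only two vectors, the calibration ranges are estimated from them directly.
int8_embeddings = quantize_embeddings(float_embeddings, precision="int8")

# Binary quantization: each dimension becomes one bit, packed into bytes,
# so a 1024-dim embedding shrinks to 128 bytes.
binary_embeddings = quantize_embeddings(float_embeddings, precision="ubinary")

print(int8_embeddings.shape, int8_embeddings.dtype)      # e.g. (2, 1024) int8
print(binary_embeddings.shape, binary_embeddings.dtype)  # e.g. (2, 128) uint8
```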
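
For context on the Evaluation claim, here is a minimal sketch of scoring the model on a single MTEB task with the `mteb` package; the task choice and output folder are illustrative and not part of the original README.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the model locally and run one MTEB task as a spot check.
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/mxbai-embed-large-v1")
print(results)
```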