zpn committed
Commit 24800f1 · 1 Parent(s): d068da0

docs: usage

Files changed (1):
  1. README.md +47 -4

README.md CHANGED
@@ -2902,8 +2902,7 @@ base_model:
 
 # ModernBERT Embed
 
-ModernBERT Embed is an embedding model trained from [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), brining the new advances of ModernBERT to embeddings!
-
+ModernBERT Embed is an embedding model trained from [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), bringing the new advances of ModernBERT to embeddings!
 
 ## Performance
 
@@ -2912,5 +2911,49 @@ ModernBERT Embed is an embedding model trained from [ModernBERT-base](https://hu
 | nomic-embed-text-v1 | 768 | 62.4 | 74.1 | 43.9 | 85.2 | 55.7 | 52.8 | 82.1 | 30.1 |
 | nomic-embed-text-v1.5 | 768 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | 53.01 | 81.94 | 30.4 |
 | ModernBERT | 768 | 62.62 | 74.31 | 44.98 | 83.96 | 56.42 | 52.89 | 81.78 | 31.39 |
-| nomic-embed-text-v1.5 | 256 | 61.04 | 72.1 | 43.16 | 84.09 | 55.18 | 50.81 | 81.34 | 30.05 |
-| ModernBERT | 256 | 61.17 | 72.40 | 43.82 | 83.45 | 55.69 | 50.62 | 81.12 | 31.27 |
+| nomic-embed-text-v1.5 | 256 | 61.04 | 72.1 | 43.16 | 84.09 | 55.18 | 50.81 | 81.34 | 30.05 |
+| ModernBERT | 256 | 61.17 | 72.40 | 43.82 | 83.45 | 55.69 | 50.62 | 81.12 | 31.27 |
+
+## Usage
+
+You can use these models directly with the transformers library. Until the next transformers release, doing so requires installing transformers from main:
+
+```bash
+pip install git+https://github.com/huggingface/transformers.git
+```
+
+Reminder: this model is trained similarly to Nomic Embed and **REQUIRES** prefixes to be added to the input. For more information, see the instructions in [Nomic Embed](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5#task-instruction-prefixes).
+
+For most use cases, prepending `search_query: ` to queries and `search_document: ` to documents will be sufficient.
+
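As a quick editorial sketch of the prefix convention (not part of the commit; the document text below is invented for illustration, only the prefixes matter):

```python
# Each input string carries a task prefix before it is encoded.
queries = [
    "search_query: What is TSNE?",
    "search_query: Who is Laurens van der Maaten?",
]
documents = [
    "search_document: t-SNE is a dimensionality reduction technique for visualizing high-dimensional data.",
]
```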
+### Transformers
+
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, AutoModel
+
+def mean_pooling(model_output, attention_mask):  # average token embeddings, ignoring padding positions
+    token_embeddings = model_output[0]
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
+
+tokenizer = AutoTokenizer.from_pretrained('nomic-ai/modernbert-embed')
+model = AutoModel.from_pretrained('nomic-ai/modernbert-embed')
+model.eval()
+
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+matryoshka_dim = 768
+
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+
+embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+embeddings = embeddings[:, :matryoshka_dim]  # truncate to the chosen Matryoshka dimension
+embeddings = F.normalize(embeddings, p=2, dim=1)
+print(embeddings)
+```
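
The example above keeps the full 768 dimensions, but the performance table also reports a 256-dimension setting. Below is a minimal follow-on sketch (an editor's addition, not part of the commit) that reuses the repo id and pooling helper from the example above, truncates the Matryoshka embeddings to 256 dimensions, and scores a query against a document; the document text is invented for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # Same pooling helper as above: average token embeddings, ignoring padding.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)


tokenizer = AutoTokenizer.from_pretrained("nomic-ai/modernbert-embed")
model = AutoModel.from_pretrained("nomic-ai/modernbert-embed")
model.eval()

# Prefixed inputs; the document string is invented for illustration.
texts = [
    "search_query: What is TSNE?",
    "search_document: t-SNE is a dimensionality reduction technique for visualizing high-dimensional data.",
]
encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

matryoshka_dim = 256  # matches the 256-dimension rows in the performance table

with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
embeddings = F.normalize(embeddings[:, :matryoshka_dim], p=2, dim=1)

# Cosine similarity between the query (row 0) and the document (row 1).
print((embeddings[0] @ embeddings[1]).item())
```

Because the truncated embeddings are L2-normalized before comparison, the dot product here is the cosine similarity.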