ibm-granite/granite-embedding-107m-multilingual

pawasthy committed 4 days ago

Commit 75391cc · verified · 1 Parent(s): 68b3a6d

Upload README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -22241,7 +22241,7 @@ Notably, we do not use the popular MS-MARCO retrieval dataset in our training co
 We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80gb GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs.
 **Ethical Considerations and Limitations:**
-The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-278m-Multilingual is trained only for English texts, and has a context length of 512 tokens (longer texts will be truncated to this size).
+The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-107m-Multilingual is finetuned on 12 languages, and has a context length of 512 tokens (longer texts will be truncated to this size).
 **Resources**
 - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite