Commit f3b0d18 (verified) · committed by jupyterjazz · 1 Parent(s): 1ecacfa

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -123,11 +123,12 @@ library_name: transformers
 The easiest way to starting using `jina-embeddings-v3` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).
 
 
-## Intended Usage & Model info
+## Intended Usage & Model Info
 
 
 `jina-embeddings-v3` is a **multilingual multi-task text embedding model** designed for a variety of NLP applications.
-Based on the [XLM-RoBERTa architecture](https://huggingface.co/jinaai/xlm-roberta-flash-implementation), this model supports [Rotary Position Embeddings (RoPE)](https://arxiv.org/abs/2104.09864) to handle long sequences up to **8192 tokens**.
+Based on the [XLM-RoBERTa architecture](https://huggingface.co/jinaai/xlm-roberta-flash-implementation),
+this model supports [Rotary Position Embeddings (RoPE)](https://arxiv.org/abs/2104.09864) to handle long sequences up to **8192 tokens**.
 Additionally, it features [LoRA](https://arxiv.org/abs/2106.09685) adapters to generate task-specific embeddings efficiently.
 
 ### Key Features:
@@ -143,11 +144,14 @@ Additionally, it features [LoRA](https://arxiv.org/abs/2106.09685) adapters to g
 ### Model Lineage:
 
 `jina-embeddings-v3` builds upon the [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) model, which was originally trained on 100 languages.
-We extended its capabilities with an extra pretraining phase on the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) dataset, then contrastively fine-tuned it on 30 languages for enhanced performance in both monolingual and cross-lingual setups.
+We extended its capabilities with an extra pretraining phase on the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) dataset,
+then contrastively fine-tuned it on 30 languages for enhanced performance in both monolingual and cross-lingual setups.
 
 ### Supported Languages:
 While the base model supports 100 languages, we've focused our tuning efforts on the following 30 languages to maximize performance:
-**Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
+**Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek,
+Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
+Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
 
 
 ## Data & Parameters