can I use task = "retrieval.query" with transformers library?
hi, I would like to use task = "retrieval.query" lora adapter but I am not sure if I can with transformers or if I must use sentence transformers instead.
I had embedded my initial documents with transformers library - not sure if that also means that I have to go back and embed my documents with sentence transformers.
all help appreciated ty!
Hi @k0rruptt , yes you can use retrieval.query like this:
from transformers import AutoModel
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
embeddings = model.encode('some text', task="retrieval.query")
If you used transformers without specifying a task, it means you haven't used any adapters.
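To make the distinction concrete, here is a minimal sketch of scoring passages against a query with the two retrieval adapters. It assumes `model.encode` accepts a list of strings and returns one vector per input that NumPy can convert, and that 'retrieval.passage' is the matching document-side task. The helper name `score_passages` is made up for illustration. Documents embedded without a task did not go through the passage adapter, so re-embedding them this way may be needed for the two sides to match.

```python
import numpy as np


def score_passages(model, query, passages):
    """Return cosine similarities between one query and a list of passages.

    `model` is assumed to expose encode(texts, task=...) as in the snippet above.
    """
    # Encode each side with its own adapter (assumption: list in, 2-D array out).
    q = np.asarray(model.encode([query], task="retrieval.query"), dtype=np.float32)
    p = np.asarray(model.encode(passages, task="retrieval.passage"), dtype=np.float32)
    # Normalise explicitly so the dot product is a cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    return (p @ q[0]).tolist()
```

Calling `score_passages(model, 'some question', ['doc one', 'doc two'])` with the model loaded above should give one score per passage, higher meaning more similar.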
Hi Folks,
Sorry to dig up this thread again,
But I wanted to know how to use the retrieval.query LoRA adapter if I'm using the model in late-chunking mode?
as in, my inference code looks like this:
# Get all token embeddings using the model
with torch.no_grad():
    model_output = model(**tokens)
# Pass token embeddings to late chunking
late_chunking(model_output, [chunk_start_token, chunk_end_token])
This is indeed a bit non-trivial. Here is a code example (it uses the retrieval.passage adapter):
import torch
from transformers import AutoModel, AutoTokenizer
TASK = 'retrieval.passage'
MODEL_NAME = 'jinaai/jina-embeddings-v3'
TEXT = """Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?'"""
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
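# named `inputs` to avoid shadowing the built-in input()
inputs = tokenizer(TEXT, return_tensors='pt')
# look up the integer id of the adapter for this task
task_id = model._adaptation_map[TASK]
num_examples = inputs['input_ids'].shape[0]
# one adapter id per example in the batch
adapter_mask = torch.full((num_examples,), task_id, dtype=torch.int32, device=model.device)
model_output = model(**inputs, adapter_mask=adapter_mask)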
# from here on you can apply late chunking
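As a rough sketch of that late-chunking step, the function below mean-pools token embeddings over token spans. The name `pool_chunks` and the toy numbers are made up for illustration, and this is not the reference implementation from the late chunking repo. It only relies on slicing and `.mean(0)`, so it should work on a 2-D NumPy array or a 2-D torch tensor.

```python
import numpy as np


def pool_chunks(token_embeddings, spans):
    """Mean-pool token embeddings over [start, end) token spans.

    token_embeddings: 2-D array or tensor of shape (seq_len, dim).
    spans: list of (start, end) token indices, end exclusive.
    """
    pooled = []
    for start, end in spans:
        if end <= start:
            continue  # skip empty spans instead of dividing by zero
        # .mean(0) averages over the token axis
        pooled.append(token_embeddings[start:end].mean(0))
    return pooled


# Toy check with made-up numbers: 4 tokens, 2 dimensions, two chunks.
toy = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
chunks = pool_chunks(toy, [(0, 2), (2, 4)])
```

With the code above, the input would be something like `model_output[0][0]`, assuming the first element of the model output holds per-token embeddings of shape (batch, seq_len, dim). That indexing is an assumption and worth checking against the actual output.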
You can also take a look at this implementation in the late chunking repo:
Some remarks:
- The first token embeddings are embeddings of an instruction, because 'retrieval.query' and 'retrieval.passage' prepend a simple instruction to the input. In the API and in the experiments that we conducted for the late-chunking paper, we included the instruction tokens in the first chunk, but there is no strong argument for or against doing so.
- Regarding the implementation in the late-chunking GitHub repository: we used late chunking for retrieval in our experiments. In that setting you don't apply chunking to queries, so the reference implementation that I linked implements late chunking only for 'retrieval.passage'. The adapted forward function is not used for encoding queries. This is just using
which is using the old forward function. Therefore we basically hard-coded the 'retrieval.passage' adapter in the adjusted forward.
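The first remark can be sketched as follows: if chunk boundaries were computed on the text alone and some prefix tokens (for example an instruction, plus any leading special token) come before the text, the spans need to be shifted, and the first chunk can optionally be stretched to cover the prefix. The function `shift_spans` and its parameters are hypothetical, and the prefix length has to be determined from your own tokenization.

```python
def shift_spans(spans, num_prefix_tokens, first_chunk_covers_prefix=True):
    """Shift text-relative token spans past a prefix of `num_prefix_tokens` tokens.

    spans: list of (start, end) indices relative to the first text token, end exclusive.
    """
    shifted = [(s + num_prefix_tokens, e + num_prefix_tokens) for s, e in spans]
    if first_chunk_covers_prefix and shifted:
        # Stretch the first chunk back to position 0 so it pools the prefix too.
        shifted[0] = (0, shifted[0][1])
    return shifted


# Made-up example: two chunks of 5 and 4 tokens, preceded by 3 prefix tokens.
with_prefix = shift_spans([(0, 5), (5, 9)], num_prefix_tokens=3)
without_prefix = shift_spans([(0, 5), (5, 9)], num_prefix_tokens=3,
                             first_chunk_covers_prefix=False)
```

Either variant can be used, since the remark above notes there is no strong argument for or against including the instruction tokens.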