Can I use task="retrieval.query" with the transformers library?
hi, I would like to use the task="retrieval.query" LoRA adapter, but I'm not sure if I can with transformers or if I must use sentence-transformers instead.
I embedded my initial documents with the transformers library - not sure if that also means I have to go back and re-embed my documents with sentence-transformers.
all help appreciated ty!
Hi @k0rruptt, yes, you can use retrieval.query like this:
from transformers import AutoModel
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
embeddings = model.encode('some text', task="retrieval.query")
If you used transformers without specifying a task, it means you haven't used any adapters.
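Regarding re-embedding: the task argument selects the LoRA adapter (and, for the retrieval tasks, an instruction prefix) at encode time, so if you want the document side to benefit from an adapter as well, you can re-encode your documents with retrieval.passage and your queries with retrieval.query through the same transformers interface; there is no need to switch to sentence-transformers. A minimal sketch (the document and query strings below are placeholders):
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Documents: encode (or re-encode) with the passage adapter
doc_embeddings = model.encode(
    ["first document text", "second document text"],  # placeholder documents
    task="retrieval.passage",
)

# Queries: encode with the query adapter
query_embeddings = model.encode(
    ["what does the second document say?"],  # placeholder query
    task="retrieval.query",
)

# Cosine similarity between the query and each document
doc_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query_norm = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
print(doc_norm @ query_norm.T)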
Hi Folks,
Sorry to dig up this thread again,
But I wanted to know how to use the retrieval.query LoRA adapter if I'm using the model in late-chunking mode.
as in, my inference code looks like this:
# Get all token embeddings using the model
with torch.no_grad():
    model_output = model(**tokens)

# Pass token embeddings to late chunking
late_chunking(model_output, [chunk_start_token, chunk_end_token])
This is indeed a bit non-trivial. Here is a code example (using the retrieval.passage adapter):
import torch
from transformers import AutoModel, AutoTokenizer

TASK = 'retrieval.passage'
MODEL_NAME = 'jinaai/jina-embeddings-v3'
TEXT = """Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?'"""

model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

input = tokenizer(TEXT, return_tensors='pt')

# Look up the integer id of the LoRA adapter for this task and activate it
# for every example in the batch via the adapter mask
task_id = model._adaptation_map[TASK]
num_examples = input['input_ids'].shape[0]
adapter_mask = torch.full((num_examples,), task_id, dtype=torch.int32, device=model.device)

# Forward pass with the task-specific adapter; returns the token embeddings
model_output = model(**input, adapter_mask=adapter_mask)
# from here on you can apply late chunking
You can also take a look at this implementation in the late chunking repo: https://github.com/jina-ai/late-chunking/blob/1d3bb02bf091becd0771455e4e7959463935e26c/chunked_pooling/wrappers.py#L51-L57
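To make the last comment in the snippet concrete: the pooling step after that forward pass can be as simple as mean-pooling the token embeddings inside each chunk's token span. A minimal sketch, assuming model_output[0] holds the token embeddings as in the repo's chunked_pooling helper (the span boundaries below are placeholders):
# Mean-pool each chunk's token embeddings from the single long-context forward pass
token_embeddings = model_output[0]  # shape: (batch, seq_len, hidden), batch is 1 here

seq_len = input['input_ids'].shape[1]
span_annotations = [(0, seq_len // 2), (seq_len // 2, seq_len)]  # placeholder chunk spans

chunk_embeddings = [
    token_embeddings[0, start:end].mean(dim=0)
    for start, end in span_annotations
]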
Some remarks:
- The first token embeddings are embeddings of an instruction, because 'retrieval.query' and 'retrieval.passage' prepend a short instruction to the text. In the API and in the experiments we conducted for the late-chunking paper, we included the instruction tokens in the first chunk, but there is no strong argument for or against doing so.
- Regarding the implementation in the late-chunking GitHub repository: we used late chunking for retrieval in our experiments, and in that setting chunking is not applied to queries. The reference implementation I linked (https://github.com/jina-ai/late-chunking/) therefore implements late chunking only for 'retrieval.passage': the adapted forward function is not used for encoding queries, which go through encode_queries and thus the old forward function. That is why the adjusted forward basically hard-codes the 'retrieval.passage' adapter: https://github.com/jina-ai/late-chunking/blob/1d3bb02bf091becd0771455e4e7959463935e26c/chunked_pooling/wrappers.py#L52. A sketch of how the two sides fit together follows below.
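To tie the two sides together, here is a rough sketch (the query text is a placeholder, and it mirrors rather than reproduces the repo's wrapper): queries are embedded whole with the retrieval.query task, and the chunk embeddings from the passage-side forward pass are ranked against them.
import torch

# Query side: no chunking, just the high-level encode() with the query adapter
query_embedding = model.encode(['what was Alice tired of?'], task='retrieval.query')[0]  # placeholder query
query_tensor = torch.from_numpy(query_embedding)

# Rank the chunk embeddings from the passage-side forward pass against the query
chunk_matrix = torch.stack(chunk_embeddings)
scores = torch.nn.functional.cosine_similarity(chunk_matrix, query_tensor.unsqueeze(0))
print(scores)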