Can I use task="retrieval.query" with the transformers library?
hi, I would like to use the task="retrieval.query" LoRA adapter, but I'm not sure if I can with transformers or if I must use sentence-transformers instead.
I embedded my initial documents with the transformers library - not sure if that also means I have to go back and re-embed my documents with sentence-transformers.
all help appreciated ty!
Hi @k0rruptt, yes, you can use retrieval.query like this:
from transformers import AutoModel
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
embeddings = model.encode('some text', task="retrieval.query")
If you used transformers without specifying a task, it means you haven't used any adapters.
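Regarding re-embedding: the task argument selects the LoRA adapter (and, for the retrieval tasks, an instruction prefix) at encode time, so if you want the document side to benefit from an adapter as well, you can re-encode your documents with retrieval.passage and your queries with retrieval.query through the same transformers interface; there is no need to switch to sentence-transformers. A minimal sketch (the document and query strings below are placeholders):
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Documents: encode (or re-encode) with the passage adapter
doc_embeddings = model.encode(
    ["first document text", "second document text"],  # placeholder documents
    task="retrieval.passage",
)

# Queries: encode with the query adapter
query_embeddings = model.encode(
    ["what does the second document say?"],  # placeholder query
    task="retrieval.query",
)

# Cosine similarity between the query and each document
doc_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query_norm = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
print(doc_norm @ query_norm.T)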
Hi Folks,
Sorry to dig up this thread again,
But I wanted to know how to use the retrieval.query LoRA adapter if I'm using the model in late-chunking mode.
as in, my inference code looks like this:
# Get all token embeddings using the model
with torch.no_grad():
    model_output = model(**tokens)

# Pass token embeddings to late chunking
late_chunking(model_output, [chunk_start_token, chunk_end_token])
This is indeed a bit non-trivial. Here is a code example (using the retrieval.passage adapter):
import torch
from transformers import AutoModel, AutoTokenizer

TASK = 'retrieval.passage'
MODEL_NAME = 'jinaai/jina-embeddings-v3'
TEXT = """Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?'"""

model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

input = tokenizer(TEXT, return_tensors='pt')

# Look up the integer id of the LoRA adapter for this task and activate it
# for every example in the batch via the adapter mask
task_id = model._adaptation_map[TASK]
num_examples = input['input_ids'].shape[0]
adapter_mask = torch.full((num_examples,), task_id, dtype=torch.int32, device=model.device)

# Forward pass with the task-specific adapter; returns the token embeddings
model_output = model(**input, adapter_mask=adapter_mask)
# from here on you can apply late chunking
You can also take a look at this implementation in the late chunking repo: https://github.com/jina-ai/late-chunking/blob/1d3bb02bf091becd0771455e4e7959463935e26c/chunked_pooling/wrappers.py#L51-L57
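To make the last comment in the snippet concrete: the pooling step after that forward pass can be as simple as mean-pooling the token embeddings inside each chunk's token span. A minimal sketch, assuming model_output[0] holds the token embeddings as in the repo's chunked_pooling helper (the span boundaries below are placeholders):
# Mean-pool each chunk's token embeddings from the single long-context forward pass
token_embeddings = model_output[0]  # shape: (batch, seq_len, hidden), batch is 1 here

seq_len = input['input_ids'].shape[1]
span_annotations = [(0, seq_len // 2), (seq_len // 2, seq_len)]  # placeholder chunk spans

chunk_embeddings = [
    token_embeddings[0, start:end].mean(dim=0)
    for start, end in span_annotations
]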
Some remarks:
- The first token embeddings are embeddings of an instruction, because 'retrieval.query' and 'retrieval.passage' prepend a short instruction to the text. In the API and in the experiments we conducted for the late-chunking paper, we included the instruction tokens in the first chunk, but there is no strong argument for or against doing so.
- Regarding the implementation in the late-chunking GitHub repository: we used late chunking for retrieval in our experiments, and in that setting chunking is not applied to queries. The reference implementation I linked (https://github.com/jina-ai/late-chunking/) therefore implements late chunking only for 'retrieval.passage': the adapted forward function is not used for encoding queries, which go through encode_queries and thus the old forward function. That is why the adjusted forward basically hard-codes the 'retrieval.passage' adapter: https://github.com/jina-ai/late-chunking/blob/1d3bb02bf091becd0771455e4e7959463935e26c/chunked_pooling/wrappers.py#L52. A sketch of how the two sides fit together follows below.
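To tie the two sides together, here is a rough sketch (the query text is a placeholder, and it mirrors rather than reproduces the repo's wrapper): queries are embedded whole with the retrieval.query task, and the chunk embeddings from the passage-side forward pass are ranked against them.
import torch

# Query side: no chunking, just the high-level encode() with the query adapter
query_embedding = model.encode(['what was Alice tired of?'], task='retrieval.query')[0]  # placeholder query
query_tensor = torch.from_numpy(query_embedding)

# Rank the chunk embeddings from the passage-side forward pass against the query
chunk_matrix = torch.stack(chunk_embeddings)
scores = torch.nn.functional.cosine_similarity(chunk_matrix, query_tensor.unsqueeze(0))
print(scores)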