Batch_size
#44 opened by lukelv
Hi Author,
I am glad to use your model, and I tried to evaluate it with MTEB on Classification as well as Retrieval tasks. I noticed that the encode function should accept **kwargs so it can absorb extra arguments that evaluation frameworks pass, which we cannot anticipate across the many model versions based on Mistral. Here is my update in modeling_nvembed.py:
def encode(self, prompts: List[str], instruction: str="", max_length: int=4096, **kwargs):
    # **kwargs absorbs unexpected keyword arguments (e.g. batch_size,
    # show_progress_bar) that evaluation frameworks may pass.
    if self.padding_side == "right" and self.is_mask_instruction and len(instruction) > 0:
        # Number of instruction tokens to mask out of the pooled embedding.
        instruction_lens = len(self.tokenizer.tokenize(instruction))
    else:
        instruction_lens = 0
    device = next(self.embedding_model.parameters()).device
    batch_dict = input_transform_func(self.tokenizer,
                                      {"input_texts": list(prompts)},
                                      always_add_eos=True,
                                      max_length=max_length,
                                      instruction=instruction)
    features = self.prepare_kwargs_from_batch(batch_dict, instruction_lens, device=device)
    return self(features)
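To illustrate why this helps, here is the kind of call an evaluation harness such as MTEB may make; the exact keyword arguments differ between MTEB versions, so treat this call shape as an assumption rather than MTEB's exact API:

    # batch_size and show_progress_bar are silently absorbed by **kwargs
    # instead of raising "encode() got an unexpected keyword argument".
    embeddings = model.encode(sentences, instruction=instruction,
                              batch_size=32, show_progress_bar=True)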
Furthermore, I cannot find where the input data is split into batches. Some evaluation libraries need to pass a batch_size because their datasets are large, so please add support for this.
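For reference, here is a minimal sketch of the batching I mean, written as a standalone wrapper around encode. The name batch_encode is hypothetical, and it assumes encode returns a torch.Tensor of shape (len(prompts), hidden_dim):

    import torch

    @torch.no_grad()
    def batch_encode(model, prompts, instruction="", max_length=4096, batch_size=32):
        # Hypothetical helper: encode a large list of prompts in fixed-size
        # chunks instead of one huge forward pass, then stack the results.
        chunks = []
        for start in range(0, len(prompts), batch_size):
            chunk = prompts[start:start + batch_size]
            chunks.append(model.encode(chunk, instruction=instruction, max_length=max_length))
        return torch.cat(chunks, dim=0)

Something like this inside encode itself (driven by a batch_size keyword) would make the model usable on large datasets out of the box.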