Pipelines fail with batch > 1

#1
by bpop - opened

Hello everyone,

Using pipelines for inference is currently broken for batch sizes greater than 1. For example, you cannot do this:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained("Unbabel/TowerBase-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("Unbabel/TowerBase-7B-v0.1")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    batch_size=2,
)
examples = [
    "English: My name is TowerBase.\nPortuguese:",
    "English: These are my friends, NLLB, GPT, and wFST.\nPortuguese:",
]
out = pipe(examples)  # fails once the two prompts are padded to a common length

The issue seems to be a mismatch between the tokenizer's vocabulary and the vocabulary the model expects. The tokenizer uses 32004 as its pad_id, but the model's vocab size is only 32000, so the embedding layer has no entry for that id. Passing a padded batch of mixed-length sequences consequently produces an indexing error in TowerBase's embedding layer.
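To make the mismatch concrete, here is a minimal sketch in plain Python, with no model download needed. Only the numbers 32004 and 32000 come from the report above; the helper names (`pad_batch`, `out_of_range_ids`) and the example token ids are made up for illustration, and the padding side is an arbitrary choice for the toy.

```python
def pad_batch(sequences, pad_id):
    """Right-pad every token-id list to the length of the longest one (toy padding)."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


def out_of_range_ids(batch, vocab_size):
    """Return the sorted ids that a table with `vocab_size` rows cannot index."""
    return sorted({tok for row in batch for tok in row if not 0 <= tok < vocab_size})


VOCAB_SIZE = 32000  # vocab size from the report
PAD_ID = 32004      # pad id from the report

# Illustrative token ids, not real tokenizer output.
single = pad_batch([[1, 4223, 29901]], PAD_ID)
mixed = pad_batch([[1, 4223, 29901], [1, 4223, 29901, 910, 526]], PAD_ID)

print(out_of_range_ids(single, VOCAB_SIZE))  # [] : a batch of one needs no padding
print(out_of_range_ids(mixed, VOCAB_SIZE))   # [32004] : the shorter row was padded
```

This would explain why a batch size of 1 works: with nothing to pad, the out-of-range id never reaches the embedding lookup.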

I worked around this issue by setting tokenizer.pad_token_id = tokenizer.eos_token_id. I assume that TowerInstruct has the same bug and that the same short-term fix applies, but I haven't tested it yet.
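As a rough sketch of that workaround, the helper below only falls back to the EOS token when the pad id cannot be embedded. `ensure_valid_pad_id` is a hypothetical helper, not part of transformers, and the stand-in tokenizer with `eos_token_id=2` is an assumed value so the example runs without downloading anything.

```python
from types import SimpleNamespace


def ensure_valid_pad_id(tokenizer, vocab_size):
    """Use EOS as the pad token if the current pad id is missing or out of range.

    Hypothetical helper. Returns True if the tokenizer was changed.
    """
    pad_id = tokenizer.pad_token_id
    if pad_id is not None and 0 <= pad_id < vocab_size:
        return False  # pad id already maps to a real embedding row
    tokenizer.pad_token_id = tokenizer.eos_token_id
    return True


# Stand-in for a real tokenizer; eos_token_id=2 is assumed for illustration.
fake_tokenizer = SimpleNamespace(pad_token_id=32004, eos_token_id=2)
changed = ensure_valid_pad_id(fake_tokenizer, vocab_size=32000)
print(changed, fake_tokenizer.pad_token_id)  # True 2
```

With the real objects, something like `ensure_valid_pad_id(tokenizer, model.config.vocab_size)` before building the pipeline should have the same effect, assuming `model.config.vocab_size` matches the number of rows in the embedding matrix.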

Unbabel org

Thanks for raising this. It should now be fixed, without needing to explicitly set tokenizer.pad_token_id = tokenizer.eos_token_id. (https://huggingface.co/Unbabel/TowerBase-7B-v0.1/commit/2837006f6f8e9ed6e637ab8fcf9a6bf22e31e4d8)

jmprcp changed discussion status to closed
