Run model in Colab using 8-bit
I'm trying to run the model using the 8-bit library:
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.bfloat16, load_in_8bit=True)
The model loads and returns output, but the output is gibberish.
Has anyone had success with the 8-bit library?
This is expected, as float16 does not work on this model either. We are investigating this!
Also, note that this happens only for the xxl model; for other models the int8 quantization works as expected.
Probably related to this https://discuss.huggingface.co/t/mixed-precision-for-bfloat16-pretrained-models/5315
I tested the xl one using float16 and int8, and it does not work as expected (gibberish). However, it works like a charm in fp32.
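For anyone who wants to reproduce the fp32 result, here is a minimal sketch of a full-precision load, assuming torch and transformers are installed and that the machine has enough memory for the checkpoint. The function name generate_fp32, the default prompt handling, and max_new_tokens=32 are my own illustrative choices, not something taken from the posts above.

```python
MODEL_ID = "google/flan-t5-xl"  # checkpoint discussed in this thread
DTYPE_NAME = "float32"          # full precision, no quantization


def generate_fp32(prompt, model_id=MODEL_ID, max_new_tokens=32):
    """Load the model in fp32 and generate text for one prompt (hypothetical helper)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # No load_in_8bit and no half-precision dtype: weights stay in float32.
    model = T5ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=getattr(torch, DTYPE_NAME)
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_fp32("Translate to German: How old are you?") downloads the checkpoint on first use, so it is only worth running where disk and RAM allow it.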
@mrm8488 can you please post your model config?
It is the config you can find in the repo: https://huggingface.co/google/flan-t5-xl/blob/main/config.json
Is anyone here able to run Flan-T5-XL on Colab? I tried 8-bit and got junk results.
Can you try with the most recent release of transformers (pip install -U transformers) and use 4-bit instead (just pass load_in_4bit=True)?
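A rough sketch of what that suggestion could look like in code, assuming a recent transformers release with bitsandbytes installed and a CUDA GPU available. The helpers quantized_load_kwargs and load_quantized are hypothetical names introduced here for illustration; only load_in_4bit=True, load_in_8bit=True and device_map="auto" come from this thread. Newer transformers versions may prefer passing a quantization config object instead of the bare flag, so treat this as a starting point rather than the definitive way to do it.

```python
def quantized_load_kwargs(bits=4):
    """Build from_pretrained keyword arguments for 4-bit or 8-bit loading (hypothetical helper)."""
    if bits == 4:
        return {"device_map": "auto", "load_in_4bit": True}
    if bits == 8:
        return {"device_map": "auto", "load_in_8bit": True}
    raise ValueError(f"unsupported bit width: {bits}")


def load_quantized(model_id="google/flan-t5-xl", bits=4):
    """Load tokenizer and quantized model; requires transformers, bitsandbytes and a GPU."""
    # Imported lazily so the kwargs helper above can be used without the heavy dependencies.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(
        model_id, **quantized_load_kwargs(bits)
    )
    return tokenizer, model
```

Switching between load_quantized(bits=4) and load_quantized(bits=8) on the same prompt is a quick way to check whether the junk output is specific to the 8-bit path.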