Loading Error: BlackSamorez/Mixtral-8x7B-Instruct-v0.1-AQLM-2Bit-1x16-hf Model in Colab

#2
by lxyuan - opened

I'm encountering a ModuleNotFoundError when attempting to load a quantized model using the transformers library in a Google Colab notebook. The issue arises specifically when loading the model BlackSamorez/Mixtral-8x7B-Instruct-v0.1-AQLM-2Bit-1x16-hf with the provided code snippet. However, loading a different model ("BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf") does not produce any errors.

Code to Reproduce

from transformers import AutoTokenizer, AutoModelForCausalLM

quantized_model = AutoModelForCausalLM.from_pretrained(
    "BlackSamorez/Mixtral-8x7B-Instruct-v0.1-AQLM-2Bit-1x16-hf",
    trust_remote_code=True, torch_dtype="auto", device_map="cuda"
).cuda()

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

Error Message

ModuleNotFoundError: No module named 'transformers_modules.BlackSamorez.Mixtral-8x7B-Instruct-v0'

Additional Context

For reference, here's the output of !pip freeze relevant to this issue:

accelerate @ git+https://github.com/huggingface/accelerate.git@97d2168e5953fe7373a06c69c02c5a00a84d5344
aqlm==1.0.1
google-cloud-translate==3.11.3
transformers @ git+https://github.com/huggingface/transformers.git@864c8e6ea31e2e9671cd34e1febd889f5e8d9150

Appreciate any guidance or assistance in resolving this issue. Thank you!

IST Austria Distributed Algorithms and Systems Lab org
edited Feb 19, 2024

It looks like transformers incorrectly handles custom models with a "." in their name.
That effectively makes this model unloadable. I'll think about better ways to solve it.
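
To illustrate the likely mechanism, here is a minimal sketch. It assumes the custom-code module path is built by joining the repo id onto a "transformers_modules" prefix with dots; that construction is my reconstruction from the error message, not the library's actual code. Python treats every "." in a module path as a package separator, so the "." in "v0.1" cuts the name short.

```python
# Illustration only: why a "." inside a repo name breaks a dotted module path.
repo_id = "BlackSamorez/Mixtral-8x7B-Instruct-v0.1-AQLM-2Bit-1x16-hf"

# Assumption: the module path is the repo id appended to a fixed prefix,
# with "/" turned into "." (reconstructed from the error message above).
module_path = "transformers_modules." + repo_id.replace("/", ".")

# The import system splits a module path on every ".".
parts = module_path.split(".")
print(parts)

# The first three components are what the importer looks for first.
missing = ".".join(parts[:3])
print(missing)
```

The second print gives the same truncated name that appears in the ModuleNotFoundError, which is consistent with the dot being the cause.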

IST Austria Distributed Algorithms and Systems Lab org

I've decided to replace the dot in the name with an underscore. Please update the name and it should be ok.

Thanks for the update. I can confirm the model now works as expected with the name change.

However, I noticed a discrepancy in the torch_dtype configuration for BlackSamorez/Mixtral-8x7B-Instruct-v0_1-AQLM-2Bit-1x16-hf, which is set to "float32", unlike other similar models that use "float16" (e.g., BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch and BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf). This difference is detailed here.

This setting means we may need to explicitly set torch_dtype=torch.float16 when running this model on a GPU for consistency and efficiency. Could you please check and confirm if this setting was intentional?

Here is the code snippet:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

quantized_model = AutoModelForCausalLM.from_pretrained(
    "BlackSamorez/Mixtral-8x7B-Instruct-v0_1-AQLM-2Bit-1x16-hf",
    trust_remote_code=True, torch_dtype=torch.float16,
    device_map="cuda"
).cuda()
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

input_ids = tokenizer("I'm AQLM, ", return_tensors="pt")["input_ids"].cuda()
output = quantized_model.generate(input_ids, min_length=128, max_length=128)
print(tokenizer.decode(output[0]))

>>> <s> I'm AQLM, 20 years old, and I'm a rapper from the Bronx. I've been rapping since I was 13 years old, and I've been performing since I was 15. I've opened for artists like French Montana, Jadakiss, and 50 Cent. I've also performed at venues like the Apollo Theater and the Barclays Center. I'm currently working on my debut album, which will be released in 2022. I'm also working on a mixtape that will be released in 202
IST Austria Distributed Algorithms and Systems Lab org

Hi again! Thanks for noticing!
Indeed, the model should be used in float16 precision. We messed up the data types when converting to safetensors.
The downsides are that torch_dtype='auto' will pick the wrong dtype, and even with torch_dtype=torch.float16 the weights have to be converted every time the model is loaded, which is rather slow.
I'll try to fix the dtype soon.
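
For anyone following along, here is a rough sketch of why 'auto' ends up wrong here. The resolve_dtype helper is hypothetical and heavily simplified; it only assumes that 'auto' defers to the torch_dtype declared in the model config, and it is not the actual transformers logic. The config dict stands in for the "float32" value reported earlier in this thread.

```python
def resolve_dtype(requested, config):
    """Hypothetical, simplified stand-in for dtype selection at load time."""
    if requested == "auto":
        # Assumption: "auto" defers to whatever dtype the config declares.
        return config.get("torch_dtype")
    # An explicit request overrides the declared dtype.
    return requested


# Stand-in for the declared dtype reported in this thread (not the real file).
config = {"torch_dtype": "float32"}

auto_choice = resolve_dtype("auto", config)
explicit_choice = resolve_dtype("float16", config)
print(auto_choice, explicit_choice)
```

Until the stored dtype is fixed, passing torch_dtype=torch.float16 explicitly, as in the snippet above, seems to be the safer option.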

Got it, thanks for the quick update!
Since my original issue with loading the model has been resolved, I'll go ahead and close the issue.

Appreciate your efforts in looking into the dtype matter as well.

lxyuan changed discussion status to closed
