How do I convert the flan-t5-large model to GGUF? Already tried convert.py from llama.cpp
Hello, I am very interested in the new GGUF format for inferencing LLMs.
I am currently trying to convert the flan-t5-large model at https://huggingface.co/google/flan-t5-large/tree/main to GGUF format.
I built llama.cpp and ran convert.py, but it fails with an "unknown format" error:
Loading model file /Users/nakella/MyWorkspace/Experiments/llama.cpp/models/flan-t5-large/pytorch_model.bin
Traceback (most recent call last):
  File "convert.py", line 1208, in <module>
    main()
  File "convert.py", line 1149, in main
    model_plus = load_some_model(args.model)
  File "convert.py", line 1069, in load_some_model
    models_plus.append(lazy_load_file(path))
  File "convert.py", line 763, in lazy_load_file
    raise ValueError(f"unknown format: {path}")
ValueError: unknown format: /Users/nakella/MyWorkspace/Experiments/llama.cpp/models/flan-t5-large/pytorch_model.bin
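One way to narrow down an "unknown format" error is to look at the first bytes of the checkpoint file. The sketch below is a hypothetical helper (sniff_checkpoint is not part of llama.cpp) and only covers two cases that are well defined: a zip-based PyTorch checkpoint starts with the zip signature PK, and a Git LFS pointer left behind by a clone without git-lfs starts with "version https://git-lfs". It does not reproduce the checks that convert.py itself performs, and it may not explain this particular failure.

```python
def sniff_checkpoint(path):
    """Classify a checkpoint file by its leading bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(64)  # only the header is needed, not the whole file

    if head.startswith(b"PK\x03\x04"):
        # Zip local-file-header signature, used by zip-based torch.save files.
        return "zip"
    if head.startswith(b"version https://git-lfs"):
        # A small text stub, not the real weights.
        return "git-lfs-pointer"
    return "other"


# Example call, using the relative path from the post above:
# print(sniff_checkpoint("models/flan-t5-large/pytorch_model.bin"))
```

If the result is "git-lfs-pointer", the actual weights were never downloaded. If it is "other", the file is in some format this sketch does not recognise, and the cause has to be found elsewhere.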
@TheBloke @RonanMcGovern, could you please help me with this? Is there any other way?
Try the colab referenced here
@RonanMcGovern I have looked at the Colab notebook, but in my case I am trying to convert flan-t5-large, which is loaded as an AutoModelForSeq2SeqLM model:
import torch
from transformers import AutoModelForSeq2SeqLM

# cache_dir must be defined before this call
model_name = 'google/flan-t5-large'
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='cpu',
    offload_folder='offload',
    cache_dir=cache_dir
)
It fails at the step !python convert.py models/
ERROR:
Loading model file models/pytorch_model.bin
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1208, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1157, in main
    params = Params.load(model_plus)
  File "/content/llama.cpp/convert.py", line 288, in load
    params = Params.loadHFTransformerJson(model_plus.model, hf_config_path)
  File "/content/llama.cpp/convert.py", line 203, in loadHFTransformerJson
    n_embd = config["hidden_size"]
KeyError: 'hidden_size'
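The KeyError above is raised when convert.py reads hidden_size from config.json. T5 configs do not use that key; they store the model width as d_model. A minimal sketch of a pre-flight check follows. The helper name missing_keys is hypothetical, the only required key taken from the traceback is hidden_size, and the sample dictionary is an abridged illustration of a T5-style config, not a copy of the real file.

```python
import json


def missing_keys(config, required=("hidden_size",)):
    """Return the required keys that are absent from a config dict."""
    return [key for key in required if key not in config]


# Abridged, illustrative T5-style config (assumption: not the full config.json).
t5_style_config = json.loads("""
{
  "architectures": ["T5ForConditionalGeneration"],
  "model_type": "t5",
  "d_model": 1024,
  "num_layers": 24,
  "num_heads": 16
}
""")

print(missing_keys(t5_style_config))  # ['hidden_size']
```

A non-empty result suggests that the script expects a LLaMA-style decoder-only config. An encoder-decoder model such as T5 would then most likely need converter support of its own, not just a renamed key.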
Yes @RonanMcGovern. Is there any other way to convert it to GGUF? Have you come across any repo or snippet that could help? If not, how should I approach this? Thank you.
Maybe open an issue on the llama.cpp repository and ask for guidance on how to approach it.