There is no tokenizer.model file
Can you please provide this file?
An error occurred when I tried to convert a fine-tuned model to GGUF:
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1278, in set_vocab
self. _set_vocab_sentencepiece()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 567, in _set_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: model/tokenizer.model
We converted our models to GGUF using the commands given in the following discussions, which may avoid this error:
https://github.com/ggerganov/llama.cpp/discussions/2948#discussion-5580716
https://github.com/ggerganov/llama.cpp/discussions/2948#discussioncomment-6889679
We have also provided GGUF versions of the model:
https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit
https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-4bit
https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16
@shenzhi-wang
Many thanks for your quick response, but I couldn't find any commands in your links.
To clarify: I fine-tuned your model with Unsloth on my own dataset, but Unsloth fails when converting the fine-tuned model to GGUF.
Have you encountered this error before?
Unsloth logs:
==((====))== Unsloth: Conversion from QLoRA to GGUF information
\ /| [0] Installing llama.cpp will take 3 minutes.
O^O/ _/ \ [1] Converting HF to GUUF 16bits will take 3 minutes.
\ / [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
"-____-" In total, you will have to wait around 26 minutes.
Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: We must use f16 for non Llama and Mistral models.
Unsloth: [1] Converting model at model into f16 GGUF format.
The output location will be ./model-unsloth.F16.gguf
This will take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:hf-to-gguf:
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert-hf-to-gguf-update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: c136ed14d01c2745d4f60a9596ae66800e2b61fa45643e72436041855ad4089d
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1278, in set_vocab
self. _set_vocab_sentencepiece()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 567, in _set_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: model/tokenizer.model
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1281, in set_vocab
self._set_vocab_llama_hf()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 625, in _set_vocab_llama_hf
vocab = LlamaHfVocab(self.dir_model)
File "/home/xxx/jupyter/Ollama/llama.cpp/convert.py", line 577, in __init__
raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 2546, in <module>
main()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 2531, in main
model_instance.set_vocab()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1284, in set_vocab
self._set_vocab_gpt2()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 494, in _set_vocab_gpt2
tokens, toktypes, tokpre = self.get_vocab_base()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 381, in get_vocab_base
tokpre = self.get_vocab_base_pre(tokenizer)
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 486, in get_vocab_base_pre
raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
Unsloth: Conversion completed! Output location: ./model-unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
main: build = 2887 (583fd6b0)
main: built with gcc (GCC) 10.2.0 for x86_64-redhat-linux
main: quantizing './model-unsloth.F16.gguf' to './model-unsloth.Q4_K_M.gguf' as Q4_K_M using 192 threads
gguf_init_from_file: invalid magic characters '
'
llama_model_quantize: failed to quantize: llama_model_loader: failed to load model from ./model-unsloth.F16.gguf
main: failed to quantize model from './model-unsloth.F16.gguf'
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/finetune.py", line 179, in <module>
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
File "/home/xxx/.conda/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 1381, in unsloth_save_pretrained_gguf
file_location = save_to_gguf(model_type, is_sentencepiece_model,
File "/home/xxx/.conda/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 1045, in save_to_gguf
raise RuntimeError(
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
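The quantization error (`invalid magic characters`) is most likely a consequence of step [1] rather than a separate problem: the F16 conversion aborted with `NotImplementedError`, so `./model-unsloth.F16.gguf` is not a valid GGUF file even though Unsloth reports the conversion as completed. A GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian uint32 format version, so a quick check before re-running quantization could look like the sketch below. The function name `check_gguf_header` is hypothetical; it is not part of llama.cpp or Unsloth.

```python
import struct

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these 4 bytes


def check_gguf_header(path):
    """Return the GGUF format version stored in `path`.

    Raises ValueError if the file is too short or does not start with the
    GGUF magic, e.g. when an earlier conversion step aborted.
    """
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + uint32 version (little-endian)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path}: invalid GGUF header {header[:4]!r}")
    (version,) = struct.unpack("<I", header[4:])
    return version
```

For example, `check_gguf_header("./model-unsloth.F16.gguf")` would be expected to fail for the file produced in the log above, and to return a version number once the F16 conversion succeeds.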
The links I provided do include instructions on how to convert models to GGUF; please check them again carefully.
As for your error messages, I suggest converting the model to GGUF with BpeVocab, since Llama 3 uses a BPE tokenizer rather than a SentencePiece one.
This might also be helpful:
https://github.com/ggerganov/llama.cpp/issues/3256#issuecomment-1726639646
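Note that the BPE fallback (`_set_vocab_gpt2`) in the log above also failed, for a different reason. As the warning block in the log indicates, convert-hf-to-gguf.py identifies a BPE pre-tokenizer by encoding a fixed test string with the model's Hugging Face tokenizer and taking the SHA-256 of the printed token-ID list (the `chkhsh` value in the log); if that hash is not in its known table, it raises `NotImplementedError`. A simplified sketch of that lookup follows; the function names and the `known` table are hypothetical stand-ins for the table that convert-hf-to-gguf-update.py maintains.

```python
from hashlib import sha256


def pre_tokenizer_hash(token_ids):
    """SHA-256 of the printed token-ID list, e.g. str([1, 2, 3]) == "[1, 2, 3]"."""
    return sha256(str(list(token_ids)).encode()).hexdigest()


def identify_pre_tokenizer(token_ids, known):
    """Map the hash of `token_ids` to a pre-tokenizer name via the `known` table."""
    chkhsh = pre_tokenizer_hash(token_ids)
    if chkhsh not in known:
        raise NotImplementedError(
            f"BPE pre-tokenizer was not recognized (chkhsh: {chkhsh})"
        )
    return known[chkhsh]
```

In practice `token_ids` would come from something like `AutoTokenizer.from_pretrained("model").encode(check_text)`, with `check_text` being the test string llama.cpp itself uses; any change in the tokenizer's pre-tokenization config changes the token IDs and therefore the hash.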
We have not changed the tokenizer, so you can use the tokenizer files of the original Llama-3-8B-Instruct.
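Reusing the original tokenizer amounts to placing its files next to the fine-tuned weights before converting. A minimal sketch, assuming a local copy of the original model in a hypothetical `original_model_dir` and the fine-tuned model saved in `model`. The file list is an assumption based on what Hugging Face Llama 3 checkpoints usually ship (a BPE `tokenizer.json`, with no top-level SentencePiece `tokenizer.model`, which is why the SentencePiece path in the traceback fails) and should be checked against the actual repository.

```python
import shutil
from pathlib import Path

# Assumed tokenizer files of a Hugging Face Llama 3 checkpoint; verify against
# the original repository. There is no SentencePiece tokenizer.model here.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json")


def copy_tokenizer_files(src_dir, dst_dir, names=TOKENIZER_FILES):
    """Copy the tokenizer files that exist in src_dir into dst_dir.

    Returns the names actually copied, so missing files are easy to spot.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)  # overwrites dst/name if present
            copied.append(name)
    return copied
```

Usage would be along the lines of `copy_tokenizer_files("original_model_dir", "model")`, with both paths replaced by your own.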
Hi,
Maybe you should use convert.py like this:
python llm/llama.cpp/convert.py --outtype f16 --outfile
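Since the log reports `Llama 3 must be converted with BpeVocab`, the vocabulary type can be requested explicitly. The sketch below builds the conversion and quantization commands from Python, assuming a llama.cpp checkout of roughly the build shown in the log, where convert.py accepts `--vocab-type bpe` and `make` produces a `quantize` binary (newer llama.cpp versions have retired convert.py and renamed the binary to `llama-quantize`, so check your checkout). The helper names and all paths are hypothetical placeholders.

```python
import subprocess
import sys
from pathlib import Path


def convert_cmd(llama_cpp_dir, model_dir, outfile):
    """convert.py invocation that forces the BPE vocabulary (needed for Llama 3)."""
    return [
        sys.executable, str(Path(llama_cpp_dir) / "convert.py"), str(model_dir),
        "--vocab-type", "bpe",
        "--outtype", "f16",
        "--outfile", str(outfile),
    ]


def quantize_cmd(llama_cpp_dir, f16_file, out_file, qtype="Q4_K_M"):
    """quantize invocation: input GGUF, output GGUF, quantization type."""
    return [str(Path(llama_cpp_dir) / "quantize"), str(f16_file), str(out_file), qtype]


def run_all(cmds):
    """Run the commands in order, stopping at the first non-zero exit code."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example: `run_all([convert_cmd("llm/llama.cpp", "model", "model-f16.gguf"), quantize_cmd("llm/llama.cpp", "model-f16.gguf", "model-Q4_K_M.gguf")])`. Using `check=True` makes a failed conversion raise immediately instead of handing a broken file to the quantizer, which is what happened in the Unsloth log.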