python llama.cpp/convert-hf-to-gguf.py ~/sea-liton-7b/ error
Hi Sea Lion team,
I tried to run
python llama.cpp/convert-hf-to-gguf.py ~/sea-liton-7b/
and got this error:
Traceback (most recent call last):
File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 2099, in <module>
main()
File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 2086, in main
model_instance.set_vocab()
File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 73, in set_vocab
self._set_vocab_gpt2()
File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 238, in _set_vocab_gpt2
vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
^^^^^^^^^^^^^^^
AttributeError: 'SEABPETokenizer' object has no attribute 'vocab'
Please help.
I also tried
python llama.cpp/convert.py sea-liton-7b/
and got this error:
Loading model file sea-liton-7b/model-00001-of-00002.safetensors
Loading model file sea-liton-7b/model-00001-of-00002.safetensors
Loading model file sea-liton-7b/model-00002-of-00002.safetensors
Traceback (most recent call last):
File "/Users/path/llama.cpp/convert.py", line 1486, in <module>
main()
File "/Users/path/llama.cpp/convert.py", line 1422, in main
model_plus = load_some_model(args.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 1291, in load_some_model
model_plus = merge_multifile_models(models_plus)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 747, in merge_multifile_models
model = merge_sharded([mp.model for mp in models_plus])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 726, in merge_sharded
return {name: convert(name) for name in names}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 726, in <dictcomp>
return {name: convert(name) for name in names}
^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 701, in convert
lazy_tensors: list[LazyTensor] = [model[name] for model in models]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/path/llama.cpp/convert.py", line 701, in <listcomp>
lazy_tensors: list[LazyTensor] = [model[name] for model in models]
~~~~~^^^^^^
KeyError: 'transformer.blocks.0.attn.Wqkv.bias'
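The KeyError means convert.py looked for a tensor named `transformer.blocks.0.attn.Wqkv.bias` that the shards do not contain. As a minimal diagnostic sketch (not part of llama.cpp), the hypothetical `safetensors_tensor_names` function below lists the tensor names actually stored in a `.safetensors` shard without loading any weights. It assumes the standard safetensors layout: an 8-byte little-endian header length, followed by a JSON header keyed by tensor name, with an optional `__metadata__` entry.

```python
import json
import struct


def safetensors_tensor_names(path: str) -> list:
    """List tensor names stored in a .safetensors file (hypothetical helper).

    Only the JSON header is read; the weight data itself is never loaded.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry, so skip it
    return sorted(name for name in header if name != "__metadata__")
```

For example, filtering `safetensors_tensor_names("sea-liton-7b/model-00001-of-00002.safetensors")` for names containing `blocks.0.attn` would show which attention tensors the first block really has.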
Hi @pacozaa,
We have just released a GGUF version of the sea-lion-7b-instruct model: https://huggingface.co/aisingapore/sea-lion-7b-instruct-gguf
Hi @BryanSwk,
I'd be incredibly grateful if you could share any resources or tips for running SEA-LION successfully. It crashed on both Google Colab and Kaggle while downloading the model.
I've also read that the SEA-LION-7B-Instruct model was primarily fine-tuned on English and Indonesian data. As I'm interested in working with text in another language, I'm curious whether you have any insights on how effective SEA-LION could be for that language as well.
Additionally, if I wanted to fine-tune SeaLion for Khmer or Thai, do you have any recommendations on how to approach that? I'm relatively new to this field, so any guidance you could provide would be immensely helpful.
Thanks so much for your time and expertise!
Dear @manuth,
Thank you for your interest in SEA-LION.
May I check whether you have an error message to share? Are you running the SEA-LION Base or Instruct version, or the GGUF version?
Please note that if you are running the SEA-LION-7B Base/Instruct version, you would need around 30GB of VRAM to load it. This might not be possible on free-tier compute such as Kaggle and Colab, where the available resources are very limited.
SEA-LION is pre-trained on 11 Southeast Asian languages, so fine-tuning on any of these 11 languages, not only English and Indonesian, will also work well. Our Thai partners have successfully fine-tuned SEA-LION for Thai, which might be a useful reference for you: https://huggingface.co/airesearch/WangchanLion7B. The training code is available on their GitHub page, which is linked from the model card.
With regards to Khmer, there are very few instruction-tuning datasets available at the moment, so we are currently unable to fine-tune for Khmer.
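As a rough sanity check on the memory figure above, here is a minimal sketch of weight-only memory arithmetic (parameters × bytes per parameter). It assumes a nominal 7 billion parameters and common data types, and it ignores activations, the KV cache and framework overhead, so real usage will be higher. The table and function names are illustrative only.

```python
# Weight-only memory estimate: number of parameters x bytes per parameter.
# Assumptions: nominal parameter count; activations, KV cache and
# framework overhead are NOT included, so real usage is higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 7e9  # nominal "7B"; the exact SEA-LION-7B count may differ
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>10}: ~{weight_memory_gb(n_params, dtype):.0f} GB")
```

For a nominal 7B model this gives about 28 GB in fp32 and 14 GB in 16-bit precision. The fp32 figure is in the same range as the ~30GB quoted above.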
Hope this helps.
Hi @RaymondAISG and team,
Thank you for your response and the helpful information. I initially hit a memory error with SEA-LION Base on Google Colab. Switching to the SEA-LION-7B-Instruct GGUF version didn't resolve the issue for longer instruction prompts, which fail with the error message: ValueError: Requested tokens (870) exceed context window of 512
Here are my current settings:
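from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

# Stream generated tokens to stdout as they are produced
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

model = LlamaCpp(
    model_path="/kaggle/input/sea-lioin-7b-instruct-q5-k-m-gguf/sea-lion-7b-instruct-Q5_K_M.gguf",
    temperature=0,
    max_tokens=2000,
    n_gpu_layers=-1,  # -1 = try to off-load all layers to the GPU
    n_batch=512,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,
    # n_ctx is not set here, so the wrapper's default context window (512 tokens) applies
)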
The model doesn't seem to load on the GPU at all. Could you advise on optimizing GPU use and confirm whether the 2000-token limit applies to the GGUF version? Also, what is the maximum number of tokens, and which quantization method is recommended for minimal quality loss?
Looking forward to hearing back from you.
Thanks
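The "Requested tokens" error comes down to simple arithmetic: the prompt and the generated text must share one context window of `n_ctx` tokens, and when `n_ctx` is not passed, both LangChain's `LlamaCpp` wrapper and llama-cpp-python's `Llama` default to 512. A minimal sketch of that budget, using a hypothetical `generation_budget` helper (not the library's own code), shows why an 870-token prompt fails at 512 and how much room remains with a larger window such as `n_ctx=2048`. Whether 2048 is the right value for SEA-LION is an assumption to check against the model card.

```python
from typing import Optional


def generation_budget(prompt_tokens: int, n_ctx: int, max_tokens: Optional[int] = None) -> int:
    """Return how many new tokens fit after the prompt in an n_ctx-token window.

    Illustrative only (hypothetical helper): prompt and generated tokens
    are assumed to share a single context window.
    """
    if prompt_tokens >= n_ctx:
        # Same situation as "Requested tokens (870) exceed context window of 512"
        raise ValueError(
            f"Requested tokens ({prompt_tokens}) exceed context window of {n_ctx}"
        )
    room = n_ctx - prompt_tokens
    # max_tokens=None means "use whatever room is left"
    return room if max_tokens is None else min(max_tokens, room)


for n_ctx in (512, 2048):
    try:
        print(n_ctx, "->", generation_budget(870, n_ctx, max_tokens=2000))
    except ValueError as err:
        print(n_ctx, "->", err)
```

Under these assumptions, `max_tokens=2000` cannot be fully honoured alongside an 870-token prompt in a 2048-token window: only 1178 new tokens fit.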
Hey @manuth,
Here's a short code snippet that should work with llama-cpp-python:
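import json
from llama_cpp import Llama

MODEL_PATH = "sea-lion-7b-instruct-Q5_K_M.gguf"  # hypothetical path: point this at your downloaded GGUF file

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=32,  # number of layers to off-load to the GPU
    verbose=True,
)

# Prompt follows the template SEA-LION-7B-Instruct was fine-tuned on
prompt = """### USER:
Question: What are the names of the members of ASEAN? Answer:

### RESPONSE:
"""

output = llm(
    prompt,
    max_tokens=None,  # generate until a stop sequence or the context limit
    temperature=0,    # sampling options go on the call, not the constructor
    stop=["\n"],
    echo=True,        # include the prompt in the output
)
print(json.dumps(output, indent=2))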
For GPU off-loading, please make sure you have followed the installation steps at https://github.com/abetlen/llama-cpp-python for your GPU backend (CUDA, Metal, ROCm, etc.).
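llama-cpp-python returns an OpenAI-style completion dictionary, with the generated text under `output["choices"][0]["text"]`; because the snippet above passes `echo=True`, that text starts with the prompt. A minimal sketch of a hypothetical `extract_response` helper that strips the echoed prompt, exercised here on a hand-built dictionary with a placeholder reply rather than real model output:

```python
def extract_response(output: dict, prompt: str) -> str:
    """Return only the generated reply from a completion dict (hypothetical helper)."""
    text = output["choices"][0]["text"]
    # With echo=True the prompt comes first, so drop it if present
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hand-built example: the reply is a placeholder, not real model output
example_prompt = (
    "### USER:\nQuestion: What are the names of the members of ASEAN? Answer:\n\n"
    "### RESPONSE:\n"
)
example_output = {"choices": [{"text": example_prompt + "<generated reply>", "index": 0}]}
print(extract_response(example_output, example_prompt))
```

If `echo` is left at its default of `False`, the prompt is not included and the helper simply trims whitespace.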
Hi @manuth,
Could you kindly give an example of what you meant by "instructions in our prompt just like the prompt template of langchain"?
We currently do not provide hardware requirements for the GGUF models, but may do so at a later stage when compute is available.
For the available quantization methods, kindly refer to the model card for the GGUF models:
https://huggingface.co/aisingapore/sea-lion-7b-instruct-gguf
Hope this helps.
Hi @RaymondAISG,
Here you are:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Initialize retriever and prompt
# (db is an existing vector store and llm an existing LLM, both defined earlier)
retriever = db.as_retriever()
template = """Answer the question provided and be sure to respond with 'I don't know' if the information is not available in the context provided:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Define the chain steps: retrieve context, fill the prompt, call the LLM, parse to a string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Execute the chain
result = chain.invoke("What is the capital of France?")
print(result)
where the instruction is: "Answer the question provided and be sure to respond with 'I don't know' if the information is not available in the context provided:"
Hi @manuth,
Thank you very much for sharing the example.
Yes, technically the instruction could still be used. However, please note that SEA-LION-7B-Instruct was fine-tuned only with the following template, without a system prompt:
prompt_template = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"
Therefore, adding a system prompt might not produce the behaviour you are expecting.
For best results, we recommend fine-tuning the SEA-LION-7B base model with your choice of system prompt and template.
If you are using the SEA-LION-7B-Instruct model with LangChain, we highly recommend using the above prompt template, which was used to fine-tune SEA-LION, for the best results.
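One way to follow this advice while keeping the retrieval instruction is to place the instruction, context and question inside the single USER turn of the template, rather than in a separate system prompt. A minimal sketch, with a hypothetical `build_rag_prompt` helper and the instruction text taken from the example above:

```python
# Template SEA-LION-7B-Instruct was fine-tuned on (from the reply above)
PROMPT_TEMPLATE = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"

INSTRUCTION = (
    "Answer the question provided and be sure to respond with 'I don't know' "
    "if the information is not available in the context provided:"
)


def build_rag_prompt(context: str, question: str) -> str:
    """Put instruction, context and question in one USER turn (hypothetical helper)."""
    human_prompt = f"{INSTRUCTION}\n{context}\nQuestion: {question}"
    return PROMPT_TEMPLATE.format(human_prompt=human_prompt)


print(build_rag_prompt("Paris is the capital of France.", "What is the capital of France?"))
```

In LangChain, the same layout could be written as the text of a plain `PromptTemplate.from_template(...)`. As far as we understand LangChain's behaviour, a `ChatPromptTemplate` placed in front of a non-chat LLM such as `LlamaCpp` is rendered with a `Human: ` prefix, which does not match this template.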
Hope this helps.
Thanks @RaymondAISG and team for your supportive help and very insightful responses ^^