python llama.cpp/convert-hf-to-gguf.py ~/sea-liton-7b/ error

#10
by pacozaa - opened

Hi Sea Lion team,
I tried to run

python llama.cpp/convert-hf-to-gguf.py ~/sea-liton-7b/

and I got this error

Traceback (most recent call last):
  File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 2099, in <module>
    main()
  File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 2086, in main
    model_instance.set_vocab()
  File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 73, in set_vocab
    self._set_vocab_gpt2()
  File "/Users/path/llama.cpp/convert-hf-to-gguf.py", line 238, in _set_vocab_gpt2
    vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
                                               ^^^^^^^^^^^^^^^
AttributeError: 'SEABPETokenizer' object has no attribute 'vocab'

Please help

or with

python llama.cpp/convert.py sea-liton-7b/

I also got an error

Loading model file sea-liton-7b/model-00001-of-00002.safetensors
Loading model file sea-liton-7b/model-00001-of-00002.safetensors
Loading model file sea-liton-7b/model-00002-of-00002.safetensors
Traceback (most recent call last):
  File "/Users/path/llama.cpp/convert.py", line 1486, in <module>
    main()
  File "/Users/path/llama.cpp/convert.py", line 1422, in main
    model_plus = load_some_model(args.model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 1291, in load_some_model
    model_plus = merge_multifile_models(models_plus)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 747, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 726, in merge_sharded
    return {name: convert(name) for name in names}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 726, in <dictcomp>
    return {name: convert(name) for name in names}
                  ^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 701, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/path/llama.cpp/convert.py", line 701, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
                                      ~~~~~^^^^^^
KeyError: 'transformer.blocks.0.attn.Wqkv.bias'
AI Singapore org

Hi @pacozaa ,

We have just released a GGUF version of the sea-lion-7b-instruct model: https://huggingface.co/aisingapore/sea-lion-7b-instruct-gguf

hi @BryanSwk

I'd be incredibly grateful if you could share any resources or tips for running SeaLion successfully. It crashed on both Google Colab and Kaggle when I ran the download.

I've also read that the SeaLion-7B-Instruct model was primarily fine-tuned on English and Indonesian data. As I'm interested in working with text in another language, I'm curious whether you have any insights on how effective SeaLion could be for that language as well.

Additionally, if I wanted to fine-tune SeaLion for Khmer or Thai, do you have any recommendations on how to approach that? I'm relatively new to this field, so any guidance you could provide would be immensely helpful.

Thanks so much for your time and expertise!

AI Singapore org

Dear @manuth ,

Thank you for your interest in SEA-LION.
Do you have an error message you can share? Are you running the SEA-LION Base or Instruct version, or the GGUF version?
Please note that the SEA-LION-7B Base/Instruct version needs around 30GB of VRAM to load, which might not be possible on freemium platforms like Kaggle and Colab, where available compute is very limited.
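
As a rough sanity check on the ~30GB figure, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 7 billion parameters and ignores activations, the KV cache, and framework overhead; `weight_memory_gb` and the dtype table are illustrative names, not part of any SEA-LION tooling.

```python
# Rough estimate of the memory needed just to hold the model weights.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 7e9  # assumption: "7B" taken as 7 billion parameters
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.0f} GB")
```

Under these assumptions, float32 weights alone come to about 28 GB, which is in the same range as the figure quoted above; half precision would roughly halve that.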

SEA-LION is pre-trained on 11 Southeast Asian languages, so fine-tuning on any of these 11 languages, not just English and Indonesian, will also work well. Our Thai partners have successfully fine-tuned SEA-LION for Thai, which may be a useful reference for you: https://huggingface.co/airesearch/WangchanLion7B. The training code is provided on their GitHub page, which is linked in the model card.

With regard to Khmer, very few instruction-tuning datasets are available at the moment, so we are unable to fine-tune for Khmer for now.

Hope this helps.

Hi @RaymondAISG and team

Thank you for your response and the helpful information. I initially hit a memory error with SEA-LION Base on Google Colab. Switching to the SEA-LION-7B Instruct GGUF version didn't resolve the issue for longer instruction prompts; I get this error message: ValueError: Requested tokens (870) exceed context window of 512

Here are my current settings:
"""
model = LlamaCpp(
    model_path="/kaggle/input/sea-lioin-7b-instruct-q5-k-m-gguf/sea-lion-7b-instruct-Q5_K_M.gguf",
    temperature=0,
    max_tokens=2000,
    n_gpu_layers=-1,
    n_batch=512,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,
)
"""
The model doesn't seem to load on the GPU at all. Can you advise on optimizing GPU use and confirm whether the 2000-token limit applies to the GGUF version? Also, what is the maximum number of tokens, and which quantization method do you recommend for minimal quality loss?

looking forward to hearing back from you
thanks

AI Singapore org

Hey @manuth ,

Here's a short code snippet that should work with llama-cpp-python.

import json

from llama_cpp import Llama

# Placeholder: point this at your downloaded GGUF file.
MODEL_PATH = "sea-lion-7b-instruct-Q5_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=32,  # for GPU off-loading
    verbose=True,
)

prompt = """### USER:
Question: What are the names of the members of ASEAN? Answer:

### RESPONSE:

"""

output = llm(
    prompt,
    max_tokens=None,
    temperature=0,  # sampling parameters are passed at call time
    stop=["\n"],
    echo=True,  # to echo prompt in output
)

print(json.dumps(output, indent=2))

For GPU off-loading, make sure you have followed the installation steps at https://github.com/abetlen/llama-cpp-python for your GPU backend (CUDA/Metal/ROCm etc.).
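
On the `Requested tokens (870) exceed context window of 512` error reported earlier in the thread: llama-cpp-python raises it when the prompt is longer than the context size the model was loaded with, and both `llama_cpp.Llama` and LangChain's `LlamaCpp` take an `n_ctx` argument that defaults to 512. Below is a minimal sketch of a pre-flight check, assuming a 2048-token context for SEA-LION-7B (please verify against the model card); `fits_context` and `ASSUMED_N_CTX` are illustrative names.

```python
# Hypothetical helper: check a request against the context window before
# calling the model. Token counts would come from the model's tokenizer.
ASSUMED_N_CTX = 2048  # assumption: check the SEA-LION model card for the real limit


def fits_context(prompt_tokens: int, max_new_tokens: int, n_ctx: int) -> bool:
    """True if the prompt plus the requested completion fits inside n_ctx tokens."""
    return prompt_tokens + max_new_tokens <= n_ctx


# The failing case from the thread: an 870-token prompt with the default n_ctx=512.
print(fits_context(870, 0, 512))                # False: the prompt alone is too long
print(fits_context(870, 1000, ASSUMED_N_CTX))   # True
print(fits_context(870, 2000, ASSUMED_N_CTX))   # False: max_tokens=2000 overshoots
```

In practice this would mean passing `n_ctx=2048` (or whatever the model actually supports) when constructing `Llama(...)` or `LlamaCpp(...)`, and keeping the prompt length plus `max_tokens` within that budget.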

Thanks @BryanSwk, just a few more questions:

  1. Can we also use instructions in our prompt, as with LangChain's prompt templates?
  2. Can I know the hardware requirements and quant method, like in the attached image? I couldn't find a resource for that.
    q.png
AI Singapore org

Hi @manuth ,

Could you kindly give an example of what you mean by "instructions in our prompt just like the prompt template of langchain"?

We currently do not provide hardware requirements for the GGUF models, but may do so at a later stage when compute is available.
For the quantization methods available, kindly refer to the model card for the GGUF models:
https://huggingface.co/aisingapore/sea-lion-7b-instruct-gguf

Hope this helps.

Hi @RaymondAISG

Here you are:
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize retriever and prompt (db and llm are created elsewhere)
retriever = db.as_retriever()

template = """Answer the question provided and be sure to respond with 'I don't know' if the information is not available in the context provided:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Define the chain steps
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Execute the chain
result = chain.invoke("What is the capital of France?")
print(result)

Where instruction = Answer the question provided and be sure to respond with 'I don't know' if the information is not available in the context provided:

AI Singapore org

Hi @manuth ,

Thank you very much for sharing the example.
Yes, technically the instruction can still be used. However, please note that SEA-LION-7b-Instruct was fine-tuned only with the following template, without a system prompt:
prompt_template = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"

Therefore, adding a system prompt might not result in the behaviour you are expecting.

For best results, we recommend fine-tuning from the SEA-LION-7b base model with your choice of system prompt and template.
If you use the SEA-LION-7b-Instruct model with LangChain, we highly recommend using the above prompt template, which was used to fine-tune SEA-LION.
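
One way to keep an instruction-style RAG prompt while staying inside the template SEA-LION-7b-Instruct was fine-tuned on is to fold the instruction, context, and question into the `{human_prompt}` slot rather than adding a system prompt. This is a minimal sketch only: `build_prompt` and its arguments are hypothetical names, the context string is a placeholder, and how well the model follows the embedded instruction is not guaranteed.

```python
# Template quoted in the thread; no system prompt was used during fine-tuning.
PROMPT_TEMPLATE = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"


def build_prompt(instruction: str, context: str, question: str) -> str:
    """Place the instruction, context, and question inside the USER turn."""
    human_prompt = f"{instruction}\n{context}\n\nQuestion: {question}"
    return PROMPT_TEMPLATE.format(human_prompt=human_prompt)


prompt = build_prompt(
    instruction=(
        "Answer the question using only the context provided; "
        "respond with 'I don't know' if the answer is not in the context."
    ),
    context="ASEAN was founded in 1967.",  # placeholder context for illustration
    question="When was ASEAN founded?",
)
print(prompt)
```

With LangChain, the same layout could be written as a single template string with `{context}` and `{question}` placeholders wrapped between the `### USER:` and `### RESPONSE:` markers.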

Hope this helps.

Thanks @RaymondAISG and team for your support and very insightful response ^^.
