Can't get Tokenizer on Windows (CPU)

#1
by borodache - opened

Hi All,
This is my code:
model_name="dicta-il/dictalm2.0-instruct-GGUF"

self.tokenizer = AutoTokenizer.from_pretrained(model_name)
#self.tokenizer = LlamaTokenizerFast.from_pretrained(model_name)
self.model = AutoModel.from_pretrained(model_name)
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model = self.model.to(self.device)

However, I get this error:
OSError: Can't load tokenizer for 'dicta-il/dictalm2.0-instruct-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'dicta-il/dictalm2.0-instruct-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

Help will be highly appreciated.

Thanks in Advance

DICTA: The Israel Center for Text Analysis org

The GGUF format must be loaded using a supported framework such as llama.cpp, Ollama, or LM Studio.
Check the example code and instructions on each model page to see how to load the model correctly and to work out which model is the right one to use.
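The OSError in the first post is consistent with the GGUF repository not containing the tokenizer files that AutoTokenizer looks for, although the thread does not confirm this. A minimal sketch for checking it is below. The helper names `split_repo_files` and `describe_repo` and the `TOKENIZER_FILES` list are hypothetical, and `describe_repo` assumes the `huggingface_hub` package and network access.

```python
from typing import Iterable, List, Tuple

# File names that transformers tokenizers are commonly saved under
# (an illustrative list, not an exhaustive one).
TOKENIZER_FILES = ("tokenizer.json", "tokenizer.model", "tokenizer_config.json")


def split_repo_files(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (tokenizer_files, gguf_files) found in a list of repo file names."""
    names = list(filenames)
    tokenizer_files = [n for n in names if n.rsplit("/", 1)[-1] in TOKENIZER_FILES]
    gguf_files = [n for n in names if n.lower().endswith(".gguf")]
    return tokenizer_files, gguf_files


def describe_repo(repo_id: str) -> str:
    """Fetch the file list of a Hub repo and summarize what it contains."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import list_repo_files

    tokenizer_files, gguf_files = split_repo_files(list_repo_files(repo_id))
    if gguf_files and not tokenizer_files:
        return "GGUF only: load with llama.cpp or a framework built on it"
    if tokenizer_files:
        return "tokenizer files present: AutoTokenizer should be able to load it"
    return "neither GGUF nor tokenizer files found"
```

Calling `describe_repo("dicta-il/dictalm2.0-instruct-GGUF")` would show which of the two cases applies to the repository from the question.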

I didn't see an example anywhere; that was my problem. Eventually, though, I managed to find a solution:

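import multiprocessing

from llama_cpp import Llama
from transformers import AutoTokenizer

model_name = "dicta-il/dictalm2.0-instruct-GGUF"  # no trailing comma: that would make it a tuple
self.tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm2.0-instruct")
self.model = Llama.from_pretrained(
    repo_id=model_name,
    filename="dictalm2.0-instruct.Q4_K_M.gguf",
    n_gpu_layers=0,  # run entirely on the CPU
    n_threads=multiprocessing.cpu_count(),
    embedding=True,  # enable embedding output
    verbose=False,
)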

It also required installing Microsoft Visual C++ and the Python package llama-cpp-python.

Hi @Shaltiel ,
It seems I celebrated too early: the tokenizer always yields the same tokens no matter what the input text is, and as a result I always get the same embedding vector. I also tried the model dicta-il/dictalm2.0-GGUF, but I ran into the same behavior. Here is my encoding method:
def encode(self, sentence):
    # Tokenize the sentence
    print("sentence:", sentence)
    tokens = self.tokenizer(sentence)

    print("tokens:", tokens[:10])
    # Generate embeddings
    embeddings = self.model.embed(tokens)[0][0]

    print("embeddings:", embeddings[:10])

    return embeddings

and here are a few of my outputs:
sentence: מאיזה גיל מומלץ להתחיל לצחצח שיניים אצל ילדים?
tokens: [Encoding(num_tokens=22, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]
sentence: כואב לי מאד בעת שתיית מים קרים. מה עלי לעשות?
tokens: [Encoding(num_tokens=21, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]
sentence: שלום,
אני מחפשת רופא שיניים טוב בתל אביב שהוא חבר הר"ש. היכן אני יכולה לראות את רשימת הרופאים בתל אביב?
tokens: [Encoding(num_tokens=43, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]

Any help will be highly appreciated...

Thanks in advance,
Eli
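One possible explanation for the constant vector, not confirmed anywhere in this thread: `Llama.embed` in llama-cpp-python takes raw text (a string or a list of strings) and tokenizes it internally, while the Hugging Face tokenizer returns a dict-like object. Iterating that object yields its keys, such as "input_ids", so the model may be embedding the same key strings every time regardless of the sentence. The sketch below passes the sentence string directly. The helper names `mean_pool` and `embed_sentence` are hypothetical, and mean pooling over token vectors is an assumed choice, not something the model authors recommend here.

```python
from typing import Any, List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single sentence vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def embed_sentence(llm: Any, sentence: str) -> List[float]:
    """Embed raw text with an object exposing llama-cpp-python's embed()."""
    # Pass the string itself; embed() does its own tokenization.
    result = llm.embed(sentence)
    # Depending on the pooling setting, the result may be one vector or one
    # vector per token. Pool the latter so callers always get a flat vector.
    if result and isinstance(result[0], (list, tuple)):
        return mean_pool(result)
    return list(result)


# Why the original call may have been constant: iterating a dict-like
# tokenizer output yields its keys, not the sentence.
fake_tokenizer_output = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
keys_seen = list(fake_tokenizer_output)  # ["input_ids", "attention_mask"]
```

With the model loaded as in the earlier post, the call would be `embed_sentence(self.model, sentence)`; the separate Hugging Face tokenizer is not needed on this path.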

DICTA: The Israel Center for Text Analysis org

Hi Eli,

As I said previously, the GGUF format should be used with the llama.cpp library or with one of the frameworks built on it, such as https://ollama.com/ and https://lmstudio.ai/. The transformers library supports GGUF only experimentally so far; it is not yet fully or officially supported: https://huggingface.co/docs/transformers/en/gguf#support-within-transformers.

If you wish to use it in code, I recommend choosing a different format from https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27 and using the code examples listed on the page of the chosen model.

Hi @Shaltiel ,
I have followed all your instructions as you described before (I have installed C++ and llama.cpp). I have also tried this link: https://huggingface.co/docs/transformers/en/gguf#support-within-transformers . However, I don't want to use a different model...
Is there a different model which enables me to get embeddings of the text (an encoder)? Preferably using CPU, but if it will require a GPU I will find a way to use it...

Thanks in Advance,
Eli

DICTA: The Israel Center for Text Analysis org

Hi @Shaltiel ,
Thanks a lot. I tried one of them (large-heq), and it is indeed really impressive. However, I need it for the medical domain... Is there a specific model for this field? Sorry for being a "pain in the ass"... Your help would be highly appreciated...

Best,
Eli
