run Phi-2 on your CPU

#62
by J22 - opened

You can now use ChatLLM.cpp to run Phi-2 on your CPU.


How about inference speed?

It is faster than larger models, just as expected.

Hi J22,

Thank you for your work.

I visited ChatLLM.cpp to try it out. To generate the quantized model with ChatLLM.cpp, I ran the following:

python3 convert.py -i ~/.cache/huggingface/hub/models--microsoft--phi-2 -t q8_0 -o quantized.bin

But it didn't work.

I got this:

Traceback (most recent call last):
  File "convert.py", line 345, in <module>
    class TikTokenizerVocab:
  File "convert.py", line 354, in TikTokenizerVocab
    def bpe(mergeable_ranks: dict[bytes, int], token: bytes, max_rank: Optional[int] = None) -> list[bytes]:
TypeError: 'type' object is not subscriptable

Can you please help?

Thank you!

G

That's weird. TikTokenizerVocab is invoked when qwen.tiktoken is found.

I suggest you check which files are in ~/.cache/huggingface/hub/models--microsoft--phi-2. You can download all files from here except *.md, and try again.
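
A note on the traceback above: independent of which files are in the model directory, `TypeError: 'type' object is not subscriptable` on an annotation such as `dict[bytes, int]` is what Python raises on versions before 3.9, where built-in types cannot be subscripted at runtime. The following is a minimal sketch for checking the interpreter; the function names are hypothetical and not part of convert.py, and whether this is the actual cause here is an assumption.

```python
import sys
from typing import Dict, List, Optional


def supports_builtin_generics(version_info=sys.version_info) -> bool:
    """Return True if annotations like dict[bytes, int] work at runtime (Python 3.9+)."""
    return tuple(version_info[:2]) >= (3, 9)


# The same signature spelled with typing.Dict / typing.List, which older
# interpreters accept. The body is a placeholder; only the annotations matter.
def bpe_signature_example(mergeable_ranks: Dict[bytes, int], token: bytes,
                          max_rank: Optional[int] = None) -> List[bytes]:
    return [token]


if __name__ == "__main__":
    # Prints the running Python version and whether built-in generics are usable.
    print(sys.version.split()[0], supports_builtin_generics())
```

If the check reports `False`, running convert.py with Python 3.9 or newer is probably the simplest thing to try.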

Thank you for your reply, but I tried what you suggested without success. Could you please tell me specifically which Phi-2 file or files from https://huggingface.co/microsoft/phi-2/tree/main I should give to convert.py?

Were you able to convert (quantize) the model in https://huggingface.co/microsoft/phi-2/tree/main? How did you do it?

Thank you in advance!

  1. Download all files from here (*.md files are not needed).
  2. Let's say the files are located in path /path/to/phi2/files. Run convert.py like this:
python convert.py -i /path/to/phi2/files -o phi2.bin
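
The download step above can be sketched in Python. This is only an illustration: `files_to_download` and `fetch_phi2` are hypothetical helper names, and `fetch_phi2` assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with `ignore_patterns` to skip the `*.md` files.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Per the steps above: *.md files are not needed by convert.py.
IGNORE_PATTERNS = ["*.md"]


def files_to_download(names: Iterable[str], ignore=IGNORE_PATTERNS) -> List[str]:
    """Keep every file name that matches none of the ignore patterns."""
    return [n for n in names if not any(fnmatch(n, p) for p in ignore)]


def fetch_phi2(local_dir: str) -> str:
    """Download microsoft/phi-2 into local_dir, skipping *.md files.

    Assumption: huggingface_hub is installed and the network is reachable.
    """
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(
        repo_id="microsoft/phi-2",
        local_dir=local_dir,
        ignore_patterns=IGNORE_PATTERNS,
    )
```

After `fetch_phi2("/path/to/phi2/files")` completes, the same directory can be passed to `convert.py -i` as in step 2.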

@J22

I have an error with gelu_new:

ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> ls -lhtr phi-2/
total 5.2G
-rw-rw-r-- 1 ubuntu ubuntu   74 Jan 11 22:13 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 9.1K Jan 11 22:13 configuration_phi.py
-rw-rw-r-- 1 ubuntu ubuntu  866 Jan 11 22:13 config.json
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 added_tokens.json
-rw-rw-r-- 1 ubuntu ubuntu 2.6K Jan 11 22:13 SECURITY.md
-rw-rw-r-- 1 ubuntu ubuntu 7.3K Jan 11 22:13 README.md
-rw-rw-r-- 1 ubuntu ubuntu 1.8K Jan 11 22:13 NOTICE.md
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 LICENSE
-rw-rw-r-- 1 ubuntu ubuntu  444 Jan 11 22:13 CODE_OF_CONDUCT.md
-rw-rw-r-- 1 ubuntu ubuntu 446K Jan 11 22:13 merges.txt
-rw-rw-r-- 1 ubuntu ubuntu   99 Jan 11 22:13 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu  62K Jan 11 22:13 modeling_phi.py
-rw-rw-r-- 1 ubuntu ubuntu  35K Jan 11 22:13 model.safetensors.index.json
-rw-rw-r-- 1 ubuntu ubuntu 7.2K Jan 11 22:13 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 2.1M Jan 11 22:13 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 780K Jan 11 22:13 vocab.json
-rw-rw-r-- 1 ubuntu ubuntu 538M Jan 11 22:13 model-00002-of-00002.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Jan 11 22:14 model-00001-of-00002.safetensors
ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> python3 convert.py -i phi-2 -o phi2.bin
Loading vocab file phi-2
vocab_size  50295
Traceback (most recent call last):
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1516, in <module>
    main()
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1422, in main
    Phi2Converter.convert(config, model_files, vocab, ggml_type, args.save_path)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 459, in convert
    cls.dump_config(f, config, ggml_type)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1161, in dump_config
    assert config.activation_function == 'gelu_new', "activation_function must be gelu_new"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: activation_function must be gelu_new

Oh, they made so many updates.

https://huggingface.co/microsoft/phi-2/commit/cb2f4533604d8b67de604e7df03bfe6f3ca22869

I will update ChatLLM.cpp accordingly (hopefully next week). Alternatively, you can download an older revision.
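
Before running the converter, one way to tell whether a downloaded revision would trip the assertion in the traceback above is to inspect `config.json` directly. This is a minimal sketch: `check_activation` is a hypothetical helper, and it only mirrors the one condition visible in the traceback (`config.activation_function == 'gelu_new'`), not everything convert.py may check.

```python
import json
from pathlib import Path


def check_activation(model_dir):
    """Return (ok, value) for the activation_function field in config.json.

    ok is True only when the field exists and equals 'gelu_new', which is
    the condition asserted in the traceback above.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    value = config.get("activation_function")  # None if the key is absent
    return value == "gelu_new", value
```

For example, `check_activation("phi-2")` returning `(False, None)` would suggest the config no longer carries that field and that this revision needs either an updated converter or an older checkout of the model files.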

J22 changed discussion title from run Phi-2 on you CPU to run Phi-2 on your CPU

@kirilligum ChatLLM.cpp now supports the latest revision of Phi-2. You can pull the latest ChatLLM.cpp code and try the conversion again.

Thanks for the reference :)
Does the repo support loading a LoRA head I trained?

@talbaumel Sorry, it does not support LoRA at present.
