Why did the model size increase when applying PoSE?

#1
by sanjeev-bhandari01 - opened

I noticed that the model size is larger than that of the original LLaMA 2. What is the reason behind it?

Which part of the LLaMA architecture was changed that could increase the model size by such a large margin?

Hi @sanjeev-bhandari01 , thanks for your interest in this work. However, I don't think PoSE increases the model size. It only changes the position ids and rope_base during the continual pre-training phase. In this repo, pytorch_model-00001/2/3-of-00003.bin add up to approximately 28 GB, which is reasonable for a 7B model, since each parameter takes 4 bytes when torch_dtype in the config file is set to float32. Looking forward to your reply :-)
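The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the repo: the helper name checkpoint_size_gb is hypothetical, and the parameter count is taken as a nominal 7e9 (the exact count of the model differs slightly), so the result is only approximate.

```python
# Bytes per parameter for common torch dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal "7B" parameter count; an assumption, the real count differs slightly.
NUM_PARAMS = 7e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{checkpoint_size_gb(NUM_PARAMS, dtype):.0f} GB")
```

Under these assumptions the float32 weights come to about 28 GB and the float16 weights to about 14 GB, so the larger checkpoint is explained by the storage dtype alone rather than by any change in architecture.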

Hi @dwzhu , understood. So all the model parameters are in float32.

I'm a bit unsure about the process. Should I load this model directly with AutoModelForCausalLM and run inference as usual, or should I first modify the config to set the context length to 16k?

To explore this, I loaded the model in Colab (free version) using fp4 quantization and ran inference as usual without modifying the config. However, I encountered a CUDA out-of-memory error when running inference on a context of 6300 tokens.

Hi @sanjeev-bhandari01 , since the HF implementation of rope scaling is now slightly different from what it was when this work was done, I think loading directly with AutoModelForCausalLM will not work. Maybe you can find some examples of testing this model here. Basically, it uses pose_modeling_llama.py to define the model behavior, which integrates xformers to avoid OOM in the self-attention module.
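To illustrate why the self-attention module can run out of memory at long context lengths, here is a rough estimate of the attention score matrix that a naive implementation materializes for one layer. This is only a sketch under stated assumptions: 32 attention heads (as in LLaMA 2 7B), fp16 scores (2 bytes each), and batch size 1. The helper name attention_scores_gib is hypothetical, and actual peak memory also depends on weights, KV cache, and other activations.

```python
def attention_scores_gib(
    seq_len: int,
    num_heads: int = 32,      # assumption: head count of LLaMA 2 7B
    bytes_per_elem: int = 2,  # assumption: scores stored in fp16
    batch_size: int = 1,
) -> float:
    """Size in GiB of one layer's full (seq_len x seq_len) attention score tensor."""
    num_elems = batch_size * num_heads * seq_len * seq_len
    return num_elems * bytes_per_elem / 2**30


for n in (2048, 6300, 16384):
    print(f"{n} tokens: {attention_scores_gib(n):.2f} GiB per layer")
```

The cost grows quadratically with sequence length: at 6300 tokens the score tensor alone is already over 2 GiB for a single layer under these assumptions. Memory-efficient attention kernels such as those in xformers avoid materializing this full matrix, which is consistent with the advice above to use pose_modeling_llama.py.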

Ok thanks a lot @dwzhu , I will look into it.

sanjeev-bhandari01 changed discussion status to closed
