20B when Base Model is 14B?
#2 by mukaj - opened
I see the base model is 14B, so I'm wondering what the extra 6B params are for. Is this a VLM?
@mukaj Thanks for your interest in Sailor2!
We applied model expansion to the Qwen base model before continual pre-training, which is where the extra parameters come from. Please refer to https://sea-sailor.github.io/blog/sailor2/#model-expansion for more details.
I will also add the expansion details to the README for clarity :)
dreamerdeo changed discussion status to closed