20B when Base Model is 14B?

#2
by mukaj - opened

I see the base model is 14B, wondering what the extra 6B params are for? Is this a VLM?

@mukaj Thanks for your interest in Sailor2!

We applied model expansion to the Qwen base model before continual pre-training, which is why the parameter count is larger than the base model's. Please refer to https://sea-sailor.github.io/blog/sailor2/#model-expansion for more details.
I will also add the expansion details to the README for better clarity :)
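
For anyone wondering how expansion can turn a 14B model into a roughly 20B one, here is a minimal sketch of depth-wise block expansion, which is one common way to expand a model. It is not taken from the Sailor2 code; the linked blog post is the authoritative description. The function names, the 48-layer depth, the group size of 3, and the parameter figures (in billions) are all illustrative assumptions.

```python
def expand_layers(layers, group_size):
    """Depth-wise expansion sketch: after every `group_size` layers,
    insert a copy of the last layer in that group.

    In practice the copied block is usually initialised so that it acts
    as an identity function at first; that detail is omitted here.
    """
    expanded = []
    for position, layer in enumerate(layers, start=1):
        expanded.append(layer)
        if position % group_size == 0:
            expanded.append(f"{layer}_copy")
    return expanded


def estimate_params(total_params, non_layer_params, old_depth, new_depth):
    """Rough parameter estimate after changing depth, assuming every
    transformer layer has the same size and that embeddings and other
    non-layer weights are unchanged."""
    per_layer = (total_params - non_layer_params) / old_depth
    return non_layer_params + per_layer * new_depth


# Hypothetical base model: 48 layers, 14.7B parameters in total,
# of which 1.6B sit outside the transformer layers (assumed figures).
base_layers = [f"layer_{i}" for i in range(48)]
new_layers = expand_layers(base_layers, group_size=3)

new_depth = len(new_layers)  # 64: one copy added per group of 3
new_total = estimate_params(14.7, 1.6, old_depth=48, new_depth=new_depth)
print(f"{new_depth} layers, about {new_total:.1f}B parameters")
```

Under these assumptions the extra parameters are simply additional transformer layers in the same language model, not a separate component.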

dreamerdeo changed discussion status to closed
