Train on Mistral 7B v0.2
Why don't you guys train on Mistral 7B v0.2, which has a 32k context length, using long-context as well as short-context data? Long-context datasets such as:
- wckwan/M4LE
- THUDM/LongBench
- togethercomputer/Long-Data-Collections
or maybe your own curated long-context datasets.
Yeah, I agree. I was considering using this model in a Mixtral merge because of its scores, but its 8k context limit makes that difficult: any other Mistral model in the merge would also be limited to 8k, even though it can produce 32k tokens of content.
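One way to train on long and short context together is to mix both kinds of examples into a single training set. The sketch below is only an illustration, not anything the OpenChat team has described. It assumes each example has already been tokenized, and the names `Example`, `mix_long_and_short`, the 8,192-token threshold and the 25% long-example fraction are all hypothetical. It keeps every short example, adds just enough long examples to reach the target fraction, and drops anything that would not fit in a 32k window.

```python
import random
from dataclasses import dataclass

MAX_CONTEXT = 32_768  # Mistral 7B v0.2 context window, in tokens


@dataclass
class Example:
    source: str      # hypothetical label, e.g. "THUDM/LongBench"
    num_tokens: int  # token count after tokenization (assumed precomputed)


def mix_long_and_short(examples, long_threshold=8_192, long_fraction=0.25, seed=0):
    """Return a shuffled mix in which about `long_fraction` of examples are long.

    Hypothetical helper: examples longer than MAX_CONTEXT are dropped, not truncated.
    """
    assert 0 <= long_fraction < 1
    fits = [ex for ex in examples if ex.num_tokens <= MAX_CONTEXT]
    long_ex = [ex for ex in fits if ex.num_tokens > long_threshold]
    short_ex = [ex for ex in fits if ex.num_tokens <= long_threshold]

    # Keep all short examples; add long ones so they make up long_fraction of the mix.
    wanted = round(len(short_ex) * long_fraction / (1 - long_fraction))
    n_long = min(len(long_ex), wanted)

    rng = random.Random(seed)  # seeded so the mix is reproducible
    mix = short_ex + rng.sample(long_ex, n_long)
    rng.shuffle(mix)
    return mix
```

With 30 short examples and a 25% target, the function adds 10 long examples (10 of 40). In a real pipeline the token counts would come from the Mistral tokenizer, and the threshold and fraction would need tuning.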
+1
I would say that Mistral 7B v0.2 is not a pretrained model but an instruction-tuned one, and is therefore already biased by its fine-tuning. For complete control over the model's performance, it is best to start from a pretrained model. That may be why.
+1
nvm
And I think they should definitely go beyond 7B parameters with OpenChat!
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 is fine-tuned from the base model mistral-7B-v0.2, which Mistral AI has now officially made available:
mistral-7B-v0.2
- https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar (PyTorch)
- https://huggingface.co/alpindale/Mistral-7B-v0.2-hf (Safetensors)
- https://huggingface.co/bartowski/Mistral-7B-v0.2-hf-GGUF (GGUF)
I would love to see an OpenChat fine-tune based on mistral-7B-v0.2 with a 32k context length.
OpenChat team, I depth up-scaled Mistral-7B-v0.2 following UpStage's paper, "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling", in case you want to train OpenChat on a slightly bigger model.
Joseph717171/Mistral-10.7B-v0.2
- 32K Context Window
- 🚫 Sliding Window Attention
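For reference, the SOLAR paper's depth up-scaling takes a 32-layer base model, makes two copies, removes the last 8 layers from one copy and the first 8 layers from the other, and concatenates the results into 48 layers. The sketch below assumes the same recipe was used for Joseph717171/Mistral-10.7B-v0.2, which the post implies but does not spell out. It builds the layer index map and estimates the parameter count from Mistral 7B's shapes (hidden size 4096, MLP size 14336, 32 query heads, 8 KV heads, 32,000-token vocabulary, untied embeddings and LM head). The helper names are hypothetical, and the arithmetic ignores any biases or extra modules.

```python
# Mistral 7B shapes, assumed identical for v0.2
HIDDEN = 4096
INTERMEDIATE = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB = 32000


def depth_upscale_layer_map(n_layers=32, n_drop=8):
    """Source-layer index for each layer of the up-scaled model (SOLAR-style DUS)."""
    first_copy = list(range(0, n_layers - n_drop))  # base layers 0..23
    second_copy = list(range(n_drop, n_layers))     # base layers 8..31
    return first_copy + second_copy                 # 48 layers in total


def params_per_layer():
    head_dim = HIDDEN // N_HEADS                    # 128
    kv_dim = N_KV_HEADS * head_dim                  # 1024 (grouped-query attention)
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o plus k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                 # gate, up and down projections
    norms = 2 * HIDDEN                              # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params(n_layers):
    embed_and_head = 2 * VOCAB * HIDDEN             # untied embeddings and LM head
    return n_layers * params_per_layer() + embed_and_head + HIDDEN  # + final norm


layer_map = depth_upscale_layer_map()
print(len(layer_map), f"{total_params(32):,}", f"{total_params(48):,}")
```

The 32-layer count comes out to about 7.24B and the 48-layer count to about 10.73B, which is where the "10.7B" name comes from. The eight middle layers (base layers 8 to 23) appear twice in the up-scaled stack. The SOLAR paper follows this step with continued pretraining to recover quality lost at the seam.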
@Joseph717171 You're too late, bro, they don't care.
https://huggingface.co/openchat/openchat-3.5-0106-gemma/discussions/4
Oh well, it was worth a shot.