Thank you for your model!
I really appreciate this model!
I've really been looking forward to it, as I think it's one step forward in advancing Mixtral, and AI overall. I plan on using this to create the next version of my Open_Gpt4 series by merging this bagel model with mixtral-instruct. I hope the results are good!
DPO COOL
Awesome, let me know the results!
Thank you for this model. I would like to know how you fine-tuned it, since I understand that Mixtral models are difficult to fine-tune. Thank you.
So far the model is pretty promising. I've been testing my GGUF quant, and although it's still not as good as GPT-4, it's getting closer in quality. I want to say it might even be better than base mixtral-instruct, but that's only a guess based on some limited testing; more testing, as well as the score on the Open LLM Leaderboard, is required to validate that claim. You can find it here
And my gguf quant will be here by the end of today or early tomorrow depending on when it gets done uploading:
I used my fork of qlora here: https://github.com/jondurbin/qlora, with the configuration you can find on weights and biases:
https://wandb.ai/jondurbin/bagel-8x7b-v0.2/runs/agxjjdso/overview?workspace=user-jondurbin
I used the latest main branch of transformers, but at the time the following two PRs had not yet been merged, so I pulled them in manually:
https://github.com/huggingface/transformers/pull/28115
https://github.com/huggingface/transformers/pull/28256
I think now, if you build transformers from source using the latest main checkout, it should be somewhat fixed, although the Mistral folks in Discord did say the implementation is wrong, so any fine-tunes of Mixtral are probably suboptimal right now (regardless of how well they may do, they should be better).
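For anyone wanting to reproduce the "pulled them in manually" step, here is a minimal sketch of one way it could be done. It is not necessarily how the run above was set up. It relies on GitHub exposing each pull request as the ref `pull/<number>/head`; the helper name `build_commands`, the local branch names `pr-<number>`, and the `transformers` working directory are hypothetical choices for illustration. The script only prints the commands and does not run them. If the PRs have since been merged into main, the merge steps do nothing and a plain source install is enough.

```python
# Hypothetical helper: prints the git/pip commands one might run to apply
# not-yet-merged PRs on top of a transformers source checkout.
# It only builds the command strings; nothing is executed.
REPO_URL = "https://github.com/huggingface/transformers"
PR_NUMBERS = [28115, 28256]  # the two PRs linked above


def build_commands(repo_url, pr_numbers, workdir="transformers"):
    """Return shell commands that clone the repo, merge each PR, and install."""
    cmds = [f"git clone {repo_url} {workdir}"]
    for pr in pr_numbers:
        # GitHub exposes every pull request as the ref pull/<number>/head.
        cmds.append(f"git -C {workdir} fetch origin pull/{pr}/head:pr-{pr}")
        cmds.append(f"git -C {workdir} merge --no-edit pr-{pr}")
    # Editable install of the patched checkout.
    cmds.append(f"pip install -e ./{workdir}")
    return cmds


if __name__ == "__main__":
    for cmd in build_commands(REPO_URL, PR_NUMBERS):
        print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; copy the output into a shell once the PR list and directory look right.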
Awesome!
@jondurbin Do you have plans to run a fine-tune on top of Mixtral-8x7B-Instruct? That model has already been instruction fine-tuned but there are likely many things in the bagel dataset it hasn't seen that would improve performance.
I may, but probably not until we can confirm the issues with the mixtral implementation in HF are fixed. It appears there are discrepancies, as hinted by the mistral folks in discord, but sadly they refuse to help correct or even identify the issue.
@jondurbin My recommendation, as I've stated in my write-up, is to basically forget mistralai and their models and start creating our own using my techniques and mergekit. Make your own base models like how I made mine, and train on top of those.
My model:
https://huggingface.co/rombodawg/Everyone-Coder-4x7b-Base
My write up:
https://docs.google.com/document/d/1_vOftBnrk9NRk5h10UqrfJ5CDih9KBKL61yvrZtVWPE/edit?usp=sharing
@jondurbin Makes sense to wait. I do think it would be very interesting to see, given how promising your bagel fine-tune on Yi was and the strength of this fine-tune as well. Too bad your hardware restricts you to LoRA rather than a full fine-tune :(
@jondurbin
Thanks a lot!
Speaking of hardware, what do you use to finetune/run your models?