Extra SLERP parameters
These are interesting SLERP directives you've used! I've tried your recipe with minor tweaks at sometimesanotion/Qwen2.5-14B-MinusLike-Slerp-Experimental, using Arcee's mergekit-gui space. Any guesses how these SLERP merges will score?
I must say that the new techniques used in these projects are truly impressive to me; I only learned about them recently. In some of my new experimental projects, like tempesthenno-nuslerp-001, I've drawn significant inspiration from your remarkable project Lamarck-14B-v0.6, and I believe @bamec66557's Qwen-2.5-14B-MINUS will also serve as a role model for my next steps in learning.
However, I have some personal concerns. In an era where computational costs are consistently decreasing, can we push the boundaries even further? While @arcee-ai's research and work are highly valuable references, I'm concerned their approach may eventually hit a ceiling on how far it can be optimized (regardless of evaluation methods) in terms of "real performance"; perhaps we're already approaching that edge, at least for 14B models. So what will be our next direction for advancement: reinforcement learning, or perhaps just expanding model size? (Personally, I don't think the latter is a reliable approach, since once the size has been increased, we likely have no way to scale it back down.)
[UPDATE] 2024-01-22: perhaps I need to tag @sometimesanotion :)