Experimental commander model V1.
Named it Zelensky in order to troll Uncle Elon on Twitter over how bad Grok-2 is.
Training process: one epoch at a low learning rate, then evolutionary-merged with the 3 other models listed on the model card.
The process was repeated multiple times on 8x AMD MI300 192GB GPUs while also running gpqa_diamond_zeroshot on the LM Eval harness.
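For reference, a minimal sketch of how that GPQA Diamond zero-shot run can be launched through the lm-evaluation-harness Python API; the dtype and batch size here are illustrative assumptions, not the exact settings used during training.

```python
# Sketch: evaluate the merged checkpoint on gpqa_diamond_zeroshot with
# lm-evaluation-harness. dtype and batch_size are assumptions, not the
# actual run settings.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nisten/zelensky-78b,dtype=bfloat16",
    tasks=["gpqa_diamond_zeroshot"],
    batch_size=8,
)
print(results["results"]["gpqa_diamond_zeroshot"])
```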
Thank you to Vultr (https://www.vultr.com/register/) for sponsoring the compute.
The Qwen license still applies by default.
Base model: Qwen/Qwen2.5-72B
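Quick usage sketch with the standard transformers loading path; the prompt, dtype, and device_map below are placeholders, adjust them to your hardware.

```python
# Load the merged model with the standard transformers AutoModel path.
# torch_dtype="auto" and device_map="auto" are convenience defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nisten/zelensky-78b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Briefly explain what an evolutionary model merge is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```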