Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I assume that is in GB and not GiB. In which case 474 GiB might fit, as we have 503 GiB of RAM (after subtracting the RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
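For reference, the 509 vs. 474 is just the decimal-vs-binary unit difference; a one-liner to sanity-check it (only the 509 GB and 503 GiB figures come from above):

echo "scale=1; 509 * 10^9 / 2^30" | bc   # ≈ 474.0 GiB, i.e. roughly 29 GiB of the 503 GiB would be left for context and everything else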
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally you should give it a try and see if it fits, but if not, Q6_K is fine for me.
I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf
so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.
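A quick pre-flight check before loading it (plain coreutils; the path is the one above):

ls -lh /tmp/snowflake-arctic-instruct.Q8_0.gguf   # quant size (ls -lh reports GiB)
free -g                                           # the 'available' column has to exceed that, plus a few GiB for context

If 'available' is not comfortably above the file size, Q6_K it is.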
474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I'll try an offload of 1 and 0, then Q6_K. Hopefully it does not crash.
I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
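A sketch of how to find and clear them, assuming the frozen tasks show up as stopped ('T' state) processes; the PID is a placeholder:

ps -eo pid,stat,rss,comm | awk '$2 ~ /^T/'   # stopped processes and their resident (reserved) RAM in KiB
kill -CONT <pid>                             # resume one so it can finish, or
kill -TERM <pid>                             # kill it and release the reserved RAM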
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
457.4g after warming up.
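For completeness, the kind of commands those figures come from (top is mentioned above; nvidia-smi and the placeholder PID are assumptions on my side):

nvidia-smi --query-gpu=index,memory.used --format=csv   # per-GPU allocation, e.g. ~19GB on GPU0 and nothing on GPU1
top -b -n 1 -p <imatrix-pid> | tail -1                  # the RES column is the 435.6g/457.4g figure quoted here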
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)
llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1,
and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
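To make the per-layer point concrete, a hedged example invocation (flag names as in recent llama.cpp builds; model and calibration paths are placeholders):

llama-imatrix -m model.gguf -f calibration.txt -o imatrix.dat -ngl 1   # -ngl 1 = offload exactly one full layer
CUDA_VISIBLE_DEVICES=0 llama-imatrix ...                               # optionally pin the run to a single GPU

With -ngl 1 there is only one layer to place, so even with two visible GPUs it ends up on just one of them.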
I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
I'm not so sure about that. Keep in mind that imatrix uses mmap memory that can be taken away by other processes like quantisation tasks that use reserved memory.
dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:
Yes it is clearly streaming from SSD now:
Once the quantisation tasks are interrupted it should work without SSD streaming again.
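Two quick ways to confirm that (dstat is already in use above; the PID is a placeholder, and /proc is just one way to see how much of the mmap'd model is still resident):

dstat -d 5                                        # sustained read MB/s while imatrix runs means evicted pages are being re-read
grep -E 'VmRSS|VmHWM' /proc/<imatrix-pid>/status  # resident size well below the model size means pages were reclaimed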
It's also possible that the rate limit is not strictly implemented as a per-account rate limit. Maybe it's just not reliable, just like anything else they implemented :)
I should try contacting them lol. What should I write haha? I'm not the best at email writing, so I would appreciate it if you could draft it =)
Or I can try contacting them through other channels where I already have contact with them.
I hope they didn't further restrict uploads or repo creations :/
I don't think it changed since it got introduced. They for sure wouldn't introduce such changes during the Christmas/New Year holiday period, when most of their developers are on holiday.
Especially as nico1 is paused.
When I paused nico1 today for the performance measurement project I got the following error, but it all seemed to work despite this:
./nico1-pause: line 19: /root/s2/llmjob: No such file or directory
I checked and was able to confirm that the entire "s2" folder is missing. The only thing that didn't work was unfreezing and completing the frozen task, but that's not important as I don't intend on rebooting this time. Let's just hope they don't automatically start as long as nico1 is paused.
140+ 14 CosmicNoodle-7B blocked/imatrix/gpu
Any idea what this means? I saw similar blocked statuses for the entire day before I paused nico1.
I checked and was able to confirm that the entire "s2" folder is missing.
Right, everything is now in /llmjob, rather than splattered over the system. I forgot to update the script(s). Will update them.
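Until then, a quick way to spot the stale references (only the /root/s2 and /llmjob paths come from the messages above; the script name is the one that printed the error):

grep -n '/root/s2' ./nico1-pause     # find the old-path call(s) that need to point at the /llmjob layout
ls -d /root/s2 /llmjob 2>/dev/null   # confirms only /llmjob exists now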
All you missed out on was resuming the frozen/Stopped quantize jobs, so they didn't interrupt and didn't exit.
140+ 14 CosmicNoodle-7B blocked/imatrix/gpu
The status of jobs does not update when paused, so this is simply the last status update. I think :) If it does not clear up when resumed, I will have to have a look.
It might also be that the job has failed somehow, but didn't have an exit status. In that case, the job scheduler doesn't know what to do and just ignores it. (Well, it might actually block a gpu in that case, but that isn't the case here).
nico1 is now unpaused.
-2000 360 si falcon-180B
-2000 236 si goliath-120b
Nice, I see you queued falcon-180B and goliath-120b. I hope you are not just adding the missing static quants but will also requantize the already existing imatrix quants. I definitely want to give falcon-180B another try. I remember how excited I was when it was released, as it was the biggest openly released LLM at the time, but the model then turned out to be quite underwhelming. Maybe with modern CoT prompting techniques and better system prompts this almost forgotten base model can be of use. While finetunes are nice, in the end base models contain the knowledge I seek to extract, and so are of much greater value.
Edit: Seems like it is requesting the existing imatrix quants. How awesome!
-999 205 I Llama-3-Motif-102B error/134 12/24,IQ1_M [691/867]
What a strange error - not something I've ever seen before, but you might be familiar with it. It's so strange that all the other quants so far worked.
[ 691/ 867] blk.76.attn_q.weight - [ 9216, 9216, 1, 1], type = f16, converting to iq1_m .. /root/cvs/llama.cpp-cuda512/ggml/src/ggml-quants.c:4453: GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_k >= 0) failed
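If you want to reproduce it outside the job scheduler, the failing step corresponds roughly to the following (file names are placeholders; IQ1_M normally requires an imatrix):

llama-quantize --imatrix Llama-3-Motif-102B.imatrix Llama-3-Motif-102B.f16.gguf Llama-3-Motif-102B.IQ1_M.gguf IQ1_M
# the assert fires inside the IQ1_M weight search (ggml-quants.c:4453) while converting blk.76.attn_q.weight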
Nice, I see you queued falcon-180B and goliath-120b. I hope you are not just adding the missing static quants but will also requantize the already existing imatrix
I was missing the static quants only (and incidentally, any missing imatrix ones). I was also so disappointed in falcon-180b. Anyway, I'll redo the imatrix ones too, then.
error/134
That is the process exit code, in this case, ABRT: ggml-quants.c:4453: GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_k >= 0) failed
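The 134 → ABRT mapping is just the usual 128 + signal-number convention:

echo $((134 - 128))   # 6
kill -l 6             # ABRT, i.e. the abort() raised by the failed GGML_ASSERT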