Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I assume that is in GB and not GiB. In which case 474 GiB might fit, as we have 503 GiB of RAM (after subtracting the RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
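For reference, the 509 vs. 474 is just the decimal-vs-binary unit difference; a one-liner to sanity-check it (only the 509 GB and 503 GiB figures come from above):

echo "scale=1; 509 * 10^9 / 2^30" | bc   # ≈ 474.0 GiB, i.e. roughly 29 GiB of the 503 GiB would be left for context and everything else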
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally you should give it a try and see if it fits, but if not, Q6_K is fine for me.
I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf
so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.
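A quick pre-flight check before loading it (plain coreutils; the path is the one above):

ls -lh /tmp/snowflake-arctic-instruct.Q8_0.gguf   # quant size (ls -lh reports GiB)
free -g                                           # the 'available' column has to exceed that, plus a few GiB for context

If 'available' is not comfortably above the file size, Q6_K it is.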
474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I'll try an offload of 1 and 0, then Q6_K. Hopefully it does not crash.
I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
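A sketch of how to find and clear them, assuming the frozen tasks show up as stopped ('T' state) processes; the PID is a placeholder:

ps -eo pid,stat,rss,comm | awk '$2 ~ /^T/'   # stopped processes and their resident (reserved) RAM in KiB
kill -CONT <pid>                             # resume one so it can finish, or
kill -TERM <pid>                             # kill it and release the reserved RAM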
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
457.4g after warming up.
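For completeness, the kind of commands those figures come from (top is mentioned above; nvidia-smi and the placeholder PID are assumptions on my side):

nvidia-smi --query-gpu=index,memory.used --format=csv   # per-GPU allocation, e.g. ~19GB on GPU0 and nothing on GPU1
top -b -n 1 -p <imatrix-pid> | tail -1                  # the RES column is the 435.6g/457.4g figure quoted here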
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)
llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1,
and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
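To make the per-layer point concrete, a hedged example invocation (flag names as in recent llama.cpp builds; model and calibration paths are placeholders):

llama-imatrix -m model.gguf -f calibration.txt -o imatrix.dat -ngl 1   # -ngl 1 = offload exactly one full layer
CUDA_VISIBLE_DEVICES=0 llama-imatrix ...                               # optionally pin the run to a single GPU

With -ngl 1 there is only one layer to place, so even with two visible GPUs it ends up on just one of them.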
I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
I'm not so sure about that. Keep in mind that imatrix uses mmap memory that can be taken away by other processes like quantisation tasks that use reserved memory.
dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:
Yes it is clearly streaming from SSD now:
Once the quantisation tasks are interrupted it should work without SSD streaming again.
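Two quick ways to confirm that (dstat is already in use above; the PID is a placeholder, and /proc is just one way to see how much of the mmap'd model is still resident):

dstat -d 5                                        # sustained read MB/s while imatrix runs means evicted pages are being re-read
grep -E 'VmRSS|VmHWM' /proc/<imatrix-pid>/status  # resident size well below the model size means pages were reclaimed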
It's also possible that the rate limit is not strictly implemented as a per-account rate limit. Maybe it's just not reliable, just like anything else they implemented :)
I should try contacting them lol. What should I write haha? I'm not the best at email writing, so I would appreciate it if you could draft it =)
Or I can try contacting them through other channels where I already have contact with them.
I hope they didn't further restrict uploads or repo creations :/
I don't think it changed since it got introduced. They for sure wouldn't introduce such changes during the Christmas/New Year holiday period, when most of their developers are on holiday.
Especially as nico1 is paused.
When I paused nico1 today for the performance measurement project I got the following error, but it all seemed to work despite this:
./nico1-pause: line 19: /root/s2/llmjob: No such file or directory
I checked and was able to confirm that the entire "s2" folder is missing. The only thing that didn't work was unfreezing and completing the frozen task, but that's not important as I don't intend on rebooting this time. Let's just hope they don't automatically start as long as nico1 is paused.
140+ 14 CosmicNoodle-7B blocked/imatrix/gpu
Any idea what this means? I saw similar blocked statuses for the entire day before I paused nico1.
I checked and was able to confirm that the entire "s2" folder is missing.
Right, everything is now in /llmjob, rather than splattered over the system. I forgot to update the script(s). Will update them.
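Until then, a quick way to spot the stale references (only the /root/s2 and /llmjob paths come from the messages above; the script name is the one that printed the error):

grep -n '/root/s2' ./nico1-pause     # find the old-path call(s) that need to point at the /llmjob layout
ls -d /root/s2 /llmjob 2>/dev/null   # confirms only /llmjob exists now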
All you missed out on was resuming the frozen/Stopped quantize jobs, so they didn't interrupt and didn't exit.
140+ 14 CosmicNoodle-7B blocked/imatrix/gpu
The status of jobs does not update when paused, so this is simply the last status update. I think :) If it does not clear up when resumed, I will have to have a look.
It might also be that the job has failed somehow, but didn't have an exit status. In that case, the job scheduler doesn't know what to do and just ignores it. (Well, it might actually block a gpu in that case, but that isn't the case here).
nico1 is now unpaused.
-2000 360 si falcon-180B
-2000 236 si goliath-120b
Nice, I see you queued falcon-180B and goliath-120b. I hope you are not just adding the missing static quants but will also requantize the already existing imatrix quants. I definitely want to give falcon-180B another try. I remember how excited I was when it was released, as it was the biggest openly released LLM at the time, but the model then turned out to be quite underwhelming. Maybe with modern CoT prompting techniques and better system prompts this almost forgotten base model can be of use. While finetunes are nice, in the end base models contain the knowledge I seek to extract, and so are of much greater value.
Edit: Seems like it is requesting the existing imatrix quants. How awesome!
-999 205 I Llama-3-Motif-102B error/134 12/24,IQ1_M [691/867]
What a strange error - not something I've ever seen before, but you might be familiar with it. It's so strange that all the other quants so far worked.
[ 691/ 867] blk.76.attn_q.weight - [ 9216, 9216, 1, 1], type = f16, converting to iq1_m .. /root/cvs/llama.cpp-cuda512/ggml/src/ggml-quants.c:4453: GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_k >= 0) failed
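If you want to reproduce it outside the job scheduler, the failing step corresponds roughly to the following (file names are placeholders; IQ1_M normally requires an imatrix):

llama-quantize --imatrix Llama-3-Motif-102B.imatrix Llama-3-Motif-102B.f16.gguf Llama-3-Motif-102B.IQ1_M.gguf IQ1_M
# the assert fires inside the IQ1_M weight search (ggml-quants.c:4453) while converting blk.76.attn_q.weight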
Nice, I see you queued falcon-180B and goliath-120b. I hope you are not just adding the missing static quants but will also requantize the already existing imatrix
I was missing the static quants only (and incidentally, any missing imatrix ones). I was also so disappointed in falcon-180b. Anyway, I'll redo the imatrix ones too, then.
error/134
That is the process exit code, in this case, ABRT: ggml-quants.c:4453: GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_k >= 0) failed
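The 134 → ABRT mapping is just the usual 128 + signal-number convention:

echo $((134 - 128))   # 6
kill -l 6             # ABRT, i.e. the abort() raised by the failed GGML_ASSERT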