fp16 model?

#2
by nonetrix - opened

gguf is nice but I would like a fp16 model for merging

fp16 or other q8 q6 please?

Katy's Historical Models org

Unfortunately I don't have the compute to make this happen, but I'll upload the YAML file of the merge if anyone wants to recreate it.

It's just a merge? I have 64GB of RAM to spare and an AMD GPU with an extra 16GB (AMD, so unstable as shit). I can do that easily.

I would quant an AWQ of this if we could see the model.
I thought that you made a GGUF from the model, but reading this suggests that the GGUF is all you have for this. I didn't know you could model merge without fp16.

Katy's Historical Models org

I did it on the kobold merge box a while back; that's why I didn't have access to the FP16 files to upload. It is just a merge, sorry if I have misled you into thinking it was a finetune or something.

I'm reading this article about merging: https://huggingface.co/blog/mlabonne/merge-models
do you remember which method you had used?
never merged models before, but I have compute available to me

edit0: nvm I didn't read merge_method: task_arithmetic
no clue how to do that one, but I'll research it
edit1: this should do it: https://github.com/arcee-ai/mergekit/blob/main/notebook.ipynb
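For anyone else wondering what merge_method: task_arithmetic does, here is a rough toy sketch of the idea, not mergekit's actual code: take each fine-tuned model's difference from a shared base (its "task vector"), scale it by a weight, and add the result back onto the base. The function name, the plain-list "tensors", and the 0.5 weights are all made up for illustration; the real tool works on full tensors and has extra options.

```python
def task_arithmetic_merge(base, finetuned_models, weights):
    """Toy task-arithmetic merge over dicts of name -> list of floats.

    merged = base + sum_i weight_i * (finetuned_i - base)
    All names here are hypothetical; this is not mergekit's API.
    """
    merged = {}
    for name, base_values in base.items():
        out = list(base_values)  # start from the base weights
        for model, weight in zip(finetuned_models, weights):
            for i, value in enumerate(model[name]):
                # add the scaled task vector (fine-tuned minus base)
                out[i] += weight * (value - base_values[i])
        merged[name] = out
    return merged


# Tiny made-up example: one "tensor" with two values.
base = {"layer.weight": [1.0, 2.0]}
model_a = {"layer.weight": [2.0, 2.0]}
model_b = {"layer.weight": [1.0, 4.0]}

merged = task_arithmetic_merge(base, [model_a, model_b], [0.5, 0.5])
print(merged)
```

Each model only contributes what it changed relative to the base, which is why this method needs a base model listed in the YAML.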

Merging locally would be much faster; free Colab gives you two meh cores and little RAM, though you can offload to VRAM with --read-to-gpu.
You also need storage for all the fp16 files: ~15GB per 7B model, plus the resulting model, which would be hard to fit in the default 70GB.
Paid tiers could be better though.
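The storage point can be checked with quick arithmetic: fp16 is 2 bytes per parameter, so disk use is roughly parameters times two. A small sketch, assuming a 7.24B parameter count (typical of Mistral-style "7B" models) and four source models; both numbers are assumptions for illustration, not taken from the actual merge YAML.

```python
def fp16_size_gb(n_params):
    """Approximate fp16 checkpoint size in GB (2 bytes per parameter)."""
    return n_params * 2 / 1e9


def merge_disk_budget_gb(n_params, n_source_models):
    """Disk needed to hold every source model plus the merged output."""
    return fp16_size_gb(n_params) * (n_source_models + 1)


PARAMS_7B = 7.24e9   # assumed parameter count for a "7B" model
N_SOURCES = 4        # hypothetical number of models in the merge

per_model = fp16_size_gb(PARAMS_7B)
total = merge_disk_budget_gb(PARAMS_7B, N_SOURCES)
print(f"per model: {per_model:.1f} GB, total: {total:.1f} GB")
```

With those assumed numbers the total lands just over 70GB, which matches the worry about the default disk.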

@saishf you are 100% correct, I am using a Jupyter notebook locally with some NVIDIA GPUs. Same thing as Colab, but it just runs locally.

actually just installing mergekit, and not using a jupyter (colab) at all is optimal:

not sure if this is the correct way, but it's running now: mergekit-yaml /opt/solidrust/merges/KatyTestHistorical-SultrySilicon-7B-V2.yaml . --cuda
Some of the models are gated, so you need to go to their pages and acknowledge the terms first.

actually just installing mergekit, and not using a jupyter (colab) at all is optimal:

not sure if this is the correct way, but it's running now: mergekit-yaml /opt/openbet/inference/KatyTestHistorical-SultrySilicon-7B-V2.yaml . --cuda

Jupyter is cool, but I don't have much use for it since I only have a single GPU.
I first learnt about mergekit here: https://huggingface.co/blog/mlabonne/merge-models
It's a nice and easy start; it doesn't go into detail on the latest methods, but it works. Plus --help will detail everything for you!
--low-cpu-memory is useful too, if you have more VRAM than RAM.
One I use every time is --out-shard-size, which takes a parameter count like "2B".
2B makes each shard about 4GB.
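The 2B-to-4GB relationship is just parameters per shard times 2 bytes for fp16. A quick sketch of that arithmetic, assuming a 7.24B-parameter model; the function name and the parameter count are illustrative, and the real shard count can differ a little because tensors are not split across shard boundaries.

```python
import math


def shard_plan(total_params, shard_params, bytes_per_param=2):
    """Estimate shard count and max shard size in GB for a given shard budget."""
    n_shards = math.ceil(total_params / shard_params)
    shard_gb = shard_params * bytes_per_param / 1e9
    return n_shards, shard_gb


# Assumed 7.24B-parameter model, sharded at 2B parameters per file.
n_shards, shard_gb = shard_plan(7.24e9, 2e9)
print(f"about {n_shards} shards of up to {shard_gb:.0f} GB each")
```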

Thank you.
used mergekit-yaml /opt/openbet/inference/KatyTestHistorical-SultrySilicon-7B-V2.yaml . --cuda --low-cpu-memory --out-shard-size "2B"

and created: https://huggingface.co/solidrust/KatyTestHistorical-SultrySilicon-7B-V2

Hopefully we will see more quants with fp16 files available! Most models get GGUFs within a day thanks to all the dedicated people on HF, and some even get exl and AWQ...
