Addition of new CFG Methods

#12
by xi0v - opened

Hello!
Is it possible to get this supported?
https://huggingface.co/docs/diffusers/main/en/using-diffusers/pag (should have an enable/disable button)

And CFG rescale (rescaled classifier-free guidance, which I believe is exposed as guidance_rescale in Diffusers); this should also have an enable/disable button.
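For reference, both features already exist as options in plain Diffusers, so here is a minimal sketch of what they look like there (the model ID and values are just examples, not DiffuseCraft's actual API):

```python
# Minimal sketch using plain Diffusers: PAG is enabled at load time,
# while guidance_rescale is a per-call argument.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model
    enable_pag=True,                  # Perturbed-Attention Guidance
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a photo of a cat",
    guidance_scale=7.0,
    pag_scale=3.0,         # PAG strength
    guidance_rescale=0.7,  # CFG rescale
).images[0]
```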

Also, is it possible to implement CFG++ samplers?

PAG is possible. Or rather, stablepy supports it; I just haven't made a GUI for it. I'm just being lazy.😆
I'm not sure about the rest...
For better or worse, this app is abstracted through stablepy, so anything stablepy doesn't support will probably break.
DiffuseCraft is a demo for stablepy, so most of what stablepy itself can do has been implemented there. If it's a feature DiffuseCraft has, I can add it straight away. If it's a feature DiffuseCraft doesn't have but the pip version of Diffusers supports, r3gm will probably add it quite quickly, like he did with the scheduler the other day.
He is busy but friendly and proactive, so he will probably answer your questions unless the request is unreasonable from a programming standpoint.
If it's something really simple, I can even submit a PR myself.
https://huggingface.co/spaces/r3gm/DiffuseCraft/discussions?status=open&type=discussion
https://github.com/R3gm/stablepy

Edit:
Since we're here, let's identify the features that DiffuseCraft (stablepy) is missing. It would be easier if we put together a list of ideas. He writes 100 times faster and more accurately than I do...
This is purely a difference in coding ability...

Edit:
Also, tell me which features you want to see prioritized that are not yet in VP, but are in DiffuseCraft. I'll add PAG first.

I added PAG scale and FreeU.

> I added PAG scale and FreeU.

Thanks!

> Since we're here, let's identify the features that DiffuseCraft (stablepy) is missing. It would be easier if we put together a list of ideas. He writes 100 times faster and more accurately than I do...
> This is purely a difference in coding ability...

There is a family of samplers that looks very promising; I'd love to see them supported in stablepy. They're called CFG++ (cfgpp in ComfyUI): https://arxiv.org/abs/2406.08070
Aside from this, I think stablepy is feature-complete so far.

I also believe that img2img in VP (already supported by DiffuseCraft) would be very good; ControlNets and IP-Adapters would be great too!

Also, inference with LyCORIS!
I can't use LoKr or LoHa models on VP or DiffuseCraft. There are a lot of LyCORIS models out there that are better than LoRAs, but I'm unable to use them since they're not supported in stablepy (nor in Diffusers), so this would probably be a great addition.

Thanks.
I've also been wondering about LyCORIS. I can use it if I call it directly from PEFT, but as you say, there is no way to call it from Diffusers. I think r3gm is capable of writing functions that aren't in Diffusers, but wrapping Diffusers is the whole theme of stablepy, so it would be better to improve Diffusers itself first...
The tricky part of a LyCORIS implementation is how to detect that a file is a LyCORIS file. I think the same applies to other LoRA variants.

Is CFG++ supported by Diffusers?

Edit:
Oh... it's similar to guidance rescale.

Typically, LyCORIS algorithms/models contain an identifier such as "hada" in their state-dict keys, which I believe stands for "Hadamard product".

https://github.com/KohakuBlueleaf/LyCORIS/blob/main/docs/Algo-Details.md

This was also noted by sayakpaul here:
https://github.com/huggingface/diffusers/issues/4133

There was also an issue about this here: https://github.com/huggingface/diffusers/issues/3087

I also found this gist about LyCORIS inference:
https://gist.github.com/adhikjoshi/2c6da89cbcd7a6a3344d3081ccd1dda0
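Putting that together, here is a minimal sketch of key-based detection, assuming a safetensors file (the key substrings follow the LyCORIS docs above; the file name and function are hypothetical):

```python
# Minimal sketch: guess a LoRA-variant's type from its state-dict keys.
# The substrings are heuristics based on the LyCORIS docs, not an API.
from safetensors import safe_open

def guess_adapter_type(path: str) -> str:
    with safe_open(path, framework="pt", device="cpu") as f:
        keys = list(f.keys())
    if any("hada" in k for k in keys):
        return "loha"   # Hadamard-product variant (LoHa)
    if any("lokr" in k for k in keys):
        return "lokr"   # Kronecker-product variant (LoKr)
    if any("lora_down" in k or "lora_A" in k for k in keys):
        return "lora"   # plain LoRA (kohya- or PEFT-style keys)
    return "unknown"

print(guess_adapter_type("my_adapter.safetensors"))  # hypothetical file
```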


> Is CFG++ supported by Diffusers?

I believe not.

Though the reForge webui implemented it:
https://github.com/Panchovix/stable-diffusion-webui-reForge

https://cfgpp-diffusion.github.io/
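For context, the core change CFG++ proposes is small. Here is a rough sketch of one DDIM-style step under the paper's formulation (illustration only, not a Diffusers or stablepy API; names are made up):

```python
import torch

def ddim_step_cfgpp(x_t, eps_cond, eps_uncond, a_t, a_prev, w):
    """One DDIM step with CFG++ (arXiv:2406.08070), guidance weight w in [0, 1]."""
    # Standard CFG would use the guided eps for BOTH the x0 estimate
    # and the re-noising term, with w typically > 1.
    eps_guided = eps_uncond + w * (eps_cond - eps_uncond)
    x0_hat = (x_t - (1 - a_t).sqrt() * eps_guided) / a_t.sqrt()
    # CFG++: re-noise with the UNCONDITIONAL prediction instead.
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps_uncond
```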

I have informed r3gm about guidance_rescale. Good night.πŸ˜ͺ

> This was also noted by sayakpaul here

If sayakpaul knows about it but it still isn't implemented, there may be some blocking issues...
It looks like it still hasn't been implemented even after the recent major LoRA refactor.

Edit:
PEFT's version:
https://huggingface.co/docs/peft/package_reference/adapter_utils

Edit:
So CFG++ isn't easy even in Forge (or reForge)?
https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1864

On a side note, here are the results each discrete sampling mode produces: automatic and epsilon produce a fully burned image, while v-prediction produces a somewhat coherent result, but the image is heavily oversaturated and very noisy.
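(For reference, in plain Diffusers the equivalent knob is the scheduler's prediction_type; a sketch with an example model ID, not the app's actual internals:)

```python
# Sketch: forcing the scheduler's prediction_type in plain Diffusers,
# which is roughly what the discrete sampling mode toggles map to.
# "automatic" would mean: use whatever the model's config says.
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, prediction_type="v_prediction"  # or "epsilon"
)
```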


Merry Christmas.
I actually know that model, but I don't think it's quite SDXL anymore. Specifically, I think it won't work unless we modify the Diffusers pipeline.🤔

@r3gm Merry Christmas! Also, congratulations on the new stable release of stablepy.
I think this is the kind of thing that would be better off waiting for a Community Pipeline or the release of Modular Diffusers. What do you think?
https://huggingface.co/nyanko7/nyaflow-xl-alpha
https://huggingface.co/spaces/nyanko7/toaru-xl-model

Extra: other SDXL enhancement projects from the community. I think the 8GB Pony will work perfectly with just a DiffuseCraft modification. Maybe we just need to add a mode to the GUI where loading is not fixed at torch_dtype=torch.float16.
https://huggingface.co/nyanko7/sdxl_smoothed_energy_guidance
https://civitai.com/models/1051705/ultrareal-8gb-pony

> not quite SDXL anymore

It's still an SDXL model; the difference is the sampling. Like the v-pred models, they're still SDXL models.
In fact, I did some merges with it, and they should work.

> Merry Christmas.
> I actually know that model, but I don't think it's quite SDXL anymore. Specifically, I think it won't work unless we modify the Diffusers pipeline.🤔
>
> @r3gm Merry Christmas! Also, congratulations on the new stable release of stablepy.
> I think this is the kind of thing that would be better off waiting for a Community Pipeline or the release of Modular Diffusers. What do you think?
> https://huggingface.co/nyanko7/nyaflow-xl-alpha
> https://huggingface.co/spaces/nyanko7/toaru-xl-model

Yes, it needs changes, as you said.

> Extra: other SDXL enhancement projects from the community. I think the 8GB Pony will work perfectly with just a DiffuseCraft modification. Maybe we just need to add a mode to the GUI where loading is not fixed at torch_dtype=torch.float16.
> https://huggingface.co/nyanko7/sdxl_smoothed_energy_guidance
> https://civitai.com/models/1051705/ultrareal-8gb-pony

An easy way to do it is: `model.pipe.text_encoder.to(torch.float32)` and `model.pipe.text_encoder_2.to(torch.float32)`.
But it might be a good idea to have a parameter in stablepy for that.

I think a major overhaul of the FlowMatch-related schedulers is still underway, so we'll probably just have to wait and see for a while. Both of these models touch that area significantly.😅

As for the behavior of fp32 CLIP: I keep the fp32 and bf16 combination during conversion so as not to damage the upcast weights, so with Diffusers and Transformers, if torch_dtype= is not passed, it will probably load as expected without any .to() casting.
If the current stablepy implementation passes torch_dtype= through as-is, this is probably a job for the UI side.

Or, even for a CLIP saved in fp16 precision, is there any significant benefit to simply casting it to fp32 for computation...?
That depends on the accumulated numerical error, so we won't know until we run some experiments (the difference will probably be minimal), but if there is a benefit, it might be worth including as an option in stablepy. After all, VRAM consumption only changes by about 2GB.
If there isn't much benefit, fp16 is fine.
https://huggingface.co/John6666/ultrareal-8gb-pony-v2hybrid-sdxl
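A quick way to measure this would be something like the following sketch (example repo and prompt; it compares the text embeddings from an fp32 and an fp16 copy of the same CLIP encoder):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

repo = "stabilityai/stable-diffusion-xl-base-1.0"  # example repo
device = "cuda"

tok = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
te_fp32 = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder").to(device)
te_fp16 = CLIPTextModel.from_pretrained(
    repo, subfolder="text_encoder", torch_dtype=torch.float16
).to(device)

ids = tok("a photo of a cat", return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    e32 = te_fp32(ids).last_hidden_state
    e16 = te_fp16(ids).last_hidden_state.float()
print("max abs diff:", (e32 - e16).abs().max().item())
```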

> I think a major overhaul of the FlowMatch-related schedulers is still underway, so we'll probably just have to wait and see for a while. Both of these models touch that area significantly.😅
>
> As for the behavior of fp32 CLIP: I keep the fp32 and bf16 combination during conversion so as not to damage the upcast weights, so with Diffusers and Transformers, if torch_dtype= is not passed, it will probably load as expected without any .to() casting.
> If the current stablepy implementation passes torch_dtype= through as-is, this is probably a job for the UI side.
>
> Or, even for a CLIP saved in fp16 precision, is there any significant benefit to simply casting it to fp32 for computation...?
> That depends on the accumulated numerical error, so we won't know until we run some experiments (the difference will probably be minimal), but if there is a benefit, it might be worth including as an option in stablepy. After all, VRAM consumption only changes by about 2GB.
> If there isn't much benefit, fp16 is fine.
> https://huggingface.co/John6666/ultrareal-8gb-pony-v2hybrid-sdxl

In my tests, I didn't notice much difference, but maybe I need to load the components separately to prevent them from being affected when I use torch.float16.
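If so, a sketch of that separate loading might look like this (example repo ID; as far as I know, components passed in explicitly keep their own dtype rather than being re-cast by torch_dtype):

```python
import torch
from transformers import CLIPTextModel, CLIPTextModelWithProjection
from diffusers import StableDiffusionXLPipeline

repo = "stabilityai/stable-diffusion-xl-base-1.0"  # example repo

# Load both text encoders in fp32 (the default when torch_dtype is omitted).
te1 = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")
te2 = CLIPTextModelWithProjection.from_pretrained(repo, subfolder="text_encoder_2")

# torch_dtype=torch.float16 should then only affect the components that
# are still loaded from the checkpoint (UNet, VAE), not the passed-in ones.
pipe = StableDiffusionXLPipeline.from_pretrained(
    repo,
    text_encoder=te1,
    text_encoder_2=te2,
    torch_dtype=torch.float16,
)
```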

Thanks for the verification!
So it seems that for models whose CLIP was downcast to fp16 at some point, which is the case for over 95% of models, it has no effect. It's only useful if the model was trained, saved, and released in fp32 precision, or if its weights were upcast.
Well, the good news is that fp16 CLIP calculations have no precision problems.
Anyway, it's not worth adding an option for.
