It does seem really cool but…
#9 opened by Mescalamba
…it's not really usable for end users because a) it needs way too much VRAM, and b) it really needs to be usable from some user-friendly GUI (ComfyUI is probably the easiest to adapt, thanks to custom nodes).
So I guess converting everything possible into GGUF would save some VRAM. Then, and I'm just guessing here, running inference in stages should solve the rest of the VRAM problem: first generate the instructions with Qwen2-VL, then either save that output, or cache it, unload Qwen2-VL, and just do regular FLUX inference (rough sketches of both ideas below).
Not saying that I know how to do this.
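For the GGUF part: diffusers can already load a GGUF-quantized FLUX transformer (it needs the `gguf` package installed), so something along these lines should cover the base model; whether the Qwen2-VL side of this repo quantizes the same way is an open question. The checkpoint below is just an example community quantization, not anything from this repo:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Example community GGUF quantization of the FLUX.1-dev transformer
ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,  # swap in the quantized transformer
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle submodules to save more VRAM
image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```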
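And a very rough sketch of the stage-and-unload idea, assuming the "instructions" from Qwen2-VL can be used as a plain text prompt; if the repo actually feeds Qwen2-VL embeddings into FLUX through a custom connector, you'd cache those tensors instead of the decoded text, but the load → generate → unload pattern is the same. Checkpoint names and the prompt are just examples:

```python
import gc
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from diffusers import FluxPipeline

# --- Stage 1: generate the "instructions" with Qwen2-VL ---
vlm_id = "Qwen/Qwen2-VL-7B-Instruct"  # example checkpoint
processor = AutoProcessor.from_pretrained(vlm_id)
vlm = Qwen2VLForConditionalGeneration.from_pretrained(
    vlm_id, torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("reference.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image as a detailed image-generation prompt."},
    ],
}]
chat = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[chat], images=[image], return_tensors="pt").to("cuda")
out = vlm.generate(**inputs, max_new_tokens=256)
prompt = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:],  # keep only the newly generated tokens
    skip_special_tokens=True,
)[0]

# --- Unload Qwen2-VL so FLUX gets the VRAM back ---
del vlm, inputs, out
gc.collect()
torch.cuda.empty_cache()

# --- Stage 2: regular FLUX inference with the cached prompt ---
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keeps submodules on CPU until they're needed
pipe(prompt, num_inference_steps=28).images[0].save("output.png")
```

On the ComfyUI side, I believe the ComfyUI-GGUF custom node already handles the quantized-FLUX half of this, so a custom node for this model would "only" need to add the Qwen2-VL stage.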