Can't run on Nvidia L4 Instance

by DSpider19

Hello,

I'm trying to run FLUX.1-Redux-dev for image variation on an L4 instance (24 GB VRAM), but I keep running into memory issues.
The code is very simple:

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image

# Redux prior: turns the input image into the embeddings Flux consumes.
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Base pipeline; the text encoders are dropped since the prior supplies the embeddings.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
pipe_prior_output = pipe_prior_redux(image)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images
images[0].save("flux-dev-redux.png")

This is the snippet that Black Forest Labs provide themselves, by the way.

I even tried offloading to CPU, but I still get CUDA out-of-memory errors.
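
For reference, the offload variant I tried looks roughly like this (a minimal sketch using diffusers' enable_model_cpu_offload; note the pipelines are not moved to "cuda" first, since offloading manages device placement itself):

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline

# Load on CPU; offloading moves sub-models to the GPU on demand.
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16,
)

# Keep only one sub-model on the GPU at a time instead of both full pipelines.
pipe_prior_redux.enable_model_cpu_offload()
pipe.enable_model_cpu_offload()
# More aggressive (and much slower) alternative:
# pipe.enable_sequential_cpu_offload()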

Any leads? Can it even run on 24 GB of VRAM?
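
In case it helps with diagnosis, this is how I'm measuring peak usage around the failing call (standard torch.cuda memory stats; the variable names come from the snippet above):

import torch

# Measure peak VRAM around the call that fails, to see how far past 24 GB it goes.
torch.cuda.reset_peak_memory_stats()
try:
    images = pipe(
        guidance_scale=2.5,
        num_inference_steps=50,
        generator=torch.Generator("cpu").manual_seed(0),
        **pipe_prior_output,
    ).images
except torch.cuda.OutOfMemoryError:
    pass
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")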

Thank you
