Huge memory consumption with SD3.5-medium

#18
by oddball516 - opened

According to the chart here, SD3.5-medium should run fine within 10GB of VRAM:
https://stability.ai/news/introducing-stable-diffusion-3-5

However, my test program fails on a g4dn.xlarge AWS instance (4 vCPUs, 16GB RAM plus 48GB swap, and a Tesla T4 GPU with 16GB of VRAM). It runs out of memory because CUDA can't allocate any more: nvidia-smi shows the process already using ~15GB, and it can't complete even one picture.

I'm wondering what's going wrong here.

Full source code attached below.

import os
import json
import torch

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("./stable-diffusion-3.5-medium/")
if torch.cuda.is_available():
    print('use cuda')
    pipe = pipe.to("cuda")
elif torch.backends.mps.is_available():
    print('use mps')
    pipe = pipe.to('mps')
else:
    print('use cpu')

with open('data.json', 'r') as f:
    data = json.load(f)

os.makedirs('output', exist_ok=True)
for row in data:
    prompt   = '%s, style is %s, light is %s' % (row['prompt'], row['style'], row['light'])
    filename = 'output/%s.png' % (row['uuid'])
    # 1:1 by default; for 16:9 / 9:16 shrink one side, keeping the long edge at 1280.
    height   = 1280
    width    = 1280

    if row['aspect_ratio'] == '16:9':
        height = 720
    elif row['aspect_ratio'] == '9:16':
        width = 720
    
    print('saving', filename)
    image = pipe(prompt, height=height, width=width).images[0]
    image.save(filename)

Did it resolve for you?

@yue32000 @oddball516
The cause is the T5 text encoder; you can resolve it with
pipe.enable_model_cpu_offload()
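
For reference, here is a minimal sketch of how that fix slots into the script above. The local ./stable-diffusion-3.5-medium/ path is taken from the original post; loading in torch.bfloat16 (as the SD3.5 release recommends) is an extra saving over the default float32, though on a pre-Ampere card like the T4, torch.float16 may be the safer half-precision choice:

import torch
from diffusers import DiffusionPipeline

# Load the weights in half precision instead of the default float32,
# roughly halving the memory footprint of every sub-model.
pipe = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-3.5-medium/",
    torch_dtype=torch.bfloat16,  # consider torch.float16 on a T4
)

# Replaces pipe.to("cuda"): each sub-model (including the large T5
# text encoder) stays in CPU RAM and is moved onto the GPU only for
# the duration of its own forward pass. Requires the accelerate package.
pipe.enable_model_cpu_offload()

image = pipe("a photo of a cat", height=1024, width=1024).images[0]
image.save("cat.png")

Note that enable_model_cpu_offload() manages device placement itself, so it must not be combined with a manual pipe.to("cuda").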
