Super-resolution in DeepFloyd is bugged for any non-64x64 to 256x256 upscales #3289

Closed
AmericanPresidentJimmyCarter opened this issue Apr 29, 2023 · 1 comment · Fixed by #3298
Labels
bug Something isn't working

Comments

@AmericanPresidentJimmyCarter
Contributor

Describe the bug

The super-resolution code is currently bugged: it does not accept height and width, and instead always assigns a height and width of 256x256. This results in squished images when you super-resolve any first-stage output that is not 64x64. The relevant lines:

height = self.unet.config.sample_size
width = self.unet.config.sample_size
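
A minimal sketch of why this squishes non-square inputs, assuming the stage 2 sample_size is 256 (as the 256x256 figure above implies). The helper name `axis_scale_factors` is hypothetical and is not part of diffusers; it only works through the arithmetic for the 96x64 input used in the reproduction.

```python
def axis_scale_factors(in_height, in_width, sample_size=256):
    """Return (vertical, horizontal) scale factors when an input of
    in_height x in_width is forced to sample_size x sample_size.

    sample_size=256 is an assumption taken from the issue text.
    """
    return sample_size / in_height, sample_size / in_width


# 96x64 (height x width) stage 1 output, as in the reproduction
vertical, horizontal = axis_scale_factors(96, 64)
print(f"vertical x{vertical:.2f}, horizontal x{horizontal:.2f}")
# prints: vertical x2.67, horizontal x4.00
```

Because the two factors differ, the image is stretched more along one axis than the other. For a 64x64 input both factors are 4, which is why the square case looks correct.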

[Image: stage1-2]

[Image: ch62g01fYFz4]

Solution: Just allow height and width to be passed in. I forked the class and did this manually and it works fine.
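
One way the proposed behaviour could look, as a sketch only. The function name `resolve_output_size` and the default `scale=4` are assumptions (the factor is inferred from the 64x64 to 256x256 case in the title); this is not the diffusers implementation, nor necessarily what #3298 does.

```python
def resolve_output_size(in_height, in_width, height=None, width=None, scale=4):
    """Pick the super-resolution output size (hypothetical helper).

    Explicit height/width take priority. Otherwise each side of the
    stage 1 image is multiplied by `scale`, preserving its aspect ratio.
    """
    if height is None:
        height = in_height * scale
    if width is None:
        width = in_width * scale
    return height, width


print(resolve_output_size(96, 64))  # (384, 256)
print(resolve_output_size(64, 64))  # (256, 256)
print(resolve_output_size(96, 64, height=768, width=512))  # (768, 512)
```

With a default derived from the input, the square 64x64 case keeps its current 256x256 result, while non-square inputs are no longer forced into a square.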

Reproduction

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()


prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
    height=96,
    width=64,
).images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

Logs

No response

System Info

Python 3.10.6, diffusers on latest main

@patrickvonplaten
Contributor

Yes, I think we should indeed allow this! Continuing the discussion in #3298.
