
Sana activations explode / clamping issue #10336

Open
Nerogar opened this issue Dec 21, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@Nerogar

Nerogar commented Dec 21, 2024

Describe the bug

I'm using the pretrained weights from Efficient-Large-Model/Sana_1600M_1024px_diffusers. I don't know if this is an issue with these weights, or if the implementation is broken.

Things I've observed so far:

  • using fp16 calculations usually generates good enough results
  • setting everything to fp32 (weights and autocast contexts) completely breaks the output

The attention output here is very different between the fp16 and fp32 versions.

The hidden_states are in the +/-5*10^5 range here (sometimes even higher; I've seen values as high as 1.3*10^6).
Using fp16 calculations, they become inf and are clamped to (-65504, 65504) (about 6.5*10^4, more than an order of magnitude smaller). Using fp32 calculations, this clamping is not done, which means the output of that attention block also differs.

Enabling this clamping even for fp32 calculations fixes the issue, but it seems like a hack: the clamping operation looks like a safeguard, not an essential part of the attention calculation. Adding print(f"hidden_states: {hidden_states}") just before and after the clamping operation shows the issue clearly.
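The overflow-then-clamp behaviour can be reproduced in isolation. This is a minimal numpy sketch of the numerics only (the issue itself concerns torch tensors inside the attention block):

```python
import numpy as np

FP16_MAX = 65504.0  # largest finite float16 value

# Activations in the range observed in the attention block
x = np.array([1.3e6, -5.0e5, 1.0], dtype=np.float32)

# Casting to fp16 overflows the large values to +/-inf ...
x_fp16 = x.astype(np.float16)

# ... so the fp16 code path clamps first; the fp32 path skips this
# step entirely, which is why the two outputs diverge
x_clamped = np.clip(x, -FP16_MAX, FP16_MAX)
```

The clamped fp32 activations stay more than an order of magnitude below the unclamped ones, so everything downstream of the attention block sees different inputs.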

Here are some examples (all using the same prompt/seed/cfg/sampler/etc.)


fp16 weights (with clamping):
[image: fp16-clamped]

fp32 weights (without clamping):
[image: fp32-not-clamped]

fp32 weights (with clamping):
[image: fp32-clamped]

(tagging @lawrence-cj as the original author)

Reproduction

import torch
from diffusers import SanaPipeline

if __name__ == '__main__':
    generator = torch.Generator(device="cuda")
    generator.manual_seed(42)

    pipe = SanaPipeline.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers")
    pipe.to("cuda")
    pipe.text_encoder.to(torch.bfloat16)
    pipe.transformer = pipe.transformer.to(torch.float32) # <--- change the dtype here

    image = pipe(
        prompt='a water color painting of a bear',
        complex_human_instruction=None,
        generator=generator,
    )[0]
    image[0].save("debug/output.png")

Logs

No response

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Windows-10-10.0.22631-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.8
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.47.0
  • Accelerate version: 1.0.1
  • PEFT version: not installed
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: 0.0.28.post3
  • Accelerator: NVIDIA RTX A5000, 24564 MiB
  • Using GPU in script?: CUDA / NVIDIA RTX A5000
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

@Nerogar Nerogar added the bug Something isn't working label Dec 21, 2024
@Nerogar
Author

Nerogar commented Dec 21, 2024

Update: switching to the Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers weights instead seems to have fixed this issue. So I guess it's probably not a bug in the implementation, but instead with the model conversion process.

@vladmandic
Contributor

I went through the same rabbit hole; see #10241 for details.

@Nerogar
Author

Nerogar commented Dec 21, 2024

Yes, I saw that. I wasn't sure whether to add a comment to that issue or create a new one; I decided on a new one since the other was already closed. This definitely seems like something that isn't intended. Even if we use Efficient-Large-Model/Sana_1600M_1024px_diffusers with fp16 weights, the result is worse than the bf16 version.

Here is another comparison.

bf16 weights using Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers:
[image: bf16]

fp16 weights using Efficient-Large-Model/Sana_1600M_1024px_diffusers:
[image: fp16-clamped]

Prompt was "a water color painting of a bear", and as you can see, the bf16 version looks a lot more like an actual watercolor. I've described why this happens in my initial post: something in the self-attention is broken in the fp16 version, so the model can't properly produce details and styles.

I can kind of understand that converting an fp16 model to bf16 doesn't always work; those data types aren't fully compatible. But upcasting to fp32 should never reduce quality; that's an obvious sign of a problem. And in any case, the weights of Efficient-Large-Model/Sana_1600M_1024px_diffusers are stored in fp32 format, so there isn't even any conversion going on.

@vladmandic
Contributor

Good job finding the exact spot.
cc @lawrence-cj, can you take a look?

@lawrence-cj
Contributor

lawrence-cj commented Dec 22, 2024

> I can kind of understand that converting an fp16 model to bf16 doesn't always work. Those data types aren't really compatible. But upcasting to fp32 should never reduce quality. That's an obvious sign of a problem. And in any case, the weights of Efficient-Large-Model/Sana_1600M_1024px_diffusers are stored in fp32 format, so there isn't even any conversion going on

This problem is due to the fact that we add value clamping during mixed-precision training (here): the model never saw values outside the range (-65504, 65504). So when you run FP32 or BF16 inference with the FP16-trained model, the self-attention output is not clamped (refer to here), and that's why it won't give you the desired results. We provide the FP32 model only for reference, in case someone needs it for fine-tuning or similar. If this is confusing, should we just remove the FP32 version of the safetensors from our FP16-trained models?
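The workaround being discussed can be sketched as follows. This is a hedged illustration in numpy, and `clamp_attn_output` is a hypothetical helper name, not diffusers' actual API:

```python
import numpy as np

FP16_MAX = 65504.0  # equivalent to torch.finfo(torch.float16).max

def clamp_attn_output(hidden_states: np.ndarray) -> np.ndarray:
    """Hypothetical helper: clamp the attention output to the fp16
    range regardless of compute dtype, mirroring the safeguard that
    was active during mixed-precision training."""
    return np.clip(hidden_states, -FP16_MAX, FP16_MAX)
```

In fp16 the clamp only catches overflowed values; applying it unconditionally keeps fp32/bf16 inference on the same numerical trajectory the model saw during training.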

Cc: @vladmandic @Nerogar

@lawrence-cj
Contributor

lawrence-cj commented Dec 22, 2024

> Prompt was "a water color painting of a bear", and as you can see, the bf16 version looks a lot more like actual water color.

I don't think this is caused by the precision; at least, I don't have proof of it. If you have any insight, please let me know; I'm curious about it. @Nerogar

@Nerogar
Author

Nerogar commented Dec 22, 2024

To be honest, I don't really see the point of having the fp16 weights at all. If I load Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers and convert to fp16 on the fly, I get exactly the same result as with the bf16 weights.

To me it looks like those weights are just broken and there is no point in using them.

@lawrence-cj
Contributor

We set BF16 as the default checkpoint; the original fp16 models will serve as a reference, in case someone needs to compare.
