
Sana broke on macOS: grey images on MPS, NaNs on CPU. #10334

Open
Vargol opened this issue Dec 21, 2024 · 6 comments
Labels
bug Something isn't working

Comments


Vargol commented Dec 21, 2024

Describe the bug

Just started playing with Sana. I was excited when I saw it was coming to Diffusers, as the NVIDIA-supplied code was full of CUDA-only stuff.
I ran the example code, changing cuda to mps, and got a grey image.

(attached image: output.png — a uniform grey image)

Removing the move to MPS and running on the CPU instead, the script failed with

image_processor.py:147: RuntimeWarning: invalid value encountered in cast

which suggests the latents contained NaNs on the CPU as well.
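That warning comes from NumPy's float-to-uint8 cast: a single NaN anywhere in the decoded image array triggers it. A minimal, diffusers-independent reproduction of the same warning (assuming NumPy ≥ 1.24, where invalid-cast warnings are emitted):

```python
import warnings
import numpy as np

# One NaN in the decoded image array is enough; the other values can be
# perfectly ordinary floats in [0, 1].
images = np.array([[0.5, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The exact cast diffusers performs in image_processor.py:147:
    out = (images * 255).round().astype("uint8")

# On NumPy >= 1.24 this records "invalid value encountered in cast";
# the NaN slot ends up holding an undefined uint8 value.
print(out.dtype, out[0, 0])
```

So the warning itself is only a symptom: the NaNs are already present in the latents before the post-processing cast.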

Reproduction

import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers", torch_dtype=torch.float32
)
pipe.to("mps")
# Per-component dtype casts, as in the upstream example code:
pipe.text_encoder.to(torch.bfloat16)
pipe.transformer = pipe.transformer.to(torch.float16)

image = pipe(prompt='a cyberpunk cat with a neon sign that says "Sana"')[0]
image[0].save("output.png")

For the CPU run, the pipe.to("mps") line was removed.
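To pin down at which denoising step the NaNs first appear, one option is a per-step check on the latents. A sketch, assuming SanaPipeline supports the standard diffusers callback_on_step_end hook the way other recent pipelines do (the check_latents name is just illustrative):

```python
import torch

def latents_are_finite(latents: torch.Tensor) -> bool:
    """True when the tensor contains no NaN or Inf entries."""
    return bool(torch.isfinite(latents).all())

# Per-step hook: inspect the latents and pass them through unchanged.
def check_latents(pipe, step, timestep, callback_kwargs):
    if not latents_are_finite(callback_kwargs["latents"]):
        print(f"non-finite latents at step {step} (t={timestep})")
    return callback_kwargs

# Wiring it into the pipeline call would look something like:
# image = pipe(
#     prompt=...,
#     callback_on_step_end=check_latents,
#     callback_on_step_end_tensor_inputs=["latents"],
# )[0]
```

If the first non-finite step is step 0, the problem is upstream (prompt embeds or initial noise); if it appears mid-run, it points at the transformer in the cast-down dtype.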

Logs

*** MPS run ***
(Diffusers) $ python sana_test.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.03s/it]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.18s/it]

Setting `clean_caption=True` requires the Beautiful Soup library but it was not found in your environment. You can install it with pip:
`pip install beautifulsoup4`. Please note that you may need to restart your runtime after installation.

Setting `clean_caption` to False...
The 'batch_size' argument of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.

Setting `clean_caption=True` requires the Beautiful Soup library but it was not found in your environment. You can install it with pip:
`pip install beautifulsoup4`. Please note that you may need to restart your runtime after installation.

Setting `clean_caption` to False...
100%|███████████████████████████████████████████████████████████████████████████████████| 20/20 [00:49<00:00,  2.48s/it]
(Diffusers) $ 

***CPU run***

(Diffusers) $ python sana_test.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.13s/it]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:07<00:00,  1.41s/it]

Setting `clean_caption=True` requires the Beautiful Soup library but it was not found in your environment. You can install it with pip:
`pip install beautifulsoup4`. Please note that you may need to restart your runtime after installation.

Setting `clean_caption` to False...
The 'batch_size' argument of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.

Setting `clean_caption=True` requires the Beautiful Soup library but it was not found in your environment. You can install it with pip:
`pip install beautifulsoup4`. Please note that you may need to restart your runtime after installation.

Setting `clean_caption` to False...
100%|███████████████████████████████████████████████████████████████████████████████████| 20/20 [20:14<00:00, 60.74s/it]
/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/image_processor.py:147: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")
(Diffusers) $

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: macOS-15.2-arm64-arm-64bit
  • Running on Google Colab?: No
  • Python version: 3.11.10
  • PyTorch version (GPU?): 2.6.0.dev20241219 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.0
  • Transformers version: 4.47.1
  • Accelerate version: 0.34.2
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: Apple M3
  • Using GPU in script?: both
  • Using distributed or parallel set-up in script?: no

Who can help?

@pcuenca

Vargol added the bug label on Dec 21, 2024
pcuenca (Member) commented Dec 21, 2024

Hello @Vargol. Just to understand this better, did it "break", or did it never work?

Vargol (Author) commented Dec 21, 2024

It's the first time I've tried it, so technically it never worked.

lawrence-cj (Contributor) commented Dec 22, 2024

import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("mps")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'Self-portrait oil painting, a beautiful cyborg with golden hair, 8k'
image = pipe(prompt=prompt)[0]
image[0].save("sana.png")

Please try this BF16 variant of the model, @Vargol.
If it works for you, let me know.

Vargol (Author) commented Dec 22, 2024

Okay, the BF16 version of the model seems to be working. I'm not sure about the quality of the image, but I guess that could come down to prompting, non-optimal parameters, or personal taste.

(attached image: sana.png)

a-r-r-o-w (Member) commented
@Vargol Just curious, what is the max unified memory usage for the above?

Vargol (Author) commented Dec 22, 2024

I haven't used any fancy memory-monitoring tools, but Activity Monitor was reporting 18.4 GB of memory usage during iteration, peaking at 20 GB, I assume during the decode.

After the run completed and python had returned, usage was back around 8 GB. Note there were plenty of cached files in memory and only 230 MB of swap in use, so Sana itself was probably around 10 GB, and 12 GB during the decode, barring any brief spikes Activity Monitor didn't pick up.

EDIT: Just ran another run; the python process was reported at 10.43 GB during iteration and 12.98 GB during the decode.
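For a number that isolates PyTorch's own footprint from file cache and other processes, recent PyTorch builds expose MPS allocator counters. A small sketch (the bytes_to_gib helper is just for readability):

```python
import torch

def bytes_to_gib(n: int) -> float:
    """Convert a raw byte count to GiB, rounded to two decimals."""
    return round(n / 2**30, 2)

if torch.backends.mps.is_available():
    # current_allocated_memory: bytes held by live tensors on MPS.
    # driver_allocated_memory: total Metal memory reserved by the process,
    # which is closer to what Activity Monitor reports.
    print("allocated:", bytes_to_gib(torch.mps.current_allocated_memory()), "GiB")
    print("driver:   ", bytes_to_gib(torch.mps.driver_allocated_memory()), "GiB")
```

Calling these right before and after pipe(...) brackets the pipeline's peak usage independently of system-level caching.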
