
Conversation

@adi776borate (Contributor) commented Dec 9, 2025

What does this PR do?

Fixes #12809. This PR does so by:

  1. Removing the @torch.autocast decorator (fixes the import warning).
  2. Explicitly casting inputs to float32 inside the forward method (preserves the required numerical stability).
  3. Casting the result back to weight.dtype before passing it to the Linear layers (fixes the dtype-mismatch crash); a sketch of the resulting forward pass follows this list.
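
For concreteness, here is a minimal sketch of the resulting time-embedding forward pass, assembled from the diff hunks quoted in the review threads below. The class name, __init__, and frequency construction are illustrative stand-ins, not the actual diffusers code:

import math

import torch
import torch.nn as nn

class TimeEmbeddingsSketch(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # Hypothetical sinusoidal frequency table (dim // 2 entries).
        self.register_buffer(
            "freqs", torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2) / dim)
        )
        self.in_layer = nn.Linear(dim, hidden_dim)

    def forward(self, time):
        # Do the numerically sensitive sinusoid math in float32 ...
        time = time.to(dtype=torch.float32)
        freqs = self.freqs.to(device=time.device, dtype=torch.float32)
        args = torch.outer(time, freqs)
        time_embed = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
        # ... then cast back to the Linear layer's weight dtype so the
        # matmul dtypes agree when the pipeline is loaded in fp16/bf16.
        time_embed = time_embed.to(dtype=self.in_layer.weight.dtype)
        return self.in_layer(time_embed)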

Verification

I verified that the results remain stable before and after this change by generating images with a fixed seed (generator=torch.manual_seed(42)).

The results are almost the same with some minor differences.

Before fix vs. after fix:

[Images: kandinsky_before_fix, kandinsky_after_fix]
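
To quantify "almost the same", the two PNGs produced by the reproduction script below can be compared pixel-wise. A small sketch (assumes both images exist and have the same size):

import numpy as np
from PIL import Image

before = np.asarray(Image.open("kandinsky_before_fix.png"), dtype=np.float32)
after = np.asarray(Image.open("kandinsky_after_fix.png"), dtype=np.float32)
diff = np.abs(before - after)
print(f"max abs pixel diff:  {diff.max():.1f} / 255")
print(f"mean abs pixel diff: {diff.mean():.3f} / 255")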
Reproduction Script
import torch
from diffusers import Kandinsky5T2IPipeline

model_id = "kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers"
device = "cuda" if torch.cuda.is_available() else "cpu"

dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32
pipe = Kandinsky5T2IPipeline.from_pretrained(model_id, torch_dtype=dtype)
pipe.to(device)

seed = 42
generator = torch.Generator(device=device).manual_seed(seed)

print("Generating image...")
output = pipe(
    prompt="A cat and a dog baking a cake together in a kitchen.",
    negative_prompt="",
    num_inference_steps=25, # Reduced for faster verification
    guidance_scale=3.5,
    height=1024,
    width=1024,
    generator=generator, 
)

image = output.images[0]
image.save("kandinsky_after_fix.png")

Who can review?

@yiyixuxu @leffff
Anyone in the community is free to review the PR once the tests have passed.

@leffff (Contributor) commented Dec 9, 2025

Looks good to me!

@knd0331 commented Dec 10, 2025

Thanks for the quick fix! I didn't have time to submit a PR myself, so I really appreciate you jumping on this. 🙏
@adi776borate

@adi776borate (Contributor, Author) commented:

@yiyixuxu @sayakpaul
A gentle ping to review

@sayakpaul (Member) left a comment:

Thank you! Could you also provide your testing script?

@adi776borate (Contributor, Author) replied:

> Thank you! Could you also provide your testing script?

The verification script is already provided in the PR description above.
If you want a minimal test, you can just do:

from diffusers.models.transformers import transformer_kandinsky
print("Import successful.")

On main this should emit a UserWarning (on non-CUDA systems), but on this branch it should not.
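
A scriptable version of that check could capture the warning explicitly. A sketch (run in a fresh interpreter so the module is not already imported and cached):

import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from diffusers.models.transformers import transformer_kandinsky  # noqa: F401

# On main (non-CUDA machine) this prints the autocast UserWarning;
# on this branch the list should stay empty.
for w in caught:
    print(w.category.__name__, w.message)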

@sayakpaul sayakpaul requested a review from yiyixuxu December 11, 2025 12:00

@yiyixuxu (Collaborator) left a comment:

thanks!

@yiyixuxu (Collaborator) commented:

@bot /style

@github-actions (bot) commented Dec 11, 2025

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev (bot) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Review thread on transformer_kandinsky.py (diff context):

-@torch.autocast(device_type="cuda", dtype=torch.float32)
 def forward(self, time):
-    args = torch.outer(time, self.freqs.to(device=time.device))
+    time = time.to(dtype=torch.float32)

@yiyixuxu (Collaborator) commented:

Suggested change
-    time = time.to(dtype=torch.float32)
+    original_dtype = time.dtype
+    time = time.to(dtype=torch.float32)

Further diff context:

+    freqs = self.freqs.to(device=time.device, dtype=torch.float32)
+    args = torch.outer(time, freqs)
+    time_embed = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
+    time_embed = time_embed.to(dtype=self.in_layer.weight.dtype)

@yiyixuxu (Collaborator) commented:

Suggested change
-    time_embed = time_embed.to(dtype=self.in_layer.weight.dtype)
+    time_embed = time_embed.to(dtype=original_dtype)

@adi776borate (Contributor, Author) replied:

The reason I cast to self.in_layer.weight.dtype instead of original_dtype is to prevent runtime crashes on backends like XPU, as mentioned by @vladmandic here.
If users load the pipeline in float16 and we pass time_embed as float32, that will raise an error, won't it?
I might be wrong; correct me if so.
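
That concern is easy to demonstrate in isolation. A minimal sketch, independent of diffusers:

import torch

# A Linear layer held in half precision, as when loading with torch_dtype=torch.float16.
lin = torch.nn.Linear(8, 8, dtype=torch.float16)
x = torch.randn(2, 8, dtype=torch.float32)

try:
    lin(x)
except RuntimeError as e:
    print(e)  # e.g. "mat1 and mat2 must have the same dtype ..."

# Casting to the weight's dtype first avoids the crash:
out = lin(x.to(lin.weight.dtype))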


Review thread on a second forward (diff context):

-@torch.autocast(device_type="cuda", dtype=torch.float32)
 def forward(self, x):
+    x = x.to(dtype=self.out_layer.weight.dtype)

@yiyixuxu (Collaborator) commented:

umm actually this did not look correct to me - we want to upcast it to float32, no?


@adi776borate (Contributor, Author) replied:

Similarly, if we force x to float32 here, we might hit the same mismatch crash if the out_layer weights are float16/bfloat16.
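
In other words, the pattern on this branch (as I read the diff) is: keep upstream math in float32, and convert right at the boundary of the Linear call. A hedged sketch with an illustrative module name:

import torch
import torch.nn as nn

class OutLayerSketch(nn.Module):
    # Hypothetical stand-in for the module under review.
    def __init__(self, dim: int):
        super().__init__()
        self.out_layer = nn.Linear(dim, dim)

    def forward(self, x):
        # x may arrive as float32 from upstream float32-sensitive math;
        # match the weight dtype (fp16/bf16 under torch_dtype) instead of
        # forcing float32, so the matmul dtypes agree on all backends.
        x = x.to(dtype=self.out_layer.weight.dtype)
        return self.out_layer(x)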



Development

Successfully merging this pull request may close this issue:

Kandinsky5TimeEmbeddings hardcodes 'cuda' in @torch.autocast decorator, causing warning on non-CUDA systems (#12809)
