
Conversation

@Haoming02 (Contributor) commented Dec 9, 2025

  • Convert the block_index from int to a torch.Tensor, to support torch.compile (a minimal sketch follows below)
    • dtype: torch.uint8
    • device: ~~input's device~~ cpu
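
A minimal sketch of the described change, assuming illustrative names (`make_block_info` is not the actual ComfyUI function):

```python
import torch

# Hedged sketch: store block_index as a 0-dim uint8 tensor instead of a
# Python int. torch.compile guards on tensor metadata (dtype/device/shape)
# rather than specializing on each int value, which avoids recompiling
# once per block index.
def make_block_info(block_index: int, device: str = "cpu") -> torch.Tensor:
    return torch.tensor(block_index, dtype=torch.uint8, device=device)

block_info = make_block_info(3)  # -> tensor(3, dtype=torch.uint8)
```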

@Kosinkadink (Member)

@kijai could you check if this fixes the torch.compile graph breaks?

@kijai (Contributor) commented Dec 9, 2025

Yes, tested on HunyuanVideo 1.5 and went from 54 recompiles on first step to none.
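
For reference, one way to observe such recompiles with standard PyTorch tooling (not specific to this PR) is dynamo's recompile logging:

```python
import torch

# Print the reason each time torch.compile recompiles a function;
# equivalent to setting TORCH_LOGS="recompiles" in the environment.
torch._logging.set_logs(recompiles=True)
```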

@Kosinkadink (Member)

Sweet, I'll merge it in after stable!

@Kosinkadink (Member) commented Dec 11, 2025

@Haoming02 could you make the tensors live on the CPU instead?

@Kosinkadink added the Core (Core team dependency) label Dec 11, 2025
@Kosinkadink (Member)

@kijai hey, could you retest this to make sure that the CPU tensors work fine with torch.compile?

@kijai (Contributor) commented Dec 11, 2025

> @kijai hey, could you retest this to make sure that the CPU tensors work fine with torch.compile?

Creating it on CPU is fine, but actually using a CPU tensor is a bit problematic: with inductor it requires CPU compile support, which in turn needs more compiler libraries than at least the current Triton-windows package includes, and if you don't have them it just errors out.

You can of course cast it to GPU before using it; that then produces a `DeviceCopy in input program` warning, but I'm unsure what this affects.
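
A sketch of that cast, with illustrative names; per the comment above, the resulting host-to-device copy is what surfaces the `DeviceCopy in input program` warning:

```python
import torch

# Hedged sketch: the scalar is created on CPU, then moved to the model's
# device before being used inside the compiled region. The host-to-device
# copy triggers inductor's "DeviceCopy in input program" warning.
block_index = torch.tensor(3, dtype=torch.uint8)         # created on CPU
block_index = block_index.to("cuda", non_blocking=True)  # cast before use
```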

@Kosinkadink (Member)

Damn, that is definitely annoying. The goal is for the block index to be used inside the attention code, where it's only ever compared. For that comparison, would it be better to compare against a GPU tensor or a CPU tensor? If a GPU tensor would be easier, we can edit this PR to use GPU.

Alternatively, is there a way to tell torch.compile to ignore transformer_options, or at least that one key?

@woct0rdho (Contributor)

I'm also waiting for blockinfo to be merged so I can implement things like RadialAttn more easily. I think it's fine to create a scalar tensor on the GPU and compare against it; we've done this before, for example in https://github.com/comfyanonymous/ComfyUI/blob/c5a47a16924e1be96241553a1448b298e57e50a1/comfy/extra_samplers/uni_pc.py#L785
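
A sketch of that pattern with hypothetical names (`target` and `maybe_patch` are not the actual attention code): keep the reference value as a scalar tensor on the same device and select branch-free, so the comparison never has to become a Python bool inside the compiled region:

```python
import torch

# Illustrative only: block_index is assumed to be a 0-dim uint8 tensor
# on the same device as `target`.
target = torch.tensor(5, dtype=torch.uint8, device="cuda")

def maybe_patch(attn_out, patched_out, block_index):
    # torch.where broadcasts the 0-dim bool condition; no .item() call,
    # so torch.compile can trace this without a graph break.
    return torch.where(block_index == target, patched_out, attn_out)
```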

@Kosinkadink (Member)

Comfy merged a PR that makes values in transformer_options cause far fewer graph breaks, so we can stick with plain integers: #11317

I'll be closing this PR, and once the tensors get turned back into ints in the Lumina BlockInfo PR, I can merge that one! #11227

@Haoming02 deleted the bInfo-Tensor branch December 16, 2025
