@logicalor
fix: replace repeat_interleave with portable impl for ROCm compatibility (#479)

torch.repeat_interleave triggers hipErrorIllegalState on AMD ROCm 6.4
when using tensor indices. This change adds portable_repeat_interleave(),
which uses index_select on ROCm and keeps the native
torch.repeat_interleave behavior on CUDA and CPU.

Fixes call sites in:

  • src/models/dit_7b/modulation.py
  • src/models/dit_3b/modulation.py
  • src/models/dit_7b/nablocks/mmsr_block.py
  • src/models/dit_3b/nablocks/attention/mmattn.py

The portable implementation handles both scalar and tensor repeats,
with debug logging support for tensor state diagnostics.
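
For readers without the diff at hand, here is a minimal sketch of how an index_select-based replacement could look. It is not the code from this PR: the helper `_gather_index`, the `force_portable` flag, and the use of `torch.cumsum` plus `torch.searchsorted` to build the gather index are assumptions made for illustration, and debug logging is omitted. ROCm is detected via `torch.version.hip`, which is `None` on non-ROCm builds.

```python
import torch


def _is_rocm() -> bool:
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


def _gather_index(repeats, size, device):
    """Build the index that selects element i `repeats[i]` times (hypothetical helper)."""
    if isinstance(repeats, int):
        # Scalar repeats: [0, 0, ..., 1, 1, ...] via expand + reshape.
        base = torch.arange(size, device=device)
        return base.unsqueeze(1).expand(size, repeats).reshape(-1)
    repeats = repeats.to(device=device, dtype=torch.long)
    if repeats.numel() == 0:
        return torch.empty(0, dtype=torch.long, device=device)
    if repeats.numel() == 1:
        # A one-element tensor broadcasts like a scalar.
        return _gather_index(int(repeats.item()), size, device)
    # Tensor repeats: output position p belongs to the first source index i
    # whose cumulative repeat count exceeds p.
    ends = torch.cumsum(repeats, dim=0)
    total = int(ends[-1].item())
    positions = torch.arange(total, device=device)
    return torch.searchsorted(ends, positions, right=True)


def portable_repeat_interleave(x, repeats, dim=None, force_portable=False):
    """Sketch only: native op on CUDA/CPU, index_select path on ROCm.

    `force_portable` is a hypothetical flag so the fallback can be
    exercised on machines without ROCm.
    """
    if not (force_portable or _is_rocm()):
        return torch.repeat_interleave(x, repeats, dim=dim)
    if dim is None:
        # Match the native op, which flattens the input when dim is None.
        x = x.reshape(-1)
        dim = 0
    index = _gather_index(repeats, x.size(dim), x.device)
    return torch.index_select(x, dim, index)
```

With such a helper, a call site would change from `torch.repeat_interleave(x, repeats, dim=0)` to `portable_repeat_interleave(x, repeats, dim=0)`, leaving CUDA and CPU results unchanged.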