kshitij12345 (Collaborator) commented Oct 7, 2025

Depends on #2598

Changes

  • Allow the nvFuser executor to handle AsyncCollectiveTensor, which may show up in programs that use DTensor.
  • Update the nvFuser executor to create separate fusion regions (each with its own FusionDefinition) for multi-device and single-device operations.
  • Update dtensor_mul to handle broadcasting and type promotion correctly.
  • Add a dtensor_silu DTensor symbol composed of existing prims.
  • Enable the thunderfx path for the distributed test_moe.py test.
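
For reviewers less familiar with the semantics behind the dtensor_mul and dtensor_silu items, here is a minimal pure-Python sketch of the two ideas. It is not the code in this PR: `broadcast_shape` and `silu` are hypothetical helper names, the broadcasting rule shown is the usual right-aligned NumPy/PyTorch convention, and the SiLU composition assumes the standard definition x * sigmoid(x). The prims actually used by dtensor_silu are not listed in this description.

```python
import math
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shapes, aligning dimensions from the right.

    Hypothetical helper: illustrates the rule a broadcasting-aware
    multiply has to apply before computing its output shape.
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other operand.
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def sigmoid(x):
    # Plain logistic function; not numerically hardened for very
    # large negative inputs, which is fine for an illustration.
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU composed from simpler operations: x * sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    print(broadcast_shape((4, 1, 3), (5, 1)))
    print(silu(1.0))
```

A composition like this is what lets a new symbol reuse existing lowering and sharding rules instead of needing its own kernel.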

Trace generated by thunderfx: https://gist.github.com/kshitij12345/c02ecef47a9974216d8350bb0e0db4ca
