
[WIP][Full Dtensor] Work with full dtensor fully_shard #2616

Draft
fegin wants to merge 2 commits into gh/fegin/100/base from gh/fegin/100/head

Conversation


@fegin fegin commented Mar 17, 2026

Stack from ghstack (oldest at bottom):

NOT READY YET

This PR is still a WIP, so code quality and design are not finalized.

We publish this PR to get early CI signals and to verify pytorch/pytorch#176334.

[ghstack-poisoned]
fegin added a commit that referenced this pull request Mar 17, 2026

ghstack-source-id: fc1597e
Pull-Request: #2616
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 17, 2026
@fegin fegin marked this pull request as draft March 17, 2026 21:35
[ghstack-poisoned]
fegin added a commit that referenced this pull request Mar 17, 2026
ghstack-source-id: 184da4a
Pull-Request: #2616
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Mar 19, 2026
…mesh (#176334)

**Summary**
Enable `fully_shard` to operate on models whose parameters are already DTensors distributed across a full SPMD mesh, including the data-parallel dimensions.

Previously, `fully_shard` only handled the case where the DP dimensions were not part of an existing DTensor mesh (which held only TP or EP dimensions). This PR introduces `DataParallelMeshDims`, which lets users specify which mesh dimensions correspond to data parallelism (sharding and/or replication). The DP dimensions must carry `Replicate` placements in the original DTensor spec before `fully_shard` replaces them with `Shard`.

Note that the inputs/activations will also be DTensors distributed across the full SPMD mesh.
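The placement rule described above can be illustrated with a minimal, self-contained sketch. This is *not* the actual `DataParallelMeshDims` or `fully_shard` implementation; it is a hypothetical model that uses plain strings to stand in for DTensor placements, showing only the invariant the summary states: DP mesh dimensions must start out as `Replicate`, and `fully_shard` swaps them to `Shard`.

```python
# Hypothetical sketch of the placement rule: DP mesh dimensions must
# carry Replicate placements, which fully_shard then replaces with
# Shard. Plain strings stand in for real DTensor placement objects.

def apply_fully_shard(placements, dp_dims, shard_dim=0):
    """Replace Replicate with Shard(shard_dim) on the DP mesh dims.

    placements: one placement name per mesh dimension, e.g.
                ["Replicate", "Shard(0)"] for a 2-D (DP, TP) mesh.
    dp_dims:    indices of the data-parallel mesh dimensions.
    """
    new_placements = list(placements)
    for d in dp_dims:
        # The invariant from the PR summary: a DP dim must be
        # Replicate in the original spec, otherwise it is an error.
        if new_placements[d] != "Replicate":
            raise ValueError(
                f"mesh dim {d} must be Replicate before fully_shard, "
                f"got {new_placements[d]}"
            )
        new_placements[d] = f"Shard({shard_dim})"
    return new_placements

# 2-D mesh: dim 0 is DP (Replicate), dim 1 is TP (Shard(0)).
print(apply_fully_shard(["Replicate", "Shard(0)"], dp_dims=[0]))
# -> ['Shard(0)', 'Shard(0)']
```

After the swap, every mesh dimension carries a concrete sharding, which matches the summary's point that parameters (and inputs/activations) end up as DTensors distributed across the full SPMD mesh.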

**Verification**
We verify this PR through unit tests and TorchTitan integration: pytorch/torchtitan#2616.
Pull Request resolved: #176334
Approved by: https://github.com/weifengpy

Labels

ciflow/8gpu, CLA Signed
