
Commit eb7a303

wanchaol authored and pytorchmergebot committed
[dtensor] expose the __create_chunk_list__ in the doc (pytorch#144100)

As titled, this PR exposes the dunder method as a public API in the docs, so that different checkpoint implementations can leverage this protocol instead of exposing a separate API.

Pull Request resolved: pytorch#144100
Approved by: https://github.com/awgu
ghstack dependencies: pytorch#144099
1 parent 45411d1 commit eb7a303

File tree

2 files changed: 11 additions, 0 deletions


docs/source/distributed.tensor.rst

Lines changed: 1 addition & 0 deletions

@@ -51,6 +51,7 @@ on all devices, etc.
 .. autoclass:: DTensor
     :members: from_local, to_local, full_tensor, redistribute, device_mesh, placements
     :member-order: groupwise
+    :special-members: __create_chunk_list__


 DeviceMesh as the distributed communicator

torch/distributed/tensor/_api.py

Lines changed: 10 additions & 0 deletions

@@ -606,6 +606,16 @@ def __create_write_items__(self, fqn: str, object: Any):
         raise RuntimeError("Unsupported tensor type!")

     def __create_chunk_list__(self):
+        """
+        Return a list of ChunkStorageMetadata, a dataclass that describes the size/offset
+        of the local shard/replica on the current rank. For DTensor, each rank has a single
+        local shard/replica, so the returned list usually has only one element.
+
+        This dunder method is primarily used for distributed checkpointing.
+
+        Returns:
+            A List[:class:`ChunkStorageMetadata`] object that represents the shard
+            size/offset on the current rank.
+        """
         from torch.distributed.checkpoint.planner_helpers import (
             _create_chunk_from_dtensor,
         )
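To illustrate how a checkpoint implementation could consume this protocol, here is a minimal, self-contained sketch. The `ChunkStorageMetadata` stand-in dataclass, the `ShardedTensorStub` class, and the `collect_chunks` helper below are all hypothetical illustrations (the real `ChunkStorageMetadata` lives in `torch.distributed.checkpoint.metadata`); only the `__create_chunk_list__` method name comes from the PR.

```python
# Hypothetical sketch: a duck-typed checkpoint helper that consumes any
# object implementing the __create_chunk_list__ protocol described above.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChunkStorageMetadata:
    # Stand-in for torch.distributed.checkpoint.metadata.ChunkStorageMetadata
    offsets: Tuple[int, ...]  # per-dim offset of the local shard in the global tensor
    sizes: Tuple[int, ...]    # per-dim size of the local shard


class ShardedTensorStub:
    """A minimal tensor-like object that participates in the protocol."""

    def __init__(self, offsets: Tuple[int, ...], sizes: Tuple[int, ...]):
        self._offsets = offsets
        self._sizes = sizes

    def __create_chunk_list__(self) -> List[ChunkStorageMetadata]:
        # Like DTensor, this stub owns a single local shard per rank,
        # so the returned list has exactly one element.
        return [ChunkStorageMetadata(offsets=self._offsets, sizes=self._sizes)]


def collect_chunks(obj) -> List[ChunkStorageMetadata]:
    """Duck-typed: any object exposing __create_chunk_list__ can be planned."""
    if hasattr(obj, "__create_chunk_list__"):
        return obj.__create_chunk_list__()
    raise TypeError(f"{type(obj).__name__} does not implement __create_chunk_list__")


# E.g. rank 1 of a 2-rank mesh holding the second row-block of an 8x8 tensor.
chunks = collect_chunks(ShardedTensorStub(offsets=(4, 0), sizes=(4, 8)))
print(chunks)
```

Because the checkpoint planner only needs the shard geometry, dispatching on the dunder method keeps it decoupled from any particular tensor type, which is the point of exposing the protocol rather than a separate API.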
