[Fix] fix fully_shard offload_policy type inconsistency #1406

nil0x9 · 2025-12-31T16:36:51Z

Fix offload_policy type inconsistency (use OffloadPolicy() instead of None) in fully_shard

OffloadPolicy is a placeholder dataclass in torch.distributed.fsdp that represents "no offloading", so it fits the signature of fully_shard exactly (the snippet below is from torch 2.7.0; other versions are similar):

@overload
def fully_shard(
    module: nn.Module,
    *,
    mesh: Optional[DeviceMesh] = ...,
    reshard_after_forward: Union[bool, int] = ...,
    shard_placement_fn: Optional[Callable[[nn.Parameter], Optional[Shard]]] = ...,
    mp_policy: MixedPrecisionPolicy = ...,
    offload_policy: OffloadPolicy = ...,  # <-- not Optional, so you shouldn't pass None for this arg
    ignored_params: Optional[set[nn.Parameter]] = ...,
) -> FSDPModule: ...

If you pass None for offload_policy, it works fine at runtime, because torch uses isinstance(offload_policy, CPUOffloadPolicy) to decide whether to turn on CPU offload (and isinstance(None, CPUOffloadPolicy) always evaluates to False). The problem with None is that it introduces an arg-type mismatch for static type checkers, which makes your life miserable.

Using OffloadPolicy instead resolves this issue.
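To make the runtime equivalence concrete, here is a minimal sketch that runs without torch installed. The two dataclasses are simplified stand-ins for torch.distributed.fsdp.OffloadPolicy and CPUOffloadPolicy, not the real classes, and cpu_offload_enabled and resolve_offload_policy are hypothetical helpers written only for this illustration (the first mirrors the isinstance check described above, the second shows one possible way to normalize a None at the call site).

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-ins for the torch classes (assumption: the real ones
# live in torch.distributed.fsdp and may carry more fields).
@dataclass
class OffloadPolicy:
    """Stand-in base policy: represents no offloading."""


@dataclass
class CPUOffloadPolicy(OffloadPolicy):
    """Stand-in policy that requests CPU offload."""

    pin_memory: bool = True


def cpu_offload_enabled(offload_policy: OffloadPolicy) -> bool:
    # Hypothetical helper mirroring the isinstance check described above.
    return isinstance(offload_policy, CPUOffloadPolicy)


def resolve_offload_policy(policy: Optional[OffloadPolicy]) -> OffloadPolicy:
    # Hypothetical call-site helper: turn None into the no-op policy so the
    # value handed on always matches a non-Optional OffloadPolicy annotation.
    return OffloadPolicy() if policy is None else policy


# None and OffloadPolicy() behave the same under the isinstance check,
# but only OffloadPolicy() satisfies the annotation for a type checker.
print(cpu_offload_enabled(None))                # type checkers flag this call
print(cpu_offload_enabled(OffloadPolicy()))     # no offload, type-correct
print(cpu_offload_enabled(CPUOffloadPolicy()))  # offload requested
```

With the real API, the corresponding change would simply be to pass offload_policy=OffloadPolicy() wherever None was passed before.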

[Fix] fix fully_shard offload_policy type inconsistency

7e102f2
