@nil0x9 (Contributor) commented Dec 31, 2025

Fix offload_policy type inconsistency (use OffloadPolicy() instead of None) in fully_shard

OffloadPolicy is a placeholder dataclass in torch.distributed.fsdp (the base class that stands for "no offloading"), and it matches the type that fully_shard declares for offload_policy. The snippet below is from torch 2.7.0; other versions are similar:

@overload
def fully_shard(
    module: nn.Module,
    *,
    mesh: Optional[DeviceMesh] = ...,
    reshard_after_forward: Union[bool, int] = ...,
    shard_placement_fn: Optional[Callable[[nn.Parameter], Optional[Shard]]] = ...,
    mp_policy: MixedPrecisionPolicy = ...,
    offload_policy: OffloadPolicy = ...,  # <-- not Optional, so None should not be passed for this arg
    ignored_params: Optional[set[nn.Parameter]] = ...,
) -> FSDPModule: ...

If you pass None for offload_policy, it works fine at runtime, because torch uses isinstance(offload_policy, CPUOffloadPolicy) to decide whether to turn on CPU offload (and isinstance(None, CPUOffloadPolicy) always evaluates to False). The problem with None is that it introduces an arg-type mismatch for static checkers and makes your life miserable:

[Screenshot: static type checker reporting the arg-type mismatch]

Passing OffloadPolicy() instead resolves this issue.
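
For reviewers who want to convince themselves that the change is behavior-preserving, here is a minimal sketch of the runtime check described above. It does not import torch: the two dataclasses are stand-ins that only mirror the inheritance relationship of the real classes, and cpu_offload_enabled is a hypothetical helper written for illustration, not a torch function.

```python
from dataclasses import dataclass


# Stand-ins (assumptions): these mirror only the class hierarchy of
# torch.distributed.fsdp.OffloadPolicy / CPUOffloadPolicy, not their full API.
@dataclass
class OffloadPolicy:
    """Base policy, meaning "no offloading"."""


@dataclass
class CPUOffloadPolicy(OffloadPolicy):
    """Policy that requests CPU offloading."""

    pin_memory: bool = True


def cpu_offload_enabled(offload_policy: OffloadPolicy) -> bool:
    # Hypothetical helper: the same kind of isinstance check the PR text describes.
    return isinstance(offload_policy, CPUOffloadPolicy)


# None and OffloadPolicy() take the same branch at runtime...
print(cpu_offload_enabled(None))  # type: ignore[arg-type]  # checker complains here
print(cpu_offload_enabled(OffloadPolicy()))  # type-checks cleanly
# ...and only an actual CPUOffloadPolicy enables offloading.
print(cpu_offload_enabled(CPUOffloadPolicy()))
```

The first call needs a type-ignore comment to get past a static checker, which is exactly the friction this PR removes. The second call needs nothing.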
