[Test] Full-iteration CG + FSDP #1688

@erhoo82

Description

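Support full-iteration CUDA graph (CG) capture with FSDP. Two changes are required:

  • A patch in Transformer Engine (TE) 2.10.
  • A fix so that the parameter all-gather (AG) hook is triggered in the forward path of the current iteration, not in the `optim.step` of the previous iteration.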
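
To make the second point concrete, here is a toy, pure-Python sketch of the two hook placements. It is not the Megatron, FSDP, or TE implementation: `ToyTrainer`, `ag_in_forward`, and the event names are hypothetical and exist only for illustration. The assumption is that a full-iteration graph should contain the all-gather that its own forward pass consumes.

```python
from typing import List


class ToyTrainer:
    """Hypothetical stand-in for one FSDP rank; logs events instead of running kernels."""

    def __init__(self, ag_in_forward: bool) -> None:
        self.ag_in_forward = ag_in_forward
        self.gathered = False  # True while the full (unsharded) params are available
        self.log: List[str] = []
        if not ag_in_forward:
            # Old placement: the first forward relies on a gather issued before the loop.
            self._all_gather("init")

    def _all_gather(self, where: str) -> None:
        self.gathered = True
        self.log.append(f"all_gather@{where}")

    def forward(self) -> None:
        if self.ag_in_forward:
            # New placement: the AG hook fires in the current forward path.
            self._all_gather("forward")
        assert self.gathered, "forward needs gathered params"
        self.log.append("forward")

    def backward(self) -> None:
        self.log.append("backward")

    def optim_step(self) -> None:
        self.log.append("optim_step")
        self.gathered = False  # shards were updated, so the gathered copy is stale
        if not self.ag_in_forward:
            # Old placement: the AG for the *next* iteration is issued here.
            self._all_gather("optim_step")

    def iteration(self) -> List[str]:
        """Run one iteration and return only the events it recorded."""
        start = len(self.log)
        self.forward()
        self.backward()
        self.optim_step()
        return self.log[start:]


old = ToyTrainer(ag_in_forward=False).iteration()
new = ToyTrainer(ag_in_forward=True).iteration()
print(old)  # ['forward', 'backward', 'optim_step', 'all_gather@optim_step']
print(new)  # ['all_gather@forward', 'forward', 'backward', 'optim_step']
```

In the old placement, each forward pass depends on an all-gather issued by the previous iteration (or before the loop), so the dependency crosses the iteration boundary. In the new placement, every iteration is self-contained, which is the property this sketch assumes a full-iteration graph needs.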
