[5/N][SAC] Refactor activation checkpointing to use centralized policy-based approach, and recompute lora params#2489

Closed
mori360 wants to merge 1 commit into gh/mori360/5/base from gh/mori360/5/head
@mori360 mori360 commented Mar 5, 2026

The previous activation checkpointing implementation had each model define its own _op_sac_save_list in its respective parallelize.py file. Manually maintaining the op list risks missing standard compute-intensive ops that PyTorch's get_default_op_list() already provides.

This PR:

  • Centralizes the op lists in activation_checkpoint.py via default_activation_checkpoint_policy()
  • Uses get_default_op_list() from torch._functorch.partitioners as the foundation
  • Splits the op list into compute and communication cases, and extends each with the ops torchtitan currently uses that get_default_op_list() does not include
  • Wraps the policy so that it honors ac_config.per_op_sac_force_recompute_mm_shapes_by_fqns
  • Recomputes small params and excludes them from the count used by the "every other" param save policy
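The decision logic described in the list above can be sketched roughly as follows. This is a dependency-free illustration, not the code in this PR: ops are plain strings instead of torch op overloads, `Decision` stands in for `torch.utils.checkpoint.CheckpointPolicy`, and `make_policy`, the op sets, `small_param_numel`, and the "save odd-numbered matmuls" rule are all assumptions made for the example.

```python
from enum import Enum
from math import prod
from typing import Callable, FrozenSet, Optional, Tuple

Shape = Tuple[int, ...]


class Decision(Enum):
    # Stand-in for torch.utils.checkpoint.CheckpointPolicy (assumption).
    MUST_SAVE = "must_save"
    PREFER_RECOMPUTE = "prefer_recompute"


# Hypothetical op names; a real policy would start from get_default_op_list().
DEFAULT_COMPUTE_OPS = frozenset({"aten.mm", "aten.bmm"})
EXTRA_COMPUTE_OPS = frozenset({"aten.scaled_dot_product_attention"})
COMM_OPS = frozenset({"c10d.reduce_scatter"})
MM_OP = "aten.mm"


def make_policy(
    force_recompute_shapes: FrozenSet[Shape] = frozenset(),
    small_param_numel: int = 0,
) -> Callable[..., Decision]:
    """Build one shared policy instead of a per-model save list."""
    compute_ops = DEFAULT_COMPUTE_OPS | EXTRA_COMPUTE_OPS
    mm_count = 0  # counts only matmuls that are eligible to be saved

    def policy(op: str, weight_shape: Optional[Shape] = None) -> Decision:
        nonlocal mm_count
        if op in COMM_OPS:
            # Communication results are saved so collectives are not re-run.
            return Decision.MUST_SAVE
        if op not in compute_ops:
            # Cheap ops are recomputed in the backward pass.
            return Decision.PREFER_RECOMPUTE
        if op == MM_OP and weight_shape is not None:
            if weight_shape in force_recompute_shapes:
                # Shapes forced to recompute by config; not counted.
                return Decision.PREFER_RECOMPUTE
            if prod(weight_shape) < small_param_numel:
                # Small params (e.g. LoRA adapters): recompute, not counted.
                return Decision.PREFER_RECOMPUTE
            mm_count += 1
            if mm_count % 2 == 0:
                # Assumed rule: every second counted matmul is recomputed.
                return Decision.PREFER_RECOMPUTE
        return Decision.MUST_SAVE

    return policy
```

Because small and force-recomputed matmuls return before the counter is incremented, they do not shift which of the remaining matmuls are saved, which is the point of skipping them in the count.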

Stack from ghstack (oldest at bottom):

@meta-cla meta-cla bot added the CLA Signed label Mar 5, 2026
mori360 added a commit that referenced this pull request Mar 5, 2026
…ed approach

ghstack-source-id: d4da962
Pull Request resolved: #2489
@mori360 mori360 marked this pull request as draft March 5, 2026 01:55
@mori360 mori360 changed the title [SAC] Refactor activation checkpointing to use centralized policy-based approach [5/N][SAC] Refactor activation checkpointing to use centralized policy-based approach, and recompute lora params Mar 5, 2026
@mori360 mori360 closed this Mar 11, 2026

Labels

ciflow/8gpu, CLA Signed
