Fix clean offload clearing of non-checkpoint tensors #1583

Open
lvliang-intel wants to merge 14 commits into main from lvl/fix_clean_offload

@lvliang-intel
Contributor

Description

Preserve dynamically registered parameters and buffers during clean-mode offload.
Previously, clean-mode clearing removed all local tensors under a block, including runtime-added state that does not exist in the original checkpoint. Those tensors could not be restored during reload, leaving modules with empty tensors after offload and reload cycles. Clean mode now clears only tensors that can actually be restored from checkpoint while preserving runtime quantization state.
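
To make the failure mode concrete, here is a minimal pure-Python sketch. It is not the project's code: plain lists stand in for tensors, and `checkpoint`, `make_block`, `clear`, and `reload` are hypothetical names chosen for illustration. It assumes that reload only rewrites entries present in the checkpoint, as the description above states.

```python
# Toy model of a clean-mode offload/reload cycle. Lists stand in for tensors.
# All names here are illustrative assumptions, not the project's API.
checkpoint = {"layers.0.weight": [1.0, 2.0], "layers.0.bias": [0.5]}


def make_block():
    """Build block state: checkpoint tensors plus one runtime-added entry."""
    state = {name: list(values) for name, values in checkpoint.items()}
    state["layers.0.act_scale"] = [0.25]  # registered at runtime, not in checkpoint
    return state


def clear(state, only_restorable):
    """Empty tensors; optionally skip entries the checkpoint cannot restore."""
    for name in state:
        if only_restorable and name not in checkpoint:
            continue
        state[name] = []


def reload(state):
    """Restore only what the checkpoint contains."""
    for name, values in checkpoint.items():
        if name in state:
            state[name] = list(values)


# Old behavior: clear everything, then reload.
old = make_block()
clear(old, only_restorable=False)
reload(old)

# New behavior: clear only restorable entries, then reload.
new = make_block()
clear(new, only_restorable=True)
reload(new)

print(old["layers.0.act_scale"])  # runtime state lost after the cycle
print(new["layers.0.act_scale"])  # runtime state preserved
```

In the first cycle the runtime-added entry comes back empty because nothing in the checkpoint can refill it. In the second it is never cleared, so it survives.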

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

#1573

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Copilot AI review requested due to automatic review settings March 20, 2026 08:43
Contributor

Copilot AI left a comment

Pull request overview

Fixes clean-mode offload so it preserves runtime-added (non-checkpoint) parameters/buffers by only clearing tensors that can be restored from the original checkpoint.

Changes:

  • Add selective clearing in _clear_module_weights via a restorable_params allowlist.
  • Cache and use a checkpoint weight map to determine which state entries under a block are restorable.
  • Thread block_name through offload/clear callsites to enable block-scoped restorable filtering.
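
The block-scoped filtering summarized above could look roughly like the following sketch. This is an assumption about the shape of the logic, not the PR's actual implementation: `restorable_names` and `clear_block` are hypothetical helpers, the weight map is assumed to map full parameter names to shard files, and lists again stand in for tensors.

```python
def restorable_names(weight_map, block_name):
    """Return block-local names of checkpoint entries under block_name."""
    # The trailing dot keeps "model.layers.1" from matching "model.layers.10".
    prefix = block_name + "."
    return {name[len(prefix):] for name in weight_map if name.startswith(prefix)}


def clear_block(local_state, restorable):
    """Empty only the entries that appear in the restorable allowlist."""
    for name in local_state:
        if name in restorable:
            local_state[name] = []


# Hypothetical weight map: full parameter name -> shard file.
weight_map = {
    "model.layers.1.mlp.weight": "shard-1",
    "model.layers.1.mlp.bias": "shard-1",
    "model.layers.10.mlp.weight": "shard-2",
}

allow = restorable_names(weight_map, "model.layers.1")

# Block-local state, including a runtime-added buffer absent from the checkpoint.
block_state = {"mlp.weight": [1.0], "mlp.bias": [0.1], "mlp.act_max": [3.0]}
clear_block(block_state, allow)
print(sorted(allow))
print(block_state)
```

Computing the allowlist once per block and caching it avoids rescanning the weight map on every offload, which matches the "cache and use a checkpoint weight map" item in the summary.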

@chensuyue
Contributor

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@chensuyue
Contributor

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor

xin3he commented Mar 26, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor

xin3he commented Mar 26, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

4 participants