[regression] qkv_scale is empty due to offload in autoscheme. #1580

Closed
xin3he wants to merge 3 commits into main from xinhe/3-10

xin3he (Contributor) commented Mar 20, 2026

Description

Please briefly describe your main changes, the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #1573, #1574

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings March 20, 2026 03:15
Signed-off-by: Xin He <xin3.he@intel.com>
xin3he requested review from Copilot and yiliu30 and removed request for Copilot March 20, 2026 03:16
Signed-off-by: Xin He <xin3.he@intel.com>

Copilot AI left a comment

Pull request overview

Fixes a regression where KV-cache scale parameters (e.g., k_scale/v_scale) can end up empty/zero when AutoScheme runs with offloading enabled, by adjusting how parameters are updated and adding a targeted CPU regression test.

Changes:

  • Adds a CPU regression test covering AutoScheme + static_kv_dtype="fp8" to ensure k_scale/v_scale are populated.
  • Updates update_parameter_data to handle cases where a parameter was cleared to an empty tensor (e.g., via offload) by recreating the parameter when shapes differ.
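
The parameter-update behavior described in the second bullet can be illustrated with a minimal sketch. This is not the code from this pull request: `update_parameter_data_sketch` and its argument order are hypothetical, and the sketch assumes offload clears a parameter by swapping its `.data` for an empty tensor. It copies in place when shapes match and recreates the parameter when they differ.

```python
import torch
from torch import nn


def update_parameter_data_sketch(module: nn.Module, new_value: torch.Tensor, name: str) -> None:
    """Hypothetical stand-in for update_parameter_data (assumed behavior)."""
    param = getattr(module, name, None)
    if isinstance(param, nn.Parameter) and param.shape == new_value.shape:
        # Shapes match: copy in place so existing references stay valid.
        with torch.no_grad():
            param.copy_(new_value.to(device=param.device, dtype=param.dtype))
        return
    # Shapes differ (for example, the tensor was cleared to empty by offload):
    # rebuild the parameter instead of copying into mismatched storage.
    requires_grad = param.requires_grad if isinstance(param, nn.Parameter) else False
    setattr(module, name, nn.Parameter(new_value.detach().clone(), requires_grad=requires_grad))


# Usage: simulate an offloaded (emptied) scale, then restore it.
layer = nn.Module()
layer.k_scale = nn.Parameter(torch.ones(1), requires_grad=False)
layer.k_scale.data = torch.empty(0)  # assumed effect of offload
update_parameter_data_sketch(layer, torch.tensor([0.5]), "k_scale")
```

Recreating the parameter changes its identity, so any code holding a reference to the old tensor would not see the new value; whether that matters depends on how the calling code tracks parameters.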

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| test/test_cpu/schemes/test_auto_scheme.py | Adds a regression test intended to detect missing/zero KV scales after quantize_and_save. |
| auto_round/experimental/utils.py | Modifies parameter update logic to try to recover when a parameter tensor has been cleared/shape-changed due to offload. |
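
The kind of check the regression test is described as performing might look like the sketch below. It is an assumption about the shape of such a check, not the test added in this pull request: `find_bad_kv_scales` is a hypothetical helper, and the `k_scale`/`v_scale` name suffixes are taken from the overview above. It scans a name-to-tensor mapping and reports KV scales that are empty or all zero.

```python
import torch


def find_bad_kv_scales(state_dict):
    """Return names of k_scale/v_scale tensors that are empty or all zero."""
    bad = []
    for name, tensor in state_dict.items():
        if not name.endswith(("k_scale", "v_scale")):
            continue  # not a KV-cache scale
        # Cast to float so the comparison also works for low-precision dtypes.
        if tensor.numel() == 0 or not bool(torch.any(tensor.float() != 0)):
            bad.append(name)
    return bad


# Illustrative tensors only; these values are made up for the example.
example = {
    "layers.0.attn.k_scale": torch.empty(0),
    "layers.0.attn.v_scale": torch.tensor([0.2]),
    "layers.0.attn.weight": torch.zeros(2),
}
bad_names = find_bad_kv_scales(example)
```

A real test would presumably also assert that at least one scale was found, since an absent scale would otherwise pass silently.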

Signed-off-by: Xin He <xin3.he@intel.com>
chensuyue added this to the 0.12.0 milestone Mar 20, 2026
chensuyue (Contributor) commented

I still hit the issue from #1573 with this change.

chensuyue (Contributor) commented

Use #1583 instead.

chensuyue closed this Mar 23, 2026