Support mxfp nvfp lmhead quant #1051

Merged
WeiweiZhang1 merged 24 commits into main from support_mxfp_nvfp_lmhead_quant
Nov 28, 2025
@WeiweiZhang1 WeiweiZhang1 commented Nov 20, 2025

related issue: #1040

if layer_name not in layer_inputs:
    if self.act_bits < 16 and not self.act_dynamic:
        logger.warning(
            f"Due to insufficient resources: act_max hook for layer '{layer_name}' is unavailable. "

Support it via block-wise forward; please refer to the auto-scheme code. @xin3he, could you take this task?

@wenhuach21 wenhuach21 Nov 21, 2025

if there are quantized layers outside the blocks, we could switch to this mode

Contributor Author

> if there are quantized layers outside the blocks, we could switch to this mode

This happens within the _quantize_layers function, which only handles layers outside the blocks. I don't understand what switching to this mode means.

AutoScheme provides a more advanced approach that can calibrate all the layers in the model with block-wise forward.
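
As a rough illustration of the block-wise forward idea discussed in this thread (not AutoScheme's or this PR's actual code), the toy sketch below pushes calibration samples through the model one block at a time, then records the maximum absolute input value seen by a layer that sits outside the blocks, such as lm_head. Every name here (`scale_by`, `calibrate_act_max`) is hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale_by(factor: float) -> Layer:
    """Hypothetical toy stand-in for a transformer block."""
    return lambda x: [factor * v for v in x]


def calibrate_act_max(
    blocks: List[Layer],
    outside_layer_names: List[str],
    samples: List[Vector],
) -> Dict[str, float]:
    """Forward samples block by block, then record the max absolute
    input value for each layer that follows the last block."""
    hidden = samples
    for block in blocks:
        # Only one block is evaluated at a time; its outputs replace
        # the cached inputs that feed the next block.
        hidden = [block(x) for x in hidden]

    # After the last block, `hidden` holds the inputs to the outside layers.
    peak = max(abs(v) for x in hidden for v in x)
    return {name: peak for name in outside_layer_names}


blocks = [scale_by(2.0), scale_by(-0.5)]
samples = [[1.0, -3.0], [0.5, 2.0]]
stats = calibrate_act_max(blocks, ["lm_head"], samples)
print(stats)  # {'lm_head': 3.0}
```

In a real model the cached `hidden` values would be tensors moved on and off the device per block, which is what keeps peak memory bounded by a single block.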

@xin3he xin3he left a comment

LGTM

WeiweiZhang1 and others added 7 commits November 27, 2025 01:14
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
WeiweiZhang1 and others added 4 commits November 28, 2025 01:22
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
@WeiweiZhang1 WeiweiZhang1 merged commit c4a1479 into main Nov 28, 2025
26 checks passed
@WeiweiZhang1 WeiweiZhang1 deleted the support_mxfp_nvfp_lmhead_quant branch November 28, 2025 09:15
3 participants