Commit 832bce7
[NVFp4][MoE Calibration] Fix MoE Calibration Context (#1864)
SUMMARY:
- Fixes a bug where, despite being run within the MoE calibration context, some experts were never calibrated and their activation scales remained unchanged.
- Updates the Qwen3 MoE NVFP4 example to use only 20 calibration samples instead of 200.
1 parent 16de22f commit 832bce7

File tree

2 files changed: 2 additions & 3 deletions

examples/quantization_w4a4_fp4/qwen_30b_a3b.py

Lines changed: 1 addition & 2 deletions
@@ -16,7 +16,7 @@
 DATASET_SPLIT = "train_sft"

 # Select number of samples
-NUM_CALIBRATION_SAMPLES = 200
+NUM_CALIBRATION_SAMPLES = 20
 MAX_SEQUENCE_LENGTH = 2048

 # Load dataset and preprocess.
@@ -84,7 +84,6 @@ def tokenize(sample):
 print(tokenizer.decode(output[0]))
 print("==========================================\n\n")

-
 # Save to disk in compressed-tensors format.
 SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-NVFP4"
 model.save_pretrained(SAVE_DIR, save_compressed=True)

src/llmcompressor/modeling/prepare.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ def update_qwen3_moe(model, stack, calibrate_all_experts):
 def moe_calibration_context(
     model: PreTrainedModel,
     stack,
-    calibrate_all_experts: bool = False,
+    calibrate_all_experts: bool = True,
 ):
     # Temporarily updates the MoE modules within the context
     # Once the context exits, parameter updates persist
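With the default flipped to True, entering the context now swaps every expert into its calibration form, so all experts see the calibration batches. A minimal usage sketch follows, assuming stack is a contextlib.ExitStack (suggested by the parameter name) and using an illustrative Qwen3 MoE checkpoint name; the exact call site inside llm-compressor may differ:

from contextlib import ExitStack

from transformers import AutoModelForCausalLM

from llmcompressor.modeling.prepare import moe_calibration_context

# Checkpoint name is illustrative; the example script targets a Qwen3 30B-A3B MoE model.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")

with ExitStack() as stack:
    # With the new default (calibrate_all_experts=True), the MoE modules are
    # temporarily replaced so every expert receives the calibration batches
    # and gets activation scales, not just the experts the router selects.
    moe_calibration_context(model, stack)
    # ... run calibration forward passes over the 20 samples here ...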

0 commit comments