[MoE Calibration] Simplify MoE calibration interface #1851
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
@kylesayrs @dsikka A few clarifications:
Force-pushed from 7fefaac to ba42881
@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.
```diff
-if dataset_args is not None and dataset_args.calibrate_moe_context:
-    moe_calibration_context(model, stack)
+stack.enter_context(moe_calibration_context(model))
```
I don't think we want to do this for every case, as not every model will be an MoE.
This is implemented along with a quick check in `modeling/prepare.py` to see whether a particular model has been added to the experts replacement list.

This would end the need for any parameters in `DatasetArgs`, simplifying MoE calibration. Do you think there would be overhead when we do `stack.enter_context()`? Do you recommend a better way to implement this?
I think it's fair to assume that a model is an MoE if its architecture is listed in the `MOE_EXPERTS_REPLACEMENTS` dictionary, therefore entering the context in all cases is fine.
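For illustration, a minimal sketch of how the context can silently no-op for unregistered architectures (the registry contents and helper body here are placeholders, not the PR's exact code):

```python
from contextlib import contextmanager

# Placeholder mirroring the MOE_EXPERTS_REPLACEMENTS registry in modeling/prepare.py.
MOE_EXPERTS_REPLACEMENTS = {
    "DeepseekV3ForCausalLM": ...,
    "Qwen3MoeForCausalLM": ...,
}


@contextmanager
def moe_calibration_context(model):
    arch = model.__class__.__name__
    if arch not in MOE_EXPERTS_REPLACEMENTS:
        # Not a registered MoE architecture: entering the context is a no-op.
        yield model
        return
    # ... swap in calibration experts for `arch` ...
    try:
        yield model
    finally:
        # ... restore the original modules ...
        pass
```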
```diff
-if dataset_args.calibrate_moe_context:
-    moe_calibration_context(model, stack)
+stack.enter_context(moe_calibration_context(model))
```
same as above
This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs. "permanent" changes, and we should definitely log to the user which one is being used.

Please consider simplifying to a single mapping dictionary and a single ABC class to handle the `from_original` and `restore` functions. Don't be afraid to remove/refactor existing code!
```diff
-if dataset_args is not None and dataset_args.calibrate_moe_context:
-    moe_calibration_context(model, stack)
+stack.enter_context(moe_calibration_context(model))
```
As a small nit, consider entering the MoE context here. Entering the context before the pipeline call comes with some benefits:
- We no longer need to enter the context for each pipeline explicitly.
- We no longer need to enter, exit, and re-enter in cases where multiple pipelines are composed (independent pipeline).

```python
with moe_calibration_context(self.model):
    pipeline(...)
```
Good idea! I will try this change and test it out
```python
    calibrate_all_experts: bool = True,
) -> PreTrainedModel:
    # This function is deprecated. Use moe_calibration_context instead.
    warnings.warn(
```
nit: Use compressed_tensors/deprecated
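For example, a rough sketch of what that could look like, assuming `compressed_tensors` exposes the `deprecated` helper referenced above (the import path and signature are assumptions):

```python
# Sketch only: the import path and decorator signature are assumptions based on
# the reviewer's reference to compressed_tensors' deprecated helper.
from compressed_tensors.utils import deprecated


@deprecated(message="Use moe_calibration_context instead.")
def replace_modules_for_calibration(model, calibrate_all_experts: bool = True):
    ...
```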
```python
    moe_context.apply(model, calibrate_all_experts)

    try:
        yield model
```
It seems like this yield value is never used. Do you still want to include it?
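A minimal sketch of the bare-`yield` alternative, assuming call sites never bind the context's value (not the PR's actual code):

```python
from contextlib import contextmanager


@contextmanager
def moe_calibration_context(model):
    # ... swap in calibration modules ...
    try:
        # A bare `yield` suffices when callers write `with moe_calibration_context(model):`;
        # `yield model` only matters for `with moe_calibration_context(model) as model:`.
        yield
    finally:
        # ... restore the original modules ...
        pass
```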
```python
        default=512,
        metadata={"help": "Number of samples to use for one-shot calibration"},
    )
    calibrate_moe_context: bool = field(
```
Can we add a dataset argument called `moe_calibrate_all_experts` which defaults to `True`?
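A sketch of what that field could look like on the dataset arguments dataclass (the field name comes from this comment; the class name, help text, and exact wiring are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class DatasetArgs:
    # Hypothetical sketch of the proposed field; help text is illustrative only.
    moe_calibrate_all_experts: bool = field(
        default=True,
        metadata={
            "help": "Whether to route calibration tokens through all experts so "
            "every expert receives calibration data."
        },
    )
```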
"Qwen3MoeForCausalLM": MoEModelConfig( | ||
calibration_type=MoECalibrationType.CONTEXTUAL, | ||
target_class_name="Qwen3MoeDecoderLayer", | ||
target_attribute="mlp", |
Ideally we shouldn't ever need to target attributes, only target parent modules. For example, only targeting `Qwen3MoeMLP`.
I'm actually unfamiliar; do we need to specify `target_attribute`s?
```python
# Registry for MoE calibrations
_MOE_CONTEXTS: Dict[str, MoECalibrationContext] = {}
```
I think having `_MOE_CONTEXTS`, `replacements`, and `MOE_EXPERTS_REPLACEMENTS` is more than we need.

Additionally, mapping from model class names to replacement modules will run into issues for nested modules (for example, if an MoE model is inside of a nested model architecture, we will not be able to replace its modules).

Ideally we could simplify this to just one dictionary which maps module class names to their contexts. We can also use an ABC to define the `from_original` and `restore` interfaces. For example:
```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Dict, Type, TypeVar

T = TypeVar("T")


class MoECalibrationModule(ABC):
    is_permanent = False

    @classmethod
    @abstractmethod
    def from_original(cls, original: T, calibrate_all_experts: bool = True) -> "MoECalibrationModule":
        # converts from the original module to an MoE calibration module
        ...

    @abstractmethod
    def restore(self) -> T:
        # might include repacking weights/qparams into a 3d structure, or not
        ...


from llmcompressor.modeling.deepseek_v3 import CalibrationDeepseekV3MoE

moe_modules: Dict[str, Type[MoECalibrationModule]] = {
    "DeepseekV3MoE": CalibrationDeepseekV3MoE,
    # ...
}


@contextmanager
def moe_calibration_context(model, dataset_args):
    for name, module in model.named_modules():
        if module.__class__.__name__ in moe_modules:
            replacement = moe_modules[module.__class__.__name__].from_original(
                module, dataset_args.calibrate_all_experts
            )
            model.set_submodule(name, replacement)
    # ... maybe some logging about if/which modules were replaced
    # ... maybe some logging about whether the structure will stay (`is_permanent`)

    yield

    for name, module in model.named_modules():
        if isinstance(module, MoECalibrationModule):
            original = module.restore()
            model.set_submodule(name, original)
```
While keying by module class names rather than by model class names requires iterating through all of the modules in the model in order to check, I think this is minimal overhead and acceptable in order to support nested models. What do you think @dsikka?
```python
        self.calibration_type == MoECalibrationType.CONTEXTUAL
        and self.target_attribute is None
    ):
        raise ValueError("target_attribute is required for contextual calibration")
```
Is this coupling necessary? Why do we need to specify an attribute at all?
```python
    return update_function


def register_moe_model(model_class_name: str, config: MoEModelConfig):
```
This might be more abstraction than is necessary
Introduce standardized MoE calibration interface and deprecate legacy `replace_modules_for_calibration`

Summary

Implements a standardized `MoeContextCalibration` class and a simplified registration interface for MoE model calibration, making MoE model integration easier and deprecating the legacy `replace_modules_for_calibration` function.

Problem

MoE model calibration currently requires module replacement logic scattered across `replace_modules_for_calibration` and `moe_calibration_context`. This makes contributing new MoE model support difficult. Additionally, the `DatasetArgs.calibrate_moe_context` parameter created confusion by being optional when MoE calibration should always execute by default.

Relevant Issues

Fixes #1829

Solution

- `MoeContextCalibration` abstract base class with `ContextualMoECalibration` and `PermanentMoECalibration` implementations
- `MoEModelConfig` dataclass and automatic registration system
- `replace_modules_for_calibration` deprecated with warnings
- `calibrate_moe_context` parameter removed - MoE context is handled automatically by pipelines

A usage sketch of the registration interface follows after the test plan below.

Test Plan

Testing

✅ All unit tests pass
✅ Contextual and permanent calibration types working correctly
✅ Model structure correctly changed and restored inside/outside contexts
✅ Linting and type checking pass
✅ Backward compatibility verified with deprecation warnings
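For illustration, a rough sketch of the registration interface described above, assembled from the diff snippets in this PR; the import path and the `model`/`pipeline` placeholders are assumptions, not the PR's exact code:

```python
# Sketch only: import path and placeholder names below are assumptions.
from llmcompressor.modeling.prepare import (
    MoECalibrationType,
    MoEModelConfig,
    moe_calibration_context,
    register_moe_model,
)

# Register an MoE architecture once; pipelines then pick it up automatically.
register_moe_model(
    "Qwen3MoeForCausalLM",
    MoEModelConfig(
        calibration_type=MoECalibrationType.CONTEXTUAL,
        target_class_name="Qwen3MoeDecoderLayer",
        target_attribute="mlp",
    ),
)

# During calibration, pipelines enter the context unconditionally; models whose
# architecture is not registered simply pass through unchanged.
with moe_calibration_context(model):  # `model` is the loaded PreTrainedModel
    pipeline(...)  # `pipeline` stands in for the calibration pipeline call
```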