
[Help Wanted] Refactor/Clean up MoE calibration logic #1829

@kylesayrs

Description

Background

MoE models require special logic to calibrate because the majority of expert activations are hidden behind the MoE gating mechanism. Naively calibrating experts with the base Hugging Face model definition means that some experts will not receive enough samples to calibrate properly.

Another reason MoEs are difficult to calibrate is that the Hugging Face model definition sometimes fuses all experts into a single weight. That fused weight may be too large to fit into one GPU's memory, which is a limiting factor for memory-constrained use cases.

The solution is to write specialized logic that replaces fused MoE expert modules with individual expert modules, which can then be calibrated and offloaded independently.
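To make the "unfusing" step concrete, here is a minimal sketch of splitting a fused expert weight into per-expert weights. The function name and the row-stacked layout are assumptions for illustration, not the layout of any particular model:

```python
def split_fused_experts(fused_weight, num_experts):
    """Split a fused expert weight (assumed layout: rows stacked
    expert-by-expert) into per-expert weights that can be calibrated
    and offloaded individually. Hypothetical helper, not library API."""
    rows_per_expert, remainder = divmod(len(fused_weight), num_experts)
    assert remainder == 0, "fused rows must divide evenly across experts"
    return [
        fused_weight[i * rows_per_expert:(i + 1) * rows_per_expert]
        for i in range(num_experts)
    ]
```

Each returned chunk can then back an individual expert module, so offloading and calibration hooks operate per expert rather than on one oversized tensor.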

At a high level, the routing logic is converted as follows:

if calibrate_all_experts:
   output += expert(x)[top_k_tokens] * weights[expert_index]
else:
   output += expert(x[top_k_tokens]) * weights[expert_index]
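The pseudocode above can be fleshed out as a runnable sketch. Plain Python lists stand in for tensors, and the names (`moe_forward`, the per-expert `top_k_tokens` layout) are illustrative rather than the library's actual API. The key property is that both branches produce the same layer output; the calibration branch merely runs every expert on all tokens so each expert sees enough samples:

```python
def moe_forward(x, experts, top_k_tokens, weights, calibrate_all_experts):
    """x: list of token values; experts: list of callables;
    top_k_tokens: per-expert list of routed token indices;
    weights: per-expert routing weight (simplified to one scalar)."""
    output = [0.0] * len(x)
    for expert_index, expert in enumerate(experts):
        routed = top_k_tokens[expert_index]
        if calibrate_all_experts:
            # Run the expert on ALL tokens so it receives calibration
            # samples, but keep only the routed tokens' outputs.
            full = [expert(token) for token in x]
            selected = {i: full[i] for i in routed}
        else:
            # Normal inference: run the expert only on its routed tokens.
            selected = {i: expert(x[i]) for i in routed}
        for i, value in selected.items():
            output[i] += value * weights[expert_index]
    return output
```

Because only routed outputs are accumulated, toggling `calibrate_all_experts` changes which activations the calibration observers see without changing the model's numerics.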

Right now, the logic for determining when and how module replacements happen lives in modeling/prepare.py. However, it is split between replace_modules_for_calibration and moe_calibration_context, and this confusing split makes it difficult for people to contribute new MoE model replacements.

NOTE: Some models, such as Llama4, can be loaded in vLLM in their replaced form, and therefore do not need to be restored on moe_calibration_context exit.

Goals

  • Make MoE model contribution as easy and standardized as possible
  • Standardize on moe_calibration_context (remove/deprecate replace_modules_for_calibration)

Suggested task list

  • Remove/deprecate replace_modules_for_calibration
  • Remove/deprecate DatasetArgs.calibrate_moe_context (it should always be on)
  • Refactor moe_calibration_context to not require that the context stack is passed as argument
  • Create a standardized and simple interface that moe_calibration_context and the files in the modeling folder can use to easily contribute MoE contexts
  • Add tests for changing and restoring model structure in and outside of the moe_calibration_context (you can skip downloading model weights for the tests, see these examples)
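One possible shape for the standardized interface the tasks describe: a registry mapping model types to replace/restore functions, wrapped in a single context manager so callers never pass a context stack. All names here (MOE_REPLACEMENTS, register_moe, the signature of moe_calibration_context) are hypothetical sketches, not the current llm-compressor API:

```python
from contextlib import contextmanager

# Hypothetical registry: model class name -> (replace_fn, restore_fn or None).
# A None restore_fn covers models (e.g. Llama4) whose replaced form is
# directly loadable in vLLM and needs no restoration.
MOE_REPLACEMENTS = {}

def register_moe(model_type, replace_fn, restore_fn=None):
    """Contributors register one replace (and optional restore) per model."""
    MOE_REPLACEMENTS[model_type] = (replace_fn, restore_fn)

@contextmanager
def moe_calibration_context(model):
    """Replace MoE modules for calibration; restore on exit if needed."""
    entry = MOE_REPLACEMENTS.get(type(model).__name__)
    if entry is None:
        yield model  # not an MoE model, or no replacement registered
        return
    replace_fn, restore_fn = entry
    replace_fn(model)
    try:
        yield model
    finally:
        if restore_fn is not None:
            restore_fn(model)
```

With this shape, contributing a new MoE model is one register_moe call in the modeling folder, and the structure-change/restore tests reduce to asserting module types inside and outside the context.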

Labels

enhancement (New feature or request) · good first issue (A good first issue for users wanting to contribute)
