
minimax m2.5 eplb bugfix #21205

Open
DaZhUUU wants to merge 3 commits into sgl-project:main from bytedance-iaas:minimax_m25_eplb_bugfix

Conversation

DaZhUUU commented Mar 23, 2026

Motivation

Fix an EPLB (expert parallelism load balancer) bug for MiniMax M2.5: EPLB rebalancing crashes with an AttributeError because MiniMaxM2ForCausalLM does not expose routed_experts_weights_of_layer.

[2026-03-23 03:19:30 TP2 EP2] Resetting ExpertDistributionRecorder...
[2026-03-23 03:19:31 TP5 EP5] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3315, in run_scheduler_process
dispatch_event_loop(scheduler)
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3213, in dispatch_event_loop
scheduler.event_loop_overlap_disagg_decode()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/disaggregation/decode.py", line 1001, in event_loop_overlap_disagg_decode
batch_result = self.run_batch(batch)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2372, in run_batch
batch_result = self.model_worker.forward_batch_generation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/tp_worker.py", line 467, in forward_batch_generation
out = self.model_runner.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/model_executor/model_runner.py", line 2455, in forward
self.eplb_manager.on_forward_pass_end()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 42, in on_forward_pass_end
next(self._main_generator)
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 50, in _entrypoint
yield from self.rebalance()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 77, in rebalance
update_layer_ids_chunks = self._compute_update_layer_ids_chunks()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 110, in _compute_update_layer_ids_chunks
list(self._model_runner.model.routed_experts_weights_of_layer.keys())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
raise AttributeError(
AttributeError: 'MiniMaxM2ForCausalLM' object has no attribute 'routed_experts_weights_of_layer'
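For context, the failing frame (_compute_update_layer_ids_chunks) only needs the mapping's keys: it reads the MoE layer ids from routed_experts_weights_of_layer and splits them into chunks so rebalancing can update a few layers per forward pass. A minimal sketch of that chunking step, with an arbitrary chunk size that is not SGLang's actual value:

```python
# Illustrative sketch of what the failing call site does with the keys of
# routed_experts_weights_of_layer: split the MoE layer ids into chunks so
# each rebalance step updates only a few layers. chunk_size is hypothetical.

def compute_update_layer_ids_chunks(layer_ids, chunk_size=4):
    """Split layer ids into fixed-size chunks (the last chunk may be short)."""
    return [layer_ids[i:i + chunk_size]
            for i in range(0, len(layer_ids), chunk_size)]

print(compute_update_layer_ids_chunks(list(range(10)), chunk_size=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The crash happens before any chunking occurs, because listing the keys is the first access to the missing attribute.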

Modifications

Accuracy Tests

[2026-03-23 08:15:38] INFO: 192.168.0.236:37180 - "POST /generate HTTP/1.1" 200 OK
[2026-03-23 08:15:40 TP0 EP0] Decode batch, #running-req: 1, #token: 13144, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 11.77, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13184, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.34, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13224, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.37, #queue-req: 0
[2026-03-23 08:15:42 TP0 EP0] Decode batch, #running-req: 1, #token: 13264, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.35, #queue-req: 0
[2026-03-23 08:15:42] INFO: 192.168.0.236:60792 - "GET /health HTTP/1.1" 200 OK

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist (Contributor)

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

DaZhUUU (Author) commented Mar 23, 2026

/gemini summary

gemini-code-assist (Contributor)

Summary of Changes

This pull request resolves an AttributeError raised for the MiniMax M2.5 model during EPLB rebalancing. The root cause was that the rebalance path accessed the model's routed_experts_weights_of_layer attribute, which MiniMaxM2ForCausalLM did not define. The fix lazily initializes the attribute and adds a method to retrieve the MoE weights.

Highlights

  • Bug Fix: This PR addresses an AttributeError raised for MiniMaxM2ForCausalLM during expert parallelism load balancing (EPLB), where the model was missing the routed_experts_weights_of_layer attribute.
  • Lazy Initialization: The fix initializes routed_experts_weights_of_layer lazily, so the attribute exists before EPLB first accesses it and the weight gathering is deferred until needed.
  • MoE Weights: The PR introduces a get_moe_weights method that returns the routed experts' weights, filtering out correction_bias and applying filter_moe_weight_param_global_expert.


Activity
  • The pull request was created to address a bug in the MiniMax M2.5 model related to expert parallelism load balancing (EPLB).
  • The error occurred because the 'routed_experts_weights_of_layer' attribute was being accessed before it was initialized.
  • The fix involves using a LazyValue to initialize 'routed_experts_weights_of_layer' only when it's accessed.
  • A 'get_moe_weights' method was added to retrieve the weights of the MoE layers.
  • The pull request includes updates to the MiniMaxM2ForCausalLM class to include the lazy initialization and property for accessing the routed experts weights.

@DaZhUUU DaZhUUU marked this pull request as draft March 24, 2026 06:27
@DaZhUUU DaZhUUU marked this pull request as ready for review March 24, 2026 06:28

