Summary of Changes (from Gemini Code Assist): This pull request resolves an issue that caused errors during initialization of the MiniMax M2.5 model when expert parameter loading was used. The primary cause was an attempt to access a model attribute before it was initialized. The fix lazily initializes the attribute and adds a method to retrieve the MoE weights.
Motivation
Fix an EPLB bug for MiniMax-M2.5. With EPLB enabled, the scheduler crashes with the following traceback:
```
[2026-03-23 03:19:30 TP2 EP2] Resetting ExpertDistributionRecorder...
[2026-03-23 03:19:31 TP5 EP5] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3315, in run_scheduler_process
    dispatch_event_loop(scheduler)
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3213, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/disaggregation/decode.py", line 1001, in event_loop_overlap_disagg_decode
    batch_result = self.run_batch(batch)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2372, in run_batch
    batch_result = self.model_worker.forward_batch_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/tp_worker.py", line 467, in forward_batch_generation
    out = self.model_runner.forward(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/model_executor/model_runner.py", line 2455, in forward
    self.eplb_manager.on_forward_pass_end()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 42, in on_forward_pass_end
    next(self._main_generator)
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 50, in _entrypoint
    yield from self.rebalance()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 77, in rebalance
    update_layer_ids_chunks = self._compute_update_layer_ids_chunks()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 110, in _compute_update_layer_ids_chunks
    list(self._model_runner.model.routed_experts_weights_of_layer.keys())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
    raise AttributeError(
AttributeError: 'MiniMaxM2ForCausalLM' object has no attribute 'routed_experts_weights_of_layer'
```
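The `AttributeError` is raised by `torch.nn.Module.__getattr__`, which Python invokes only when normal attribute lookup fails; it searches the module's registered parameters, buffers, and submodules, and raises if the name is found in none of them. So an attribute that is only assigned on some load paths is simply missing on the others. A dependency-free sketch of that lookup behavior (illustrative only, not SGLang or PyTorch code):

```python
# Mimic of nn.Module's fallback lookup: __getattr__ runs only when the
# attribute is absent from the instance __dict__, checks the registered
# state, and raises AttributeError otherwise.
class ModuleLike:
    def __init__(self):
        self._parameters = {}  # stand-in for nn.Module's registered state

    def __getattr__(self, name):
        # Reached only for names not found by normal attribute lookup.
        params = self.__dict__.get("_parameters", {})
        if name in params:
            return params[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


model = ModuleLike()
try:
    model.routed_experts_weights_of_layer
except AttributeError as exc:
    print(exc)  # 'ModuleLike' object has no attribute 'routed_experts_weights_of_layer'
```

This is why `eplb_manager` crashes: it reads `routed_experts_weights_of_layer` unconditionally, but the attribute was never set on `MiniMaxM2ForCausalLM` during this initialization path.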
Modifications
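Per the summary above, the fix lazily initializes `routed_experts_weights_of_layer` and adds a method that retrieves the MoE weights. A minimal sketch of that pattern (hypothetical names throughout; not the actual SGLang implementation):

```python
class MoEModelSketch:
    """Lazy-initialization pattern: build routed_experts_weights_of_layer
    on first access instead of assuming an earlier load path populated it."""

    def __init__(self):
        # Deliberately not computed here; some initialization paths
        # reach the EPLB manager before expert weights are gathered.
        self._routed_experts_weights_of_layer = None

    def get_moe_weights(self):
        # Hypothetical helper: in the real model this would collect the
        # per-layer routed-expert weight tensors.
        return {0: ("w13_weight", "w2_weight"), 1: ("w13_weight", "w2_weight")}

    @property
    def routed_experts_weights_of_layer(self):
        if self._routed_experts_weights_of_layer is None:
            self._routed_experts_weights_of_layer = self.get_moe_weights()
        return self._routed_experts_weights_of_layer


model = MoEModelSketch()
print(sorted(model.routed_experts_weights_of_layer.keys()))  # → [0, 1]
```

With this shape, the EPLB manager's `list(model.routed_experts_weights_of_layer.keys())` call succeeds regardless of which load path constructed the model, because the first access triggers the weight collection.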
Accuracy Tests
```
[2026-03-23 08:15:38] INFO: 192.168.0.236:37180 - "POST /generate HTTP/1.1" 200 OK
[2026-03-23 08:15:40 TP0 EP0] Decode batch, #running-req: 1, #token: 13144, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 11.77, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13184, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.34, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13224, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.37, #queue-req: 0
[2026-03-23 08:15:42 TP0 EP0] Decode batch, #running-req: 1, #token: 13264, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.35, #queue-req: 0
[2026-03-23 08:15:42] INFO: 192.168.0.236:60792 - "GET /health HTTP/1.1" 200 OK
```
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci