Summary of Changes (from Gemini Code Assist): This pull request resolves an issue that caused errors during initialization of the MiniMax M2.5 model when expert parameter loading was used. The primary cause was an attempt to access a model attribute before it was initialized. The fix lazily initializes the attribute and adds a method to retrieve the MoE weights.
Motivation
Fix an EPLB bug for MiniMax-M2.5. With EPLB enabled, the scheduler crashes with the following traceback:
```
[2026-03-23 03:19:30 TP2 EP2] Resetting ExpertDistributionRecorder...
[2026-03-23 03:19:31 TP5 EP5] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3315, in run_scheduler_process
    dispatch_event_loop(scheduler)
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 3213, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/disaggregation/decode.py", line 1001, in event_loop_overlap_disagg_decode
    batch_result = self.run_batch(batch)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2372, in run_batch
    batch_result = self.model_worker.forward_batch_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/tp_worker.py", line 467, in forward_batch_generation
    out = self.model_runner.forward(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/model_executor/model_runner.py", line 2455, in forward
    self.eplb_manager.on_forward_pass_end()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 42, in on_forward_pass_end
    next(self._main_generator)
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 50, in _entrypoint
    yield from self.rebalance()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 77, in rebalance
    update_layer_ids_chunks = self._compute_update_layer_ids_chunks()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/eplb/eplb_manager.py", line 110, in _compute_update_layer_ids_chunks
    list(self._model_runner.model.routed_experts_weights_of_layer.keys())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
    raise AttributeError(
AttributeError: 'MiniMaxM2ForCausalLM' object has no attribute 'routed_experts_weights_of_layer'
```
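The `AttributeError` is raised by `torch.nn.Module.__getattr__`, which Python invokes only when normal attribute lookup fails; it searches the module's registered parameters, buffers, and submodules, and raises if the name is found in none of them. So an attribute that is only assigned on some load paths is simply missing on the others. A dependency-free sketch of that lookup behavior (illustrative only, not SGLang or PyTorch code):

```python
# Mimic of nn.Module's fallback lookup: __getattr__ runs only when the
# attribute is absent from the instance __dict__, checks the registered
# state, and raises AttributeError otherwise.
class ModuleLike:
    def __init__(self):
        self._parameters = {}  # stand-in for nn.Module's registered state

    def __getattr__(self, name):
        # Reached only for names not found by normal attribute lookup.
        params = self.__dict__.get("_parameters", {})
        if name in params:
            return params[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


model = ModuleLike()
try:
    model.routed_experts_weights_of_layer
except AttributeError as exc:
    print(exc)  # 'ModuleLike' object has no attribute 'routed_experts_weights_of_layer'
```

This is why `eplb_manager` crashes: it reads `routed_experts_weights_of_layer` unconditionally, but the attribute was never set on `MiniMaxM2ForCausalLM` during this initialization path.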
Modifications
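Per the summary above, the fix lazily initializes `routed_experts_weights_of_layer` and adds a method that retrieves the MoE weights. A minimal sketch of that pattern (hypothetical names throughout; not the actual SGLang implementation):

```python
class MoEModelSketch:
    """Lazy-initialization pattern: build routed_experts_weights_of_layer
    on first access instead of assuming an earlier load path populated it."""

    def __init__(self):
        # Deliberately not computed here; some initialization paths
        # reach the EPLB manager before expert weights are gathered.
        self._routed_experts_weights_of_layer = None

    def get_moe_weights(self):
        # Hypothetical helper: in the real model this would collect the
        # per-layer routed-expert weight tensors.
        return {0: ("w13_weight", "w2_weight"), 1: ("w13_weight", "w2_weight")}

    @property
    def routed_experts_weights_of_layer(self):
        if self._routed_experts_weights_of_layer is None:
            self._routed_experts_weights_of_layer = self.get_moe_weights()
        return self._routed_experts_weights_of_layer


model = MoEModelSketch()
print(sorted(model.routed_experts_weights_of_layer.keys()))  # → [0, 1]
```

With this shape, the EPLB manager's `list(model.routed_experts_weights_of_layer.keys())` call succeeds regardless of which load path constructed the model, because the first access triggers the weight collection.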
Accuracy Tests
```
[2026-03-23 08:15:38] INFO: 192.168.0.236:37180 - "POST /generate HTTP/1.1" 200 OK
[2026-03-23 08:15:40 TP0 EP0] Decode batch, #running-req: 1, #token: 13144, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 11.77, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13184, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.34, #queue-req: 0
[2026-03-23 08:15:41 TP0 EP0] Decode batch, #running-req: 1, #token: 13224, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.37, #queue-req: 0
[2026-03-23 08:15:42 TP0 EP0] Decode batch, #running-req: 1, #token: 13264, token usage: 0.00, pre-allocated usage: 0.00, #prealloc-req: 0, #transfer-req: 0, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 77.35, #queue-req: 0
[2026-03-23 08:15:42] INFO: 192.168.0.236:60792 - "GET /health HTTP/1.1" 200 OK
```
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci