[vllm] fix: apply moe weight loader patch for standard weight loading (#5234)

zjchenn · web-flow · commit a5087cb9e91d · 2026-02-09T10:49:22.000+08:00
### What does this PR do?

After the async rollout engine refactor, loading an MoE model fails with the following error because the MoE weight loader patch is not applied correctly:

> AttributeError: 'Parameter' object has no attribute 'weight_loader'

For reference, the [SPMD implementation in v0.6.1](https://github.com/verl-project/verl/blob/release/v0.6.1/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py#L647-L651) applied the patch immediately before loading weights:

```python
async def update_weights(self, weights: Generator[tuple[str, torch.Tensor], None, None], **kwargs):
    ...
    else:
        from verl.utils.vllm.patch import patch_vllm_moe_model_weight_loader

        model = self.inference_engine.worker.model_runner.model
        patch_vllm_moe_model_weight_loader(model)
        model.load_weights(weights)
```

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.

Signed-off-by: zjchenn <zjchenn@gmail.com>
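
The failure mode described above can be reproduced in miniature without vLLM. The sketch below is not verl's actual patch: `Param`, `default_weight_loader`, `patch_missing_weight_loaders`, `load_weights`, and the parameter names are all hypothetical stand-ins. It only illustrates why a loader that expects a `weight_loader` attribute raises `AttributeError` when the patch was never applied, and why re-applying an idempotent patch right before loading is safe.

```python
class Param:
    """Hypothetical stand-in for torch.nn.Parameter (no real tensor data)."""

    def __init__(self, name):
        self.name = name
        self.loaded = None


def default_weight_loader(param, loaded_weight):
    """Hypothetical loader: records the incoming weight on the parameter."""
    param.loaded = loaded_weight


def patch_missing_weight_loaders(params):
    """Attach a weight_loader to every parameter that lacks one.

    Idempotent: parameters that already carry the attribute are left alone.
    Returns the names of the parameters that were patched in this call.
    """
    patched = []
    for p in params:
        if not hasattr(p, "weight_loader"):
            p.weight_loader = default_weight_loader
            patched.append(p.name)
    return patched


def load_weights(params, weights):
    """Mimic a loader that expects `weight_loader` on every parameter."""
    by_name = {p.name: p for p in params}
    for name, tensor in weights:
        param = by_name[name]
        param.weight_loader(param, tensor)  # AttributeError if never patched


# Illustrative parameter names and values only.
params = [Param("experts.w13_weight"), Param("experts.w2_weight")]
weights = [("experts.w13_weight", [1.0]), ("experts.w2_weight", [2.0])]

# Without the patch, loading fails the same way the PR reports.
try:
    load_weights(params, weights)
    failed_without_patch = False
except AttributeError:
    failed_without_patch = True

first = patch_missing_weight_loaders(params)   # patches both parameters
second = patch_missing_weight_loaders(params)  # second call is a no-op
load_weights(params, weights)                  # now succeeds
```

Because the second call patches nothing, calling such a patch on every weight update costs only an attribute check per parameter, which is the property that makes re-applying it at sync time reasonable.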
diff --git a/verl/workers/rollout/vllm_rollout/utils.py b/verl/workers/rollout/vllm_rollout/utils.py
@@ -210,6 +210,14 @@ def update_weights_from_ipc(self, peft_config: dict = None, base_sync_done=False
             buffer, shm = rebuild_shared_memory(shm_name, shm_size, dtype=torch.uint8)
         socket.send(b"")
 
+        use_standard_weight_load = not (peft_config and base_sync_done) and not is_fp8_model(
+            self.model_runner.vllm_config
+        )
+
+        # Re-apply here because async IPC weight sync can happen long after init and lose MoE weight_loader attrs.
+        if use_standard_weight_load:
+            patch_vllm_moe_model_weight_loader(self.model_runner.model)
+
         # receive bucket and update weights
         while True:
             metadata = socket.recv_pyobj()
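
The gating condition added in this hunk can be read as a small truth table. The sketch below mirrors the boolean expression only; the `is_fp8` argument stands in for `is_fp8_model(self.model_runner.vllm_config)`, and the `{"r": 8}` dictionary is a made-up placeholder for a non-empty `peft_config`.

```python
def use_standard_weight_load(peft_config, base_sync_done, is_fp8):
    """Mirror of the boolean in the hunk (is_fp8 replaces the is_fp8_model call)."""
    return not (peft_config and base_sync_done) and not is_fp8


# (peft_config, base_sync_done, is_fp8); the peft_config contents are hypothetical.
cases = [
    (None, False, False),      # no PEFT, not FP8
    ({"r": 8}, False, False),  # PEFT configured, base weights not yet synced
    ({"r": 8}, True, False),   # PEFT configured, base weights already synced
    (None, True, False),       # base synced but no PEFT config
    (None, False, True),       # FP8 model
]
results = [use_standard_weight_load(*case) for case in cases]
```

Under this reading, the MoE patch is re-applied on every path except two: when a PEFT config is present and the base weights are already synced, and when the model is FP8.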