Commit fb7686c

[rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios (verl-project#4783)
### What does this PR do?

Fix the accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios. Issue: vllm-project/vllm-ascend#5544

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
  - If this PR involves multiple modules, separate them with `,`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Tested GRPO on a local NPU host:

<img width="1047" height="117" alt="58274edd-d0d3-454c-8e39-3188f6f19e71" src="https://github.com/user-attachments/assets/dee7bf2f-6faf-4f44-a8b3-64670d5b1e10" />

### Design & Code Changes

Root cause analysis: the vllm-ascend version currently used by verl on Ascend NPU is [v0.11.0](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html). In the vllm-ascend v0.11.0 code, the all2all backend (`flashinfer_all2allv`) is selected and written into the vllm worker environment. However, verl's `ExternalZeroMQDistributedExecutor` does not pass this environment on to the vllm worker processes the way vllm's [RayDistributedExecutor](https://github.com/vllm-project/vllm/blob/0d4044edd85de30d7d4558aeea4d1e95c7c556d6/vllm/v1/executor/ray_executor.py#L337) backend does. Because the workers therefore run with the wrong all2all backend, vllm-ascend produces inaccurate results.

Implementation:
1. In `vLLMAsyncRollout`, when initializing the vllm worker in an NPU scenario, add the environment variables required by vllm-ascend.
2. Add a vllm engine environment-variable setting to `rollout.yaml`, so that users can configure it themselves.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: FightingZhen

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
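The root cause described above is an environment-propagation gap: a variable set in the engine process never reaches a worker process unless the executor forwards it explicitly. The failure mode can be sketched generically with subprocesses (this is an illustration only, not verl's executor code; the `wrong-default` fallback value is made up for the demo):

```python
import os
import subprocess
import sys

# The engine process decides on a backend and records it in its own
# environment -- but child workers only see what is explicitly forwarded.
os.environ["VLLM_ALL2ALL_BACKEND"] = "flashinfer_all2allv"

CHILD = "import os; print(os.environ.get('VLLM_ALL2ALL_BACKEND', 'wrong-default'))"

# Forwarding the full environment (what RayDistributedExecutor effectively does):
forwarded = subprocess.run(
    [sys.executable, "-c", CHILD],
    env={**os.environ},
    capture_output=True, text=True,
).stdout.strip()
print(forwarded)  # flashinfer_all2allv

# Dropping the variable (the failure mode this commit fixes):
# the worker silently falls back to the wrong backend.
dropped = subprocess.run(
    [sys.executable, "-c", CHILD],
    env={k: v for k, v in os.environ.items() if k != "VLLM_ALL2ALL_BACKEND"},
    capture_output=True, text=True,
).stdout.strip()
print(dropped)  # wrong-default
```

The fix sidesteps the gap for the affected variables by setting them directly in the worker's own process during initialization.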
1 parent 0faff75 commit fb7686c

File tree

7 files changed: +10 −6 lines changed


tests/special_npu/run_qwen2_5_05b_grpo.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,5 +1,4 @@
 set -x
-export VLLM_ASCEND_ENABLE_NZ=0
 
 MODEL_ID=${MODEL_ID:-Qwen/Qwen2.5-0.5B-Instruct}
 MODEL_PATH=${MODEL_PATH:-${HOME}/.cache/models/${MODEL_ID}}
```

tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,5 +1,4 @@
 set -x
-export VLLM_ASCEND_ENABLE_NZ=0
 
 MODEL_ID=${MODEL_ID:-Qwen/Qwen2.5-0.5B-Instruct}
 MODEL_PATH=${MODEL_PATH:-${HOME}/.cache/models/${MODEL_ID}}
```

tests/special_npu/run_qwen2_5_vl_3b_npu.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,5 +1,4 @@
 set -x
-export VLLM_ASCEND_ENABLE_NZ=0
 
 ENGINE=${1:-vllm}
 
```
tests/special_npu/run_qwen3_06b_ppo.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,5 +1,4 @@
 set -x
-export VLLM_ASCEND_ENABLE_NZ=0
 
 MODEL_ID=${MODEL_ID:-Qwen/Qwen2.5-0.5B-Instruct} # TODO: change to Qwen3-0.6B when CI server is ready
 MODEL_PATH=${MODEL_PATH:-${HOME}/.cache/models/${MODEL_ID}}
```

tests/special_npu/run_qwen3_30b_grpo_mindspeed.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,7 +1,6 @@
 #!/usr/bin/env bash
 set -xeuo pipefail
 
-export VLLM_ASCEND_ENABLE_NZ=0
 
 MODEL_ID=${MODEL_ID:-Qwen/Qwen3-30B-A3B-Instruct-2507}
 MODEL_PATH=${MODEL_PATH:-${HOME}/.cache/models/${MODEL_ID}}
```

verl/experimental/one_step_off_policy/shell/grpo_qwen3_8b_gsm8k_fsdp2_8_8_npu.sh

Lines changed: 0 additions & 1 deletion

```diff
@@ -1,7 +1,6 @@
 # The script has been validated on the Ascend Atlas 800T A3.
 set -x
 
-export VLLM_ASCEND_ENABLE_NZ=0
 export HCCL_EXEC_TIMEOUT=60000
 export HCCL_CONNECT_TIMEOUT=7200
 
```
verl/workers/rollout/vllm_rollout/vllm_rollout.py

Lines changed: 10 additions & 0 deletions

```diff
@@ -73,6 +73,8 @@
 logger = logging.getLogger(__file__)
 logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))
 
+VLLM_ASCEND_REQUIRED_ENV_VARS = {"VLLM_ALL2ALL_BACKEND": "flashinfer_all2allv", "VLLM_ASCEND_ENABLE_NZ": "0"}
+
 # TODO
 # 1. support pp in vllm
 # 2. passing tokenizer is not necessary? no encoding/decoding is happending here
@@ -177,6 +179,14 @@ async def _loop_forever(self):
 
     def _init_worker(self, all_kwargs: list[dict[str, Any]]):
         """Initialize worker engine."""
+        # TODO: For ascend NPU, when the corresponding vllm-ascend version is upgraded to v0.13.0,
+        # please remove the VLLM_ASCEND_REQUIRED_ENV_VARS variable replacement action.
+        # This is only a fix for vllm version < v0.13.0.
+        if is_npu_available:
+            for k in VLLM_ASCEND_REQUIRED_ENV_VARS:
+                if k not in os.environ:
+                    os.environ[k] = VLLM_ASCEND_REQUIRED_ENV_VARS[k]
+
         if not torch.distributed.is_initialized():
             initialize_global_process_group_ray()
         all_kwargs[0]["rank"] = int(os.environ["RANK"])
```