[hardware] feat: Auto set device_name to npu for Ascend NPU (verl-project#4489)

FightingZhen · web-flow · commit 392791bdd826 · 2025-12-15T09:57:49.000+08:00
### What does this PR do?

For Ascend NPU, **keeping the NPU test cases consistent with the GPU test cases in the verl community is crucial for ongoing maintenance.** However, NPU and GPU scripts differ in how they handle the `trainer.device` parameter: NPU scripts have to pass `trainer.device=npu` explicitly. **This dependence forces the NPU side to maintain separate copies of the test scripts, so existing GPU scripts, and any added in the future, cannot be reused directly.**

To address this, this PR **automatically sets the device name (`trainer.device`) to `npu` on Ascend NPU devices** and removes the explicit `trainer.device=npu` argument from the NPU example, recipe, and test scripts. The following entry points are equipped with this mechanism:

- `verl.trainer.main_ppo`
- `verl.trainer.fsdp_sft_trainer`
- `recipe.dapo.main_dapo`
- `recipe.transfer_queue.main_ppo`
- `recipe.one_step_off_policy.main_ppo`
- `recipe.r1_ascend.main_ppo`

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (this is checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not applicable.
### API and Usage Example

Not applicable.

### Design & Code Changes

Not applicable.

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
diff --git a/docs/ascend_tutorial/ascend_quick_start.rst b/docs/ascend_tutorial/ascend_quick_start.rst
@@ -1,10 +1,17 @@
 Ascend Quickstart
 ===================================
 
-Last updated: 12/4/2025.
+Last updated: 12/11/2025.
 
 We have added support for Huawei Ascend devices to verl.
 
+
+Key Updates
+----------------------------------
+
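+2025/12/11: verl now automatically detects the NPU device type in existing scenarios. When GPU scripts are run on Ascend, explicitly setting ``trainer.device=npu`` is in principle no longer necessary. Newly added features can still be used by setting ``trainer.device`` explicitly, and will be adapted to automatic detection over time.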
+
+
 Hardware Support
 -----------------------------------
 
@@ -213,8 +220,7 @@ Ecosystem libraries not yet supported on Ascend in verl:
             trainer.nnodes=1 \
             trainer.save_freq=-1 \
             trainer.test_freq=5 \
-            trainer.total_epochs=1 \
-            trainer.device=npu $@
+            trainer.total_epochs=1 $@
 
 
 Algorithm Support Status
diff --git a/examples/grpo_trainer/run_qwen2_5_32b_grpo_npu.sh b/examples/grpo_trainer/run_qwen2_5_32b_grpo_npu.sh
@@ -37,5 +37,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.nnodes=2 \
     trainer.save_freq=-1 \
     trainer.test_freq=10 \
-    trainer.total_epochs=15 \
-    trainer.device=npu $@
+    trainer.total_epochs=15 $@
diff --git a/examples/grpo_trainer/run_qwen2_5_7b_grpo_discrete_prof_npu.sh b/examples/grpo_trainer/run_qwen2_5_7b_grpo_discrete_prof_npu.sh
@@ -65,7 +65,6 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=5 \
     trainer.total_epochs=5 \
-    trainer.device=npu \
     global_profiler.tool=npu \
     global_profiler.steps=$PROFILE_STEPS \
     global_profiler.save_path=$SAVE_PATH
diff --git a/examples/grpo_trainer/run_qwen2_5_7b_grpo_e2e_prof_npu.sh b/examples/grpo_trainer/run_qwen2_5_7b_grpo_e2e_prof_npu.sh
@@ -62,7 +62,6 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=5 \
     trainer.total_epochs=5 \
-    trainer.device=npu \
     global_profiler.tool=npu \
     global_profiler.steps=$PROFILE_STEPS \
     global_profiler.save_path=$SAVE_PATH
diff --git a/examples/grpo_trainer/run_qwen2_5_7b_grpo_npu.sh b/examples/grpo_trainer/run_qwen2_5_7b_grpo_npu.sh
@@ -38,5 +38,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.nnodes=1 \
     trainer.save_freq=-1 \
     trainer.test_freq=5 \
-    trainer.total_epochs=5 \
-    trainer.device=npu $@
+    trainer.total_epochs=5 $@
diff --git a/examples/grpo_trainer/run_qwen2_5_vl_32b_npu.sh b/examples/grpo_trainer/run_qwen2_5_vl_32b_npu.sh
@@ -48,5 +48,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.nnodes=2 \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
-    trainer.total_epochs=15 \
-    trainer.device=npu $@
+    trainer.total_epochs=15 $@
diff --git a/examples/grpo_trainer/run_qwen2_5_vl_3b_npu.sh b/examples/grpo_trainer/run_qwen2_5_vl_3b_npu.sh
@@ -48,5 +48,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.nnodes=1 \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
-    trainer.total_epochs=15 \
-    trainer.device=npu $@
+    trainer.total_epochs=15 $@
diff --git a/examples/grpo_trainer/run_qwen2_5_vl_7b_npu.sh b/examples/grpo_trainer/run_qwen2_5_vl_7b_npu.sh
@@ -48,5 +48,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.nnodes=1 \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
-    trainer.total_epochs=15 \
-    trainer.device=npu $@
+    trainer.total_epochs=15 $@
diff --git a/examples/grpo_trainer/run_qwen3-32b_npu.sh b/examples/grpo_trainer/run_qwen3-32b_npu.sh
@@ -55,5 +55,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.resume_from_path=checkpoints/ \
     trainer.save_freq=500 \
     trainer.test_freq=50 \
-    trainer.total_epochs=50 \
-    trainer.device=npu $@
+    trainer.total_epochs=50 $@
diff --git a/examples/grpo_trainer/run_qwen3-8b_npu.sh b/examples/grpo_trainer/run_qwen3-8b_npu.sh
@@ -47,7 +47,6 @@ python3 -m verl.trainer.main_ppo \
     trainer.n_gpus_per_node=8 \
     trainer.nnodes=1 \
     trainer.default_local_dir=${CKPTS_DIR} \
-    trainer.device=npu \
     trainer.resume_mode=auto \
     actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
     actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
diff --git a/examples/grpo_trainer/run_qwen3_4b_grpo_vllm_1k_npu.sh b/examples/grpo_trainer/run_qwen3_4b_grpo_vllm_1k_npu.sh
@@ -78,5 +78,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=5 \
     trainer.total_epochs=15 \
-    trainer.val_before_train=False \
-    trainer.device=npu 2>&1 | tee ${LOG_PATH}
+    trainer.val_before_train=False 2>&1 | tee ${LOG_PATH}
diff --git a/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh b/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh
@@ -67,5 +67,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.total_epochs=5 \
     trainer.default_local_dir="${CKPTS_DIR}" \
     actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \
-    actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} \
-    trainer.device=npu $@
+    actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} $@
diff --git a/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh b/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh
@@ -67,5 +67,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.total_epochs=5 \
     trainer.default_local_dir="${CKPTS_DIR}" \
     actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \
-    actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} \
-    trainer.device=npu $@
+    actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} $@
diff --git a/examples/ppo_trainer/run_qwen3-8b_npu.sh b/examples/ppo_trainer/run_qwen3-8b_npu.sh
@@ -49,7 +49,6 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=20 \
     trainer.test_freq=-1 \
     trainer.val_before_train=False \
-    trainer.device=npu \
     trainer.max_actor_ckpt_to_keep=1 \
     trainer.max_critic_ckpt_to_keep=1 \
     trainer.total_training_steps=100 $@
diff --git a/examples/sft/gsm8k/run_qwen3_8b_sft_peft_sp2_npu.sh b/examples/sft/gsm8k/run_qwen3_8b_sft_peft_sp2_npu.sh
@@ -32,5 +32,4 @@ torchrun --standalone --nnodes=1 --nproc_per_node=$nproc_per_node \
     model.target_modules=all-linear \
     model.strategy=fsdp \
     ulysses_sequence_parallel_size=2 \
-    use_remove_padding=true \
-    trainer.device=npu
+    use_remove_padding=true
diff --git a/examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh b/examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh
@@ -42,7 +42,6 @@ python3 -m verl.trainer.main_ppo \
     trainer.critic_warmup=0 \
     trainer.project_name='gsm8k_async_rl' \
     trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-sgl-multi-w-tool-verify-n16' \
-    trainer.device=npu \
     trainer.n_gpus_per_node=16 \
     trainer.nnodes=1 \
     trainer.save_freq=-1 \
diff --git a/recipe/dapo/main_dapo.py b/recipe/dapo/main_dapo.py
@@ -24,13 +24,16 @@
 
 from verl.trainer.constants_ppo import get_ppo_ray_runtime_env
 from verl.trainer.ppo.reward import load_reward_manager
-from verl.utils.device import is_cuda_available
+from verl.utils.device import auto_set_ascend_device_name, is_cuda_available
 
 from .dapo_ray_trainer import RayDAPOTrainer
 
 
 @hydra.main(config_path="config", config_name="dapo_trainer", version_base=None)
 def main(config):
+    # Automatically set `config.trainer.device = npu` when running on Ascend NPU.
+    auto_set_ascend_device_name(config)
+
     run_ppo(config)
 
 
diff --git a/recipe/dapo/run_dapo_qwen2.5_32b_npu.sh b/recipe/dapo/run_dapo_qwen2.5_32b_npu.sh
@@ -135,7 +135,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
     trainer.save_freq=20 \
     trainer.total_epochs=1 \
     trainer.default_local_dir="${CKPTS_DIR}" \
-    trainer.device=npu \
     trainer.resume_mode=auto \
     actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
     actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
diff --git a/recipe/dapo/run_dapo_qwen2.5_7b_npu.sh b/recipe/dapo/run_dapo_qwen2.5_7b_npu.sh
@@ -133,7 +133,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
     trainer.save_freq=20 \
     trainer.total_epochs=1 \
     trainer.default_local_dir="${CKPTS_DIR}" \
-    trainer.device=npu \
     trainer.resume_mode=auto \
     actor_rollout_ref.actor.entropy_checkpointing=True \
     actor_rollout_ref.ref.entropy_checkpointing=True \
diff --git a/recipe/dapo/run_dapo_qwen3_14b_base_npu.sh b/recipe/dapo/run_dapo_qwen3_14b_base_npu.sh
@@ -136,5 +136,4 @@ ray job submit --runtime-env="${RUNTIME_ENV}" \
     actor_rollout_ref.actor.entropy_checkpointing=True \
     actor_rollout_ref.ref.entropy_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
-    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
-    trainer.device=npu
+    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True
diff --git a/recipe/dapo/run_dapo_qwen3_8b_base_npu.sh b/recipe/dapo/run_dapo_qwen3_8b_base_npu.sh
@@ -135,5 +135,4 @@ ray job submit --runtime-env="${RUNTIME_ENV}" \
     actor_rollout_ref.actor.entropy_checkpointing=True \
     actor_rollout_ref.ref.entropy_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
-    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
-    trainer.device=npu
+    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True
diff --git a/recipe/dapo/run_dapo_qwen3_moe_30b_base_fsdp_npu.sh b/recipe/dapo/run_dapo_qwen3_moe_30b_base_fsdp_npu.sh
@@ -138,7 +138,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
     trainer.test_freq=5 \
     trainer.save_freq=-1 \
     trainer.total_epochs=1 \
-    trainer.device="npu" \
     actor_rollout_ref.actor.use_torch_compile=False \
     actor_rollout_ref.ref.use_torch_compile=False 
    
diff --git a/recipe/dapo/run_dapo_qwen3_moe_30b_megatron_npu.sh b/recipe/dapo/run_dapo_qwen3_moe_30b_megatron_npu.sh
@@ -160,7 +160,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
     trainer.save_freq=-1 \
     trainer.total_epochs=1 \
     trainer.default_local_dir="${CKPTS_DIR}" \
-    trainer.device="npu" \
     actor_rollout_ref.nccl_timeout=14400 \
     actor_rollout_ref.actor.use_torch_compile=False \
     actor_rollout_ref.ref.use_torch_compile=False \
diff --git a/recipe/fully_async_policy/fully_async_rollouter.py b/recipe/fully_async_policy/fully_async_rollouter.py
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
 import asyncio
 import os
 import time
diff --git a/recipe/one_step_off_policy/README.md b/recipe/one_step_off_policy/README.md
@@ -297,9 +297,6 @@ python3 -m recipe.one_step_off_policy.async_main_ppo \
    > - When `trainer.n_gpus_per_node + rollout.n_gpus_per_node > physical_gpus_per_node`,
        > the required node count is `trainer.nnodes + rollout.nnodes`
 
-3. **Ascend NPU Configuration**
-    If you are using Ascend NPU devices, add the following parameter:
-    - `trainer.device=npu`
 
 ## Functional Support
 
diff --git a/recipe/one_step_off_policy/main_ppo.py b/recipe/one_step_off_policy/main_ppo.py
@@ -30,6 +30,7 @@
 from verl.trainer.ppo.reward import load_reward_manager
 from verl.trainer.ppo.utils import Role, need_reference_policy
 from verl.utils.config import validate_config
+from verl.utils.device import auto_set_ascend_device_name
 
 
 def create_resource_pool_manager(config, roles: list) -> ResourcePoolManager:
@@ -222,6 +223,10 @@ def main(config):
     from verl.trainer.main_ppo import run_ppo
 
     start_time = time()
+
+    # Automatically set `config.trainer.device = npu` when running on Ascend NPU.
+    auto_set_ascend_device_name(config)
+
     run_ppo(config, task_runner_class=OneStepTaskRunner)
     print(f"total time: {time() - start_time:.2f} seconds")
 
diff --git a/recipe/one_step_off_policy/shell/grpo_qwen3_8b_gsm8k_fsdp2_8_8_npu.sh b/recipe/one_step_off_policy/shell/grpo_qwen3_8b_gsm8k_fsdp2_8_8_npu.sh
@@ -87,7 +87,6 @@ python3 -m recipe.one_step_off_policy.main_ppo \
     trainer.save_freq=10 \
     trainer.test_freq=-1 \
     trainer.total_epochs=15 \
-    trainer.device=npu \
     trainer.resume_mode=auto \
     trainer.nnodes="${NNODES}" \
     trainer.n_gpus_per_node="${n_gpus_training}" \
diff --git a/recipe/r1_ascend/main_ppo.py b/recipe/r1_ascend/main_ppo.py
@@ -27,7 +27,7 @@
 
 from verl.trainer.constants_ppo import get_ppo_ray_runtime_env
 from verl.trainer.main_ppo import TaskRunner as TaskRunnerBase
-from verl.utils.device import is_cuda_available
+from verl.utils.device import auto_set_ascend_device_name, is_cuda_available
 
 logger = logging.getLogger(__file__)
 logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))
@@ -40,6 +40,9 @@ def main(config):
     Args:
         config_dict: Hydra configuration dictionary containing training parameters.
     """
+    # Automatically set `config.trainer.device = npu` when running on Ascend NPU.
+    auto_set_ascend_device_name(config)
+
     run_ppo(config)
 
 
diff --git a/recipe/r1_ascend/run_deepseekv3_671b_grpo_megatron_npu.sh b/recipe/r1_ascend/run_deepseekv3_671b_grpo_megatron_npu.sh
@@ -105,7 +105,6 @@ python3 -m recipe.r1_ascend.main_ppo \
     trainer.test_freq=5 \
     trainer.save_freq=-1 \
     trainer.total_epochs=1 \
-    trainer.device="npu" \
     +actor_rollout_ref.actor.megatron.override_transformer_config.multi_head_latent_attention=True \
     +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True \
     +actor_rollout_ref.actor.megatron.override_transformer_config.pipeline_num_transformer_layers=[[6],[8],[8],[8],[8],[8],[8],[7]] \
diff --git a/recipe/transfer_queue/main_ppo.py b/recipe/transfer_queue/main_ppo.py
@@ -33,7 +33,7 @@
 from verl.trainer.ppo.reward import load_reward_manager
 from verl.trainer.ppo.utils import need_critic, need_reference_policy
 from verl.utils.config import validate_config
-from verl.utils.device import is_cuda_available
+from verl.utils.device import auto_set_ascend_device_name, is_cuda_available
 
 from .ray_trainer import RayPPOTrainer
 
@@ -45,6 +45,9 @@ def main(config):
     Args:
         config_dict: Hydra configuration dictionary containing training parameters.
     """
+    # Automatically set `config.trainer.device = npu` when running on Ascend NPU.
+    auto_set_ascend_device_name(config)
+
     run_ppo(config)
 
 
diff --git a/tests/special_e2e/run_transferqueue.sh b/tests/special_e2e/run_transferqueue.sh
@@ -63,8 +63,6 @@ echo "Running transferqueue with ${ACTOR_STRATEGY} strategy"
 echo "Total GPUs: ${NUM_GPUS}"
 
 # Common parameters for both FSDP and Megatron
-# For Ascend NPU, please add
-# trainer.device=npu
 common_params=(
     data.train_files="${HOME}/data/gsm8k/train.parquet"
     data.val_files="${HOME}/data/gsm8k/test.parquet"
diff --git a/tests/special_npu/run_qwen2_5_05b_dapo.sh b/tests/special_npu/run_qwen2_5_05b_dapo.sh
@@ -91,5 +91,4 @@ python3 -m recipe.dapo.main_dapo \
     trainer.total_epochs=1 \
     trainer.resume_mode=disable \
     trainer.val_before_train=False \
-    trainer.total_training_steps=1 \
-    trainer.device=npu $@
+    trainer.total_training_steps=1 $@
diff --git a/tests/special_npu/run_qwen2_5_05b_grpo.sh b/tests/special_npu/run_qwen2_5_05b_grpo.sh
@@ -44,5 +44,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
     trainer.total_epochs=1 \
-    trainer.total_training_steps=1 \
-    trainer.device=npu $@
+    trainer.total_training_steps=1 $@
diff --git a/tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh b/tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh
@@ -65,5 +65,4 @@ python3 -m verl.trainer.main_ppo --config-path=config \
     trainer.test_freq=-1 \
     trainer.total_epochs=1 \
     trainer.total_training_steps=1 \
-    trainer.device=npu \
     +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True $@
diff --git a/tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh b/tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh
@@ -27,7 +27,6 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 \
     model.target_modules=all-linear \
     model.strategy=fsdp \
     ulysses_sequence_parallel_size=2 \
-    use_remove_padding=true \
-    trainer.device=npu
+    use_remove_padding=true
 
 rm -rf ./outputs ./save_ckpts
diff --git a/tests/special_npu/run_qwen2_5_vl_3b_npu.sh b/tests/special_npu/run_qwen2_5_vl_3b_npu.sh
@@ -54,5 +54,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
     trainer.total_epochs=1 \
-    trainer.total_training_steps=1 \
-    trainer.device=npu $@
+    trainer.total_training_steps=1 $@
diff --git a/tests/special_npu/run_qwen3_06b_ppo.sh b/tests/special_npu/run_qwen3_06b_ppo.sh
@@ -49,5 +49,4 @@ python3 -m verl.trainer.main_ppo \
     trainer.save_freq=-1 \
     trainer.test_freq=-1 \
     trainer.total_epochs=1 \
-    trainer.total_training_steps=1 \
-    trainer.device=npu $@
+    trainer.total_training_steps=1 $@
diff --git a/tests/special_npu/run_qwen3_30b_dapo_mindspeed.sh b/tests/special_npu/run_qwen3_30b_dapo_mindspeed.sh
@@ -125,7 +125,6 @@ python3 -m recipe.dapo.main_dapo \
     trainer.test_freq=-1 \
     trainer.total_epochs=1 \
     trainer.total_training_steps=1 \
-    trainer.device=npu \
     actor_rollout_ref.actor.use_torch_compile=False \
     actor_rollout_ref.ref.use_torch_compile=False \
     +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True $@
diff --git a/verl/trainer/fsdp_sft_trainer.py b/verl/trainer/fsdp_sft_trainer.py
diff --git a/verl/trainer/main_generation.py b/verl/trainer/main_generation.py
diff --git a/verl/trainer/main_ppo.py b/verl/trainer/main_ppo.py
diff --git a/verl/utils/device.py b/verl/utils/device.py
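
The body of the `verl/utils/device.py` change is not included above, so the exact logic of `auto_set_ascend_device_name` is not visible here. Based only on the call-site comments ("Automatically set `config.trainer.device = npu` when running on Ascend NPU"), a minimal sketch of such a helper might look like the following. The names `auto_set_device_name_sketch` and `_npu_available` are hypothetical, detecting the NPU through `torch_npu` / `torch.npu.is_available()` is an assumption, the override-and-warn policy is assumed rather than taken from the diff, and `SimpleNamespace` stands in for the Hydra/OmegaConf config that the real entry points receive.

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def _npu_available() -> bool:
    """Best-effort Ascend NPU check (assumption: torch_npu provides torch.npu)."""
    try:
        import torch
        import torch_npu  # noqa: F401  # importing registers the torch.npu backend
    except ImportError:
        return False
    return torch.npu.is_available()


def auto_set_device_name_sketch(
    config: Any, npu_available: Optional[Callable[[], bool]] = None
) -> Any:
    """Hypothetical stand-in for auto_set_ascend_device_name.

    When an Ascend NPU is detected, force config.trainer.device to "npu";
    otherwise leave the config untouched so GPU runs keep their setting.
    """
    detect = npu_available or _npu_available
    trainer = getattr(config, "trainer", None)
    if trainer is None or not detect():
        return config
    current = getattr(trainer, "device", None)
    if current != "npu":
        # Assumed policy: warn, then override whatever value was configured.
        logger.warning("Ascend NPU detected: overriding trainer.device=%r with 'npu'", current)
    trainer.device = "npu"
    return config


# Usage: an entry point calls the helper before handing the config on.
cfg = SimpleNamespace(trainer=SimpleNamespace(device="cuda"))
auto_set_device_name_sketch(cfg, npu_available=lambda: True)
print(cfg.trainer.device)  # npu
```

Calling the helper at the top of each Hydra `main()`, before `run_ppo(config)`, as the diff does, means the device name is resolved once, before the config is passed on. This is why the NPU scripts can drop the explicit `trainer.device=npu` argument. The injectable `npu_available` parameter exists only so the sketch can be exercised without Ascend hardware.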