Replies: 6 comments 11 replies
- Can we support the Pangu model in the future?
- Hi, in addition to the features mentioned above, we are looking forward to seeing the following supported in verl on Ascend devices:
  - Inference backend
  - Training backend & specific features
  - Models & algorithms
  - Others
- I use vllm-ascend to run GRPO with multi-turn function calling, but I found that the tool is never called. Is npu + multi-turn function calling not supported? Thanks! Here is my launch script:

# run on 4xH100
# make sure your current working directory is the root of the project
set -x
export HYDRA_FULL_ERROR=1
export VLLM_USE_V1=1
ulimit -n 65535
PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
python3 -u -m verl.trainer.main_ppo \
--config-path="$CONFIG_PATH" \
--config-name='gsm8k_multiturn_grpo' \
algorithm.adv_estimator=grpo \
data.train_batch_size=128 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.return_raw_chat=True \
actor_rollout_ref.model.path=/vllm-workspace/Qwen2.5-3B-Instruct \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.use_torch_compile=False \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=64 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.mode=async \
actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
actor_rollout_ref.rollout.n=8 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.val_before_train=True \
trainer.critic_warmup=0 \
trainer.logger='["console", "swanlab"]' \
trainer.project_name='gsm8k_async_rl_0904' \
trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-vllm-ascend-multi-w-tool-verify-n16-4cards_0904' \
trainer.n_gpus_per_node=4 \
trainer.nnodes=1 \
trainer.save_freq=10 \
trainer.test_freq=10 \
trainer.total_training_steps=100 \
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
critic.ppo_max_token_len_per_gpu=8192 \
critic.forward_max_token_len_per_gpu=8192 \
data.train_files=$HOME/data/gsm8k/train.parquet \
data.val_files=$HOME/data/gsm8k/test.parquet \
actor_rollout_ref.rollout.multi_turn.enable=True \
actor_rollout_ref.rollout.multi_turn.format=hermes \
actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
actor_rollout_ref.rollout.multi_turn.max_user_turns=1 \
trainer.device=npu "$@"
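Before suspecting backend support, it may be worth ruling out a missing or mistyped tool config file, since the script above points multi_turn.tool_config_path at a YAML under the project directory. A minimal sketch of such a pre-flight check (the check_config helper is hypothetical, not part of verl):

```shell
# Hypothetical helper: verify a config file exists before launching training.
check_config() {
  if [ -f "$1" ]; then
    echo "found: $1"
    return 0
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example usage with the tool config path from the script above:
# check_config "$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" || exit 1
```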
- Any plan for supporting RL training of Qwen3-VL models (both dense and MoE variants)?
- I attempted to run RL training with training-inference separation on the Ascend 910B2 using the one-step-off strategy from verl: https://github.com/volcengine/verl/blob/main/recipe/one_step_off_policy/README.md
- Broadcast: this discussion has been closed. If you encounter any problems while using verl + npu, please check the issues section. Thanks.

- Unfinished tasks in Q2:
  - megatron/mindspeed worker (for npu, megatron ≈ mindspeed); Q2 roadmap: #900
  New features in Q3:
  - FSDP2 worker