Replies: 6 comments 11 replies
- Can we support the Pangu model in the future?
- Hi, in addition to the features mentioned above, we are looking forward to seeing the following supported in verl on Ascend devices:
  - Inference backend
  - Training backend & specific features
  - Models & algorithms
  - Others
- I use vllm-ascend to run GRPO with multi-turn function calling, but I found that the tool is never called. Is npu + multi-turn function calling not supported? Thanks! Here is my launch script:

# run on 4xH100
# make sure your current working directory is the root of the project
set -x
export HYDRA_FULL_ERROR=1
export VLLM_USE_V1=1
ulimit -n 65535
PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
python3 -u -m verl.trainer.main_ppo \
--config-path="$CONFIG_PATH" \
--config-name='gsm8k_multiturn_grpo' \
algorithm.adv_estimator=grpo \
data.train_batch_size=128 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.return_raw_chat=True \
actor_rollout_ref.model.path=/vllm-workspace/Qwen2.5-3B-Instruct \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.use_torch_compile=False \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=64 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.mode=async \
actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
actor_rollout_ref.rollout.n=8 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.val_before_train=True \
trainer.critic_warmup=0 \
trainer.logger='["console", "swanlab"]' \
trainer.project_name='gsm8k_async_rl_0904' \
trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-vllm-ascend-multi-w-tool-verify-n16-4cards_0904' \
trainer.n_gpus_per_node=4 \
trainer.nnodes=1 \
trainer.save_freq=10 \
trainer.test_freq=10 \
trainer.total_training_steps=100 \
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
critic.ppo_max_token_len_per_gpu=8192 \
critic.forward_max_token_len_per_gpu=8192 \
data.train_files=$HOME/data/gsm8k/train.parquet \
data.val_files=$HOME/data/gsm8k/test.parquet \
actor_rollout_ref.rollout.multi_turn.enable=True \
actor_rollout_ref.rollout.multi_turn.format=hermes \
actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
actor_rollout_ref.rollout.multi_turn.max_user_turns=1 \
trainer.device=npu "$@"
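Before suspecting backend support, it may be worth ruling out a missing or mistyped tool config file, since the script above points multi_turn.tool_config_path at a YAML under the project directory. A minimal sketch of such a pre-flight check (the check_config helper is hypothetical, not part of verl):

```shell
# Hypothetical helper: verify a config file exists before launching training.
check_config() {
  if [ -f "$1" ]; then
    echo "found: $1"
    return 0
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example usage with the tool config path from the script above:
# check_config "$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" || exit 1
```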
- Any plan for supporting RL training of Qwen3-VL models (both dense and MoE variants)?
- I attempted to run RL training with training-inference separation on the Ascend 910B2 using the one-step-off strategy from verl: https://github.com/volcengine/verl/blob/main/recipe/one_step_off_policy/README.md
- Broadcast: this discussion has been closed. If you encounter any problems while using verl + npu, please check the issues section. Thanks.

- Unfinished tasks in Q2:
  - megatron/mindspeed worker (for npu, megatron ≈ mindspeed); Q2 roadmap: #900
  New features in Q3:
  - FSDP2 worker