[Bug] Rollout infer_tp size is incorrectly set #4569

@chenjr97

Description

System Info

----------Python Info----------
Version : 3.10.12
Compiler : GCC 11.4.0
Build : ('main', 'Jul 29 2024 16:56:48')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.1.1
vllm : 0.10.0
ray : 2.47.1
torch : 2.7.1+cu126
----------verl Info-----------
Version : 0.7.0.dev
----------Platform Info----------
Platform : Linux-5.10.0-34-amd64-x86_64-with-glibc2.35
system : Linux
node : debian
release : 5.10.0-34-amd64
version : #1 SMP Debian 5.10.234-1 (2025-02-24)
----------Environment----------
VERL_LOGGING_LEVEL=''
CUDA Runtime : 12.6
CUDA Compiler : Cuda compilation tools, release 12.6, V12.6.20
----------System Info----------
CPU Memory : 187 GB
GPU Count : 2
GPU 1 Type : NVIDIA A100 80GB PCIe
GPU 1 Memory : 81920 MiB
GPU 2 Type : NVIDIA A100 80GB PCIe
GPU 2 Memory : 81920 MiB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

MODEL=${HOME}/verl/model/Qwen2.5-0.5B-Instruct

TRAIN_SET=${HOME}/verl/data/gsm8k/train_ID.parquet
VAL_SET=${HOME}/verl/data/gsm8k/test.parquet

ACTOR_DP=2
ACTOR_TP=1

python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    trainer.val_before_train=False \
    data.train_files=${TRAIN_SET} \
    data.val_files=${VAL_SET} \
    data.train_batch_size=256 \
    data.max_prompt_length=512 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.shuffle=True \
    actor_rollout_ref.model.path=${MODEL} \
    actor_rollout_ref.model.lora_rank=64 \
    actor_rollout_ref.model.lora_alpha=32 \
    actor_rollout_ref.actor.optim.lr=5e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=${ACTOR_TP} \
    actor_rollout_ref.rollout.data_parallel_size=${ACTOR_DP} \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.75 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.rollout.load_format=safetensors \
    actor_rollout_ref.rollout.layered_summon=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.rollout.mode=sync \
    trainer.logger='["console", "wandb"]' \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    +ray_kwargs.ray_init.log_to_driver=true \
    trainer.project_name='veRL' \
    trainer.experiment_name='exp_name' \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1 \
    trainer.save_freq=200 \
    trainer.test_freq=10 \
    trainer.total_epochs=3
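
For context, with the configuration above (2 GPUs, ACTOR_TP=1, ACTOR_DP=2) the two ways of computing infer_tp discussed below give different values. This is only the arithmetic of the two formulas from this report plugged into the values from the script, not verl code:

# Values taken from the launch script above.
world_size = 2   # trainer.n_gpus_per_node * trainer.nnodes
rollout_tp = 1   # actor_rollout_ref.rollout.tensor_model_parallel_size (ACTOR_TP)
rollout_dp = 2   # actor_rollout_ref.rollout.data_parallel_size (ACTOR_DP)

# Current computation reported in this issue:
infer_tp_original = rollout_tp * rollout_dp  # 1 * 2 = 2, i.e. equal to world_size
# Modified computation used for comparison:
infer_tp_modified = rollout_tp               # 1

print(infer_tp_original, infer_tp_modified)  # 2 1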

Expected behavior

I noticed that the infer_tp size defined as

infer_tp = self.config.rollout.tensor_model_parallel_size * self.config.rollout.data_parallel_size

may be set incorrectly.

I conducted experiments with the original implementation and with a modified version where

infer_tp = self.config.rollout.tensor_model_parallel_size

The results show that the original implementation suffers from significant efficiency degradation (timing_s/gen 99.35 s vs. 48.79 s and perf/throughput 1880 vs. 3177 tokens/s in the logs below). In addition, I printed the prompts received by each DP worker and found that, under the original setting, the prompts were not split across the DP workers at all: every rank received the full batch of 256 prompts.
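
A rough sketch of the dispatch arithmetic, assuming prompts are sharded across world_size // infer_tp inference replicas (this assumption is inferred from the worker logs below, not quoted from verl's dispatch code):

# Hypothetical illustration; the sharding rule is an assumption inferred
# from the worker logs below, not taken from verl's source.
world_size = 2
train_batch_size = 256

def prompts_per_replica(infer_tp: int) -> int:
    dp_replicas = world_size // infer_tp      # number of inference DP groups
    return train_batch_size // dp_replicas    # prompts handled per group

# Original: infer_tp = tp * dp = 2 -> one group spanning both GPUs,
# so each rank sees the full batch (256, matching the first log below).
print(prompts_per_replica(2))  # 256

# Modified: infer_tp = tp = 1 -> two single-GPU groups,
# so each rank gets half the batch (128, matching the second log below).
print(prompts_per_replica(1))  # 128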

The results are shown below.

Original implementation (infer_tp = tensor_model_parallel_size * data_parallel_size)

(WorkerDict pid=455057) [Worker Info] Global Rank: 0, World Size: 2
(WorkerDict pid=455057) [Prompts Info] Number of prompts (before repeat): 256
(WorkerDict pid=455057) [Repeat Info] repeat_times: 5, repeat_interleave: True

(WorkerDict pid=455058) [Worker Info] Global Rank: 1, World Size: 2
(WorkerDict pid=455058) [Prompts Info] Number of prompts (before repeat): 256
(WorkerDict pid=455058) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=455058) [Prompts Info] Number of prompts (after repeat): 1280

  • response_length_non_aborted/mean:304.8968811035156 - response_length_non_aborted/max:4096.0 - response_length_non_aborted/min:129.0 - response_length_non_aborted/clip_ratio:0.0007812500116415322 - response/aborted_ratio:0.0 - prompt_length/mean:104.4921875 - prompt_length/max:183.0 - prompt_length/min:69.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:0.0005315960152074695 - timing_s/generate_sequences:76.94253540039062 - timing_s/generation_timing/max:93.05921936035156 - timing_s/generation_timing/min:60.82584762573242 - timing_s/generation_timing/topk_ratio:0.5 - timing_s/gen:99.35007062903605 - timing_s/reward:0.3626377400942147 - timing_s/old_log_prob:8.188723739935085 - timing_s/ref:4.42419486597646 - timing_s/adv:0.04897050198633224 - timing_s/update_actor:26.710124182980508 - timing_s/step:139.3436508589657 - timing_s/stop_profile:0.0001778359292075038 - timing_per_token_ms/gen:0.2545688363612596 - timing_per_token_ms/adv:9.34519462811053e-05 - timing_per_token_ms/ref:0.008442829952361293 - timing_per_token_ms/update_actor:0.0509717684945565 - perf/total_num_tokens:524018 - perf/time_per_step:139.3436508589657 - perf/throughput:1880.308133057228

Modified implementation (infer_tp = self.config.rollout.tensor_model_parallel_size)

(WorkerDict pid=344971) [Worker Info] Global Rank: 0, World Size: 2
(WorkerDict pid=344971) [Prompts Info] Number of prompts (before repeat): 128
(WorkerDict pid=344971) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=344971) [Prompts Info] Number of prompts (after repeat): 640

(WorkerDict pid=344972) [Worker Info] Global Rank: 1, World Size: 2
(WorkerDict pid=344972) [Prompts Info] Number of prompts (before repeat): 128
(WorkerDict pid=344972) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=344972) [Prompts Info] Number of prompts (after repeat): 640

  • response_length_non_aborted/mean:292.1937561035156 - response_length_non_aborted/max:4096.0 - response_length_non_aborted/min:111.0 - response_length_non_aborted/clip_ratio:0.0023437500931322575 - response/aborted_ratio:0.0 - prompt_length/mean:101.9140625 - prompt_length/max:199.0 - prompt_length/min:70.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:6.126496009528637e-05 - timing_s/generate_sequences:42.80970001220703 - timing_s/generation_timing/max:43.4930305480957 - timing_s/generation_timing/min:42.12636947631836 - timing_s/generation_timing/topk_ratio:0.5 - timing_s/gen:48.79045281698927 - timing_s/reward:0.4484845600090921 - timing_s/old_log_prob:5.699306814931333 - timing_s/ref:4.0395765529247 - timing_s/adv:0.19046411104500294 - timing_s/update_actor:20.035540578886867 - timing_s/step:79.389909783029 - timing_s/stop_profile:0.0001710229553282261 - timing_per_token_ms/adv:0.00037756188036467444 - timing_per_token_ms/update_actor:0.03971696470050404 - timing_per_token_ms/gen:0.13045296575738827 - timing_per_token_ms/ref:0.008007755953765626 - perf/total_num_tokens:504458 - perf/time_per_step:79.389909783029 - perf/throughput:3177.0914048061863
