Description
System Info
----------Python Info----------
Version : 3.10.12
Compiler : GCC 11.4.0
Build : ('main', 'Jul 29 2024 16:56:48')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.1.1
vllm : 0.10.0
ray : 2.47.1
torch : 2.7.1+cu126
----------verl Info-----------
Version : 0.7.0.dev
----------Platform Info----------
Platform : Linux-5.10.0-34-amd64-x86_64-with-glibc2.35
system : Linux
node : debian
release : 5.10.0-34-amd64
version : #1 SMP Debian 5.10.234-1 (2025-02-24)
----------Environment----------
VERL_LOGGING_LEVEL=''
CUDA Runtime : 12.6
CUDA Compiler : Cuda compilation tools, release 12.6, V12.6.20
----------System Info----------
CPU Memory : 187 GB
GPU Count : 2
GPU 1 Type : NVIDIA A100 80GB PCIe
GPU 1 Memory : 81920 MiB
GPU 2 Type : NVIDIA A100 80GB PCIe
GPU 2 Memory : 81920 MiB
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
MODEL=${HOME}/verl/model/Qwen2.5-0.5B-Instruct
TRAIN_SET=${HOME}/verl/data/gsm8k/train_ID.parquet
VAL_SET=${HOME}/verl/data/gsm8k/test.parquet
ACTOR_DP=2
ACTOR_TP=1
python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    trainer.val_before_train=False \
    data.train_files=${TRAIN_SET} \
    data.val_files=${VAL_SET} \
    data.train_batch_size=256 \
    data.max_prompt_length=512 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.shuffle=True \
    actor_rollout_ref.model.path=${MODEL} \
    actor_rollout_ref.model.lora_rank=64 \
    actor_rollout_ref.model.lora_alpha=32 \
    actor_rollout_ref.actor.optim.lr=5e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=${ACTOR_TP} \
    actor_rollout_ref.rollout.data_parallel_size=${ACTOR_DP} \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.75 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.rollout.load_format=safetensors \
    actor_rollout_ref.rollout.layered_summon=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.rollout.mode=sync \
    trainer.logger='["console", "wandb"]' \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    +ray_kwargs.ray_init.log_to_driver=true \
    trainer.project_name='veRL' \
    trainer.experiment_name='exp_name' \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1 \
    trainer.save_freq=200 \
    trainer.test_freq=10 \
    trainer.total_epochs=3
Expected behavior
I noticed that the infer_tp size defined in verl/verl/workers/fsdp_workers.py (line 589 in 1ae510c):

infer_tp = self.config.rollout.tensor_model_parallel_size * self.config.rollout.data_parallel_size

may be set incorrectly.
I conducted experiments comparing the original implementation against a modified version with infer_tp = self.config.rollout.tensor_model_parallel_size. The results show that the original implementation suffers from significant efficiency degradation.
In addition, I printed the prompts received by each DP worker and found that, under the original setting, prompts were not sharded across the DP workers at all: each worker received the full batch.
The results are shown below.
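As a minimal sketch (a hypothetical helper, not verl's actual dispatch code) of why the original setting keeps prompts from being sharded: each rollout engine spans infer_tp ranks, so the number of data-parallel groups that prompts are split across is world_size // infer_tp. With world_size=2, tp=1, dp=2, the original infer_tp = tp * dp yields a single group, while infer_tp = tp yields two.

```python
# Hypothetical illustration of how infer_tp determines the number of
# data-parallel rollout groups that prompts are sharded across.
def dp_group_count(world_size: int, infer_tp: int) -> int:
    # Each rollout engine spans infer_tp ranks; prompts are split
    # across the remaining groups.
    assert world_size % infer_tp == 0
    return world_size // infer_tp

world_size, tp, dp = 2, 1, 2  # values from the run in this report

# Original: infer_tp = tp * dp -> one group, so every rank sees all
# 256 prompts (consistent with the "origin" logs).
print(dp_group_count(world_size, tp * dp))  # -> 1

# Modified: infer_tp = tp -> two groups, so each rank receives
# 128 prompts (consistent with the modified logs).
print(dp_group_count(world_size, tp))  # -> 2
```

This matches the per-worker prompt counts in the logs: 256 prompts on both ranks originally, 128 per rank after the change.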
origin
(WorkerDict pid=455057) [Worker Info] Global Rank: 0, World Size: 2
(WorkerDict pid=455057) [Prompts Info] Number of prompts (before repeat): 256
(WorkerDict pid=455057) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=455058) [Worker Info] Global Rank: 1, World Size: 2
(WorkerDict pid=455058) [Prompts Info] Number of prompts (before repeat): 256
(WorkerDict pid=455058) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=455058) [Prompts Info] Number of prompts (after repeat): 1280
- response_length_non_aborted/mean: 304.8968811035156
- response_length_non_aborted/max: 4096.0
- response_length_non_aborted/min: 129.0
- response_length_non_aborted/clip_ratio: 0.0007812500116415322
- response/aborted_ratio: 0.0
- prompt_length/mean: 104.4921875
- prompt_length/max: 183.0
- prompt_length/min: 69.0
- prompt_length/clip_ratio: 0.0
- timing_s/start_profile: 0.0005315960152074695
- timing_s/generate_sequences: 76.94253540039062
- timing_s/generation_timing/max: 93.05921936035156
- timing_s/generation_timing/min: 60.82584762573242
- timing_s/generation_timing/topk_ratio: 0.5
- timing_s/gen: 99.35007062903605
- timing_s/reward: 0.3626377400942147
- timing_s/old_log_prob: 8.188723739935085
- timing_s/ref: 4.42419486597646
- timing_s/adv: 0.04897050198633224
- timing_s/update_actor: 26.710124182980508
- timing_s/step: 139.3436508589657
- timing_s/stop_profile: 0.0001778359292075038
- timing_per_token_ms/gen: 0.2545688363612596
- timing_per_token_ms/adv: 9.34519462811053e-05
- timing_per_token_ms/ref: 0.008442829952361293
- timing_per_token_ms/update_actor: 0.0509717684945565
- perf/total_num_tokens: 524018
- perf/time_per_step: 139.3436508589657
- perf/throughput: 1880.308133057228
infer_tp = self.config.rollout.tensor_model_parallel_size
(WorkerDict pid=344971) [Worker Info] Global Rank: 0, World Size: 2
(WorkerDict pid=344971) [Prompts Info] Number of prompts (before repeat): 128
(WorkerDict pid=344971) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=344971) [Prompts Info] Number of prompts (after repeat): 640
(WorkerDict pid=344972) [Worker Info] Global Rank: 1, World Size: 2
(WorkerDict pid=344972) [Prompts Info] Number of prompts (before repeat): 128
(WorkerDict pid=344972) [Repeat Info] repeat_times: 5, repeat_interleave: True
(WorkerDict pid=344972) [Prompts Info] Number of prompts (after repeat): 640
- response_length_non_aborted/mean: 292.1937561035156
- response_length_non_aborted/max: 4096.0
- response_length_non_aborted/min: 111.0
- response_length_non_aborted/clip_ratio: 0.0023437500931322575
- response/aborted_ratio: 0.0
- prompt_length/mean: 101.9140625
- prompt_length/max: 199.0
- prompt_length/min: 70.0
- prompt_length/clip_ratio: 0.0
- timing_s/start_profile: 6.126496009528637e-05
- timing_s/generate_sequences: 42.80970001220703
- timing_s/generation_timing/max: 43.4930305480957
- timing_s/generation_timing/min: 42.12636947631836
- timing_s/generation_timing/topk_ratio: 0.5
- timing_s/gen: 48.79045281698927
- timing_s/reward: 0.4484845600090921
- timing_s/old_log_prob: 5.699306814931333
- timing_s/ref: 4.0395765529247
- timing_s/adv: 0.19046411104500294
- timing_s/update_actor: 20.035540578886867
- timing_s/step: 79.389909783029
- timing_s/stop_profile: 0.0001710229553282261
- timing_per_token_ms/adv: 0.00037756188036467444
- timing_per_token_ms/update_actor: 0.03971696470050404
- timing_per_token_ms/gen: 0.13045296575738827
- timing_per_token_ms/ref: 0.008007755953765626
- perf/total_num_tokens: 504458
- perf/time_per_step: 79.389909783029
- perf/throughput: 3177.0914048061863
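To quantify the efficiency gap, the logged metrics from the two runs above can be compared directly:

```python
# Efficiency comparison using the perf/throughput and timing_s/gen
# values logged in the two runs above.
orig_throughput = 1880.308133057228    # tokens/s, original infer_tp = tp * dp
fixed_throughput = 3177.0914048061863  # tokens/s, infer_tp = tensor_model_parallel_size
orig_gen = 99.35007062903605           # timing_s/gen, original
fixed_gen = 48.79045281698927          # timing_s/gen, modified

print(f"throughput speedup:    {fixed_throughput / orig_throughput:.2f}x")  # ~1.69x
print(f"generation time ratio: {orig_gen / fixed_gen:.2f}x")                # ~2.04x
```

So the one-line change roughly doubles generation speed and improves end-to-end throughput by about 1.7x on this 2-GPU setup.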