Merged
Commits
50 commits
b42b828
(fix): update math rule reward worker.
Oct 28, 2025
f609508
(feat): set RAY_CGRAPH_get_timeout=600.
PanAndy Oct 29, 2025
3f18fda
(fix): vllm 0.11.0 import
emiedon Oct 29, 2025
4dc7f5b
(fix): fix train infer ratio/diff mean & add train infer ratio/diff t…
ToasterSC Nov 5, 2025
eb0d106
(feat): support vllm beam_search.
PanAndy Nov 5, 2025
45414fa
(fix): ensure compatibility with transformers version check for causa…
chocoded Nov 5, 2025
9aea395
(feat): support pytorch280 docker.
PanAndy Dec 5, 2025
a8019db
(fix): fix agentic val get_batch state in redundancy env.
PanAndy Nov 7, 2025
c35095b
(feat): Add support for Qwen-3-next on AMD GPUs.
Nov 18, 2025
76c10fb
fix: fix tokenizer usage in llm judge reward worker.
guoshengCS Nov 28, 2025
ed79308
(feat): add vlm option.
PanAndy Dec 5, 2025
020e909
(feat): agentic-spec actor worker.
Oct 30, 2025
01db3d4
(feat): agentic_filter_task.
PanAndy Dec 2, 2025
3423f76
(refactor): agentic pipeline modify.
Oct 31, 2025
2306a0f
(fix): update error logging for image loading failure.
chocoded Oct 31, 2025
e59bd18
(fix): fix max_len_mask key.
Oct 31, 2025
2e63003
(feat): add infer_log_probs in agentic.
PanAndy Dec 2, 2025
c5cdbed
(feat): update mcore_adapter.
PanAndy Dec 5, 2025
654caec
(fix): fix bugs in data fetching for face embeddings.
Nov 5, 2025
51e5358
(feat): add agentic chunk.
PanAndy Dec 2, 2025
a45150a
(feat): add sglang 0.4.6.post5.
PanAndy Dec 5, 2025
1a46d50
(feat): support offload nccl to save gpu memory.
xuehuanran Nov 7, 2025
98ec5d6
(feat): support pytorch280 docker.
PanAndy Dec 5, 2025
3bf1810
(fix): fix vllm 0110 model_config.
PanAndy Nov 10, 2025
f698891
(refactor): refactor agentic norm.
Nov 11, 2025
86297d9
(feat): add agentic profile metrics.
PanAndy Dec 2, 2025
601a761
(feat): sglang 054 patch.
emiedon Nov 11, 2025
ccae407
(feat): add enable_reference option.
PanAndy Nov 11, 2025
e0e6408
(fix): fix agentic reference.
PanAndy Nov 12, 2025
2dae7c1
(feat): add flash-linear-attention.
PanAndy Dec 5, 2025
742efe4
(fix): vllm _generate_standard missing prompt_token_ids input args in…
HuangJoJo Nov 13, 2025
ce5331a
(fix): sglang 054post2 tp worker init wrong.
emiedon Nov 13, 2025
5faa728
(fix): vllm add missing argument is_lora in function update_parameter.
hydrozhao Nov 14, 2025
88a8366
(feat): update mcore_adapter.
PanAndy Dec 5, 2025
7f9785d
(fix): fix get_cached_module_file.
PanAndy Dec 5, 2025
e224453
(fix): fix bugs with metrics recording in the DPO pipeline.
Schnabel-8 Nov 17, 2025
eac3dad
(feat): add enable_old_logprobs, opt old log probs by cache.
PanAndy Nov 17, 2025
deb3758
(fix): update image loading logic for byte data in rlvr_vlm_pipeline.py
chocoded Nov 18, 2025
19c1769
(feat): mcore_adapter support qwen3vl.
liu-zichen Nov 18, 2025
743d2b0
(fix): add force_vit flags for image and video processing in Qwen3 VL…
chocoded Nov 18, 2025
e974e40
(feat): add qwen3-vl example.
PanAndy Dec 5, 2025
c21475f
(feat): mock infer.
Nov 21, 2025
0625adb
(feat): add qwen3-vl 32B example.
PanAndy Dec 5, 2025
9c2ae46
(feat): add sequence packing for sft pipeline and distill pipeline, o…
Schnabel-8 Nov 24, 2025
9e03c4c
(feat): add alive check.
PanAndy Nov 24, 2025
5ddf0ad
(feat): sglang support dp-attention.
emiedon Nov 25, 2025
b082a82
(fix): set broadcast_non_tensor_batch for old_logprobs.
PanAndy Dec 3, 2025
82a0477
(fix): fix vllm get_metrics exception.
PanAndy Dec 4, 2025
58208c1
(fix): fix vllm 0110.
PanAndy Dec 4, 2025
af34922
(fix): fix AgenticAcotrWorker import.
PanAndy Dec 4, 2025
14 changes: 14 additions & 0 deletions docs_roll/docs/User Guides/Configuration/vllm.md
@@ -74,6 +74,20 @@ In the configuration example, we can see:

This design allows different components to choose the most suitable inference engine according to their needs.

### beam_search Configuration
RLVRPipeline supports vLLM's beam_search generation method, configured as follows:
```yaml
generate_opt_level: 0 # falls back to the batch_generate generation mode; generate_opt_level=1 is the prompt-level parallel mode
num_return_sequences_in_group: 8
actor_infer:
generating_args:
num_beams: ${num_return_sequences_in_group}
num_return_sequences: ${num_return_sequences_in_group}
```
Note:
- generating_args.num_beams and generating_args.num_return_sequences must be set to the same value (the sketch below illustrates why).
- The generating_args under the validate configuration is set in the same way.
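
For intuition, here is a minimal, self-contained sketch of what these settings roughly map to at the vLLM level. It is not the RLVRPipeline implementation; it assumes the offline `LLM.beam_search` / `BeamSearchParams` API available in recent vLLM releases (exact names and output attributes may vary by version), and it reuses the values from the example above (beam width 8, 128 new tokens, the Qwen2.5-0.5B-Instruct model used elsewhere in this repo's examples).

```python
# Hedged illustration only: assumes vLLM's offline beam-search API
# (LLM.beam_search + BeamSearchParams); this is not ROLL's actual code path.
from vllm import LLM
from vllm.sampling_params import BeamSearchParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", gpu_memory_utilization=0.6)

# beam_width plays the role of both num_beams and num_return_sequences:
# beam search keeps beam_width hypotheses and returns all of them,
# which is why the two config values must match.
params = BeamSearchParams(beam_width=8, max_tokens=128)

outputs = llm.beam_search([{"prompt": "1 + 1 ="}], params)
tokenizer = llm.get_tokenizer()
for beam in outputs[0].sequences:  # one entry per returned beam
    print(tokenizer.decode(beam.tokens))
```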

## Performance Optimization Recommendations

1. **Memory Management**:
@@ -74,6 +74,21 @@ actor_infer:

This design allows different components to choose the most suitable inference engine according to their needs.

### beam_search Configuration
RLVRPipeline supports vLLM's beam_search generation method, configured as follows:
```yaml
generate_opt_level: 0 # falls back to the batch_generate generation mode; generate_opt_level=1 is the prompt-level parallel mode
num_return_sequences_in_group: 8
actor_infer:
generating_args:
num_beams: ${num_return_sequences_in_group}
num_return_sequences: ${num_return_sequences_in_group}
```
Note:
- generating_args.num_beams and generating_args.num_return_sequences must be set to the same value.
- The generating_args under the validate configuration is set in the same way.


## Performance Optimization Recommendations

1. **Memory Management**:
5 changes: 2 additions & 3 deletions examples/qwen2.5-0.5B-agentic/agent_val_frozen_lake_amd.yaml
@@ -107,7 +107,7 @@ actor_infer:
strategy_args:
strategy_name: vllm
strategy_config:
gpu_memory_utilization: 0.4
gpu_memory_utilization: 0.6
block_size: 16
load_format: auto
device_mapping: list(range(0,8))
Expand All @@ -131,7 +131,6 @@ reward_normalization:
method: mean_std # asym_clip / identity / mean_std

train_env_manager:
format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
max_env_num_per_worker: 16
num_env_groups: 128
# under the same group, the env config and env seed are ensured to be equal
@@ -163,8 +162,8 @@ custom_envs:
${custom_env.FrozenLakeThink}
FrozenLakeLocallyDefineExamples: # Can import from unified envs config or define dict locally
env_type: frozen_lake
max_steps: ${max_actions_per_traj}
max_tokens_per_step: ${max_tokens_per_step}
user_prompt_format: ${user_prompt_think_format}
env_manager_cls: ${env_manager_cls}
use_thread_lock: true
env_config:
163 changes: 163 additions & 0 deletions examples/qwen2.5-0.5B-agentic/agent_val_frozen_lake_async_amd.yaml
@@ -0,0 +1,163 @@
defaults:
- ../config/traj_envs@_here_
- ../config/deepspeed_zero@_here_
- ../config/deepspeed_zero2@_here_
- ../config/deepspeed_zero3@_here_
- ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
run:
dir: .
output_subdir: null

exp_name: "agentic_pipeline_async"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: ./output/render
system_envs:
USE_MODELSCOPE: '1'

#track_with: wandb
#tracker_kwargs:
# api_key:
# project: roll-agentic
# name: ${exp_name}_sokoban
# notes: "agentic_pipeline"
# tags:
# - agentic
# - roll
# - baseline

track_with: tensorboard
tracker_kwargs:
log_dir: /data/oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_frozen_lake_async

checkpoint_config:
type: file_system
output_dir: /data/cpfs_0/rl_examples/models/${exp_name}

num_gpus_per_node: 8

max_steps: 1024
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

async_generation_ratio: 1

rollout_batch_size: 1024
val_batch_size: 1024
sequence_length: 8192

advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "grpo"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0

pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_train:
model_args:
attn_implementation: fa2
disable_gradient_checkpointing: false
dtype: bf16
model_type: ~
training_args:
learning_rate: 1.0e-6
weight_decay: 0
per_device_train_batch_size: 2
gradient_accumulation_steps: 128
warmup_steps: 10
lr_scheduler_type: cosine
data_args:
template: qwen2_5
strategy_args:
# strategy_name: deepspeed_train
# strategy_config: ${deepspeed_zero3}
strategy_name: megatron_train
strategy_config:
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
expert_model_parallel_size: 1
use_distributed_optimizer: true
recompute_granularity: full
device_mapping: list(range(0,4))
infer_batch_size: 2

actor_infer:
model_args:
disable_gradient_checkpointing: true
dtype: bf16
generating_args:
max_new_tokens: 128 # single-turn response length
top_p: 0.99
top_k: 100
num_beams: 1
temperature: 0.99
num_return_sequences: 1
data_args:
template: qwen2_5
strategy_args:
strategy_name: vllm
strategy_config:
gpu_memory_utilization: 0.6
block_size: 16
load_format: auto
device_mapping: list(range(4,8))

reference:
model_args:
attn_implementation: fa2
disable_gradient_checkpointing: true
dtype: bf16
model_type: ~
data_args:
template: qwen2_5
strategy_args:
strategy_name: hf_infer
strategy_config: ~
device_mapping: list(range(0,4))
infer_batch_size: 2

reward_normalization:
grouping: traj_group_id # can be tags(env_type) / traj_group_id(group) / batch(rollout_batch)...; the group_by key used when computing reward/adv
method: mean_std # asym_clip / identity / mean_std

train_env_manager:
format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
max_env_num_per_worker: 16
num_env_groups: 128
# under the same group, the env config and env seed are ensured to be equal
group_size: 8
tags: [FrozenLake]
num_groups_partition: [128] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation

val_env_manager:
max_env_num_per_worker: 32
num_env_groups: 1024
group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
tags: [SimpleSokoban, LargerSokoban, SokobanDifferentGridVocab, FrozenLake]
num_groups_partition: [256, 256, 256, 256] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation

# Here you can override variables defined in the imported envs; max_tokens_per_step is 128 in custom_env.SimpleSokoban and is overridden to 64 here
max_tokens_per_step: 64

custom_envs:
SimpleSokoban:
${custom_env.SimpleSokoban}
LargerSokoban:
${custom_env.LargerSokoban}
SokobanDifferentGridVocab:
${custom_env.SokobanDifferentGridVocab}
FrozenLake:
${custom_env.FrozenLake}
FrozenLakeThink:
${custom_env.FrozenLakeThink}
49 changes: 49 additions & 0 deletions examples/qwen2.5-0.5B-agentic/submit_pipeline_amd.sh
@@ -0,0 +1,49 @@
#!/bin/bash
set +x
source "examples/scripts/config.sh"

WORKER_COUNT=1
CONFIG_FILE="agent_val_frozen_lake_amd.yaml"
# Replace with the mos URI
NEBULA_MODEL=""
ENTRY_FILE="examples/start_agentic_pipeline.py"

CONFIG_PATH=$(basename $(dirname $0))
CONFIG_NAME="${CONFIG_FILE%.yaml}"
JOB_NAME="$CONFIG_PATH-$CONFIG_NAME"


QUEUE="nebula_test2_308x_gpu_hang"
# QUEUE="nebula_test_308x"
ENVS="NCCL_PF_UCM_TIMEOUT=600000,NCCL_SOCKET_IFNAME=bond0"
# ENVS="NCCL_PF_UCM_TIMEOUT=600000"

echo "JOB_NAME: ${JOB_NAME}"
echo "WORKER_COUNT: ${WORKER_COUNT}"
echo "CONFIG_NAME: ${CONFIG_NAME}"
echo "CONFIG_PATH: ${CONFIG_PATH}"
echo "ENTRY_FILE: ${ENTRY_FILE}"

args="--config_name ${CONFIG_NAME} --config_path ${CONFIG_PATH}"

mdl_args="--queue=${QUEUE} \
--entry=${ENTRY_FILE} \
--worker_count=${WORKER_COUNT} \
--file.cluster_file=examples/scripts/cluster.json \
--job_name=${JOB_NAME} \
--algo_name=pytorch280 \
--requirements_file_name=nebula_patch/requirements/requirements_torch280_vllm_amd.txt \
--oss_appendable=true \
--_NEBULA_MODEL=${NEBULA_MODEL} \
--nebula_model=${NEBULA_MODEL} \
--env=${ENVS} \
--force \
"
if [ -n "${OPENLM_TOKEN}" ]; then
mdl_args="${mdl_args} --env=OPENLM_TOKEN=${OPENLM_TOKEN}"
fi

echo ${args}
echo ${mdl_args}

nebulactl run mdl --user_params="${args}" $mdl_args
49 changes: 49 additions & 0 deletions examples/qwen2.5-0.5B-agentic/submit_pipeline_amd_async.sh
@@ -0,0 +1,49 @@
#!/bin/bash
set +x
source "examples/scripts/config.sh"

WORKER_COUNT=1
CONFIG_FILE="agent_val_frozen_lake_async_amd.yaml"
# Replace with the mos URI
NEBULA_MODEL=""
ENTRY_FILE="examples/start_agentic_pipeline.py"

CONFIG_PATH=$(basename $(dirname $0))
CONFIG_NAME="${CONFIG_FILE%.yaml}"
JOB_NAME="$CONFIG_PATH-$CONFIG_NAME"


QUEUE="nebula_test2_308x_gpu_hang"
# QUEUE="nebula_test_308x"
ENVS="NCCL_PF_UCM_TIMEOUT=600000,NCCL_SOCKET_IFNAME=bond0"
# ENVS="NCCL_PF_UCM_TIMEOUT=600000"

echo "JOB_NAME: ${JOB_NAME}"
echo "WORKER_COUNT: ${WORKER_COUNT}"
echo "CONFIG_NAME: ${CONFIG_NAME}"
echo "CONFIG_PATH: ${CONFIG_PATH}"
echo "ENTRY_FILE: ${ENTRY_FILE}"

args="--config_name ${CONFIG_NAME} --config_path ${CONFIG_PATH}"

mdl_args="--queue=${QUEUE} \
--entry=${ENTRY_FILE} \
--worker_count=${WORKER_COUNT} \
--file.cluster_file=examples/scripts/cluster.json \
--job_name=${JOB_NAME} \
--algo_name=pytorch280 \
--requirements_file_name=nebula_patch/requirements/requirements_torch280_vllm_amd.txt \
--oss_appendable=true \
--_NEBULA_MODEL=${NEBULA_MODEL} \
--nebula_model=${NEBULA_MODEL} \
--env=${ENVS} \
--force \
"
if [ -n "${OPENLM_TOKEN}" ]; then
mdl_args="${mdl_args} --env=OPENLM_TOKEN=${OPENLM_TOKEN}"
fi

echo ${args}
echo ${mdl_args}

nebulactl run mdl --user_params="${args}" $mdl_args
12 changes: 8 additions & 4 deletions examples/qwen2.5-7B-distill_megatron/distill_megatron.yaml
@@ -28,7 +28,7 @@ distill_on_prompt: False

logits_transfer_backend: "nccl-only" # support "ipc+nccl", "nccl_only" and "ray"

sequence_length: 1024
sequence_length: 2048
max_grad_norm: 1.0

question_key: question_zh
@@ -43,8 +43,8 @@ student:
training_args:
learning_rate: 2.0e-5
lr_scheduler_type: constant
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
per_device_train_batch_size: 8
gradient_accumulation_steps: 4
warmup_steps: 0
num_train_epochs: 1

@@ -57,10 +57,12 @@ student:
strategy_name: megatron_train
strategy_config:
tensor_model_parallel_size: 2
sequence_parallel: True
pipeline_model_parallel_size: 2
context_parallel_size: 2
use_distributed_optimizer: true
recompute_granularity: full
use_sequence_packing: True
device_mapping: list(range(0,8))

teacher:
Expand All @@ -72,14 +74,16 @@ teacher:
template: qwen2_5
training_args:
# teacher forward micro_batch_size
per_device_train_batch_size: 1
per_device_train_batch_size: 8
strategy_args:
strategy_name: megatron_infer
strategy_config:
tensor_model_parallel_size: 2
sequence_parallel: True
pipeline_model_parallel_size: 2
context_parallel_size: 2
bf16: true
use_sequence_packing: True
device_mapping: list(range(0,8))

system_envs: