Open
Labels
bug: Something isn't working
Description
System Info
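Version notes:
20251230 denotes the version tracking repo-main as of that date.
latest denotes the version tracking repo-main-20251224.
The runs labeled with long alphanumeric names are older versions that run correctly.
repo-main-20251224 has two problems: pg_loss and adv/mean are both close to 0.
repo-main-20251230 fixes one of them: pg_loss is no longer over-scaled.
However, adv/mean is still being scaled down excessively. Is this similar to #4711?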
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce:
- Follow verl/examples/ppo_trainer/run_qwen2.5-3b_rm_reward_loop_colocate.sh
Expected behavior
The results produced by the latest version show adv/mean close to 0.
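To help narrow this down, here is a minimal sketch of one check that might separate an expected near-zero adv/mean from a real scaling bug: if advantages are whitened over the response mask, their masked mean is zero by construction. The helpers `masked_mean` and `masked_whiten` below are standalone illustrations written for this issue, not the repository's code, and the data is synthetic. Whether either version actually whitens advantages in this configuration is an assumption that still needs to be confirmed.

```python
import numpy as np


def masked_mean(values, mask):
    """Mean over the positions where mask == 1."""
    return float((values * mask).sum() / mask.sum())


def masked_whiten(values, mask, eps=1e-8):
    """Shift and scale the masked positions to zero mean and unit variance."""
    mean = masked_mean(values, mask)
    var = masked_mean((values - mean) ** 2, mask)
    return (values - mean) / np.sqrt(var + eps) * mask


# Synthetic advantages: 4 responses of 8 tokens, last 2 tokens are padding.
rng = np.random.default_rng(0)
raw_adv = rng.normal(loc=0.5, scale=2.0, size=(4, 8))
mask = np.ones((4, 8))
mask[:, 6:] = 0.0

whitened = masked_whiten(raw_adv, mask)
raw_mean = masked_mean(raw_adv, mask)
white_mean = masked_mean(whitened, mask)

print(f"adv/mean before whitening: {raw_mean:.4f}")
print(f"adv/mean after whitening:  {white_mean:.2e}")
```

If the logged adv/mean is computed after a step like this, a value near 0 would be expected in both versions, and the difference between the runs would have to come from somewhere else (for example, where the metric is logged relative to normalization).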