Add Efficient Online Training with GRPO and vLLM in TRL recipe #334
merveenoyan merged 5 commits into main
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible. |
|
Recipe ready for review, now with training results added 😃 |
|
merveenoyan commented on 2025-10-06T10:22:45Z: I think I'd like to frame this as follows sequentially:
(reverse the order) |
|
merveenoyan commented on 2025-10-06T10:22:46Z: do we need to state the differences between PPO and GRPO in this notebook? imo let's only define GRPO to keep the focus on vLLM + online methods. it would be confusing otherwise. if people want to learn more about it, they can check a guide that goes through the differences |
|
merveenoyan commented on 2025-10-06T10:22:47Z: nice! |
|
merveenoyan commented on 2025-10-06T10:22:47Z: a blank line between the import and the function is more readable (same goes above) |
|
merveenoyan commented on 2025-10-06T10:22:48Z: actually, it would be cool for people to push their trackio logs to the Hub and see them there instead of in the notebook. it also helps with growth there |
|
merveenoyan commented on 2025-10-06T10:22:49Z: some logs are unnecessary; is there any way we can change the verbosity here? |
|
merveenoyan commented on 2025-10-06T10:22:50Z: we could let them push to the Hub automatically and let them check there |
|
merveenoyan commented on 2025-10-06T10:22:50Z: super cool! |
|
merveenoyan commented on 2025-10-07T08:53:12Z: nice! |
What does this PR do?
Add the "Efficient Online Training with GRPO and vLLM in TRL" recipe to showcase online training possibilities in TRL. This recipe is a modification of "Post training an LLM for reasoning with GRPO in TRL", and I aim to include it in the vLLM docs here.
Who can review?
Feel free to tag members/contributors who may be interested in your PR.