Hi @cbartmann, thanks for the excellent point, and happy coding!
I'm curious whether anyone has figured out how to use GRPOTrainer to train a model on multiple objectives, such as coding and math, at the same time. Each objective needs its own reward function, and the right one has to be applied based on the task associated with each sample. It seems that batching samples by task might be necessary to fully leverage parallelism. Any insights or suggestions would be appreciated!
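One possible approach, sketched under assumptions: TRL's `GRPOTrainer` passes the dataset's extra columns (everything besides the prompt) to each reward function as keyword arguments, one value per sample. A single reward function can therefore route each sample to a task-specific scorer based on a per-sample column. The `task` and `answer` column names and the two scorers below are hypothetical placeholders, not part of TRL; a real setup would swap in proper verification (for example, running unit tests in a sandbox for code).

```python
import re
from typing import Callable


def _math_reward(text: str, answer: str) -> float:
    # Hypothetical scorer: 1.0 if the last number in the completion matches the reference answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return 1.0 if numbers and numbers[-1] == answer.strip() else 0.0


def _code_reward(text: str, answer: str) -> float:
    # Hypothetical scorer: 1.0 if the completion defines a function with the expected name.
    return 1.0 if re.search(rf"def\s+{re.escape(answer)}\s*\(", text) else 0.0


# Hypothetical mapping from the value in the `task` column to its scorer.
SCORERS: dict[str, Callable[[str, str], float]] = {
    "math": _math_reward,
    "code": _code_reward,
}


def _completion_text(completion) -> str:
    # Standard format: a plain string. Conversational format: a list of message dicts.
    if isinstance(completion, list):
        return completion[-1]["content"]
    return completion


def multi_task_reward(prompts, completions, task, answer, **kwargs) -> list[float]:
    """Score each completion with the scorer for its own task.

    `task` and `answer` are assumed dataset columns, forwarded per sample as lists
    aligned with `completions`. Any other forwarded arguments are ignored via **kwargs.
    """
    rewards = []
    for completion, sample_task, ref in zip(completions, task, answer):
        scorer = SCORERS.get(sample_task)
        # Unknown tasks get a neutral 0.0 reward in this sketch.
        rewards.append(scorer(_completion_text(completion), ref) if scorer else 0.0)
    return rewards
```

Assuming a dataset with `prompt`, `task`, and `answer` columns, this would be passed as `GRPOTrainer(model=..., reward_funcs=[multi_task_reward], train_dataset=dataset, ...)`. Because the dispatch happens per sample, mixed-task batches are scored correctly without grouping samples by task. Some recent TRL releases also document letting a reward function return `None` for samples it does not cover, which allows separate per-task reward functions instead of one dispatcher; check the documentation for your installed version before relying on that.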