Multi-Task GRPO Training #3073

cbartmann · 2025-03-13T15:22:31Z

I'm curious if anyone has figured out how to use GRPOTrainer for training a model on multiple objectives—like coding and math—simultaneously. Each objective would require its own reward function, which needs to be applied based on the task associated with each sample. It seems that batching samples by task might be necessary to fully leverage parallelism. Any insights or suggestions would be appreciated!

shirinyamani · 2025-03-18T17:44:51Z

Hi @cbartmann, thanks for the excellent point!
TRL now supports a multi-task setup!
As an example, imagine you have a reward_func that checks whether code compiles, but a sample from the dataset isn't about code. Previously, the program would crash in that situation because the relevant reward_func would return None for that sample.
Not anymore!
It is now okay to mix reward_funcs, even though some of them may not apply to a given sample because it belongs to a different task.
This is the relevant PR that took care of it.
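
To make this concrete, here is a minimal sketch of what mixed reward functions might look like. It is an illustration under assumptions, not the code from the PR: the `task` and `answer` column names, the exact-match math check, and the use of Python's built-in `compile()` as a "does it compile" test are all hypothetical choices, and completions are assumed to be plain strings (standard, non-conversational format). Each function returns None for samples outside its task.

```python
# Sketch: per-task reward functions for a mixed math/code dataset.
# Assumptions (not from the thread): the dataset has a "task" column and an
# "answer" column, and completions are plain strings.

def math_reward(completions, task, answer, **kwargs):
    """Score math samples; return None for samples of any other task."""
    rewards = []
    for completion, t, expected in zip(completions, task, answer):
        if t != "math":
            rewards.append(None)  # not applicable to this sample
        else:
            # Hypothetical check: exact string match against the reference answer
            rewards.append(1.0 if completion.strip() == str(expected) else 0.0)
    return rewards


def code_reward(completions, task, **kwargs):
    """Score code samples by whether they parse as Python; None otherwise."""
    rewards = []
    for completion, t in zip(completions, task):
        if t != "code":
            rewards.append(None)  # not applicable to this sample
            continue
        try:
            # Syntax check only: compile() builds a code object without running it
            compile(completion, "<completion>", "exec")
            rewards.append(1.0)
        except SyntaxError:
            rewards.append(0.0)
    return rewards


# Tiny hand-made batch: one math sample and two code samples
completions = ["4", "def add(a, b):\n    return a + b", "def broken(:"]
task = ["math", "code", "code"]
answer = [4, None, None]

print(math_reward(completions, task=task, answer=answer))
print(code_reward(completions, task=task, answer=answer))
```

Both functions could then be passed together, for example as `reward_funcs=[math_reward, code_reward]` to GRPOTrainer, so that each sample is scored only by the function that matches its task.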

Thanks again and Happy Coding!
