Is it possible to apply GRPOTrainer directly to handle interactive environments? #352

Some-random · 2025-02-18T01:02:07Z


Take agentic environments, for example, where some parts of the trajectory (e.g. environment responses) shouldn't participate in the gradient calculation. I'm not sure whether this is currently supported, since I think the trainer assumes that every token in a sequence is generated by the model and should be part of the gradient computation.
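
To make the question concrete, here is a minimal sketch of the kind of masking I have in mind. It is plain Python and does not use GRPOTrainer or any TRL API. The names `Segment`, `build_loss_mask`, and `masked_mean` are hypothetical, made up for illustration only. The idea is that each trajectory is a list of segments, and only tokens from model-generated segments contribute to the averaged per-token loss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One contiguous piece of a trajectory (hypothetical helper type)."""
    token_ids: List[int]
    from_model: bool  # True if the policy generated these tokens


def build_loss_mask(segments: List[Segment]) -> Tuple[List[int], List[int]]:
    """Flatten segments into token ids plus a 0/1 mask (1 = model token)."""
    token_ids: List[int] = []
    mask: List[int] = []
    for seg in segments:
        token_ids.extend(seg.token_ids)
        mask.extend([1 if seg.from_model else 0] * len(seg.token_ids))
    return token_ids, mask


def masked_mean(per_token_loss: List[float], mask: List[int]) -> float:
    """Average the loss over masked-in tokens only; 0.0 if none are kept."""
    total = sum(loss * m for loss, m in zip(per_token_loss, mask))
    count = sum(mask)
    return total / count if count else 0.0


# Model action, environment response, model action.
trajectory = [
    Segment([1, 2], from_model=True),
    Segment([3, 4, 5], from_model=False),
    Segment([6], from_model=True),
]
ids, mask = build_loss_mask(trajectory)
loss = masked_mean([1.0, 2.0, 10.0, 10.0, 10.0, 3.0], mask)
```

With this mask, the three environment tokens are excluded from the loss no matter how large their per-token values are. What I don't know is whether the trainer exposes a way to pass such a mask.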

Replies: 0 comments
