I’m new to RL and deep learning, so my question might seem simple. I would greatly appreciate any advice!
I’m using an LLM to solve a multi-step sequence generation task. At each step, the model outputs one editing action on the sequence so far: append, remove, modify, or terminate. I have implemented multiple reward functions to evaluate each action (a simplified sketch follows below). The RL loop is therefore: sample actions from the current model, score them with the reward functions, and update the model on those scored samples.
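For context, each reward function follows the completion-level signature the GRPO Trainer expects: a batch of completions in, one float per completion out. A simplified sketch, assuming plain-text (non-conversational) completions; the scoring rule here is purely illustrative:

```python
# Simplified sketch of one reward function in the TRL-style signature:
# take the batch of completions, return one float per completion.
def action_validity_reward(completions, **kwargs):
    valid_actions = {"append", "remove", "modify", "terminate"}
    rewards = []
    for completion in completions:
        tokens = completion.split()
        first = tokens[0] if tokens else ""
        # illustrative rule: 1.0 if the completion starts with one of my
        # four action names, 0.0 otherwise
        rewards.append(1.0 if first in valid_actions else 0.0)
    return rewards
```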
Conceptually, this fits the GRPO training loop. The problem is that my training data is not a fixed (“static”) dataset; instead, it is generated on the fly from the model’s own past outputs. According to #3213, the current GRPO Trainer does not support IterableDataset.
Question: What’s the recommended way to handle a dynamically generated dataset with the GRPO Trainer? Is there a workaround, or do I need to implement a custom training loop? Thank you for any pointers!
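One workaround I’m considering: keep the dataset static within a round and regenerate it between rounds. A minimal sketch, assuming TRL’s `GRPOTrainer`/`GRPOConfig` and a `Dataset` with a `"prompt"` column; the base model path and `build_prompts_from_past_outputs` are placeholders for my setup:

```python
# Sketch, not a tested recipe: materialize a fresh static Dataset between
# training rounds, so each round's data derives from earlier model outputs.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer


def build_prompts_from_past_outputs(round_idx):
    # hypothetical helper: in my task this would mine the model's past
    # outputs; here it returns fixed strings so the sketch runs
    return [f"Round {round_idx}: choose the next action for the sequence"] * 8


def sequence_reward(completions, **kwargs):
    # stand-in for my real append/remove/modify/terminate reward functions
    return [float(len(c)) for c in completions]


model_path = "Qwen/Qwen2-0.5B-Instruct"  # illustrative base model
args = GRPOConfig(output_dir="grpo-rounds", max_steps=50)

for round_idx in range(10):
    dataset = Dataset.from_list(
        [{"prompt": p} for p in build_prompts_from_past_outputs(round_idx)]
    )
    trainer = GRPOTrainer(
        model=model_path,
        reward_funcs=sequence_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
    trainer.save_model(args.output_dir)
    model_path = args.output_dir  # next round continues from saved weights
```

The obvious downsides are that the trainer (and its optimizer state) is rebuilt every round, and saving/reloading weights between rounds is clumsy, which is why I’m asking whether there is a recommended approach.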
Replies: 1 comment

I have the same question.