- 
                Notifications
    You must be signed in to change notification settings 
- Fork 34
[1/N] Core Data Models #157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we start a PR that integrates completion and experience?  maybe with  Experience renamed to Episode
I think these are the lowest hanging fruits for cleaning up our grpo app now, I'd love to see the main example have Data.from_xyz()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all reasonable dataclasses. I think at a high level my two main comments are that most of these should exist at the app level. At the user level you have your Completion/Score/Experience data that's dependent on your agentic pipeline (Multi-agent?, N tools, M judges, Conditional verifiers, etc). You want the data that you're tracking to be as flexible as the pipeline. So keeping as much of this in the app as possible (in contrast with the trainer types which you would want to be more static) allows users to edit both in parallel and independent of other apps.
Second I think keeping the number of types small (maybe just one type instead of 3 for a rollout) puts a much lower burden on the user for customization, debugging and ramp up.
I think if you look at what is being done with the Titan trainer (example in apps/rl/main), you can see a lot of the patterns proposed here but with an emphasis of user defined data. e.g. define what a rollout looks like for the app (Episode) and then every service can define what inputs they want and how they handle their own just in time (JIT) batching. For titan, the user defines the function that batches/packs the episodes for the trainer, so the trainer can be agnostic to the app data type.
|  | ||
|  | ||
| @dataclass | ||
| class Completion: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently our Episode is a combination of Completion and ScoreCompletion and Experience. I think keeping them flat makes customization and logging a bit easier but either way can work.
|  | ||
|  | ||
| @dataclass | ||
| class DistributedMetric(ABC): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this concept and it could be combined with our MetricLogger. But if we're going to handle this in a generic way, I don't think we want to deal with process groups, otherwise you have to set them up with a bunch of services that don't need them. Either we should use these abstractions and return metrics to a MetricService/controller to aggregate or handle it without formal abstractions, in a service specific way, as we do now.
|  | ||
|  | ||
| @dataclass | ||
| class Prompt: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we should match OpenAI requests here as the type. This way we can use the same formatting for local and API judge calls
|  | ||
|  | ||
| @dataclass | ||
| class LossInput: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is roughly what we're doing now, except it's trainer_logits + target_minibatch which is a subset of the minibatch you routed for the loss.
A split version of this RFC PR