Skip to content

Conversation

Ritesh1905
Copy link
Contributor

A split version of this RFC PR

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 15, 2025
@Ritesh1905 Ritesh1905 marked this pull request as ready for review September 15, 2025 18:54
Copy link
Contributor

@allenwang28 allenwang28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we start a PR that integrates completion and experience? maybe with Experience renamed to Episode

I think these are the lowest hanging fruits for cleaning up our grpo app now, I'd love to see the main example have Data.from_xyz()

Copy link
Contributor

@pbontrager pbontrager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all reasonable dataclasses. I think at a high level my two main comments are that most of these should exist at the app level. At the user level you have your Completion/Score/Experience data that's dependent on your agentic pipeline (Multi-agent?, N tools, M judges, Conditional verifiers, etc). You want the data that you're tracking to be as flexible as the pipeline. So keeping as much of this in the app as possible (in contrast with the trainer types which you would want to be more static) allows users to edit both in parallel and independent of other apps.

Second I think keeping the number of types small (maybe just one type instead of 3 for a rollout) puts a much lower burden on the user for customization, debugging and ramp up.

I think if you look at what is being done with the Titan trainer (example in apps/rl/main), you can see a lot of the patterns proposed here but with an emphasis of user defined data. e.g. define what a rollout looks like for the app (Episode) and then every service can define what inputs they want and how they handle their own just in time (JIT) batching. For titan, the user defines the function that batches/packs the episodes for the trainer, so the trainer can be agnostic to the app data type.



@dataclass
class Completion:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently our Episode is a combination of Completion and ScoreCompletion and Experience. I think keeping them flat makes customization and logging a bit easier but either way can work.



@dataclass
class DistributedMetric(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this concept and it could be combined with our MetricLogger. But if we're going to handle this in a generic way, I don't think we want to deal with process groups, otherwise you have to set them up with a bunch of services that don't need them. Either we should use these abstractions and return metrics to a MetricService/controller to aggregate or handle it without formal abstractions, in a service specific way, as we do now.



@dataclass
class Prompt:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we should match OpenAI requests here as the type. This way we can use the same formatting for local and API judge calls



@dataclass
class LossInput:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is roughly what we're doing now, except it's trainer_logits + target_minibatch which is a subset of the minibatch you routed for the loss.

@vidhyav vidhyav closed this Sep 18, 2025
@vidhyav vidhyav mentioned this pull request Sep 18, 2025
@Ritesh1905 Ritesh1905 deleted the rithesh/rfc branch October 7, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants