pbontrager
Contributor

This PR shows how to add a judge to the workflow. It is a deliberately basic example: it just replaces the RewardActor with the Judge, and the logic around the judge is minimal. The primary goal of this PR is to serve as an example.
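
To make the swap concrete, here is a minimal sketch of the general idea, not the code in this PR. It assumes a rule-based reward and a judge can sit behind the same scoring interface. Every name below (`Scorer`, `ExactMatchReward`, `LLMJudge`, `score_rollout`, `fake_model`) is hypothetical and chosen for illustration; the actual RewardActor and Judge interfaces may look quite different.

```python
from typing import Callable, Protocol


class Scorer(Protocol):
    """Hypothetical interface: maps a (prompt, response, target) triple to a reward."""

    def evaluate(self, prompt: str, response: str, target: str) -> float: ...


class ExactMatchReward:
    """Stand-in for a rule-based reward (an assumption, not the PR's RewardActor)."""

    def evaluate(self, prompt: str, response: str, target: str) -> float:
        return 1.0 if response.strip() == target.strip() else 0.0


class LLMJudge:
    """Stand-in for a judge: asks a model for a verdict and maps it to a reward."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        # Any text-in/text-out callable; a real setup would call a served model.
        self.generate = generate

    def evaluate(self, prompt: str, response: str, target: str) -> float:
        verdict = self.generate(
            f"Question: {prompt}\nReference: {target}\nAnswer: {response}\n"
            "Is the answer correct? Reply YES or NO."
        )
        return 1.0 if verdict.strip().upper().startswith("YES") else 0.0


def score_rollout(scorer: Scorer, prompt: str, response: str, target: str) -> float:
    """The rollout loop only sees the Scorer interface, so either class fits."""
    return scorer.evaluate(prompt, response, target)


def fake_model(judge_prompt: str) -> str:
    """Toy judge model for the demo: accepts the spelled-out answer 'four'."""
    return "YES" if "Answer: four" in judge_prompt else "NO"


if __name__ == "__main__":
    print(score_rollout(ExactMatchReward(), "2+2?", "four", "4"))
    print(score_rollout(LLMJudge(fake_model), "2+2?", "four", "4"))
```

In this sketch the calling code does not change when the scorer is swapped, which is roughly the property a drop-in replacement relies on.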

meta-cla bot added the CLA Signed label on Oct 7, 2025
@Jack-Khuu
Contributor

I'll spin this into test/sandbox with some additional context going into PTC

(Guided decoding doesn't work out of the box for CoT: constraining the output neuters the probability distribution, so the model doesn't think.)

  • Supposedly vLLM supports this in v0; will need to dig into how well supported it is in vLLM v1
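
A toy illustration of the point about the probability distribution, as a sketch only. It assumes guided decoding works by masking every token outside the allowed set before sampling; the four-token vocabulary, the logit values, and the helper names (`softmax`, `constrain`) are made up for the example, and none of this is vLLM's implementation.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over a token -> logit mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def constrain(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Mask every token outside the allowed set, as a guided decoder would."""
    return {
        tok: (val if tok in allowed else float("-inf"))
        for tok, val in logits.items()
    }


# Hypothetical next-token logits: the model would rather start reasoning.
logits = {"Let's": 3.0, "think": 1.0, "YES": 0.5, "NO": 0.0}

free = softmax(logits)
guided = softmax(constrain(logits, allowed={"YES", "NO"}))

if __name__ == "__main__":
    print("free:  ", {t: round(p, 3) for t, p in free.items()})
    print("guided:", {t: round(p, 3) for t, p in guided.items()})
```

With the mask applied, the reasoning tokens get zero probability and all of the mass moves onto the verdict tokens, so the model is forced to answer immediately instead of thinking first.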

