Flatten rl directory: remove vllm_compat, consolidate unified #2618

Merged
wwwjn merged 6 commits into main from rl-merge
Mar 18, 2026
Conversation

@wwwjn
Contributor

@wwwjn wwwjn commented Mar 17, 2026

rl/

  • Move all files from rl/unified/ directly under rl/ (actors, models, scripts, etc.)
  • Remove rl/vllm_compat/ entirely (unused by unified code)
  • Rename types.py -> rl_types.py to avoid shadowing Python stdlib types module
  • Fix vllm.model_executor.layers.attention.Attention import for newer vLLM
  • Update experiment registry: rl.unified -> rl
  • Update all internal imports and README paths
  • Add rl_grpo_qwen3_0_6b_tp1 config for TP=1 testing
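The types.py rename above guards against a classic Python pitfall. A minimal, hypothetical sketch (not code from this repo) showing how a directory-local types.py can shadow the stdlib module when its directory sits ahead of the standard library on sys.path — which happens, for example, when running a script from inside that directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory containing a shadowing types.py.
tmp = Path(tempfile.mkdtemp())
(tmp / "types.py").write_text("SHADOWED = True\n")

real = sys.modules.pop("types", None)  # simulate a fresh interpreter
sys.path.insert(0, str(tmp))           # local dir now wins the path search
try:
    shadowed = importlib.import_module("types")
    print(hasattr(shadowed, "SHADOWED"))  # the local file won, not the stdlib
finally:
    sys.path.remove(str(tmp))
    if real is not None:
        sys.modules["types"] = real       # restore the real stdlib module
```

Renaming the file (here, to rl_types.py) sidesteps the problem entirely, since no stdlib module shares that name.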

@meta-cla meta-cla bot added the CLA Signed label Mar 17, 2026
@felipemello1 felipemello1 left a comment
NOTE: TP=1 works, and TP=2 fails with RoPE cache + compile. Debugging before landing this PR.

thanks!

Approving to unblock. Since you are moving files around, my 2c is that we should have something like:

experiments/rl/
	|-- actors
	....
	|-- experiments
		|-- two_sum
		|-- gsm8k

And not have two_sum at the root level of '/rl'

@wwwjn
Contributor Author

wwwjn commented Mar 17, 2026


Good suggestion, let me move things around

Contributor

@tianyu-l tianyu-l left a comment


one comment

@wwwjn
Contributor Author

wwwjn commented Mar 18, 2026

Since you are moving files around, my 2c is that we should have something like:

experiments/rl/
	|-- actors
	....
	|-- experiments
		|-- two_sum
		|-- gsm8k

I updated it to be

	|-- tasks
		|-- two_sum
		|-- gsm8k

As the whole rl folder is already under the experiments/ folder, repeating the name is confusing. Wdyt @felipemello1 @tianyu-l

@felipemello1

felipemello1 commented Mar 18, 2026

I think that 'tasks', 'projects' or 'recipes' would be fine. My only "con" against 'tasks' is that a model can be trained on multiple tasks, e.g. coding, web search, etc. I feel like 'project' or 'recipe' would be more descriptive. But it shouldn't be a big deal either way. You could ask in the rl group if someone feels strongly about it. Your call!

Contributor

@daniellepintz daniellepintz left a comment

IMO I really don't think we need the extra tasks/sum_digits directories; I think it's simpler to just have a top-level simple_grpo.py file. The path gets very long, which is not the best user experience, and this is the controller file, so it's pretty important and I would prefer it's not so nested.

@wwwjn
Copy link
Contributor Author

wwwjn commented Mar 18, 2026

this is the controller file

My thought is that the controller file is not generalized; it's closely tied to the sum digits task right now. We should explicitly express this limitation in our file names / file structure. Once we have enough knowledge to build an abstraction for a generalizable controller, we can move it outside of the sum_digits task.
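As a hypothetical illustration of that coupling (all names invented, not code from this repo): a reward function hard-coded to the sum-digits objective only makes sense for that one task, so a controller built around it can't be reused for gsm8k or coding tasks without rewriting, which argues for keeping it under a sum_digits/ directory for now:

```python
# Invented example: the reward logic below is inherently sum_digits-specific.
def sum_digits_reward(prompt: str, completion: str) -> float:
    """Reward 1.0 iff the completion equals the digit sum of the prompt."""
    target = sum(int(ch) for ch in prompt if ch.isdigit())
    try:
        return 1.0 if int(completion.strip()) == target else 0.0
    except ValueError:
        return 0.0  # non-numeric completion earns no reward

print(sum_digits_reward("1 2 3", "6"))  # 1.0
print(sum_digits_reward("1 2 3", "7"))  # 0.0
```

A task-agnostic controller would instead take a reward callable (or grader object) as a parameter; until that abstraction exists, the file layout is the honest signal of the coupling.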

@daniellepintz
Contributor

It is tied to sum digits, although not that closely imo; I still think it's fairly generalizable. If we want to keep sum digits in the name that's okay, but I would just prefer it's not super nested. But if no one else shares that opinion, that's also okay : )

@felipemello1

felipemello1 commented Mar 18, 2026

sum_digits is just one project. With time, we may have 'gsm8k', 'web_search', 'DPO', 'coding', etc. When that happens, we need a place to put them. Each of them would have its own 'grader.py', 'data.py', 'main.py'. This is what I have seen in all RL libraries as well. Some examples:

https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes
https://github.com/PrimeIntellect-ai/prime-rl/tree/main/examples
https://github.com/verl-project/verl/tree/main/examples

@joecummings
Member

Right now, yes, the main controller doesn't have an amount of task-specific information in it that would be difficult to remove; it could be generalized at this point.

However, splitting this into recipes gives us (and users) a ton of flexibility to experiment with different RL optimizations and tasks. For this reason, atm I'm in favor of recipes (definitely not calling it tasks though).

My only caveat is making it clear that recipes should not encourage the proliferation of every possible RL technique under the sun; let's keep things focused and aligned with the intention of titan.

wwwjn added 6 commits March 18, 2026 12:34
@wwwjn wwwjn merged commit ea614ba into main Mar 18, 2026
15 of 23 checks passed
daniellepintz added a commit that referenced this pull request Mar 19, 2026