NOTE: TP=1 works, and TP=2 fails with RoPE cache + compile. Debugging before landing this PR.
thanks!
Approving to unblock. Since you are moving files around, my 2c is that we should have something like:
experiments/rl/
|-- actors
|-- ....
|-- experiments
    |-- two_sum
    |-- gsm8k
And not have two_sum at the root level of '/rl'
Good suggestion, let me move things around.
I updated it to be As the whole rl folder is under the experiment/ folder, repeating the name is confusing. Wdyt @felipemello1 @tianyu-l
I think that 'tasks', 'projects' or 'recipes' would be fine. My only "con" against 'task' is that a model can be trained on multiple tasks, e.g. coding, websearch, etc. I feel like 'project' or 'recipe' would be more descriptive. But it shouldn't be a big deal either way. You could ask in the rl group if someone feels strongly about it. Your call!
daniellepintz
left a comment
IMO I really don't think we need the extra tasks/sum_digits directories. I think it's simpler to just have a top-level simple_grpo.py file. The path gets very long, which is not the best user experience IMO, and this is the controller file, so it is pretty important and I would prefer that it's not so nested.
My thought is that the controller file is not generalized, and it's closely tied to the sum digits task now. We should explicitly express this limitation in our file names / file structure. Once we have enough knowledge to have an abstraction for a generalizable controller, we can move it outside of
It is tied to sum digits, although not that closely imo; I still think it's fairly generalizable. If we want to keep sum digits in the name that's okay, but I would just prefer it's not super nested. But if no one else shares that opinion, also okay : )
sum_digits is just one project. With time, we may have: 'gsm8k', 'web_search', 'DPO', 'coding', etc. When that happens, we need to have a place to put them. Each of them would have its own 'grader.py', 'data.py', 'main.py'. This is what I have seen in all RL libraries as well. Some examples: https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes
Right now, yes the main controller doesn't have an amount of task-specific information in it that would be difficult to remove. It could be generalized at this point. However, splitting this into My only caveat is making it clear that recipes should not encourage the proliferation of every possible RL technique under the sun - let's keep things focused and aligned with the intention of titan.
… rl/
- Move all files from rl/unified/ directly under rl/ (actors, models, scripts, etc.)
- Remove rl/vllm_compat/ entirely (unused by unified code)
- Rename types.py -> rl_types.py to avoid shadowing Python stdlib types module
- Fix vllm.model_executor.layers.attention.Attention import for newer vLLM
- Update experiment registry: rl.unified -> rl
- Update all internal imports and README paths
- Add rl_grpo_qwen3_0_6b_tp1 config for TP=1 testing
Leftover README after #2618
rl/