Conversation

@DNXie (Member) commented Oct 16, 2025

Made it clear in the config where to specify the checkpoint-saving folder, and added comments to clarify the behavior.

The meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 16, 2025.
@codecov-commenter commented:
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.63%. Comparing base (633b219) to head (8b9d30c).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #444      +/-   ##
==========================================
- Coverage   64.69%   64.63%   -0.06%     
==========================================
  Files          79       79              
  Lines        7775     7788      +13     
==========================================
+ Hits         5030     5034       +4     
- Misses       2745     2754       +9     


The review thread below is attached to this checkpoint config snippet:

```yaml
enable: true
initial_load_path: hf://${model}
initial_load_in_hf: true
folder: ./checkpoint  # The folder to save checkpoints to.
```
@JenniferWang (Contributor) commented Oct 17, 2025:
I think we control these config fields, so we should be opinionated about exposing RL-friendly config field names and re-mapping them to TorchTitan fields internally.

Right now, some TorchTitan config names are really confusing: e.g., the ref model logically should not need checkpointing, yet it still requires checkpoint.enable = true.

A contributor replied:
Yeah, I was discussing this with @joecummings too. Ultimately we probably want to provide some kind of internal config mapping that we execute as a training-script post-init step, before we do any of the actual setup of actors (e.g., we could even bake it into our config.parse decorator).
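A minimal sketch of what such an internal mapping layer could look like. All field names, the `FIELD_REMAP` table, and the `remap_config` helper are hypothetical illustrations of the idea, not actual TorchTitan or forge APIs; the only behavior taken from the thread is that user-facing names are translated to TorchTitan-style nested keys after parsing, and that opinionated defaults (like implying `checkpoint.enable`) are applied so users never touch confusing internal fields directly.

```python
# Hypothetical remapping pass, run right after config parsing and before
# any actor setup. RL-friendly flat keys -> TorchTitan-style dotted keys.
FIELD_REMAP = {
    "checkpoint_dir": "checkpoint.folder",            # illustrative names
    "resume_from": "checkpoint.initial_load_path",
}


def _set_nested(cfg: dict, dotted_key: str, value) -> None:
    """Set cfg['a']['b'] = value for dotted_key 'a.b', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        cfg = cfg.setdefault(part, {})
    cfg[leaf] = value


def remap_config(user_cfg: dict) -> dict:
    """Translate user-facing fields into a TorchTitan-style nested config."""
    titan_cfg: dict = {}
    for key, value in user_cfg.items():
        _set_nested(titan_cfg, FIELD_REMAP.get(key, key), value)
    # Opinionated default: configuring a checkpoint section implies
    # checkpointing is enabled, so users never set `enable` by hand.
    if "checkpoint" in titan_cfg:
        titan_cfg["checkpoint"].setdefault("enable", True)
    return titan_cfg


cfg = remap_config({"checkpoint_dir": "./checkpoint", "resume_from": "hf://my-model"})
print(cfg)
# -> {'checkpoint': {'folder': './checkpoint',
#                    'initial_load_path': 'hf://my-model', 'enable': True}}
```

This kind of pass could indeed live inside a `config.parse`-style decorator, since it only needs the parsed dict and runs once before anything else reads the config.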

@DNXie DNXie merged commit 25d6098 into meta-pytorch:main Oct 17, 2025
9 checks passed
