lewtun (Member) commented Jul 24, 2025

This PR is a major refactor to bring the project in line with recent versions of TRL and to enable support for the SmolLM3 recipe. Some things may differ slightly from before (e.g. the train/test splits from the data mixture), but overall functionality should be the same.

TODO

  • Test SFT and DPO work with Zephyr
  • Fix the SmolLM3 recipe to use public models and test each step runs
  • Refactor ORPO and test it works
  • Update all remaining configs with the new dataset mixer
  • Update unit tests

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun changed the title from "[WIP] Upgrade data mixer, deps, and scripts" to "Upgrade data mixer, deps, and scripts" on Jul 24, 2025
lewtun (Member, Author) commented Jul 24, 2025

cc @BramVanroy I have removed your CPT recipe for now, as the mid-training dataset isn't compatible with this version of the datasets library (loading scripts are deprecated) and your SFT model seems to have been deleted. Happy to add it back once the artifacts are back online!

lewtun merged commit 19f0345 into main on Jul 24, 2025
2 checks passed
lewtun deleted the upgrade branch on July 24, 2025 19:18