There's no Axolotl feature around this. I recommend DistillKit, though it's opinionated: https://github.com/arcee-ai/DistillKit. Alternatively, you could vibe-code a custom trainer that does what you want.
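If you go the custom-trainer route, this is roughly the usual shape of it: a minimal sketch that assumes teacher logits were precomputed offline and packed into each batch by your own collator under a hypothetical `teacher_logits` key (that key and the `alpha`/`temperature` knobs are illustrative, not an Axolotl or DistillKit API).

```python
# Minimal sketch of an offline-distillation trainer, assuming teacher logits
# were precomputed and added to each batch by a custom collator under a
# hypothetical "teacher_logits" key; alpha/temperature are illustrative
# hyperparameters, not an Axolotl or DistillKit API.
import torch.nn.functional as F
from transformers import Trainer


class DistillTrainer(Trainer):
    def __init__(self, *args, alpha=0.5, temperature=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.alpha = alpha            # weight of the KD term vs. the CE term
        self.temperature = temperature

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        teacher_logits = inputs.pop("teacher_logits")   # (batch, seq, vocab)
        outputs = model(**inputs)                       # inputs must include labels
        ce_loss = outputs.loss                          # standard next-token CE

        # Soft-target KL term, scaled by T^2 (Hinton et al., 2015).
        # Padding / prompt-token masking is omitted here for brevity.
        t = self.temperature
        kd_loss = F.kl_div(
            F.log_softmax(outputs.logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t ** 2)

        loss = (1 - self.alpha) * ce_loss + self.alpha * kd_loss
        return (loss, outputs) if return_outputs else loss
```

You'd pair this with a collator that loads the saved teacher logits (or top-k logprobs, to keep disk usage sane) for each example and aligns them with the student's tokenization.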
-
Hi everyone,
I understand that full online distillation support is still a work in progress.
In the meantime, could someone clarify the current status of offline distillation? Specifically, are the loss functions and related components considered complete and stable?
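For reference, the objective I have in mind is the standard Hinton-style formulation (not necessarily what Axolotl implements): a temperature-scaled KL term mixed with the usual cross-entropy,

$$
\mathcal{L} = (1-\alpha)\,\mathcal{L}_{\mathrm{CE}} + \alpha\,T^{2}\,\mathrm{KL}\!\big(\mathrm{softmax}(z_T/T)\,\big\|\,\mathrm{softmax}(z_S/T)\big),
$$

where $z_T$ and $z_S$ are the teacher and student logits, $\alpha$ is the mixing weight, and $T$ is the temperature.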
I'm training an 8B student model on v0.9.1 (the latest release that works smoothly in my Python 3.10 environment) and have run several experiments. While I do see performance gains, they're still below what I get from SFT-only checkpoints trained on teacher-generated outputs.
Before I dig further, I’d like to verify whether the training code is fully implemented—just in case any unfinished parts might be influencing the results. Any insights, known issues, or example configurations that delivered solid outcomes would be greatly appreciated.
Thanks in advance, and thanks for all your hard work on this project! 🙏