Hey PyTorch Team,
I recently started integrating torchforge into my workflow, and I've been thinking about adding a few features that are not supported yet. I have two questions:
- What do you think about integrating on-policy distillation? I am currently building a pipeline for my experiments and would love to contribute.
- Why did you decide to implement a separate engine for SFT instead of using the TitanTrainer?
Thank you!