Description
To improve testability and maintainability, and to make it easier to add new algorithms in the future, we propose a comprehensive refactoring of the algorithm-related code. The detailed plan is as follows:
- Create an `algorithm` module to centralize the implementation of various algorithms. This includes:
  - `advantage_fn`
  - `policy_loss_fn`
  - `kl_loss_fn`
  - `entropy_loss_fn`
  - `read_strategy`
- Extend `AlgorithmConfig` so that the `Trainer` can select and configure the appropriate algorithm implementations based on configuration.
- Remove SFT/DPO/RFT-specific logic from the current `Trainer`, and replace it with a unified `train_step` abstraction.
- Add documentation to the Developer Guide explaining how to implement and integrate new algorithms.
This refactoring will improve code clarity, reduce coupling, and streamline the process of adding and testing new algorithms.
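
To make the proposal concrete, here is a rough sketch of how config-driven selection and a unified `train_step` could fit together. It is only an illustration under assumptions: the registry `POLICY_LOSS_FN`, the decorator `register_policy_loss`, the two toy loss functions, and the field layout of `AlgorithmConfig` are all hypothetical and not the project's actual API. Losses operate on plain Python lists so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: maps a name used in the config to a loss implementation.
LossFn = Callable[[List[float], List[float]], float]
POLICY_LOSS_FN: Dict[str, LossFn] = {}


def register_policy_loss(name: str):
    """Register a policy loss under `name` (illustrative helper, not a real API)."""
    def decorator(fn: LossFn) -> LossFn:
        POLICY_LOSS_FN[name] = fn
        return fn
    return decorator


@register_policy_loss("pg")
def pg_loss(logprobs: List[float], advantages: List[float]) -> float:
    # Negative mean of advantage-weighted log-probabilities.
    return -sum(lp * adv for lp, adv in zip(logprobs, advantages)) / len(logprobs)


@register_policy_loss("sft")
def sft_loss(logprobs: List[float], advantages: List[float]) -> float:
    # Negative mean log-likelihood; advantages are ignored.
    return -sum(logprobs) / len(logprobs)


@dataclass
class AlgorithmConfig:
    # Assumed field: the name of the registered policy loss to use.
    policy_loss_fn: str = "pg"


class Trainer:
    def __init__(self, config: AlgorithmConfig):
        # The Trainer resolves its components from config; no per-algorithm branches.
        self.policy_loss_fn = POLICY_LOSS_FN[config.policy_loss_fn]

    def train_step(self, batch: dict) -> float:
        # One code path for every algorithm; only the plugged-in function differs.
        return self.policy_loss_fn(batch["logprobs"], batch["advantages"])
```

With this shape, adding a new algorithm would mean registering one more function and naming it in the config, and each loss can be unit-tested without constructing a `Trainer`. The same pattern could plausibly apply to `advantage_fn`, `kl_loss_fn`, `entropy_loss_fn`, and `read_strategy`.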