Refactor Algorithm-Related Code for Better Maintainability and Extensibility #59

@pan-x-c

To improve testability and maintainability, and to make it easier to add new algorithms in the future, we propose a comprehensive refactoring of the algorithm-related code. The plan is as follows:

  1. Create an algorithm module to centralize the implementation of various algorithms. This includes:

    • advantage_fn
    • policy_loss_fn
    • kl_loss_fn
    • entropy_loss_fn
    • read_strategy
  2. Extend AlgorithmConfig so that the Trainer can select and configure the appropriate algorithm implementations based on configuration.

  3. Remove SFT/DPO/RFT-specific logic from the current Trainer, and replace it with a unified train_step abstraction.

  4. Add documentation to the Developer Guide explaining how to implement and integrate new algorithms.

This refactoring will improve code clarity, reduce coupling, and streamline the process of adding and testing new algorithms.
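
As a rough illustration of how items 1 to 3 could fit together, here is a minimal sketch. It is not the project's actual design: the registry, the `register` decorator, the two `AlgorithmConfig` fields, the example loss functions, and the batch layout are all hypothetical, and plain Python lists stand in for tensors. Only `advantage_fn` and `policy_loss_fn` are shown; `kl_loss_fn`, `entropy_loss_fn`, and `read_strategy` would presumably be registered the same way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: component kind (e.g. "advantage_fn") -> name -> callable.
REGISTRY: Dict[str, Dict[str, Callable]] = {}


def register(kind: str, name: str):
    """Register a function under a component kind and a name (assumed helper)."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY.setdefault(kind, {})[name] = fn
        return fn
    return decorator


@register("advantage_fn", "mean_baseline")
def mean_baseline_advantage(rewards: List[float]) -> List[float]:
    # Subtract the batch-mean reward as a simple baseline.
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


@register("policy_loss_fn", "reinforce")
def reinforce_loss(logprobs: List[float], advantages: List[float]) -> float:
    # Advantage-weighted negative log-likelihood, averaged over the batch.
    return -sum(lp * a for lp, a in zip(logprobs, advantages)) / len(logprobs)


@register("policy_loss_fn", "sft")
def sft_loss(logprobs: List[float], advantages: List[float]) -> float:
    # Plain negative log-likelihood; advantages are ignored.
    return -sum(logprobs) / len(logprobs)


@dataclass
class AlgorithmConfig:
    # Hypothetical fields: each one names an entry in REGISTRY.
    advantage_fn: str = "mean_baseline"
    policy_loss_fn: str = "reinforce"


class Trainer:
    """Resolves components from the config once; train_step has no per-algorithm branches."""

    def __init__(self, config: AlgorithmConfig):
        self.advantage_fn = REGISTRY["advantage_fn"][config.advantage_fn]
        self.policy_loss_fn = REGISTRY["policy_loss_fn"][config.policy_loss_fn]

    def train_step(self, batch: Dict[str, List[float]]) -> Dict[str, float]:
        advantages = self.advantage_fn(batch["rewards"])
        loss = self.policy_loss_fn(batch["logprobs"], advantages)
        return {"loss": loss}


batch = {"rewards": [1.0, 0.0], "logprobs": [-1.0, -2.0]}
rl_metrics = Trainer(AlgorithmConfig()).train_step(batch)
sft_metrics = Trainer(AlgorithmConfig(policy_loss_fn="sft")).train_step(batch)
```

In this sketch, switching from the RL-style loss to the SFT-style loss is a configuration change only, and each registered function can be unit-tested without constructing a Trainer.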

Labels: enhancement (New feature or request)