Proposal: add Training-Free GRPO, a training-free reinforcement learning algorithm that requires no parameter updates #6396

@QiZishi

Description

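Tencent Youtu has proposed Training-Free GRPO, a reinforcement learning paradigm that does not update model parameters. For developers with limited GPU resources, it makes it possible to obtain a model or agent specialized in a vertical domain at low cost, using only calls to a model API service. I would like to ask the ms-swift development team to consider adding Training-Free GRPO to the ms-swift framework as a new GRPO-style reinforcement learning paradigm that extends ms-swift's existing reinforcement learning support. Thank you!
Paper title: Training-Free Group Relative Policy Optimization
arXiv link: https://arxiv.org/abs/2510.08191
GitHub repository: https://github.com/TencentCloudADP/youtu-agent/tree/training_free_GRPO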
