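The reward function in the ipro training code is non-differentiable, so gradients are not updated during training, which is inconsistent with the paper's description. Where is the problem? #300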

@zfnice

Description

Could someone please help answer this? Thanks in advance.
