xiongjyu commented Oct 8, 2025

No description provided.

xiongjyu (Collaborator, Author) commented

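The basic MCTS-to-PPO conversion is now largely implemented. collect has been tested and works without issues, but a few minor problems remain in the loss computation during learn and still need to be optimized later.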

puyuan1996 added the research (Research work in progress) label Oct 18, 2025
puyuan1996 commented Oct 18, 2025

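> The basic MCTS-to-PPO conversion is now largely implemented. collect has been tested and works without issues, but a few minor problems remain in the loss computation during learn and still need to be optimized later.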

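Please note the specific problems here so that later development can pick up from them.
Also, is the PPO policy/value loss currently computed only on the last step of each sequence, or computed at every step and then averaged?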
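
To make the two options in the question concrete, here is a minimal sketch of both reductions for the policy loss. It is not taken from this PR: the function names `ppo_clip_loss_per_step` and `sequence_loss`, the `reduction` argument, and the `clip_eps=0.2` default are all hypothetical, and the standard clipped PPO surrogate is assumed.

```python
import math


def ppo_clip_loss_per_step(new_logp, old_logp, adv, clip_eps=0.2):
    """Clipped PPO surrogate loss for a single step (a value to minimize).

    Hypothetical helper for illustration; clip_eps=0.2 is an assumed default.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return -min(ratio * adv, clipped * adv)


def sequence_loss(new_logps, old_logps, advs, reduction="mean"):
    """Reduce the per-step losses of one sequence.

    reduction="last": use only the final step of the sequence.
    reduction="mean": compute the loss at every step, then average.
    """
    per_step = [
        ppo_clip_loss_per_step(n, o, a)
        for n, o, a in zip(new_logps, old_logps, advs)
    ]
    if reduction == "last":
        return per_step[-1]
    if reduction == "mean":
        return sum(per_step) / len(per_step)
    raise ValueError(f"unknown reduction: {reduction!r}")


# Identical old and new log-probs give ratio 1, so each step's loss is -adv.
logps = [-1.0, -0.5, -2.0]
advs = [1.0, 2.0, 3.0]
last_only = sequence_loss(logps, logps, advs, reduction="last")
step_mean = sequence_loss(logps, logps, advs, reduction="mean")
```

The same choice applies to the value loss. With "last", only the final step of each sequence contributes gradient, while "mean" spreads the learning signal over all unrolled steps.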
