Add the general model for best-of-three game #78
Open
wanjeans33 wants to merge 1 commit into linyiLYi:master from
Hi Lin,
I'm a longtime fan of yours. I used your project for a group assignment this time. Thank you for sharing it!
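Building on your work, I ran some experiments to try to train an AI that can reliably win an entire match. I first tried training on randomly chosen first and second rounds, but the results were not ideal. Probably because the enemy's starting conditions vary too much, the final model (10M steps) only reached a win rate of 58%.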
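I then tried training on the entire best-of-three match and restructured the done condition in the step logic. I added self.jump and self.round_end, which are used to skip the between-round transition scenes and to record whether a round has ended. After roughly 5M steps the reward had essentially converged. In testing, the model reached a 98% win rate. Very exciting!!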
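As a rough illustration only (this is not the code in this PR), the done logic described above could be sketched as follows. The class name BestOfThreeTracker, the update method, the full-health value of 176, and the convention that health below zero means a knockout are all assumptions made for the example. Only jump and round_end are names taken from the description.

```python
class BestOfThreeTracker:
    """Hypothetical helper: reports done only once the whole match is decided."""

    def __init__(self, full_hp=176, rounds_to_win=2):
        # full_hp and rounds_to_win are assumed values for this sketch.
        self.full_hp = full_hp
        self.rounds_to_win = rounds_to_win
        self.reset()

    def reset(self):
        self.player_wins = 0
        self.enemy_wins = 0
        self.round_end = False  # True once the current round has finished
        self.jump = False       # True while skipping the between-round transition

    def update(self, player_hp, enemy_hp):
        """Feed one frame's health values; return True when the match is over."""
        if self.jump:
            # Transition frames: ignore everything until both bars are refilled.
            if player_hp == self.full_hp and enemy_hp == self.full_hp:
                self.jump = False
                self.round_end = False
            return False

        if player_hp < 0 or enemy_hp < 0:
            self.round_end = True
            if enemy_hp < 0 and player_hp >= 0:
                self.player_wins += 1
            elif player_hp < 0 and enemy_hp >= 0:
                self.enemy_wins += 1
            # A double knockout is counted for neither side in this sketch.
            if max(self.player_wins, self.enemy_wins) >= self.rounds_to_win:
                return True      # match decided: end the episode
            self.jump = True     # round over, match not over: skip the transition
        return False
```

In an environment wrapper, the value returned by update would stand in for the per-round done flag, so an episode ends only when one side has won two rounds.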
Below are my TensorBoard training results. The blue line is the random-round training run, and the purple line is the entire-match training run.

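This pull request contains the code and results for the general (best-of-three) approach. For the random approach, I am still tuning the reward function to get a faster learning rate, so I won't upload it for now. I hope you can merge my results so that everyone can share them.
Wishing you all the best!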
Jing WANG