```python
class M1(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 1  # action repeat (frame-skip) of 1


class M2(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 4  # action repeat (frame-skip) of 4
```
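Here is some context on what `action_repeat` changes between M1 and M2. With frame-skip, the agent picks one action and the environment applies it for several consecutive frames, summing the rewards. The sketch below illustrates that idea using a hypothetical `ToyEnv` whose `step` returns `(observation, reward, terminal)`. It is not the repository's environment wrapper. Note also that in Gym releases of that period, `Breakout-v0` already skips a randomly chosen 2 to 4 frames per `step` call. Any extra `action_repeat` therefore compounds with that built-in skip. The `NoFrameskip` variants do not skip.

```python
"""Minimal sketch of action repeat (frame-skip).

ToyEnv is a hypothetical stand-in, not the Gym API or the repository's code.
"""
from typing import Optional, Tuple


class ToyEnv:
    """Hypothetical environment: every frame yields reward 1.0; ends after max_frames."""

    def __init__(self, max_frames: int = 10) -> None:
        self.max_frames = max_frames
        self.frame = 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Advance one frame and return (observation, reward, terminal).
        self.frame += 1
        terminal = self.frame >= self.max_frames
        return self.frame, 1.0, terminal


def repeated_step(env: ToyEnv, action: int,
                  action_repeat: int) -> Tuple[Optional[int], float, bool]:
    """Apply the same action for `action_repeat` frames and sum the rewards.

    Stops early if the episode terminates inside the repeat window.
    """
    total_reward = 0.0
    obs, terminal = None, False
    for _ in range(action_repeat):
        obs, reward, terminal = env.step(action)
        total_reward += reward
        if terminal:
            break
    return obs, total_reward, terminal


if __name__ == "__main__":
    # With action_repeat=4 (as in M2), one agent decision covers up to 4 frames.
    env = ToyEnv(max_frames=10)
    decisions, terminal = 0, False
    while not terminal:
        _, _, terminal = repeated_step(env, action=0, action_repeat=4)
        decisions += 1
    print(decisions, env.frame)  # prints "3 10" (4 + 4 + 2 frames)
```

With `action_repeat=1` (as in M1), the same 10 frames would take 10 agent decisions.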
I trained the two models with:

```sh
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m2
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m1
```
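The following hypothetical sketch shows how a `--model` flag value can select one of the config classes above, using `argparse.parse_known_args` so that unrelated flags are ignored. The registry `MODELS`, the function `select_config`, and the placeholder `DQNConfig` are assumptions for illustration only. The repository has its own flag handling, and its real `DQNConfig` carries many more settings.

```python
"""Hypothetical flag-to-config lookup; not the repository's actual code."""
import argparse


class DQNConfig:
    # Hypothetical placeholder base class with only the fields used here.
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 1


class M1(DQNConfig):
    action_repeat = 1  # frame-skip of 1


class M2(DQNConfig):
    action_repeat = 4  # frame-skip of 4


# Hypothetical registry mapping the --model value to a config class.
MODELS = {'m1': M1, 'm2': M2}


def select_config(argv):
    """Return the config class named by --model, ignoring unknown flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', default='m1', choices=sorted(MODELS))
    args, _unknown = parser.parse_known_args(argv)
    return MODELS[args.model]
```

Under these assumptions, the only setting that differs between the two commands is `action_repeat`.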
In both models, `avg_ep_r` reaches 2.1 to 2.3 at around 5 million iterations, but even at 15 million iterations it still fluctuates between 2.1 and 2.3. This looks just like the result they have shown (I guess that is the result for an action repeat (frame-skip) of 1 without learning-rate decay). I didn't change any parameters.
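"With or without learning-rate decay" usually refers to a schedule such as staircase exponential decay with a lower bound. The sketch below computes `max(lr_min, lr * decay_rate ** (step // decay_steps))`, which matches what TF1's `tf.train.exponential_decay(..., staircase=True)` returns before a floor is applied. The function name and all numbers in the test are illustrative assumptions, not the repository's configuration.

```python
"""Sketch of staircase exponential learning-rate decay with a lower bound."""


def decayed_lr(step: int, lr: float, decay_rate: float, decay_steps: int,
               lr_min: float) -> float:
    """Return max(lr_min, lr * decay_rate ** (step // decay_steps))."""
    exponent = step // decay_steps  # staircase: the exponent grows in whole steps
    return max(lr_min, lr * decay_rate ** exponent)
```

Setting `lr_min` equal to `lr` pins the rate at its initial value, which is one way to express "no decay".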
The strange thing is that even with model m2 (action repeat (frame-skip) of 4), my result is similar to that of model m1. From around 5 million to 15 million iterations, `avg_ep_r` fluctuates between 2.1 and 2.3, and `max_ep_r` fluctuates between 10 and 18.
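Because the question depends on `avg_ep_r` and `max_ep_r`, here is a minimal sketch of how such statistics are commonly derived. Per-step rewards are summed into episode totals, and each report takes the mean and maximum over the episodes that finished in the reporting window. The function `episode_stats` and this windowing are assumptions, not the repository's code. If a training wrapper treats a lost life as a terminal state (also an assumption to verify in the source), each "episode" here would be a single life, and the totals would be smaller than full-game scores.

```python
"""Hypothetical sketch of windowed episode-reward statistics."""
from statistics import mean
from typing import Iterable, List, Tuple


def episode_stats(step_rewards: Iterable[float],
                  terminals: Iterable[bool]) -> Tuple[float, float, int]:
    """Sum per-step rewards into episode totals; return (avg, max, n_episodes).

    An episode still running at the end of the window is not counted.
    """
    episode_totals: List[float] = []
    current = 0.0
    for reward, terminal in zip(step_rewards, terminals):
        current += reward
        if terminal:
            episode_totals.append(current)
            current = 0.0
    if not episode_totals:
        return 0.0, 0.0, 0
    return mean(episode_totals), max(episode_totals), len(episode_totals)
```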
Do I need to change some parameters to reproduce the best results they have shown?
Thank you very much.
