
Update hyper params and set seeds #3384


Merged
merged 7 commits on Jun 16, 2025
Changes from 2 commits
19 changes: 15 additions & 4 deletions intermediate_source/reinforcement_q_learning.py
@@ -91,6 +91,16 @@
"cpu"
)

# set the seeds for reproducibility
Contributor:

Just FYI, we're already doing this in the CI, so I'm not sure it's helpful to do something like that for all users...
Maybe add a paragraph saying to uncomment these if you want fixed output all the time.

Contributor:

I think it's good practice to have this be part of the script, and I'd keep it here.
It helps when you run it locally; RL is usually very seed-dependent.

seed = 42
random.seed(seed)                # Python's built-in RNG
torch.manual_seed(seed)          # PyTorch RNG
env.reset(seed=seed)             # Gymnasium environment RNG
env.action_space.seed(seed)
env.observation_space.seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)  # current CUDA device's RNG
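
For fully repeatable runs, stricter settings can be layered on top of these seeds. A minimal sketch, not part of this PR; the NumPy seed and the deterministic-algorithms toggle are assumptions beyond what the tutorial imports:

import numpy as np

np.random.seed(seed)  # some Gym/Gymnasium helpers draw from NumPy's global RNG
# Prefer deterministic kernels where they exist; warn instead of raising
# when an op has no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
torch.backends.cudnn.benchmark = False  # cuDNN autotuning can vary across runs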


######################################################################
# Replay Memory
@@ -253,13 +263,14 @@ def forward(self, x):
# EPS_DECAY controls the rate of exponential decay of epsilon, higher means a slower decay
# TAU is the update rate of the target network
# LR is the learning rate of the ``AdamW`` optimizer

BATCH_SIZE = 128
GAMMA = 0.99
- EPS_START = 0.9
- EPS_END = 0.05
- EPS_DECAY = 1000
+ EPS_START = 1
+ EPS_END = 0.01
+ EPS_DECAY = 2500
TAU = 0.005
- LR = 1e-4
+ LR = 5e-4
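
To see how these constants are consumed, here is a short sketch mirroring the selection and update rules that appear later in this tutorial (``steps_done``, ``policy_net``, and ``target_net`` are defined in those later cells):

import math

# Epsilon anneals exponentially from EPS_START toward EPS_END;
# a larger EPS_DECAY gives a slower decay, i.e. more exploration for longer.
def epsilon_at(steps_done):
    return EPS_END + (EPS_START - EPS_END) * math.exp(-steps_done / EPS_DECAY)

# Soft (Polyak) update of the target network with rate TAU:
# theta_target <- TAU * theta_policy + (1 - TAU) * theta_target
target_state = target_net.state_dict()
policy_state = policy_net.state_dict()
for key in policy_state:
    target_state[key] = policy_state[key] * TAU + target_state[key] * (1 - TAU)
target_net.load_state_dict(target_state)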

# Get number of actions from gym action space
n_actions = env.action_space.n