Algorithm Training Metrics++ #349
| Diff line change |
|---|
| `@@ -68,7 +68,7 @@ def __init__(` |
| `  self.policy_update_freq = config.policy_update_freq` |
| `- self.policy_noise = config.policy_noise` |
| `+ self.policy_noise = config.policy_noise_end` |
| `  self.policy_noise_clip = config.policy_noise_clip` |
Copilot AI, Feb 17, 2026
MATD3 updates each agent's `policy_noise` and `action_noise` through that agent's schedulers, but target policy smoothing uses MATD3's own `self.policy_noise` (see `train()`: `noise = randn_like(...) * self.policy_noise`). `self.policy_noise` is initialized from `config.policy_noise_end` and is never updated, so the smoothing noise does not follow the configured schedule and starts at the end value. Consider either (1) adding a scheduler in MATD3 and updating `self.policy_noise` on each `training_step` (mirroring TD3), or (2) using a value derived from the agents' `policy_noise` (e.g., `agent.policy_noise`) so the smoothing noise matches the decayed setting.
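A minimal sketch of option (1), in plain Python rather than the project's code. `LinearNoiseSchedule`, `SmoothingNoise`, and their parameters are hypothetical names for illustration only; they are not the repository's `LinearScheduler` or the MATD3 class, whose actual signatures are not shown here. The point is only that the smoothing scale is initialized from the start of the schedule and refreshed on every training step.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearNoiseSchedule:
    """Hypothetical linear schedule (assumption, not the project's scheduler)."""

    start: float
    end: float
    decay_steps: int  # assumed to be a positive integer

    def value(self, step: int) -> float:
        # Clamp progress to [0, 1] so the value stays at `end` after decay.
        progress = min(max(step / self.decay_steps, 0.0), 1.0)
        return self.start + (self.end - self.start) * progress


class SmoothingNoise:
    """Holds the target-smoothing noise scale and refreshes it each step."""

    def __init__(self, schedule: LinearNoiseSchedule, noise_clip: float) -> None:
        self.schedule = schedule
        self.noise_clip = noise_clip
        # Start from the schedule's initial value, not its end value.
        self.policy_noise = schedule.value(0)

    def update(self, training_step: int) -> None:
        # Call once per training step so the scale follows the schedule.
        self.policy_noise = self.schedule.value(training_step)

    def sample(self, rng: random.Random) -> float:
        # Gaussian noise scaled by the current policy_noise, then clipped.
        noise = rng.gauss(0.0, 1.0) * self.policy_noise
        return max(-self.noise_clip, min(self.noise_clip, noise))
```

In a training loop, `update(training_step)` would run before the target action is computed, so that the sampled noise uses the decayed scale for that step.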
The variable naming is inconsistent. The local variable is called `epsilon_scheduler`, but it is now a `LinearScheduler` for entropy coefficients in MAPPO. Consider renaming it to `entropy_scheduler` for clarity, as the entropy coefficient is not epsilon in the traditional sense.