Fix: Reward-hacking SectorCREnv and observation airspeed #45
Draft
StefanHamm wants to merge 2 commits into TUDelft-CNS-ATM:main from
…ation and integrating it into the reward system
Author
Fixes #44
Author
For the reward formulation, maybe use the CAS (calibrated airspeed), i.e. the range of speeds at which the aircraft can be operated safely because enough lift is generated. Keep the TAS (true airspeed) as the overall speed, which is important for knowing how fast the aircraft is flying through the airspace.
Refactored speed inputs from TAS to CAS to align with speedupdate logic. Normalized the input by dividing by D_VELOCITY to give the agent relative deviation feedback rather than absolute values. Additionally, introduced a penalty term for speed changes to reduce reward hacking.
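
A minimal sketch of the two ideas described above, not the code from this pull request. The value of `D_VELOCITY`, the reference speed `reference_cas`, the penalty weight `SPEED_CHANGE_WEIGHT`, and both function names are assumptions made for illustration; the real environment may define them differently.

```python
# Hypothetical constants: the actual values in SectorCREnv may differ.
D_VELOCITY = 10.0           # assumed maximum speed change per action, in m/s
SPEED_CHANGE_WEIGHT = -0.1  # assumed (negative) weight of the speed-change penalty


def normalized_cas_deviation(cas, reference_cas, d_velocity=D_VELOCITY):
    """Return the CAS deviation from a reference speed, scaled by d_velocity.

    The result is a relative quantity (in units of d_velocity) rather than
    an absolute speed in m/s.
    """
    return (cas - reference_cas) / d_velocity


def speed_change_penalty(speed_action, weight=SPEED_CHANGE_WEIGHT):
    """Return a penalty proportional to the magnitude of the speed action.

    speed_action is assumed to be the agent's normalized speed command,
    where 0 means "keep the current speed".
    """
    return weight * abs(speed_action)


if __name__ == "__main__":
    # An aircraft flying 10 m/s faster than its reference CAS.
    print(normalized_cas_deviation(160.0, 150.0))
    # The penalty grows with the size of the commanded speed change.
    print(speed_change_penalty(0.0), speed_change_penalty(-0.5))
```

With a penalty of this shape, leaving the speed unchanged costs nothing, so the agent gains no reward from speed changes that do not help resolve conflicts.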