-
It looks like input normalization is performed per-entity: https://github.com/entity-neural-network/incubator/blob/main/rogue_net/rogue_net/embedding_creator.py. That answers one question, but I think the other question, about how to "warm up" rare entities, still stands.
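For what it's worth, here is a minimal sketch of what per-entity normalization with a crude "warm up" could look like. This is not rogue_net's actual implementation (embedding_creator.py is the source of truth); the class, the `min_count` threshold, and the identity fallback are all assumptions for illustration.

```python
import numpy as np

class PerEntityNormalizer:
    """Running mean/std statistics keyed by entity type (Welford's algorithm).

    Hypothetical sketch, not rogue_net's actual API: features pass through
    unchanged until `min_count` samples of that entity type have been seen,
    which is one crude way to "warm up" rare entities.
    """

    def __init__(self, min_count: int = 100):
        self.min_count = min_count
        self.stats = {}  # entity type -> [count, mean, M2]

    def update(self, entity_type: str, features: np.ndarray) -> None:
        if entity_type not in self.stats:
            zeros = np.zeros_like(features, dtype=np.float64)
            self.stats[entity_type] = [0, zeros, zeros.copy()]
        s = self.stats[entity_type]
        s[0] += 1
        delta = features - s[1]
        s[1] = s[1] + delta / s[0]          # update running mean
        s[2] = s[2] + delta * (features - s[1])  # update sum of squared deviations

    def normalize(self, entity_type: str, features: np.ndarray) -> np.ndarray:
        count, mean, m2 = self.stats.get(entity_type, (0, None, None))
        if count < self.min_count:
            # Warm-up fallback: identity until statistics are stable enough.
            return features
        std = np.sqrt(m2 / count) + 1e-8
        return (features - mean) / std
```

The obvious weakness is exactly the one raised in the question: an entity type introduced late in training spends its first `min_count` observations un-normalized, and its early statistics lag behind the policy that produced it.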
-
I think there was a discussion (that I can't find anymore) where people were talking about input normalization. I can probably move this discussion to the existing place, or remove it if the question has been adequately answered and we have a solution.
I'm no expert in feature normalization; I only know "it's good and you should do it in PPO, brrr", so maybe this question has already been answered.
An example:
A particular environment may have several entities with "health" values that all have different max values.
During training it is very common to see an entity with low max health, so the health feature's normalization statistics end up centered around this "low health entity".
At some point the agent learns a policy that introduces an entity with a significantly higher max health, and the normalized values suddenly fall far outside the range seen so far.
I can see two possible approaches here, each with issues:
- maybe we need to force environment wrappers to normalize all scalar values to the range 0-1? (see the sketch after this list)
- normalization could also be separate per entity... but then how do we "warm up" rare entities that are only seen under specific highly trained policies?
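To make the first option concrete, here is a hedged sketch of a gym-style observation wrapper that rescales a per-entity-type health feature into [0, 1]. The observation layout, the `MAX_HEALTH` table, and the assumption that health sits in column 0 are all made up for illustration; they are not the incubator's actual interfaces.

```python
import gym
import numpy as np

class ScaleHealthWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: scale each entity type's health into [0, 1].

    Assumes observations are dicts mapping entity type to a
    (n_entities, n_features) array with health in column 0 -- an
    assumption for this sketch, not the incubator's actual format.
    """

    MAX_HEALTH = {"minion": 50.0, "boss": 5000.0}  # made-up maxima

    def observation(self, obs):
        scaled = {}
        for entity_type, feats in obs.items():
            feats = np.asarray(feats, dtype=np.float32).copy()
            feats[:, 0] /= self.MAX_HEALTH[entity_type]  # health -> [0, 1]
            scaled[entity_type] = feats
        return scaled
```

The upside is that the learner's input range is stable by construction, regardless of which entities the current policy encounters; the downside is that every environment author has to know the true maxima ahead of time.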