
130825 sync #7

Open
cslr wants to merge 51 commits into 060425-sync from 130825-sync

Conversation


@cslr cslr commented Aug 13, 2025

Pull request

Tomas Ukkonen and others added 30 commits April 6, 2025 16:41
…(N=dim(neural-network-parameters)). It's SLOW but maybe results are now better.
…timated gradient using ES with number of population/iterations N = dim(neural-network-parameters) as recommended by ChatGPT.
…oodness in a group of agents (also plays against other agents, not just itself).
…= (o-mu)*a) so batch norm parameters are linear and can be polyak averaged more easily.
… polyak averaging of Q and policy network parameters (BatchNorm is only enabled in non-recurrent reinforcement model RIFL2 and not RIFL4)
…e it doesn't seem to work only with RIFL2).
…ion parameters after gradient step even if solution doesn't improve.
…rk a little better in practice on hard problems.
… (FIXME: test that reinforcement learning still works with less initial iterations?)
…Cart Pole problem anymore, or quickly forgot the found solution.
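Two of the techniques mentioned in the commits above can be sketched briefly. The following is a generic illustration (not code from this repository) of an Evolution Strategies (ES) gradient estimate with population size N = dim(parameters), as in the commit message, plus the Polyak (exponential moving) averaging used for Q/policy network parameters; all names and default values here are illustrative assumptions.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pop=None, rng=None):
    """Score-function (ES) estimate of the gradient of f at theta.

    Population size defaults to dim(theta), mirroring the commit's
    N = dim(neural-network-parameters) choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pop = len(theta) if n_pop is None else n_pop
    eps = rng.standard_normal((n_pop, len(theta)))     # perturbation directions
    fitness = np.array([f(theta + sigma * e) for e in eps])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # rank-free normalization
    return (eps.T @ fitness) / (n_pop * sigma)         # weighted sum of directions

def polyak_update(target, source, tau=0.005):
    """Slow-moving average of network parameters: target <- tau*source + (1-tau)*target."""
    return tau * source + (1.0 - tau) * target
```

Because the fitness values are standardized, the estimate recovers the gradient direction (up to scale), which is enough for an SGD-style update; it is slow because each of the N population members costs one forward evaluation.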
Tomas Ukkonen and others added 21 commits May 4, 2025 22:20
…uggy? + don't save/load etc). Reinforcement learning RIFL2 updated to support priority sampling of the replay buffer (actions causing large changes are preferred). RIFL4 doesn't work for now.
…caused restarting to fail to continue where it was.
…k correctly with LayerNorm. (Jacobian matrix LayerNorm calculation can still be buggy?)
…cent (creating training and testing dataset error if first try didn't succeed).
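The priority sampling mentioned for the RIFL2 replay buffer can be sketched as below. This is a minimal generic version (uniform proportional sampling by a "change" score such as TD-error magnitude), not the actual RIFL2 implementation; the class and parameter names are assumptions for illustration.

```python
import numpy as np

class PriorityReplayBuffer:
    """Replay buffer where transitions with larger change scores
    (e.g. |TD error|) are sampled with proportionally higher probability."""

    def __init__(self, capacity=10000, eps=1e-3):
        self.capacity, self.eps = capacity, eps
        self.data, self.prio = [], []

    def add(self, transition, change):
        if len(self.data) >= self.capacity:       # drop oldest when full
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(abs(change) + self.eps)  # eps keeps every item samplable

    def sample(self, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        p = np.asarray(self.prio)
        p = p / p.sum()                           # sampling probability ∝ priority
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

The small `eps` offset ensures low-change transitions are still occasionally replayed, which avoids the buffer collapsing onto a few high-priority samples.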
