
130825 sync #7

Open
cslr wants to merge 51 commits into 060425-sync from 130825-sync

Conversation


@cslr cslr commented Aug 13, 2025

Pull request

Tomas Ukkonen and others added 30 commits April 6, 2025 16:41
…(N=dim(neural-network-parameters)). It's SLOW but maybe results are now better.
…timated gradient using ES with number of population/iterations N = dim(neural-network-parameters) as recommended by ChatGPT.
…oodness in a group of agents (also plays against other agents, not just itself).
…= (o-mu)*a) so batch norm parameters are linear and can be polyak averaged more easily.
… polyak averaging of Q and policy network parameters (BatchNorm is only enabled in non-recurrent reinforcement model RIFL2 and not RIFL4)
…e it doesn't seem to work only with RIFL2).
…ion parameters after gradient step even if solution doesn't improve.
…rk a little better in practice on hard problems.
… (FIXME: test that reinforcement learning still works with less initial iterations?)
…Cart Pole problem anymore, or quickly forgot the found solution.
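Two of the techniques mentioned in the commits above can be sketched briefly. The following is a generic illustration (not code from this repository) of an Evolution Strategies (ES) gradient estimate with population size N = dim(parameters), as in the commit message, plus the Polyak (exponential moving) averaging used for Q/policy network parameters; all names and default values here are illustrative assumptions.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pop=None, rng=None):
    """Score-function (ES) estimate of the gradient of f at theta.

    Population size defaults to dim(theta), mirroring the commit's
    N = dim(neural-network-parameters) choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pop = len(theta) if n_pop is None else n_pop
    eps = rng.standard_normal((n_pop, len(theta)))     # perturbation directions
    fitness = np.array([f(theta + sigma * e) for e in eps])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # rank-free normalization
    return (eps.T @ fitness) / (n_pop * sigma)         # weighted sum of directions

def polyak_update(target, source, tau=0.005):
    """Slow-moving average of network parameters: target <- tau*source + (1-tau)*target."""
    return tau * source + (1.0 - tau) * target
```

Because the fitness values are standardized, the estimate recovers the gradient direction (up to scale), which is enough for an SGD-style update; it is slow because each of the N population members costs one forward evaluation.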
Tomas Ukkonen and others added 21 commits May 4, 2025 22:20
…uggy? + don't save/load etc). Reinforcement learning RIFL2 updated to support priority sampling of the replay buffer (actions causing large changes are preferred). RIFL4 doesn't work for now.
…caused restarting to fail to continue where it was.
…k correctly with LayerNorm. (Jacobian matrix LayerNorm calculation can still be buggy?)
…cent (creating training and testing dataset error if first try didn't succeed).
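The priority sampling mentioned for the RIFL2 replay buffer can be sketched as below. This is a minimal generic version (uniform proportional sampling by a "change" score such as TD-error magnitude), not the actual RIFL2 implementation; the class and parameter names are assumptions for illustration.

```python
import numpy as np

class PriorityReplayBuffer:
    """Replay buffer where transitions with larger change scores
    (e.g. |TD error|) are sampled with proportionally higher probability."""

    def __init__(self, capacity=10000, eps=1e-3):
        self.capacity, self.eps = capacity, eps
        self.data, self.prio = [], []

    def add(self, transition, change):
        if len(self.data) >= self.capacity:       # drop oldest when full
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(abs(change) + self.eps)  # eps keeps every item samplable

    def sample(self, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        p = np.asarray(self.prio)
        p = p / p.sum()                           # sampling probability ∝ priority
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

The small `eps` offset ensures low-change transitions are still occasionally replayed, which avoids the buffer collapsing onto a few high-priority samples.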
