Conversation

@Ivan-267 Ivan-267 commented Jan 7, 2026

CleanRL-based RNN example.

With some help from an LLM, I modified the example to use GRU instead of LSTM (it's also possible to use a vanilla RNN instead of GRU, as switching between them is simple). I didn't compare against the original LSTM; the idea was just to have a vanilla RNN/GRU option.
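
As a rough illustration of why the swap is simple (a sketch only, not the actual code in the PR; names such as use_vanilla_rnn, obs_size, and hidden_size are assumptions), the recurrent core only needs to pick a different nn module, since nn.GRU and nn.RNN share the same single-hidden-tensor interface:

```python
import torch.nn as nn

# Minimal sketch of a recurrent core; the PR's actual Agent class may differ.
class RecurrentCore(nn.Module):
    def __init__(self, obs_size: int, hidden_size: int = 64, use_vanilla_rnn: bool = False):
        super().__init__()
        rnn_cls = nn.RNN if use_vanilla_rnn else nn.GRU
        self.rnn = rnn_cls(obs_size, hidden_size)
        # Orthogonal weight / zero bias init, in the spirit of CleanRL's LSTM example.
        for name, param in self.rnn.named_parameters():
            if "bias" in name:
                nn.init.constant_(param, 0)
            elif "weight" in name:
                nn.init.orthogonal_(param, 1.0)

    def forward(self, x, hidden):
        # Unlike nn.LSTM, which carries a (hidden, cell) tuple, both nn.GRU and
        # nn.RNN take and return a single hidden tensor.
        return self.rnn(x, hidden)
```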

I also added a checkpoint saving feature, basic loading/inference, and terminal reporting of the min/max reward over the latest 40 episodes alongside the average (it helps to see whether the agent has discovered the "max reward", e.g. a success condition, and how far the average is from the maximum).
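
A minimal sketch of the reward reporting described above (variable names are illustrative; the 40-episode window comes from the description):

```python
from collections import deque

import numpy as np

# Rolling window of the latest 40 episode returns.
episode_returns = deque(maxlen=40)

def report_episode(ep_return: float, global_step: int) -> None:
    episode_returns.append(ep_return)
    print(
        f"step={global_step} "
        f"avg={np.mean(episode_returns):.2f} "
        f"min={np.min(episode_returns):.2f} "
        f"max={np.max(episode_returns):.2f}"
    )
```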

The hyperparameters were based on testing; they aren't necessarily optimal, but they worked to train at least one unreleased environment (I still need to check whether they match the latest iteration I have locally; if not, I can update them later). I will verify later, but this example should be capable of training the newly added memory test env: edbeeching/godot_rl_agents_examples#58

Note that ONNX export/inference is not featured in this example, as supporting RNNs requires some modifications on the Godot plugin side too.

For more information, I'm copying the README here:

CleanRL PPO GRU Discrete Actions example

This example is a modification of CleanRL PPO Atari LSTM,
adjusted to work with GDRL and vector obs, along with added inference, changed default params, and other modifications.

You may need to install tyro with pip install tyro. If you get ModuleNotFoundError: No module named 'tyro' while running the script, install it.
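
For context, CleanRL-style scripts declare their CL arguments as a dataclass and parse them with tyro.cli; a minimal sketch (defaults are illustrative, and only the flag names mentioned in this README are taken from it):

```python
from dataclasses import dataclass
from typing import Optional

import tyro

@dataclass
class Args:
    use_vanilla_rnn: bool = False
    """use a vanilla RNN instead of GRU"""
    save_model_frequency_global_steps: Optional[int] = None
    """save a checkpoint every N global steps; None disables saving"""
    load_model_path: str = ""
    """path to a saved checkpoint for inference"""
    inference: bool = False
    """run inference instead of training"""

if __name__ == "__main__":
    args = tyro.cli(Args)
    print(args)
```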

Observations:

  • Works with vector observations.

Actions:

  • Accepts a single discrete action space.

CL arguments unique to this example:

RNN settings:

GRU is used by default. To use a vanilla RNN instead, pass the CL argument --use_vanilla_rnn.
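
For reference, the main practical difference from the LSTM version is that a GRU/vanilla RNN carries a single hidden tensor rather than LSTM's (hidden, cell) pair; below is a sketch of the per-step hidden-state update with done masking, adapted from the CleanRL LSTM pattern (function and variable names are assumptions, not the script's exact code):

```python
import torch

def get_states(rnn, hidden, x, done):
    # x: (seq_len * num_envs, features); hidden: (num_layers, num_envs, hidden_size)
    # done: (seq_len * num_envs,) flags used to reset the state at episode boundaries.
    batch_size = hidden.shape[1]
    x = x.reshape(-1, batch_size, x.shape[-1])
    done = done.reshape(-1, batch_size)
    outputs = []
    for step_x, step_done in zip(x, done):
        # Zero the hidden state for envs whose episode just ended.
        hidden = (1.0 - step_done).view(1, -1, 1) * hidden
        out, hidden = rnn(step_x.unsqueeze(0), hidden)
        outputs.append(out)
    return torch.flatten(torch.cat(outputs), 0, 1), hidden
```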

Checkpoint saving:

Example: save a checkpoint every 500_000 steps: --save_model_frequency_global_steps=500_000.
If you don't set this argument, the model will not be saved, only the logs.
The checkpoints are saved inside the runs folder, in a separate folder for each run; the full path is displayed in the console when a checkpoint is saved.
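
A sketch of how such periodic checkpointing might look (the exact path layout and function names are assumptions):

```python
import os
from typing import Optional

import torch

def maybe_save_checkpoint(agent, run_name: str, global_step: int,
                          frequency: Optional[int], last_saved: int) -> int:
    """Save agent weights every `frequency` global steps; return the step of the last save."""
    if frequency is None or global_step - last_saved < frequency:
        return last_saved
    path = os.path.join("runs", run_name, f"checkpoint_{global_step}.pt")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save(agent.state_dict(), path)
    print(f"Saved checkpoint to {path}")
    return global_step
```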

Inference:

Example use: --load_model_path=path_to_saved_file.pt --inference (set the actual path to a checkpoint).
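
Loading a checkpoint for inference roughly follows the sketch below (Agent, envs, num_envs, and hidden_size are assumed to be set up the same way as during training; this is not the script's exact code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
agent = Agent(envs).to(device)  # assumed constructor matching training
# map_location lets a checkpoint trained on CPU load onto CUDA (and vice versa).
agent.load_state_dict(torch.load("path_to_saved_file.pt", map_location=device))
agent.eval()

# Fresh recurrent state before stepping the environment for inference.
hidden = torch.zeros(1, num_envs, hidden_size, device=device)
```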

Other CL args should be similar to those described in https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_CLEAN_RL.md (but ONNX export/inference is not currently implemented for this example).

Add README for CleanRL PPO GRU Discrete Actions example
Updated comments to reflect additional changes and hyperparameter adjustments.
Added installation note for tyro.
Fixes some issues when loading on CUDA after training with --no-cuda, and updates hyperparams (to the last ones used while training an env that worked OK for that env, not necessarily globally optimal).
@Ivan-267 Ivan-267 requested a review from edbeeching January 7, 2026 14:43
Ivan-267 commented Jan 8, 2026

I've trained this memory env using the script (I might have modified some hyperparams locally vs. the PR script, but I shared the ones used in the env description). It uses only local obs: a modified raycast sensor (distances ordered by physics layer), the applied movement/turn, and normalized episode time.

It's my first env designed to use an RNN with Godot RL Agents, after the textual one that we merged.

memory_env.mp4

https://github.com/Ivan-267/Memory-Find-Clue-Then-Goal-RL-Environment
