Add CleanRL RNN (modified to GRU/vanilla RNN) example #250
CleanRL-based RNN example.
With some help from an LLM, I modified the example to use GRU instead of LSTM (it's also possible to use a vanilla RNN instead of GRU, since switching between them is simple). I didn't compare against the original LSTM; the idea was just to have a vanilla RNN/GRU option.
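For readers curious what the swap involves, here is a minimal sketch of a GRU/vanilla-RNN recurrent core in the style of CleanRL's recurrent PPO agents; the class and argument names are illustrative and not the exact code in this PR:

```python
import torch
import torch.nn as nn


class RecurrentAgent(nn.Module):
    """Illustrative recurrent core; GRU and vanilla RNN share the same interface."""

    def __init__(self, obs_size, hidden_size=64, use_vanilla_rnn=False):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_size, hidden_size), nn.Tanh())
        # Swapping GRU for a vanilla RNN is a one-line change: both take the same
        # (input, hidden) arguments and return (output, hidden), unlike LSTM,
        # which carries an (h, c) tuple.
        rnn_cls = nn.RNN if use_vanilla_rnn else nn.GRU
        self.rnn = rnn_cls(hidden_size, hidden_size)

    def get_states(self, x, rnn_state, done):
        hidden = self.encoder(x)
        # Step through time, resetting the hidden state wherever an episode
        # ended (done == 1), as in CleanRL's recurrent PPO examples.
        batch_size = rnn_state.shape[1]
        hidden = hidden.reshape((-1, batch_size, self.rnn.hidden_size))
        done = done.reshape((-1, batch_size))
        outputs = []
        for h, d in zip(hidden, done):
            rnn_state = rnn_state * (1.0 - d).view(1, -1, 1)
            out, rnn_state = self.rnn(h.unsqueeze(0), rnn_state)
            outputs.append(out)
        return torch.flatten(torch.cat(outputs), 0, 1), rnn_state
```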
I also added checkpoint saving, basic loading/inference, and terminal reporting of the min/max reward from the latest 40 episodes along with the average (this helps to see whether the agent has discovered the "max reward", e.g. a success condition, and how far the average is from that maximum).
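A rough sketch of what the reward reporting and checkpoint saving could look like (the names, file layout, and the save-frequency check here are assumptions, not the PR's exact code):

```python
import collections

import torch

# Rolling window of the most recent 40 finished episodes (assumed structure).
recent_episode_rewards = collections.deque(maxlen=40)


def report_recent_rewards(episode_reward):
    recent_episode_rewards.append(episode_reward)
    rewards = list(recent_episode_rewards)
    print(
        f"last {len(rewards)} episodes | "
        f"avg: {sum(rewards) / len(rewards):.2f} | "
        f"min: {min(rewards):.2f} | "
        f"max: {max(rewards):.2f}"
    )


def maybe_save_checkpoint(agent, global_step, run_dir, save_every):
    # Simplified periodic save; the real script ties this to
    # --save_model_frequency_global_steps and the run's folder under `runs`.
    if save_every and global_step % save_every == 0:
        path = f"{run_dir}/checkpoint_{global_step}.pt"
        torch.save(agent.state_dict(), path)
        print(f"Saved checkpoint to {path}")
```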
The hyperparameters are based on testing; they aren't necessarily optimal, but they worked to train at least one unreleased environment (I still need to check whether they are the latest iteration I have locally; if not, I can update them later). I will verify later, but this example should be capable of training the newly added memory test env: edbeeching/godot_rl_agents_examples#58
Note that onnx export/inference is not featured in this example, as supporting RNNs would require some modifications on the Godot plugin side too.
For more information, I'm copying the readme here:
CleanRL PPO GRU Discrete Actions example
This example is a modification of CleanRL PPO Atari LSTM, adjusted to work with GDRL and vector obs, with inference added, default params changed, and other modifications.
You may need to install tyro using `pip install tyro`. If you get the error `ModuleNotFoundError: No module named 'tyro'` while running the script, install it.

Observations:
Actions:
CL arguments unique to this example:
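For orientation, these arguments might be declared as a tyro dataclass in the usual CleanRL style; the types and defaults below are assumptions, only the flag names come from this readme:

```python
from dataclasses import dataclass
from typing import Optional

import tyro


@dataclass
class Args:
    use_vanilla_rnn: bool = False
    """use a vanilla RNN cell instead of the default GRU"""
    save_model_frequency_global_steps: Optional[int] = None
    """save a checkpoint every N global steps; if unset, only logs are written"""
    load_model_path: Optional[str] = None
    """path to a saved .pt checkpoint to load"""
    inference: bool = False
    """run inference with the loaded checkpoint instead of training"""


if __name__ == "__main__":
    args = tyro.cli(Args)
    print(args)
```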
RNN settings:
By default, GRU is used. It can use a vanilla RNN instead if you pass the CL argument `--use_vanilla_rnn`.

Checkpoint saving:
Example: save a checkpoint every 500_000 steps: `--save_model_frequency_global_steps=500_000`. If you don't set this argument, the model will not be saved, only the logs.
The checkpoints are saved inside the `runs` folder, in a separate subfolder for each run; the full path is displayed in the console when a checkpoint is saved.

Inference:
Example use: `--load_model_path=path_to_saved_file.pt --inference` (set the actual path to a checkpoint).
Other CL args should be similar to those described in https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_CLEAN_RL.md (but there is no onnx export/inference currently implemented for this example).
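As a closing sketch, loading a saved checkpoint for inference might look roughly like this, reusing the illustrative `RecurrentAgent` from the earlier sketch (all names and shapes here are placeholders, not the script's actual API):

```python
import torch

# `RecurrentAgent` is the illustrative class from the sketch above; the real
# script has its own agent class and observation size.
agent = RecurrentAgent(obs_size=16)
agent.load_state_dict(torch.load("path_to_saved_file.pt", map_location="cpu"))
agent.eval()

rnn_state = torch.zeros(1, 1, agent.rnn.hidden_size)  # (num_layers, batch, hidden)
done = torch.zeros(1)

with torch.no_grad():
    obs = torch.zeros(1, 16)  # placeholder for an observation from the env
    hidden, rnn_state = agent.get_states(obs, rnn_state, done)
    # A policy head (not shown in the sketch) would map `hidden` to action
    # logits here and pick an action to send back to the environment.
```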