TD3 - Deep Reinforcement Learning

Several improvements to DDPG have been proposed since its introduction; combined, they form a new model called Twin-Delayed DDPG (TD3) [1]. Theoretical background on the TD3 model is available in the td3.pdf file.
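As a rough illustration of the three changes TD3 makes to DDPG, as described in [1] (twin critics combined through a minimum, target policy smoothing, and delayed actor updates), the sketch below computes the critic target for a single transition in plain Python. It is not taken from this repository: the names td3_target and should_update_actor, their parameters, and the default values are assumptions chosen for illustration, and the actual implementation works on PyTorch tensors rather than scalars.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, next_state, done, target_actor, target_critic_1,
               target_critic_2, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-2.0, action_high=2.0, rng=random):
    """Critic target for one transition (hypothetical helper, scalar action)."""
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, action_low, action_high)
    # Clipped double Q-learning: keep the smaller of the two target critics.
    q_next = min(target_critic_1(next_state, next_action),
                 target_critic_2(next_state, next_action))
    # No bootstrapping from terminal states.
    return reward + gamma * (1.0 - float(done)) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every `policy_delay` critic steps."""
    return step % policy_delay == 0
```

Both critics are regressed towards the same target, while the actor and the target networks are only updated on the steps where should_update_actor returns True.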

Scripts

launcher_grid_calibration.py : Grid search for hyperparameter calibration.
launcher_agent_training.py : Complete training of an agent.
launcher_optimal_agent_playing.py : Watch an optimal agent interact with the environment.

Running the scripts with Docker

Install Docker Engine.
Install Docker Desktop (optional).
Make sure Make is installed and available on the CLI.
Run any of the commands below.

launcher_grid_calibration.py : make grid.
launcher_agent_training.py : make train.
launcher_optimal_agent_playing.py : make play.

A container with the corresponding name will be created in Docker. Wait for it to finish executing, then check the results in the container logs.

If for any reason you cannot run the make command, copy and paste the corresponding command from the Makefile.

Running without Docker

Install Python 3.8.
Install pip if it is not already installed, and make sure it is available on the CLI.
Run pip install -r requirements.txt in the project folder to install the necessary packages.
Run python3.8 file_name, or py -3.8 file_name on Windows, replacing file_name with any of the scripts listed above.

Environment

The chosen environment is the Gym 'Pendulum-v0', a simple environment in which the agent has to swing up an inverted pendulum. The implementation is modular so that the code can be transposed to other environments: the main things to adapt are the Preprocessor class and the hyperparameters.

Hyperparameters Calibration

A simple hyperparameter optimization has been performed: a grid search on the learning rates of the three neural networks. With the current code, it is easy to extend the optimization to other hyperparameters if necessary.
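The grid search described above can be sketched as follows. This is a minimal illustration and not the code of launcher_grid_calibration.py: the function grid_search and the callback train_and_score are hypothetical, and the sketch assumes the three networks are the actor and the two critics, and that a higher score is better.

```python
from itertools import product


def grid_search(train_and_score, actor_lrs, critic_1_lrs, critic_2_lrs):
    """Try every learning-rate combination; return the best one and all scores."""
    results = {}
    for lrs in product(actor_lrs, critic_1_lrs, critic_2_lrs):
        # train_and_score is assumed to train an agent and return its mean score.
        results[lrs] = train_and_score(*lrs)
    best = max(results, key=results.get)
    return best, results
```

The number of training runs is the product of the grid sizes, so adding further hyperparameters to the grid quickly becomes expensive.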

Data Structure

                    The environment will provide
+-------------+     a new state and a new reward       +---------------------------------------------+
| Environment |     at each new step                   |                TD3 Agent                    |
+-------------+                                        +---------------------------------------------+
|             +----------------------------------------> +-----------------------------------------+ |
|             |                                        | |      Actor       |        Critic        | |
|             <----------------------------------------+ +------------------+----------------------+ |
+-------------+     The actor will choose an action    |                                             |
                    with respect to the state          |                                             |
                                                       +------------------+--^-----------------------+
                                                                          |  |
               +                                  +   The agent will store|  |
               |                                  |   new experiences in  |  | The agent will ask for
               +----------------------------------+   the memory when     |  | past experiences in
                    If necessary, the                 tackling a step     |  | order to learn
                    communication with the                                |  |
                    environment will require          (PyTorch tensors)   |  |
                    to convert to PyTorch tensors                         |  | (PyTorch tensors)
                    the information at each step,                         |  |
                    and vice versa                    +-------------------v--+----------------------+
                                                      |                  Memory                     |
                                                      +---------------------------------------------+
                                                      |                                             |
                                                      |                                             |
                                                      +---------------------------------------------+

                                                          The memory only interacts with the agent;
                                                          experiences are stored as PyTorch
                                                          tensors
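
To make the role of the memory in the diagram more concrete, here is a minimal replay buffer sketch. The class and method names (Memory, store, sample) and the Experience tuple are assumptions for illustration, not the repository's API; the fields can hold any object, including the PyTorch tensors mentioned above.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", "state action reward next_state done")


class Memory:
    """Fixed-size replay buffer; the oldest experiences are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        """Called by the agent after each environment step."""
        self._buffer.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a uniformly sampled batch, without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

The agent would call store once per step and sample whenever it performs a learning update, which matches the two arrows between the agent and the memory.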

Results

The current optimal agent achieves a mean score over 100 episodes that generally oscillates between -150 and -170.
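
A mean score over the last 100 episodes can be tracked with a bounded queue, as in the sketch below. The class name RollingMean is hypothetical and this is not necessarily how the repository computes the metric.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `window` episode scores."""

    def __init__(self, window=100):
        self._scores = deque(maxlen=window)

    def add(self, episode_score):
        """Record a new episode score and return the current rolling mean."""
        self._scores.append(episode_score)
        return sum(self._scores) / len(self._scores)
```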

Future Work

Some of the hyperparameters seem to have a large influence on the convergence speed. Future work could explore other hyperparameter optimization methods, such as Bayesian optimization. However, this is challenging because of the high dimension of the hyperparameter space (the agent's hyperparameters plus those of the three neural networks), and a rigorous hyperparameter optimization would require a lot of computing power.

Another possible path is to implement Prioritized Experience Replay, which could accelerate convergence by sampling the most informative experiences during training.
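
As a sketch of what prioritized sampling could look like (it is not implemented in this repository), the function below draws indices with probability proportional to priority ** alpha and returns importance-sampling weights normalised by the largest weight in the batch. The function name, its parameters, and the default values of alpha and beta are assumptions; priorities are assumed to be strictly positive, for example absolute TD errors plus a small constant.

```python
import random


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample indices with probability proportional to priority ** alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    # Sampling is done with replacement.
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    n = len(priorities)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

With alpha set to 0 this reduces to uniform sampling, so the parameter controls how strongly the priorities are taken into account.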

References

[1] Fujimoto, S., van Hoof, H., and Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In International Conference on Machine Learning, 2018.
