
Commit df96d5c

maryamhonari, andrewcoh, and taozhuo authored and committed
Develop custom trainers (#73)

* Make create_policy more generic (#54)
  * add on/off policy classes and inherit from them
  * trainers as plugins
  * remove swap files
  * clean up registration debug
  * clean up all pre-commit
  * a2c plugin passes pre-commit
  * move GAE to trainer utils
  * move lambda return to trainer utils
  * add validator for num_epoch
  * add types for settings/type methods
  * move create_policy into highest-level API
  * move update_reward_signal into optimizer
  * move get_policy into Trainer
  * remove get settings type
  * dummy_config settings
  * move all stats from actor into dict, enabling arbitrary actor data
  * remove shared_critic flag, cleanups
  * refactor create_policy
  * remove sample_actions, evaluate_actions, update_norm from policy
  * remove comments
  * fix return type of get stat
  * update poca create_policy
  * clean up policy init
  * remove conftest
  * add shared_critic to settings
  * fix test_networks
  * fix test_policy
  * fix test network
  * fix some PPO/SAC tests
  * add back conftest.py
  * improve specification of trainer type
  * add defaults for trainer_type/hyperparameters
  * fix test_saver
  * fix reward providers
  * add settings check utility for tests
  * fix some settings tests
  * add trainer types to run_experiment
  * type check for arbitrary actor data
  * cherry-pick rename of ml-agents/trainers/torch to torch_entities (#55)
  * make all trainer types and settings visible at module level
  * remove settings from run_experiment console script
  * fix test_settings and upgrade config scripts
  * remove need for trainer_type argument up to trainer factory
  * fix ghost trainer behavior id in policy queue
  * fix torch shadowing in tests
  * update trainers, RL trainer tests
  * update tests to match the refactors
  * fix behavior name in ghost trainer
  * update ml-agents-envs test configs
  * separate the plugin package changes
  * bring get_policy back for the sake of the ghost trainer
  * add return types and remove unused returns
  * remove duplicate methods in poca (_update_policy, add_policy)

  Co-authored-by: mahon94 <[email protected]>

* Online/offline custom trainer examples with plugin system (#52)
  * (same change list as #54, plus:)
  * a2c trains
  * fix pre-commit
  * add a2c trainer back
  * add cleaned-up DQN trainer/optimizer
  * nit naming
  * fix logprob/entropy types in torch_policy.py
  * clean up DQN/SAC
  * add docs for custom trainers (TODO: reference tutorial)
  * add clipping to loss function
  * pin older importlib-metadata version
  * bump pre-commit hook env to 3.8.x
  * use smooth L1 loss

  Co-authored-by: mahon94 <[email protected]>

* add tutorial for validation
* fix formatting errors
* clean up
* minor changes

Co-authored-by: Andrew Cohen <[email protected]>
Co-authored-by: zhuo <[email protected]>
1 parent 7e0b511 commit df96d5c


63 files changed (+3230, −938 lines)

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -128,6 +128,6 @@ repos:
       - id: generate-markdown-docs
         name: generate markdown docs
         language: python
-        entry: ./utils/generate_markdown_docs.py --package_dirs ml-agents-envs
+        entry: ./utils/generate_markdown_docs.py --package_dirs ml-agents-envs ml-agents
         pass_filenames: false
         additional_dependencies: [pyyaml, pydoc-markdown==3.10.1]
```

conftest.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -14,6 +14,7 @@
 from filelock import FileLock

 # TODO: Use this in all ml-agents tests so they can all run in parallel.
+import mlagents.plugins.trainer_type

 _BASE_PORT = 6005

@@ -76,3 +77,8 @@ def test_something(base_port: int) -> None:
     :return: The base port number.
     """
     return PortAllocator().reserve_n_ports(n_ports)
+
+
+@pytest.fixture(scope="session", autouse=True)
+def setup_plugin_trainers():
+    _, _ = mlagents.plugins.trainer_type.register_trainer_plugins()
```
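The session-scoped fixture above makes sure plugin trainers are registered before any test runs. As a rough illustration of what entry-point-based discovery of this kind looks like, here is a minimal sketch using `importlib.metadata`. This is not the actual `register_trainer_plugins` implementation, and the group name `mlagents.trainer_type` is an assumption for demonstration purposes:

```python
# Illustrative sketch only, NOT the ml-agents implementation; the group name
# "mlagents.trainer_type" is an assumption for demonstration purposes.
import sys
import importlib.metadata


def discover_trainer_plugins(group: str = "mlagents.trainer_type") -> dict:
    """Return a mapping of entry-point name -> loaded plugin object."""
    if sys.version_info >= (3, 10):
        # Python 3.10+ supports filtering entry points by group directly.
        entry_points = importlib.metadata.entry_points(group=group)
    else:
        # Older versions return a dict keyed by group name.
        entry_points = importlib.metadata.entry_points().get(group, [])
    return {ep.name: ep.load() for ep in entry_points}


# With no plugin packages installed, discovery simply yields an empty mapping.
print(discover_trainer_plugins("example.nonexistent.group"))  # prints {}
```

Running discovery against a group with no installed plugins is harmless, which is why an autouse fixture like the one above is safe to apply to the whole test session.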

docs/Python-Custom-Trainer-Plugin.md

Lines changed: 52 additions & 0 deletions
# Unity ML-Agents Custom Trainers Plugin

In an effort to bring a wider variety of reinforcement learning algorithms to our users, we have added custom trainer capabilities. We introduce an extensible plugin system to define new trainers based on the high-level trainer API in the `ml-agents` package. This allows rerouting the `mlagents-learn` CLI to custom trainers and extending the config files with hyperparameters specific to your new trainers. We expose high-level extensible trainer (both on-policy and off-policy), optimizer, and hyperparameter classes, with documentation, for use with this plugin system. For more information on how the Python plugin system works, see [Plugin interfaces](Training-Plugins.md).

## Overview
To add new custom trainers to ML-Agents, you need to create a new Python package. To give you an idea of how to structure your package, we have created an example [mlagents_trainer_plugin](../ml-agents-trainer-plugin) package with implementations of the `A2C` and `DQN` algorithms. You need a `setup.py` file to list extra requirements, register the new RL algorithm in the ML-Agents ecosystem, and make it possible to call the `mlagents-learn` CLI with your customized configuration.

```shell
├── mlagents_trainer_plugin
│   ├── __init__.py
│   ├── a2c
│   │   ├── __init__.py
│   │   ├── a2c_3DBall.yaml
│   │   ├── a2c_optimizer.py
│   │   └── a2c_trainer.py
│   └── dqn
│       ├── __init__.py
│       ├── dqn_basic.yaml
│       ├── dqn_optimizer.py
│       └── dqn_trainer.py
└── setup.py
```
## Installation and Execution
To install your new package, you need to have the `ml-agents-envs` and `ml-agents` packages installed, followed by the plugin package itself.

```shell
> pip3 install -e ./ml-agents-envs && pip3 install -e ./ml-agents
> pip3 install -e <./ml-agents-trainer-plugin>
```

Following these installations, your package is registered as an entry point, and you can use a config file with the new trainers:
```shell
mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml --run-id <run-id-name> --env <env-executable>
```

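The config file passed above follows the usual trainer-config layout, with `trainer_type` naming the trainer registered by the plugin. A hedged sketch of what such a file might contain (the behavior name matches the 3DBall example; the specific hyperparameter names and values are assumptions, not taken from the repo):

```yaml
behaviors:
  3DBall:
    trainer_type: a2c          # name registered by the plugin's entry point
    hyperparameters:
      learning_rate: 0.0003    # illustrative value
    network_settings:
      hidden_units: 128
      num_layers: 2
    max_steps: 500000
```

Any extra hyperparameters your trainer defines in its settings class can be added under `hyperparameters` the same way.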
## Tutorial
Here’s a step-by-step [tutorial](.) on how to write a setup file and extend ML-Agents trainers, optimizers, and hyperparameter settings. To extend ML-Agents classes, see the reference documentation for [trainers](Python-On-Off-Policy-Trainer-Documentation.md) and the [Optimizer](Python-Optimizer-Documentation.md).
