
Commit 9eefcd3

Ervin Tshihzy authored and committed
Release 0.9.0 docs checklist and cleanup - v2 (#2372)
* Included explicit version # for ZN
* added explicit version for KR docs
* minor fix in installation doc
* Consistency with numbers for reset parameters
* Removed extra verbiage. minor consistency
* minor consistency
* Cleaned up IL language
* moved parameter sampling above in list
* Cleaned up language in Env Parameter sampling
* Cleaned up migrating content
* updated consistency of Reset Parameter Sampling
* Rename Training-Generalization-Learning.md to Training-Generalization-Reinforcement-Learning-Agents.md
* Updated doc link for generalization
* Rename Training-Generalization-Reinforcement-Learning-Agents.md to Training-Generalized-Reinforcement-Learning-Agents.md
* Re-wrote the intro paragraph for generalization
* add titles, cleaned up language for reset params
* Update Training-Generalized-Reinforcement-Learning-Agents.md
* cleanup of generalization doc
* More cleanup in generalization
* Fixed title
* Clean up included sampler type section
* cleaned up defining new sampler type in generalization
* cleaned up training section of generalization
* final cleanup for generalization
* Clean up of Training w Imitation Learning doc
* updated link for generalization, reordered
* consistency fix
* cleaned up training ml agents doc
* Update and rename Profiling.md to Profiling-Python.md
* Updated Python profiling link
* minor clean up in profiling doc
* Rename Training-BehavioralCloning.md to Training-Behavioral-Cloning.md
* Updated link to BC
* Rename Training-RewardSignals.md to Reward-Signals.md
* fix reward links to new
* cleaned up reward signal language
* fixed broken links to reward signals
* consistency fix
* Updated readme with generalization
* Added example for GAIL reward signal
* minor fixes and consistency to Reward Signals
* referencing GAIL in the recording demonstration
* consistency
* fixed desc of bc and gail
* comment fix
* comments fix
* Fix broken links
* Fix grammar in Overview for IL
* Add optional params to reward signals comment to GAIL
1 parent 6c20869 commit 9eefcd3

16 files changed, +354 −308 lines changed

docs/Installation.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ If you installed this correctly, you should be able to run
 `mlagents-learn --help`, after which you will see the Unity logo and the command line
 parameters you can use with `mlagents-learn`.
 
-By installing the `mlagents` package, its dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
+By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
 Some of the primary dependencies include:
 
 - [TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support)

docs/Learning-Environment-Examples.md

Lines changed: 18 additions & 18 deletions
@@ -32,7 +32,7 @@ If you would like to contribute environments, please see our
 * Vector Observation space: One variable corresponding to current state.
 * Vector Action space: (Discrete) Two possible actions (Move left, move
 right).
-* Visual Observations: None.
+* Visual Observations: None
 * Reset Parameters: None
 * Benchmark Mean Reward: 0.94
 
@@ -56,7 +56,7 @@ If you would like to contribute environments, please see our
 * Vector Action space: (Continuous) Size of 2, with one value corresponding to
 X-rotation, and the other to Z-rotation.
 * Visual Observations: None.
-* Reset Parameters: Three, corresponding to the following:
+* Reset Parameters: Three
 * scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
 * Default: 1
 * Recommended Minimum: 0.2
@@ -116,8 +116,8 @@ If you would like to contribute environments, please see our
 of ball and racket.
 * Vector Action space: (Continuous) Size of 2, corresponding to movement
 toward net or away from net, and jumping.
-* Visual Observations: None.
-* Reset Parameters: Three, corresponding to the following:
+* Visual Observations: None
+* Reset Parameters: Three
 * angle: Angle of the racket from the vertical (Y) axis.
 * Default: 55
 * Recommended Minimum: 35
@@ -153,7 +153,7 @@ If you would like to contribute environments, please see our
 `VisualPushBlock` scene. __The visual observation version of
 this environment does not train with the provided default
 training parameters.__
-* Reset Parameters: Four, corresponding to the following:
+* Reset Parameters: Four
 * block_scale: Scale of the block along the x and z dimensions
 * Default: 2
 * Recommended Minimum: 0.5
@@ -194,8 +194,8 @@ If you would like to contribute environments, please see our
 * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
 * Side Motion (3 possible actions: Left, Right, No Action)
 * Jump (2 possible actions: Jump, No Action)
-* Visual Observations: None.
-* Reset Parameters: 4, corresponding to the height of the possible walls.
+* Visual Observations: None
+* Reset Parameters: Four
 * Benchmark Mean Reward (Big & Small Wall Brain): 0.8
 
 ## [Reacher](https://youtu.be/2N9EoF6pQyE)
@@ -213,7 +213,7 @@ If you would like to contribute environments, please see our
 * Vector Action space: (Continuous) Size of 4, corresponding to torque
 applicable to two joints.
 * Visual Observations: None.
-* Reset Parameters: Five, corresponding to the following
+* Reset Parameters: Five
 * goal_size: radius of the goal zone
 * Default: 5
 * Recommended Minimum: 1
@@ -254,7 +254,7 @@ If you would like to contribute environments, please see our
 angular acceleration of the body.
 * Vector Action space: (Continuous) Size of 20, corresponding to target
 rotations for joints.
-* Visual Observations: None.
+* Visual Observations: None
 * Reset Parameters: None
 * Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
 * Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
@@ -284,7 +284,7 @@ If you would like to contribute environments, please see our
 `VisualBanana` scene. __The visual observation version of
 this environment does not train with the provided default
 training parameters.__
-* Reset Parameters: Two, corresponding to the following
+* Reset Parameters: Two
 * laser_length: Length of the laser used by the agent
 * Default: 1
 * Recommended Minimum: 0.2
@@ -318,7 +318,7 @@ If you would like to contribute environments, please see our
 `VisualHallway` scene. __The visual observation version of
 this environment does not train with the provided default
 training parameters.__
-* Reset Parameters: None.
+* Reset Parameters: None
 * Benchmark Mean Reward: 0.7
 * To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`
 * Optional Imitation Learning scene: `HallwayIL`.
@@ -340,8 +340,8 @@ If you would like to contribute environments, please see our
 banana.
 * Vector Action space: (Continuous) 3 corresponding to agent force applied for
 the jump.
-* Visual Observations: None.
-* Reset Parameters: Two, corresponding to the following
+* Visual Observations: None
+* Reset Parameters: Two
 * banana_scale: The scale of the banana in the 3 dimensions
 * Default: 150
 * Recommended Minimum: 50
@@ -375,8 +375,8 @@ If you would like to contribute environments, please see our
 * Striker: 6 actions corresponding to forward, backward, sideways movement,
 as well as rotation.
 * Goalie: 4 actions corresponding to forward, backward, sideways movement.
-* Visual Observations: None.
-* Reset Parameters: Two, corresponding to the following:
+* Visual Observations: None
+* Reset Parameters: Two
 * ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
 * Default: 7.5
 * Recommended minimum: 4
@@ -409,8 +409,8 @@ If you would like to contribute environments, please see our
 velocity, and angular velocities of each limb, along with goal direction.
 * Vector Action space: (Continuous) Size of 39, corresponding to target
 rotations applicable to the joints.
-* Visual Observations: None.
-* Reset Parameters: Four, corresponding to the following
+* Visual Observations: None
+* Reset Parameters: Four
 * gravity: Magnitude of gravity
 * Default: 9.81
 * Recommended Minimum:
@@ -450,6 +450,6 @@ If you would like to contribute environments, please see our
 `VisualPyramids` scene. __The visual observation version of
 this environment does not train with the provided default
 training parameters.__
-* Reset Parameters: None.
+* Reset Parameters: None
 * Optional Imitation Learning scene: `PyramidsIL`.
 * Benchmark Mean Reward: 1.75

docs/ML-Agents-Overview.md

Lines changed: 13 additions & 13 deletions
@@ -319,11 +319,11 @@ imitation learning algorithm will then use these pairs of observations and
 actions from the human player to learn a policy. [Video
 Link](https://youtu.be/kpb8ZkMBFYs).
 
-ML-Agents provides ways to both learn directly from demonstrations as well as
-use demonstrations to help speed up reward-based training, and two algorithms to do
-so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
-[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
-covers these features in more depth.
+The toolkit provides a way to learn directly from demonstrations, as well as use them
+to help speed up reward-based training (RL). We include two algorithms called
+Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The
+[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers these
+features in more depth.
 
 ## Flexible Training Scenarios
 
@@ -408,6 +408,14 @@ training process.
 learn more about adding visual observations to an agent
 [here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
 
+- **Training with Reset Parameter Sampling** - To train agents to be adapt
+to changes in its environment (i.e., generalization), the agent should be exposed
+to several variations of the environment. Similar to Curriculum Learning,
+where environments become more difficult as the agent learns, the toolkit provides
+a way to randomly sample Reset Parameters of the environment during training. See
+[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+to learn more about this feature.
+
 - **Broadcasting** - As discussed earlier, a Learning Brain sends the
 observations for all its Agents to the Python API when dragged into the
 Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
@@ -422,14 +430,6 @@ training process.
 the broadcasting feature
 [here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
 
-- **Training with Environment Parameter Sampling** - To train agents to be robust
-to changes in its environment (i.e., generalization), the agent should be exposed
-to a variety of environment variations. Similarly to Curriculum Learning, which
-allows environments to get more difficult as the agent learns, we also provide
-a way to randomly resample aspects of the environment during training. See
-[Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
-to learn more about this feature.
-
 - **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
 installing Python or TensorFlow directly, we provide a
 [guide](Using-Docker.md) on how to create and run a Docker container.
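For context on the **Training with Reset Parameter Sampling** bullet added in the hunk above: generalization training is driven by a separate sampler YAML file. The sketch below is illustrative only and is not part of this commit; the key names (`resampling-interval`, `sampler-type`, `min_value`, `max_value`) and the example Reset Parameters (`mass`, `scale`) are assumed from the linked [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md) doc and should be checked against it.

```yaml
# Hypothetical sampler file for generalization training (illustrative names and values).
resampling-interval: 5000   # training steps between re-sampling the Reset Parameters

mass:                       # a Reset Parameter exposed by the environment
    sampler-type: "uniform" # draw uniformly between min_value and max_value
    min_value: 0.5
    max_value: 10

scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```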

docs/Migrating.md

Lines changed: 5 additions & 5 deletions
@@ -5,18 +5,18 @@
 ### Important Changes
 * We have changed the way reward signals (including Curiosity) are defined in the
 `trainer_config.yaml`.
-* When using multiple environments, every "step" as recorded in TensorBoard and
-printed in the command line now corresponds to a single step of a single environment.
+* When using multiple environments, every "step" is recorded in TensorBoard.
+* The steps in the command line console corresponds to a single step of a single environment.
 Previously, each step corresponded to one step for all environments (i.e., `num_envs` steps).
 
 #### Steps to Migrate
 * If you were overriding any of these following parameters in your config file, remove them
 from the top-level config and follow the steps below:
-* `gamma` - Define a new `extrinsic` reward signal and set it's `gamma` to your new gamma.
-* `use_curiosity`, `curiosity_strength`, `curiosity_enc_size` - Define a `curiosity` reward signal
+* `gamma`: Define a new `extrinsic` reward signal and set it's `gamma` to your new gamma.
+* `use_curiosity`, `curiosity_strength`, `curiosity_enc_size`: Define a `curiosity` reward signal
 and set its `strength` to `curiosity_strength`, and `encoding_size` to `curiosity_enc_size`. Give it
 the same `gamma` as your `extrinsic` signal to mimic previous behavior.
-See [Reward Signals](Training-RewardSignals.md) for more information on defining reward signals.
+See [Reward Signals](Reward-Signals.md) for more information on defining reward signals.
 * TensorBoards generated when running multiple environments in v0.8 are not comparable to those generated in
 v0.9 in terms of step count. Multiply your v0.8 step count by `num_envs` for an approximate comparison.
 You may need to change `max_steps` in your config as appropriate as well.
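To make the migration steps above concrete, a migrated `trainer_config.yaml` entry might look like the sketch below. The brain name (`PushBlockLearning`) and the numeric values are placeholders rather than content from this commit; the structure follows the descriptions in the diff and the renamed [Reward Signals](Reward-Signals.md) doc.

```yaml
PushBlockLearning:              # placeholder brain name
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99         # previously the top-level `gamma`
        curiosity:
            strength: 0.01      # previously `curiosity_strength`
            gamma: 0.99         # same gamma as the extrinsic signal, to mimic previous behavior
            encoding_size: 128  # previously `curiosity_enc_size`
```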

docs/Profiling.md renamed to docs/Profiling-Python.md

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,7 @@
-# Profiling ML-Agents in Python
+# Profiling in Python
 
-ML-Agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
-regressions from changes.
+As part of the ML-Agents tookit, we provide a lightweight profiling system,
+in order to identity hotspots in the training process and help spot regressions from changes.
 
 Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
 desired. This also means that a function that is called from multiple places in the code will appear in multiple
@@ -24,7 +24,6 @@ class TrainerController:
 
 You can also used the `hierarchical_timer` context manager.
 
-
 ``` python
 with hierarchical_timer("communicator.exchange"):
     outputs = self.communicator.exchange(step_input)

docs/Readme.md

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@
 * [Training with Curriculum Learning](Training-Curriculum-Learning.md)
 * [Training with Imitation Learning](Training-Imitation-Learning.md)
 * [Training with LSTM](Feature-Memory.md)
+* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
 * [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
 * [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
 * [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
