README.md (3 additions, 3 deletions)
@@ -27,7 +27,7 @@ developer communities.
* 10+ sample Unity environments
* Support for multiple environment configurations and training scenarios
* Train memory-enhanced agents using deep reinforcement learning
- * Easily definable Curriculum Learning scenarios
+ * Easily definable Curriculum Learning and Generalization scenarios
* Broadcasting of agent behavior for supervised learning
* Built-in support for Imitation Learning
* Flexible agent control with On Demand Decision Making
@@ -77,11 +77,11 @@ If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
- Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to [let us know about it](https://github.com/Unity-Technologies/ml-agents/issues/1454).
+ Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to [let us know about it](https://github.com/Unity-Technologies/ml-agents/issues/1454).
For any other questions or feedback, connect directly with the ML-Agents
docs/Training-Generalization-Learning.md (42 additions, 9 deletions)
@@ -18,8 +18,9 @@ Ball scale of 0.5 | Ball scale of 4
_Variations of the 3D Ball environment._
To vary environments, we first decide what parameters to vary in an
- environment. These parameters are known as `Reset Parameters`. In the 3D ball
- environment example displayed in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
+ environment. We call these parameters `Reset Parameters`. In the 3D ball
+ environment example displayed in the figure above, the reset parameters are
+ `gravity`, `ball_mass` and `ball_scale`.
## How-to
@@ -31,17 +32,17 @@ can be done either deterministically or randomly.
This is done by assigning each reset parameter a sampler, which samples a reset
parameter value (such as a uniform sampler). If a sampler isn't provided for a
reset parameter, the parameter maintains the default value throughout the
- training, remaining unchanged. The samplers for all the reset parameters are
- handled by a **Sampler Manager**, which also handles the generation of new
+ training procedure, remaining unchanged. The samplers for all the reset parameters
+ are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples. In this file, we specify the samplers and the
- `resampling-duration` (number of simulation steps after which reset parameters are
+ `resampling-interval` (number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
- episode-length: 5000
+ resampling-interval: 5000
mass:
    sampler-type: "uniform"
@@ -59,7 +60,7 @@ scale:
```
- * `resampling-duration` (int) - Specifies the number of steps for the agent to
+ * `resampling-interval` (int) - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of reset parameters.
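For reference, the hunks above show only fragments of the 3D Ball sampler file. A complete file consistent with them might look like the sketch below; the endpoint and interval values are illustrative, and the `intervals` sub-argument for `gravity` is an assumption based on the `multirange_uniform` discussion that follows.

```yaml
# Resample all managed reset parameters every 5000 simulation steps.
resampling-interval: 5000

mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10

gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]   # assumed sub-argument name

scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```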
@@ -77,8 +78,40 @@ environment, then this specification will be ignored.
key under the `multirange_uniform` sampler for the gravity reset parameter.
The key name should match the name of the corresponding argument in the sampler definition (see "Defining a new sampler method" below).
The sampler manager allocates a sampler for a reset parameter by using the *Sampler Factory*, which maintains a dictionary mapping of string keys to sampler objects. The samplers available for reset parameter resampling are those registered in the Sampler Factory.
+ #### Possible Sampler Types
+
+ The currently implemented samplers that can be used with the `sampler-type` arguments are:
+
+ * `uniform` - Uniform sampler
+     * Uniformly samples a single float value between defined endpoints.
+       The sub-arguments for this sampler to specify the interval
+       endpoints are as below. The sampling is done in the range of
+       [`min_value`, `max_value`).
+
+     * **sub-arguments** - `min_value`, `max_value`
+
+ * `gaussian` - Gaussian sampler
+     * Samples a single float value from the distribution characterized by
+       the mean and standard deviation. The sub-arguments to specify the
+       distribution are as below.
+
+     * **sub-arguments** - `mean`, `st_dev`
+
+ * `multirange_uniform` - Multirange uniform sampler
+     * Uniformly samples a single float value across a specified set of
+       intervals, as used for the `gravity` reset parameter above.
+
+     * **sub-arguments** - `intervals`
+
The implementation of the samplers can be found at `ml-agents-envs/mlagents/envs/sampler_class.py`.
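As a quick usage illustration for the `gaussian` type, a reset parameter entry might be declared as below; the parameter name is hypothetical and the `mean`/`st_dev` sub-argument names follow the reconstruction above.

```yaml
ball_mass:
    sampler-type: "gaussian"
    mean: 1.0     # center of the distribution
    st_dev: 0.3   # spread; sampled values cluster around the mean
```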
### Defining a new sampler method
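The body of this section falls outside the hunks shown here. As a sketch of what a custom sampler might look like, assuming `sampler_class.py` exposes a `Sampler` base class with a `sample_parameter` hook and a `SamplerFactory.register_sampler` registration method (all three names are assumptions, not confirmed by this diff):

```python
import numpy as np

# Assumed module layout and class names; see ml-agents-envs/mlagents/envs/sampler_class.py.
from mlagents.envs.sampler_class import Sampler, SamplerFactory


class TriangularSampler(Sampler):
    """Hypothetical sampler drawing from a triangular distribution."""

    def __init__(self, min_value: float, max_value: float, mode: float, **kwargs):
        # Constructor keyword names must match the sub-argument keys in the YAML file.
        self.min_value = min_value
        self.max_value = max_value
        self.mode = mode

    def sample_parameter(self) -> float:
        # Return one new value for the reset parameter each time it is resampled.
        return float(np.random.triangular(self.min_value, self.mode, self.max_value))


# Register under the string key that the YAML file's `sampler-type` will refer to.
SamplerFactory.register_sampler("triangular", TriangularSampler)
```

With that in place, a sampler file entry could use `sampler-type: "triangular"` with `min_value`, `max_value`, and `mode` as its sub-arguments.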
@@ -115,10 +148,10 @@ With the sampler file setup, we can proceed to train our agent as explained in t
### Training with Generalization Learning
- We first begin with setting up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specify our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/generalization-test.yaml` sampling setup, we can run
+ We first begin with setting up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specify our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/3dball_generalize.yaml` sampling setup, we can run
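The command itself is cut off at the end of this hunk. Based on the `--sampler` flag described above, it presumably takes a shape like the following; the trainer config path and run id are illustrative.

```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3dball-generalize --train
```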
0 commit comments