docs/Training-ML-Agents.md (4 additions, 4 deletions)
@@ -162,20 +162,20 @@ Sections for the example environments are included in the provided config file.
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
-| beta | The strength of entropy regularization. | PPO, BC |
+| beta | The strength of entropy regularization. | PPO |
| brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
| curiosity_strength | Magnitude of the intrinsic reward generated by the Intrinsic Curiosity Module. | PPO |
-| epsilon | Influences how rapidly the policy can evolve during training. | PPO, BC |
+| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
| lambd | The regularization parameter. | PPO |
| learning_rate | The initial learning rate for gradient descent. | PPO, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
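For context, a minimal sketch of how these settings are used in a trainer configuration section, reflecting this change (beta and epsilon now apply to PPO only). The brain name `BallBrain` and every value shown here are illustrative assumptions, not taken from this diff; the actual sections for the example environments live in the provided config file.

```yaml
# Hypothetical PPO section of the trainer configuration file.
# Brain name and all values below are illustrative assumptions.
BallBrain:
    trainer: ppo
    batch_size: 1024        # experiences per gradient-descent iteration
    beta: 5.0e-3            # entropy regularization strength (PPO only after this change)
    buffer_size: 10240      # experiences collected before each policy update
    epsilon: 0.2            # limits how rapidly the policy can evolve (PPO only after this change)
    gamma: 0.99             # reward discount rate for GAE
    hidden_units: 128
    lambd: 0.95             # regularization parameter
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    num_epoch: 3            # passes through the experience buffer per update
    num_layers: 2
    summary_freq: 1000      # steps between TensorBoard data points
```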